Deploy Sklearn Model with Pipeline and joblib

Posted by Xiaoye's Blog on April 2, 2020

TOC

  1. Introduction
  2. Training and Deployment with Pipeline and joblib in sklearn
  3. Error & Solution
    1. Error: module __main__ has no attribute

Introduction

This post shows how to train and deploy a scikit-learn model using sklearn's Pipeline and joblib.

Training and Deployment with Pipeline and joblib in sklearn

Training the model with a Pipeline:

import jieba  # Chinese word segmentation, used by process_text
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.naive_bayes import MultinomialNB

def process_text(text):
    # Tokenize the input with jieba and return a list of words
    return jieba.lcut(str(text))

pipeline = Pipeline([
    ('bow', CountVectorizer(analyzer=process_text)),  # converts strings to integer counts
    ('tfidf', TfidfTransformer()),  # converts integer counts to weighted TF-IDF scores
    ('classifier', MultinomialNB())  # trains a Naive Bayes classifier on the TF-IDF vectors
])
pipeline.fit(...)
pipeline.predict(...)

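Saving and loading the model with joblib:

import os
import joblib

# model_dir and model_file are placeholders for your own output location
model_path = os.path.join(model_dir, model_file)

joblib.dump(
    value=pipeline,
    filename=model_path)

# Load from the same path the model was saved to
new_pipeline = joblib.load(
    filename=model_path
)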
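To make the save and load steps concrete, here is a minimal round-trip sketch. It assumes toy data invented for illustration, uses str.split in place of the jieba-based process_text so that it runs without jieba, and writes to a temporary directory under the hypothetical file name model.joblib. It is not the exact setup used in this post.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy data, invented for illustration only.
docs = ["good great nice", "bad awful poor", "great good", "poor bad"]
labels = ["pos", "neg", "pos", "neg"]

# str.split stands in for the jieba-based process_text (an assumption of this sketch).
pipeline = Pipeline([
    ("bow", CountVectorizer(analyzer=str.split)),
    ("tfidf", TfidfTransformer()),
    ("classifier", MultinomialNB()),
])
pipeline.fit(docs, labels)

with tempfile.TemporaryDirectory() as model_dir:
    model_path = os.path.join(model_dir, "model.joblib")  # hypothetical file name
    joblib.dump(value=pipeline, filename=model_path)
    new_pipeline = joblib.load(filename=model_path)

# The reloaded pipeline should reproduce the original predictions.
original_predictions = list(pipeline.predict(docs))
reloaded_predictions = list(new_pipeline.predict(docs))
print(original_predictions == reloaded_predictions)
```

Saving the whole Pipeline, rather than only the classifier, keeps the vectorizer vocabulary and TF-IDF weights together with the model, so the loaded object can be called on raw text directly.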

Error & Solution

Error: module __main__ has no attribute

When deploying the model with joblib, loading it failed with an error saying that module __main__ has no attribute process_text.

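According to a Stack Overflow answer, this is caused by the way I saved the model. Pickle does not dump the actual code of classes and functions, only their names, together with the name of the module in which each one was defined. If you dump a class or function defined in a module that you are running as a script, pickle records __main__ as the module name. In this case, the function process_text was saved as __main__.process_text. joblib uses pickle under the hood, so when the model is loaded from a different script, __main__ refers to that script, which has no process_text, and the lookup fails.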
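The "names only" behaviour can be checked with the standard library alone. The sketch below pickles json.dumps, an arbitrary importable function chosen for illustration, and confirms that the payload carries the module name and function name rather than the function body. The helper pickled_names is hypothetical and exists only for this example.

```python
import json
import pickle


def pickled_names(func):
    """Return the module name and qualified name that pickle records for a function."""
    return func.__module__, func.__qualname__


# json.dumps stands in for any importable function (an arbitrary choice).
payload = pickle.dumps(json.dumps)
module_name, qualname = pickled_names(json.dumps)

# The payload holds the two names, not the function's code.
names_in_payload = module_name.encode() in payload and qualname.encode() in payload

# Unpickling imports the module and looks the name up again,
# so it returns the very same function object.
restored = pickle.loads(payload)
same_object = restored is json.dumps
```

For a function defined in a script that is run directly, func.__module__ is "__main__", which is exactly the name that ends up in the saved model.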

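I solved this error with the following steps:

  1. Put process_text and pipeline in a script named model.py.
  2. Import process_text and pipeline from model.py into another script, then train and save the model there, so that process_text is saved as model.process_text instead of __main__.process_text.

The script that loads the model must also be able to import model.py.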
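The effect of moving the function into its own module can be sketched without sklearn or jieba. The example below writes a small module to a temporary directory, imports it, and pickles its process_text. The module name demo_model is a hypothetical stand-in for model.py, and str.split replaces jieba.lcut, so treat it as an illustration of the idea rather than the actual deployment code.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for model.py; str.split replaces jieba.lcut so the
# sketch needs no third-party packages.
MODULE_NAME = "demo_model"
MODULE_SOURCE = "def process_text(text):\n    return str(text).split()\n"

# Write the module to a temporary directory and make that directory importable.
module_dir = tempfile.mkdtemp()
Path(module_dir, MODULE_NAME + ".py").write_text(MODULE_SOURCE)
sys.path.insert(0, module_dir)
importlib.invalidate_caches()
demo_model = importlib.import_module(MODULE_NAME)

# The function now belongs to demo_model, so pickle records that module
# name instead of __main__.
recorded_module = demo_model.process_text.__module__
payload = pickle.dumps(demo_model.process_text)

# Any process that can import demo_model can resolve the reference again.
restored = pickle.loads(payload)
tokens = restored("hello pickle world")
```

Because the reference is resolved by import at load time, the module has to be importable under the same name in both the training and the serving environment.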