Persisting models with joblib

In this recipe, we're going to show how you can keep your model around for later use. For example, you might want to use a model to predict outcomes and make decisions automatically.

Getting ready

In this recipe, we will perform the following tasks:

Fit the model that we will persist.
Import joblib and save the model.

How to do it...

To persist models with joblib, the following code can be used:

>>> from sklearn import datasets, tree
>>> X, y = datasets.make_classification()
>>> dt = tree.DecisionTreeClassifier()
>>> dt.fit(X, y)
DecisionTreeClassifier(compute_importances=None, criterion='gini',
                       max_depth=None, max_features=None,
                       max_leaf_nodes=None, min_density=None,
                       min_samples_leaf=1, min_samples_split=2,
                       random_state=None, splitter='best')

>>> import joblib  # older scikit-learn versions exposed this as sklearn.externals.joblib
>>> # Older joblib releases also wrote one .npy file per NumPy array, as listed here
>>> joblib.dump(dt, "dtree.clf")
['dtree.clf',
 'dtree.clf_01.npy',
 'dtree.clf_02.npy',
 'dtree.clf_03.npy',
 'dtree.clf_04.npy']
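Saving is only half of the round trip; the point of persisting a model is to load it again later. A minimal sketch, assuming the standalone joblib package and a temporary directory (the file name and random_state values are illustrative), that dumps a fitted tree, restores it with joblib.load, and checks that the restored copy predicts exactly what the original does:

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn import datasets, tree

# Fit a small decision tree, as in the recipe (random_state fixed for repeatability).
X, y = datasets.make_classification(random_state=0)
dt = tree.DecisionTreeClassifier(random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "dtree.clf")
    joblib.dump(dt, path)         # persist the fitted estimator to disk
    restored = joblib.load(path)  # rebuild an equivalent estimator from the file

# The restored model should reproduce the original predictions exactly.
same_predictions = np.array_equal(dt.predict(X), restored.predict(X))
print(same_predictions)  # True
```

In a real deployment, the dump would happen in the training script and the load in whatever process makes the automated decisions.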

How it works...

The preceding code saves the state of the fitted object so that it can later be reloaded into a scikit-learn object. Note that how complex this state is depends on the model type.
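One way to see how the state varies with model type is to count what each fitted estimator actually stores. A minimal sketch using scikit-learn's fitted attributes tree_.node_count and estimators_; the random_state and n_estimators values are chosen only for repeatability:

```python
from sklearn import datasets, ensemble, tree

X, y = datasets.make_classification(random_state=0)

dt = tree.DecisionTreeClassifier(random_state=0).fit(X, y)
rf = ensemble.RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# A fitted tree's state is its array of nodes (split features, thresholds, leaf values).
dt_nodes = dt.tree_.node_count

# A forest keeps one such tree per estimator, each grown to its own size.
per_tree = [est.tree_.node_count for est in rf.estimators_]
rf_nodes = sum(per_tree)

print("decision tree nodes:", dt_nodes)
print("random forest nodes per tree:", per_tree)
print("random forest nodes in total:", rf_nodes)
```

Every one of those nodes has to be written out by joblib.dump, which is why the forest produces a larger file.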

For simplicity's sake, suppose all we need to save is enough information to predict the outcome for given inputs. For linear regression that is easy: store the coefficients and the intercept, and a little matrix algebra produces the predictions. For a model like a random forest, however, there can be many trees, each with its own structure and complexity, so far more state has to be saved.
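To make the "little matrix algebra" concrete, here is a minimal sketch, assuming a synthetic problem from make_regression (used only for illustration), showing that a fitted LinearRegression's entire predictive state is its coef_ vector and intercept_, and that X @ coef + intercept reproduces its predictions:

```python
import numpy as np
from sklearn import datasets, linear_model

# A synthetic regression problem, for illustration only.
X, y = datasets.make_regression(n_samples=100, n_features=5, noise=1.0, random_state=0)
reg = linear_model.LinearRegression().fit(X, y)

# Everything needed to predict: one coefficient per feature plus an intercept.
state = {"coef": reg.coef_.copy(), "intercept": float(reg.intercept_)}

# y_hat = X @ w + b reproduces the model's own predictions.
manual = X @ state["coef"] + state["intercept"]
print(np.allclose(manual, reg.predict(X)))  # True
```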

There's more...

We can compare the size of a persisted decision tree with that of a random forest:

>>> from sklearn import ensemble
>>> rf = ensemble.RandomForestClassifier()
>>> rf.fit(X, y)
RandomForestClassifier(bootstrap=True, compute_importances=None,
                       criterion='gini', max_depth=None,
                       max_features='auto', max_leaf_nodes=None,
                       min_density=None, min_samples_leaf=1,
                       min_samples_split=2, n_estimators=10,
                       n_jobs=1, oob_score=False, random_state=None,
                       verbose=0)

I'm going to omit most of the output, but in total, 52 files were written on my machine:

>>> joblib.dump(rf, "rf.clf")
['rf.clf',
 'rf.clf_01.npy',
 'rf.clf_02.npy',
 'rf.clf_03.npy',
 'rf.clf_04.npy',
 'rf.clf_05.npy',
 'rf.clf_06.npy',…]
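Recent joblib releases write a single file per dump rather than one file per array, so counting files no longer reflects model size. A minimal sketch, assuming the standalone joblib package and a temporary directory (file names and random_state values are illustrative), that compares the on-disk size of the two models instead:

```python
import os
import tempfile

import joblib
from sklearn import datasets, ensemble, tree

X, y = datasets.make_classification(random_state=0)
dt = tree.DecisionTreeClassifier(random_state=0).fit(X, y)
rf = ensemble.RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

sizes = {}
with tempfile.TemporaryDirectory() as tmpdir:
    for name, model in [("dtree.clf", dt), ("rf.clf", rf)]:
        path = os.path.join(tmpdir, name)
        files = joblib.dump(model, path)  # joblib.dump returns the list of files it wrote
        # Total bytes on disk, whether joblib wrote one file or several.
        sizes[name] = sum(os.path.getsize(f) for f in files)

print(sizes)  # bytes per model; exact numbers depend on library versions
```

Summing over the returned list keeps the comparison valid on older joblib versions that still split arrays into separate .npy files.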
