How to create the model file using scikit-learn

This section explains how to create a linear regression model with scikit-learn and convert it into an .mlmodel file that is compatible with Core ML. We use the Boston housing dataset to build the model. The following simple Python program trains a linear regression model on the Boston dataset with scikit-learn and then uses Core ML Tools to convert it into a model file that Core ML can use. Let's go through the program in detail.

First, we import the packages that the program needs:

# importing required packages
import numpy as np

The preceding line imports the NumPy package. NumPy is the fundamental package for scientific computing with Python, and it provides a powerful N-dimensional array object. The Boston dataset is loaded as NumPy arrays: a two-dimensional array with 13 feature columns, plus a separate array holding the house prices.
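To make the idea of an N-dimensional array concrete, here is a tiny illustration with made-up values (not taken from the Boston dataset). A two-dimensional NumPy array stores a dataset as rows (samples) by columns (features).

```python
import numpy as np

# Made-up values: two rows (houses) by three columns (features).
data = np.array([
    [0.5, 18.0, 2.3],
    [0.2, 0.0, 7.1],
])

print(data.ndim)   # number of dimensions: rows and columns
print(data.shape)  # (rows, columns)
print(data[:, 1])  # every row of the second column
```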

import pandas as pd
from pandas.core import series

The preceding lines import the pandas package, an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. Using pandas, we can create a DataFrame. You can think of a pandas DataFrame as a spreadsheet: it has column headings and rows of data.
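As a rough illustration of the spreadsheet analogy, the following sketch builds a small DataFrame from made-up values. The column names CRIM, ZN, and INDUS are borrowed from the Boston feature names purely for illustration.

```python
import pandas as pd

# Made-up rows; each column gets a heading, like a spreadsheet.
df = pd.DataFrame(
    [[0.5, 18.0, 2.3], [0.2, 0.0, 7.1]],
    columns=["CRIM", "ZN", "INDUS"],
)

print(df)        # the whole table, with headings
print(df["ZN"])  # select a single column by its heading
```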

import coremltools
from coremltools.converters.sklearn import _linear_regression

The preceding lines import Core ML Tools and its scikit-learn linear regression converter module, which is the model type used in this program. Core ML Tools is a Python package for creating, examining, and testing models in the .mlmodel format. In particular, it can be used to do the following:

Convert existing models to the .mlmodel format from popular machine learning tools including Keras, Caffe, scikit-learn, libsvm, and XGBoost
Express models in .mlmodel format through a simple API
Make predictions with an .mlmodel (on select platforms for testing purposes).

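from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score
# binds the name sklearn, which the code below uses (e.g. sklearn.model_selection)
import sklearn.model_selection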

The preceding lines import the scikit-learn (sklearn) packages. The datasets module gives access to the datasets that ship with sklearn; in this program, we use the Boston housing price dataset that was explained in the previous section. The linear_model module provides the linear regression class, and the metrics module provides functions for evaluating our model, such as the mean squared error.

boston = datasets.load_boston()

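The preceding line loads the Boston dataset from the sklearn datasets package. Note that load_boston was deprecated in scikit-learn 1.0 and removed in version 1.2, so this line requires an older version of scikit-learn.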

bos = pd.DataFrame(boston.data)

The preceding line creates a pandas DataFrame from the feature data (boston.data) of the dataset.

bos.columns = boston.feature_names

The preceding line sets the column names, that is, the headings for the data, to the dataset's feature names.

bos['price'] = boston.target

The preceding line adds the target values, the house prices, as a new column named price. This is the column that the model will predict.

x = bos.drop('price', axis=1)

The preceding line drops the price column, so x holds only the input features.

y = bos.price

Since price is the target column, y holds the values of the price column.
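Because load_boston is not available in recent scikit-learn releases, the following sketch repeats the DataFrame steps above on a small, made-up stand-in. The arrays data, feature_names, and target are hypothetical substitutes for boston.data, boston.feature_names, and boston.target; the values are invented.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Boston arrays (values are made up).
data = np.array([[6.5, 4.9], [6.4, 9.1], [7.2, 4.0]])
feature_names = np.array(["RM", "LSTAT"])
target = np.array([24.0, 21.6, 34.7])

bos = pd.DataFrame(data)       # features only, labeled 0, 1, ...
bos.columns = feature_names    # replace the labels with the feature names
bos["price"] = target          # append the target as a new column

x = bos.drop("price", axis=1)  # inputs: every column except price
y = bos.price                  # target: the price column (a Series)
```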

X_train, X_test, Y_train, Y_test = sklearn.model_selection.train_test_split(
    x, y, test_size=0.3, random_state=5)

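We then split the data into training and test sets: test_size=0.3 keeps 30% of the rows for testing and uses the remaining 70% for training, and random_state=5 makes the shuffled split reproducible.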
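A quick way to see what test_size=0.3 does is to split ten made-up samples with the same call. With these parameters, train_test_split should keep three rows for testing and seven for training.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Ten made-up samples with two features each, and ten targets.
x = np.arange(20).reshape(10, 2)
y = np.arange(10)

X_train, X_test, Y_train, Y_test = train_test_split(
    x, y, test_size=0.3, random_state=5)

print(len(X_train), len(X_test))  # 7 training rows, 3 test rows
```

Every sample lands in exactly one of the two sets; the random_state only controls which rows end up where.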

lm = sklearn.linear_model.LinearRegression()

The preceding line creates a linear regression object.

lm.fit(X_train, Y_train)

The preceding line fits the regression model to the training data only; the test data is held back so that we can evaluate the model on rows it has not seen.
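To see what fit learns, the following sketch trains LinearRegression on made-up data that follows an exact linear rule, price = 3*f1 - 2*f2 + 5. The fitted coef_ and intercept_ attributes should recover those values, up to floating-point error.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training data that follows price = 3*f1 - 2*f2 + 5 exactly.
X_train = np.array([[0, 1], [1, 0], [2, 3], [3, 1], [4, 4]], dtype=float)
Y_train = 3 * X_train[:, 0] - 2 * X_train[:, 1] + 5

lm = LinearRegression()
lm.fit(X_train, Y_train)  # learns one coefficient per feature plus an intercept

print(lm.coef_)       # approximately [ 3. -2.]
print(lm.intercept_)  # approximately 5.0
```

On real data such as the Boston prices, the relationship is not exact, so fit finds the coefficients that minimize the squared error on the training rows.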

Y_pred = lm.predict(X_test)

The preceding line predicts the prices for the test data.

mse = sklearn.metrics.mean_squared_error(Y_test, Y_pred)
print(mse)

The preceding lines calculate and print the mean squared error (MSE) between the actual prices in the test set and the prices predicted by our fitted model.
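The mean squared error is the average of the squared differences between actual and predicted values. This sketch, using made-up numbers, checks a hand-written computation against sklearn's mean_squared_error.

```python
from sklearn.metrics import mean_squared_error

# Made-up actual and predicted prices.
y_true = [3.0, 1.5, 2.0, 7.0]
y_pred = [2.5, 2.0, 2.0, 8.0]

# MSE by hand: the average of the squared differences.
manual = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

library = mean_squared_error(y_true, y_pred)
print(manual, library)  # both 0.375
```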

Because a regression predictive model predicts a quantity, the skill of the model must be reported as an error in those predictions.

There are many ways to estimate the skill of a regression predictive model, but the most common is the root mean squared error (RMSE), which is the square root of the MSE.

For example, if a regression predictive model made two predictions, 1.5 where the expected value is 1.0, and 3.3 where the expected value is 3.0, then the RMSE would be as follows:

RMSE = sqrt(average(error^2))
RMSE = sqrt(((1.0 - 1.5)^2 + (3.0 - 3.3)^2) / 2)
RMSE = sqrt((0.25 + 0.09) / 2)
RMSE = sqrt(0.17)
RMSE = 0.412

A benefit of RMSE is that the error score is expressed in the same units as the predicted value.

Next, we convert the fitted model to the Core ML format:

model = coremltools.converters.sklearn.convert(
    sk_obj=lm, input_features=list(boston.feature_names),
    output_feature_names='price')

In the preceding lines, we convert the fitted model to the Core ML format and specify the input feature names and the name of the output. The convert call returns the Core ML model in memory; the .mlmodel file itself is written in the next step.

model.save('HousePricer.mlmodel')

In the preceding line, we save the model to disk as HousePricer.mlmodel. This file can later be added to our iOS program.
