We'll dive into the basic principles of machine learning and demonstrate their use via scikit-learn's basic API.
The scikit-learn library has an estimator interface, which we illustrate using a linear regression model. For example, consider the following:
In [3]: from sklearn.linear_model import LinearRegression
The estimator class is instantiated to create a model, which in this case is a linear regression model:
In [4]: model = LinearRegression(normalize=True)

In [6]: print model
LinearRegression(copy_X=True, fit_intercept=True, normalize=True)
Here, we specify normalize=True, indicating that the x-values will be normalized before regression. Hyperparameters (estimator parameters) are passed as arguments when the model is created; this is an example of creating a model with tunable parameters.
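As a quick illustration of how hyperparameters live on the estimator object, the following sketch uses scikit-learn's get_params() and set_params() methods. Note this is an assumption-laden sketch, not the session above: recent scikit-learn releases removed the normalize argument, so fit_intercept is used here instead.

```python
from sklearn.linear_model import LinearRegression

# Hyperparameters are passed at construction time and stored on the
# estimator; get_params() returns them, set_params() changes them.
model = LinearRegression(fit_intercept=True)
print(model.get_params()["fit_intercept"])   # True

model.set_params(fit_intercept=False)
print(model.get_params()["fit_intercept"])   # False
```

These are the same hyperparameters that appear in the printed repr of the model.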
The estimated parameters are obtained from the data when it is fitted with an estimator. Let us first create some sample training data that is normally distributed about y = x/2. We first generate our x and y values:
In [51]: import random
         sample_size = 500
         x = []
         y = []
         for i in range(sample_size):
             newVal = random.normalvariate(100, 10)
             x.append(newVal)
             y.append(newVal / 2.0 + random.normalvariate(50, 5))
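The same training data can also be generated in vectorized form with NumPy. This is a minimal sketch, not the session above, using a seeded generator so the result is reproducible:

```python
import numpy as np

# x ~ N(100, 10); y = x/2 plus noise ~ N(50, 5), as in the loop version.
rng = np.random.default_rng(0)
sample_size = 500
x = rng.normal(100, 10, sample_size)
y = x / 2.0 + rng.normal(50, 5, sample_size)
```

Vectorized generation avoids the Python-level loop and returns NumPy arrays directly, which is what scikit-learn expects anyway.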
sklearn takes a 2D array of num_samples × num_features as input, so we convert our x data into a 2D array:
In [67]: import numpy as np
         X = np.array(x)[:, np.newaxis]
         X.shape
Out[67]: (500, 1)
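The [:, np.newaxis] indexing turns a flat sequence into a column vector; reshape(-1, 1) is an equivalent NumPy idiom. A minimal illustration:

```python
import numpy as np

x = [1.0, 2.0, 3.0]
X1 = np.array(x)[:, np.newaxis]   # column vector, shape (3, 1)
X2 = np.array(x).reshape(-1, 1)   # same result via reshape
print(X1.shape)  # (3, 1)
```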
In this case, we have 500 samples and 1 feature, x. We now train/fit the model and display the slope (coefficient) and the intercept of the fitted regression line, which the model uses to make predictions:
In [71]: model.fit(X, y)
         print "coeff=%s, intercept=%s" % (model.coef_, model.intercept_)
coeff=[ 0.47071289], intercept=52.7456611783
This can be visualized as follows:
In [65]: plt.title("Plot of linear regression line and training data")
         plt.xlabel('x')
         plt.ylabel('y')
         plt.scatter(X, y, marker='o', color='green', label='training data')
         plt.plot(X, model.predict(X), color='red', label='regression line')
         plt.legend(loc=2)
Out[65]: [<matplotlib.lines.Line2D at 0x7f11b0752350>]
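To check how well the fitted line recovers the underlying y = x/2 relationship, scikit-learn's score() method reports the R² of the fit. The following self-contained sketch (with its own synthetic data, not the exact session above) shows the recovered slope landing near the true 0.5:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Regenerate training data distributed about y = x/2, as in the text.
rng = np.random.default_rng(42)
x = rng.normal(100, 10, 500)
y = x / 2.0 + rng.normal(50, 5, 500)
X = x[:, np.newaxis]

model = LinearRegression()
model.fit(X, y)
print(model.coef_[0], model.intercept_)  # slope should be near 0.5
print(model.score(X, y))                 # R^2 of the fit
```

With 500 samples and modest noise, the estimated slope is typically within a few hundredths of 0.5.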
To summarize, the basic use of the estimator interface follows these steps:

1. Define the model with its hyperparameters, such as normalize=True as specified earlier.
2. Fit the model to the training data using the fit(..) method on the model defined in the previous step.
3. Use the predict(..) method on test data in order to make predictions or estimations. The predict(X) method is given unlabeled observations X and returns predicted labels y.

For extra reference, please see the following: http://bit.ly/1FU7mXj and http://bit.ly/1QqFN2V.
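The steps above can be sketched end to end as follows; the synthetic data and variable names are illustrative, not from the original session:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data shaped num_samples x num_features, about y = x/2.
rng = np.random.default_rng(0)
X_train = rng.normal(100, 10, (400, 1))
y_train = X_train[:, 0] / 2.0 + rng.normal(50, 5, 400)
X_test = rng.normal(100, 10, (100, 1))

model = LinearRegression()      # 1. define the model with its hyperparameters
model.fit(X_train, y_train)     # 2. fit the model to labeled training data
y_pred = model.predict(X_test)  # 3. predict labels for unseen observations
print(y_pred.shape)  # (100,)
```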