Doing ridge regression

An alternative way to improve prediction quality is ridge regression. In lasso, many features get their coefficients set to zero and are therefore eliminated from the equation. In ridge, predictors (features) are penalized but never set exactly to zero.

How to do it...

  1. Start the Spark shell:
$ spark-shell
  2. Import the statistics and related classes:
scala> import org.apache.spark.ml.linalg.Vectors
scala> import org.apache.spark.ml.regression.LinearRegression
  3. Create the dataset with the values we created earlier:
scala>  val points = spark.createDataFrame(Seq(
(1d,Vectors.dense(5,3,1,2,1,3,2,2,1)),
(2d,Vectors.dense(9,8,8,9,7,9,8,7,9))
)).toDF("label","features")
  4. Initialize the linear regression estimator with the elastic net parameter set to 0, which means pure ridge (L2) regularization:
scala> val lr = new LinearRegression().setMaxIter(10).setRegParam(0.3).setFitIntercept(false).setElasticNetParam(0.0)
  5. Train a model:
scala> val model = lr.fit(points)
  6. Inspect the coefficients to check whether any predictor was set to zero:
scala> model.coefficients
org.apache.spark.ml.linalg.Vector = [0.1132933163345012,0.039370733000466666,0.002369276442275222,
0.01041698759881142,0.004328988574203182,0.026236646722551202,
0.015282817648377045,0.023597219133656675,0.0011928984792447484]

As you can see, unlike lasso, ridge regression does not set any predictor coefficient to zero, though it does shrink some very close to zero.
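The shrinkage behavior can be seen without Spark in the one-feature case, where the ridge estimate has a simple closed form: beta = sum(x*y) / (sum(x*x) + lambda). The following is a minimal plain-Scala sketch (the object and function names are illustrative, not part of Spark's API); it shows that increasing the penalty lambda pulls the coefficient toward zero but never exactly to zero:

```scala
// Minimal sketch of ridge shrinkage for one feature, no intercept.
// Closed form: beta_ridge = sum(x*y) / (sum(x*x) + lambda),
// so lambda > 0 shrinks beta toward zero but never makes it exactly zero.
object RidgeSketch {
  def ridgeBeta(x: Seq[Double], y: Seq[Double], lambda: Double): Double = {
    val xy = x.zip(y).map { case (a, b) => a * b }.sum  // sum(x*y)
    val xx = x.map(a => a * a).sum                      // sum(x*x)
    xy / (xx + lambda)
  }

  def main(args: Array[String]): Unit = {
    val x = Seq(1.0, 2.0, 3.0, 4.0)
    val y = Seq(2.0, 4.0, 6.0, 8.0)      // y = 2x exactly
    val ols   = ridgeBeta(x, y, 0.0)     // no penalty: recovers 2.0
    val ridge = ridgeBeta(x, y, 3.0)     // penalized: shrunk below 2.0, still nonzero
    println(f"OLS beta = $ols%.4f, ridge beta = $ridge%.4f")
  }
}
```

With lambda = 0 this reproduces the ordinary least squares coefficient; with lambda = 3 the coefficient drops to about 1.82, illustrating why the Spark output above contains small but nonzero values.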
