Doing ridge regression

An alternative way to improve prediction quality is ridge regression. In lasso, many features get their coefficients set to zero and are therefore eliminated from the equation. In ridge, predictors (features) are penalized but never set exactly to zero.

How to do it...

  1. Start the Spark shell:
$ spark-shell
  2. Import the statistics and related classes:
scala> import org.apache.spark.ml.linalg.Vectors
scala> import org.apache.spark.ml.regression.LinearRegression
  3. Create the dataset with the values we created earlier:
scala>  val points = spark.createDataFrame(Seq(
(1d,Vectors.dense(5,3,1,2,1,3,2,2,1)),
(2d,Vectors.dense(9,8,8,9,7,9,8,7,9))
)).toDF("label","features")
  4. Initialize the linear regression estimator with the elastic net parameter set to 0, which means pure ridge (L2) regularization:
scala> val lr = new LinearRegression().setMaxIter(10).setRegParam(0.3).setFitIntercept(false).setElasticNetParam(0.0)
  5. Train a model:
scala> val model = lr.fit(points)
  6. Inspect the coefficients to check whether any predictor was set to zero:
scala> model.coefficients
org.apache.spark.ml.linalg.Vector = [0.1132933163345012,0.039370733000466666,0.002369276442275222,
0.01041698759881142,0.004328988574203182,0.026236646722551202,
0.015282817648377045,0.023597219133656675,0.0011928984792447484]

As you can see, unlike lasso, ridge regression does not set any predictor coefficient to zero, though it does shrink some very close to zero.
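The shrinkage behavior can be seen without Spark in the one-feature case, where the ridge estimate has a simple closed form: beta = sum(x*y) / (sum(x*x) + lambda). The following is a minimal plain-Scala sketch (the object and function names are illustrative, not part of Spark's API); it shows that increasing the penalty lambda pulls the coefficient toward zero but never exactly to zero:

```scala
// Minimal sketch of ridge shrinkage for one feature, no intercept.
// Closed form: beta_ridge = sum(x*y) / (sum(x*x) + lambda),
// so lambda > 0 shrinks beta toward zero but never makes it exactly zero.
object RidgeSketch {
  def ridgeBeta(x: Seq[Double], y: Seq[Double], lambda: Double): Double = {
    val xy = x.zip(y).map { case (a, b) => a * b }.sum  // sum(x*y)
    val xx = x.map(a => a * a).sum                      // sum(x*x)
    xy / (xx + lambda)
  }

  def main(args: Array[String]): Unit = {
    val x = Seq(1.0, 2.0, 3.0, 4.0)
    val y = Seq(2.0, 4.0, 6.0, 8.0)      // y = 2x exactly
    val ols   = ridgeBeta(x, y, 0.0)     // no penalty: recovers 2.0
    val ridge = ridgeBeta(x, y, 3.0)     // penalized: shrunk below 2.0, still nonzero
    println(f"OLS beta = $ols%.4f, ridge beta = $ridge%.4f")
  }
}
```

With lambda = 0 this reproduces the ordinary least squares coefficient; with lambda = 3 the coefficient drops to about 1.82, illustrating why the Spark output above contains small but nonzero values.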
