How to do it...

  1. Start Spark shell:
        $ spark-shell
  2. Do the necessary imports:
        scala> import org.apache.spark.ml.regression.LinearRegression
        scala> import org.apache.spark.ml.evaluation.RegressionEvaluator
        scala> import org.apache.spark.ml.tuning.{ParamGridBuilder, TrainValidationSplit}
  3. Load the data as a DataFrame:
        scala> val data = spark.read.format("libsvm").load("s3a://sparkcookbook/housingdata/realestate.libsvm")
  4. Split data into training and test sets:
        scala> val Array(training, test) = data.randomSplit(Array(0.7, 0.3))
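     randomSplit produces a different split on each run; if you need a reproducible split, pass an explicit seed (42 below is just an illustrative value):
        scala> val Array(training, test) = data.randomSplit(Array(0.7, 0.3), seed = 42)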
  5. Instantiate linear regression:
        scala> val lr = new LinearRegression().setMaxIter(10)
  6. Create a parameter grid:
        scala> val paramGrid = new ParamGridBuilder()
                 .addGrid(lr.regParam, Array(0.1, 0.01))
                 .addGrid(lr.fitIntercept)
                 .addGrid(lr.elasticNetParam, Array(0.0, 0.5, 1.0))
                 .build()
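     Calling addGrid on the boolean fitIntercept parameter with no values expands it to both true and false, so this grid covers 2 × 2 × 3 = 12 hyperparameter combinations. Since build() returns an Array[ParamMap], you can confirm the count directly:
        scala> paramGrid.length    // returns 12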
  7. Create a training validation split:
        scala> val trainValidationSplit = new TrainValidationSplit()
                 .setEstimator(lr)
                 .setEvaluator(new RegressionEvaluator)
                 .setEstimatorParamMaps(paramGrid)
                 .setTrainRatio(0.8)
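     setTrainRatio(0.8) means each candidate model is fitted on 80 percent of the training set and scored on the remaining 20 percent. If you are on Spark 2.3 or later, you can also evaluate candidates in parallel; the value 2 below is only illustrative:
        scala> trainValidationSplit.setParallelism(2)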
  8. Train the model:
        scala> val model = trainValidationSplit.fit(training)
  9. Make predictions on the test dataset:
        scala> val predictions = model.transform(test)
  10. Evaluate the predictions:
        scala> val evaluator = new RegressionEvaluator()
        scala> evaluator.evaluate(predictions)
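     RegressionEvaluator reports RMSE by default; call setMetricName("r2") or setMetricName("mae") to score on a different metric. To see which hyperparameter combination won, the fitted model exposes it as bestModel; the following is a minimal sketch using the standard Spark ML accessors:
        scala> import org.apache.spark.ml.regression.LinearRegressionModel
        scala> val best = model.bestModel.asInstanceOf[LinearRegressionModel]
        scala> println(s"regParam=${best.getRegParam}, elasticNetParam=${best.getElasticNetParam}, fitIntercept=${best.getFitIntercept}")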