How to do it...

  1. Start the Spark shell:
        $ spark-shell
  2. Perform the required imports:
        scala> import org.apache.spark.ml.classification.{GBTClassificationModel, GBTClassifier}
        scala> import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
  3. Load and parse the data:
        scala> val data = spark.read.format("libsvm").load("s3a://sparkcookbook/patientdata")
  4. Split the data into training and test datasets:
        scala> val Array(training, test) = data.randomSplit(Array(0.7, 0.3))
  5. Create the gradient-boosted tree classifier and set the maximum number of boosting iterations to 10:
        scala> val gbt = new GBTClassifier().setMaxIter(10)
  6. Train the model (a sketch for inspecting the trained model follows this recipe):
        scala> val model = gbt.fit(training)
  7. Evaluate the model on the test instances and compute the test error:
        scala> val predictions = model.transform(test)
        scala> val evaluator = new MulticlassClassificationEvaluator().setMetricName("accuracy")
        scala> val accuracy = evaluator.evaluate(predictions)
        scala> val testError = 1.0 - accuracy
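
The trained ensemble can also be inspected directly in the shell. The following is a minimal sketch using the featureImportances and toDebugString members of GBTClassificationModel; the extra setters shown on GBTClassifier use illustrative values that are not tuned for this dataset:

        scala> // relative importance of each feature across the ensemble
        scala> model.featureImportances
        scala> // full description of all trees in the model
        scala> println(model.toDebugString)
        scala> // illustrative (untuned) depth and learning-rate settings
        scala> val tunedGbt = new GBTClassifier().setMaxIter(10).setMaxDepth(5).setStepSize(0.1)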

In this case, the accuracy of the model is 75 percent, which is almost the same as what we got for a random forest. 
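
To reproduce that comparison on the same split, a random forest can be trained and scored with the same evaluator. This is a minimal sketch; setNumTrees(10) is an assumed illustrative value, not taken from the original recipe:

        scala> import org.apache.spark.ml.classification.RandomForestClassifier
        scala> // assumed tree count; tune for your data
        scala> val rf = new RandomForestClassifier().setNumTrees(10)
        scala> val rfModel = rf.fit(training)
        scala> val rfAccuracy = evaluator.evaluate(rfModel.transform(test))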
