- Start the Spark shell:
$ spark-shell
- Perform the required imports:
scala> import org.apache.spark.ml.classification.{RandomForestClassificationModel, RandomForestClassifier}
scala> import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
- Load and parse the data:
scala> val data = spark.read.format("libsvm").load("s3a://sparkcookbook/rf")
- Split the data into training and test datasets:
scala> val Array(training, test) = data.randomSplit(Array(0.7, 0.3))
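The split is random, so the numbers you see will vary from run to run. To make the split reproducible, randomSplit also accepts a seed (the value 42 below is arbitrary):
scala> val Array(training, test) = data.randomSplit(Array(0.7, 0.3), seed = 42)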
- Create a random forest classifier with three trees (random forest also supports regression):
scala> val rf = new RandomForestClassifier().setNumTrees(3)
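Here we only set the number of trees. As a sketch, other commonly tuned knobs on RandomForestClassifier include the maximum tree depth and how many features each split considers; the values below are illustrative, not tuned:
scala> val rf = new RandomForestClassifier().setNumTrees(3).setMaxDepth(5).setFeatureSubsetStrategy("auto")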
- Train the model:
scala> val model = rf.fit(training)
- Evaluate the model on the test instances and compute the accuracy:
scala> val predictions = model.transform(test)
scala> val evaluator = new MulticlassClassificationEvaluator().setMetricName("accuracy")
scala> val accuracy = evaluator.evaluate(predictions)
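Besides the single accuracy number, it helps to eyeball a few individual predictions and to derive the test error as the complement of the accuracy:
scala> predictions.select("label", "prediction").show(5)
scala> val testError = 1.0 - accuracy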
- Check the model:
scala> model.toDebugString
"RandomForestClassificationModel (uid=rfc_ac46ea5af585) with 3 trees
  Tree 0 (weight 1.0):
    If (feature 1 <= 0.0)
     Predict: 0.0
    Else (feature 1 > 0.0)
     Predict: 1.0
  Tree 1 (weight 1.0):
    If (feature 5 <= 0.0)
     Predict: 1.0
    Else (feature 5 > 0.0)
     Predict: 0.0
  Tree 2 (weight 1.0):
    If (feature 5 <= 0.0)
     Predict: 1.0
    Else (feature 5 > 0.0)
     Predict: 0.0
"
- We used toy data to illustrate the value of random forests. Now let's run the same exercise on the diabetes data: replace step 3 with the following and run steps 4 to 7 again:
scala> val data = spark.read.format("libsvm").load("s3a://sparkcookbook/patientdata")
Now the accuracy has reached 74.6 percent.
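Once you are happy with the accuracy, you may want to persist the trained model for later reuse. A minimal sketch, assuming you have write access to the target directory (the local path below is hypothetical):
scala> model.write.overwrite().save("/tmp/rf-model")
scala> val sameModel = RandomForestClassificationModel.load("/tmp/rf-model")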