- Start the Spark shell:
$ spark-shell
- Do the required imports:
scala> import org.apache.spark.mllib.util.MLUtils
scala> import org.apache.spark.mllib.classification.SVMWithSGD
- Load the data as an RDD:
scala> val data = MLUtils.loadLibSVMFile(sc, "s3a://sparkcookbook/medicaldata/diabetes.libsvm")
- Count the number of records:
scala> data.count
- Split the dataset randomly into training and test data (note that randomSplit produces approximately, not exactly, equal halves):
scala> val trainingAndTest = data.randomSplit(Array(0.5,0.5))
- Assign the training and test data:
scala> val trainingData = trainingAndTest(0)
scala> val testData = trainingAndTest(1)
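- As an aside (not part of the original steps), randomSplit also accepts an optional seed, which makes the split reproducible across shell sessions; a pattern match unpacks the returned array in one line:
scala> val Array(trainingData2, testData2) = data.randomSplit(Array(0.5, 0.5), seed = 11L)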
- Train the algorithm and build the model with 100 iterations (you can try different iteration counts; past a certain point you'll see the results start to converge, and that point is a good number of iterations to choose):
scala> val model = SVMWithSGD.train(trainingData,100)
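- One way to find that convergence point is to sweep a few iteration counts and watch the error level off. This is a rough sketch, not part of the original recipe; it reuses trainingData from the steps above and uses the raw mismatch count on the training set as a simple stand-in for a proper evaluation metric:
scala> for (iters <- Seq(10, 50, 100, 200)) {
     |   // Train a model at this iteration count and count training-set mismatches
     |   val m = SVMWithSGD.train(trainingData, iters)
     |   val errors = trainingData.filter(r => m.predict(r.features) != r.label).count
     |   println(s"iterations=$iters training errors=$errors")
     | }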
- Now you can use this model to predict the label for any data point. Predict the label for the first point in the test data:
scala> val label = model.predict(testData.first.features)
- For each test record, create a tuple whose first value is the model's prediction and whose second value is the actual label; this will help us compute the accuracy of our algorithm:
scala> val predictionsAndLabels = testData.map(r => (model.predict(r.features), r.label))
- Count how many records the model mispredicted, that is, where the prediction differs from the actual label:
scala> predictionsAndLabels.filter(p => p._1 != p._2).count
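- To turn that mismatch count into an accuracy figure, one possible follow-up (assuming the same shell session, with testData and predictionsAndLabels still defined) is:
scala> val total = testData.count
scala> val mismatches = predictionsAndLabels.filter(p => p._1 != p._2).count
scala> val accuracy = 1.0 - mismatches.toDouble / total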