Let's apply the decision tree to the diabetes dataset we worked on in the previous recipe:

  1. Start the Spark shell or the Databricks Cloud shell and do the necessary imports:
        $ spark-shell
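        scala> import org.apache.spark.ml.classification.DecisionTreeClassifier
        scala> import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator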
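  1. Read the diabetes data as a DataFrame, loading the same libsvm file used in the previous recipe:
        scala> val data = spark.read.format("libsvm").option("inferschema","true").load("<path to the diabetes libsvm file>")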
  1. Split it into training and test datasets:
        scala> val Array(trainingData, testData) = data.randomSplit(Array(0.7, 0.3))
  1. Initialize the decision tree classifier:
        scala> val dt = new DecisionTreeClassifier()
  1. Train the model using the training data:
        scala> val model = dt.fit(trainingData)
  1. Make predictions on the test dataset:
        scala> val predictions = model.transform(testData)
  1. Initialize the evaluator:
        scala> val evaluator = new BinaryClassificationEvaluator()
  1. Evaluate the predictions:
        scala> val auroc = evaluator.evaluate(predictions)
  1. Print the area under the curve:
        scala> println(s"Area under ROC = $auroc")
        Area under ROC = 0.7624556737588652
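
Before changing anything, it can help to look at the tree that was learned in the training step. The following is a minimal sketch that assumes it runs in the same shell session, where `model` is the fitted model from the steps above; the output depends on your random split, so none is shown here:

```scala
// Assumes `model` is the DecisionTreeClassificationModel returned by dt.fit(trainingData).

// Size of the learned tree: depth and total number of nodes.
println(s"Depth = ${model.depth}, nodes = ${model.numNodes}")

// Relative importance of each feature, as estimated from the tree's splits.
println(s"Feature importances = ${model.featureImportances}")

// Full textual description of the tree: one line per split and leaf.
println(model.toDebugString)
```

A very deep tree with many nodes can be a sign of overfitting, which is one reason to experiment with the hyperparameters.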
We used the decision tree classifier here with its default hyperparameters and got an area under the ROC curve of about 0.76. Try tweaking the hyperparameters yourself and see whether you can improve it further.
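
One way to approach the tuning is a small grid search with cross-validation. The sketch below is only a starting point, not a recommended configuration: it assumes the same shell session (so `dt`, `evaluator`, `trainingData`, and `testData` are the values defined in the steps above), and the candidate values in the grid and the number of folds are illustrative assumptions. In the Spark shell, enter it with `:paste` so that the multi-line expressions are read as a unit:

```scala
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

// Candidate hyperparameter values (illustrative assumptions, not tuned recommendations).
val paramGrid = new ParamGridBuilder()
  .addGrid(dt.maxDepth, Array(3, 5, 7))            // maximum depth of the tree
  .addGrid(dt.maxBins, Array(16, 32))              // bins used to discretize continuous features
  .addGrid(dt.impurity, Seq("gini", "entropy"))    // split quality criterion
  .build()

// 3-fold cross-validation over the grid, scored with the same evaluator (area under ROC).
val cv = new CrossValidator()
  .setEstimator(dt)
  .setEvaluator(evaluator)
  .setEstimatorParamMaps(paramGrid)
  .setNumFolds(3)

// Fit on the training data only; the best model is refit on all of trainingData.
val cvModel = cv.fit(trainingData)

// Score the best model on the held-out test data.
val tunedAuroc = evaluator.evaluate(cvModel.transform(testData))
println(s"Area under ROC after tuning = $tunedAuroc")
```

Because `randomSplit` is called without a seed, the split and therefore the scores change from run to run, so compare the tuned and untuned models on the same split.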
