How to do it...

  1. Start the Spark shell:
        $ spark-shell
  1. Do the imports:
        scala> import org.apache.spark.ml.classification.LogisticRegression
        scala> import org.apache.spark.ml.linalg.{Vector, Vectors}
  1. Create a tuple for Lebron, who is a basketball player, is 80 inches tall, and weighs 250 lbs:
        scala> val lebron = (1.0,Vectors.dense(80.0,250.0))
  1. Create a tuple for Tim, who is not a basketball player, is 70 inches tall, and weighs 150 lbs:
        scala> val tim = (0.0,Vectors.dense(70.0,150.0))
  1. Create a tuple for Brittany, who is a basketball player, is 80 inches tall, and weighs 207 lbs:
        scala> val brittany = (1.0,Vectors.dense(80.0,207.0))
  1. Create a tuple for Stacey, who is not a basketball player, is 65 inches tall, and weighs 120 lbs:
        scala> val stacey = (0.0,Vectors.dense(65.0,120.0))
  1. Create a training DataFrame:
        scala> val training = spark.createDataFrame(Seq(lebron, tim, brittany, stacey)).toDF("label", "features")
  1. Create a LogisticRegression estimator:
        scala> val estimator = new LogisticRegression
  1. Create a transformer by fitting the estimator with the training DataFrame:
        scala> val transformer = estimator.fit(training)
  1. Now, let's create some test data: John is 90 inches tall, weighs 270 lbs, and is a basketball player:
        scala> val john = Vectors.dense(90.0,270.0)
  1. Create more test data: Tom is 62 inches tall, weighs 120 lbs, and is not a basketball player:
        scala> val tom = Vectors.dense(62.0,120.0)
  1. Create a test data DataFrame:
        scala> val test = spark.createDataFrame(Seq(
             |   (1.0, john),
             |   (0.0, tom)
             | )).toDF("label", "features")
  1. Do the prediction using the transformer:
        scala> val results = transformer.transform(test)
  1. Print the schema of the results DataFrame:
        scala> results.printSchema

root
 |-- label: double (nullable = false)
 |-- features: vector (nullable = true)
 |-- rawPrediction: vector (nullable = true)
 |-- probability: vector (nullable = true)
 |-- prediction: double (nullable = true)

As you can see, besides the prediction column, the transformer has also added the rawPrediction and probability columns.
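For a binary model, these columns are directly related: rawPrediction holds the raw margins [-margin, margin], each probability entry is the sigmoid of the matching raw score, and prediction is the index of the largest probability. A minimal pure-Scala sketch of that mapping, using the raw score from Tom's row below for illustration:

```scala
// Each probability entry is the sigmoid of the matching raw score.
def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

// Binary rawPrediction vector is [-margin, margin]; the first value
// here is taken from Tom's row in the output below.
val raw = Array(31.4607691062275, -31.4607691062275)
val prob = raw.map(sigmoid)

// The predicted label is the index of the largest probability.
val prediction = prob.indexOf(prob.max).toDouble   // 0.0, i.e. not a player
```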

  1. Print the DataFrame results:
        scala> results.show

+-----+------------+--------------------+--------------------+----------+
|label|    features|       rawPrediction|         probability|prediction|
+-----+------------+--------------------+--------------------+----------+
|  1.0|[90.0,270.0]|[-61.884758625897...|[1.32981373684616...|       1.0|
|  0.0|[62.0,120.0]|[31.4607691062275...|[0.99999999999997...|       0.0|
+-----+------------+--------------------+--------------------+----------+
  1. Let's select only the features and prediction columns (note that show returns Unit, so there is no point binding the result to a val):
        scala> results.select("features","prediction").show

+------------+----------+
|    features|prediction|
+------------+----------+
|[90.0,270.0]|       1.0|
|[62.0,120.0]|       0.0|
+------------+----------+
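Under the hood, transform() scores each row with the fitted linear function margin = w·x + b and predicts 1.0 when the margin is positive. The sketch below shows that computation in pure Scala; the weight and intercept values are made up for illustration (inspect the real ones with transformer.coefficients and transformer.intercept):

```scala
// Hypothetical fitted parameters: weights for (height, weight) and an
// intercept. The real values come from transformer.coefficients and
// transformer.intercept after fitting.
val w = Array(1.2, 0.05)
val b = -110.0

// margin = w.x + b, the raw linear score for one feature vector.
def margin(x: Array[Double]): Double =
  w.zip(x).map { case (wi, xi) => wi * xi }.sum + b

// A positive margin means P(label = 1) > 0.5, so predict 1.0.
def predict(x: Array[Double]): Double =
  if (margin(x) > 0) 1.0 else 0.0

predict(Array(90.0, 270.0))   // John: basketball player
predict(Array(62.0, 120.0))   // Tom: not a basketball player
```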