How to do it...

  1. Start the Spark shell:
        $ spark-shell
  1. Do the imports:
        scala> import org.apache.spark.ml.classification.LogisticRegression
        scala> import org.apache.spark.ml.linalg.{Vector, Vectors}
  1. Create a tuple for Lebron, who is a basketball player, is 80 inches tall, and weighs 250 lbs:
        scala> val lebron = (1.0,Vectors.dense(80.0,250.0))
  1. Create a tuple for Tim, who is not a basketball player, is 70 inches tall, and weighs 150 lbs:
        scala> val tim = (0.0,Vectors.dense(70.0,150.0))
  1. Create a tuple for Brittany, who is a basketball player, is 80 inches tall, and weighs 207 lbs:
        scala> val brittany = (1.0,Vectors.dense(80.0,207.0))
  1. Create a tuple for Stacey, who is not a basketball player, is 65 inches tall, and weighs 120 lbs:
        scala> val stacey = (0.0,Vectors.dense(65.0,120.0))
  1. Create a training DataFrame:
        scala> val training = spark.createDataFrame(Seq(lebron, tim, brittany, stacey)).toDF("label", "features")
  1. Create a LogisticRegression estimator:
        scala> val estimator = new LogisticRegression
  1. Create a transformer by fitting the estimator with the training DataFrame:
        scala> val transformer = estimator.fit(training)
  1. Now, let's create some test data: John is 90 inches tall, weighs 270 lbs, and is a basketball player:
        scala> val john = Vectors.dense(90.0,270.0)
  1. Create more test data: Tom is 62 inches tall, weighs 120 lbs, and is not a basketball player:
        scala> val tom = Vectors.dense(62.0,120.0)
  1. Create a test data DataFrame:
        scala> val test = spark.createDataFrame(Seq(
             |   (1.0, john),
             |   (0.0, tom)
             | )).toDF("label", "features")
  1. Do the prediction using the transformer:
        scala> val results = transformer.transform(test)
  1. Print the schema of the results DataFrame:
        scala> results.printSchema

root
 |-- label: double (nullable = false)
 |-- features: vector (nullable = true)
 |-- rawPrediction: vector (nullable = true)
 |-- probability: vector (nullable = true)
 |-- prediction: double (nullable = true)

As you can see, besides the prediction column, the transformer has also added the rawPrediction and probability columns.
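For a binary model, these columns are directly related: rawPrediction holds the raw margins [-margin, margin], each probability entry is the sigmoid of the matching raw score, and prediction is the index of the largest probability. A minimal pure-Scala sketch of that mapping, using the raw score from Tom's row below for illustration:

```scala
// Each probability entry is the sigmoid of the matching raw score.
def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

// Binary rawPrediction vector is [-margin, margin]; the first value
// here is taken from Tom's row in the output below.
val raw = Array(31.4607691062275, -31.4607691062275)
val prob = raw.map(sigmoid)

// The predicted label is the index of the largest probability.
val prediction = prob.indexOf(prob.max).toDouble   // 0.0, i.e. not a player
```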

  1. Print the DataFrame results:
        scala> results.show

+-----+------------+--------------------+--------------------+----------+
|label|    features|       rawPrediction|         probability|prediction|
+-----+------------+--------------------+--------------------+----------+
|  1.0|[90.0,270.0]|[-61.884758625897...|[1.32981373684616...|       1.0|
|  0.0|[62.0,120.0]|[31.4607691062275...|[0.99999999999997...|       0.0|
+-----+------------+--------------------+--------------------+----------+
  1. Let's select only the features and prediction columns (note that show returns Unit, so there is no point binding the result to a val):
        scala> results.select("features","prediction").show

+------------+----------+
|    features|prediction|
+------------+----------+
|[90.0,270.0]|       1.0|
|[62.0,120.0]|       0.0|
+------------+----------+
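Under the hood, transform() scores each row with the fitted linear function margin = w·x + b and predicts 1.0 when the margin is positive. The sketch below shows that computation in pure Scala; the weight and intercept values are made up for illustration (inspect the real ones with transformer.coefficients and transformer.intercept):

```scala
// Hypothetical fitted parameters: weights for (height, weight) and an
// intercept. The real values come from transformer.coefficients and
// transformer.intercept after fitting.
val w = Array(1.2, 0.05)
val b = -110.0

// margin = w.x + b, the raw linear score for one feature vector.
def margin(x: Array[Double]): Double =
  w.zip(x).map { case (wi, xi) => wi * xi }.sum + b

// A positive margin means P(label = 1) > 0.5, so predict 1.0.
def predict(x: Array[Double]): Double =
  if (margin(x) > 0) 1.0 else 0.0

predict(Array(90.0, 270.0))   // John: basketball player
predict(Array(62.0, 120.0))   // Tom: not a basketball player
```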