Step 1 - Import packages, load, parse, and explore the movie and rating dataset

In this step, we load and parse the datasets and then do some exploratory analysis. Before that, let's import the necessary packages and libraries:

package com.packt.ScalaML.MovieRecommendation

import org.apache.spark.sql.SparkSession
import org.apache.spark.mllib.recommendation.ALS
import org.apache.spark.mllib.recommendation.MatrixFactorizationModel
import org.apache.spark.mllib.recommendation.Rating
import scala.Tuple2
import org.apache.spark.rdd.RDD
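The code segments in this step call spark.read, so they need a SparkSession named spark in scope. The excerpt does not show how that session is created, so the following is only a minimal sketch: the application name and the local[*] master are assumptions for a single-machine run, not settings taken from the book.

```scala
import org.apache.spark.sql.SparkSession

// Assumed settings: the app name and the local master are placeholders.
// Replace them with whatever fits your environment or cluster.
val spark: SparkSession = SparkSession
  .builder()
  .appName("MovieRecommendation")
  .master("local[*]")
  .getOrCreate()
```

If you work in spark-shell, a session called spark already exists and this step can be skipped.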

The following code segment loads the ratings file and displays the resulting DataFrame of the ratings:

val ratingsFile = "data/ratings.csv"
val df1 = spark.read.format("com.databricks.spark.csv").option("header", true).load(ratingsFile)
val ratingsDF = df1.select(df1.col("userId"), df1.col("movieId"), df1.col("rating"), df1.col("timestamp"))
ratingsDF.show(false)
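Because the file is loaded without schema inference, every column of the ratings DataFrame arrives as a string. For exploratory statistics it can help to cast the columns to numeric types first. The sketch below illustrates the idea on a few made-up rows that stand in for data/ratings.csv; the names sketchSpark, rawRatings, and typedRatings are hypothetical and do not come from the book.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Assumed local session for this sketch only.
val sketchSpark = SparkSession
  .builder()
  .appName("RatingsSchemaSketch")
  .master("local[*]")
  .getOrCreate()
import sketchSpark.implicits._

// Made-up rows standing in for data/ratings.csv. All columns are Strings,
// which is what a CSV load without schema inference produces.
val rawRatings = Seq(
  ("1", "10", "4.0", "1000000001"),
  ("1", "20", "2.5", "1000000002"),
  ("2", "10", "3.0", "1000000003")
).toDF("userId", "movieId", "rating", "timestamp")

// Cast each column to a numeric type before computing statistics.
val typedRatings = rawRatings.select(
  col("userId").cast("int"),
  col("movieId").cast("int"),
  col("rating").cast("double"),
  col("timestamp").cast("long")
)

typedRatings.printSchema()              // column types after the cast
typedRatings.describe("rating").show()  // count, mean, stddev, min, max
```

The same select-and-cast pattern could be applied to ratingsDF itself once it has been loaded.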

The following code segment shows you the DataFrame of the movies:

val moviesFile = "data/movies.csv"
val df2 = spark.read.format("com.databricks.spark.csv").option("header", "true").load(moviesFile)
val moviesDF = df2.select(df2.col("movieId"), df2.col("title"), df2.col("genres"))