How to do it...

  1. Load scaledhousedata.csv to HDFS:
        $ hdfs dfs -put scaledhousedata.csv scaledhousedata.csv
  1. Start the Spark shell:
        $ spark-shell
  1. Import the linear algebra classes:
        scala> import org.apache.spark.mllib.linalg.Vectors
        scala> import org.apache.spark.mllib.linalg.distributed.RowMatrix
  1. Load scaledhousedata.csv as an RDD:
        scala> val data = sc.textFile("scaledhousedata.csv")
  1. Alternatively, for your convenience, the data has already been loaded into S3, and you can load it from there with the following command:
        scala> val data = sc.textFile("s3a://sparkcookbook/saratoga/scaledhousedata.csv")
  1. Transform the data into an RDD of dense vectors:
        scala> val parsedData = data.map(line => Vectors.dense(line.split(',').map(_.toDouble)))
  1. Create a RowMatrix from parsedData:
        scala> val mat = new RowMatrix(parsedData)
  1. Compute one principal component:
        scala> val pc = mat.computePrincipalComponents(1)
  1. Project the rows onto the linear space spanned by the principal component:
        scala> val projected = mat.multiply(pc)
  1. Convert the projected RowMatrix object back to an RDD:
        scala> val projectedRDD = projected.rows
  1. Save projectedRDD to HDFS:
        scala> projectedRDD.saveAsTextFile("phdata")
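To make the steps above concrete without a cluster, here is a minimal pure-Scala sketch of what they compute on a small in-memory dataset: the sample covariance matrix of the columns, its top eigenvector (the first principal component), and the dot product of each row with that component. This is an illustration under stated assumptions, not Spark's implementation. `PcaSketch` and its methods are hypothetical names, and the top eigenvector is found by power iteration, a swapped-in technique; `RowMatrix.computePrincipalComponents` instead decomposes the full covariance matrix. As with `mat.multiply(pc)`, the rows are projected without subtracting the column means. The sign of an eigenvector is arbitrary, so the projected values may come out negated compared to Spark's.

```scala
// Hypothetical, in-memory illustration of the recipe's PCA steps (not Spark's code).
object PcaSketch {
  type Row = Array[Double]

  // Mirrors the parsing step: each CSV line becomes one row of doubles.
  def parse(lines: Seq[String]): Seq[Row] =
    lines.map(_.split(',').map(_.toDouble))

  // Sample covariance matrix of the columns (each row is one observation).
  def covariance(rows: Seq[Row]): Array[Array[Double]] = {
    val n = rows.size
    val d = rows.head.length
    val means = Array.tabulate(d)(j => rows.map(_(j)).sum / n)
    Array.tabulate(d, d) { (i, j) =>
      rows.map(r => (r(i) - means(i)) * (r(j) - means(j))).sum / (n - 1)
    }
  }

  // Power iteration (swapped-in technique): repeatedly multiply by the matrix
  // and renormalize. Assumes the largest eigenvalue is strictly dominant.
  def topEigenvector(m: Array[Array[Double]], iterations: Int = 200): Row = {
    val d = m.length
    var v: Row = Array.fill(d)(1.0 / math.sqrt(d.toDouble))
    for (_ <- 1 to iterations) {
      val w = Array.tabulate(d)(i => (0 until d).map(j => m(i)(j) * v(j)).sum)
      val norm = math.sqrt(w.map(x => x * x).sum)
      if (norm > 0) v = w.map(_ / norm) // keep the previous v if w is all zeros
    }
    v
  }

  // Like mat.multiply(pc): dot product of each raw (uncentered) row with pc.
  def project(rows: Seq[Row], pc: Row): Seq[Double] =
    rows.map(r => r.zip(pc).map { case (a, b) => a * b }.sum)
}
```

For example, for the three points `1,2`, `2,4`, and `3,6`, which all lie on the line y = 2x, the component is (1, 2)/√5 up to sign, and the projected values are √5, 2√5, and 3√5 up to sign: a single number per row that preserves all of the spread in the data.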

Now we will take this projected feature, which we have decided to call housing density, plot it against the house price, and see whether any new pattern emerges:

  1. Download the HDFS directory phdata to the local directory phdata:
        $ hdfs dfs -get phdata phdata
  1. Trim the opening and closing brackets from each line of the data and load the values into MS Excel, next to the house price. The following is the plot of the house price versus the housing density:
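The trimming step can also be scripted rather than done by hand. The following minimal Scala sketch strips the surrounding brackets from each saved line, parses the value, and pairs it with a house price to produce CSV lines that Excel can open. `TrimBrackets`, `parse`, and `toCsv` are hypothetical names, and the sketch assumes the density lines and the prices are already in the same row order (for example, the saved part files read in order).

```scala
// Hypothetical helper for preparing the projected data for a spreadsheet.
object TrimBrackets {
  // Strip the surrounding [ ] from one saved row, e.g. "[1.5]", and parse the value.
  def parse(line: String): Double =
    line.trim.stripPrefix("[").stripSuffix("]").toDouble

  // Pair each density value with the price at the same position (order assumed to match).
  def toCsv(densityLines: Seq[String], prices: Seq[Double]): Seq[String] =
    densityLines.map(parse).zip(prices).map { case (d, p) => s"$d,$p" }
}
```

Alternatively, because each projected row holds a single element, saving `projectedRDD.map(v => v(0))` instead of `projectedRDD` in the Spark shell writes plain numbers with no brackets to trim.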

Let's mark some patterns on this plot as follows:

What patterns do we see here? To move from very high-density to low-density housing, people are willing to pay a heavy premium. As the housing density decreases, the premium flattens out. For example, people will pay a heavy premium to move from a condominium or town home to a single-family home, but the premium for a single-family home on a three-acre lot will not be much different from that for a single-family home on a two-acre lot in a comparable built-up area.
