How to do it...

  1. Load pres.csv into HDFS:
        $ hdfs dfs -put pres.csv
  2. Start the Spark shell:
        $ spark-shell
  3. Import the linear algebra classes:
        scala> import org.apache.spark.mllib.linalg.Vectors
        scala> import org.apache.spark.mllib.linalg.distributed.RowMatrix
  4. Load pres.csv as an RDD:
        scala> val data = sc.textFile("pres.csv")
  5. Transform data into an RDD of dense vectors:
        scala> val parsedData = data.map(line => Vectors.dense(line.split(',').map(_.toDouble)))
  6. Create a RowMatrix from parsedData:
        scala> val mat = new RowMatrix(parsedData)
  7. Compute the SVD, keeping the top two singular values and also computing U:
        scala> val svd = mat.computeSVD(2, computeU = true)
  8. Retrieve the U factor (the left singular vectors):
        scala> val U = svd.U
  9. Retrieve the vector of singular values:
        scala> val s = svd.s
  10. Retrieve the V factor (the right singular vectors):
        scala> val V = svd.V
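The factorization the steps above compute can be cross-checked outside Spark with NumPy. The small matrix below is a stand-in, since the recipe does not show pres.csv's actual values; any comparable numeric matrix behaves the same way.

```python
import numpy as np

# Illustrative stand-in for the matrix loaded from pres.csv (values assumed).
A = np.array([[4.0, 0.0, 1.0],
              [2.0, 3.0, 0.0],
              [0.0, 1.0, 5.0]])

# full_matrices=False gives the "thin" SVD, matching computeSVD's shapes:
# U is m x k, s holds the singular values, Vt is V transposed (k x n).
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Singular values come back sorted in descending order.
assert np.all(np.diff(s) <= 0)

# Multiplying the factors back together recovers the original matrix.
A_rebuilt = U @ np.diag(s) @ Vt
print(np.allclose(A, A_rebuilt))  # True
```

The same identity, A = U * diag(s) * Vᵀ, holds for the factors Spark returns, with U distributed as a RowMatrix and V as a local matrix.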

If you look at s, you will see that it gave a much higher score to the NPR article than to the Fox article.
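Passing 2 to computeSVD keeps only the top two singular triplets, which yields the closest rank-2 approximation of the original matrix (the Eckart-Young theorem). A NumPy sketch of that truncation, again using an assumed stand-in matrix since pres.csv's values are not shown:

```python
import numpy as np

# Illustrative stand-in for the data matrix (values assumed).
A = np.array([[4.0, 0.0, 1.0],
              [2.0, 3.0, 0.0],
              [0.0, 1.0, 5.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the top-k singular triplets, as computeSVD(2, true) does.
k = 2
A_rank2 = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# For the best rank-k approximation, the spectral-norm error equals
# the first discarded singular value.
err = np.linalg.norm(A - A_rank2, ord=2)
print(np.isclose(err, s[k]))  # True
```

This is why a small k can still capture most of the structure: the discarded singular values bound exactly how much information the truncation loses.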
