How to do it...

  1. Load pres.csv to HDFS:
        $ hdfs dfs -put pres.csv 
  1. Start the Spark shell:
        $ spark-shell
  1. Import the linear algebra classes:
        scala> import org.apache.spark.mllib.linalg.Vectors
        scala> import org.apache.spark.mllib.linalg.distributed.RowMatrix
  1. Load pres.csv as an RDD:
        scala> val data = sc.textFile("pres.csv")
  1. Transform data into an RDD of dense vectors:
        scala> val parsedData = data.map(line => Vectors.dense(line.split(',').map(_.toDouble)))
  1. Create RowMatrix from parsedData:
        scala> val mat = new RowMatrix(parsedData)
  1. Compute the SVD, keeping the top 2 singular values and also computing U:
        scala> val svd = mat.computeSVD(2, computeU = true)
  1. Get the U factor (left singular vectors), returned as a distributed RowMatrix:
        scala> val U = svd.U
  1. Get the singular values, returned as a local vector in descending order:
        scala> val s = svd.s
  1. Get the V factor (right singular vectors), returned as a local Matrix:
        scala> val V = svd.V
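To make the steps above concrete without a cluster, here is a minimal plain-Scala sketch (no Spark) for a matrix with exactly two columns. It parses comma-separated lines into rows, builds the Gram matrix AᵀA, takes its closed-form eigen-decomposition, and derives the singular values (square roots of the eigenvalues), V (the eigenvectors), and U = AVΣ⁻¹. The names `LocalSvdSketch`, `parseRow`, and `svd2`, and the sample rows, are hypothetical and not taken from pres.csv; this illustrates the math and is not a replacement for `RowMatrix.computeSVD`.

```scala
// Hypothetical, Spark-free sketch of an SVD for a matrix with two columns.
object LocalSvdSketch {

  // Turn one comma-separated line into a row of doubles, as in the parsing step.
  def parseRow(line: String): Array[Double] =
    line.split(',').map(_.trim.toDouble)

  // Returns (singular values, columns of V, rows of U) for an m x 2 matrix A.
  // V's columns are eigenvectors of the Gram matrix G = A^T A, the singular
  // values are the square roots of G's eigenvalues, and U = A V Sigma^-1.
  def svd2(rows: Seq[Array[Double]])
      : (Array[Double], Array[Array[Double]], Seq[Array[Double]]) = {
    // Entries of the symmetric Gram matrix G = [[a, b], [b, c]]
    val a = rows.map(r => r(0) * r(0)).sum
    val b = rows.map(r => r(0) * r(1)).sum
    val c = rows.map(r => r(1) * r(1)).sum

    // Closed-form eigenvalues of a symmetric 2 x 2 matrix, largest first
    val mean = (a + c) / 2
    val radius = math.sqrt((a - c) * (a - c) / 4 + b * b)
    val lambdas = Array(mean + radius, mean - radius)

    // Unit eigenvectors of G, i.e. the right singular vectors
    val v: Array[Array[Double]] =
      if (math.abs(b) < 1e-12) {
        // G is already diagonal, so the eigenvectors are the coordinate axes
        if (a >= c) Array(Array(1.0, 0.0), Array(0.0, 1.0))
        else Array(Array(0.0, 1.0), Array(1.0, 0.0))
      } else {
        // (b, lambda - a) solves (G - lambda I) x = 0
        lambdas.map { l =>
          val norm = math.hypot(b, l - a)
          Array(b / norm, (l - a) / norm)
        }
      }

    // Singular values; clamp tiny negative rounding errors to zero
    val sigma = lambdas.map(l => math.sqrt(math.max(l, 0.0)))

    // Rows of U: u_i = (row . v_i) / sigma_i, left as 0 when sigma_i is 0
    val u = rows.map { r =>
      Array.tabulate(2) { i =>
        if (sigma(i) > 1e-12) (r(0) * v(i)(0) + r(1) * v(i)(1)) / sigma(i) else 0.0
      }
    }
    (sigma, v, u)
  }

  def main(args: Array[String]): Unit = {
    // Made-up sample rows standing in for the lines of a two-column CSV file
    val rows = Seq("1,2", "3,4", "5,6").map(parseRow)
    val (sigma, v, u) = svd2(rows)
    println(s"singular values: ${sigma.mkString(", ")}")
    println(s"V columns: ${v.map(_.mkString("(", ", ", ")")).mkString(" ")}")
    println(s"U rows: ${u.map(_.mkString("(", ", ", ")")).mkString(" ")}")
  }
}
```

Running `main` prints the two singular values in descending order, the matching right singular vectors, and one row of U per input row. Going through AᵀA keeps the illustration short, but forming AᵀA squares the condition number of A, so very small singular values lose precision with this route.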
If you look at s, you will see that it gives a much higher score to the NPR article than to the Fox article.
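When reading s, it helps to recall how the three factors returned by computeSVD relate. These are the standard definitions of the thin SVD of an m × n matrix A, not results specific to pres.csv; computeSVD(k, ...) keeps only the k largest singular values and the matching columns of U and V:

```latex
A = U \Sigma V^{\top}, \qquad
\Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_r), \qquad
\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0

A^{\top} A = V \Sigma^{2} V^{\top}, \qquad
A A^{\top} = U \Sigma^{2} U^{\top}
\quad\Longrightarrow\quad
\sigma_i = \sqrt{\lambda_i\!\left(A^{\top} A\right)}
```

So the entries of s are the square roots of the eigenvalues of AᵀA, the columns of V are eigenvectors of AᵀA, and the columns of U are eigenvectors of AAᵀ.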

