Let's look at an example of a term-document matrix. We are going to use two news items about the 2016 US presidential election.
The following are the links to the two documents:
- Fox: http://www.foxnews.com/politics/2015/03/08/top-2016-gop-presidential-hopefuls-return-to-iowa-to-hone-message-including/
- NPR: http://www.npr.org/blogs/itsallpolitics/2015/03/09/391704815/in-iowa-2016-has-begun-at-least-for-the-republican-party
Let's build the presidential candidate matrix out of these two news items:
Let's save this matrix as a CSV file and upload it to HDFS. We will then apply SVD to the matrix and analyze the results.
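
As a rough illustration of how a small term-document matrix could be built and serialized to CSV, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the terms `alpha`, `beta`, and `gamma` and the two short texts are placeholders, not the candidate names or the contents of the two articles, and the helper names `term_document_matrix` and `to_csv` are hypothetical. It also assumes one row per term and one column per document, and it counts plain substring occurrences rather than doing any real tokenization.

```python
import csv
import io


def term_document_matrix(terms, documents):
    """Return one row per term and one column per document.

    Each cell holds how many times the term occurs in the document,
    using case-insensitive substring counting (no tokenization).
    """
    lowered_docs = [text.lower() for text in documents]
    return [[doc.count(term.lower()) for doc in lowered_docs] for term in terms]


def to_csv(matrix):
    """Serialize the matrix as CSV text, one matrix row per line."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(matrix)
    return buffer.getvalue()


# Placeholder inputs (hypothetical, not taken from the two articles).
terms = ["alpha", "beta", "gamma"]
documents = [
    "alpha spoke first, then beta; alpha closed the event",
    "beta and gamma appeared together, and gamma spoke longer than beta",
]

matrix = term_document_matrix(terms, documents)
csv_text = to_csv(matrix)
print(csv_text)
```

The printed text could be written to a local file and then copied to HDFS, for example with `hdfs dfs -put`. If the SVD step expects documents as rows instead, the matrix would need to be transposed before writing.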