Chapter 1. Two important technologies: Spark and graphs
1.1. Spark: the step beyond Hadoop MapReduce
1.1.1. The elusive definition of Big Data
1.2. Graphs: finding meaning from relationships
1.3. Putting them together for lightning-fast graph processing: Spark GraphX
1.3.1. Property graph: adding richness
1.3.2. Graph partitioning: graphs meet Big Data
1.3.3. GraphX lets you choose: graph parallel or data parallel
1.3.4. Various ways GraphX fits into a processing flow
1.3.5. GraphX vs. other systems
1.3.6. Storing the graphs: distributed file storage vs. graph database
2.1. Getting set up and getting data
3.1. Scala, the native language of Spark
3.1.1. Scala’s philosophy: conciseness and expressiveness
3.2.1. Distributed in-memory data: RDDs
4.3. Serialization/deserialization
Chapter 5. Built-in algorithms
5.1. Seek out authoritative nodes: PageRank
5.1.1. PageRank algorithm explained
5.2. Measuring connectedness: Triangle Count
5.3. Find the fewest hops: ShortestPaths
5.4. Finding isolated populations: Connected Components
5.5. Reciprocated love only, please: Strongly Connected Components
Chapter 6. Other useful graph algorithms
6.1. Your own GPS: Shortest Paths with Weights
6.2. Travelling Salesman: greedy algorithm
6.3. Route utilities: Minimum Spanning Trees
6.3.1. Deriving taxonomies with Word2Vec and Minimum Spanning Trees
7.1. Supervised, unsupervised, and semi-supervised learning
7.2. Recommend a movie: SVDPlusPlus
7.3.1. Determine topics: Latent Dirichlet Allocation
7.3.2. Detect spam: LogisticRegressionWithSGD
7.3.3. Image segmentation (for computer vision) using Power Iteration Clustering
7.4. Poor man’s training data: graph-based semi-supervised learning
Chapter 8. The missing algorithms
8.1. Missing basic graph operations
8.2.1. Matching vertices and constructing the graph
8.2.2. Improving performance with IndexedRDD, the RDD HashMap
8.3. Poor man’s graph isomorphism: finding missing Wikipedia infobox items
Chapter 9. Performance and monitoring
9.1. Monitoring your Spark application
9.1.1. How Spark runs your application
9.1.2. Understanding your application runtime with Spark monitoring
Chapter 10. Other languages and tools
10.1. Using languages other than Scala with GraphX
10.1.1. Using GraphX with Java 7
10.1.2. Using GraphX with Java 8
10.1.3. Whether GraphX may gain Python or R bindings in the future
10.2. Another visualization tool: Apache Zeppelin plus d3.js
10.3. Almost a database: Spark Job Server
10.3.1. Example: Query Slashdot friends degree of separation
10.4. Using SQL with Spark graphs via GraphFrames
10.4.1. Getting GraphFrames, plus GraphX interoperability
10.4.2. Using SQL for convenience and performance
A.1. On a local virtual machine: CDH QuickStart VM
A.2. Onto your laptop and Hadoopless: Linux or OS X
Appendix B. Gephi visualization software
B.1. Laying out your environment
Appendix C. Resources: where to go for more
Appendix D. List of Scala tips in this book