Preface

What can graphs—the things with edges and vertices, not the things with axes and tick marks—do and how can they be used with Spark? These are the questions we try to answer in this book.

Frequently it is said, “Graphs can do anything,” or at least, “There are a bunch of different things you can do with graphs.” That says nothing, of course, so in this book we show a number of specific, real-life ways you can apply graphs and talk about how to implement such solutions in Spark GraphX.

A lot of technology buzzwords are applicable to this book: Big Data, Hadoop, Spark, graphs, machine learning, Scala, and functional programming. We break it all down for you. Even though we end up in some fairly advanced areas, we don’t assume anything more than an ability to program in some language such as Java.

This chart from Google Trends shows the relative interest in these buzzwords through early 2016:

Note that for the generic terms “spark” and “graphs” we had to substitute the overly specific “Apache Spark” and “edges and vertices,” but the trends can still be seen. A couple of these technologies, machine learning and graphs, have long histories within academic computer science and are attracting new commercial interest now that the availability of Big Data is bringing them into the mainstream. If you studied these technologies in school as theory, the world is now ready for you to put them into practice.

A lot of companies, including the ones we work for and have worked for in the past, have put Spark (though not necessarily GraphX) into production. This makes it more than a little convenient to try GraphX first when prototyping graph solutions. If you have a Spark cluster already, or if you decide to spin up a Spark cluster in the cloud, such as with Databricks or Amazon, you can get started with graphs without having to set up a new graph-specific cluster or technology, and you can use your Spark skills in the GraphX API. As more and more applications of graphs hit the newsstands, from rooting out terrorist networks on Twitter to fraud detection in credit card transaction data, GraphX becomes an easy platform choice for trying them out.

In this book, we simultaneously take on two ambitious goals: to cover everything possible about Spark GraphX, and to assume little to no expertise in any of the technologies represented by the aforementioned buzzwords. The biggest challenge was the hefty number of prerequisites for getting into GraphX: specifically, Spark, Scala, and graphs. Other challenges were the extensive GraphX API and the many different ways graphs can be used. The result is an In Action book that differs a bit from others: it takes a while to get started, with the first five chapters laying the groundwork, and there are a number of interesting examples rather than one that gradually gets built up over the course of the book. In books about other technologies, the reader might come with a problem to solve; this book attempts to demystify graphs by showing precisely what problems graphs can solve. And it does so without assuming a lot of background knowledge and experience.
