Chapter 4. Transforming and Shaping Up Graphs to Your Needs

In this chapter, we will learn to transform graphs using different sets of operators. In particular, we will cover graph-specific operators that either change the properties of graph elements or modify the structure of graphs. In other words, all the operators that we use here are methods that are invoked on a graph and return a new graph. In addition, we will use join methods to combine graph data with other datasets. Using real-world datasets, you will understand when and how to:

  • Use property operators to modify vertex or edge properties
  • Use structural operators to modify the shape of a graph
  • Join additional RDD collections with a property graph

Transforming the vertex and edge attributes

The map operator is a core method for transforming distributed datasets, or RDDs, in Spark. Property graphs have three analogous map operators, defined as follows:

class Graph[VD, ED] {
  def mapVertices[VD2](mapFun: (VertexId, VD) => VD2): Graph[VD2, ED]
  def mapEdges[ED2](mapFun: Edge[ED] => ED2): Graph[VD, ED2]
  def mapTriplets[ED2](mapFun: EdgeTriplet[VD, ED] => ED2): Graph[VD, ED2]
}

Each of these methods is called on a property graph with vertex attribute type VD and edge attribute type ED. Each of them also takes a user-defined mapping function mapFun that performs one of the following:

  • For mapVertices, mapFun takes a pair of (VertexId, VD) as input and returns a transformed vertex attribute of type VD2.
  • For mapEdges, mapFun takes an Edge object as input and returns a transformed edge attribute of type ED2.
  • For mapTriplets, mapFun takes an EdgeTriplet object as input and returns a transformed edge attribute of type ED2.

Note

In each case, the graph structure remains intact: these map operators never change the links between vertices or the graph's structural indices. This is their key advantage over the basic RDD map operator. Although mapping over the underlying RDDs and rebuilding the graph can achieve the same result, the graph map operators are more efficient because they let GraphX reuse the structural indices of the original graph. Therefore, prefer these three mapping operators whenever you want to transform a graph's attributes without modifying its structure.
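To make the comparison with the basic RDD map operator concrete, here is a minimal sketch. It assumes a local SparkContext and a hypothetical two-vertex toy graph; the names graph, viaRddMap, and viaMapVertices are illustrative only.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}

// Assumption: a local SparkContext; in spark-shell the existing context is reused.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("rdd-map-vs-mapVertices").setMaster("local[*]"))

// Hypothetical toy graph: vertex attribute = name, edge attribute = weight.
val vertices = sc.parallelize(Seq[(VertexId, String)]((1L, "ann"), (2L, "bob")))
val edges = sc.parallelize(Seq(Edge(1L, 2L, 1.0)))
val graph: Graph[String, Double] = Graph(vertices, edges)

// Option 1: plain RDD map over the vertices, then rebuild the whole graph.
val rebuiltVertices = graph.vertices.map { case (id, name) => (id, name.toUpperCase) }
val viaRddMap: Graph[String, Double] = Graph(rebuiltVertices, graph.edges)

// Option 2: mapVertices transforms the attributes and keeps the existing structure.
val viaMapVertices: Graph[String, Double] =
  graph.mapVertices((_, name) => name.toUpperCase)
```

Both graphs end up with the same vertex attributes, but only the second avoids constructing a new graph from scratch.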

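The difference between mapEdges and mapTriplets lies in what mapFun can see. In mapTriplets, the triplet input exposes the edge attribute together with the attributes of both the source and destination vertices, so all three can be used to create the new edge attribute. In contrast, the mapFun in mapEdges receives only the Edge object, that is, the edge attribute and the IDs of its two endpoints.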
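The following sketch illustrates that difference on a hypothetical one-edge graph. The SparkContext setup, the vertex labels, and the distance value are assumptions made for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}

// Assumption: a local SparkContext; in spark-shell the existing context is reused.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("edges-vs-triplets").setMaster("local[*]"))

// Hypothetical toy graph: vertex attribute = label, edge attribute = distance in km.
val places = sc.parallelize(Seq[(VertexId, String)]((1L, "north"), (2L, "south")))
val roads = sc.parallelize(Seq(Edge(1L, 2L, 10.0)))
val roadGraph: Graph[String, Double] = Graph(places, roads)

// mapEdges: the function receives an Edge (srcId, dstId, attr), no vertex attributes.
val inMetres: Graph[String, Double] = roadGraph.mapEdges(e => e.attr * 1000)

// mapTriplets: the function receives an EdgeTriplet, which adds srcAttr and dstAttr.
val labelled: Graph[String, String] =
  roadGraph.mapTriplets(t => s"${t.srcAttr} -> ${t.dstAttr}")
```

If the new edge attribute depends only on the old one, mapEdges is enough; reach for mapTriplets when the endpoint attributes are needed.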

Now, let's see them in action through some simple examples.

mapVertices

Consider a social graph of people, where the vertex attribute has type Person and the edge attribute has type Link. First, let's create these Scala types as follows:

case class Person(first: String, last: String, age: Int)
case class Link(relationship: String, duration: Float)  

Suppose we build the graph from a vertex collection of type RDD[(VertexId, Person)] called people and an edge collection of type RDD[Edge[Link]] named links:

val inputGraph: Graph[Person, Link] = Graph(people, links)
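The people and links collections can come from any data source. As a minimal sketch, assuming a local SparkContext and made-up sample data (the names, ages, and durations below are purely illustrative), they could be built like this:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

case class Person(first: String, last: String, age: Int)
case class Link(relationship: String, duration: Float)

// Assumption: a local SparkContext; in spark-shell the existing context is reused.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("social-graph-sketch").setMaster("local[*]"))

// Hypothetical vertices: (vertex ID, Person) pairs.
val people: RDD[(VertexId, Person)] = sc.parallelize(Seq(
  (1L, Person("Alice", "Smith", 30)),
  (2L, Person("Bob", "Jones", 35)),
  (3L, Person("Carol", "White", 42))
))

// Hypothetical edges: source ID, destination ID, and a Link attribute.
val links: RDD[Edge[Link]] = sc.parallelize(Seq(
  Edge(1L, 2L, Link("friend", 5.0f)),
  Edge(2L, 3L, Link("coworker", 2.5f))
))

val inputGraph: Graph[Person, Link] = Graph(people, links)
```

Calling inputGraph.triplets.collect() on this toy graph is a quick way to inspect each edge together with the Person at either end.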

If we want, we can transform the attributes of the people to contain only their name using mapVertices:

val outputGraph: Graph[String, Link] =
  inputGraph.mapVertices((_, person) => person.first + " " + person.last)

The new outputGraph now has a vertex attribute of type String instead of Person. The links between the people remain unchanged.

mapEdges

Similarly, suppose we are interested only in the nature of relationships, not their duration. This time, we can use mapEdges to change the edge attribute as follows:

val outputGraph: Graph[Person, String] =
  // link is an Edge[Link], so the Link itself is reached through link.attr
  inputGraph.mapEdges(link => link.attr.relationship)

mapTriplets

Finally, suppose we want to record how old each person was when the two first met, and store this information in the edge attribute. Assuming that duration is measured in years, we can do that by using mapTriplets:

// age is an Int and duration is a Float, so each difference is a Float
val outputGraph: Graph[Person, (Float, Float)] =
  inputGraph.mapTriplets(t => (t.srcAttr.age - t.attr.duration,
                               t.dstAttr.age - t.attr.duration))

If we want to change both the edge and vertex attributes of a graph, we can simply chain mapEdges or mapTriplets with mapVertices since each of these methods always returns a property graph.
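As a sketch of such a chain, the following example first rewrites the edge attributes with mapTriplets and then the vertex attributes with mapVertices. The SparkContext setup and the sample data are assumptions made for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

case class Person(first: String, last: String, age: Int)
case class Link(relationship: String, duration: Float)

// Assumption: a local SparkContext; in spark-shell the existing context is reused.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("chained-map-sketch").setMaster("local[*]"))

// Hypothetical sample data.
val people: RDD[(VertexId, Person)] = sc.parallelize(Seq(
  (1L, Person("Alice", "Smith", 30)),
  (2L, Person("Bob", "Jones", 35))
))
val links: RDD[Edge[Link]] = sc.parallelize(Seq(Edge(1L, 2L, Link("friend", 5.0f))))
val inputGraph: Graph[Person, Link] = Graph(people, links)

// mapTriplets runs first because it still needs the Person attributes (age);
// mapVertices then replaces each Person with a full-name String.
val chained: Graph[String, (Float, Float)] = inputGraph
  .mapTriplets(t => (t.srcAttr.age - t.attr.duration, t.dstAttr.age - t.attr.duration))
  .mapVertices((_, p) => s"${p.first} ${p.last}")
```

The order of the calls matters here: applying mapVertices first would discard the age field that the triplet function relies on.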
