Inner join

An inner join requires the left and right DataSets to share a key column, and it emits only the records whose key appears on both sides. If a key occurs multiple times on either the left or the right side, every left copy is paired with every right copy, so the output quickly grows toward a Cartesian product and the join takes far longer to complete than one designed to keep duplicate keys to a minimum.
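
The effect of duplicate keys can be illustrated with a minimal sketch in plain Scala collections (no Flink involved). The object name DuplicateKeyJoin, the helper innerJoin, and all of the sample data are made up for this illustration; the point is only that the number of joined rows for a key is the product of its occurrences on each side.

```scala
// Plain Scala collections only; every name and value here is hypothetical.
object DuplicateKeyJoin {
  // Inner join of two sequences of (key, value) pairs on the key:
  // each left record is paired with every right record that has the same key.
  def innerJoin[K, A, B](left: Seq[(K, A)], right: Seq[(K, B)]): Seq[((K, A), (K, B))] =
    for {
      l <- left
      r <- right
      if l._1 == r._1
    } yield (l, r)

  // Left side with unique keys.
  val uniqueLeft: Seq[(Int, String)] = Seq((1, "Boston"), (2, "New York"))
  // Left side where key 1 is accidentally present three times.
  val dupLeft: Seq[(Int, String)] =
    Seq((1, "Boston"), (1, "Boston"), (1, "Boston"), (2, "New York"))
  // Right side: two readings for key 1, one for key 2.
  val right: Seq[(Int, Int)] = Seq((1, 21), (1, 23), (2, 22))

  def main(args: Array[String]): Unit = {
    println(innerJoin(uniqueLeft, right).size) // 1*2 + 1*1 = 3 rows
    println(innerJoin(dupLeft, right).size)    // 3*2 + 1*1 = 7 rows
  }
}
```

Two extra copies of one key on the left more than double the output here; with thousands of repeated keys on both sides, the same multiplication is what makes the join blow up.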

We are now ready to perform an inner join on the two DataSets of tuples, matching field 0 (the cityID) on each side, as shown in the following code:

cities2.join(temp2)
  .where(0)    // join key of cities2: field 0 (cityID)
  .equalTo(0)  // join key of temp2: field 0 (cityID)
  .first(10).print()

The output of this job is as follows, showing the tuples from the two DataSets where the cityID exists in both:

((1,Boston),(1,21))
((2,New York),(2,22))
((3,Chicago),(3,23))
((4,Philadelphia),(4,24))
((5,San Francisco),(5,25))
((1,Boston),(1,23))
((2,New York),(2,24))
((3,Chicago),(3,25))
((4,Philadelphia),(4,26))
((5,San Francisco),(5,18))

Now, if we aggregate the joined tuples by adding up the temperatures for each city, we get the total temperature per city, as shown in the following code:

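cities2
  .join(temp2)
  .where(0)
  .equalTo(0)
  .map(x => (x._1._2, x._2._2.toInt))  // keep (city name, temperature as Int)
  .groupBy(0)                          // group by city name
  .sum(1)                              // add up the temperatures in each group
  .first(10).print()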

This shows the following result:

(Boston,111)
(Chicago,116)
(New York,119)
(Philadelphia,116)
(San Francisco,113)
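
To make the join, map, groupBy, and sum steps concrete, here is a rough equivalent in plain Scala collections. It is only a sketch under stated assumptions: the object name TotalTemperaturePerCity is hypothetical, the temperatures are stored as Int rather than converted with toInt, and the input contains only the ten readings visible in the join output earlier in this section, so the totals are smaller than the ones printed above for the full temp2 DataSet.

```scala
// Plain Scala collections only; this mirrors the Flink pipeline, it is not Flink code.
object TotalTemperaturePerCity {
  val cities: Seq[(Int, String)] = Seq(
    (1, "Boston"), (2, "New York"), (3, "Chicago"),
    (4, "Philadelphia"), (5, "San Francisco"))

  // Assumption: only the ten readings shown in the first(10) join output.
  val temps: Seq[(Int, Int)] = Seq(
    (1, 21), (2, 22), (3, 23), (4, 24), (5, 25),
    (1, 23), (2, 24), (3, 25), (4, 26), (5, 18))

  def totals: Map[String, Int] = {
    // join on cityID, then keep (city name, temperature), like the map() step
    val joined = for {
      c <- cities
      t <- temps
      if c._1 == t._1
    } yield (c._2, t._2)
    // groupBy(0) followed by sum(1): one total per city name
    joined.groupBy(_._1).map { case (city, rows) => city -> rows.map(_._2).sum }
  }

  def main(args: Array[String]): Unit =
    totals.toSeq.sortBy(_._1).foreach(println)
  // prints:
  // (Boston,44)
  // (Chicago,48)
  // (New York,46)
  // (Philadelphia,50)
  // (San Francisco,43)
}
```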

The job can be seen in the Flink UI.

The join() API is defined as follows:


/**
 * Initiates a Join transformation.
 *
 * <p>A Join transformation joins the elements of two
 * {@link DataSet DataSets} on key equality and provides multiple ways to combine
 * joining elements into one DataSet.
 *
 * <p>This method returns a {@link JoinOperatorSets} on which one of the {@code where} methods
 * can be called to define the join key of the first joining (i.e., this) DataSet.
 *
 * @param other The other DataSet with which this DataSet is joined.
 * @return A JoinOperatorSets to continue the definition of the Join transformation.
 *
 * @see JoinOperatorSets
 * @see DataSet
 */
public <R> JoinOperatorSets<T, R> join(DataSet<R> other) {
    return new JoinOperatorSets<>(this, other);
}
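
The Javadoc above describes the Java API, where join() only starts the definition and the keys are supplied in later calls. The Scala snippets in this section follow the same where/equalTo sequence. The following sketch shows that staged pattern end to end, including a join function that combines each matching pair into a single tuple. It assumes a Flink 1.x release with the flink-scala DataSet API on the classpath; the object name JoinApiSketch, the method joinNames, and the sample data are hypothetical.

```scala
import org.apache.flink.api.scala._

// Sketch only: assumes the Flink 1.x Scala DataSet API is available.
object JoinApiSketch {
  def joinNames(env: ExecutionEnvironment): Seq[(String, Int)] = {
    // Made-up sample data: (cityID, name) and (cityID, temperature).
    val cities = env.fromElements((1, "Boston"), (2, "New York"))
    val temps  = env.fromElements((1, 21), (2, 22), (3, 23))

    // join() starts the definition, where() names the key of the first
    // DataSet, equalTo() names the key of the second one.
    val joined = cities.join(temps).where(0).equalTo(0) {
      // Join function: keep only the city name and the temperature.
      (city, temp) => (city._2, temp._2)
    }

    // collect() runs the job and returns the result to the client.
    joined.collect()
  }

  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    joinNames(env).sortBy(_._1).foreach(println)
  }
}
```

The reading for cityID 3 has no matching city, so an inner join drops it from the result.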