Inner join

An inner join requires the left and right tables to share a join key column (here, cityID), and it keeps only the rows whose key appears on both sides. If a key has duplicate or multiple copies on either the left or right side, the join emits one output row for every left/right pair with that key. It can therefore quickly blow up into a near-Cartesian join and take much longer to complete than a correctly designed join that minimizes duplicate keys.
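To see why duplicate keys are expensive, here is a minimal in-memory sketch in plain Java (no Hadoop) of an inner join over hypothetical (key, value) rows. It is not the book's MapReduce code; it only illustrates that a key with m rows on the left and k rows on the right produces m × k output rows, while keys present on one side only produce nothing.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DuplicateKeyJoinDemo {

    // Groups (key, value) rows by key, preserving first-seen key order.
    static Map<String, List<String>> groupByKey(String[][] rows) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String[] row : rows) {
            groups.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }
        return groups;
    }

    // Inner join: one output row per (left, right) pair sharing a key.
    static List<String> innerJoin(String[][] left, String[][] right) {
        Map<String, List<String>> rightByKey = groupByKey(right);
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : groupByKey(left).entrySet()) {
            List<String> matches = rightByKey.get(e.getKey());
            if (matches == null) {
                continue; // no partner on the right side, so the key is dropped
            }
            for (String l : e.getValue()) {
                for (String r : matches) {
                    out.add(e.getKey() + "," + l + "," + r);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical data: city-1 has 2 name rows and 3 temperature rows.
        String[][] names = {{"city-1", "Boston"}, {"city-1", "Boston (dup)"}, {"city-2", "Chicago"}};
        String[][] temps = {{"city-1", "20"}, {"city-1", "22"}, {"city-1", "24"}, {"city-3", "30"}};
        System.out.println(innerJoin(names, temps).size()); // prints 6
    }
}
```

Here city-1 contributes 2 × 3 = 6 rows, while city-2 and city-3 have no partner and disappear. With many duplicates per key, this multiplication is what makes a poorly designed join slow.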

We keep a city and its temperatures only if its cityID has records in both datasets, as shown in the following code:

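// Imports required by the enclosing job class:
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Reduce-side inner join: for each cityID, the values hold the city name
// (from the cities data) and zero or more temperatures (from the temperature data).
public static class InnerJoinReducer
        extends Reducer<Text, Text, Text, IntWritable> {
    private final IntWritable result = new IntWritable();
    private final Text cityName = new Text();

    @Override
    public void reduce(Text key, Iterable<Text> values,
            Context context) throws IOException, InterruptedException {
        String name = null; // reset per key so a previous key's name is never reused
        int sum = 0;
        int n = 0;

        for (Text val : values) {
            String strVal = val.toString();
            if (strVal.matches("-?\\d+")) {
                // Integer value: a temperature reading
                sum += Integer.parseInt(strVal);
                n += 1;
            } else {
                // Any other value: the city name
                name = strVal;
            }
        }
        // Inner join: emit only when both a name and at least one temperature exist
        if (n != 0 && name != null) {
            cityName.set(name);
            result.set(sum / n); // integer (truncated) average temperature
            context.write(cityName, result);
        }
    }
}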
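The per-key decision the reducer makes can be checked without a cluster. The sketch below uses a hypothetical helper, joinKey, that copies the reducer's logic into a plain static method over a list of strings. Like the reducer, it assumes temperatures arrive as integer strings and treats any other value as the city name; it returns the name and average separated by a tab (the default key/value separator of Hadoop's TextOutputFormat), or null when the key would be dropped.

```java
import java.util.Arrays;
import java.util.List;

public class InnerJoinReduceLogic {

    // Mirrors the reducer for one key: integers are temperatures, anything else is the name.
    // Returns null when the inner join drops the key (no name or no temperatures).
    static String joinKey(List<String> values) {
        String cityName = null;
        int sum = 0;
        int n = 0;
        for (String v : values) {
            if (v.matches("-?\\d+")) {
                sum += Integer.parseInt(v);
                n++;
            } else {
                cityName = v;
            }
        }
        if (n == 0 || cityName == null) {
            return null;
        }
        return cityName + "\t" + (sum / n); // integer average, as in the reducer
    }

    public static void main(String[] args) {
        // Hypothetical grouped values for three keys
        System.out.println(joinKey(Arrays.asList("Boston", "21", "23"))); // prints Boston<TAB>22
        System.out.println(joinKey(Arrays.asList("25", "27")));           // prints null: no name
        System.out.println(joinKey(Arrays.asList("Denver")));             // prints null: no temperatures
    }
}
```

Testing for an integer with a regular expression, rather than by string length, keeps a short city name (for example a three-letter abbreviation) from being passed to Integer.parseInt.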

The output is as follows. City-6 and Las Vegas, which appeared in the earlier original output, are now excluded because each lacks a matching record on the other side of the join:

Boston 22
New York 23
Chicago 23
Philadelphia 23
San Francisco 22