Using a map-side join in Apache Hive to analyze geographical events
Using optimized full outer joins in Apache Hive to analyze geographical events
Joining data using an external key-value store (Redis)
Introduction
In most processing environments, multiple datasets need to be joined to produce a final result. Unfortunately, joins in MapReduce are non-trivial and can be expensive operations. This chapter demonstrates different approaches to joining data in Hadoop using a number of tools, including Java MapReduce, Apache Pig, and Apache Hive. It also demonstrates how to leverage external memory resources, such as a key-value store like Redis, from Hadoop MapReduce.
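To make the cost of a join concrete, the following is a minimal sketch of a reduce-side join, simulated in plain Python rather than run on Hadoop. The function names (`map_phase`, `shuffle`, `reduce_phase`), the `"L"`/`"R"` tags, and the sample records are all hypothetical and chosen only for illustration. It is not the implementation used in the recipes of this chapter.

```python
from collections import defaultdict

def map_phase(records, tag, key_index):
    """Emit (join_key, (tag, record)) pairs, as a mapper would."""
    for record in records:
        yield record[key_index], (tag, record)

def shuffle(pairs):
    """Group values by key, standing in for the shuffle/sort step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Inner join: pair each left record with each right record per key."""
    for key in sorted(groups):
        values = groups[key]
        left = [rec for tag, rec in values if tag == "L"]
        right = [rec for tag, rec in values if tag == "R"]
        for l_rec in left:
            for r_rec in right:
                yield key, l_rec, r_rec

# Hypothetical sample data: events keyed by country code, plus a lookup table.
events = [("US", "protest"), ("FR", "strike"), ("US", "election")]
countries = [("US", "United States"), ("FR", "France"), ("DE", "Germany")]

# Every record from both datasets is emitted and passes through the shuffle.
pairs = list(map_phase(events, "L", 0)) + list(map_phase(countries, "R", 0))
joined = list(reduce_phase(shuffle(pairs)))

for row in joined:
    print(row)
```

In this sketch, all six input records pass through the shuffle even though only three joined rows come out. On a real cluster that shuffle is network and disk traffic, which is one reason joins can be expensive and why avoiding the shuffle for small datasets, as a map-side join does, is attractive.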