Introducing MapReduce | 95
7. Can the number of reducers be set to zero?
a. Yes
b. No
c. Not applicable
d. None of the above
8. Where is the Map Output (intermediate
key-value data) stored?
a. HDFS
b. Local File System
c. Name Node
d. Data Node
9. When does the Reduce phase start in a
MapReduce job?
a. Before any Map job starts
b. When the first Map job is completed
c. When all the child Map jobs are completed
d. None of the above
Short-answer Type Questions (5 Marks Questions)
1. Name the most common input formats
defined in Hadoop. Which one is the default?
2. Rearrange the main configuration parameters
that the user needs to specify to run
a MapReduce job.
a. Job’s input locations in the distributed
file system.
b. Input format
c. Class containing the map function.
d. Output format
e. Job’s output location in the distributed
file system.
f. Class containing the reduce function.
g. Application JAR file containing the
mapper, reducer and driver classes for
execution and deployment.
3. What is InputSplit in Hadoop? Please
explain.
4. Assume that Hadoop spawned 100 tasks
for a job and one of the tasks failed. What
will the Hadoop MapReduce framework do?
5. What is the difference between an
InputSplit and an HDFS block? Please explain.
6. Explain the difference between
Job.submit() and waitForCompletion().
7. What will happen if we run a MapReduce
job with an output directory that already
exists? Please explain the root cause here.
8. How is an input file made ready from
HDFS by the MapReduce framework? Please
explain.
9. How are the keys grouped before reaching
the Reduce phase? Explain in detail.
10. What problem arises if the Reducer
function does not receive the values
(coming from the Map phase) as a list? Why
is this needed? Please explain.
11. What are the main configuration parameters
specified in MapReduce?
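Hint for questions 9 and 10: the grouping of keys before the Reduce phase can be pictured with a small in-memory model. The sketch below is plain Python, not Hadoop code. The names map_phase, shuffle_and_sort, reduce_phase and run_job are made up for illustration, and the word-count logic is only an assumed example workload. In real Hadoop the reducer receives an Iterable of values for each key rather than a Python list.

```python
from collections import defaultdict


def map_phase(line):
    """Emit one (word, 1) pair per word, in the style of a word-count mapper."""
    for word in line.lower().split():
        yield word, 1


def shuffle_and_sort(pairs):
    """Collect all values that share a key, then order the groups by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key, values):
    """Receive one key together with all of its values and sum them."""
    return key, sum(values)


def run_job(lines):
    """Toy driver: map every line, group by key, then reduce each group."""
    intermediate = [pair for line in lines for pair in map_phase(line)]
    return [reduce_phase(key, values)
            for key, values in shuffle_and_sort(intermediate)]


result = run_job(["big data big ideas", "data moves"])
print(result)
```

In this model the reducer is called once per distinct key, which is only possible because every value for that key has already been gathered into a single collection.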
Long-answer Type Questions (10 Marks Questions)
1. What is shuffling and sorting in
MapReduce? Please explain in detail.
2. Explain the internal flow of a MapReduce
job with a diagram.
3. What is Speculative Execution in MapReduce?
What is the main reason behind it, and how
does the MapReduce framework handle it?
4. How can you troubleshoot a MapReduce
job after getting an exception? Please
explain in detail.
5. Explain in detail how YARN schedules a
MapReduce job in the job queue.
6. How can we troubleshoot a MapReduce
job? What will be the action you take if a