Introducing Hadoop | 63
• By default, HDFS replicates data by a factor
of 3, which means the cluster holds at
least 3 copies of each file ingested into
it. This factor is fully configurable and
can be increased or decreased in the
Hadoop configuration file or from the
command line using the hadoop command.
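The replication factor is normally set cluster-wide in hdfs-site.xml; a minimal sketch (the value shown is the stock default):

```xml
<!-- hdfs-site.xml: default replication factor applied to new files -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
```

For an existing file, replication can be changed from the command line with `hdfs dfs -setrep -w 2 /path/to/file` (the path here is illustrative); the `-w` flag waits until re-replication completes.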
• Files are copied from an external source
into HDFS using the put or copyFromLocal
commands. The difference between put
and copyFromLocal is that put can take
input from sources beyond the local file
system, whereas copyFromLocal only places
the issuing machine’s local files in HDFS.
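A quick sketch of both commands (the file and directory names are illustrative):

```shell
# copyFromLocal: source must be on the local file system
hdfs dfs -copyFromLocal sales.csv /data/sales.csv

# put accepts the same usage, and can additionally read from stdin
hdfs dfs -put sales.csv /data/sales.csv
cat sales.csv | hdfs dfs -put - /data/sales_from_stdin.csv
```

In everyday use the two commands behave almost identically; put is simply the more general form.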
• In an enterprise business application, a
commercial Hadoop distribution package
is used to ensure high levels of resource
management and job monitoring. Manually
configuring and monitoring each and every
process, service and job is very tedious,
and it is considered an inefficient approach
with the plain Apache Hadoop distribution
package, because it ships with no
customized monitoring tool or user
interface to do this efficiently. For this
reason, enterprise clusters are generally
configured with a commercial Hadoop
distribution package, such as Hortonworks,
Cloudera or DataStax.
Multiple-choice Questions (1 Mark Questions)
1. Data locality feature in Hadoop means
a. Store the same data across multiple
nodes.
b. Relocate the data from one node to
another.
c. Co-locate the data with the computing
nodes in local file system.
d. Distribute the data across multiple
nodes.
2. Which mechanism does Hadoop use to
make the NameNode resilient to failure?
a. Take backup of file system metadata to
a local disk.
b. Store the filesystem metadata in cloud.
c. Use a machine with at least 12 CPUs.
d. Using expensive and reliable hardware.
3. Which one of the following is not true
regarding Hadoop?
a. It is a distributed framework.
b. The main algorithm used in it is
MapReduce.
c. It runs with commodity hardware.
d. All are true.
4. When a client communicates with the HDFS
file system, it needs to communicate with
a. Only the NameNode
b. Only the DataNode
c. Both the NameNode and DataNode
d. None of these
5. The role of a journal node is to
a. Report the location of the blocks in a
DataNode.
b. Report the edit log information of the
blocks in the DataNode.
c. Report the schedules when the jobs are
going to run.
d. Report the activity of various compo-
nents handled by resource manager.
6. In an HDFS system with a block size of
64 MB, we store a file that is smaller than
64 MB. Which of the following is true?
a. The file will consume 64 MB.
b. The file will consume more than 64 MB.
c. The file will consume less than 64 MB.
d. Cannot be predicted. Depends on the
decision of NameNode.