Other Big Data Tools and Technologies
Short-answer Type Questions (5 Marks Questions)
1. What is Sqoop in Hadoop? Explain its
usage.
2. What are the components of the Hive
query processor?
3. What is a bucket in Hive?
4. For each Sqoop import into HDFS, how
many MapReduce jobs and tasks are
submitted? Explain.
5. I have around 500 tables in a database.
I want to import all of them except the
tables named Table 498, Table 323 and
Table 199. How can this be done without
importing the tables one by one?
6. Explain the significance of the split-by
clause in Apache Sqoop.
7. I want to see the present working directory
in UNIX from within Hive. Is it possible to
run the corresponding UNIX command
(pwd) from Hive?
8. What is the use of explode in Hive?
9. Is it possible to change the default location
of managed tables in Hive? If so, how?
10. Why do we need Hive?
Long-answer Type Questions (10 Marks Questions)
1. What is partitioning? When might we need
to customize the default partition? Explain
the scenario with an example.
2. If you run a select * query in Hive, why does
it not run a MapReduce job? Explain.
3. What is the difference between an external
table and a managed table?
4. Why do we perform partitioning in Hive?
Explain its advantages.
5. Suppose we create a table that contains
details of all the transactions done by
customers in the year 2018:
CREATE TABLE customer_transaction_details
(cust_id INT, amount FLOAT, month STRING,
country STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ',';
After inserting 50,000 tuples into this
table, we want to know the total revenue
generated for each month. However, Hive
takes too much time to process this query.
How will you solve this problem? List the
steps you will take to do so.
6. Explain the data flow in Hive with a
diagram. Describe each step.
7. What is the use of the Metastore in Hive?
If the Metastore is not present, what
problems will arise?
8. How will you update rows that have
already been exported? Write the Sqoop
command to list all the databases on a
MySQL server.
9. I am getting a connection failure exception
while connecting to MySQL through
Sqoop. What is the root cause, and how
can this error be fixed?
10. How do you create a table in MySQL and
insert values into it? Import this table
into Hive/HDFS using Apache Sqoop.
11. Explain how Apache Flume works. Also
describe the flow for extracting a log file
from a source path and ingesting it into
HDFS using Flume.