HBase is a highly scalable NoSQL data store that supports columnar-style data storage. As we will see in the next recipe, it works very closely with Hadoop.
The preceding screenshot depicts the HBase data model. As shown, HBase includes several tables. Each table has zero or more rows, where a row consists of a single row ID and multiple name-value pairs. For example, the first row has the row ID Foundation and several name-value pairs, such as author with the value asimov. Although the data model has some similarities with the relational data model, different rows in the HBase data model may have different columns. For instance, the second row may contain completely different name-value pairs from the first one. You can find more details about the data model in Google's Bigtable paper: http://research.google.com/archive/bigtable.html.
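As a rough illustration of the data model described above, the following Python sketch represents a table as a mapping from row IDs to name-value pairs. It is only an in-memory analogy and not HBase client code. The first row comes from the example in the text; the second row, its row ID, and its column names are hypothetical and exist only to show that rows need not share columns.

```python
# In-memory analogy of an HBase table: row ID -> {column name: value}.
# This is not HBase client code; it only mirrors the shape of the data model.
books = {
    # Row taken from the example in the text.
    "Foundation": {"author": "asimov"},
    # Hypothetical second row with a completely different set of columns.
    "example-row-2": {"publisher": "example-publisher", "year": "1900"},
}


def columns_of(table, row_id):
    """Return the sorted column names stored for one row."""
    return sorted(table[row_id])


# Each row reports its own columns; there is no table-wide column list.
for row_id in sorted(books):
    print(row_id, columns_of(books, row_id))
```

A relational table would force both rows to carry the same columns, with empty values where nothing applies. Here each row simply stores the pairs it has.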
Hadoop by default loads data from flat files, and it is the responsibility of the MapReduce job to read and parse the data through data formatters. However, there are often use cases where the data is already in a structured form. Although it is possible to export this data into flat files, parsing and processing it with conventional MapReduce jobs leads to several disadvantages:
HBase addresses these concerns by enabling users to read data directly from HBase and write results directly to HBase without having to convert them to flat files.
This section demonstrates how to install HBase.
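Extract the HBase distribution with the following command. We will refer to the resulting directory as HBASE_HOME.
> tar xfz hbase-0.94.2-SNAPSHOT.tar.gz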
Create a data directory for HBase inside HBASE_HOME:
> cd $HBASE_HOME
> mkdir hbase-data
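Add the following to the HBASE_HOME/conf/hbase-site.xml file. Replace the value of hbase.rootdir with the file:// URL of your own hbase-data directory.
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///Users/srinath/playground/hadoop-book/hbase-0.94.2/hbase-data</value>
  </property>
</configuration>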
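Because hbase.rootdir must point at a directory on your own machine, it can be convenient to generate the configuration rather than edit the path by hand. The following Python sketch builds the same XML for a given data directory. The function name build_hbase_site is hypothetical, and the temporary directory merely stands in for HBASE_HOME so that the example runs anywhere.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def build_hbase_site(data_dir):
    """Return hbase-site.xml content whose hbase.rootdir points at data_dir."""
    # resolve() makes the path absolute; as_uri() turns it into a file:// URL.
    root_uri = Path(data_dir).resolve().as_uri()
    configuration = ET.Element("configuration")
    prop = ET.SubElement(configuration, "property")
    ET.SubElement(prop, "name").text = "hbase.rootdir"
    ET.SubElement(prop, "value").text = root_uri
    return ET.tostring(configuration, encoding="unicode")


# Stand-in for HBASE_HOME (an assumption for this demo, not a real installation).
example_home = Path(tempfile.mkdtemp())
xml_text = build_hbase_site(example_home / "hbase-data")
print(xml_text)
```

To use it for a real installation, pass the path of your hbase-data directory and write the returned text to HBASE_HOME/conf/hbase-site.xml.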
Start the HBase server by running the following command from HBASE_HOME:
> ./bin/start-hbase.sh
Start the HBase shell by running the following command from HBASE_HOME:
> bin/hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.92.1, r1298924, Fri Mar 9 16:58:34 UTC 2012
Create a table named test with a single column family cf, and confirm that the table exists, using the following commands:
hbase(main):001:0> create 'test', 'cf'
0 row(s) in 1.8630 seconds
hbase(main):002:0> list 'test'
TABLE
test
1 row(s) in 0.0180 seconds
Store the value val1 in the test table under the row ID row1 and the column cf:a using the following command:
hbase(main):004:0> put 'test', 'row1', 'cf:a', 'val1'
0 row(s) in 0.0680 seconds
Scan the table to list its rows:
hbase(main):005:0> scan 'test'
ROW COLUMN+CELL
row1 column=cf:a, timestamp=1338485017447, value=val1
1 row(s) in 0.0320 seconds
Retrieve the row from the test table by giving row1 as the row ID, and then exit the shell:
hbase(main):006:0> get 'test', 'row1'
COLUMN CELL
cf:a timestamp=1338485017447, value=val1
1 row(s) in 0.0130 seconds
hbase(main):007:0> exit
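To make the meaning of the shell commands above more concrete, the following Python sketch imitates create, put, scan, and get with an in-memory class. It is a toy model under simplifying assumptions, not the HBase client API: the class name ToyTable and its methods are hypothetical, only the latest value of each cell is kept, and nothing is persisted.

```python
import time


class ToyTable:
    """Toy stand-in for an HBase table; keeps only the latest value per cell."""

    def __init__(self, name, families):
        self.name = name
        self.families = set(families)
        self.rows = {}  # row ID -> {"family:qualifier": (timestamp_ms, value)}

    def put(self, row_id, column, value):
        # A column is written as family:qualifier; the family must already exist.
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError("unknown column family: " + family)
        timestamp_ms = int(time.time() * 1000)
        self.rows.setdefault(row_id, {})[column] = (timestamp_ms, value)

    def get(self, row_id):
        # Return a copy of the cells stored for one row (empty if absent).
        return dict(self.rows.get(row_id, {}))

    def scan(self):
        # Yield rows in sorted row ID order.
        for row_id in sorted(self.rows):
            yield row_id, self.get(row_id)


test = ToyTable("test", ["cf"])       # create 'test', 'cf'
test.put("row1", "cf:a", "val1")      # put 'test', 'row1', 'cf:a', 'val1'
for row_id, cells in test.scan():     # scan 'test'
    print(row_id, cells)
print(test.get("row1")["cf:a"][1])    # get 'test', 'row1'
```

The sketch shows why the put command names the column as cf:a: the column family cf is fixed when the table is created, while the qualifier a is chosen freely at write time.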
Stop HBase by running the following command from HBASE_HOME:
> ./bin/stop-hbase.sh
stopping hbase..............
The preceding steps configure and run HBase in local mode. The start script starts the HBase server, and the HBase shell connects to the server and issues the commands.
The preceding commands show how to run HBase in local mode. See http://hbase.apache.org/book/standalone_dist.html#distributed for how to run HBase in distributed mode.