Introducing Big Data | 29
Indeed, one should write SQL queries for them. Also, unlike MapReduce, there is no Reduce
step, although the queries can always do aggregational work. Even though they are really a grid
of multiple databases, they present an interface where one sends them a single query and one
gets back a single result set. Thus, it abstracts the complexity of parallel processing. Typically,
MPP data warehouses are packaged as physical appliances, and typically, they do not use direct
attached storage, but instead they use a more enterprise-oriented network storage scheme.
2.7.5 NoSQL
NoSQL stands for ‘not only SQL’. NoSQL databases are different from relational databases, where
they do not mandate a predened structured model for the data. This makes it a better choice to
store any information that does not render itself easily into the relational table and record format.
The key differentiating characteristics of NoSQL databases are as follows.
Most NoSQL databases allow the schema from row to row to differ quite substantially.
They do not have the concept of referential integrity. In fact, most NoSQL databases do not
even support an ACID compliance. From the perspective of consistency, NoSQL databases
do not support immediate consistency in the database. However, by doing that, they do allow
for greater availability. They allow write’s to be buffered and thus, read’s to be less blocked.
NoSQL databases are quite popular with a wide variety of industrial applications. There are four
types of NoSQL databases and they are briey explained as follows.
Key-value stores: In key-value stores, the data is stored in key-value pairs. Look at the exam-
ple in Figure 2.8. Here, we have a Customer table and an Order table. Look at the Customer
table. For a Row ID of 101, we have a few keys, such as First_Name, Last_Name, Address
and Last_Order_ID. Then, there is another row with an ID of 102.
FIGURE 2.8 Example of key-value stores
Database
Table: Customers
Table: Orders
Row ID: 101
First_Name: John
Last_Name: Doe
Address: 123 Park Street
Last_Order_ID: 1701
Row ID: 102
First_Name: Jane
Last_Name: Doe
Address: 456 Green Street
Last_Order_ID: 1702
Row ID: 1701
Price: 1000 USD
Item_ID: 2345
Item_ID: 7890
Row ID: 1702
Price: 700 USD
Item_ID: 4321
Item_ID: 5446
M02 Big Data Simplified XXXX 01.indd 29 5/10/2019 9:56:53 AM
30 | Big Data Simplied
Document stores: In this case, instead of having rows, there are documents. Conceptually,
the documents in a Document Store are similar to rows. The documents contain key and
value pairs. The slight difference here is that the value of a key can itself be a document.
The documents can be JavaScript objects and they are encoded using JavaScript Object
Notation or JSON. JavaScript language ends up being used as the internal language for these
databases as well. The documents can also be in XML or other semi-structured formats. All
these technologies are extremely familiar to a web developer. As such, the document stores
tend to be highly used in web applications, with the usage of this technology, the static con-
tent in a website and the actual data-driven content end up having a lot in common.
As depicted in the example in Figure 2.9, Customers and Orders are stand-alone contain-
ers that have no overall parent. In this case, they are called databases. Instead of containing
rows, they contain documents. The Customer document 101 has an Address key, and the
value of that key is itself a document, with keys and values for House Number and Street
Name. You will also see that the Orders key has a document as its value, and that document,
in turn, has a key called Last_Order_ID. The value of that key points to a document in
another database, in this case, the Order document 1701 in the Orders database.
Graph databases: They are primarily used for tracking relationships and therefore, such
data is considered good for social media applications where friends, networks and their
comments that relate to specific status messages all need to be tracked and analysed. They
really focus on the relationships between entities, rather than especially being focused on
FIGURE 2.9 Example of a document store
Database: Customers Database: Orders
Document ID: 101
First_Name: John
Second_Name: Doe
Address:
House_No: 123
Street_Name: Park Street
Orders:
Last_Order_ID: 1701
Document ID: 102
First_Name: Jane
Second_Name: Doe
Address:
House_No: 456
Street_Name: Green Street
Orders:
Last_Order_ID: 1702
Document ID: 1701
Price: 1000 USD
Item_ID: 2345
Item_ID: 7890
Document ID: 1702
Price: 700 USD
Item_ID: 4321
Item_ID: 5446
M02 Big Data Simplified XXXX 01.indd 30 5/10/2019 9:56:53 AM
Introducing Big Data | 31
the representation of the entities themselves. Graph databases tend to be extremely useful
for social media applications, for example, to track relationships between people in social
networks.
As depicted in the example in Figure 2.10, there is a database with a number of objects
call nodes. There is a node for John Smith. There is another node which has four different
properties in it, namely, House No, Street Name, City, State and Zip. Between two nodes
there is an edge. The edge is given a relationship name of Address. This is a kind of edge that
represents a property. So, John Smith has an address, and the address in turn has four prop-
erties and values of its own. But John Smith can also have a simple relationship to another
node in the database that represents another person. In this case, John Smith is a friend of
John Doe. John Smith has also sent an invitation to Jane Doe in his social network. Also,
John Doe has commented on a photo by Jane Smith in a social network. So, as observed in
the diagram, there are heterogeneous types of data which can be related through the edges.
There can also be a hierarchical relationship. In this case, John Smith placed an order, and
that order has its own properties and values as well. Evidently, this type of database is highly
appropriate for social media applications.
Wide column stores: It is found that, HDFS based databases like HBase and Cassandra are
wide column or column family stores, and so the column family store category is used in Big
Data scenarios frequently. Let us now take a look at wide column stores with an example.
FIGURE 2.10 Example of a graph database
Database
House No: 123
Street: Park Street
City: New York
State: NY
Zip Code: 10014
Jane Smith
John Doe
John Smith
Jane Doe
Order ID: 1701
Total Price: 1000
USD
Item ID: 2345
Type: Dress
Colour: Red
Item ID: 7890
Type: Shirt
Colour: Blue
Address
Friend
Commented
on a photo
Sent a friend
request
Placed an
onine order
Order
contains
Order
contains
M02 Big Data Simplified XXXX 01.indd 31 5/10/2019 9:56:54 AM
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset