Working with NoSQL databases

In the previous section of this chapter, you learned the basics of relational databases and how to use SQL to query data. Relational data is mostly organized in tabular form, that is, as a collection of tables linked by relations.

However, when the volume of data exceeds the capacity of a single server, problems arise because the traditional relational model does not easily support horizontal scalability, that is, storing data in a cluster of servers instead of a single one. This adds a new layer of complexity to database management, as the data is stored in a distributed form while still being accessible as one logical database.

In recent years, NoSQL, or non-relational, databases have become much more popular due to the introduction of new database models and the remarkable performance they exhibit in big data analytics and real-time applications. Some non-relational databases are designed for high availability, scalability, and flexibility, while others are designed for high performance.

The storage models of relational and non-relational databases differ notably. For example, for a shopping website, the products and comments can be stored in a relational database with two tables: products and comments. All product information is stored in one table, and all comments on each product are stored in the other. The following code shows the basic structure of these tables:

products: 
code,name,type,price,amount 
A0000001,Product-A,Type-I,29.5,500 

Each comment has a code field that points to the product it refers to:

comments: 
code,user,score,text 
A0000001,david,8,"This is a good product" 
A0000001,jenny,5,"Just so so" 

When a product has many related tables and the number of records is so large that the database must be distributed across many servers, querying becomes hard because executing even a simple query can be extremely inefficient. If we use MongoDB to store such data, each product is stored as a document, and all comments on that product are stored in an array as one field of the document. As a result, the data is easy to query, and the database can easily be distributed across a large number of servers.

Working with MongoDB

MongoDB is a popular non-relational database that provides a document-oriented way of storing data. Each product is a document in a collection. The product has some fields of descriptive information and one field that is an array of comments. All comments are subdocuments, so each logical item can be stored in its own logical form.

Here is a JSON (https://en.wikipedia.org/wiki/JSON) representation of a product in the collection:

{ 
  "code":"A0000001", 
  "name":"Product-A", 
  "type":"Type-I", 
  "price":29.5, 
  "amount":500, 
  "comments":[ 
    { 
      "user":"david", 
      "score":8, 
      "text":"This is a good product" 
    }, 
    { 
      "user":"jenny", 
      "score":5, 
      "text":"Just so so" 
    } 
  ] 
} 

A relational database may contain many schemas. Each schema (or database) may consist of many tables. Each table may contain many records. Similarly, a MongoDB instance can host many databases. Each database can include many collections. Each collection may contain many documents. The main difference is that the records in a table of a relational database need to have the same structure, but a document in a collection of a MongoDB database is schema-less and is flexible enough to have nested structures.

In the preceding JSON code, for example, a product is represented by such a document, in which code, name, type, price, and amount are data fields with simple data types, while comments is an array of objects. Each comment is represented by an object in comments with a structure of user, score, and text. All comments on a product are stored together in the comments array. Therefore, a product is highly self-contained in terms of product information and comments. If we need information about a product, we no longer need to join two tables; we simply pick out several fields.

To install MongoDB, visit https://docs.mongodb.com/manual/installation/ and follow the instructions. It supports nearly all major platforms.

Querying data from MongoDB

Suppose we have a working MongoDB instance running on a local machine. We can use the mongolite package to work with MongoDB. To install the package, run the following code:

install.packages("mongolite") 

Once we have the package installed, we can create a Mongo connection by specifying the collection, database, and MongoDB address:

library(mongolite) 
m <- mongo("products", "test", "mongodb://localhost") 

This connects to the products collection in the test database of the local MongoDB instance. Initially, the collection has no documents:

m$count() 
## [1] 0 

To insert the product with comments, we can directly supply the JSON document as a string to m$insert():

m$insert(' 
{ 
  "code": "A0000001", 
  "name": "Product-A", 
  "type": "Type-I", 
  "price": 29.5, 
  "amount": 500, 
  "comments": [ 
    { 
      "user": "david", 
      "score": 8, 
      "text": "This is a good product" 
    }, 
    { 
      "user": "jenny", 
      "score": 5, 
      "text": "Just so so" 
    } 
  ] 
}') 

Now, the collection has one document:

m$count() 
## [1] 1 

Alternatively, we can use a list object in R to represent the same structure. The following code inserts a second product with a list:

m$insert(list( 
  code = "A0000002", 
  name = "Product-B", 
  type = "Type-II", 
  price = 59.9, 
  amount = 200L, 
  comments = list( 
    list(user = "tom", score = 6L, 
      text = "Just fine"), 
    list(user = "mike", score = 9L, 
      text = "great product!") 
  ) 
), auto_unbox = TRUE) 

Note that R does not have a scalar type, so by default all vectors are interpreted as JSON arrays in MongoDB unless auto_unbox = TRUE, which turns one-element vectors into scalars in JSON. Without auto_unbox = TRUE, one has to use either jsonlite::unbox() to ensure scalar output or I() to ensure array output.
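
To see the difference, here is a minimal sketch using jsonlite::toJSON() directly (mongolite builds on jsonlite for its JSON conversion):

library(jsonlite) 
toJSON(list(n = 1)) 
## {"n":[1]} 
toJSON(list(n = 1), auto_unbox = TRUE) 
## {"n":1} 
toJSON(list(n = unbox(1))) 
## {"n":1} 
toJSON(list(n = I(1)), auto_unbox = TRUE) 
## {"n":[1]} 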

Now, the collection has two documents:

m$count() 
## [1] 2 

Then, we can use m$find() to retrieve all documents in the collection, and the results are automatically simplified into a data frame for easier data manipulation:

products <- m$find() 
##  
 Found 2 records... 
 Imported 2 records. Simplifying into dataframe... 
str(products) 
## 'data.frame':    2 obs. of  6 variables: 
##  $ code    : chr  "A0000001" "A0000002" 
##  $ name    : chr  "Product-A" "Product-B" 
##  $ type    : chr  "Type-I" "Type-II" 
##  $ price   : num  29.5 59.9 
##  $ amount  : int  500 200 
##  $ comments:List of 2 
##   ..$ :'data.frame': 2 obs. of  3 variables: 
##   .. ..$ user : chr  "david" "jenny" 
##   .. ..$ score: int  8 5 
##   .. ..$ text : chr  "This is a good product" "Just so so" 
##   ..$ :'data.frame': 2 obs. of  3 variables: 
##   .. ..$ user : chr  "tom" "mike" 
##   .. ..$ score: int  6 9 
##   .. ..$ text : chr  "Just fine" "great product!" 

To avoid the automatic conversion, we can use m$iterate() to iterate over the collection and get list objects that represent the original form of storage:

iter <- m$iterate() 
products <- iter$batch(2) 
str(products) 
## List of 2 
##  $ :List of 6 
##   ..$ code    : chr "A0000001" 
##   ..$ name    : chr "Product-A" 
##   ..$ type    : chr "Type-I" 
##   ..$ price   : num 29.5 
##   ..$ amount  : int 500 
##   ..$ comments:List of 2 
##   .. ..$ :List of 3 
##   .. .. ..$ user : chr "david" 
##   .. .. ..$ score: int 8 
##   .. .. ..$ text : chr "This is a good product" 
##   .. ..$ :List of 3 
##   .. .. ..$ user : chr "jenny" 
##   .. .. ..$ score: int 5 
##   .. .. ..$ text : chr "Just so so" 
##  $ :List of 6 
##   ..$ code    : chr "A0000002" 
##   ..$ name    : chr "Product-B" 
##   ..$ type    : chr "Type-II" 
##   ..$ price   : num 59.9 
##   ..$ amount  : int 200 
##   ..$ comments:List of 2 
##   .. ..$ :List of 3 
##   .. .. ..$ user : chr "tom" 
##   .. .. ..$ score: int 6 
##   .. .. ..$ text : chr "Just fine" 
##   .. ..$ :List of 3 
##   .. .. ..$ user : chr "mike" 
##   .. .. ..$ score: int 9 
##   .. .. ..$ text : chr "great product!" 
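
The iterator can also fetch one document at a time, which is handy when the collection is too large to load at once. Here is a minimal sketch, assuming the iterator exposes a one() method:

iter <- m$iterate() 
while (!is.null(doc <- iter$one())) { 
  # process one document at a time 
  cat(doc$code, "has", length(doc$comments), "comments\n") 
} 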

To filter the collection, we can specify the conditional query and fields in m$find().

First, we will query the documents whose code is A0000001 and retrieve the name, price, and amount fields:

m$find('{ "code": "A0000001" }',  
'{ "_id": 0, "name": 1, "price": 1, "amount": 1 }') 
##  
 Found 1 records... 
 Imported 1 records. Simplifying into dataframe... 
##        name price amount 
## 1 Product-A  29.5    500 

Then, we will query documents with price greater than or equal to 40, which is done by the $gte operator in the conditional query:

m$find('{ "price": { "$gte": 40 } }', 
'{ "_id": 0, "name": 1, "price": 1, "amount": 1 }') 
##  
 Found 1 records... 
 Imported 1 records. Simplifying into dataframe... 
##        name price amount 
## 1 Product-B  59.9    200 

We can query not only the top-level fields of documents but also the fields of objects inside an array field. The following code retrieves all documents having a comment with a score of 9:

m$find('{ "comments.score": 9 }',  
'{ "_id": 0, "code": 1, "name": 1}') 
##  
 Found 1 records... 
 Imported 1 records. Simplifying into dataframe... 
##       code      name 
## 1 A0000002 Product-B 

Similarly, the following code retrieves all documents with any comment that gives a score less than 6:

m$find('{ "comments.score": { "$lt": 6 }}', 
'{ "_id": 0, "code": 1, "name": 1}') 
##  
 Found 1 records... 
 Imported 1 records. Simplifying into dataframe... 
##       code      name 
## 1 A0000001 Product-A 

Note that accessing the fields of a subdocument is easily done with the . notation, which makes it pretty easy to work with nested structures.

We are done with the products example, so we drop the collection:

m$drop() 
## [1] TRUE 

The m$insert() function also works with data frames in R. Now, we will create a new MongoDB connection to another collection:

m <- mongo("students", "test", "mongodb://localhost") 

This connection, m, works with the students collection in the test database of the local MongoDB instance:

m$count() 
## [1] 0 

Initially, the collection has no documents. To insert some data, we will create a simple data frame:

students <- data.frame( 
  name = c("David", "Jenny", "Sara", "John"), 
  age = c(25, 23, 26, 23), 
  major = c("Statistics", "Physics", "Computer Science", "Statistics"), 
  projects = c(2, 1, 3, 1), 
  stringsAsFactors = FALSE 
) 
students 
##    name age            major projects 
## 1 David  25       Statistics        2 
## 2 Jenny  23          Physics        1 
## 3  Sara  26 Computer Science        3 
## 4  John  23       Statistics        1 

Then, we will insert the rows as documents into the collection:

m$insert(students) 
##  
Complete! Processed total of 4 rows. 

Now, the collection has some documents:

m$count() 
## [1] 4 

We can retrieve all the documents from the collection using find():

m$find() 
##  
 Found 4 records... 
 Imported 4 records. Simplifying into dataframe... 
##    name age            major projects 
## 1 David  25       Statistics        2 
## 2 Jenny  23          Physics        1 
## 3  Sara  26 Computer Science        3 
## 4  John  23       Statistics        1 

As we mentioned in the previous example, the way documents are stored in a MongoDB collection differs from the way records are stored in a table of a relational database. A document in a MongoDB collection looks like a JSON document, but it is actually stored in a binary form for performance and compactness. The m$find() function retrieves the data in a JSON-like form and simplifies it into a data frame for easy manipulation.

To filter the data, we can specify the query condition by supplying a query document to find(). For example, let's find all documents whose name is Jenny:

m$find('{ "name": "Jenny" }') 
##  
 Found 1 records... 
 Imported 1 records. Simplifying into dataframe... 
##    name age   major projects 
## 1 Jenny  23 Physics        1 

The results are automatically coerced to a data frame to make them easier to use. Next, we will query all documents with a number of projects greater than or equal to 2:

m$find('{ "projects": { "$gte": 2 }}') 
##  
 Found 2 records... 
 Imported 2 records. Simplifying into dataframe... 
##    name age            major projects 
## 1 David  25       Statistics        2 
## 2  Sara  26 Computer Science        3 

To select fields, we will specify the fields argument of find():

m$find('{ "projects": { "$gte": 2 }}',  
'{ "_id": 0, "name": 1, "major": 1 }') 
##  
 Found 2 records... 
 Imported 2 records. Simplifying into dataframe... 
##    name            major 
## 1 David       Statistics 
## 2  Sara Computer Science 

We can also sort the data by specifying the sort argument:

m$find('{ "projects": { "$gte": 2 }}',  
fields ='{ "_id": 0, "name": 1, "age": 1 }', 
sort ='{ "age": -1 }') 
##  
 Found 2 records... 
 Imported 2 records. Simplifying into dataframe... 
##    name age 
## 1  Sara  26 
## 2 David  25 

To limit the documents returned, we will specify limit:

m$find('{ "projects": { "$gte": 2 }}',  
fields ='{ "_id": 0, "name": 1, "age": 1 }', 
sort ='{ "age": -1 }', 
limit =1) 
##  
 Found 1 records... 
 Imported 1 records. Simplifying into dataframe... 
##   name age 
## 1 Sara  26 
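
Besides sort and limit, find() also accepts a skip argument, which is useful for paging through results. A sketch under that assumption:

m$find('{ "projects": { "$gte": 2 }}', 
fields = '{ "_id": 0, "name": 1, "age": 1 }', 
sort = '{ "age": -1 }', 
skip = 1, limit = 1) 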

Also, we can get all distinct values of a certain field of all documents:

m$distinct("major") 
## [1] "Statistics"       "Physics"          "Computer Science" 

We can get the distinct values with a condition:

m$distinct("major", '{ "projects": { "$gte": 2 } }') 
## [1] "Statistics"       "Computer Science" 

To update a document, we call update() with a query that selects the documents and a modifier that sets the values of certain fields:

m$update('{ "name": "Jenny" }', '{ "$set": { "age": 24 } }') 
## [1] TRUE 
m$find() 
##  
 Found 4 records... 
 Imported 4 records. Simplifying into dataframe... 
##    name age            major projects 
## 1 David  25       Statistics        2 
## 2 Jenny  24          Physics        1 
## 3  Sara  26 Computer Science        3 
## 4  John  23       Statistics        1 
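
By default, update() modifies only the first matching document. As a sketch (not run here), the multiple argument, combined with an operator such as $inc, can modify all matching documents at once:

m$update('{ "major": "Statistics" }', 
'{ "$inc": { "projects": 1 } }', 
multiple = TRUE) 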

Creating and removing indexes

Like relational databases, MongoDB also supports indexes. Each collection may have multiple indexes, and the fields of indexes are cached in memory for fast lookup. Properly created indexes can make document lookup extremely efficient.

Creating indexes in MongoDB with mongolite is easy, and it can be done before or after data is imported into the collection. However, if the collection already holds billions of documents, creating an index can be very time consuming. Conversely, if we create many indexes before pouring documents into the collection, the performance of inserting documents may suffer.

Here, we will create an index for the students collection:

m$index('{ "name": 1 }') 
##   v key._id key.name   name            ns 
## 1 1       1       NA   _id_ test.students 
## 2 1      NA        1 name_1 test.students 

Now, if we look up a document by the indexed field, the query is very fast:

m$find('{ "name": "Sara" }') 
##  
 Found 1 records... 
 Imported 1 records. Simplifying into dataframe... 
##   name age            major projects 
## 1 Sara  26 Computer Science        3 

If no document satisfies the condition, an empty data frame will be returned:

m$find('{ "name": "Jane" }') 
##  
 Imported 0 records. Simplifying into dataframe... 
## data frame with 0 columns and 0 rows 

Finally, the collection can be dropped with drop():

m$drop() 
## [1] TRUE 

The performance boost from using an index is hardly noticeable when the amount of data is small. In the next example, we will create a data frame with many rows so that we can compare the performance of finding documents with and without an index.

Here, we will use expand.grid() to create a data frame that exhausts all possible combinations of the provided vectors in the arguments:

set.seed(123) 
m <- mongo("simulation", "test") 
sim_data <- expand.grid( 
type = c("A", "B", "C", "D", "E"),  
category = c("P-1", "P-2", "P-3"), 
group = 1:20000,  
stringsAsFactors = FALSE) 
head(sim_data) 
##   type category group 
## 1    A      P-1     1 
## 2    B      P-1     1 
## 3    C      P-1     1 
## 4    D      P-1     1 
## 5    E      P-1     1 
## 6    A      P-2     1 

The columns we will index on are now created. Next, we need to simulate some random numbers:

sim_data$score1 <- rnorm(nrow(sim_data), 10, 3) 
sim_data$test1 <- rbinom(nrow(sim_data), 100, 0.8) 

The data frame now looks like this:

head(sim_data) 
##   type category group    score1 test1 
## 1    A      P-1     1  8.318573    80 
## 2    B      P-1     1  9.309468    75 
## 3    C      P-1     1 14.676125    77 
## 4    D      P-1     1 10.211525    79 
## 5    E      P-1     1 10.387863    80 
## 6    A      P-2     1 15.145195    76 

Then, we will insert all the data into the simulation collection:

m$insert(sim_data) 
##  
Complete! Processed total of 300000 rows. 
## [1] TRUE 

The first test measures how long it takes to query a document without any index:

system.time(rec <- m$find('{ "type": "C", "category": "P-3", "group": 87 }')) 
##  
 Found 1 records... 
 Imported 1 records. Simplifying into dataframe... 
##    user  system elapsed  
##   0.000   0.000   0.104 
rec 
##   type category  group   score1 test1 
## 1    C      P-3    87  6.556688    72 

The second test is about the performance of finding documents with joint conditions:

system.time({ 
  recs <- m$find('{ "type": { "$in": ["B", "D"]  },  
    "category": { "$in": ["P-1", "P-2"] },  
    "group": { "$gte": 25, "$lte": 75 } }') 
}) 
##  
Found 204 records... 
 Imported 204 records. Simplifying into dataframe... 
##    user  system elapsed  
##   0.004   0.000   0.094 

Then, the resulting data frame looks like this:

head(recs) 
##   type category group    score1 test1 
## 1    B      P-1    25 11.953580    80 
## 2    D      P-1    25 13.074020    84 
## 3    B      P-2    25 11.134503    76 
## 4    D      P-2    25 12.570769    74 
## 5    B      P-1    26  7.009658    77 
## 6    D      P-1    26  9.957078    85 

The third test is about the performance of finding documents by a non-indexed field:

system.time(recs2 <- m$find('{ "score1": { "$gte": 20 } }')) 
##  
Found 158 records... 
 Imported 158 records. Simplifying into dataframe... 
##    user  system elapsed  
##   0.000   0.000   0.096 

The resulting data frame looks like this:

head(recs2) 
##   type category group   score1 test1 
## 1    D      P-1    89 20.17111    76 
## 2    B      P-3   199 20.26328    80 
## 3    E      P-2   294 20.33798    75 
## 4    E      P-2   400 21.14716    83 
## 5    A      P-3   544 21.54330    73 
## 6    A      P-1   545 20.19368    80 

All three tests were run without any index on the collection. For contrast, we will now create an index:

m$index('{ "type": 1, "category": 1, "group": 1 }') 
##   v key._id key.type key.category key.group 
## 1 1       1       NA           NA        NA 
## 2 1      NA        1            1         1 
##                        name              ns 
## 1                      _id_ test.simulation 
## 2 type_1_category_1_group_1 test.simulation 

Once the index is created, the query of the first test, which uses only indexed fields, is quick:

system.time({ 
  rec <- m$find('{ "type": "C", "category": "P-3", "group": 87 }') 
}) 
##  
 Found 1 records... 
 Imported 1 records. Simplifying into dataframe... 
##    user  system elapsed  
##   0.000   0.000   0.001 

The second test also yields results quickly:

system.time({ 
  recs <- m$find('{ "type": { "$in": ["B", "D"]  },  
    "category": { "$in": ["P-1", "P-2"] },  
    "group": { "$gte": 25, "$lte": 75 } }') 
}) 
##  
 Found 204 records... 
 Imported 204 records. Simplifying into dataframe... 
##    user  system elapsed  
##   0.000   0.000   0.002 

However, non-indexed fields do not benefit from the index when searching for documents:

system.time({ 
  recs2 <- m$find('{ "score1": { "$gte": 20 } }') 
}) 
##  
 Found 158 records... 
 Imported 158 records. Simplifying into dataframe... 
##    user  system elapsed  
##   0.000   0.000   0.095 
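
If we filter on score1 frequently, the remedy is to index that field as well, in the same way as before (a sketch, not run here):

m$index('{ "score1": 1 }') 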

Another important feature of MongoDB is its aggregation pipeline. When we aggregate data, we supply an array of aggregation operations that the MongoDB instance executes in sequence. For example, the following code groups the data by type and, for each group, computes the count of records, the average score, and the minimum and maximum test scores. Since the output can be long, we don't print it here. You may execute the code yourself and see the results:

m$aggregate('[ 
  { "$group": {  
      "_id": "$type",  
      "count": { "$sum": 1 }, 
      "avg_score": { "$avg": "$score1" }, 
      "min_test": { "$min": "$test1" }, 
      "max_test": { "$max": "$test1" } 
    } 
  } 
]') 

We can also use multiple fields as the key of a group, which is similar to group by A, B in SQL:

m$aggregate('[ 
  { "$group": {  
      "_id": { "type": "$type", "category": "$category" },  
      "count": { "$sum": 1 }, 
      "avg_score": { "$avg": "$score1" }, 
      "min_test": { "$min": "$test1" }, 
      "max_test": { "$max": "$test1" } 
    } 
  } 
]') 

The aggregation pipeline supports chaining aggregation operations so that the output of one stage becomes the input of the next:

m$aggregate('[ 
  { "$group": {  
      "_id": { "type": "$type", "category": "$category" },  
      "count": { "$sum": 1 }, 
      "avg_score": { "$avg": "$score1" }, 
      "min_test": { "$min": "$test1" }, 
      "max_test": { "$max": "$test1" } 
    } 
  },  
  { 
    "$sort": { "_id.type": 1, "avg_score": -1 } 
  } 
]') 

We can lengthen the pipeline by adding more operations. For example, the following code groups the data and computes the aggregates. Then, it sorts the documents by average score in descending order, takes the top three documents, and projects the fields into something more useful:

m$aggregate('[ 
  { "$group": {  
      "_id": { "type": "$type", "category": "$category" },  
      "count": { "$sum": 1 }, 
      "avg_score": { "$avg": "$score1" }, 
      "min_test": { "$min": "$test1" }, 
      "max_test": { "$max": "$test1" } 
    } 
  },  
  { 
    "$sort": { "avg_score": -1 } 
  },  
  { 
    "$limit": 3 
  },  
  { 
    "$project": {  
      "_id.type": 1,  
      "_id.category": 1,  
      "avg_score": 1,  
      "test_range": { "$subtract": ["$max_test", "$min_test"] } 
    } 
  } 
]') 

In addition to the aggregate operators we used in the example, there are many other operators that are more powerful. For more details, visit https://docs.mongodb.com/manual/reference/operator/aggregation-pipeline/ and https://docs.mongodb.com/manual/reference/operator/aggregation-arithmetic/.

Another important feature of MongoDB is its built-in support of MapReduce (https://en.wikipedia.org/wiki/MapReduce). The MapReduce model is widely used for big data analytics in distributed clusters. In our environment, we can write an extremely simple MapReduce job that produces a histogram of certain data:

bins <- m$mapreduce( 
  map = 'function() { 
    /* map each document to a (bin, 1) pair, binning score1 into intervals of width 2.5 */ 
    emit(Math.floor(this.score1 / 2.5) * 2.5, 1); 
  }', 
  reduce = 'function(id, counts) { 
    /* sum the counts emitted for each bin */ 
    return Array.sum(counts); 
  }' 
) 

The first step of MapReduce is map, in which all documents are mapped to key-value pairs. Then, the reduce step aggregates the values for each key. In the preceding example, we simply counted the number of records falling into each bin of score1:

bins 
##     _id  value 
## 1  -5.0     6 
## 2  -2.5   126 
## 3   0.0  1747 
## 4   2.5 12476 
## 5   5.0 46248 
## 6   7.5 89086 
## 7  10.0 89489 
## 8  12.5 46357 
## 9  15.0 12603 
## 10 17.5  1704 
## 11 20.0   153 
## 12 22.5     5 

We can also create a bar plot from bins:

with(bins, barplot(value /sum(value), names.arg = `_id`, 
main = "Histogram of scores",  
xlab = "score1", ylab = "Percentage")) 

The plot generated is a bar chart titled Histogram of scores, showing the percentage of records in each score1 bin.

If the collection is no longer used, we can drop it with the drop() function:

m$drop() 
## [1] TRUE 

Since this section is at the introductory level, more advanced uses of MongoDB are beyond the scope of this book. If you are interested in MongoDB, go through the official tutorial at https://docs.mongodb.com/manual/tutorial/.

Using Redis

Unlike SQLite, which stores data in tabular form, or MongoDB, which lets us store and query nested structures, Redis (http://redis.io/) is an in-memory data structure store. It stores key-value pairs in memory and thus offers very high performance for key lookup. However, it does not support a query language like those used in SQL databases or MongoDB.

Redis is usually used as a high-performance data cache. We can store and manipulate a range of basic data structures in it. To install Redis, visit http://redis.io/download. Unfortunately, the Windows operating system is not officially supported, but the Microsoft Open Tech group develops and maintains a Win64 port of Redis at https://github.com/MSOpenTech/redis.

While a SQL database stores tables and MongoDB stores documents, Redis stores key-value pairs such as the following:

name: Something 
type: 1 
grade: A 

The values can be more complex data structures (for example, hash maps, sets, and sorted sets) rather than simple values, and Redis provides a simple interface for working with these structures with high performance and low latency.

Accessing Redis from R

To access a Redis instance from R, we can use the rredis package that provides simple functions to work with Redis. To install the package, run the following code:

install.packages("rredis") 

Once the package is ready, we can connect to a Redis instance:

library(rredis) 
redisConnect() 

If we leave the arguments blank, it connects to the local Redis instance with the default settings. We can also supply arguments to connect to a remote instance.
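
For example, here is a sketch of connecting to a remote instance (the host address below is hypothetical):

redisConnect(host = "192.168.1.100", port = 6379) 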

Setting and getting values from the Redis server

The most basic use of Redis is to store a value by calling redisSet(key, value). In R, the value is serialized by default, so we can store any R object in Redis:

redisSet("num1", 100) 
## [1] "OK" 

Now that the command has succeeded, we can retrieve the value with the same key:

redisGet("num1") 
## [1] 100 

We can store an integer vector:

redisSet("vec1", 1:5) 
## [1] "OK" 
redisGet("vec1") 
## [1] 1 2 3 4 5 

We can even store a data frame:

redisSet("mtcars_head", head(mtcars, 3)) 
## [1] "OK" 
redisGet("mtcars_head") 
##                mpg cyl disp  hp drat    wt  qsec vs am gear 
## Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0  1    4 
## Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4 
## Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1  1    4 
##               carb 
## Mazda RX4        4 
## Mazda RX4 Wag    4 
## Datsun 710       1 

In fact, if other computers have access to your Redis instance, they will get the same data in R using redisGet().

However, we only get NULL if the key does not exist at all:

redisGet("something") 
## NULL 

Instead of getting NULL, we can use redisExists() to check whether a key exists:

redisExists("something") 
## [1] FALSE 
redisExists("num1") 
## [1] TRUE 

If we no longer need a key, we can delete it with redisDelete():

redisDelete("num1") 
## [1] "1" 
## attr(,"redis string value") 
## [1] TRUE 
redisExists("num1") 
## [1] FALSE 

In addition to plain key-value pairs, Redis also supports more advanced data structures. For example, we can use redisHSet() to create a hash map of fruits that maps each fruit to a number:

redisHSet("fruits", "apple", 5) 
## [1] "1" 
## attr(,"redis string value") 
## [1] TRUE 
redisHSet("fruits", "pear", 2) 
## [1] "1" 
## attr(,"redis string value") 
## [1] TRUE 
redisHSet("fruits", "banana", 9) 
## [1] "1" 
## attr(,"redis string value") 
## [1] TRUE 

We can call redisHGet() to get the value of a field of a hash map:

redisHGet("fruits", "banana") 
## [1] 9 

We can also get a list to represent the structure of the hash map:

redisHGetAll("fruits") 
## $apple 
## [1] 5 
##  
## $pear 
## [1] 2 
##  
## $banana 
## [1] 9 

Alternatively, we can get the keys of the hash map:

redisHKeys("fruits") 
## [[1]] 
## [1] "apple" 
## attr(,"redis string value") 
## [1] TRUE 
##  
## [[2]] 
## [1] "pear" 
## attr(,"redis string value") 
## [1] TRUE 
##  
## [[3]] 
## [1] "banana" 
## attr(,"redis string value") 
## [1] TRUE 

We can also get only the values of the hash map:

redisHVals("fruits") 
## [[1]] 
## [1] 5 
##  
## [[2]] 
## [1] 2 
##  
## [[3]] 
## [1] 9 

Additionally, we can simply get the number of fields in the hash map:

redisHLen("fruits") 
## [1] "3" 
## attr(,"redis string value") 
## [1] TRUE 

We can get the values of multiple fields at once:

redisHMGet("fruits", c("apple", "banana")) 
## $apple 
## [1] 5 
##  
## $banana 
## [1] 9 

We can also set the values of multiple fields by supplying a list:

redisHMSet("fruits", list(apple = 4, pear = 1)) 
## [1] "OK" 

Now, the values of the fields are updated:

redisHGetAll("fruits") 
## $apple 
## [1] 4 
##  
## $pear 
## [1] 1 
##  
## $banana 
## [1] 9 

In addition to hash maps, Redis also supports queues. We can push values onto either the left or the right end of a queue. For example, the following code pushes the integers 1 to 3 onto the right end of a queue:

for (qi in 1:3) { 
  redisRPush("queue", qi)   
} 

We can get the current length of the queue with redisLLen():

redisLLen("queue") 
## [1] "3" 
## attr(,"redis string value") 
## [1] TRUE 

Now, the queue has three elements. Note that the returned value is a character vector rather than an integer, so we need to convert it before using it as a number elsewhere.
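
For example, a minimal conversion:

n <- as.integer(redisLLen("queue")) 
n 
## [1] 3 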

Then, we can keep popping values from the left-hand side of the queue:

redisLPop("queue") 
## [1] 1 
redisLPop("queue") 
## [1] 2 
redisLPop("queue") 
## [1] 3 
redisLPop("queue") 
## NULL 

Note that the queue only had three elements to pop. The fourth attempt returns NULL, which can be used as the criterion to check whether the queue is empty.
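
This makes it easy to drain a queue in a loop; a minimal sketch:

while (!is.null(x <- redisLPop("queue"))) { 
  print(x)  # process each element as it is popped 
} 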

Finally, we should close the connection to Redis to release all resources:

redisClose() 

Redis has more advanced features that are beyond the scope of this chapter. It can serve not only as a data structure store but also as a message broker, that is, we can use it to pass messages between different programs. For more advanced usage, read the official documentation at http://redis.io/documentation.
