Interacting with data in MongoDB

Many applications require more robust storage systems then text files, which is why many applications use databases to store data. There are many kinds of databases, but there are two broad categories: relational databases, which support a standard declarative language called SQL, and so called NoSQL databases, which are often able to work without a predefined schema and where a data instance is more properly described as a document, rather as a row.

MongoDB is a kind of NoSQL database that stores data as documents, which are grouped together in collections. Documents are expressed as JSON objects. It is fast and scalable in storing, and also flexible in querying, data. To use MongoDB in Python, we need to import the pymongo package and open a connection to the database by passing a hostname and port. We suppose that we have a MongoDB instance, running on the default host (localhost) and port (27017):

>>> import pymongo
>>> conn = pymongo.MongoClient(host='localhost', port=27017)

If we do not put any parameters into the pymongo.MongoClient() function, it will automatically use the default host and port.

In the next step, we will interact with databases inside the MongoDB instance. We can list all databases that are available in the instance:

>>> conn.database_names()
['local']
>>> lc = conn.local
>>> lc
Database(MongoClient('localhost', 27017), 'local')

The preceding snippet says that our MongoDB instance only has one database, named 'local'. If the databases and collections we point to do not exist, MongoDB will create them as necessary:

>>> db = conn.db
>>> db
Database(MongoClient('localhost', 27017), 'db')

Each database contains groups of documents, called collections. We can understand them as tables in a relational database. To list all existing collections in a database, we use collection_names() function:

>>> lc.collection_names()
['startup_log', 'system.indexes']
>>> db.collection_names()
[]

Our db database does not have any collections yet. Let's create a collection, named person, and insert data from a DataFrame object to it:

>>> collection = db.person
>>> collection
Collection(Database(MongoClient('localhost', 27017), 'db'), 'person')
>>> # insert df_ex2 DataFrame into created collection
>>> import json
>>> records = json.load(df_ex2.T.to_json()).values()
>>> records
dict_values([{'2': 3, '3': 'male', '1': 39, '4': 'vl', '0': 'Vinh'}, {'2': 3, '3': 'male', '1': 26, '4': 'dn', '0': 'Nghia'}, {'2': 4, '3': 'female', '1': 28, '4': 'dn', '0': 'Hong'}, {'2': 3, '3': 'female', '1': 25, '4': 'hn', '0': 'Lan'}, {'2': 3, '3': 'male', '1': 42, '4': 'tn', '0': 'Hung'}, {'2': 1, '3':'male', '1': 7, '4': 'hcm', '0': 'Nam'}, {'2': 1, '3': 'female', '1': 11, '4': 'hcm', '0': 'Mai'}])
>>> collection.insert(records)
[ObjectId('557da218f21c761d7c176a40'),
 ObjectId('557da218f21c761d7c176a41'),
 ObjectId('557da218f21c761d7c176a42'),
 ObjectId('557da218f21c761d7c176a43'),
 ObjectId('557da218f21c761d7c176a44'),
 ObjectId('557da218f21c761d7c176a45'),
 ObjectId('557da218f21c761d7c176a46')]

The df_ex2 is transposed and converted to a JSON string before loading into a dictionary. The insert() function receives our created dictionary from df_ex2 and saves it to the collection.

If we want to list all data inside the collection, we can execute the following commands:

>>> for cur in collection.find():
>>>     print(cur)
{'4': 'vl', '2': 3, '3': 'male', '1': 39, '_id': ObjectId('557da218f21c761d7c176
a40'), '0': 'Vinh'}
{'4': 'dn', '2': 3, '3': 'male', '1': 26, '_id': ObjectId('557da218f21c761d7c176
a41'), '0': 'Nghia'}
{'4': 'dn', '2': 4, '3': 'female', '1': 28, '_id': ObjectId('557da218f21c761d7c1
76a42'), '0': 'Hong'}
{'4': 'hn', '2': 3, '3': 'female', '1': 25, '_id': ObjectId('557da218f21c761d7c1
76a43'), '0': 'Lan'}
{'4': 'tn', '2': 3, '3': 'male', '1': 42, '_id': ObjectId('557da218f21c761d7c176
a44'), '0': 'Hung'}
{'4': 'hcm', '2': 1, '3': 'male', '1': 7, '_id': ObjectId('557da218f21c761d7c176
a45'), '0': 'Nam'}
{'4': 'hcm', '2': 1, '3': 'female', '1': 11, '_id': ObjectId('557da218f21c761d7c
176a46'), '0': 'Mai'}

If we want to query data from the created collection with some conditions, we can use the find() function and pass in a dictionary describing the documents we want to retrieve. The returned result is a cursor type, which supports the iterator protocol:

>>> cur = collection.find({'3' : 'male'})
>>> type(cur)
pymongo.cursor.Cursor
>>> result = pd.DataFrame(list(cur))
>>> result
       0   1  2     3    4                       _id
0   Vinh  39  3  male   vl  557da218f21c761d7c176a40
1  Nghia  26  3  male   dn  557da218f21c761d7c176a41
2   Hung  42  3  male   tn  557da218f21c761d7c176a44
3    Nam   7  1  male  hcm  557da218f21c761d7c176a45

Sometimes, we want to delete data in MongdoDB. All we need to do is to pass a query to the remove() method on the collection:

>>> # before removing data
>>> pd.DataFrame(list(collection.find()))
       0   1  2       3    4                       _id
0   Vinh  39  3    male   vl  557da218f21c761d7c176a40
1  Nghia  26  3    male   dn  557da218f21c761d7c176a41
2   Hong  28  4  female   dn  557da218f21c761d7c176a42
3    Lan  25  3  female   hn  557da218f21c761d7c176a43
4   Hung  42  3    male   tn  557da218f21c761d7c176a44
5    Nam   7  1    male  hcm  557da218f21c761d7c176a45
6    Mai  11  1  female  hcm  557da218f21c761d7c176a46

>>> # after removing records which have '2' column as 1 and '3' column as 'male'
>>> collection.remove({'2': 1, '3': 'male'})
{'n': 1, 'ok': 1}
>>> cur_all = collection.find();
>>> pd.DataFrame(list(cur_all))
       0   1  2       3    4                       _id
0   Vinh  39  3    male   vl  557da218f21c761d7c176a40
1  Nghia  26  3    male   dn  557da218f21c761d7c176a41
2   Hong  28  4  female   dn  557da218f21c761d7c176a42
3    Lan  25  3  female   hn  557da218f21c761d7c176a43
4   Hung  42  3    male   tn  557da218f21c761d7c176a44
5    Mai  11  1  female  hcm  557da218f21c761d7c176a46

We learned step by step how to insert, query and delete data in a collection. Now, we will show how to update existing data in a collection in MongoDB:

>>> doc = collection.find_one({'1' : 42})
>>> doc['4'] = 'hcm'
>>> collection.save(doc)
ObjectId('557da218f21c761d7c176a44')
>>> pd.DataFrame(list(collection.find()))
       0   1  2       3    4                       _id
0   Vinh  39  3    male   vl  557da218f21c761d7c176a40
1  Nghia  26  3    male   dn  557da218f21c761d7c176a41
2   Hong  28  4  female   dn  557da218f21c761d7c176a42
3    Lan  25  3  female   hn  557da218f21c761d7c176a43
4   Hung  42  3    male  hcm  557da218f21c761d7c176a44
5    Mai  11  1  female  hcm  557da218f21c761d7c176a46

The following table shows methods that provide shortcuts to manipulate documents in MongoDB:

Update Method

Description

inc()

Increment a numeric field

set()

Set certain fields to new values

unset()

Remove a field from the document

push()

Append a value onto an array in the document

pushAll()

Append several values onto an array in the document

addToSet()

Add a value to an array, only if it does not exist

pop()

Remove the last value of an array

pull()

Remove all occurrences of a value from an array

pullAll()

Remove all occurrences of any set of values from an array

rename()

Rename a field

bit()

Update a value by bitwise operation

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset