In the recipe Storing large data in Mongo using GridFS, we saw what GridFS is and how it can be used to store large files in MongoDB. In the previous recipe, we saw how to use the GridFS API from a Java client. In this recipe, we will see how to store image data in MongoDB using GridFS from a Python program.
Refer to the recipe Connecting to the single node using a Java client from Chapter 1, Installing and Starting the Server, for all the necessary setup for this recipe. If you are interested in more detail on the Python driver, refer to the following recipes: Executing query and insert operations with PyMongo and Executing update and delete operations using PyMongo in Chapter 3, Programming Language Drivers. Download the image glimpse_of_universe-wide.jpg from the downloadable bundle available with the book on the Packt site and save it to the local filesystem, as we did in the previous recipe.
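Open the Python interpreter from the directory where the image glimpse_of_universe-wide.jpg is placed:
$ python
Import the pymongo and gridfs packages:
>>> import pymongo
>>> import gridfs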
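Create a MongoClient and a database object to the test database as follows:
>>> client = pymongo.MongoClient('mongodb://localhost:27017')
>>> db = client.test
Drop the GridFS collections so that we start from a clean state:
>>> db.fs.files.drop()
>>> db.fs.chunks.drop()
Create a GridFS instance on this database:
>>> fs = gridfs.GridFS(db)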
Open the image in binary mode and upload it to GridFS:
>>> file = open('glimpse_of_universe-wide.jpg', 'rb')
>>> fs.put(file, filename='universe.jpg')
On executing put, we should see the ObjectId of the uploaded file. This is the same as the _id field of the document for this file in the fs.files collection.
The following query should return a dict object with the details of the upload. Verify the contents:
>>> db.fs.files.find_one()
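Now get a GridOut instance representing the object, to read the data out of GridFS, as follows:
>>> gout = fs.get_last_version('universe.jpg')
Open a file on the local filesystem for writing in binary mode:
>>> fout = open('universe.jpg', 'wb')
Write the data read from GridFS to this file and close both handles:
>>> fout.write(gout.read())
>>> fout.close()
>>> gout.close()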
A file named universe.jpg will be created in the current directory, with the same number of bytes as the source. Verify it by opening it in an image viewer.

Let's look at the steps we executed. In the Python shell, we import two packages, pymongo and gridfs, and instantiate the pymongo.MongoClient and gridfs.GridFS instances. The constructor of the class gridfs.GridFS takes as its argument the instance of pymongo.Database.
We open a file in binary mode using the open function and pass the file object to the GridFS put method. An additional argument called filename is passed, which is the name given to the file in GridFS. The first parameter need not be a file object; any object with a read method defined will do.
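To illustrate that put only needs something readable, here is a minimal sketch that uploads from an in-memory io.BytesIO buffer instead of a file on disk. The helper upload and the class FakeGridFS are hypothetical names introduced for this illustration; FakeGridFS is not part of PyMongo and exists only so that the sketch runs without a MongoDB server.

```python
import io


def upload(fs, source, filename):
    """Store `source` under `filename` and return the new file's _id.

    `fs` is expected to be a gridfs.GridFS instance. `source` may be bytes
    or any object exposing read(), such as an open file or io.BytesIO.
    """
    if not isinstance(source, bytes) and not hasattr(source, "read"):
        raise TypeError("source must be bytes or provide a read() method")
    return fs.put(source, filename=filename)


class FakeGridFS:
    """Hypothetical in-memory stand-in, NOT part of PyMongo.

    It mimics only the put() call used above so the sketch can run
    without a server.
    """

    def __init__(self):
        self.files = {}

    def put(self, data, filename=None):
        payload = data if isinstance(data, bytes) else data.read()
        file_id = len(self.files) + 1  # a real GridFS returns an ObjectId
        self.files[file_id] = (filename, payload)
        return file_id


# Against a real server, use instead:
#   fs = gridfs.GridFS(pymongo.MongoClient('mongodb://localhost:27017').test)
fs = FakeGridFS()
file_id = upload(fs, io.BytesIO(b"not really a JPEG"), "universe.jpg")
print(file_id, fs.files[file_id])
```

With a real gridfs.GridFS instance, the same upload call should work unchanged, since it relies only on put and its filename keyword argument.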
Once the put operation succeeds, the return value is the ObjectId of the uploaded document in the fs.files collection. A query on fs.files can confirm that the file is uploaded. Verify that the size of the data uploaded matches the size of the file.
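The size check can be sketched as a comparison between the length field of the fs.files document and the size of the file on disk. The helper name sizes_match is hypothetical, and the demo below uses a temporary file and a hand-written document rather than a real query result, so that it runs without a server.

```python
import os
import tempfile


def sizes_match(files_doc, path):
    """Return True if the document's `length` equals the file size on disk.

    `files_doc` is the dict returned by a query such as
    db.fs.files.find_one({'filename': 'universe.jpg'}).
    """
    return files_doc["length"] == os.path.getsize(path)


# Demo: a 1024-byte temporary file and a hand-written stand-in for the
# document that db.fs.files.find_one() would return.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 1024)
demo_ok = sizes_match({"length": 1024}, tmp.name)
demo_mismatch = sizes_match({"length": 1}, tmp.name)
os.remove(tmp.name)
print(demo_ok, demo_mismatch)
```

In the shell session of this recipe, the equivalent check would be sizes_match(db.fs.files.find_one(), 'glimpse_of_universe-wide.jpg').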
Our next objective is to get the file from GridFS onto the local filesystem. Intuitively, one would imagine that if the method to put a file into GridFS is put, then the method to get a file would be get. The method is indeed get; however, it fetches only by the ObjectId that was returned by the put method. So, if you are okay with fetching by ObjectId, get is the method for you. However, if you want to fetch by filename, the method to use is get_last_version. It accepts the name of the file that we uploaded, and its return value is of type gridfs.grid_file.GridOut. This class contains the method read, which reads out all the bytes of the file stored in GridFS. We open a file called universe.jpg for writing in binary mode and write all the bytes read from the GridOut object.
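For large files, reading everything with a single read call holds the whole file in memory. As a possible variation on the steps above, the following sketch copies the data in fixed-size pieces by passing a size to read. The helper download, its chunk_size default, and the class FakeGridFS are assumptions made for this illustration; FakeGridFS is not part of PyMongo and only lets the sketch run without a server.

```python
import io
import os
import tempfile


def download(fs, filename, dest_path, chunk_size=64 * 1024):
    """Copy the latest version of `filename` from GridFS to `dest_path`.

    `fs` is expected to be a gridfs.GridFS instance. The chunk size is an
    arbitrary choice for this sketch. Returns the number of bytes written.
    """
    gout = fs.get_last_version(filename)  # fs.get(file_id) fetches by _id
    written = 0
    try:
        with open(dest_path, "wb") as fout:
            while True:
                chunk = gout.read(chunk_size)
                if not chunk:  # empty bytes means end of data
                    break
                fout.write(chunk)
                written += len(chunk)
    finally:
        gout.close()
    return written


class FakeGridFS:
    """Hypothetical stand-in, NOT part of PyMongo; serves fixed bytes."""

    def __init__(self, data):
        self._data = data

    def get_last_version(self, filename):
        return io.BytesIO(self._data)


payload = b"\xff\xd8" + b"\x00" * 200000
dest = os.path.join(tempfile.mkdtemp(), "universe.jpg")
written = download(FakeGridFS(payload), "universe.jpg", dest)
print(written == len(payload))
```

The try/finally and with blocks also ensure that both handles are closed if the copy fails partway, which the interactive session does by hand with fout.close() and gout.close().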