In the Storing large data in MongoDB using GridFS recipe, we saw what GridFS is and how it can be used to store large files in MongoDB. In the previous recipe, we saw how to use the GridFS API from a Java client. In this recipe, we will see how to store image data in MongoDB using GridFS from a Python program.
Refer to the Connecting to a single node from a Python client recipe from Chapter 1, Installing and Starting the MongoDB Server, for all the necessary setup for this recipe. If you are interested in more details on Python drivers, refer to the Python driver recipes in Chapter 3, Programming Language Drivers.
Download and save the glimpse_of_universe-wide.jpg image file from the downloadable code bundle, available on the book's website, to the local filesystem, as we did in the previous recipe.
1. Start the Python shell from the directory where glimpse_of_universe-wide.jpg is placed:
$ python
2. Import the pymongo and gridfs packages:
>>> import pymongo
>>> import gridfs
3. Create an instance of MongoClient and a database object for the test database as follows:
>>> client = pymongo.MongoClient('mongodb://localhost:27017')
>>> db = client.test
4. Drop the fs.files and fs.chunks collections so that we start with an empty GridFS store:
>>> db.fs.files.drop()
>>> db.fs.chunks.drop()
5. Create an instance of GridFS on the test database:
>>> fs = gridfs.GridFS(db)
6. Open the image file in binary mode to obtain a file object as follows:
>>> file = open('glimpse_of_universe-wide.jpg', 'rb')
7. Put the file into GridFS under the name universe.jpg:
>>> fs.put(file, filename='universe.jpg')
8. On executing the put command, we should see the ObjectId of the uploaded file. This is the same as the _id field of this file's document in the fs.files collection.
9. Verify the details of the upload by executing the following query on the fs.files collection; the result is a dict object describing the uploaded file:
>>> db.fs.files.find_one()
10. Get the GridOut instance representing the stored file, to read the data out of GridFS, as follows:
>>> gout = fs.get_last_version('universe.jpg')
11. Open a file on the local filesystem for writing in binary mode:
>>> fout = open('universe.jpg', 'wb')
12. Read all the bytes from GridFS, write them to the local file, and close both objects:
>>> fout.write(gout.read())
>>> fout.close()
>>> gout.close()
13. A file named universe.jpg will be created in the current directory, containing the same number of bytes as the source file. Verify it by opening it in an image viewer.
Let us look in detail at the steps we executed. In the Python shell, we import two packages, pymongo and gridfs, and instantiate the pymongo.MongoClient and gridfs.GridFS classes. The constructor of the gridfs.GridFS class takes as its argument an instance of pymongo.database.Database.
We open the file in binary mode using the open function and pass the file object to the put method of GridFS. We also pass an additional keyword argument, filename, which is the name under which the file is stored in GridFS. The first parameter, in fact, need not be a file object; it can be any object that defines a read method.
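To illustrate that put relies only on a read method, the sketch below wraps a binary stream in a small class of its own and counts the bytes GridFS pulls through it. PyMongo's GridFS reads its source in chunk-sized pieces, so a read method that accepts a size argument is the contract this sketch assumes. CountingReader and put_counted are hypothetical names, and fs is assumed to be the gridfs.GridFS instance from the steps above.

```python
import io


class CountingReader:
    """Minimal file-like wrapper: GridFS.put only needs read(size) from it.

    Each call is forwarded to the wrapped binary stream and a running total
    is kept, so the caller can compare it with the length GridFS records.
    """

    def __init__(self, raw):
        self.raw = raw
        self.bytes_read = 0

    def read(self, size=-1):
        chunk = self.raw.read(size)
        self.bytes_read += len(chunk)
        return chunk


def put_counted(fs, raw, filename):
    """Hypothetical helper: store `raw` via fs.put and report bytes consumed."""
    reader = CountingReader(raw)
    file_id = fs.put(reader, filename=filename)
    return file_id, reader.bytes_read
```

In the shell session this would be called as put_counted(fs, open('glimpse_of_universe-wide.jpg', 'rb'), 'universe.jpg'). Data already in memory can be passed the same way by wrapping it in io.BytesIO.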
Once the put operation succeeds, it returns the ObjectId of the uploaded file's document in the fs.files collection. A query on fs.files can confirm that the file was uploaded. Verify that the size of the data uploaded, recorded in the length field of that document, matches the size of the local file.
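The size check can be automated. This is a minimal sketch, assuming db is the database object from the steps above (with the default fs prefix) and file_id is the value returned by fs.put, which the shell session above does not capture in a variable; upload_matches_local is a hypothetical name. It compares the length field of the fs.files document with the size of the local file.

```python
import os


def upload_matches_local(db, file_id, local_path):
    """Hypothetical helper: did GridFS record as many bytes as the local file has?

    `db` is the pymongo Database holding the default fs.files collection and
    `file_id` is the _id returned by fs.put(...).
    """
    doc = db.fs.files.find_one({"_id": file_id})
    if doc is None:
        return False  # no fs.files document with that _id
    return doc["length"] == os.path.getsize(local_path)
```

For example, after file_id = fs.put(file, filename='universe.jpg'), calling upload_matches_local(db, file_id, 'glimpse_of_universe-wide.jpg') should return True.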
Our next objective is to get the file from GridFS onto the local filesystem. Intuitively, one would expect that if the method to put a file in GridFS is put, then the method to get a file would be get. True, the method is indeed get. However, get looks up a file only by its _id, that is, the ObjectId returned by the put method. So if you want to retrieve the file by ObjectId, get is the method for you. If, however, you want to retrieve it by filename, the method to use is get_last_version, which accepts the name of the file that we uploaded and returns an instance of gridfs.grid_file.GridOut. As the name suggests, if several files have been stored under the same filename, it returns the most recently uploaded one. The GridOut class has a read method which, when called without arguments, reads out all the bytes of the file uploaded to GridFS. We open a file called universe.jpg for writing in binary mode and write to it all the bytes read from the GridOut object.
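Both lookups can be wrapped in one download helper. This is a minimal sketch, assuming fs is the gridfs.GridFS instance from the steps above; download_from_gridfs is a hypothetical name. Rather than gout.read(), which loads the whole file into memory, it uses shutil.copyfileobj to copy in bounded pieces, and it closes the GridOut even if writing the local file fails.

```python
import shutil


def download_from_gridfs(fs, local_path, filename=None, file_id=None):
    """Hypothetical helper: copy one GridFS file to local_path.

    Exactly one lookup key must be given:
      - file_id:  uses fs.get(file_id), i.e. the _id returned by fs.put
      - filename: uses fs.get_last_version(filename), the latest upload
    Returns the number of bytes written.
    """
    if (filename is None) == (file_id is None):
        raise ValueError("pass exactly one of filename or file_id")
    if file_id is not None:
        gout = fs.get(file_id)
    else:
        gout = fs.get_last_version(filename)
    try:
        with open(local_path, "wb") as fout:
            # copyfileobj calls gout.read(n) in a loop, so the data is copied
            # piece by piece instead of through a single read() of everything
            shutil.copyfileobj(gout, fout)
            return fout.tell()
    finally:
        gout.close()
```

In the shell session, download_from_gridfs(fs, 'universe.jpg', filename='universe.jpg') reproduces the download steps above. If nothing matches, both fs.get and fs.get_last_version raise gridfs.errors.NoFile, so callers may want to catch that exception.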