Storing data to GridFS from Python client

In the recipe Storing large data in Mongo using GridFS, we saw what GridFS is and how it can be used to store large files in MongoDB. In the previous recipe, we saw how to use the GridFS API from a Java client. In this recipe, we will see how to store image data in MongoDB using GridFS from a Python program.

Getting ready

Refer to the recipe Connecting to the single node using a Java client from Chapter 1, Installing and Starting the Server, for all the necessary setup for this recipe. If you are interested in more detail on the Python driver, refer to the following recipes: Executing query and insert operations with PyMongo and Executing update and delete operations using PyMongo in Chapter 3, Programming Language Drivers. Download the image glimpse_of_universe-wide.jpg from the downloadable bundle available with the book on the Packt site and save it to the local filesystem, as we did in the previous recipe.

How to do it…

  1. Open a Python interpreter by typing the following in the operating system shell. Note that the current directory must be the directory in which the image file glimpse_of_universe-wide.jpg is placed:
    $ python
    
  2. Import the required packages as follows:
    >>> import pymongo
    >>> import gridfs
    
  3. Once the Python shell is open, create a MongoClient and a database object for the test database as follows:
    >>> client = pymongo.MongoClient('mongodb://localhost:27017')
    >>> db = client.test
    
  4. To clear the GridFS-related collections, execute the following:
    >>> db.fs.files.drop()
    >>> db.fs.chunks.drop()
    
  5. Create an instance of GridFS as follows:
    >>> fs = gridfs.GridFS(db)
    
  6. Now, we will read the file and upload its contents to GridFS. First, create the file object as follows:
    >>> file = open('glimpse_of_universe-wide.jpg', 'rb')
    
  7. Now put the file into GridFS and then close the local file handle as follows:
    >>> fs.put(file, filename='universe.jpg')
    >>> file.close()
    
  8. On successfully executing put, we should see the ObjectId of the uploaded file. This is the same as the _id field of the document in the fs.files collection for this file.
  9. Execute the following query from the Python shell. It should print the dict object with the details of the upload. Verify the contents:
    >>> db.fs.files.find_one()
    
  10. Now, we will get the uploaded content and write it to a file on the local filesystem. Let's get the GridOut instance representing the object to read the data out of GridFS as follows:
    >>> gout = fs.get_last_version('universe.jpg')
    
  11. With this instance available, let's write the data to a file on the local filesystem. First, open a handle to the local file to write to as follows:
    >>> fout = open('universe.jpg', 'wb')
    
  12. We will then write content to it as follows:
    >>> fout.write(gout.read())
    >>> fout.close()
    >>> gout.close()
    
  13. Now verify the file in the current directory on the local filesystem. A new file called universe.jpg should have been created with the same number of bytes as the source file. Verify it by opening it in an image viewer.
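
The upload and download steps above can also be collected into two small helper functions. The following is a minimal sketch rather than part of the original recipe: the function names upload_file and download_latest are hypothetical, and fs is assumed to be a gridfs.GridFS instance created as in step 5. Only the put, get_last_version, read, and close calls already used in the steps are relied upon.

```python
def upload_file(fs, path, filename):
    """Store the local file at `path` in GridFS under `filename`.

    `fs` is assumed to be a gridfs.GridFS instance (step 5).
    Returns whatever fs.put returns, that is, the _id of the new file.
    """
    # The with block closes the local file even if put raises.
    with open(path, 'rb') as source:
        return fs.put(source, filename=filename)


def download_latest(fs, filename, target_path):
    """Write the newest GridFS file named `filename` to `target_path`.

    Returns the number of bytes written.
    """
    gout = fs.get_last_version(filename)
    try:
        data = gout.read()  # reads the whole file into memory
    finally:
        gout.close()
    with open(target_path, 'wb') as target:
        target.write(data)
    return len(data)


# Possible usage against a local server, mirroring the recipe:
#   import pymongo, gridfs
#   db = pymongo.MongoClient('mongodb://localhost:27017').test
#   fs = gridfs.GridFS(db)
#   upload_file(fs, 'glimpse_of_universe-wide.jpg', 'universe.jpg')
#   download_latest(fs, 'universe.jpg', 'universe.jpg')
```

Opening the files in with blocks means the handles are released without the explicit close calls used in the interactive session.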

How it works…

Let's look at the steps we executed. In the Python shell, we import two packages, pymongo and gridfs, and create instances of pymongo.MongoClient and gridfs.GridFS. The constructor of the gridfs.GridFS class takes as its argument an instance of pymongo.database.Database.

We open the file in binary mode using the open function and pass the file object to the put method of GridFS. We also pass an additional argument called filename, which becomes the name of the file in GridFS. The first parameter need not be a file object; any object that defines a read method will do.
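
To illustrate that put only needs an object with a read method, the following sketch wraps bytes that are already in memory in an io.BytesIO buffer and hands that to put. The helper name put_in_memory is hypothetical, and fs is assumed to be the gridfs.GridFS instance from the recipe.

```python
import io


def put_in_memory(fs, payload, filename):
    """Upload the bytes in `payload` to GridFS under `filename`.

    `fs` is assumed to be a gridfs.GridFS instance. io.BytesIO is not
    a file on disk, but it defines read(), which is what put relies on.
    """
    buffer = io.BytesIO(payload)
    return fs.put(buffer, filename=filename)


# Possible usage with the fs object from the recipe:
#   put_in_memory(fs, b'some binary content', 'in_memory.bin')
```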

Once the put operation succeeds, it returns the ObjectId of the uploaded file's document in the fs.files collection. A query on fs.files confirms that the file was uploaded. Verify that the size of the data uploaded matches the size of the file.
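
The size check can be done programmatically. The sketch below compares the length field of the document returned by the fs.files query with the size of the file on disk; the helper name sizes_match is hypothetical.

```python
import os


def sizes_match(file_doc, local_path):
    """Return True if the GridFS file document and the local file agree on size.

    `file_doc` is assumed to be a dict such as the one returned by
    db.fs.files.find_one(); its 'length' field holds the size in bytes.
    """
    return file_doc['length'] == os.path.getsize(local_path)


# Possible usage in the interpreter session from the recipe:
#   sizes_match(db.fs.files.find_one(), 'glimpse_of_universe-wide.jpg')
```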

Our next objective is to get the file from GridFS onto the local filesystem. Intuitively, one would imagine that if the method to put a file in GridFS is put, then the method to get a file would be get. The method is indeed get; however, it fetches only by the ObjectId that was returned by the put method. So, if you are happy to fetch by ObjectId, get is the method for you. However, if you want to fetch by filename, the method to use is get_last_version. It accepts the filename that we used for the upload and returns an instance of gridfs.grid_file.GridOut. This class contains the method read, which reads all the bytes of the file stored in GridFS. We open a file called universe.jpg for writing in binary mode and write all the bytes read from the GridOut object.
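
The choice between get and get_last_version can be sketched as follows. The recipe reads the whole file with a single read call; for larger files, one alternative (not used in the recipe) is to pass a size to read and copy the data piece by piece. Both helper names are hypothetical, fs is assumed to be a gridfs.GridFS instance, and the buffer size is an arbitrary choice for this sketch.

```python
def open_gridfs_file(fs, file_id=None, filename=None):
    """Return a readable GridFS file, looked up by _id or by filename."""
    if file_id is not None:
        return fs.get(file_id)  # get looks the file up by its _id only
    if filename is not None:
        return fs.get_last_version(filename)  # newest file with this name
    raise ValueError('either file_id or filename is required')


def copy_in_chunks(reader, writer, chunk_size=255 * 1024):
    """Copy from `reader` to `writer` one buffer at a time.

    `reader` can be any object whose read(size) returns bytes and an
    empty value at end of data. Returns the number of bytes copied.
    """
    total = 0
    while True:
        chunk = reader.read(chunk_size)
        if not chunk:
            break
        writer.write(chunk)
        total += len(chunk)
    return total


# Possible usage with the fs object from the recipe:
#   gout = open_gridfs_file(fs, filename='universe.jpg')
#   with open('universe.jpg', 'wb') as fout:
#       copy_in_chunks(gout, fout)
#   gout.close()
```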

See also

You can refer to the following recipes:

  • Storing binary data in Mongo
  • Storing data to GridFS from Java client