Creating customizable Excel reports using XlsxWriter

While creating an Excel file from a Pandas DataFrame is super easy, you cannot easily customize it. In this recipe, you will learn how to use the XlsxWriter Python library to create a highly customizable report in Excel.

How to do it…

  1. First, import the Python libraries that you need:
    from pymongo import MongoClient
    import pandas as pd
    from time import strftime
  2. Next, create a connection to MongoDB, and specify the accidents collection:
    client = MongoClient('localhost', 27017)
    db = client.pythonbicookbook
    collection = db.accidents
  3. Once you have created the connection, run a query to retrieve the first 1000 records in which an accident happened on a Friday:
    data = collection.find({"Day_of_Week": 6}).limit(1000)
  4. Next, define the name of the report and where it should be saved to, and set up the Excel workbook:
    report_file_name = "accident_data_report_{}.xlsx".format(strftime("%m_%d_%y"))
    report_file = PATH_TO_FILE + "/" + report_file_name
    accident_report = xlsxwriter.Workbook(report_file, {'constant_memory': True,
                                                        'default_date_format': 'mm/dd/yy'})
  5. After that, add some formats to the Excel file:
    xl_header_format = accident_report.add_format()
    xl_header_format.set_bold()
    xl_missing_format = accident_report.add_format()
    xl_missing_format.set_bg_color('red')
  6. Next, create the iterators that we'll need to loop through the data:
    e_row = 0
    e_col = 0
  7. After that, create a worksheet to be added to the notebook:
    worksheet = accident_report.add_worksheet('Accidents')
  8. Next, get the keys from the collection to use as the header of the file:
    headers = []
    doc = collection.find_one()
    for key in doc:
        headers.append(key)
  9. Now, delete the _id column:
    headers.remove('_id')
  10. Add the sheet header:
    i = 0
    for header in headers:
        worksheet.write(e_row, e_col + i, headers[i], xl_header_format)
        i += 1
  11. Next, add one row so that when we start adding the data we start at the next row in the spreadsheet:
    e_row += 1
  12. Now add the data to the workbook:
    for doc in data:
        e_col = 0
        for value in headers:
            worksheet.write(e_row, e_col, doc[value])
            e_col += 1
        e_row += 1
  13. Finally, close the file:
    accident_report.close()

How it works…

The first thing that we need to do is import all the Python libraries that we need. In this recipe, we're retrieving the accident data from MongoDB that we imported in Chapter 2, Making Your Data All It Can Be:

from pymongo import MongoClient
from time import strftime
import xlsxwriter

In order to retrieve data from MongoDB, we need to create a connection, specify the database holding the data that we want, and specify the collection that we want to query against:

client = MongoClient('localhost', 27017)
db = client.pythonbicookbook
collection = db.accidents

With the connection created, we run a query to retrieve the first 1000 records where an accident happened on a Friday. You can easily increase this limit if you like. I've limited it to 1000 records for this recipe in order to keep the code short:

data = collection.find({"Day_of_Week": 6}).limit(1000)

The next thing we need to do is set up our Excel workbook. To do that, we use the Workbook function of the XlsxWriter library, and pass it the full path to the file. In addition, we set the constant_memory parameter to True so that our computer doesn't run out of memory. We also set the default date format:

report_file_name = "accident_data_report_{}.xlsx".format(strftime("%m_%d_%y"))
report_file = PATH_TO_FILE + "/" + report_file_name
accident_report = xlsxwriter.Workbook(report_file,
                  {'constant_memory': True,'default_date_format': 'mm/dd/yy'})

Customization of the Excel file is done using formatting. You can create as many formats as you want for everything from bold to italic to font colors and more. To do that, we create a new variable for our format, and use the add_format() method of XlsxWriter to add it to the workbook. There are a number of shortcuts for some formats, one of which we use here—set_bold(). The second one that we create, if used, sets the background color of a cell to red:

xl_header_format = accident_report.add_format()
xl_header_format.set_bold()
xl_missing_format = accident_report.add_format()
xl_missing_format.set_bg_color('red')

In order to specify the cell we want to put the data into, we create a variable that will keep track of the row and column positions for us:

e_row = 0
e_col = 0

Once our workbook is set up, we need to add a worksheet. We do that by using the add_worksheet function and specifying a name for it. If you don't specify a name, it'll be called Sheet 1 as Excel tends to do:

worksheet = accident_report.add_worksheet('Accidents')

To create the header, we retrieve a single record from the collection in MongoDB and loop through it to get the keys, putting them into the headers array. Doing it this way makes the code dynamic and highly reusable:

headers = []
doc = collection.find_one()
for key in doc:
    headers.append(key)

As in the previous recipe, we need to delete the _id column because we cannot write it to the Excel file. We do that by simply removing the value from our headers array:

headers.remove('_id')

Next, we add the headers to the sheet by looping through the array and writing the values into the cells. We do this using the write() function provided by the worksheet, and specify the row and column to write the value into. Since all the data is going into one row, we don't need to increment the row count, but we need to do that for the column. In addition, we assign the xl_header_format variable that we created earlier:

i = 0
for header in headers:
    worksheet.write(e_row, e_col + i, headers[i], xl_header_format)
i += 1

Next we add one row to our row incrementer so that when we start adding the data, we start at the second row in the spreadsheet. If we didn't do this, our data would overwrite our header:

e_row += 1

To add the data to the workbook, we loop through it. First, we set the column index to zero so that we start at the beginning of the row. Next, we loop through our headers array, use the value to extract the value in the document, and write it into the cell. We do it in this way because there is no guarantee that the keys of the data would be ordered in the same way they were ordered when we retrieved the single row to create our headers. In fact, they will most likely NOT be in the same order, so we need to account for that. As we loop through our data, we increment both our row and column counters; so, we write our data column by column and row by row:

for doc in data:
    e_col = 0
    for value in headers:
        worksheet.write(e_row, e_col, doc[value])
        e_col += 1
    e_row += 1

Lastly, we close the file using the close() function which both writes and saves it. Until you call this method, you won't see the file:

accident_report.close()

That's it!

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset