Syncing your Jupyter Notebook Charts to Microsoft Word Reports

Here’s the situation: You’re doing a big data analysis in your Jupyter Notebook. You’ve got tons of charts and you want to report on them. Ideally, you’d create your final report in the Jupyter notebook itself, with all its fancy markdown features and the ability to keep your code and reporting all in the same place. But here’s the rub: most people still want Word document reports, and don’t care about your code, reproducibility, etc. When reporting it’s important to give people the information in a format most useful to them.

So you’ve got tons of charts and graphs that you want to put in the Word report – how do you keep the two in sync? What if your charts change slightly (e.g. changing the styling of every chart in your report)? You’re stuck copying and pasting charts from your notebook, which is a manual, time-consuming, and error prone process.

In this post, I’ll show you my solution to this problem. It involves the following steps:

  • Saving the chart images from Jupyter Notebook to your desktop in code.
  • Preparing your Word Document report, referencing the image names that you saved in your desktop in the appropriate location in your report.
  • Loading the images into a new version of your Word Document.

Saving Chart Images from Jupyter Notebook to your Desktop

The first step is to gather the images you want to load into your report by saving them from Jupyter Notebook to image files on your hard drive. For this post, I’ll be using the data and analysis produced in my “Lets Scrape A Blog” post from a few months ago, where I scraped my favourite blog Marginal Revolution and did some simple analyses.

That analysis used matplotlib to produce some simple charts of results. To save these images to your desktop, matplotlib provides a useful function called savefig. For example, one chart produced in that analysis looks at the number of blog posts by author:

The following code produces this chart and saves it to a file named ‘num_posts_by_author_plot.png’ in the folder ‘report_images’.

A few pointers about this step:

  • Make sure you give your images useful, descriptive names. This helps ensure you place the proper reference in the Word document. I personally like following the convention of giving the plot image the same name as the plot object in your code.
  • Your images must have unique names or else the first image will be overwritten by the second image
  • To stay organized, store the images in a separate folder designed specifically for the purpose of holding your report images.

Repeating similar code for my other charts, I’m left with 5 chart images in my report_images folder:

Preparing the Word Document Report with Image References

There is a popular Microsoft Word document package out there for Python called python-docx, which is great library for manipulating Word Documents with your Python code. However, its API does not easily allow you to insert an image in the middle of the document.

What we really need is something like Jinja2 for Word Docs: a packages where you can specify special placeholder values in your document and then automatically loads images in. Well, luckily exactly such a package exists: python-docx-template. It is built on top of python-docx and Jinja2 and lets you use Jinja2-like syntax in your Word Documents (see my other post on using Jinja2 templates to create PDF Reports).

To get images into Word using python-docx-template, it’s simple: you just have to use the usual Jinja2 syntax {{ image_variable }} within your Word document. In my case, I had six images, and the Word template I put together for testing looked something like this:

For the system to work, you have to use variable names within {{ }} that align with the image names (before ‘.png’) and the plot variable names in your Jupyter notebook.

Loading the Images Into your Document

The final and most important step is to get all the images in your template. To do this, the code roughly follows the following steps: load your Word document template, load the images from your image directory as a dict of InlineImage objects, render the images in the Word document, and save the loaded-image version to a new filename.

Here is the code that does this:

To run the code, you need to specify the Word template document, the new Word document name which will contain the images, and the directory where your images are stored. 

python load_images.py <template Word Doc Filename> <image-loaded Word Doc filename> <image directory name>

In my case, I used the following command, which takes in the template template.docx, produced the image-loaded Word Document result.docx, and grabs the images from the folder report_images:

python load_images.py template.docx result.docx report_images

And voila, the image-loaded Word Document looks like this:

You can find the code I used to create this guide on Github here.

Creating PDF Reports with Python, Pdfkit, and Jinja2 Templates

Once in a while as a data scientist, you may need to create PDF reports of your analyses. This seems somewhat “old school” nowadays, but here are a couple situations why you might want to consider it:

  • You need to make reports that are easily printable. People often want “hard copies” of particular reports they are running and don’t want to reproduce everything they did in an interactive dashboard.
  • You need to match existing reporting formats: If you’re replacing a legacy reporting system, it’s often a good idea to try to match existing reporting methods as your first step. This means that if the legacy system used PDF reporting, then you should strongly consider creating this functionality in the replacement system. This is often important for getting buy-in from people comfortable with the old system.

I recently needed to do PDF reporting in a work assignment. The particular solution I came up with uses two main tools:

We’ll install our required packages with the following commands:

pip install pdfkit
pip install Jinja2

Note that you also need to install a tool called wkhtmltopdf for pdfkit to work.

Primer on Jinja2 Templates

Jinja2 is a great tool to become familiar with, especially if you do web development in Python. In short, it lets you automatically generate text documents by programmatically filling in placeholder values that you assign to text file templates. It’s a very flexible tool, used widely in Python web applications to generate HTML for users. You can think of it like super high-powered string substitution.

We’ll be using Jinja2 to generate HTML files of our reports that we will convert into PDFs with other tools. Keep in mind that Jinja2 can come in handy for other reporting applications, like sending automated emails or creating reports in other text file formats.

There are two main components of working with Jinja2:

  • Creating the text file Jinja2 templates that contain placeholder values. In these templates, you can use a variety of Jinja2 syntax features that allow you to adjust the look of the file and how it loads the placeholder data.
  • Writing the python code that assigns the placeholder values to your Jinja2 templates and renders a new text string according to these values.

Let’s create a simple template just as an illustration. This template will simply be a text file that prints out the value of a name. All you have to do it create a text file (let’s call it name.txt). Then in this file, simply add one line:

Your name is: {{ name }}

Here, ‘name’ is the name of the python variable that we’ll pass into the template, which holds the string placeholder that we want to include in the template.

Now that we have our template created, we need to write the python code that fills in the placeholder values in the template with what you need. You do this with the render function. Say, we want to create a version of the template where the name is “Mark”. Then write the following code:

Now, outputText holds a string of the template where {{ name }} is now equal to “Mark”. You can confirm this by writing the following on the command line:

The arguments to template.render() are the placeholder variables contained in the template along with what you want to assign them to:

template.render(placeholder_variable_in_template1=value_you_want_it_assigned1, placeholder_variable_in_template2=value_you_want_it_assigned2, ..., placeholder_variable_in_templateN=value_you_want_it_assignedN)

There is much much more you can to with Jinja2 templates. For example, we have only shown how to render a simple variable here but Jinja2 allows more complex expressions, such as for loops, if-else statements, and template inheritance. Another useful fact about Jinja2 templates is you can pass in arbitrary python objects like lists, dictionaries, or pandas data frames and you are able to use the objects directly in the template. Check out Jinja2 Template Designer Documentation for a full list of features. I also highly recommend the book Flask Web Development: Developing Web Applications with Python which includes an excellent guide on Jinja2 templates (which are the built-in template engine for the Flask web development framework).

Creating PDF Reports

Let’s say you want to print PDFs of tables that show the growth of a bank account. Each table shows the growth rate year by year of $100, $500, $20,000, and $50,000 dollars. Each separate pdf report uses a different interest rate to calculate the growth rate. We’ll need 10 different reports, each of which prints tables with 1%, 2%, 3%, …, 10% interest rates, respectively.

Lets first define the Pandas Dataframes that we need.

data_frames contains 10 dictionaries, each of which contain the data frame and the interest rate used to produce that data frame.

Next, we create the template file. We will generate one report for each of the 10 data frames above, and generate them by passing each data frame to the template along with the interest rate used. 

After creating this template, we then write the following code to produce 10 HTML files for our reports.

Our HTML reports now look something like this:

As a final step, we need to convert these HTML files to PDFs. To do this, we use pdfkit. All you have to do is iterate through your HTML files and then use a single line of code from pdfkit to each file to convert it into a pdf.

All of this code combined will pop out the following HTML files with PDF versions:

You can then click on 1.pdf to see that we’re getting the results we’re looking for.

We’ve given a very stripped down example of how you can create reports using python in an automated way. With more work, you can develop much more sophisticated reports limited only by what’s possible with HTML.