Digging into Data Science Tools: Using Plotly’s Dash to Build Interactive Dashboards

Dash is a tool designed for building interactive dashboards and web applications using only Python (no CSS, HTML, or JavaScript required). I came across Dash while surveying options for building dashboards and reporting tools in my current position as Data Scientist with the City of Winnipeg Transportation Division.

Why use Dash?

Some of the lowest-hanging fruit working as a data science involves simply getting the data in front of the right people in a way that is easily digestible and actionable. No fancy ML, just presenting data to the right people at the right time in the right format. Dashboards are often a great way of doing that.

However, they can be a pain to build. Off-the-shelf dashboard tools like Tableau and PowerBI can be expensive and limiting (you are constrained to the features that they choose to include in their software). Developing dashboards from scratch as a web application is also a pain, since that often requires writing CSS, HTML, and JavaScript: as a data scientist, your focus is typically programming in python or R and your job typically does not require being an expert in JavaScript.

Dash strikes a perfect middle ground for a data scientist, providing way more flexibility and customizability than off-the-shelf tools at a much lower cost (i.e. free), while still having an easy-to-use API that a python-oriented data scientist should not have difficulty grasping.

A Demo Dash app of Blog Post Data from Marginal Revolution

To become familiar with Dash and see what it can do, I started out with a simple dashboard using some of the the scraped blog post data from Marginal Revolution (see this post on the MR scraper project). You can find the dashboard here, which provides an interactive chart of the number of posts on Marginal Revolution over time for each author. You can choose any of the authors in the drop down box, and the relevant data shows up in the chart:

I have to say, I can’t believe how easy it was to build this dashboard. In total, it’s only 33 lines of python code (you can find the code here. It was also easy to deploy to heroku, following this guide.

How Dash Works

I highly recommend going through the Dash tutorial here, which walks you through app creation and the main components of Dash. Here I’ll give a high level overview of some key points.

Dash apps have two components: a layout component which describes how the application looks visually, and an interactivity component which specifies how your dashboard interacts with the user and responds to input.

Layout Component

The layout defines what the dashboard looks like. When defining the layout of your application, you create a hierarchical “tree” of components, and each component will come from one of the following two main classes:

  • dash_html_components: This includes a component to represent every HTML tag. E.g. dash_html_components.H1(children = ‘Hello Dash’) adds <h1>Hello Dash</h1> to your dashboard.
  • dash_core_components: This describes higher-level components that combine HTML, JavaScript, and CSS created through the React.js library. These components have an interactive element to them (e.g. graphs with tooltips). One of the key classes within dash_core_components is Graph. Graph components represent data visualizations created using plotly.js (Plotly is the company that created Dash). Graph provides 35 different chart types to meet your needs. Another nice feature included within dash_core_components is the ability to include Markdown, which is often more convenient for presenting text based copy in your dashboards. There are many other options available to you in the dash_core_components library to use in your dashboard (e.g. sliders, checkboxes, date ranges, interactive tables, tabs, and more – see the gallery here).

To define the layout for a page and the data contained within it, you nest these various components as appropriate. You assign this nested list to app.layout, where app = dash.Dash(). To see an example of how these components fit together and how they are rendered in the browser, see the getting started page on the Dash tutorial.

Interactivity Component

Another key feature of Dash is its ability to respond to user input and change the visualization accordingly (dashboards aren’t really of much use without this feature). Dash makes this functionality quite easy, although there is a bit too much to go into great detail in this blog post. Read part 2 of the Dash user guide.

In short, Dash lets you add interactivity by adding an @app.callback decorator to a Python function you write which takes the input from one of the layout components and returns outputs to send to another layout component that you want to change according to the new input.

You can have multiple inputs and multiple outputs. For example, if you want a graph that gives the user the option of filtering by multiple variables in your dataset, you can use multiple input filters. The function that you apply the @app.callback decorator to automatically fires whenever there are any changes to any of the inputs.

Deploying to Production

Dash uses the python Flask web application framework under the hood. Flask is an awesome, lightweight web application framework for Python that is definitely worth knowing (and relatively easy to learn given how stripped down it is). You can access the underlying Flask app instance using app.server where app is the instance of your dash app (i.e. app = Dash()).

Since the Dash app is essentially a Flask app under the hood, deploying a Dash app is the same as deploying a Flask app, and there are many guides online for deploying flask apps in a variety of situations. The Dash website provides some more details about deployment as well as steps for deploying to Heroku.

Comment below if you have any questions or if you want to share your experience using Dash!

Further Resources

Dash User Guide and Documentation
Plotly.py Documentation and Gallery
Dash Community Forum

Overview Python (and non-Python) Mapping Tools for Data Scientists

Very often, data needs to be understood on a geographic basis. As a result, data scientists should be familiar with the main mapping tools at their disposal.

I have done quite a bit of mapping before, but given its central nature in my current position as a Transportation Assets Data Scientist with the City of Winnipeg, it quickly became clear that I need to do a careful survey of the geographic mapping landscape.

There is a dizzying array of options out there. Given my main programming language is python, I’ll start with tools in that ecosystem, but then I’ll move on to other software tools available for geographic analysis.

Mapping Tools in Python

GeoPandas

GeoPandas is a fantastic library that that makes munging geographic data in Python easy.

At its core, it is essentially pandas (a must-know library for any data scientist working with python). In fact, it is actually built on top of pandas, with data structures like “GeoSeries” and “GeoDataFrame” that extend the equivalent pandas data structures with useful geographic data crunching features. So, you get all the goodness of pandas, with geographic capabilities baked in.

GeoPandas works its magic by combining the capabilities of several existing geographic data analysis libraries that are each worth being familiar with. This includes shapely, fiona, and built in geographic mapping capabilities through descartes and matplotlib.

You can read spatial data into GeoPandas just like pandas, and GeoPandas works with the geographic data formats you would expect, such as GeoJSON and ESRI Shapefiles. Once you have the data loaded, you can easily change projections, conduct geometric manipulations, aggregate data geographically, merge data with spatial joins,  and conduct geocoding (which relies on the geocoding package geopy).

Basemap

Basemap is a geographic plotting library built on top of matplotlib, which is the granddaddy of python plotting libraries. Similarly to matplotlib, basemap is quite powerful and flexible, but at the cost of being somewhat time consuming and cumbersome to get the map you want.

Another notable issue is that Basemap recently came under new management in 2016 and is going to be replaced by Cartopy (described below). Although Basemap will be maintained until 2020, the Matplotlib website indicates that all development efforts are now focused on Cartopy and users should switch to Cartopy. So, if you’re planning on using Basemap, consider using…..

Cartopy

Cartopy provides geometric transformation capabilities as well as mapping capabilities. Similarly to Basemap, Cartopy exposes an interface to matplotlib to create maps on your data. You get a lot of mapping power and flexibility coming from matplotlib, but the downside is similar: creating a nice looking map tends to be relatively more involved compared to other options, with more code and tinkering required to get what you want.

geoplotlib

Geoplotlib is another geographic mapping option for Python that appears to be a highly flexible and powerful tool, allowing static map creation, as well as animated visualizations, and interactive maps. I have never used this library, and it appears to be relatively new, but it might be one to keep an eye out for in the future.

gmplot

gmplot allows you to easily plot polygons, lines, and points on google maps, using a “matplotlib-like interface”. This allows you to quickly and easily plot your data and piggy-back on the interactivity inherent to Google Maps. Plots available include polygons with fills, drop pins, scatter points, grid lines, and heatmaps. It appears to be a great option for quick and simple interactive maps. 

Mapnik

Mapnik is a toolkit written in C++ (with Python bindings) to produce serious mapping applications. It is aimed primarily at developing these mapping applications on the web. It appears to be a heavy duty tool that powers a lot of maps you see on the web today, including OpenStreetMap and MapBox.

Folium

Folium lets you tap into the popular leaflet.js framework for creating interactive maps, without having to write a single line of JavaScript. It’s a great library that I have used quite often in recent months (I used folium to generate all the visualizations for my Winnipeg tree data blog post).

Folium allows you to map points, lines, and polygons, produce choropleth maps and heat maps, create map layers (that users can enable or disable themselves), and produce pop-up tooltips for your geographic data (bonus: these tooltips support html so you can really customize them to make them look nice). There is also a good amount of customization possible with the markers and lines used on your map.

Overall, Folium strikes a great balance between features, customizability, and ease of programming.

Plotly

Plotly is a company offering a large suite of online data analytics and visualization tools. The focus of Plotly is providing frameworks that make it easier to present visualizations on the web. You write your code in Python (or R), talking to a plotly library and the visualizations are rendered using the extremely powerful D3.js library. For a taste of what is possible, check out their website, which showcases a bunch of mapping possibilities.

On top of the charts and mapping tools, they have a bunch of additional related products of interest on their website that are worth checking out. One that I’m particularly interested in is Dash, which allows you to create responsive data-driven web applications (mainly designed for dashboards) using only Python (no JavaScript or HTML required). This is something I’m definitely going to check out and will probably produce a “diving into data science” post in the near future.

Bokeh

Bokeh is a library specializing in interactive visualizations presented in the browser. This includes geographic data and maps. Similarly to Dash, there is also the possibility of using Bokeh to create interactive web applications that update data in real time and respond to user input (it does this with a “Bokeh Server”).

Other Tools for Mapping

There are obviously a huge amount of mapping tools outside of the python ecosystem. Here is a brief summary of a few that you might want to check out. Keep in mind that there are tons and tons of tools out there that are missing from this list. These are just some of the tools that I’m somewhat familiar with.

Kepler.gl

Kepler is a web-based application that allows you to explore geodata. It’s a brand spanking new tool released in late May 2018 by Uber. You can use the software live on its website – Kepler is a client side application with no server backend so all the data resides on your local machine / browser even.  It’s not just for use in on the Kepler website however; you can install the application and run it on localhost or a server, and you can also embed it into your existing web applications.

The program has some great features, with most of the basic features you would expect in an interactive mapping application, plus some really great bonus features such as split maps and playback. The examples displayed on the website are quite beautiful and the software is easy to use (accessible for use non-programmers).

This user guide provides more information on Kepler, what it can do, and how to do it. I’m very much looking forward to checking it out for some upcoming projects.

Mapbox

Mapbox provides a suite of tools related to mapping, aimed at developers to help you create applications that use maps and spatial analysis. It provides a range of services, from creating a nice map for your website to helping you build a geoprocessing application.

This overview provides a good idea of what’s possible. Mapbox provides a bunch of basemap layers, allow you to customize your maps, add your own data, and build web and mobile applications. It also provides options for extending functionality of web apps with tools like geocoding, directions, spatial analysis, and other features. Although Mapbox is not a free service, they do seem to have generous free API call limits (see their pricing here).

Heavy Duty GIS Applications

Not included here but also really important for people doing mapping are the full fledged GIS applications such as ArcGIS and QGIS. These are extremely powerful tools that are worth knowing as a geodata analyst. Note that ArcGIS is quite expensive; however, it is an industry standard so worth knowing. QGIS is also fairly commonly used and has the advantage of being free and open source.

Any glaring ommissions in this post? Let me know in the comments below or send me an email.

Further resources

Python data visualization libraries
Essential Python Geospatial Libraries
So You’d Like to Make a Map Using Python
Visualizing Geographic Data With Python
Geospatial Data with Open Source Tools in Python (YouTube) and accompanying GitHub Repo of Notebooks

Basemap

Basemap Tutorial
Map Making in Python with Basemap

Mapnik

Mapnik Wiki
Mapnik – Maybe the Best Python Mapping Platform Yet
Take Control of Your Maps

Folium

Python tutorial on making a multilayer Leaflet web map with Folium (YouTube)
Interactive Maps with Python (Three Parts)
Creating Interactive Crime Maps with Folium

Bokeh

Python: mapping data with python library Bokeh
Interactive Maps with Bokeh