Big Transportation Data for Big Cities Conference: My Takeaways

For a long time, I’ve been interested in transportation and urban economics. When I was doing my Master’s, I planned to specialize in these areas if I continued on to a PhD. So, when I saw a job position open for Data Scientist at the City of Winnipeg Transportation Assets Division, I didn’t have to spend much more than two seconds considering whether I would apply.

Well, a few months have passed and I’m happy to announce that I was successful: I’m starting the position this week. To say I’m excited is a huge understatement. The Division has been doing great things with the recent development of the Transportation Management Centre (TMC) and I’m looking forward to being a part of these cutting-edge efforts to improve the City’s transportation system.

To get up to speed, I’ve been looking through various sources to get an idea of what municipalities have been up to in this space. I was pointed to the Big Transportation Data for Big Cities Conference, which took place in 2016 in Toronto and involved transportation leaders from 18 big cities across North America. The presentations are all available online and are a great source for understanding the kind of transportation data cities are collecting, how they’re using it, possibilities for future use, and challenges that remain.

How cities are using transportation data

Municipalities are collecting unprecedented amounts of data and working to apply it in a variety of ways. Steve Buckley from the City of Toronto Transportation Services provides a useful categorization of the main areas of use for city transportation data: describing, evaluating, operating, predicting, and planning.

Describing (Understanding)

A fundamental application of the transportation data flowing into municipalities is simply to provide situational awareness about what is actually happening on the ground. This understanding is a prerequisite to all other forms of data use.

In the past, this was hard and expensive, but with widespread GPS, mobile applications, wireless communication technology, and inexpensive sensors, this kind of descriptive data is becoming cheaper and easier to collect, and more detailed.

There appears to still be a lot of “low hanging fruit” for improving safety and congestion by simply having more detailed data and observing what is actually happening on the ground. For example, one particularly interesting presentation from Nat Gale from the Los Angeles Department of Transportation points out that only 6% of their streets account for 65% of deaths and serious injuries for people walking and biking (obviously prime targets for safety improvements). His presentation goes on to describe how they installed a simple and inexpensive “scramble” pedestrian crossing at one of the most dangerous intersections in the city (Hollywood / Highland) and this appears to have increased the safety of the intersection dramatically.
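To make the "small share of streets, large share of injuries" idea concrete, here is a minimal sketch of how you might compute that kind of concentration figure yourself. The incident counts are made up for illustration and the function name is my own; this is not the method or the data from the Los Angeles presentation.

```python
def smallest_share_covering(counts, target_share):
    """Fraction of streets needed to cover `target_share` of all incidents,
    taking streets from the most dangerous to the least dangerous."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no incidents recorded")
    running = 0
    for n_streets, count in enumerate(sorted(counts, reverse=True), start=1):
        running += count
        if running >= target_share * total:
            return n_streets / len(counts)
    return 1.0


# Hypothetical incident counts per street segment (made-up numbers).
incidents_per_street = [40, 25, 0, 1, 0, 2, 0, 0, 1, 0,
                        0, 3, 0, 0, 0, 1, 0, 0, 2, 0]

share = smallest_share_covering(incidents_per_street, 0.65)
print(f"{share:.0%} of streets account for at least 65% of incidents")
```

With real collision data, the streets at the top of that sorted list would be the obvious candidates for a closer look.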

Evaluating (Measuring)

While descriptive data is crucial, it is not sufficient. You also need to understand what is most important in the data (i.e. key performance indicators) and have reliable ways of figuring out whether an intervention (e.g. light timing change) actually produced better results.

Along these lines, one particularly interesting presentation was from Dan Howard (San Francisco Municipal Transportation Agency) on their use of transit arrival and departure data to determine transit travel times (no GPS data required). Using this data, they can compare travel times before and after interventions, and understand the source of delays by simply examining the statistical distribution of travel times (e.g. lognormal distribution means good schedule adherence, normal distribution implies random events affect travel times, and multiple peaks indicate intersection / signal issues).
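As a rough illustration of the before/after comparison, here is a small sketch using only Python's standard library. The travel times are invented, and comparing the median and 90th percentile is my own simplification; it does not attempt the distribution-shape diagnosis described in the presentation.

```python
from statistics import median, quantiles


def summarize(times):
    """Median and 90th-percentile travel time (in minutes)."""
    p90 = quantiles(times, n=10)[-1]  # last of the nine decile cut points
    return {"median": median(times), "p90": p90}


def compare(before, after):
    """Change in each summary statistic (negative means trips got faster)."""
    b, a = summarize(before), summarize(after)
    return {key: a[key] - b[key] for key in b}


# Hypothetical travel times on one route segment, in minutes.
before = [12, 13, 13, 14, 15, 15, 16, 18, 21, 25]
after = [11, 12, 12, 13, 13, 14, 14, 15, 16, 18]

change = compare(before, after)
print(change)
```

Looking at the 90th percentile alongside the median is useful here because an intervention can leave the typical trip unchanged while still trimming the worst delays.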

Operating

A key theme throughout many of the presentations is the potential benefit of getting traffic data in real time. For example, several municipalities have live camera observations, weather data, and mobile application data (among other sources). These sources can provide real-time insight into operational improvements, such as real-time congestion and light timing adjustment, traffic officer deployment planning, construction management, and detecting equipment / mechanical failures.

Predicting

The improved detail of data, the real-time nature of the data, and evaluation techniques come together to enable a variety of valuable predictive analytics, allowing municipalities to respond proactively (e.g. determining the locations at highest risk of congestion or accidents and preventing accidents before they happen).

Planning (Prioritizing investments)

With improved data and improved insights from the data, municipalities can do better planning of investments to yield the highest value in terms of some target (e.g. commute times, accidents).

Municipalities are starting to capitalize on the benefits of open data

One common thread throughout many of the presentations is the benefit of opening up city data to the public, third parties, and other government departments. Although this is not without its challenges, there are many potential benefits.

Personally, as a data-oriented person, I’m particularly gung-ho about opening data up to the public, as long as the data does not infringe on anyone’s privacy and the cost of making the data public is not too high. I feel like this should be almost a moral imperative of public institutions – if you’re collecting public data, then the public should be able to access that data (again, after considering privacy concerns and resource constraints).

But there are also more self-interested reasons, beyond moral principle, for cities to open up their data, and based on these presentations, municipalities are starting to understand these benefits.

One important advantage is that by making the data public, you create opportunities for others to do analysis or write software applications that your organization simply does not have the resources to do. For example, it may not be a core competency of a transportation department to build, deploy, and maintain mobile applications. However, many people want something like this to exist, and making transit schedules accessible through a public API enables others to do this work. In these cases, the municipality plays the role of enabler.

Another thing to consider is that people can be quite ingenious and figure out things to do with the data that you never dreamed of. By making the data public, you can crowdsource the ingenuity and resourcefulness of citizens for the benefit of the public. Municipalities can do this not only by opening the data, but also by hosting public events such as urban data challenges or open data hackathons. Sara Diamond from OCAD University went through several examples of clever visualizations and related projects resulting from open transit data. 

Another advantage of opening data is that it promotes collaboration with other municipalities and other departments within a single municipality. Opening the data builds competencies that can come in handy even if the data is not made public: for example, it may help a municipality share critical transportation data with other departments (e.g. emergency response teams).

This collaborative approach seems central for many municipalities at the conference. For example, Abraham Emmanuel from the City of Chicago talks about the City’s Transportation Management Center, which is working to “develop an integrated and modular system that can be accessed from anywhere on the City network” and “create interfaces with external systems to collect and share data” (where “external systems” can include the Chicago Transit Authority, Utilities, Third Parties, and others).

Municipalities are opening up to open source

Increasingly, municipalities are beginning to understand the value of open source software and incorporating it into their operations. Bibiana McHugh from TriMet Portland provides a useful comparison of proprietary software versus open source software, with open source providing more control, fostering innovation / competition, resulting in a broader user and developer base, and having low entry costs.

Catherine Lawson from the University at Albany Visualization and Informatics Lab (AVAIL) similarly presents benefits of open source, noting advantages such as defensible outputs (open platforms allow for 3rd party verification of output) and trustworthiness (open platforms can lead to a robust shared confidence in outcomes). In contrast, the advantages of proprietary models include alignment with procurement processes and the fact that it is the traditional, (currently) best-understood model.

Perhaps the best illustration of open source in action is given in Holly Krambeck’s (World Bank) presentation showing how open-source solutions can “leapfrog” traditional intelligent transportation systems in resource-constrained cities. She talks about the OpenTraffic program, where “data providers” (e.g. taxi-hailing companies) collect GPS location data from mobile devices and host an open-source application called “Traffic Engine” that translates the raw GPS data into anonymized traffic statistics. These are sent to a server, pooled with other data providers’ statistics, and served through an API for users to access. OpenTraffic is built using fully open-source software and you can find a detailed report of how the project works here.
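To give a feel for what "translating raw GPS data into anonymized traffic statistics" could look like, here is a toy sketch. To be clear, this is not Traffic Engine's actual code or algorithm: the field names, the `min_count` threshold, and the assumption that each GPS point has already been matched to a road segment are all mine, and the observations are made up.

```python
from collections import defaultdict
from statistics import mean


def anonymized_segment_stats(observations, min_count=3):
    """Pool per-vehicle speed observations into per-segment, per-hour averages.

    Device identifiers are never copied into the output, and any
    (segment, hour) cell with fewer than `min_count` observations is
    suppressed so that individual trips are harder to pick out.
    """
    cells = defaultdict(list)
    for obs in observations:
        cells[(obs["segment_id"], obs["hour"])].append(obs["speed_kmh"])
    return {
        key: {"count": len(speeds), "mean_speed_kmh": mean(speeds)}
        for key, speeds in cells.items()
        if len(speeds) >= min_count
    }


# Hypothetical observations, already matched to road segments.
observations = [
    {"device_id": "a1", "segment_id": "seg-1", "hour": 8, "speed_kmh": 28},
    {"device_id": "b2", "segment_id": "seg-1", "hour": 8, "speed_kmh": 30},
    {"device_id": "c3", "segment_id": "seg-1", "hour": 8, "speed_kmh": 32},
    {"device_id": "a1", "segment_id": "seg-2", "hour": 8, "speed_kmh": 55},
]

stats = anonymized_segment_stats(observations)
print(stats)
```

The point of the sketch is the shape of the pipeline: identifiable records go in, and only pooled statistics per road segment come out.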

I think this is very exciting not just for the municipalities that reap the benefits of open source, but for programmers who now have the opportunity to build a reputation for themselves and their city, all while contributing a public good that benefits everyone.

Challenges

Of course, there are challenges that come along with the opportunities of producing large scale, highly detailed transportation data. Mark Fox from the University of Toronto Transportation Research Institute has an extremely useful presentation outlining some of the main challenges often associated with open city data. These include:

  • Granularity: datasets often have different levels of aggregation.
  • Completeness: it is important to think carefully about what to open to the public and to have a reason behind opening it.
  • Interoperability: datasets across different departments may describe similar things but may not be comparable due to slightly differing schemas / data types.
  • Complexity: the data presented may be very complex and thus the public presentation of that data is limited.
  • Reliability: whenever you collect data, there are questions about the reliability of the data that limit the ability to use it and apply it.
  • Empowerment: this is an interesting challenge I had not considered, which refers to the incentives often built into government organizations to avoid failure at all costs and not engage in any risk-taking through experimentation. There also may tend to be a focus on short-term delivery of political goals and a lack of a long-term strategy of innovation.

Ann Cavoukian from Ryerson University (and formerly the Information and Privacy Commissioner for Ontario) adds privacy to this list of challenges. Her presentation focuses entirely on these issues, along with “Privacy by Design” standards to help mitigate these risks. She points out that extensive data collection and analytics can lead to “expanded surveillance, increasing the risk of unauthorized use and disclosure, on a scale previously unimaginable”. Given the privacy and data breach scandals at Equifax and Facebook since this presentation took place, I assume these issues are even more at the forefront of municipalities’ concerns with respect to transportation data.

Adventures in Open Data Hacking – Winnipeg Tree Data!

As a data guy, I’m pretty excited to see my city making a strong commitment to open data. The other day, I was sifting through some of this stuff to see what I could play with and what kind of interesting data mash-ups I could create with it. Very soon, my prayers were answered, with the Winnipeg tree inventory – yes, that’s right, the City of Winnipeg keeps a detailed database of all 300,000+ public trees located in the City, including botanical name, common name, tree size (diameter), and precise GPS coordinates!

After chuckling to myself in amusement for a while about how awesome this is, I dug into the data. Read on to see the results. 

You can find the code I used to create the visualizations here. For those curious, I used Python, along with a couple of amazing geographic mapping packages: GeoPandas (incorporates geographic data types into the Pandas package) and Folium (allows you to write Leaflet.js maps using Python code).

Most common trees

Turns out that, by far, the most common trees in the city are Green Ash and American Elm. These two types represent almost half of the trees in the database. Check out the plot below to see the top 10 tree types in the City.
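For the curious, counting the most common tree types boils down to something like the sketch below. The real analysis used Pandas on the full inventory; this version uses only the standard library, and the list of names is a tiny made-up sample rather than the actual data.

```python
from collections import Counter


def top_tree_types(common_names, n=10):
    """Return (name, count, share of all trees) for the n most common types."""
    counts = Counter(common_names)
    total = sum(counts.values())
    return [(name, count, count / total)
            for name, count in counts.most_common(n)]


# Tiny made-up sample standing in for the "common name" column.
sample = ["Green Ash"] * 5 + ["American Elm"] * 4 + ["Basswood"]

top = top_tree_types(sample, n=2)
for name, count, share in top:
    print(f"{name}: {count} trees ({share:.0%})")
```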

Rarest trees

As shown in the previous plot, there are a few types of trees that totally dominate. Looking at the other end of the spectrum, there are quite a few tree types that are extremely rare, with only one or two of them found across the entire city. The map below shows the location of the 50 rarest trees (click on the tree to see the tree type). A valuable resource for all you rare tree hunters.

Biggest trees

The map below shows the 50 biggest trees in Winnipeg (by diameter). The size of each of the green dots represents the size of the tree. As you can see in the map, apparently there is a monster American Elm in Transcona that I may have to check out next time I’m in the area.

Neighbourhoods with the most trees

Next, I thought I would mash-up the tree data with neighbourhood boundary data also available from the City of Winnipeg website to see what the tree situation is like for each neighbourhood.

Looking at the total number of trees, Pulberry is in the lead, followed by Kildonan Park, River Park South, and Linden Woods.
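Under the hood, the mash-up is a point-in-polygon problem: for each tree, figure out which neighbourhood boundary contains it. In GeoPandas this would typically be handled by a spatial join, but here is a dependency-free sketch of the idea using a plain ray-casting test. The square "neighbourhoods" and tree coordinates are made up, and real boundaries in latitude/longitude would need more care than this.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside the polygon (a list of (x, y) vertices)?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_trees(trees, neighbourhoods):
    """Count trees per neighbourhood; trees outside every boundary are ignored."""
    counts = {name: 0 for name in neighbourhoods}
    for x, y in trees:
        for name, polygon in neighbourhoods.items():
            if point_in_polygon(x, y, polygon):
                counts[name] += 1
                break
    return counts


# Two made-up square neighbourhoods and four made-up trees.
neighbourhoods = {
    "A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "B": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
trees = [(1, 1), (0.5, 1.5), (3, 1), (5, 5)]

counts = count_trees(trees, neighbourhoods)
print(counts)
```

This brute-force loop checks every tree against every boundary, which is fine for a toy example but is exactly the kind of work a spatial index speeds up on 300,000+ trees.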

Here are the 10 neighbourhoods with the fewest trees.

I also put together a choropleth map to visualize at a glance the total number of trees in each neighbourhood.

These measurements aren’t totally fair, since bigger neighbourhoods will naturally have more trees. A better measure of how tree-filled a neighbourhood is would be the tree density, or the number of trees per square kilometre. The map below shows the tree density of all the neighbourhoods in the city.
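The density calculation itself is just a division, but a quick sketch shows how it can flip the ranking. The neighbourhood names, tree counts, and areas below are invented for illustration.

```python
def tree_density(tree_counts, areas_km2):
    """Trees per square kilometre for each neighbourhood with a known, non-zero area."""
    return {
        name: tree_counts[name] / area
        for name, area in areas_km2.items()
        if area > 0 and name in tree_counts
    }


# Made-up numbers: A has more trees in total, but B is far smaller.
tree_counts = {"Neighbourhood A": 6000, "Neighbourhood B": 3000}
areas_km2 = {"Neighbourhood A": 6.0, "Neighbourhood B": 1.5}

density = tree_density(tree_counts, areas_km2)
ranked = sorted(density, key=density.get, reverse=True)
print(density)
print(ranked)
```

Here Neighbourhood A wins on raw count, but Neighbourhood B comes out on top once area is taken into account.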

Looking at the top ten neighbourhoods in terms of tree density, many are (not surprisingly) parks (Kildonan Park is the most densely treed, by far).

The plot below shows the 10 neighbourhoods with the lowest tree densities. Interestingly, Assiniboine Park is near the bottom. The only explanation I can think of is that the trees there are privately owned (the data only includes public trees).

Other ideas?

There are a lot of other ways you could slice, dice, and mash-up the data with other sources to get more interesting results. Here are a few things I thought of:

  • Zooming in on one neighbourhood and showing a dot map or heat map of all the trees to show the distribution.
  • Looking at the percentage of the trees that come from parks and filtering out park trees.
  • Developing a “tree index” for each neighbourhood (like something you would see on a real estate ad to describe the neighbourhood).
  • Examining the most common types of trees in each neighbourhood.

Comment below if you have any suggestions / requests!