China's Deadly Air

data visualization, visual design, research

Winter 2015

Background

China's air quality has been a point of contention for a number of years. Almost all my family resides in China and I worry about the quality of air that they're exposed to. The goal of this project was to present data about China's efforts in addressing climate issues in a visualization that the general public could easily understand.

The visualization provides an overview of changes to China's air quality as well as a detailed analysis of anomalies.

Research Objectives

Data Source

Approximately 180,000 rows of data were analyzed from the Chinese Ministry of Environmental Protection database.

download my dataset

It includes the city name in English, the AQI (air quality index), and the date, from 2014 to Dec 5, 2015. I don't take any responsibility for the validity or accuracy of this data, and it may require further cleaning.

Highlights

China has made the effort to monitor air quality in all its cities. 4 of the 5 most populated cities saw improvement in air quality.

As you can see from the amount of green on the map, most cities improved significantly with many improving more than 25% from 2014.

Unfortunately, Shanghai saw a deterioration in air quality. This may have been due to a lack of significant improvement in the surrounding area, as wind patterns circulate pollution throughout a region.

The worst got better. The best got worse.

While the worst cities of 2014 improved, the best two cities of 2014 saw a decline in their air quality. This may indicate unsustainable practices of spreading out polluting industries and moving them farther from target cities.

A Challenging Road Ahead

Despite significant improvement in 2015 compared to 2014, experts caution people to be skeptical about future improvements to air quality due to:

  • Polluting and energy hungry industries still make up too large a part of the economy
  • Coal will remain China's primary source of energy in the long term
  • No easy way to stem an increase in the number of vehicles on China's roads

Process

I took a very fluid approach with this project because I knew that the final result would depend on the data and the insights that revealed themselves through the process.

Initially I could only find macro-level data regarding China's air quality such as the average yearly concentration of the fine particle PM2.5 per province.

Exploration

At this stage, the focus was on the exploration of different visualizations. I knew that trying to explore manually would be time-consuming, so I researched software options. I needed software that would allow me to generate custom choropleth maps and that I could learn quickly.

Average yearly concentration of the fine particle PM2.5 per province

Heatmaps and bubble maps misrepresent the data as discrete geographical points when it is actually regional.

I played around with CartoDB and realized that a choropleth map would showcase the geographic data best. I narrowed my options down to Google FusionTables and Microsoft Excel. The Excel method lacked proper tutorials, so I chose FusionTables.

PM2.5 concentrations in 2014 and 2015

Gradient choropleth map. Too many colours made it hard to distinguish the data.
Setting distinct buckets for the different concentrations of PM2.5 clearly shows which areas have polluted air. Green represents acceptable air pollution.

Data

As I was doing more research, I stumbled across the Chinese Ministry of Environmental Protection database. It includes a wealth of data, including daily air quality index data per city since the 2000s. Hallelujah. I decided to pivot and work with this detailed data because it would afford me more freedom in analysis.

Retrieval

There were approximately 150,000 rows of data from 2014 to 2015, but I needed them in a form that I could analyze, like CSV. I knew I'd have to either write a data scraping script or find one to get this data in a timely manner. Luckily, I found one.

Then I hit a few more bumps. Scraping the data took about a week because accessing the data from the other side of the world was slow and the server timed out on me every few hours. Sigh.

Analysis

After getting all my data, I needed to sanitize it and get it into an environment where I could analyze it quickly. Some data-savvy friends said a Python notebook was a great choice, but I didn't have the time to invest in learning it. After some research, it turned out that Excel can do a lot of the same things MySQL does, such as complex queries.

An analysis of the dataset itself revealed:

  • Data was missing for many days/cities.
  • There were twice as many entries for 2015 as for 2014 because additional monitoring stations had been added.

To address these issues:

  • I merged in a dataset that another individual had scraped for the year 2014 which filled in the missing data.
  • I removed all data for cities that were not recorded in 2014 to allow for a fair comparison between 2014 and 2015.
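
The second fix, together with the year-over-year comparison it enables, can be sketched in Python. This is only an illustration, not the Excel workflow I actually used. The row layout (city, date, AQI) follows the dataset described above, while the function names and sample values are made up for the example.

```python
from collections import defaultdict
from statistics import mean


def yearly_means(rows):
    """Average AQI per (city, year) from (city, 'YYYY-MM-DD', aqi) rows."""
    values = defaultdict(list)
    for city, date, aqi in rows:
        values[(city, int(date[:4]))].append(aqi)
    return {key: mean(v) for key, v in values.items()}


def percent_change(rows, base_year=2014, compare_year=2015):
    """Percent change in mean AQI, only for cities recorded in both years."""
    means = yearly_means(rows)
    base_cities = {city for city, year in means if year == base_year}
    compare_cities = {city for city, year in means if year == compare_year}
    changes = {}
    # The set intersection drops cities that lack data for either year.
    for city in base_cities & compare_cities:
        before = means[(city, base_year)]
        after = means[(city, compare_year)]
        if before == 0:
            continue  # avoid dividing by zero on a degenerate baseline
        changes[city] = (after - before) / before * 100
    return changes


# Hypothetical sample rows, not values from the real dataset.
rows = [
    ("Beijing", "2014-01-01", 200), ("Beijing", "2014-01-02", 100),
    ("Beijing", "2015-01-01", 120), ("Beijing", "2015-01-02", 90),
    ("Lhasa", "2015-01-01", 40),  # no 2014 data, so it is excluded
]
changes = percent_change(rows)
```

A negative value means the average AQI dropped, i.e. the air got cleaner.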

Left: Line graph of the number of data entries per day. The dips mark days where data is missing.
Right: Cities with data missing for 2014 had to be removed.

Exploration

Excel allowed me to quickly explore different visualizations for the data I had at hand. Some patterns emerged that enticed me to explore further, such as the natural fluctuations in AQI value as seasons change.

At this point, I was overwhelmed with the amount of data and potential visualizations available.

Top left: Percentage of types of days for 2014 and 2015.
Top right: Scatter plot of AQI levels for 2014 and 2015.
Bottom left: Stacked bar graph of different types of days for all cities in 2015.
Bottom right: Scatter plot of all AQI values from 2014 to 2015.
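
For readers who want to reproduce the "types of days" breakdown, here is a rough Python sketch. It assumes the day types correspond to the six AQI bands of China's national standard (0-50, 51-100, 101-150, 151-200, 201-300, above 300); the English labels, function names, and sample readings are my own choices for the example.

```python
from bisect import bisect_left
from collections import Counter

# Upper bounds of the first five AQI bands; anything above 300 is the sixth.
UPPER_BOUNDS = [50, 100, 150, 200, 300]
LABELS = [
    "Excellent",
    "Good",
    "Lightly polluted",
    "Moderately polluted",
    "Heavily polluted",
    "Severely polluted",
]


def day_type(aqi):
    """Map one daily AQI reading to its band label."""
    # bisect_left keeps a reading equal to a bound (e.g. 50) in the lower band.
    return LABELS[bisect_left(UPPER_BOUNDS, aqi)]


def day_type_shares(aqi_values):
    """Percentage of days in each band, omitting bands with no days."""
    counts = Counter(day_type(value) for value in aqi_values)
    total = sum(counts.values())
    return {
        label: counts[label] / total * 100
        for label in LABELS
        if counts[label]
    }


# Hypothetical readings, not values from the real dataset.
shares = day_type_shares([30, 50, 80, 160])
```

Running this once per year (or per city) gives the numbers behind a percentage chart or a stacked bar graph.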

Focusing on the Narrative

Focusing on my research question, "Has China's air quality improved?", allowed me to filter out extraneous information and refine my narrative. I struggled with crafting a cohesive flow from the separate elements. Ultimately, I decided to follow the newspaper information architecture, as the general public is most familiar with it.

Layout explorations for merging separate visualization elements

Visual Exploration & Refinement

At this point I realized that FusionTables won't let you export maps into vector images. I didn't think about this when I chose software and it bit me in the ass. Luckily I was able to find a vector of the Chinese provinces.

Explorations of various details such as images, textures, and type.

Project Critiques

This project was a massive undertaking, but I'm pretty happy with the result. However, there are a few things that I'm unsatisfied with.

  • There is potential in using greens and reds to indicate the severity of air quality because they are colours people associate with severity. Unfortunately, the colours for the Changes in air quality map are very similar to the colours for the AQI range maps. This may confuse viewers and requires further exploration.
  • Currently, the city labels require viewers to reference numbers on the far right. Further exploration of labelling techniques such as extended lines may improve the user experience of the visualization.
  • While the horizontal layout works, there weren't enough layout explorations. There may be a more cohesive layout that highlights the narrative better.

Learnings

  • Leave enough time for troubleshooting. Many of the things I predicted would go smoothly didn't.
  • It's important to research with the proper terminology (e.g. cities are called prefectures in China) and, in my case, in the right language, Mandarin, to get the best results.
  • I wasted a lot of time thinking too far ahead, learning software or researching information I thought I might need. In the future, I'll try not to invest too much time in tasks before I've made an informed decision.
  • Make sure I define my criteria for software holistically, taking into consideration my needs throughout the process.