Introduction to the Air Alliance API

The State of Texas collects a lot of important data about our environment. The Texas Commission on Environmental Quality (TCEQ) is the state agency in charge of measuring and regulating pollutants. The scientific data it collects tell us a lot about the environment, but they are difficult to access.

We collaborated with Air Alliance Houston to build the BREATHE API, which is powered by data featured on the TCEQ air quality website. This data encompasses air emissions, complaints, complaint investigations, enforcement reports, and permits in the Houston area.

We built the BREATHE dashboard to help community members who aren't tech-savvy visualize and explore air quality data in their neighborhoods. We also decided to open the API to the public, specifically to scientists and data nerds passionate about the environment who want to dig into the data on their own. If you'd like to learn how to use the API, please follow along.

For this tutorial, we are going to look at emissions data in Houston. We'll query, analyze, and visualize the data in R using the jsonlite and dplyr libraries, plus ggplot2 for plotting.

To query emissions, pass the API endpoint URL as a string to the jsonlite::fromJSON function, which downloads the JSON response and parses it into R objects.

library(jsonlite)
library(dplyr)

# emissions API endpoint
emissions <- "https://air-alliance-api.herokuapp.com/api/air_emissions"

# make date variables
start <- "2018-04-20"
end <- "2018-09-10"

# convert the dates to Unix timestamps (seconds since 1970-01-01, local time zone)
start.ts <- as.numeric(as.POSIXct(start))
end.ts <- as.numeric(as.POSIXct(end))

# make the request!
emissions.data <- fromJSON(paste0(emissions,
                                  "?filtercol=event_began_date",
                                  "&gt=", start.ts,
                                  "&lt=", end.ts))

To filter the data, we use the filtercol param and pass it the name of the column to filter on. We need to filter by date, which in this case is the event_began_date column. The gt and lt params set the lower and upper bounds of the range; the request above passes them as Unix timestamps. For the purposes of this tutorial, I looked at data from April 20, 2018 to September 10, 2018. To adjust the dates for your analysis, simply change the start and end variables.
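If you plan to try several date windows, it can help to wrap the URL construction in a function. The sketch below uses a hypothetical helper, build_emissions_url, and assumes the API accepts filtercol, gt, and lt with timestamps in seconds, exactly as the request above sends them. It formats the timestamps with sprintf so that R never writes them in scientific notation.

```r
# Hypothetical helper (not part of the API): build an air_emissions query URL
# for a date window. Assumes filtercol/gt/lt take Unix timestamps in seconds,
# as in the request above.
build_emissions_url <- function(endpoint, start, end,
                                col = "event_began_date", tz = "") {
  # Convert "YYYY-MM-DD" strings to seconds since 1970-01-01.
  # tz = "" uses the local time zone, like the original request.
  to_unix <- function(d) as.numeric(as.POSIXct(d, tz = tz))
  # sprintf("%.0f") guarantees plain digits (never "1e+09") in the URL.
  paste0(endpoint,
         "?filtercol=", col,
         "&gt=", sprintf("%.0f", to_unix(start)),
         "&lt=", sprintf("%.0f", to_unix(end)))
}

url <- build_emissions_url(
  "https://air-alliance-api.herokuapp.com/api/air_emissions",
  "2018-04-20", "2018-09-10", tz = "UTC"
)
print(url)
```

The resulting string can be passed straight to fromJSON(url). Whether the API interprets the bounds as UTC or local time is not stated here, so the tz argument is left for you to choose.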

After running that request, you'll notice that emissions.data is a list that includes status, data, and message. To work with a data frame, we'll extract the data element.

emissions.data <- emissions.data$data
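To see why indexing $data yields a data frame, here is a small, self-contained sketch that parses a made-up response with the same top-level fields (status, data, message). The field values, including the "ok" status, are hypothetical; only the overall shape follows the description above.

```r
library(jsonlite)

# A tiny, made-up response with the same top-level shape as the API's
# (status, data, message). All values are hypothetical.
raw <- '{
  "status": "ok",
  "data": [
    {"contaminant": "Propane", "amount_released": "12.5 lbs (est.)",
     "event_began_date": "2018-05-02"},
    {"contaminant": "Opacity", "amount_released": "20 % op (est.)",
     "event_began_date": "2018-04-28"}
  ],
  "message": ""
}'

response <- fromJSON(raw)
str(response, max.level = 1)   # a list with status, data, and message

# By default fromJSON simplifies an array of JSON objects into a data frame,
# so the data element has one row per emission record.
records <- response$data
print(records)
```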

To start, I'd like a rough idea of which contaminants are reported most often.

# count reports per contaminant and keep the 5 most frequent
top_emissions <- emissions.data %>%
  count(contaminant) %>%
  top_n(5, wt = n) %>%
  arrange(desc(n))
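The ranking above counts reports, not pounds released. For readers who want to check that logic without dplyr, here is a base R sketch run on a few made-up records. The helper name top_contaminants is hypothetical. One difference worth knowing: top_n() keeps every row tied with the k-th value, so it can return more than five contaminants, while head() below cuts off at exactly k rows.

```r
# Hypothetical base R equivalent of the count / top_n / arrange pipeline.
top_contaminants <- function(df, k = 5) {
  # table() counts reports per contaminant; sort them from most to least
  counts <- sort(table(df$contaminant), decreasing = TRUE)
  out <- data.frame(contaminant = names(counts),
                    n = as.integer(counts),
                    stringsAsFactors = FALSE)
  head(out, k)   # unlike top_n(), drops rows tied with the k-th
}

# Made-up records, not real API data
toy <- data.frame(contaminant = c("Propane", "Opacity", "Propane",
                                  "Carbon Monoxide", "Propane", "Opacity"),
                  stringsAsFactors = FALSE)
res <- top_contaminants(toy, k = 2)
print(res)
```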

It looks like the five most frequently reported contaminants are Carbon Monoxide, Opacity, Propane, Propylene, and Isobutane.

The amount_released column records how much of each contaminant was released into the environment. With the exception of Opacity, which is reported as a percentage, amounts are reported in pounds (lbs).
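Because amount_released arrives as text with the unit attached, it can be useful to pull out the number and the unit separately. The sketch below is a hypothetical parser; the string formats it expects ("12.5 lbs (est.)" and "20 % op (est.)") are inferred from the cleaning regex used in this tutorial, not from the API documentation. Stripping thousands separators is an extra assumption.

```r
# Hypothetical parser: split amount_released strings into value and unit.
# Assumed formats: "12.5 lbs (est.)" and "20 % op (est.)".
parse_amount <- function(x) {
  # keep the leading number, e.g. "12.5" or "1,200"
  num <- sub("^\\s*([0-9][0-9,]*\\.?[0-9]*).*$", "\\1", x)
  # drop thousands separators; strings with no leading number become NA
  value <- suppressWarnings(as.numeric(gsub(",", "", num)))
  unit <- ifelse(grepl("lbs", x, fixed = TRUE), "lbs",
                 ifelse(grepl("% op", x, fixed = TRUE), "% op", NA_character_))
  data.frame(value = value, unit = unit, stringsAsFactors = FALSE)
}

res <- parse_amount(c("12.5 lbs (est.)", "20 % op (est.)",
                      "1,200 lbs (est.)", "unknown"))
print(res)
```

Keeping the unit column makes it easy to separate pound-based records from Opacity percentages before summing.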

To visualize the recent history of emissions (about five months of data) with ggplot2, we first clean the amount_released column and aggregate it by month:

# data prep & cleaning for the visual
library(dplyr)

# strip the unit text ("lbs (est.)" or "% op (est.)") so amount_released is numeric,
# and label each event with the month and year it began
emissions.data <- emissions.data %>%
  mutate(amount_released = as.numeric(gsub("(lbs|% op) \\(est\\.\\)", "", amount_released)),
         start_date = format(as.Date(event_began_date, "%Y-%m-%d"), "%m-%Y"))

# filter for the top contaminants reported in lbs (everything except Opacity)
emissions.lbs <- emissions.data %>%
  filter(contaminant != "Opacity" & contaminant %in% top_emissions$contaminant) %>%
  group_by(contaminant, start_date) %>%
  # monthly total divided by 100, so values are in hundreds of lbs
  summarise(total_released_lbs = sum(amount_released, na.rm = TRUE)/100)

# filter for opacity reported in %
emissions.opacity <- emissions.data %>%
  filter(contaminant == "Opacity") %>%
  group_by(start_date) %>%
  summarise(average_opacity = mean(amount_released, na.rm = TRUE))
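The plotting step itself might look something like the sketch below. It uses made-up monthly totals in the same shape as emissions.lbs (contaminant, start_date as "MM-YYYY", total_released_lbs), so the numbers are placeholders, not the tutorial's results. Converting the "MM-YYYY" labels to real dates is a design choice: it keeps the x-axis in chronological order even if your date window crosses a year boundary, which plain text labels would not.

```r
library(ggplot2)

# Made-up monthly totals shaped like emissions.lbs. Not real results.
emissions.lbs.demo <- data.frame(
  contaminant = rep(c("Carbon Monoxide", "Propylene"), each = 3),
  start_date = rep(c("04-2018", "05-2018", "06-2018"), times = 2),
  total_released_lbs = c(10, 40, 12, 5, 30, 8),
  stringsAsFactors = FALSE
)

# Turn "MM-YYYY" labels into dates (first day of the month)
emissions.lbs.demo$month <- as.Date(paste0("01-", emissions.lbs.demo$start_date),
                                    format = "%d-%m-%Y")

# One line per contaminant; colour also groups the lines
p <- ggplot(emissions.lbs.demo,
            aes(x = month, y = total_released_lbs, colour = contaminant)) +
  geom_line() +
  geom_point() +
  labs(x = "Month event began", y = "Total released (hundreds of lbs)",
       colour = "Contaminant")
print(p)
```

The same pattern works for emissions.opacity by mapping y to average_opacity and dropping the colour aesthetic.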

There are some interesting patterns here. Average opacity seems to peak in April and then gradually levels off. May shows heavy releases of Carbon Monoxide and Propylene. To investigate further, we could look at some of the emission event details, such as event_type, cause, and physical_location.

If we visualize the distribution of event_type, we see that AIR STARTUP events contribute the most to the amount of emissions (reported in lbs) released in May. Looking through the cause column, it seems that “Flaring” during the startup of a particular plant in Freeport is associated with the large amount released.
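A breakdown like this can be computed by summing amount_released per event_type for a single month. The base R sketch below uses a hypothetical helper, release_by_event_type, on made-up records; apart from "AIR STARTUP", the event_type values are placeholders. It drops Opacity first because those amounts are percentages, not pounds, and would distort the totals.

```r
# Made-up records; event types other than "AIR STARTUP" are placeholders.
events <- data.frame(
  event_type = c("AIR STARTUP", "EMISSIONS EVENT", "AIR STARTUP", "MAINTENANCE"),
  contaminant = c("Propylene", "Carbon Monoxide", "Carbon Monoxide", "Propane"),
  amount_released = c(500, 120, 300, 40),
  start_date = c("05-2018", "05-2018", "05-2018", "06-2018"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: total lbs released per event_type in one "MM-YYYY" month
release_by_event_type <- function(df, month) {
  # keep the chosen month and drop Opacity (reported in %, not lbs)
  in_month <- df[df$start_date == month & df$contaminant != "Opacity", ]
  totals <- aggregate(amount_released ~ event_type, data = in_month, FUN = sum)
  # largest contributors first
  totals[order(totals$amount_released, decreasing = TRUE), ]
}

res <- release_by_event_type(events, "05-2018")
print(res)
```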


This tutorial is only a teaser of what you can do with data from the Air Alliance API. To keep exploring the API and the other data sources it provides, please refer to the full API documentation.

Have fun exploring and building!

Niha Pereira

Niha is a data scientist at January Advisors. She enjoys using data and technology to support her community and local do-gooders. You can read more about her background on LinkedIn.
