The State of Texas collects a lot of important data about our environment. The Texas Commission on Environmental Quality (TCEQ) is the state agency in charge of measuring and regulating pollutants. The scientific data that it collects tell us a lot about the environment, but the data are difficult to access.
We collaborated with Air Alliance Houston to build the BREATHE API, which is powered by data featured on the TCEQ air quality website. This data encompasses air emissions, complaints, complaint investigations, enforcement reports, and permits in the Houston area.
We built the BREATHE dashboard to help non-tech-savvy community members visualize and explore this air quality data in their neighborhoods. We also decided to introduce the API to the public, specifically to scientists and data nerds passionate about the environment who want to dig into the data on their own. If you’d like to learn how to use the API, please follow along.
For this tutorial, we are going to take a look at emissions data in Houston. We’ll be querying, analyzing, and visualizing the data in R. The code below relies on fromJSON (available in the jsonlite package), dplyr verbs such as group_by and summarise, and ggplot2.
To query emissions, pass the API endpoint as a string to the fromJSON function.
library(jsonlite)  # assumed source of fromJSON
# emissions API endpoint
emissions <- "https://air-alliance-api.herokuapp.com/api/air_emissions"
# make date variables
start <- "2018-04-20"
end <- "2018-09-10"
# make the request! dates are sent as Unix timestamps
emissions.data <- fromJSON(paste0(emissions, "?filtercol=event_began_date>=",
                                  as.numeric(as.POSIXct(as.character(start))),
                                  "<=", as.numeric(as.POSIXct(as.character(end)))))
To filter the data, we use the filter_col parameter and pass the column name. We need to filter by date, which in this case is the event_began_date column. For the purposes of this tutorial, I looked at data from April 20, 2018 to September 10, 2018. To adjust the dates for your analysis, simply change the start and end variables.
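The query string can also be assembled by a small helper, which makes it easier to try other date ranges. This is a minimal sketch, not part of the API: build_emissions_url is a hypothetical name, the query layout simply mirrors the request shown in this tutorial, and pinning the time zone to UTC is an assumption (the original call uses the machine's local time zone).

```r
# Hypothetical helper: builds the filtered emissions URL used in this tutorial.
# The query-string layout copies the tutorial's request; it is not taken from the API docs.
build_emissions_url <- function(start, end,
                                endpoint = "https://air-alliance-api.herokuapp.com/api/air_emissions") {
  # Convert "YYYY-MM-DD" strings to Unix timestamps (seconds since 1970-01-01).
  # tz = "UTC" is an assumption so the result does not depend on the local machine.
  to_epoch <- function(d) as.numeric(as.POSIXct(d, tz = "UTC"))
  start_epoch <- to_epoch(start)
  end_epoch <- to_epoch(end)
  stopifnot(start_epoch <= end_epoch)  # guard against swapped dates
  paste0(endpoint,
         "?filtercol=event_began_date>=", format(start_epoch, scientific = FALSE),
         "<=", format(end_epoch, scientific = FALSE))
}

url <- build_emissions_url("2018-04-20", "2018-09-10")
print(url)
```

Passing the resulting string to fromJSON should issue the same kind of request as the tutorial's call, with only the dates changed.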
After running that function, you’ll notice that emissions.data is a list object whose elements include data and message. So that we can work with a nice data frame, we’ll index for the data element.
emissions.data <- emissions.data$data
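To see why this indexing step is needed, here is a small stand-in for the parsed response. Only the element names data and message come from this tutorial; the values, and the idea that message holds a short status string, are assumptions made for illustration.

```r
# Mock of the parsed response: a list with a data frame under `data`
# and a string under `message` (all values here are made up).
mock_response <- list(
  data = data.frame(
    contaminant = c("Carbon Monoxide", "Opacity"),
    amount_released = c("500.0 lbs (est.)", "40.0 % op (est.)"),
    stringsAsFactors = FALSE
  ),
  message = "ok"
)

# Inspect the top level only: a list of two elements.
str(mock_response, max.level = 1)

# Keep only the data frame, as done with emissions.data in the tutorial.
mock_data <- mock_response$data
print(class(mock_data))
print(nrow(mock_data))
```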
As a first pass, I’d like to get a rough idea of which contaminants are reported most often.
# count reports per contaminant and keep the five most frequent
top_emissions <- emissions.data %>%
  group_by(contaminant) %>%
  summarise(n = n()) %>%
  top_n(5, n) %>%
  arrange(desc(n))
It looks like the top 5 contaminants include Carbon Monoxide and Propylene, both of which come up again later in this tutorial.
The amount_released column records how much of each contaminant was released into the environment. With the exception of Opacity, which is reported as a percentage, the amount uses lbs as its unit of measurement.
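Because the unit is embedded in the amount_released text, it helps to see how a value can be split into a number and a unit. The sample strings below are assumptions, inferred from the pattern that this tutorial's cleaning code strips out, not values copied from the API.

```r
# Sample values in the assumed raw format: "<number> <unit> (est.)".
raw_amounts <- c("500.0 lbs (est.)", "12.5 lbs (est.)", "40.0 % op (est.)")

# Strip the unit and the "(est.)" suffix, then convert what is left to numeric.
amount <- as.numeric(trimws(gsub("(lbs|% op) \\(est\\.\\)", "", raw_amounts)))

# Record the unit separately so lbs and opacity are never mixed in one sum.
unit <- ifelse(grepl("% op", raw_amounts, fixed = TRUE), "% opacity", "lbs")

parsed <- data.frame(raw = raw_amounts, amount = amount, unit = unit,
                     stringsAsFactors = FALSE)
print(parsed)
```

Keeping the unit in its own column is a design choice for this sketch; the tutorial instead filters Opacity rows out before summing.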
If we visualize the recent history of emissions (spanning 5 months of data) using ggplot2, this is what it looks like:
# data prep & cleaning for visual
emissions.data <- emissions.data %>%
  mutate(amount_released = as.numeric(gsub("(lbs|% op) \\(est\\.\\)", "", amount_released)),
         start_date = format(as.Date(event_began_date, "%Y-%m-%d"), "%m-%Y"))
# filter for emissions reported in lbs released per hour
emissions.lbs <- emissions.data %>%
  filter(contaminant != "Opacity" & contaminant %in% top_emissions$contaminant) %>%
  group_by(contaminant, start_date) %>%
  summarise(total_released_lbs = sum(amount_released, na.rm = TRUE)/100)
# filter for opacity reported in %
emissions.opacity <- emissions.data %>%
  filter(contaminant == "Opacity") %>%
  group_by(start_date) %>%
  summarise(average_opacity = mean(amount_released))
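The plotting call itself is not shown here, so the following is one way the monthly totals could be drawn. It is a sketch under assumptions: demo_lbs is a small made-up stand-in with the same columns as emissions.lbs, and a dodged bar chart is one reasonable choice among several, not necessarily the original chart.

```r
library(ggplot2)

# Made-up stand-in with the same columns as emissions.lbs.
demo_lbs <- data.frame(
  contaminant = rep(c("Carbon Monoxide", "Propylene"), each = 3),
  start_date = rep(c("04-2018", "05-2018", "06-2018"), times = 2),
  total_released_lbs = c(10, 80, 15, 5, 60, 8),
  stringsAsFactors = FALSE
)

# One bar per contaminant within each month.
p <- ggplot(demo_lbs, aes(x = start_date, y = total_released_lbs, fill = contaminant)) +
  geom_col(position = "dodge") +
  labs(x = "Month", y = "Total released", fill = "Contaminant") +
  theme_minimal()

print(p)
```

Because start_date is a month-year string, the x axis sorts alphabetically, which matches chronological order only while all the data fall within a single year.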
There are some interesting patterns here. Opacity levels seem to peak in April and then gradually level off. May shows heavy releases of Carbon Monoxide and Propylene. To investigate further, we could look at some of the emission event details, such as the event type and the reported cause.
If we visualize the distribution of event_type, we see that AIR STARTUP events contribute the most to the amount of emissions (reported in lbs) released in the month of May. After perusing the cause data, it looks like “Flaring” during the startup of a particular plant in Freeport is associated with the large amount released.
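The event-type breakdown described above can be computed with the same dplyr verbs used earlier in this tutorial. The rows below are invented for illustration, and every event_type value other than AIR STARTUP is a hypothetical label.

```r
library(dplyr)

# Invented rows shaped like the cleaned emissions data (numeric amounts, month-year dates).
demo_events <- data.frame(
  event_type = c("AIR STARTUP", "AIR STARTUP", "EMISSIONS EVENT",
                 "AIR SHUTDOWN", "EMISSIONS EVENT"),
  start_date = c("05-2018", "05-2018", "05-2018", "05-2018", "06-2018"),
  amount_released = c(900, 400, 250, NA, 700),
  stringsAsFactors = FALSE
)

# Total amount released per event type in May 2018, largest first.
may_by_type <- demo_events %>%
  filter(start_date == "05-2018") %>%
  group_by(event_type) %>%
  summarise(total_released = sum(amount_released, na.rm = TRUE)) %>%
  arrange(desc(total_released))

print(may_by_type)
```

With the real data, Opacity rows would need to be excluded first so that percentages are not added to amounts in lbs.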
This tutorial is only a teaser of what you can do with the data from the Air Alliance API. To continue playing with the API and the other data sources it supplies, please refer to the full API documentation.
Have fun exploring and building!