Early voting in Texas is one of the few highlights in a state that ranks 46th in ease of voting. It makes sense: you can vote on a day and at a time most convenient for you.
But 2018 has been a controversial midterm election cycle. Across the country, many election day voting locations have been closed because they lack ADA compliance or cost a lot to operate. These closures could have an impact on voter turnout, especially for communities of color.
That makes early voting even more important. With a greater reliance on early voting, it follows that the distribution of early voting locations may have an even greater impact on voter turnout.
I wondered if I could use machine learning to analyze the distribution of early voting locations in Harris County. This model uses K-Means to group concentrations of people and determine optimal polling locations.
My approach to this analysis
First, I used Census tract data to estimate where people live. I calculated the center point of each Census tract polygon, then used the center of each tract as a proxy for the location of all people living in that tract.
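The center-point step can be sketched as follows. This is an illustration, not the code behind the analysis: it assumes each tract is available as a simple polygon given by a list of (x, y) vertices, and the function name polygon_centroid is hypothetical. It uses the shoelace formula on planar coordinates, which is a reasonable approximation at county scale; a GIS library would normally handle this.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y), e.g. projected coordinates or (lon, lat)


def polygon_centroid(vertices: List[Point]) -> Point:
    """Area-weighted centroid of a simple polygon (shoelace formula).

    Works for clockwise or counter-clockwise vertex order. A closing
    vertex that repeats the first one is harmless.
    """
    area2 = 0.0  # twice the signed area
    cx = 0.0
    cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon with zero area")
    # Centroid = sum / (6 * area) = sum / (3 * area2)
    return cx / (3.0 * area2), cy / (3.0 * area2)
```

Treating the centroid as the location of every resident ignores how people are spread within a tract, so the resulting distances are estimates.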
Then I layered in the location of all 46 early voting locations in Harris County and measured the distance between the center of each Census tract and the closest early voting location.
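A minimal sketch of the distance step, under stated assumptions: tract centers and early voting locations are (latitude, longitude) pairs in degrees, distance is straight-line (great-circle) rather than driving distance, and averages are weighted by tract population. All function names here are hypothetical.

```python
import math
from typing import List, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees
EARTH_RADIUS_MILES = 3958.8   # mean Earth radius


def haversine_miles(a: LatLon, b: LatLon) -> float:
    """Great-circle distance between two points, in miles."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def nearest_distance(center: LatLon, sites: List[LatLon]) -> float:
    """Distance from one tract center to its closest voting location."""
    return min(haversine_miles(center, site) for site in sites)


def weighted_mean_distance(tracts: List[Tuple[LatLon, float]],
                           sites: List[LatLon]) -> float:
    """Population-weighted average distance; tracts are (center, population)."""
    total_pop = sum(pop for _, pop in tracts)
    return sum(nearest_distance(c, sites) * pop for c, pop in tracts) / total_pop


def share_beyond(tracts: List[Tuple[LatLon, float]],
                 sites: List[LatLon],
                 threshold_miles: float = 1.0) -> float:
    """Fraction of the population living farther than the threshold."""
    total_pop = sum(pop for _, pop in tracts)
    far_pop = sum(pop for c, pop in tracts
                  if nearest_distance(c, sites) > threshold_miles)
    return far_pop / total_pop
```

Given real tract centers and populations, weighted_mean_distance and share_beyond would produce the two kinds of summary figures reported in the next section.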
Harris County already has a good distribution of early voting locations
Here is what I learned: currently, in Harris County, the average resident lives 0.76 miles from the closest early voting location. Approximately 30% of people in Harris County live more than one mile from their closest early voting location. I also found that the racial and ethnic composition of a community does not influence its distance to an early voting location.
Could this distribution improve? Yes, but not by much.
After my preliminary analysis, I compared the distribution of distances using the current polling locations to the distribution of distances using the K-Means centroids.
With a K of 46, I get an optimized average distance from the polls of 0.67 miles, with a standard deviation of 0.44 miles. What this initially tells us is that the locations of early voting polls in Harris County are almost perfectly optimized. The difference between the K-Means distribution and the current Harris County distribution is minuscule and could be explained by my estimate of where each person in Harris County is physically located and by variation in building availability.
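The K-Means comparison can be sketched in plain Python. This is an illustration under assumptions, not the model used above: tract centers are taken to be planar (x, y) coordinates in miles, each tract is weighted by its population, and the algorithm is basic Lloyd's iteration with random initialization. The names weighted_kmeans and mean_nearest_distance are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # planar (x, y), assumed to be in miles


def _nearest_index(p: Point, centers: Sequence[Point]) -> int:
    """Index of the center closest to p (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda j: (p[0] - centers[j][0]) ** 2
                             + (p[1] - centers[j][1]) ** 2)


def weighted_kmeans(points: Sequence[Point], weights: Sequence[float],
                    k: int, iterations: int = 100, seed: int = 0) -> List[Point]:
    """Lloyd's algorithm where each point counts in proportion to its weight."""
    rng = random.Random(seed)
    centers = rng.sample(list(points), k)  # start from k distinct input points
    for _ in range(iterations):
        sums = [[0.0, 0.0, 0.0] for _ in range(k)]  # weighted x, weighted y, weight
        for p, w in zip(points, weights):
            j = _nearest_index(p, centers)
            sums[j][0] += w * p[0]
            sums[j][1] += w * p[1]
            sums[j][2] += w
        # Move each center to the weighted mean of its points;
        # an empty cluster keeps its previous center.
        new_centers = [(sx / sw, sy / sw) if sw > 0 else centers[j]
                       for j, (sx, sy, sw) in enumerate(sums)]
        if new_centers == centers:  # assignments stopped changing
            break
        centers = new_centers
    return centers


def mean_nearest_distance(points: Sequence[Point], weights: Sequence[float],
                          centers: Sequence[Point]) -> float:
    """Weighted average distance from each point to its closest center."""
    total = sum(weights)
    return sum(w * math.dist(p, centers[_nearest_index(p, centers)])
               for p, w in zip(points, weights)) / total
```

Calling mean_nearest_distance once with the K-Means centers and once with the actual early voting locations gives the two averages being compared. Note that K-Means minimizes squared distance, so its centroids are a good but not strictly optimal answer for average distance, and results can vary with the random starting points.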
What about adding more early voting locations?
Next, I measured how much each additional early voting location improves accessibility based on distance. Starting with a K of 40 and a range of 60, I see that every additional early voting location placed by my algorithm would decrease the average person’s distance from the polls by about 0.004 miles, or 21 feet. This makes sense mathematically because the relationship between the number of centers and the average distance is negative and nonlinear.
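One way to arrive at a per-location figure like this is to fit a straight line to average distance against K and read off the slope. The sketch below assumes that approach; whether the figure above was computed this way is not stated, and the function names are hypothetical.

```python
from typing import Sequence

FEET_PER_MILE = 5280


def least_squares_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Slope of the ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def miles_to_feet(miles: float) -> float:
    """Convert a distance in miles to feet."""
    return miles * FEET_PER_MILE
```

Here xs would be the tested values of K and ys the average distance in miles at each K, so the slope is the change in average distance per added location. As a check on the unit conversion, 0.004 miles × 5,280 feet per mile is 21.12 feet. A single slope is only a local summary, since the underlying relationship is nonlinear.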
Surprisingly, to bring the average citizen’s distance below half a mile, Harris County would have to create 32 additional early voting locations and relocate all 46 of the current locations. This would be a very expensive process and may have only a marginal impact on voter turnout.
What should Harris County do instead?
As we shatter voter turnout records this season, Harris County can build capacity by focusing on extending hours of operation and increasing the number of electronic polling stations at each location.