I was 18 when I had my first and only (knock on wood) car accident. It happened during my first college semester in Pennsylvania, when I rear-ended a very nice Lincoln with my 4-cylinder "wanna-be-fast-and-furious" Lancer. Why did this happen? Was it because of the recent rain and the slippery road surface? Was I following too closely? Or was I just too distracted jamming out to "Crash" by The DMB? No, really, that song was on when I wrecked...how fitting. Crashes can obviously occur for a multitude of reasons, but can we identify those reasons in order to limit accidents? In this blog, I'll review four years' worth of San Antonio accident data provided by the Texas Department of Transportation in an effort to identify trends and areas of improvement to keep San Antonians safe!
If you'd like to see the code used for this analysis, you can view and follow along with the Jupyter Notebook here.
Let's begin with total numbers. Here's the trend for crashes that occurred in San Antonio from 2013 to 2016.
| Year | # Accidents | YoY Increase (%) |
As you can see, accidents in San Antonio have increased every year since 2013, with the most dramatic increase between 2014 and 2015.
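For anyone following along, here is a minimal sketch of how a year-over-year table like this could be computed with pandas. The yearly frames stand in for the notebook's `df13` through `df16`; the `Crash ID` column and the row counts are made up for illustration and are not the real San Antonio figures.

```python
import pandas as pd

# Hypothetical stand-ins for the yearly DataFrames (df13 ... df16 in the notebook).
# The 'Crash ID' column and these row counts are invented for illustration only.
frames = {
    2013: pd.DataFrame({"Crash ID": range(100)}),
    2014: pd.DataFrame({"Crash ID": range(110)}),
    2015: pd.DataFrame({"Crash ID": range(132)}),
    2016: pd.DataFrame({"Crash ID": range(138)}),
}

# One row per year: the number of crash records in that year's frame.
summary = pd.DataFrame(
    {"# Accidents": {year: len(df) for year, df in frames.items()}}
).sort_index()
summary.index.name = "Year"

# Year-over-year change as a percentage of the previous year's count.
# The first year has no previous year, so its value is NaN.
summary["YoY Increase (%)"] = (summary["# Accidents"].pct_change() * 100).round(1)

print(summary)
```

Swapping the made-up frames for the real yearly DataFrames should give the same three columns as the table above.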
Where have these accidents occurred for each year? We can plot the data by longitude and latitude in a scatter plot to observe every accident for each year. Each accident is plotted with transparency so the darker an area, the more accidents that have occurred in that location.
    # Plot the four years side by side: what gets increasingly darker?
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(15, 12))
    yearly = [(df13, "2013"), (df14, "2014"), (df15, "2015"), (df16, "2016")]
    # axes.flat walks the grid row by row: 2013, 2014 on top; 2015, 2016 below
    for ax, (df, year) in zip(axes.flat, yearly):
        df.plot(kind='scatter', x='Crash Longitude', y='Crash Latitude',
                title=year, marker='o', alpha=0.1, s=.5, ax=ax)
    # Call tight_layout after plotting so titles and axis labels are accounted for
    fig.tight_layout()
Looking at these scatterplots, you can clearly see that the NW to N area of 1604, I10 to 410, and the NW side of 410 become increasingly darker with every year. It looks like these parts of the highway are seeing more and more accidents. Is it possible the infrastructure does not support the amount of traffic? Let's look into the most accident-prone roads in San Antonio for the most recent year (2016) only.
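A ranking like this can be produced by counting crashes per road name. The sketch below assumes the same `Reported Road` column used elsewhere in this post; the sample rows are made up, so only the technique carries over to the real 2016 data.

```python
import pandas as pd

# Tiny made-up sample. Only the column name 'Reported Road' comes from the post;
# these rows are NOT real crash records.
df16_sample = pd.DataFrame({
    "Reported Road": [
        "IH0010", "IH0010", "IH0010",
        "IH0410", "IH0410",
        "BANDERA RD",
        None,  # records with a missing road name are ignored by value_counts
    ]
})

# Count crashes per road and keep the 10 most frequent (sorted descending).
top_roads = df16_sample["Reported Road"].value_counts().head(10)
print(top_roads)

# With matplotlib installed, the same Series can be drawn as a bar chart:
# top_roads.sort_values().plot(kind="barh")
```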
Not surprisingly, San Antonio highways dominate the list, with IH10 being the most accident-prone. This is understandable given the amount of traffic that travels on highways compared to non-highways. But what are the most dangerous San Antonio non-highway roads?
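    # First, remove all highways, then plot the top 10 remaining roads
    highways = 'IH0010|IH0035|IH0410|SL1604|US0281|SL0410|US0090|IH0037|FM1604|SH0151'
    # na=False treats rows with a missing road name as non-highway,
    # so the ~ negation does not fail on NaN values
    dangerous_rds = df16[~df16['Reported Road'].str.contains(highways, na=False)]
    dangerous_rds['Reported Road'].value_counts()[:10].plot(kind='barh')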
Interesting...but as many San Antonians know, our roads can stretch across the city, so let's plot the accidents on the top 5 non-highway roads from this list over the 2016 scatterplot. We will color code them by road.
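The overlay needs one DataFrame per road. Here is a rough sketch of how such subsets could be built by matching on the road name. The helper `road_subset`, the sample rows, and spellings such as `BANDERA RD` are assumptions for illustration; the notebook may build its `bandera`, `babcock`, `blanco`, `wurzbach`, and `military` frames differently.

```python
import pandas as pd

# Made-up rows for illustration. Road-name spellings and coordinates are guesses,
# not values taken from the real dataset.
sample = pd.DataFrame({
    "Reported Road": ["BANDERA RD", "N BANDERA RD", "BABCOCK RD", "BLANCO RD",
                      "WURZBACH RD", "SW MILITARY DR", "IH0010", None],
    "Crash Longitude": [-98.62, -98.63, -98.58, -98.50, -98.55, -98.52, -98.49, -98.48],
    "Crash Latitude": [29.50, 29.52, 29.53, 29.56, 29.52, 29.36, 29.45, 29.44],
})


def road_subset(df, keyword):
    """Return rows whose 'Reported Road' contains keyword, ignoring case.

    Hypothetical helper: rows with a missing road name are treated as non-matches.
    """
    mask = df["Reported Road"].str.contains(keyword, case=False, na=False, regex=False)
    return df[mask]


# One subset per road, keyed by the names used in the plotting code.
road_names = ["bandera", "babcock", "blanco", "wurzbach", "military"]
subsets = {name: road_subset(sample, name) for name in road_names}

for name, subset in subsets.items():
    print(name, len(subset))
```

A plain substring match also picks up prefixed variants of the same road, which may or may not be what you want for a given street.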
    # Plot non-highway accidents over the entire 2016 map: why is culebra so dangerous?
    # bandera, babcock, blanco, wurzbach and military are assumed to be
    # per-road subsets of df16 created earlier in the notebook
    plt.scatter(df16['Crash Longitude'], df16['Crash Latitude'], marker='o', alpha=0.1, s=.5)
    plt.scatter(bandera['Crash Longitude'], bandera['Crash Latitude'], c='r', alpha=0.2, label='Bandera')
    plt.scatter(babcock['Crash Longitude'], babcock['Crash Latitude'], c='b', alpha=0.2, label='Babcock')
    plt.scatter(blanco['Crash Longitude'], blanco['Crash Latitude'], c='g', alpha=0.2, label='Blanco')
    plt.scatter(wurzbach['Crash Longitude'], wurzbach['Crash Latitude'], c='m', alpha=0.2, label='Wurzbach')
    plt.scatter(military['Crash Longitude'], military['Crash Latitude'], c='y', alpha=0.2, label='Military')
    plt.legend()  # show which color belongs to which road
    plt.show()
You can now see the accidents that occurred on the most accident-prone non-highway roads in San Antonio. The most glaring insight I gathered from this plot is that the most dangerous roads all lie in the N/NW area of San Antonio! I wonder if anything is being done to address the infrastructure, and therefore the safety, of San Antonio residents in these areas? Could this be because of large population increases, construction, or something else? More research is needed to test any of these hypotheses, but the dataset does give us a dimension for the crash reason (the "Crash Contributing Factor List" column). Let's see what the top reasons are for accidents across all SA roads.
    result['Crash Contributing Factor List'].value_counts()[:10].plot(kind='barh')
Driver inattention is a pretty broad reason, but it dominates the list of crash contributing factors.
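To put a number on "dominates", the counts can be turned into shares of all crashes. This is only a sketch: the factor labels and rows below are made up, and the single real name used is the `Crash Contributing Factor List` column from the code above.

```python
import pandas as pd

# Made-up sample. The factor labels are illustrative and may not match the
# wording used in the real dataset.
col = "Crash Contributing Factor List"
sample = pd.DataFrame({
    col: ["DRIVER INATTENTION"] * 5
    + ["FAILED TO CONTROL SPEED"] * 3
    + ["FAILED TO YIELD RIGHT OF WAY"] * 2
})

# normalize=True returns fractions of all non-missing rows instead of raw counts;
# multiplying by 100 turns them into percentages.
shares = (sample[col].value_counts(normalize=True) * 100).round(1).head(10)
print(shares)
```

Reading the chart as percentages makes it easier to compare years with different crash totals.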
For now, it looks like my fellow Northwest/Northsiders need to be extra cautious when driving around! Many hypotheses were presented in this article, and future posts will dive a bit deeper into the following topics:
- a better understanding of "driver inattention"
- alcohol-related accidents and the effects of ride-sharing programs
- creating a model to determine the likelihood of a car accident in San Antonio depending upon your travel route
Let me know if there is anything else you'd like me to focus on in future posts by commenting below! Feel free to grab the data yourself and explore - it can be pulled from this GitHub repository.