Chapter 7 Conclusion

Our exploratory analysis produced several findings that could be useful in practice. For example, we identified some of the most accident-prone intersections, where the authorities could take additional measures to improve safety. We also saw that arterial slow zones in Manhattan may have led to fewer deaths from accidents, although there is considerable room for improvement, since the number of people injured went up. We also found patterns by time of day, month, and year that could inform the planning and design of infrastructure and policies, and hopefully help decrease the number of accidents in NYC.

The Vision Zero Plan (introduced in 2016) emphasizes “Arterial Slow Zones”, which lower speed limits on specific roadways to reduce avoidable accidents. The plan appeared to be effective when it was first implemented in 2016, significantly reducing accidents compared with previous years, but its overall effectiveness seems to be fading, as accidents have risen consistently since then. The findings of this analysis could be used to check the effectiveness of the safety measures adopted under Vision Zero.

Finally, we know that this dataset is very popular and that many analyses of it are available online. The challenges involved, such as the sheer number of variables and observations, are what drove us to attempt it ourselves. This is by no means a complete analysis, but we believe it is a good start that could turn into something worthwhile if explored more deeply, as we tried to do in the last part of our analysis.

7.1 Limitations:

The analysis is only as good as the data we have, which in turn is only as good as the records in the Traffic Accident Management System maintained by the NYPD. The most important issue is that the location data is approximate: each crash is mapped to the nearest intersection, so the recorded coordinates may not represent the actual location of the crash. This needs to be kept in mind when interpreting results such as the most dangerous intersection or the most dangerous street, and all of these findings must be taken with a pinch of salt. Another challenge is that a considerable share of the records (15%) is not geocoded, so the exact location of those crashes is unknown. This skews our statistical analysis, and the gap may widen over time. When a collision happens in the middle of a street between two intersections, the choice of which intersection it is mapped to depends entirely on the NYPD officer who reported the accident, and no information on any agreed tie-breaker has been provided. Many records have an unspecified contributing factor, which limits how well we can identify the causes of the majority of collisions. We also found that many accidents happen on highways and bridges connecting boroughs, and the rule for deciding which borough is reported for an accident midway between them is unknown.
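As a rough illustration of how the share of non-geocoded records could be checked, the sketch below counts rows whose coordinates are missing or zero. The column names `LATITUDE` and `LONGITUDE`, the decision to treat (0, 0) as missing, and the four sample rows are all assumptions made for this example, not details taken from our analysis.

```python
import csv
import io


def is_geocoded(row):
    """Return True if the row has usable, non-zero coordinates."""
    try:
        lat = float(row.get("LATITUDE") or "")
        lon = float(row.get("LONGITUDE") or "")
    except ValueError:
        # Empty or non-numeric coordinates count as not geocoded.
        return False
    # Assumption: a (0, 0) coordinate is a placeholder, not a real location.
    return lat != 0.0 and lon != 0.0


def share_not_geocoded(rows):
    """Fraction of rows (0.0 to 1.0) without usable coordinates."""
    rows = list(rows)
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if not is_geocoded(row))
    return missing / len(rows)


# Hypothetical sample data, used only to show the function in action.
SAMPLE_CSV = """COLLISION_ID,LATITUDE,LONGITUDE
1,40.7580,-73.9855
2,,
3,0,0
4,40.6782,-73.9442
"""

sample_rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
share = share_not_geocoded(sample_rows)
print(f"{share:.0%} of sample records are not geocoded")
```

Running the same function per year or per borough would show whether the missing share is concentrated in particular periods or areas, which matters for how much it skews the results.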

7.2 Future Work:

There is a lot that could be done as future work. We could combine the collision data with crime-related datasets to explore the causes of accidents and check for any correlation. We could include weather data to analyze the effects of rainfall and snow on traffic accidents. We could also separate out each policy within Vision Zero, such as the installation of new speed cameras, and correlate it with the number of violations and the accidents resulting from those violations. In short, there are many ways to explore this dataset further, each of which requires a significant investment of time and effort in research.
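To give a sense of what the weather extension might look like, the sketch below joins daily collision counts with daily rainfall by date and computes a Pearson correlation. The function names, the dictionary layout, and every number in the sample data are made up for illustration; a real study would use actual weather records and would need to control for seasonality and traffic volume.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def join_by_date(collisions_per_day, rainfall_mm_per_day):
    """Pair up values for the dates present in both inputs (inner join)."""
    dates = sorted(set(collisions_per_day) & set(rainfall_mm_per_day))
    return [(collisions_per_day[d], rainfall_mm_per_day[d]) for d in dates]


# Made-up values, for illustration only.
collisions_per_day = {
    "2019-01-01": 510,
    "2019-01-02": 620,
    "2019-01-03": 580,
    "2019-01-04": 700,
}
rainfall_mm_per_day = {
    "2019-01-01": 0.0,
    "2019-01-02": 5.0,
    "2019-01-03": 2.5,
    "2019-01-05": 12.0,
}

pairs = join_by_date(collisions_per_day, rainfall_mm_per_day)
counts = [c for c, _ in pairs]
rain = [r for _, r in pairs]
r = pearson(counts, rain)
print(f"days matched: {len(pairs)}, correlation: {r:.2f}")
```

An inner join is used so that days missing from either source are dropped rather than filled in, which avoids inventing weather or collision values for unmatched dates.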