Chapter 3 Data transformation
The “ACCIDENT DATE” column is split into separate “DAY”, “MONTH” and “YEAR” columns so that each component can be accessed directly. The data set is then divided into subgroups, which are saved as JSON files for the interactive D3 plots at github/karth2512.
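The split-and-export step can be sketched as follows. This is a minimal illustration in standard-library Python, not the code used for the project: the MM/DD/YYYY date format, the “BOROUGH” column, the grouping by “YEAR”, and the helper names `split_accident_date` and `group_by` are all assumptions made for the example.

```python
import json
from collections import defaultdict
from datetime import datetime


def split_accident_date(row, date_format="%m/%d/%Y"):
    """Return a copy of `row` with DAY, MONTH and YEAR added.

    The date format is an assumption; adjust it to match the raw data.
    """
    parsed = datetime.strptime(row["ACCIDENT DATE"], date_format)
    return {**row, "DAY": parsed.day, "MONTH": parsed.month, "YEAR": parsed.year}


def group_by(rows, key):
    """Collect rows into a dict of lists, keyed by the value in column `key`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


# Tiny made-up sample standing in for the real data set.
rows = [
    {"ACCIDENT DATE": "01/15/2019", "BOROUGH": "BROOKLYN"},
    {"ACCIDENT DATE": "07/04/2019", "BOROUGH": "QUEENS"},
    {"ACCIDENT DATE": "03/22/2020", "BOROUGH": "BRONX"},
]

split_rows = [split_accident_date(r) for r in rows]
by_year = group_by(split_rows, "YEAR")

# One JSON string per subgroup; each could be written to its own file for D3.
by_year_json = {year: json.dumps(group) for year, group in by_year.items()}
```

Each value in `by_year_json` is an array of records, a shape that D3 can load directly.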
High-level spatial mapping in Plotly
Data Cleaning Steps:
- Removing anomalies in Latitude and Longitude values.
- Converting the categorical columns into factors.
- Converting the Date column into Date class.
- Saving transformed data frame as csv.
Some records have latitude and longitude values that fall outside the state of New York but are still part of this dataset. Since our analysis concerns only NYC, we drop these records by cross-checking each latitude-longitude pair against the boundaries of NYC.
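One simple way to perform this check is a bounding-box filter, sketched below in standard-library Python. The box limits are approximate values assumed for illustration, the “LATITUDE” and “LONGITUDE” column names are assumptions, and a rectangular box is a coarser test than a comparison against the true city boundary.

```python
# Approximate bounding box for NYC (assumed values, not an official boundary).
NYC_BOUNDS = {
    "lat_min": 40.4774,
    "lat_max": 40.9176,
    "lon_min": -74.2591,
    "lon_max": -73.7004,
}


def in_nyc_bounds(lat, lon, bounds=NYC_BOUNDS):
    """Return True if (lat, lon) lies inside the bounding box.

    Missing coordinates (None) are treated as outside the box.
    """
    if lat is None or lon is None:
        return False
    return (
        bounds["lat_min"] <= lat <= bounds["lat_max"]
        and bounds["lon_min"] <= lon <= bounds["lon_max"]
    )


# Made-up sample rows: one valid point, one (0, 0) anomaly, one missing value.
rows = [
    {"LATITUDE": 40.7580, "LONGITUDE": -73.9855},
    {"LATITUDE": 0.0, "LONGITUDE": 0.0},
    {"LATITUDE": None, "LONGITUDE": None},
]

cleaned = [r for r in rows if in_nyc_bounds(r["LATITUDE"], r["LONGITUDE"])]
```

A box test is cheap and removes obvious anomalies such as (0, 0) coordinates. Points that are just outside the city but inside the box would need a polygon check to be caught.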
We create a subset by sampling 50,000 rows from the original data. This is particularly useful when the original dataset is too large to work with comfortably.
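The sampling step might look like the sketch below, written in standard-library Python. It assumes uniform sampling without replacement and a fixed seed for reproducibility; neither detail is stated above, and the helper name `sample_rows` is hypothetical.

```python
import random


def sample_rows(rows, n=50_000, seed=42):
    """Return `n` rows drawn uniformly at random without replacement.

    If there are `n` rows or fewer, all rows are returned unchanged.
    The fixed seed (an assumption) makes the subset reproducible.
    """
    if len(rows) <= n:
        return list(rows)
    rng = random.Random(seed)
    return rng.sample(rows, n)


# Stand-in for the full data set: integers in place of real records.
full_data = list(range(200_000))
subset = sample_rows(full_data)
```

Fixing the seed means the same subset is produced on every run, which keeps downstream plots stable between sessions.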