Chapter 3 Data transformation

The “ACCIDENT DATE” column has been split into separate “DAY”, “MONTH” and “YEAR” columns so that each component can be accessed directly. The data set is then divided into subgroups, which are saved as JSON files for the interactive D3 plots at github/karth2512.
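The split-and-export step described above can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the project's actual code. It assumes dates are written as MM/DD/YYYY, and the "BOROUGH" field, the sample rows and the choice of grouping by year are hypothetical.

```python
import json
from collections import defaultdict
from datetime import datetime


def split_accident_date(row, fmt="%m/%d/%Y"):
    """Return a copy of row with DAY, MONTH and YEAR fields added.

    The MM/DD/YYYY format is an assumption; adjust fmt to match the data.
    """
    parsed = datetime.strptime(row["ACCIDENT DATE"], fmt)
    out = dict(row)
    out["DAY"] = parsed.day
    out["MONTH"] = parsed.month
    out["YEAR"] = parsed.year
    return out


def group_by(rows, key):
    """Collect rows into subgroups keyed by the value of one field."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


# Hypothetical sample rows, for illustration only.
rows = [
    {"ACCIDENT DATE": "03/14/2018", "BOROUGH": "QUEENS"},
    {"ACCIDENT DATE": "11/02/2019", "BOROUGH": "BRONX"},
]

split_rows = [split_accident_date(r) for r in rows]
by_year = group_by(split_rows, "YEAR")

# Each subgroup can be serialised to JSON for a D3 plot to load.
payload = json.dumps(by_year[2018])
```

Writing one JSON payload per subgroup keeps each file small, which helps when a browser-side plot only needs one slice of the data at a time.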

High-level spatial mapping in Plotly

Data cleaning steps:
- Removing anomalies in latitude and longitude values.
- Converting the categorical columns into factors.
- Converting the date column into the Date class.
- Saving the transformed data frame as a CSV file.
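As a rough sketch of the type-conversion and export steps listed above, the following Python snippet parses a date string into a date object, normalises a categorical label, and writes the result as CSV text. The column names "DATE" and "BOROUGH", the MM/DD/YYYY format and the sample row are assumptions made for illustration. Python's standard library has no direct equivalent of a factor, so the label is simply normalised to one consistent spelling.

```python
import csv
import io
from datetime import datetime


def clean_row(row, date_fmt="%m/%d/%Y"):
    """Return a copy of row with a parsed date and a normalised category."""
    out = dict(row)
    # Convert the date string into a date object (format is an assumption).
    out["DATE"] = datetime.strptime(row["DATE"], date_fmt).date()
    # Normalise the categorical label so each level has a single spelling.
    out["BOROUGH"] = row["BOROUGH"].strip().upper()
    return out


def to_csv(rows):
    """Serialise a non-empty list of dict rows to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical input row, for illustration only.
raw = [{"DATE": "03/14/2018", "BOROUGH": " queens "}]
cleaned = [clean_row(r) for r in raw]
csv_text = to_csv(cleaned)
```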

Some records have latitude and longitude values that fall outside the state of New York, yet they still appear in this dataset. Since our analysis covers only NYC, we drop these records by cross-checking each latitude-longitude pair against the boundaries of NYC.
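One simple way to approximate this check is a rectangular bounding box, sketched below in Python. This is a simplification and not necessarily the method used here: a box includes some area outside the city limits, and an exact check would test points against the city's boundary polygon. The bounds are approximate values chosen for illustration.

```python
# Approximate bounding box around NYC (illustrative values, not official limits).
NYC_BOUNDS = {
    "lat_min": 40.47,
    "lat_max": 40.92,
    "lon_min": -74.26,
    "lon_max": -73.70,
}


def in_nyc_box(lat, lon, bounds=NYC_BOUNDS):
    """Return True if the point lies inside the bounding box.

    Missing coordinates (None) are treated as outside the box.
    """
    if lat is None or lon is None:
        return False
    return (
        bounds["lat_min"] <= lat <= bounds["lat_max"]
        and bounds["lon_min"] <= lon <= bounds["lon_max"]
    )


# Hypothetical points, for illustration only.
points = [
    (40.7580, -73.9855),  # Midtown Manhattan
    (42.6526, -73.7562),  # Albany, well north of the city
    (0.0, 0.0),           # a common placeholder for missing coordinates
    (None, None),         # missing values
]

kept = [p for p in points if in_nyc_box(*p)]
```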

We create a subset by sampling 50,000 rows from the original data. This is especially handy when the original dataset is too unwieldy to work with directly.
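A minimal sketch of this kind of sampling, in Python, is shown below. It assumes sampling without replacement and a fixed seed for reproducibility; neither detail is stated above, and the stand-in data is hypothetical.

```python
import random


def sample_rows(rows, n=50_000, seed=42):
    """Draw up to n rows at random, without replacement.

    A fixed seed makes the subset reproducible. If the input holds fewer
    than n rows, all of them are returned in shuffled order.
    """
    rng = random.Random(seed)
    k = min(n, len(rows))
    return rng.sample(rows, k)


# Stand-in for the full dataset: a list of row indices.
rows = list(range(200_000))
subset = sample_rows(rows)
```

Sampling row indices rather than full records, as in this stand-in, can also be a cheap way to select rows before loading them.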