#delimit ;
clear;
set memory 1g;
set more off;
version 10;
capture log close;
log using dataIssues.log, replace;

* ###############################################################################################;
* script outlines data issues;
* ###############################################################################################;

*************************************************************************************************;
* Part 1: build data set;
*************************************************************************************************;

* load DG data set;
use DATA1, clear;
drop sst* statfips state;

* merge in the other weather data sets we replicate;
sort fips year;
merge fips year using dataPanelNew, keep(dday8_32dm dday8_32dmAvg d_dday8_32dmH2);
drop _merge;

* note: convert replicated degree days from Celsius to Fahrenheit;
* (degree days are temperature differences, so the factor is 1.8 with no offset);
replace dday8_32dm     = dday8_32dm*1.8;
replace dday8_32dmAvg  = dday8_32dmAvg*1.8;
replace d_dday8_32dmH2 = d_dday8_32dmH2*1.8;

* keep a copy of the merged panel so that later parts can start from it;
preserve;

*************************************************************************************************;
* Part 2: analysis;
*************************************************************************************************;

* part 2a: climate change predictions and climate baseline in DG;
gen DGclimateChange = (dd89_h2_long - dd89_7000);

* replication of the Hadley II model;
rename d_dday8_32dmH2 climateChangeRep;

* collapse to one observation per county;
collapse (mean) DGclimateChange climateChangeRep dd89_7000 dd89 dday8_32dmAvg dday8_32dm, by(fips);

* number of counties with baseline climate of zero;
gen byte climateZero = 1 if (dd89_7000 == 0);
tab climateZero;
drop climateZero;

* baseline climate for counties in Texas;
* Caldwell county;
list dd89_7000 if (fips == 48055);
* Fayette county;
list dd89_7000 if (fips == 48149);

* minimum and maximum climate change predictions;
sort DGclimateChange;
display "min temperature change of " DGclimateChange[1]  " for fips code " fips[1];
display "max temperature change of " DGclimateChange[_N] " for fips code " fips[_N];

* climate change predictions for counties in California;
* Fresno county;
list DGclimateChange if (fips == 6019);
* Kings county;
list DGclimateChange if (fips == 6031);
* Tulare county;
list DGclimateChange if (fips == 6107);

* replication of same variables;
* Fresno county;
list climateChangeRep if (fips == 6019);
* Kings county;
list climateChangeRep if (fips == 6031);
* Tulare county;
list climateChangeRep if (fips == 6107);

* save climate change predictions;
sort fips;
outsheet fips DGclimateChange climateChangeRep dd89_7000 dday8_32dmAvg
    using DGclimateChangePredictions.csv, comma replace;

*************************************************************************************************;
* part 2b: correlation between various weather variables in DG;

* correlation in DG;
correlate dd89 dd89_7000;
* correlation in replication of their variable;
correlate dday8_32dm dday8_32dmAvg;

* drop observations where value in DG is zero;
drop if (dd89_7000 == 0) | (dd89 == 0);

* correlation in DG;
correlate dd89 dd89_7000;
* correlation in replication of their variable;
correlate dday8_32dm dday8_32dmAvg;

*************************************************************************************************;
* part 2c: compare average and standard deviation to our replication;
restore;
preserve;

* make data sets consistent by limiting them to observations that have nonmissing values for both;
* (missing values sort above all numbers, so ">= ." catches every missing code);
drop if (dd89 >= .) | (dday8_32dm >= .);
summ dd89 dday8_32dm;

* derive year-to-year variation for each county;
* DG's weather variable;
bysort fips: egen dd89_std = sd(dd89);
* our replication of their weather variable;
bysort fips: egen dday8_32dm_std = sd(dday8_32dm);
summ dd89_std dday8_32dm_std;

*************************************************************************************************;
* part 2d: number of counties in various states;
restore;
keep fips;
duplicates drop;

* state FIPS code = county FIPS code with the last three digits removed;
gen state = floor(fips/1000);
collapse (count) counties = fips, by(state);

* counties in Iowa;
list if (state == 19);
* counties in Nevada;
list if (state == 32);

log close;