/*SHORT TITLE: How to process another state file */

Background: The files that come from ICPSR split each of record types 1 through 4 into five sections and record type 5 into two sections. Unfortunately, only the first section of each record type carries a record identifier; the sections that follow it do not (there is nothing like 11, 12, 13, 14, 15, 21, 22, 23, 24, 25, 31, 32, etc.). This wouldn't be a problem if every case had a complete sequence of record types 1 to 5, but not every case does. Here are the record counts:

   record type 1: 3,921 records
   record type 2: 3,917 records
   record type 3: 2,931 records
   record type 4: 3,232 records
   record type 5: 3,932 records

1. To solve the problem of missing record identifiers in continuation sections, the sections of each record type were joined into one long record, and a case id was added in columns 1-5 to identify the groups of records that belong together. This is done with a Perl program, join.pl. To run it on a new state file from ICPSR, just change "ny" to the new state code.

2. Once you have the file stf4a-70p-[state]-new.dat, you can run the SPSS program stf4a-70p.sps to create an SPSS dataset of all the variables. Again, change the state code in the program. We're using SPSS because (1) it can read the Fortran input statement that came with the codebook and (2) it can handle "grouped" data, where not all records are present for a case. (That's why we need the caseid variable.) All the variables are read because it's too hard to figure out their individual positions from a Fortran input statement that looks like this:

   (F5.0,F1.0,A2,3X,A2,3X,A1,2X,A3,1X,A4,2X,A4,A2,6X,A1,11X,
    A1,A4,1X,A4,A2,A4,2A1,2A3,
    A2,A4,18X,A2,A1,3X,17x,A1,
    7F16.0,8X,7F16.0,8X,7F16.0,8X,7F16.0,8X,
    7F16.0,8X,7F16.0,8X,7F16.0,8X,7F16.0,8X,
    4F16.0,112F8.0,255F8.0,255F8.0,255F8.0,241F8.0,112X)

3. Taking a subsample. See the program get-smsa.sps for an example.
This particular example picks only SMSAs (which covers most of the tracts, since nearly all the areas that had tract-level data in 1970 were in SMSAs) and a subset of tables. It also constructs a variable, "geoid", that is the concatenation of state, county, tract, and tract suffix. This is useful if you are going to merge the population data with the housing data.

4. Note on coding for suppressed data. If data for a particular table are suppressed, the first cell of the table has a code of -1. This is a pain. To get rid of it, add RECODE statements to the program for the variables you pick for your subset, e.g.:

   do if (B17_1 eq -1).
   recode B17_1 to B17_54 (-1 = SYSMIS) (else = SYSMIS).
   end if.

   See get-smsa.sps for more examples.

Questions to Sue Zayac, sue@columbia.edu
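The record-joining idea in step 1 can be sketched in Python as follows. This is not join.pl. It is a minimal illustration that rests on several hypothetical assumptions:

- the record type (1-5) sits in column 1 of the first section of each record;
- types 1-4 have five sections and type 5 has two;
- a case's records always appear in increasing type order, so a record type that does not increase marks the start of a new case.

The case id is written as five zero-padded digits so that it fills columns 1-5.

```python
# Sketch of the step 1 joining idea (NOT the actual join.pl).
# Hypothetical assumptions, to be checked against the codebook:
#   * the record type (1-5) is in column 1 of the first section of a record;
#   * record types 1-4 have 5 sections, record type 5 has 2;
#   * a case's records appear in increasing type order, so a type that is
#     not greater than the previous one starts a new case.
from typing import Iterable, Iterator

SECTIONS = {1: 5, 2: 5, 3: 5, 4: 5, 5: 2}


def join_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield one joined record per record type, prefixed with a 5-digit case id."""
    sections = (line.rstrip("\n") for line in lines)
    case_id = 0
    prev_type = None
    for first in sections:
        rtype = int(first[0])  # hypothetical position of the record type
        if prev_type is None or rtype <= prev_type:
            case_id += 1  # record type did not increase: a new case begins
        prev_type = rtype
        parts = [first]
        for _ in range(SECTIONS[rtype] - 1):
            nxt = next(sections, None)  # continuation sections carry no identifier
            if nxt is None:
                raise ValueError(f"case {case_id}: record type {rtype} is truncated")
            parts.append(nxt)
        yield f"{case_id:05d}" + "".join(parts)


sample = ["1aaa", "b", "c", "d", "e", "5xx", "y", "2ppp", "q", "r", "s", "t"]
for rec in join_records(sample):
    print(rec)  # 000011aaabcde, then 000015xxy, then 000022pppqrst
```

The ordering rule has a blind spot. It cannot detect a boundary when one case ends on a low record type and the next case starts on a higher one. For example, a case with only a type 2 record, followed by a case whose first record is type 3, would be merged into one case. If the first section of each record carries geographic codes, comparing those codes would be a more reliable way to decide when a new case begins.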
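The Fortran input statement in step 2 is hard to read by eye, but the column position of each field can be worked out mechanically. The Python sketch below, which is not part of the SPSS workflow, expands such a format into 1-based, inclusive column ranges.

It handles only the descriptors that statement actually uses:

- Fw.d and Aw fields, with optional repeat counts such as 7F16.0 or 2A1;
- nX skips.

Nested parenthesized groups and other edit descriptors are deliberately left out. Passing them in raises an error.

```python
# Sketch: expand a FORTRAN format into 1-based column positions.
# Handles only nFw.d, nAw and nX (repeat counts optional), which is an
# assumption matching the STF4A statement; nested groups, T, I and "/"
# are NOT supported and raise ValueError.
import re
from typing import List, Tuple

TOKEN = re.compile(r"^(\d*)([AFX])(\d*)(?:\.(\d+))?$", re.IGNORECASE)


def field_columns(fmt: str) -> List[Tuple[str, int, int]]:
    """Return (descriptor, first_col, last_col) for each field read by fmt."""
    body = re.sub(r"\s+", "", fmt).strip("()")
    col = 1
    fields = []
    for tok in body.split(","):
        if not tok:
            continue
        m = TOKEN.match(tok)
        if m is None:
            raise ValueError(f"unsupported descriptor: {tok}")
        count, code, width, decimals = m.groups()
        repeat = int(count) if count else 1
        if code.upper() == "X":
            col += repeat  # nX skips n columns; no field is read
            continue
        if not width:
            raise ValueError(f"missing width in: {tok}")
        w = int(width)
        desc = code.upper() + width + (f".{decimals}" if decimals is not None else "")
        for _ in range(repeat):  # a repeat count yields one field per repetition
            fields.append((desc, col, col + w - 1))
            col += w
    return fields


for desc, first, last in field_columns("(F5.0,F1.0,A2,3X,A2,2A1)"):
    print(f"{desc:6} columns {first}-{last}")
```

In this example, 3X advances the column pointer without producing a field. That is why the second A2 starts in column 12 rather than column 9.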
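The "geoid" variable from step 3 and its use as a merge key can be illustrated with a small Python sketch. It assumes the four geographic codes were read as character fields, so that concatenating them keeps leading zeros. The example codes and their widths (2-digit state, 3-digit county, 4-digit tract, 2-digit suffix) are hypothetical, as are the toy rows used for the merge. The merge shown is an inner join, so tracts present in only one data set are dropped.

```python
# Sketch for step 3: build "geoid" and use it to merge two data sets.
# Hypothetical assumptions: the geographic codes are strings (read as A
# fields) and each data set is a list of dicts keyed by variable name.
from typing import Any, Dict, List


def make_geoid(state: str, county: str, tract: str, suffix: str) -> str:
    """Concatenate state, county, tract and tract suffix."""
    # Keeping the codes as strings preserves leading zeros such as "0067".
    return state + county + tract + suffix


def merge_by_geoid(pop: List[Dict[str, Any]], housing: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Inner-join population and housing rows on geoid."""
    by_geoid = {row["geoid"]: row for row in housing}
    return [{**p, **by_geoid[p["geoid"]]} for p in pop if p["geoid"] in by_geoid]


gid = make_geoid("12", "345", "0067", "01")  # hypothetical codes
print(gid)  # 12345006701
```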
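The suppression rule in step 4 can be mirrored outside SPSS, which may help when checking a subset by hand. In this Python sketch, None stands in for SPSS system-missing. A case is assumed, hypothetically, to be a dict keyed by names like B17_1 through B17_54. The sketch builds those names explicitly from the table prefix and cell count, whereas the SPSS "TO" keyword relies on the variables being consecutive in the file.

```python
# Sketch for step 4: blank a whole table when its first cell is -1.
# Hypothetical assumptions: a case is a dict keyed by SPSS-style names
# (B17_1 ... B17_54) and None stands in for SPSS system-missing (SYSMIS).
from typing import Any, Dict


def blank_suppressed(case: Dict[str, Any], table: str, ncells: int) -> Dict[str, Any]:
    """If the first cell of `table` is -1, set every cell of that table to None."""
    names = [f"{table}_{i}" for i in range(1, ncells + 1)]
    if case.get(names[0]) != -1:
        return case  # table not suppressed: leave the case unchanged
    cleaned = dict(case)  # copy so the caller's dict is not modified
    for name in names:
        cleaned[name] = None
    return cleaned


row = {"B17_1": -1, "B17_2": 0, "B17_3": 0, "TRACT": "0067"}
print(blank_suppressed(row, "B17", 3))
# {'B17_1': None, 'B17_2': None, 'B17_3': None, 'TRACT': '0067'}
```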