/*SHORT TITLE:  How to process another state file */

Background:

In the files that come from ICPSR, each of record types 1 through 4 is
broken into five sections, and record type 5 is broken into two.
Unfortunately, there are no record identifiers in the sections that follow
the first section of each record type, e.g., there isn't something like
11, 12, 13, 14, 15, 21, 22, 23, 24, 25, 31, 32, etc.  This wouldn't be a
problem except that a case does not always have a complete sequence of
records numbered 1 to 5.  Here are the record counts:

	record type 1 has  3,921 records
	record type 2 has  3,917 records
	record type 3 has  2,931 records
	record type 4 has  3,232 records
	record type 5 has  3,932 records

1.  In order to solve the problem of no record identifiers in continuation
records, the records for each record type were joined together into one
long record and a case id was added in columns 1-5 to identify groups of
records that belong together.  This is done with a Perl program, join.pl.
To run it on a new state file from ICPSR, just change "ny" in the program
to the new state code.
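
As a rough illustration of what the joining step involves (this is not
join.pl itself), here is a minimal Python sketch.  It assumes the record
type digit is in column 1 of the first section of each record, and that a
new case starts whenever the record type fails to increase.  Both are
assumptions made for illustration, as are the names SECTIONS and
join_records.

```python
# Sections per record type, taken from the Background above.
SECTIONS = {"1": 5, "2": 5, "3": 5, "4": 5, "5": 2}

def join_records(lines, type_col=0):
    """Join the sections of each record and prefix a 5-digit case id.

    Assumptions (not taken from join.pl): the record type digit sits at
    offset type_col of the first section, and a new case begins whenever
    the record type does not increase.
    """
    out = []
    case_id = 0
    prev_type = None
    it = iter(lines)
    for first in it:
        first = first.rstrip("\r\n")
        rtype = first[type_col]
        parts = [first]
        # Pull in the continuation sections that carry no identifier.
        for _ in range(SECTIONS[rtype] - 1):
            parts.append(next(it).rstrip("\r\n"))
        if prev_type is None or rtype <= prev_type:
            case_id += 1
        prev_type = rtype
        out.append(f"{case_id:05d}" + "".join(parts))
    return out
```

Because the case id goes in front of each joined record, a case that is
missing, say, record types 3 and 4 still keeps its remaining records
grouped together.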

2.  Once you have the file stf4a-70p-[state]-new.dat, you can run the SPSS
program, stf4a-70p.sps, to create an SPSS dataset of all the variables.
Again, change the state code in the program.  The reason we're using SPSS
is (1) it can read the Fortran input statement that came with the codebook
and (2) it can handle "grouped" data where not all records are present for
a case.  (That's why we need the caseid variable.)  All the variables
are read because it's too hard to figure out their individual positions
when you are working with Fortran input statements that look like this:

 (F5.0,F1.0,A2,3X,A2,3X,A1,2X,A3,1X,A4,2X,A4,A2,6X,A1,11X,
   A1,A4,1X,A4,A2,A4,2A1,2A3,
   A2,A4,18X,A2,A1,3X,17x,A1,
   7F16.0,8X,7F16.0,8X,7F16.0,8X,7F16.0,8X,
   7F16.0,8X,7F16.0,8X,7F16.0,8X,7F16.0,8X,
   4F16.0,112F8.0,255F8.0,255F8.0,255F8.0,241F8.0,112X).
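
If you ever do need the column positions, they can be worked out
mechanically.  The following is a minimal Python sketch, not part of the
original programs: the function name field_positions is made up for
illustration, and it only understands flat lists of A, F, and X
descriptors like the one above (no nested parenthesized groups).

```python
import re

# Optional repeat count, descriptor letter, optional width, optional decimals.
TOKEN = re.compile(r"(\d*)([AFX])(\d*)(?:\.(\d+))?", re.IGNORECASE)

def field_positions(fmt):
    """Return (kind, start_col, end_col) per A/F field; columns are 1-based."""
    fields = []
    col = 1
    for item in fmt.strip().strip("().").split(","):
        item = item.strip()
        if not item:
            continue
        m = TOKEN.fullmatch(item)
        if m is None:
            raise ValueError(f"unsupported descriptor: {item!r}")
        first, kind, second, _decimals = m.groups()
        kind = kind.upper()
        if kind == "X":
            # nX skips n columns; the count comes before the letter.
            col += int(first or 1)
            continue
        repeat = int(first or 1)
        width = int(second or 1)
        for _ in range(repeat):
            fields.append((kind, col, col + width - 1))
            col += width
    return fields
```

For the statement above, the first four fields come out as columns 1-5,
6, 7-8, and 12-13.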

3.  Taking a subsample.  See the program get-smsa.sps for an example.
This particular example picks only SMSAs (which covers most of the tracts,
since most of the areas that had tract-level data in 1970 were in SMSAs)
and a subset of tables.

It also constructs a variable, "geoid",  that is the concatenation
of state, county, tract, and tract suffix.  This is useful if you
are going to merge the population data with the housing data.
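
The idea of the key, sketched in Python rather than SPSS: pad each piece
to a fixed width and concatenate.  The widths used here (2, 3, 4, 2) and
the zero padding are assumptions for illustration only; check the codebook
and get-smsa.sps for the real ones.  The name make_geoid is made up.

```python
def make_geoid(state, county, tract, suffix=""):
    """Concatenate state, county, tract, and tract suffix into one key.

    The widths (2, 3, 4, 2) and zero padding are illustrative assumptions.
    """
    pieces = ((state, 2), (county, 3), (tract, 4), (suffix, 2))
    return "".join(str(value).strip().zfill(width) for value, width in pieces)
```

Whatever widths you use, the population and housing files must build the
key the same way or the merge will not match.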


4.  Note on coding for suppressed data.  If data for a particular
table is suppressed, the first table cell has a -1 code.  This is
a pain.  To get rid of this, add RECODE statements to the program
for the variables you pick for your subset.  For example, the
following sets B17_1 through B17_54 to system-missing whenever
B17_1 is -1:

	do if (B17_1 eq -1).
	recode B17_1 to B17_54 (-1 = SYSMIS) (else = SYSMIS).
	end if.

See get-smsa.sps for examples.
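
If you end up handling the data outside SPSS, the same rule is easy to
restate.  This is a minimal Python sketch of the logic only; the function
name blank_suppressed is made up, and None stands in for SPSS's
system-missing value.

```python
def blank_suppressed(cells, flag=-1):
    """Blank out a whole table when its first cell carries the flag.

    cells holds one table's values for one case, in table order.
    """
    if cells and cells[0] == flag:
        return [None] * len(cells)
    return list(cells)
```

Only the first cell is tested, so a -1 that appears anywhere else in the
table is left alone.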


Questions to Sue Zayac  
sue@columbia.edu