EDS has an NCHS CD-ROM collection, which includes a number of titles that can be accessed by using NCHS's Statistical Export and Tabulation System (SETS) software. This technical note on the SETS User Interface will guide you through the process of extracting a subset of the NHIS data from the CD-ROM for your analysis.
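The export procedure involves four major steps: opening the data set, browsing the documentation, exporting the data (defining a subset), and editing the generated SAS or SPSS program.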
Getting Started
SETS is installed in the EDS on the stand-alone machine. Ask staff for the data disc for the survey you want to use.
To start SETS 2.0:
- Right click the Start Menu on the bottom left hand corner of the screen.
- Select Explore.
- Go to Program Disk (D:).
- Double click the Program Files folder.
- Double click the NCHS folder.
- Double click the file Sets2.exe.
Note: the steps below use the SETS export function rather than extraction. Exporting gives finer control over records and fields, and fields are exported in a fixed format. Under the ASCII export, records are fixed length and delimited by a carriage return and line feed; under the database export, various binary formats are available. The export can produce ASCII with optional SAS or SPSS input statements (RECOMMENDED) or dBase .dbf format and, depending on the dataset, EpiInfo input (.qes) and label (.chk) files or Beyond 2020 source definition (.ivs) and label (.ivn) files.
I. Opening the Data Set
From here, select the year or dataset you want to use (e.g. NHIS 2000) to start up the SETS software. When SETS starts up, you will be presented with a blank screen:
Figure 2: The Start-up Screen

To begin, choose File...Open...Set. Double-click on My Computer, then on the drive letter (F:) and finally on the NCHS folder. You will be presented with a list of set files (file extension = .set).
Figure 3: Choosing a set file

Choose the appropriate file by double-clicking it.
A title page screen comes next.
Figure 4: SETS Title Page

Read it, then press any key to continue. The licensing agreement will appear next.
Figure 5: SETS Licensing Agreement

Press F10 or click to agree to the terms of the license.
You will now see a screen that shows the sub-directories in the selected set file: Documents, Data, and Search.
Figure 6: Sub-directories

II. Browsing Documentation
For the purpose of extracting a subset, we are mainly concerned with two procedures: Browsing Documentation and Exporting/Subsetting Data.
For our first step, browsing, you will need to look at the codebook in order to know the variable names, and, if necessary*, the column locations of the variables which interest you. Codebooks are a useful way to familiarize yourself with the data and even offer simple frequency information to aid in variable selection.
The codebook can be accessed by clicking on the + to the left of the Documents Folder and selecting the document corresponding to the desired dataset. Within the Documents folder, there will be a list of sub-directories associated with the set or sets you have chosen to look at. Codebooks are generally found in the sub-directory entitled Record Layout. If you do not immediately see 'Record Layout' as an option, you may have to look within one of the available document folders by clicking on the + to the left of the document of interest. Click on the words 'Record Layout' (NOT on the folder icon), and you will see the data dictionary. You will need to write down the variable names of the variables that you are going to export.
* You will only need to know column locations if you are bypassing the SETS software and directly reading the raw data file with software such as SAS or SPSS. This is not recommended.
Figure 7: Sub-directories

III. Exporting Data/Defining a Subset
Now that you have browsed the documentation for an NHIS set, you are ready to create a subset. A subset is a user-specified collection of files, variables and cases in the data set. It may contain one or more files, all or some of the observations or cases (records), and all or some of the variables (fields). This is the most important step for your data export, because you must define a subset before you can either export or tabulate the data in SETS.
To define your subset, you must issue the Export command. In order to do this, click on the + to the left of the Data folder. You will see a list of different data files:
Figure 8: Data Files

Now click on the name of the data file you are interested in. A pop-up menu of four choices will appear:
Browse Data
Tabulate Data
Export Data>
Extract Data
NOTE: We will export, rather than extract, data. Extraction, unlike the export function, yields records containing all fields in their original format. This is useful if you need access to all the variables but do not require all the observations in your analysis, allowing you to select a systematic sample (e.g. every tenth record). In general, we recommend that users export the data they need.
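To make the "every tenth record" idea concrete, here is a minimal Python sketch of that kind of selection. It is not part of SETS: the function name `every_nth` and the sample records are made up for illustration, and it assumes the extracted records are already available as a sequence.

```python
from itertools import islice

def every_nth(records, n=10, start=0):
    """Return an iterator over every n-th record, beginning at index `start`."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return islice(records, start, None, n)

# Made-up stand-ins for full-width records read from an extracted file.
records = [f"record-{i:03d}" for i in range(25)]
sample = list(every_nth(records, n=10))
print(sample)  # ['record-000', 'record-010', 'record-020']
```

Every record in the sample keeps all of its fields, which is the trade-off the note describes: fewer observations, but no loss of variables.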
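Exporting as opposed to extracting data has the following advantages:
- Finer controls over records and fields.
- Fields are exported in a fixed format.
- Records are in a fixed-length, carriage return/line feed delimited format under the ASCII export.
- Various binary formats are available under the database export.
You can export your data in one of the following formats:
- ASCII with optional SAS or SPSS input statements (RECOMMENDED)
- dBase .dbf format
If you choose to export your subset in dBase .dbf format, select DataBase under Export. While downloading in dBase format takes less time, you will not have the option of saving variable/value labels or generating a codebook.
In addition to SAS, SPSS, and dBase, you may have the option to generate source code for the following formats, depending on the dataset you are using:
- EpiInfo input file (.qes)
- EpiInfo labels (.chk)
- Beyond 2020 source definition file (.ivs)
- Beyond 2020 labels (.ivn)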
After choosing ASCII or DataBase, you will see a light grey screen for a few seconds, then a field list will appear.
(i) Selecting Fields (Variables)
You will now have the opportunity to choose the variables you would like in your subset. You must select variables before you export or tabulate the subset. The screen that appears will allow you to choose which fields you want to export.
Figure 9: Field Selection

You may check the box next to Sort in order to have the variables listed in alphabetical order. You may also search field labels or code labels.
Choose each variable that you wish to export by highlighting it with the mouse. You may select multiple variables by holding down the Control key while highlighting variables.
After you have selected all variables that you wish to export, choose Select at the bottom of the screen. A dialog box will appear:

Choose 'Yes' to begin selecting records if you need a subset.
Note: If a set is hierarchical, an additional dialog box will appear:

If you believe you want more than one file, select 'Yes,' otherwise choose 'No.'
(ii) Choosing Records (Cases)
When the following screen appears, choose 'Assist'.
Figure 10: Record Selection

You will then be taken to the next screen which will allow you to choose your records using a 'point and click' method.
It may be helpful to click on 'Save' after making your record selection, allowing you to recreate a duplicate subset at a later time as well as make any necessary modifications to your sample. If you have a previously saved record expression file, choose 'Load' to use this file. Otherwise click on Assist.
Now choose records based on your selection criteria:
Figure 11: Selection Criteria

Enter one or more field names and codes and select the desired operators and connectors. If you don't know the field names or codes (values), double-click any of the field name or code windows to get assistance in selecting these items. The field name will give you information on the variable name, whereas the code window will display how the answers were coded, i.e. the values for the variable (e.g. variable sex: male = 1; female = 2).
Press 'Refresh' (located in the upper lefthand corner of the window) to show subtotals and the total number of records selected as you enter criteria.
Press 'Accept' to create a query expression and return to the record selection window.
Figure 12: Query Expression

Queries can be saved so you can access the same cases again from the raw data if needed. To continue, press 'Select'.
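The logic of a query expression built from field names, operators, codes and connectors can be sketched in a few lines of Python. This is only an illustration of how such criteria combine, not how SETS evaluates them. The criteria mirror the expression recorded in the Appendix A program header (SEX='2' and RACE='06' and AGE<='38' and MARSTAT='6'); the records are made up, only the 'and' connector is handled, and codes are assumed to be zero-padded strings of equal width so that string comparison orders them correctly.

```python
import operator

# Comparison operators a record-selection criterion might use.
OPERATORS = {
    "=": operator.eq,
    "<>": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}

def matches(record, criteria):
    """Return True if the record satisfies every (field, op, code) criterion."""
    return all(OPERATORS[op](record[field], code) for field, op, code in criteria)

# Criteria joined by 'and', as in the Appendix A header comment.
criteria = [
    ("SEX", "=", "2"),
    ("RACE", "=", "06"),
    ("AGE", "<=", "38"),
    ("MARSTAT", "=", "6"),
]

# Made-up records; codes are kept as strings, as in the raw file.
records = [
    {"SEX": "2", "RACE": "06", "AGE": "25", "MARSTAT": "6"},
    {"SEX": "1", "RACE": "06", "AGE": "25", "MARSTAT": "6"},
    {"SEX": "2", "RACE": "06", "AGE": "40", "MARSTAT": "6"},
]
selected = [r for r in records if matches(r, criteria)]
print(len(selected), "of", len(records), "records selected")
```

Only the first made-up record meets all four criteria, which is the kind of running total the 'Refresh' button reports as you build the expression.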
After you accept the query and tell SETS to export, a dialog box will appear asking you if you would like to select a target and begin the export:

Choose 'Yes' to enter a destination for your export file (e.g. e:\work\).
Figure 13: Selecting a Destination for your Export

Click on the box in the lower lefthand corner to ask SETS to create a codebook for your data.
Figure 14: Selecting the Target

Click on 'Source Code Generation' to ask SETS to create input statements for the statistical package of your choice.
Figure 15: Source Code Generation

Finally, click on the 'Export' button to finish the export process.
A dialog box will appear telling you the number of files created and asking whether you would like to open them at this time. Rather than examining the export within SETS, we recommend opening it in the statistical package you plan to use, to confirm that the data are usable in the format you require.
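If you want a quick check of an ASCII export before loading it into a statistical package, the fixed-length, carriage return/line feed delimited layout makes that straightforward. The Python sketch below is one possible approach, not a SETS feature. The column locations are those of the DATA LIST in Appendix A; the record length of 7, the function names, and the sample records are assumptions made for illustration.

```python
from collections import Counter

# Column locations (1-based, inclusive), as in the Appendix A DATA LIST.
LAYOUT = {
    "EDUC": (1, 2),
    "INCFAM": (3, 4),
    "HEALTH": (5, 5),
    "HIV": (6, 6),
    "HIVRISK": (7, 7),
}
RECORD_LENGTH = 7  # assumed: the last column used by the layout above

def parse_record(line, layout=LAYOUT):
    """Slice one fixed-length record into a dict of field name -> code string."""
    return {name: line[start - 1:end] for name, (start, end) in layout.items()}

def check_export(text, record_length=RECORD_LENGTH):
    """Parse CR/LF-delimited records, raising if any record has the wrong length."""
    rows = []
    for number, line in enumerate(text.split("\r\n"), start=1):
        if line == "":
            continue  # tolerate the empty piece after the final CR/LF
        if len(line) != record_length:
            raise ValueError(
                f"record {number} has length {len(line)}, expected {record_length}"
            )
        rows.append(parse_record(line))
    return rows

# Made-up sample standing in for the contents of an export file.
sample = "1205112\r\n1623123\r\n0901214\r\n"
rows = check_export(sample)
health_counts = Counter(row["HEALTH"] for row in rows)
print(health_counts)
```

To run this against a real export, read the file with `open(path, newline="")` so that Python leaves the CR/LF delimiters untouched. The frequency count gives a first look at which codes actually occur in a field.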
IV. SAS & SPSS Program Edits
For those who have exported the subset in ASCII format with SAS and/or SPSS statements, additional help may be needed to edit the SAS or SPSS program in order to create a SAS or SPSS system file. The fixes are outlined below, starting with SAS.
SAS
An export generally produces two SAS programs. The first has the name you chose (e.g. nhis94.sas) and reads in the data; the second has the letter f appended to the name (e.g. nhis94f.sas) and creates the library of formats that the first program uses. You must run the format program first, defining the library to be a directory on your system, as in:
libname library 'e:\work\abc1';
Then, in the program that reads in the data, be sure the libname statement for library points to the same directory as in your proc format program. The program that SETS generates will contain the statements:
LIBNAME LIBRARY 'd:\directory';
/* change d to the drive, and directory to the directory you
want to save the permanent SAS database in. It must be the
same as where the format library will be saved */
DATA LIBRARY.???;
/* change ??? to the name of the desired created SAS
database */
Replace the libname statement with two libname statements: one that points to the format library (called library) and one that points to where you want to write the SAS data set (here called sasdata). Both can point to the same directory. Combined with your data statement, this will look like:
libname library 'e:\work\abc1';
libname sasdata 'e:\work\abc1';
data sasdata.nhis94;
etc.
If you have any more questions about the SAS fixes, please ask the EDS consultant.
SPSS
The problem with SPSS is that an unedited SPSS program may treat every variable as alphanumeric, even where that is not appropriate. Because many statistical functions do not operate on alphanumeric variables, this is potentially problematic. A simple fix is to review the syntax file and delete the (A) following the column locations of any variable that should be numeric. If you are not sure whether a variable has been assigned alphanumeric status inappropriately, refer to the codebook to determine what kind of information the variable contains, or run frequencies on the variables in question to see what codes appear. A pair of exported and edited SPSS programs is appended here for illustrative purposes. Please ask an EDS consultant for more help in converting an ASCII file into an SPSS or SAS system file.
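For a syntax file with many variables, deleting each (A) by hand is tedious, and the edit can be scripted. The Python sketch below is one way to do it, under stated assumptions: each variable sits on its own DATA LIST line in the form shown in Appendix A, and you supply the names of the variables you have confirmed should be numeric. The function name `drop_alpha_flags` is hypothetical. The sketch does not touch the VALUE LABELS section, where quoted codes also need editing, as the edited version in Appendix A shows.

```python
import re

# Matches a DATA LIST variable line such as "EDUC 1-2 (A)" or "HEALTH 5 (A)".
FIELD_LINE = re.compile(r"^(\s*\w+\s+\d+(?:-\d+)?)\s*\(A\)\s*$")

def drop_alpha_flags(syntax, numeric_vars):
    """Remove the (A) flag from DATA LIST lines of the named variables only."""
    wanted = {name.upper() for name in numeric_vars}
    out = []
    for line in syntax.splitlines():
        m = FIELD_LINE.match(line)
        if m and m.group(1).split()[0].upper() in wanted:
            line = m.group(1)  # keep the name and columns, drop the flag
        out.append(line)
    return "\n".join(out)

snippet = """DATA LIST FILE='demo.exp' /
EDUC 1-2 (A)
INCFAM 3-4 (A)
HEALTH 5 (A)
."""
edited = drop_alpha_flags(snippet, ["EDUC", "HEALTH"])
print(edited)
```

Naming the variables explicitly, rather than stripping every flag, leaves genuinely alphanumeric fields alone. Here INCFAM keeps its (A) because it was not listed.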
Appendix A: Comparison of Unedited and Edited SPSS Programs
(A) Unedited Version:
* Source file: AIDSKNOW.DAT.
* Records: SEX='2' and RACE='06' and AGE<='38' and
MARSTAT='6'.
DATA LIST FILE='demo.exp' /
EDUC 1-2 (A)
INCFAM 3-4 (A)
HEALTH 5 (A)
HIV 6 (A)
HIVRISK 7 (A)
.
VARIABLE LABELS
EDUC "EDUCATION OF INDIVIDUAL - COMPLETED YEARS"
INCFAM "FAMILY INCOME"
HEALTH "HEALTH STATUS"
HIV "HEARD AIDS VIRUS CALLED 'HIV'"
HIVRISK "CHANCES OF GETTING THE AIDS VIRUS ARE:"
.
VALUE LABELS
* Source file: AIDSKNOW.DAT.
EDUC '00' 'Never attended;...' * kindergarten only.
'01' 'Grade 1'
'02' 'Grade 2'
'03' 'Grade 3'
'04' 'Grade 4'
'05' 'Grade 5'
'06' 'Grade 6'
'07' 'Grade 7'
'08' 'Grade 8'
'09' 'Grade 9'
'10' 'Grade 10'
'11' 'Grade 11'
'12' 'Grade 12'
'13' '1 year college'
'14' '2 years college'
'15' '3 years college'
'16' '4 years college'
'17' '5 years college'
'18' '6 or more years...' * college.
'19' 'Unknown'
' ' 'Under 5 years of age'
/ INCFAM '00' 'Less than $1,000'
'01' '$ 1,000 - 1,999'
'02' '$ 2,000 - 2,999'
'03' '$ 3,000 - 3,999'
'04' '$ 4,000 - 4,999'
'05' '$ 5,000 - 5,999'
'06' '$ 6,000 - 6,999'
'07' '$ 7,000 - 7,999'
'08' '$ 8,000 - 8,999'
'09' '$ 9,000 - 9,999'
'10' '$10,000 - 10,999'
'11' '$11,000 - 11,999'
'12' '$12,000 - 12,999'
'13' '$13,000 - 13,999'
'14' '$14,000 - 14,999'
'15' '$15,000 - 15,999'
'16' '$16,000 - 16,999'
'17' '$17,000 - 17,999'
'18' '$18,000 - 18,999'
'19' '$19,000 - 19,999'
'20' '$20,000 - 24,999'
'21' '$25,000 - 29,999'
'22' '$30,000 - 34,999'
'23' '$35,000 - 39,999'
'24' '$40,000 - 44,999'
'25' '$45,000 - 49,999'
'26' '$50,000 and over'
'27' 'Unknown'
/ HEALTH '1' 'Excellent'
'2' 'Very Good'
'3' 'Good'
'4' 'Fair'
'5' 'Poor'
'6' 'Unknown'
/ HIV '1' 'Yes'
'2' 'No'
'3' 'It depends'
'8' 'No answer'
'9' 'Don't know/Refused'
/ HIVRISK '1' 'High (includes 26...' * who said they were HIV infected).
'2' 'Medium'
'3' 'Low'
'4' 'None'
'5' 'Already have...' * AIDS/AIDS virus.
'7' 'Refused'
'8' 'No answer'
'9' 'Don't know'
.
(B) Edited Version:
This version, among other things, drops the alphanumeric flags (A) from the DATA LIST statement, thereby reading the variables as numeric rather than character data. The values in the VALUE LABELS statement have no quotes either, since they now define numeric variables.
* Source file: AIDSKNOW.DAT.
* Records: SEX='2' and RACE='06' and AGE<='38' and
MARSTAT='6'.
DATA LIST FILE='e:\work\john\demo.exp' /
EDUC 1-2
INCFAM 3-4
HEALTH 5
HIV 6
HIVRISK 7
.
VARIABLE LABELS
EDUC "EDUCATION OF INDIVIDUAL - COMPLETED YEARS"
INCFAM "FAMILY INCOME"
HEALTH "HEALTH STATUS"
HIV "HEARD AIDS VIRUS CALLED 'HIV'"
HIVRISK "CHANCES OF GETTING THE AIDS VIRUS ARE:"
.
VALUE LABELS
EDUC 00 'Never attended;...'
01 'Grade 1'
02 'Grade 2'
03 'Grade 3'
04 'Grade 4'
05 'Grade 5'
06 'Grade 6'
07 'Grade 7'
08 'Grade 8'
09 'Grade 9'
10 'Grade 10'
11 'Grade 11'
12 'Grade 12'
13 '1 year college'
14 '2 years college'
15 '3 years college'
16 '4 years college'
17 '5 years college'
18 '6 or more years...'
19 'Unknown'
'Under 5 years of age'
/ INCFAM 00 'Less than $1,000'
01 '$ 1,000 - 1,999'
02 '$ 2,000 - 2,999'
03 '$ 3,000 - 3,999'
04 '$ 4,000 - 4,999'
05 '$ 5,000 - 5,999'
06 '$ 6,000 - 6,999'
07 '$ 7,000 - 7,999'
08 '$ 8,000 - 8,999'
09 '$ 9,000 - 9,999'
10 '$10,000 - 10,999'
11 '$11,000 - 11,999'
12 '$12,000 - 12,999'
13 '$13,000 - 13,999'
14 '$14,000 - 14,999'
15 '$15,000 - 15,999'
16 '$16,000 - 16,999'
17 '$17,000 - 17,999'
18 '$18,000 - 18,999'
19 '$19,000 - 19,999'
20 '$20,000 - 24,999'
21 '$25,000 - 29,999'
22 '$30,000 - 34,999'
23 '$35,000 - 39,999'
24 '$40,000 - 44,999'
25 '$45,000 - 49,999'
26 '$50,000 and over'
27 'Unknown'
/ HEALTH 1 'Excellent'
2 'Very Good'
3 'Good'
4 'Fair'
5 'Poor'
6 'Unknown'
/ HIV 1 'Yes'
2 'No'
3 'It depends'
8 'No answer'
9 'Don''t know/Refused'
/ HIVRISK 1 'High (includes 26...)'
2 'Medium'
3 'Low'
4 'None'
5 'Already have...'
7 'Refused'
8 'No answer'
9 'Don''t know'
.
Save outfile = 'e:\work\john\demo.sav'.
Execute.

