Abstract
This is a brief guide to the essentials you need to know to
write an SPSS program. Although this document was
written for those using SPSS on the Cunix cluster, most, if not all,
commands should work on other operating systems. See our brief
handout, SPSS on the Cunix Cluster, or the SPSS manual specific to your
particular operating system for instructions on
running an SPSS program.
Note to SPSS for Windows users: If you are using
SPSS for Windows, you will probably never need to write a program.
However, you may receive data with a syntax file, and this handout can
help you understand it. If you have trouble running that file on
your PC, please contact us at EDS. We can help.
An SPSS Program Has 4 Parts:
Note: Steps #1 and #2 take most of the time, so plan
accordingly.
1. Defining and reading the data (FILE HANDLE and DATA LIST, or GET FILE).
2. Selecting and/or modifying the data (SELECT IF, RECODE, COMPUTE, etc.).
3. Statistical procedure(s) (FREQUENCIES, CROSSTABS, REGRESSION, etc.).
4. Saving a "save file" [optional but recommended for repeated runs].
Basic Rules for Writing Commands:
- A command must begin in column 1.
- Continuation lines must be indented at least one space.
- A command ends with a "." (period).
- Case does not matter except in the name of the file.
- Quotes can be single or double, but they must match.
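The first three rules are what let SPSS tell where one command stops and the next begins. As a rough illustration only (this is a Python model, not anything SPSS itself runs), the hypothetical functions below group syntax lines into commands by those rules: a line starting in column 1 begins a new command, an indented line continues the current one, and any command that does not end in a period is flagged. Real SPSS syntax has more cases, such as comments and inline data, that this sketch ignores.

```python
def split_commands(lines):
    """Group syntax lines into commands using the column-1 rule (a simplified model)."""
    commands = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines in this model
        if line[0] in " \t" and commands:
            # Indented line: continues the current command
            commands[-1] += " " + line.strip()
        else:
            # Line starting in column 1: begins a new command
            commands.append(line.strip())
    return commands


def missing_periods(commands):
    """Return the commands that do not end with a period."""
    return [cmd for cmd in commands if not cmd.endswith(".")]


program = [
    'GET FILE="file1.sav"',
    " /KEEP zodiac sex.",
    "FREQUENCIES VARIABLES=sex",  # forgot the period
]
commands = split_commands(program)
print(commands)
print(missing_periods(commands))
```

A missing period is one of the most common reasons a program fails, so a check like this is the kind of thing worth doing by eye before you run.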
Part 1 - Defining and Reading in the Data:
First, you need to know what type
of data you are working with. Is it "raw data", an "SPSS save
file", an "SPSS portable file", or something else?
SPSS "save files" (.sav) are special format files that SPSS creates. In
the past, SPSS save files could only be read on the same type of
computer/operating system where they were created. Newer versions of
SPSS save files can usually be moved to different operating systems
without problems.
SPSS "portable files" (.por) are also special format files that SPSS
creates. These are actually ASCII files and can be transported to
other computers and operating systems without problems. These are,
however, seldom seen anymore.
SPSS save and portable files are easy to use because all the
variables are already defined. They can be read directly into SPSS
without a program.
If you have "something else", e.g., an Excel file or a Stata file, and you
have SPSS version 15 or later, you can also read the data directly into SPSS
without a program. Otherwise the file will have to be converted to
SPSS format. Stat/Transfer is available for this purpose on Cunix
and at EDS, or you can buy your own copy.
See this web page for more information.
If you have raw data, you will have to
do all the work of defining the variables yourself, i.e., write
a program.
Second, you need to know where
the data is, i.e., you need a "filename" and possibly a "path" if
the file is not in your home directory. Examples:
| File Name: | File Type and Location: |
| file1.dat | raw data in your directory |
| file1.sav | an SPSS save file in your directory |
| surveys/file1.sav | an SPSS save file in a subdirectory of your home directory |
| /u/9/s/me2000/surveys/file1.sav | same as above, with a full path name |
- If you are reading an SPSS .sav file, you need:
  - Documentation listing the variables you want and their mnemonic
    names in the save file.
  - An SPSS program with
    - a GET FILE= command with the name of the save file in quotes,
    - (optional) a /KEEP (or /DROP) subcommand with the names of the
      variables you want to use (or not use); if you need all the
      variables, leave this out, and
    - a period to end the command.
Example:
GET FILE="/u/9/s/me2000/surveys/file1.sav"
 /KEEP zodiac sex.
- If you are reading raw (ASCII) data, you need:
  - Documentation describing the variables.
  - Mnemonic names for the variables you want (up to 8 characters each
    in older versions of SPSS; newer versions allow longer names), with
    - each variable's column position(s) in the file, e.g., 1-4, and
    - each variable's type, i.e., integer, decimal, or alphanumeric.
    You make up the names. You can use V[n] to V[m], e.g., V1 to V100,
    if you want, but mnemonic names are a lot easier in the
    long run. You don't have to define all the variables in
    your file, just the ones you need.
  - The length of the records in the file (the LRECL).
  - An SPSS program with
    - a FILE HANDLE command with
      - a "handle" (MYFILE in the example below),
      - a "/" (slash),
      - a NAME= subcommand with the name of the raw data file in quotes,
      - a /LRECL= subcommand, and
      - a period to end the command.
    - a DATA LIST command with
      - a FILE=handle subcommand,
      - a "/" (slash),
      - the list of the mnemonic variable names, each followed by its
        column positions, and by its type if it has decimal places or
        is alphanumeric, and
      - a period to end the command.
Example:
FILE HANDLE MYFILE /NAME="file1.dat" /LRECL=1200.
DATA LIST FILE=MYFILE /
 PERSONID 1-4
 SEX 6
 BIRTHYR 7-10
 INCOME 15-21 (2)
 STATE 55-56 (A).
In the example above, INCOME has 2 decimal places and STATE is a
2-column character (alphanumeric) variable.
  - Some raw data files have multiple lines of data for each case.
    This frequently happens with opinion surveys, where the responses
    from one respondent are reported on two or three lines (often
    called "records" or "cards" in the documentation). For these files,
    add to the DATA LIST command:
    - a RECORDS=# subcommand placed after the FILE=handle subcommand,
      with # = the number of records per case, and
    - a "/" (slash) marking the start of each record, followed by an
      integer that indicates which record it is.
Example:
FILE HANDLE MYFILE /NAME="file3.dat".
DATA LIST FILE=MYFILE RECORDS=3
 /1
 PIDREC1 1-4
 SEX 6
 BIRTHYR 7-10
 INCOME 15-21 (2)
 STATE 55-56 (A)
 /3
 PIDREC3 1-4
 industry 5-8
 occup 9-11.
In the example above, there are three records per case. Note that no
variables are defined for record 2. You only need to define the
variables you need. (Variable names cannot contain hyphens, so use
names like PIDREC1 rather than P-ID-REC1.)
- If you are reading an SPSS portable file, you need:
  - Documentation listing the variables you want and their mnemonic
    names in the portable file.
  - An SPSS program with
    - an IMPORT FILE= command with the name of the portable file in quotes,
    - (optional) a /KEEP (or /DROP) subcommand with the names of the
      variables you want to use (or not use); if you need all the
      variables, leave this out, and
    - a period to end the command.
Example:
IMPORT FILE="/eds/datasets/gss/data/gss94-all.por"
 /KEEP zodiac sex.
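To see what the column positions and the (2) and (A) type codes in the raw-data DATA LIST examples above mean, here is a Python sketch that reads fields the same way: columns are counted from 1 and both ends are included, (2) means two implied decimal places when the data contain no decimal point, (A) keeps the characters as text, and a blank numeric field becomes missing. The helper names and the sample record are made up for illustration; this models the idea and is not how SPSS is implemented. The last part mimics RECORDS=3 by grouping every three lines of the file into one case.

```python
def make_record(fields, width):
    """Build a test record by placing text at 1-based start columns."""
    chars = [" "] * width
    for start, text in fields:
        chars[start - 1:start - 1 + len(text)] = text
    return "".join(chars)


def read_field(record, start, end, decimals=0, alpha=False):
    """Read columns start-end (1-based, inclusive), like 'INCOME 15-21 (2)'."""
    text = record[start - 1:end]  # column 1 is Python index 0
    if alpha:
        return text  # (A): alphanumeric, kept as characters
    text = text.strip()
    if not text:
        return None  # blank numeric field: treated as missing here
    if decimals and "." not in text:
        # Implied decimals: (2) puts the last 2 digits after the point
        return int(text) / 10 ** decimals
    return float(text)  # an explicit decimal point is used as-is


def group_records(lines, records_per_case):
    """Mimic RECORDS=n: every n consecutive lines form one case."""
    return [lines[i:i + records_per_case]
            for i in range(0, len(lines), records_per_case)]


# One made-up record laid out like the first DATA LIST example
record = make_record([(1, "0042"), (6, "2"), (7, "1961"),
                      (15, "0123456"), (55, "NJ")], width=60)
case = {
    "PERSONID": read_field(record, 1, 4),
    "SEX": read_field(record, 6, 6),
    "BIRTHYR": read_field(record, 7, 10),
    "INCOME": read_field(record, 15, 21, decimals=2),
    "STATE": read_field(record, 55, 56, alpha=True),
}
print(case)

# Three records per case; only records 1 and 3 would be read, as in the example
lines = ["case1-rec1", "case1-rec2", "case1-rec3",
         "case2-rec1", "case2-rec2", "case2-rec3"]
cases = group_records(lines, 3)
print([c[2] for c in cases])  # record 3 of each case
```

The key point the sketch makes visible is that "0123456" in columns 15-21 is read as 1234.56 only because of the (2); without it, the same columns would be read as 123456.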
Part 2 - Selecting and Modifying the Data:
This part is optional. You may not need to select cases or modify or
create new variables. But if you do, these are the most common
commands.
SELECT IF - This command selects whole CASES, usually people.
Examples: SELECT IF (sex = 1).
SELECT IF (STATE = "NJ").
select if (any(racegrp,4,5,6,8)).
Warning! The effect of multiple SELECT
IF statements is cumulative. See the manual on using the
TEMPORARY command if you don't want this.
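The cumulative effect is easy to picture with a few made-up cases. In the Python sketch below (an illustration, not SPSS code), each SELECT IF is modeled as a filter on whatever the previous one left. The TEMPORARY case, a separate scenario starting again from the full file, is modeled as a filter on a copy that only the next procedure would see, so the full file is still there afterward.

```python
# Three made-up cases
all_cases = [
    {"id": 1, "sex": 1, "state": "NJ"},
    {"id": 2, "sex": 2, "state": "NJ"},
    {"id": 3, "sex": 1, "state": "NY"},
]

# SELECT IF (sex = 1).  Non-matching cases are dropped for good.
kept = [c for c in all_cases if c["sex"] == 1]
# SELECT IF (STATE = "NJ").  Filters what the first one left, so effects add up.
kept = [c for c in kept if c["state"] == "NJ"]
print([c["id"] for c in kept])

# Separate scenario, starting again from the full file:
# TEMPORARY. followed by SELECT IF (STATE = "NJ").
# The selection feeds only the next procedure; all_cases is untouched.
next_procedure_input = [c for c in all_cases if c["state"] == "NJ"]
print([c["id"] for c in next_procedure_input], len(all_cases))
```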
COMPUTE - Create a new variable.
Examples: COMPUTE NEWAGE=0.
COMPUTE YRRETIRE=BIRTHYR+65.
COMPUTE INCOME=salary+interest+divdnds.
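One point about the third example: in SPSS arithmetic, if any of salary, interest, or divdnds is missing for a case, INCOME is system-missing for that case. The SUM function, as in SUM(salary,interest,divdnds), instead adds whatever valid values there are. The Python sketch below models that difference, with None standing in for system-missing; the function names are invented for illustration.

```python
SYSMIS = None  # stand-in for SPSS's system-missing value


def plus(*values):
    """Model of a + b + c in COMPUTE: any missing operand makes the result missing."""
    if any(v is SYSMIS for v in values):
        return SYSMIS
    return sum(values)


def sum_function(*values):
    """Model of SUM(a,b,c): adds the valid values and skips missing ones."""
    valid = [v for v in values if v is not SYSMIS]
    return sum(valid) if valid else SYSMIS


salary, interest, divdnds = 30000, 250, SYSMIS  # made-up case with no dividend data
print(plus(salary, interest, divdnds))          # INCOME=salary+interest+divdnds
print(sum_function(salary, interest, divdnds))  # INCOME=SUM(salary,interest,divdnds)
```

Which behavior you want depends on your research question: treating a missing component as "unknown income" and treating it as zero are different decisions.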
RECODE - Change the values of a variable. It is best
to do this on a new variable created from an old one (with INTO) so you
don't lose the old values. You never know when you may have to back up
and use them again. The default format for new numeric variables is
F8.2, even when they hold only integers. You can give them a more
compact display format with the FORMATS command.
Example: RECODE AGE (MISSING=9)(18 thru HI=1)(LOW thru 18=0) into VOTER.
RECODE PLACE (1=1)(2 thru 7=2)(else=0) into CITYTOWN.
RECODE MONTH (" "=99) (CONVERT) ("-"=11)("&"=12) into NEWMONTH.
FORMATS VOTER CITYTOWN (F1.0) NEWMONTH (F2.0).
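RECODE applies its value specifications from left to right, and a value is recoded only once, by the first specification it matches. That is why an AGE of 18 becomes 1 in the first example even though 18 also falls in LOW thru 18. The Python sketch below models that rule; the helper names are hypothetical, None stands in for a missing value, and a value that matches nothing (with no ELSE) is treated as missing, which is our understanding of what SPSS does for a new INTO variable.

```python
def recode(value, specs, else_value=None):
    """Apply (test, new_value) specs in order; the first match wins."""
    for test, new_value in specs:
        if test(value):
            return new_value
    return else_value  # no match: ELSE value, or missing (None) if there is no ELSE


def is_missing(v):
    return v is None  # None stands in for a missing value


# RECODE AGE (MISSING=9)(18 thru HI=1)(LOW thru 18=0) into VOTER.
voter_specs = [
    (is_missing, 9),          # (MISSING=9) is checked first
    (lambda v: v >= 18, 1),   # (18 thru HI=1)
    (lambda v: v <= 18, 0),   # (LOW thru 18=0): 18 never gets here
]

# RECODE PLACE (1=1)(2 thru 7=2)(else=0) into CITYTOWN.
citytown_specs = [
    (lambda v: v == 1, 1),
    (lambda v: 2 <= v <= 7, 2),
]

print([recode(age, voter_specs) for age in [None, 17, 18, 40]])
print([recode(place, citytown_specs, else_value=0) for place in [1, 5, 9]])
```

Because order matters, it is worth listing overlapping ranges deliberately, or avoiding the overlap altogether (e.g., LOW thru 17=0).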
IF - Conditional change. This is useful for recoding as well as for
cleaning data (see the 3rd example below).
Examples: COMPUTE WORKWK=0.
IF (WORK GT 0 and WORK LE 35) WORKWK=1.
COMPUTE PLRTY = 1.
IF RANGE(VALUE(PLURALTY),2,8) PLRTY = 2.
IF (STATE EQ "JN") STATE="NJ".
MISSING VALUES - Declare some values of a variable
"missing" so they won't be used in statistical calculations.
Example: MISSING VALUES AGE (0)
 Score1 to Score10 (999)
 STATE ("XX").
Warning! Missing Values affect RECODE and COMPUTE statements and can
have unexpected results. When you create or modify any variable, be
sure to check very carefully what happened with the Missing Values.
For example:
MISSING VALUES PLURALTY (2 THRU 8).
COMPUTE PLRTY=PLURALTY.
RECODE PLRTY (2 THRU 8 = 2).
This sequence won't work. Cases coded 2 through 8 are Missing, so they
won't be recoded.
(See the Manual for the VALUE function to get around
this.)
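Here is a Python model of why that sequence fails and how COMPUTE PLRTY=VALUE(PLURALTY). gets around it, under the assumption that COMPUTE turns a user-missing source value into system-missing while VALUE returns the stored value regardless of the missing-value declaration. None stands in for system-missing, and the function names are invented for illustration; check the manual for the exact rules in your version of SPSS.

```python
SYSMIS = None  # stand-in for system-missing


def is_user_missing(value):
    """MISSING VALUES PLURALTY (2 THRU 8)."""
    return value is not SYSMIS and 2 <= value <= 8


def compute_copy(value):
    """COMPUTE PLRTY=PLURALTY: a user-missing source becomes system-missing."""
    return SYSMIS if is_user_missing(value) else value


def value_function(value):
    """VALUE(PLURALTY): the stored value, ignoring the missing-value declaration."""
    return value


def recode_2_thru_8(value):
    """RECODE PLRTY (2 THRU 8 = 2): a missing value matches nothing and is left alone."""
    if value is not SYSMIS and 2 <= value <= 8:
        return 2
    return value


pluralty = [1, 3, 8, 9]  # made-up values
print([recode_2_thru_8(compute_copy(v)) for v in pluralty])
print([recode_2_thru_8(value_function(v)) for v in pluralty])
```

In the first line the cases coded 3 and 8 end up missing instead of 2; in the second they are recoded as intended.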
Part 3 - Statistical Procedures:
- Very Important!!! Before you do any other analysis, run
  FREQUENCIES on all the variables you are going to use so you know
  what your data look like. Check the FREQUENCIES output for
  miscodings and unusual outliers. (Hints: Be careful about running
  FREQUENCIES on variables with unique or nearly unique values, e.g., ID
  or INCOME, since they produce one table row per distinct value. Use the
  subcommand /FORMAT=ONEPAGE to save space.)
- Decide what statistical procedures are appropriate for your research.
  You and your advisor/statistician have to do this. EDS doesn't provide
  statistical consulting.
- Look up the particular procedure command in the manual and choose the
  subcommands you need.
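At heart, what FREQUENCIES gives you for this kind of check is a count of each distinct value. The short Python sketch below (made-up data, hypothetical function name) builds that kind of table with collections.Counter and shows how an out-of-range code stands out. SPSS's own table also includes valid and cumulative percentages and missing-value counts, which this leaves out.

```python
from collections import Counter


def frequency_table(values):
    """Count each distinct value and print value, count, and percent."""
    counts = Counter(values)
    total = len(values)
    for value, count in sorted(counts.items()):
        print(f"{value:>6} {count:>6} {100 * count / total:6.1f}%")
    return counts


sex = [1, 2, 2, 1, 7, 2, 1, 2]  # made-up data; only 1 and 2 are legal codes
counts = frequency_table(sex)
print("Unexpected codes:", sorted(set(counts) - {1, 2}))
```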
Part 4 - Saving the SPSS Save File:
- Decide where you want to save the file, e.g., your home directory or
  a subdirectory.
- Pick a name. Do not use punctuation other than an underscore ("_") in
  the name itself. The file extension is ".sav".
- To create a save file, add the command SAVE OUTFILE= at the end of
  your program.
Examples:
SAVE OUTFILE="april.sav".
SAVE OUTFILE="surveys/april.sav".
SAVE OUTFILE="/u/9/s/me2000/surveys/april.sav".
Some other nice (but optional) commands:
| Command: | Purpose: |
| TITLE | Puts a title line on your output. |
| SET WIDTH 80 | Narrows the output to 80 columns so you can easily read it on a computer screen. |
| SET HEADER NO | Turns off the page headings after page 1. |
| N OF CASES x | Reads only the first x cases, to test the program. |
| SAMPLE [proportion] | Takes an approximate percentage sample of cases, e.g., SAMPLE .10 for about 10%. |
| SAMPLE [n FROM m] | Takes a sample of exactly n cases from the first m cases. |
| COMMENT | Writes comment lines in your program. Highly recommended. The comment can extend for many lines until it ends with a period. |
| * | Alternate way to start comment lines. |
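The two forms of SAMPLE differ in how exact they are. As a sketch of the idea (a Python model with hypothetical function names, not SPSS's actual code), the proportion form keeps each case with the given probability, so the number kept is only close to the requested share, while SAMPLE n FROM m keeps exactly n of the first m cases. N OF CASES, by contrast, is not random at all: it simply stops after the first x cases.

```python
import random


def n_of_cases(cases, x):
    """N OF CASES x: keep only the first x cases."""
    return cases[:x]


def sample_proportion(cases, p, rng):
    """SAMPLE p (e.g., .10): keep each case with probability p, so the count is approximate."""
    return [c for c in cases if rng.random() < p]


def sample_n_from_m(cases, n, m, rng):
    """SAMPLE n FROM m: exactly n cases chosen from the first m (needs n <= m)."""
    chosen = sorted(rng.sample(range(min(m, len(cases))), n))
    return [cases[i] for i in chosen]


rng = random.Random(2024)  # fixed seed so the run is repeatable
cases = list(range(1, 1001))  # 1,000 made-up case numbers
print(len(n_of_cases(cases, 50)))
print(len(sample_proportion(cases, 0.10, rng)))  # roughly 100; varies with the seed
print(sample_n_from_m(cases, 5, 20, rng))
```

For testing a program, N OF CASES is usually enough; the SAMPLE forms matter when you want the test run to resemble the whole file.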
Example of a Complete Program
TITLE "A very simple program".
set width 80.
file handle in /name="/p/us/sue/zspssx/famous.dat".
DATA LIST FILE=in /
 idnum 1-3 (N)
 fname 4-15 (A)
 lname 16-27 (A)
 age 28-29
 sex 30
 byear 31-34
 dyear 35-38
 status 39.
VAR LABELS
 idnum "Case Number"
 fname "First Name"
 lname "Last Name"
 age "Age at Death"
 sex "Sex"
 byear "Year of Birth"
 dyear "Year of Death"
 status "Status".
VALUE LABELS
 sex
 1 "Male"
 2 "Female" /
 status
 1 "Real"
 2 "Fictional"
 3 "Possibly Real".
missing values
 status (3).
select if (sex eq 2).
freq vars=status.
save outfile="famous_females.sav".