UTILITY PROCEDURES

proc print

proc print data=your.data (obs=n) noobs;
var name dx1 cost;
where dx1=“222”;
sum cost;
run;

helpful hint

It’s usually a good idea to print out the first couple of lines of a data set after you create it

proc sort

PROC SORT data=your.dat OUT= sortdat;  
/*note the output data set  is in the   work library */  
BY  var1; /* will be sorted by var1 ascending */  
RUN;

simple subset analysis using by

PROC PRINT data=sortdat; /* note using the output sorted data set */  
BY var1;  
SUM num_var;  
RUN;

subsetting using where

ENHANCING OUTPUT

titlen and footnoten 

title ‘your title here’;

label

LABEL;  
var1=”name for variable 1” ;  
var2=”name for variable 2” ;  
var3=”name for variable 3” ; 

format

proc print data=your.data;  
var charge;   
format charge dollar11.2;   
run;  

proc format

proc format;  
value format\_name   
range 1=’label’  
range 2 = ‘label’;  
run;  

combining proc format and format

proc format; /* first create the format with PROC FORMAT*/  
value gndrfmt 1=’Female’ /*note, no dot when *creating* the format*/  
              2=’Male’  
              0=’Unknown’;  
run;  

proc print data=your.data;     
    format gender gndrfmt. ; /\* apply the format, note dot\*/    
run;  

format libraries

format options

proc format;  
value boardfmt 
               low-49=’below’  
               50-99=’average’  
               100-high=’above average’;
proc format;  
value \$grade  
               ‘A’=’good’;  
               ‘B’=’fair’;  
               ‘C’=’poor’;  
               ‘U’=’see instructor’;  
proc print data=ia.empdata;  
format final\_exam \$grade.;  
run;  

using the data step to permanently assign variable attributes

data your.sasfile; /* data step creates new data file */     
infile ‘raw-data-file’; /*location of external file on your machine */  
input            
@1 var1 \$3.  /* input statement to read in variables */    
@4 date mmddyy8. /* informat for how to read in this variable */  
@12 var2 \$3.      
@15 var3 3.    
@18 var4 3.;    
format date mmddyy10.;/*format for how to display this variable */  
label var1=’ Variable One’ /* labels for how to name these variables */  
var2=‘Variable Two’  
var3=’Variable Three’;  
run; 

ods (output delivery system)

ods html file="C:\myfolder\myfile.html" style=sasweb;  
proc print data=mylib.mydat;  
var age weight gender;  
where gender="F";  
sum weight;  
run;  
ods html close;

review

MANIPULATING DATA SETS

use a data step to manipulate variables

DATA    myLib.myNewDataSet;  
SET    myLib.myOlsDataSet;  
/* additional statements manipulating the existing data set*/ 
RUN;

creating or redefining variables

sas functions

combining sas data sets: concatenating

DATA data.new;   
SET data.oldset1 data.oldset2 data.oldset3;   
RUN;

combining sas data sets: merging

DATA  data.new;   
MERGE data.oldset1 data.oldset2;   
BY var1; 

about merging data sets

the pain of merging  

merging trick to identify / exclude non-matching observations

data new;  
merge old1 (in=selected) old2;  
by var;  
if selected=1;  
run; 

the pain of merging II: variables must match exactly

convert a character variable to a numeric variable : input()

data  new_data; /* create new data set */
set  old_data;
num_var=input(char_var,informat.); /* e.g. numeric informat 3. */
run;

convert a numeric variable to a character variable: put()

data new_data;  
set old_data;  
char_var = put(num\_var, informat.) /* e.g. character informat $3. */   

rename a variable on the fly

data myNewData  
set myOldData1 myOldData2 (rename=(oldVar=newVar) );  
run; 

USING CONDITIONAL EXPRESSIONS

if-then-else

IF icd9 = ‘666’ THEN dx= ‘damien’;  
ELSE dx = "john";

using if-then-else to group variables

DATA new;  
SET old;  
IF cost = . THEN cat = ‘missing’;  
ELSE IF cost LT 2000 THEN cat = ‘low’;  
ELSE IF cost GT 2000 & cost LT 10,000 THEN cat = ‘mod’;  
ELSE cat = ‘high’;  
RUN;  

using an in statement to create a look up list

DATA new;  
SET old;  
IF dx IN (‘myocardial infarction’, ‘angina’, ‘chest pain’)  THEN cardiac= 1;  
ELSE cardiac=0;  
RUN; 

if-then-do-end to group statements

IF condition THEN DO;  
action;  
action;  
action;  
END;  
data new_data;  
set old_data;  
if var3=’X’ then total=sum(num_var1, num_var2);   
ratio =  num_var1 / num_var2;    
pct =  (num_var1 / total)*100  
run; 
data new_data;  
set old_data;  
total=sum(num_var1, num_var2);    
if var3=’X’ then do;  
ratio =  num_var1 / num_var2;  
pct =  (num_var1 / total)*100;    
end;  
else if var3=’Y’ then do;  
ratio =  num_var1 / num_var2;  
pct =  (num_var1 / total)*100;  
end;  
run;

arrays to create lists of variables

data new;  
set old;  
if dx1=. then dx1=999;  
if dx2=. then dx2=999;  
if dx3=. then dx3=999;  
.  
.  
.  
if dx100=. then dx100=999;   
if sex = . then sex = 999;   
if age = . then age = 999;   
run;  
data new;  
set old;  
array alldxs[102] dx1-dx100 age sex;  /*create the array*/  
do i = 1 to 105;  /*create subscript (new variable) to index the array*/  
if alldxs[i]=. then alldxs[i]= 999;  /*apply statement to array*/  
end;   
drop i;/*drop the index variable from the data set*/  
run;

using an array to restructure a data set

|sub | obs1 | obs2 | obs3 |
|1 | 4 | 5 | 6 |
|2 | 7 | 8 | 9 |

| sub | time | obs |
| 1 | 1 | 4 |
| 1 | 2 | 5 |
| 1 | 3 | 6 |
| 2 | 1 | 7 |
| 2 | 2 | 8 |
| 2 | 3 | 9 |

data new;
set old;
array myvars[3] obs1-obs3; /*define an array (list) of 3 existing variables*/  
do time=1 to 3;/*using time as the indexing variable*/  
obs=myvars[time];/*create variable "obs" equal to the indexed array variable*/    
output;/*add line of output to data set*/   
end;  
drop obs1-obs3;  
run;

restricting observations with if and where statements

data females;  
set male_female  
if sex = ‘F’;  
run;  

restricting variables with drop and keep statements

data new (drop=var1 var2);  
SET old;  
total=var3 + var4;  
run; 

using sas dates as constants