S-PLUS on the CUNIX Cluster


Abstract

S-PLUS 7 combines the elegance of the S language, the premier language for the development of statistical applications, with a comprehensive library of statistical, graphical, and data-processing techniques. It offers a large variety of functions for statistical, numerical, and graphical work, including two- and three-dimensional plotting, interactive graphics and data visualization, basic statistics, regression and ANOVA, multivariate statistics and graphics, time series analysis, and survival analysis. The S-Plus language can be extended with user-written functions and with dynamically linked subroutines in C and Fortran.

Note: S-Plus graphics require the X11 window system. You can get a discount version of the X application, X-Win32, for Windows 95, 98, ME, and 2000 machines here. You will also need a copy of the terminal software, PuTTY. Once you have both installed, start the X-Win32 application. Then open the PuTTY configuration window. Load or define a session, go to the "SSH Tunnels" menu, and click on the "Enable X11 forwarding" box. Then click on the "Open" button and log in as usual.


Sections:

  1. Starting and Exiting S-Plus
  2. Documentation
  3. Some Common, Useful Functions and Commands
  4. Reading Raw Data into S-Plus
  5. Writing Raw Data from S-Plus
  6. Some Brief Information on S-Plus Objects
  7. Batch Mode
  8. Opening and Closing Graphics Devices
  9. Editing Objects
  10. The .First Function
  11. Cleaning up .Data

Starting and Exiting S-Plus:


Start S-Plus by typing "splus" at the UNIX prompt:

      $ splus -e

Note that all the letters of the command are in lower case. Adding the switch, "-e", enables emacs-style command line editing. It is optional. You will next see the S-Plus opening banner and prompt:

      S-PLUS : Copyright (c) 1988, 2003 Insightful Corp.
      S : Copyright Lucent Technologies, Inc.
      Version 7.0.0 for Sun SPARC, SunOS 5.8, 32-bit : 2005
      Working data will be in /u/1/z/zzz2000/MySwork
      >

You can now type in S-Plus commands.

During your first S-Plus session, a directory named MySwork is created to store your S-Plus work. All the objects you create are stored in a .Data subdirectory of MySwork. It pays to clean this directory out once in a while, or just delete it and start a new one.

To exit S-Plus, type "q()"

      > q()

Documentation:


Online Help:

There are a number of ways to get help. Online documentation is available through the help function, e.g.:

      > help(mean)

You can also type a question mark followed by the name of a function.

	     
      > ?mean

If you are working in the X windows environment, you can type

      > help.start()

to get an interactive help system.

Documentation:

These documents are available on the web in PDF format at this EDS web page and at Insightful, Inc.


Some Common, Useful Functions and Commands:


objects()
    List the objects in your working directory. This can be used with pattern matching using UNIX regular expressions. For example:
          > objects(pattern="test[1-3]")
    returns all objects whose names begin with test1, test2, or test3:
          "test1"   "test10"  "test11"  "test12"  "test12a" "test2"   "test3"
length(x)
    Returns the length (number of elements) of the object x.
dim(x)
    Returns the dimensions of an object. In the case of matrices, the first element is the number of rows in the matrix and the second element is the number of columns.
nrow(x) and ncol(x)
    The functions nrow() and ncol() are based on the function dim() and return the number of rows or the number of columns in a matrix.
rm(x)
    Remove object x. This can also take pattern arguments in the form of regular expressions.
options()
    See the list of options.
!
    Run a UNIX command from within S-Plus, e.g.:
          > !ls -l survey*
    lists information about all the files in your directory starting with "survey".
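
As a quick illustration of length(), dim(), nrow(), and ncol(), here is a small sketch in S on a made-up matrix. The object name m is hypothetical and is not used elsewhere on this page:

```r
# m is a hypothetical 3 x 2 matrix used only for illustration
m <- matrix(1:6, ncol=2)
len <- length(m)   # total number of elements: 6
d   <- dim(m)      # rows, then columns: 3 2
nr  <- nrow(m)     # same as dim(m)[1]
nc  <- ncol(m)     # same as dim(m)[2]
```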

See the section below on S-Plus Objects for some simple S-Plus data manipulation commands.


Reading Raw Data into S-Plus:


There are a number of ways to read data into S-Plus. The scan function reads in a vector (a row of numbers). The read.table function reads data into a data frame (a table of rows and columns). The importData function can be used to read SPSS, Stata, Excel, and other datasets.

scan function:
  • Read in a vector of numbers from the keyboard (standard input):
           > test1 <- scan()
    
  • Read in a vector of character data from the keyboard (standard input). The "what" argument sets the type of data to read; only its mode matters, so character(4) simply tells scan to read character strings:
           > title <- scan(what=character(4))
    
  • Read in an n x 2 matrix of numbers (two columns) from the keyboard (standard input):
           > test2 <- matrix(scan(), ncol=2)
    
  • Read in a vector of numbers from a file of blank delimited data:
           > test3 <- scan("testnum.dat")
    
  • Read in a matrix of numbers from a file of blank delimited data:
           > test4 <- matrix(scan("testnum.dat"), ncol=15)
    
    where ncol is actually the number of cases: matrix() fills column by column, so each line (case) of the file ends up as one column of test4.

  • You can also use the scan function to read a file of fixed format data with no delimiters. You have to supply two objects describing the data: a list giving "what" type each variable is, and a vector of "widths".
          > varwhat <- list(id=0,fname="",lname="",age=0)
          > varwid  <- c(3,12,12,2)
          > test5 <- scan("test.dat",what=varwhat,widths=varwid,flush=T)
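
The effect of ncol on a scanned file can be checked with a small sketch. It assumes a hypothetical file, demo.dat, with three cases (lines) of two variables each, written here with cat() only so that the example is self-contained:

```r
# demo.dat is a hypothetical file: 3 cases (lines), 2 variables per case
cat("1 10", "2 20", "3 30\n", file="demo.dat", sep="\n")
vals <- scan("demo.dat")                  # 1 10 2 20 3 30
by.col <- matrix(vals, ncol=3)            # filled by column: one case per column
by.row <- matrix(vals, ncol=2, byrow=T)   # filled by row: one case per row
```

Here by.row has the same layout as the file, and by.col is its transpose.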
    

read.table function:
  • Read in a file with space delimited variables:
           > test6 <- read.table("testblanks.dat",sep=" ")
    
  • Read in a file with tab delimited variables:
           > test7 <- read.table("testtab.dat",sep="\t")
    
  • Read in a file with space delimited variables and a Variable Titles Line (i.e., the column names, usually from a spreadsheet):
           > test8 <- read.table("testvarlabs.dat",header=T,sep=" ")
    
  • Read in a file of fixed format data with no delimiters. This is a pain, and it may be easier and faster to fix up the file with some other application or editor. (See the example of the importData function for a possible way to do this in S-Plus.) To do this, read.table needs a vector of the start locations of the variables to use as the "separator". It also expects a vector of column variable names (col.names) and row variable names (row.names). If you leave out col.names, the column variables will be called V1, V2, etc. It's unlikely you need names for row variables, so declare them NULL. Make up the sep and col.names vectors first as individual vectors. "junk" is everything from column 29 to the end of the line.
          > colpos  <- c(1, 4, 16, 27, 29)
          > colvars <- c("id","fname","lname",
          +  "age","junk")
    
    Then run the read.table function:
          > test9 <- read.table("test.dat",sep=colpos,
          + col.names=colvars,row.names=NULL)
    
    I haven't figured out how to read only some of the data in a line using read.table, but you can get rid of "junk" defined above by defining a new object with just the first four variables:
          > test10 <-test9[1:15,1:4]
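
Selecting columns by name is another way to drop unwanted variables, and leaving the row subscript empty keeps every row without hard-coding the row count. A minimal sketch on a made-up data frame (demo and its values are hypothetical):

```r
# demo is a hypothetical data frame standing in for one read by read.table
demo <- data.frame(id=1:3, age=c(31,45,28), junk=c("x","y","z"))
keep <- demo[, c("id","age")]   # all rows, only the named columns
```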
    

importData function:
The importData function can read in SPSS, Excel, Stata, and other datasets. Type "help(importData)" for more information. For SPSS, use the valueLabelAsNumber=TRUE option to ensure values are read as numbers, not as value labels (the default).
      > test10 <-importData("test.sav",valueLabelAsNumber=TRUE)

Writing Raw Data from S-Plus:


write.table function:
The best way to write raw output to a file is with the write.table function. This is the counterpart of the read.table function. The default is a comma delimited file with row and column labels.
  • Write a file with space delimited variables:
           > write.table(test12, file="mat1.dat",sep=" ")
    
  • Write a file with tab delimited variables:
           > write.table(test12, file="mat2.dat",sep="\t")
    
  • If you are going to use the data in a spreadsheet you may need quotes around string variables:
           > write.table(test12, file="mat3.dat",quote.strings=T)
    
  • Write a file without the column and row names:
           > write.table(test12, file="mat4.dat",dimnames.write=F)
    
  • Write a file with only the column names:
           > write.table(test12, file="mat5.dat",dimnames.write="col")
    
You can also designate a character string for missing values with the na= option.

exportData function:
This is the opposite of the importData function. Variables can be dropped or kept, filtered, a delimiter set for ASCII files, etc. See help(exportData) for the entire list of options. Example:
       > exportData(test12, "test12.dta")

sink function:
Everything that would have come to your screen after you invoke this command goes to the designated file instead.
       > sink(file="temp.log")
       > summary(test12)
       > sink()
sink() ends the sink. This is a quick way to write out fixed format data:
       > sink(file="test12.dat")
       > test12
       > sink()
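
To confirm what a sink captured, the file can be read back as lines of text. A small sketch, assuming a hypothetical file name demo.log:

```r
x <- c(2, 4, 6)
sink("demo.log")      # divert printed output to the hypothetical file demo.log
print(mean(x))
sink()                # send output back to the screen
logged <- scan("demo.log", what="", sep="\n")   # one string per line of the file
```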

Some Brief Information on S-Plus Objects:


[Note: I found most of this next section somewhere on the web, but I lost the location - If you recognize it as yours, please send me e-mail and I'll give you credit.]

In order to create a data object, a name must be assigned to it. This is done using the underscore character "_" or the less-than character and a hyphen "<-", with the name of the object on the left and the values on the right. Alternatively, the symbol "->" can be used with the values on the left and the name of the object on the right. The name must start with a letter and may contain letters, digits, and periods. S-Plus is case sensitive: x and X refer to two different things. The following are examples of data assignments:

Scalar Objects:

Assign the numeric value 175 to the scalar object named "height":

     > height <- 175

Assign the alphanumeric "Joe" to the scalar object named "person". Character values are inserted in quotes. If the quotes are omitted, S-Plus will look for a data object named "Joe" to assign to "person".

     > person <- "Joe"

Vector Objects:

The function c() "collects" the numeric values 160, 140, and 155 and stores them into the vector "heights":

     > heights <- c(160,140,155)

Create an alphanumeric vector of names:

     > people <- c("Ned","Jill","Pat","Ronnie")

Assigning Values:

Assign the value 162 to the first element of "heights":

     > heights[1] <- 162
     > heights
      [1] 162 140 155

The old object "heights" has now been replaced by the new object "heights".

Append the value 135 to the vector "heights":

     > heights[4] <- 135
     > heights
      [1] 162 140 155 135

The operator ":" creates a sequence, here from 1 to 5:

     > numbers <- 1:5
     > numbers
      [1] 1 2 3 4 5

Assigning Names:

The names() function assigns names to the elements of a vector:

     > names(heights) <- people

Typing the name of an object by itself causes its value to be printed on the terminal:

     > heights
      Ned Jill Pat Ronnie
      162  140 155    135

Using Subscripts:

Using a subscript to extract the second element from "heights".

     > heights[2]
      [1] 140

The [1] above refers to the position of the first element on the given output line on the screen; this is very useful when long vectors take up several lines. Notice that square brackets are used for subscripts instead of parentheses. Round brackets are used by functions, e.g., the c() and names() functions.

Extract the second, first, and second elements from "heights". The c() function is used in the subscript when more than one element is listed.

     > heights[c(2,1,2)]
      [1] 140 162 140

Return all the values of "heights" which are less than 160:

     > heights[heights < 160]
      [1] 140 155 135
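
The subscript in the last example is a logical vector: the comparison is evaluated for every element, and only the elements where it is true are kept. A short sketch, using the same values as above without the names:

```r
heights <- c(162, 140, 155, 135)
short <- heights < 160     # F T T T
below <- heights[short]    # 140 155 135
n.short <- sum(short)      # T counts as 1, so this is 3
```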

Manipulating Matrices:

The function matrix() reads data into a matrix:

     > size.1 <- matrix(c(130,26,110,24,118,25,112,25),ncol=2)
     > size.1
          [,1] [,2]
      [1,]  130  118
      [2,]   26   25
      [3,]  110  112
      [4,]   24   25

The number of columns is specified using the argument ncol=#. Alternatively, the number of rows can be specified using the argument nrow=#, or both nrow and ncol can be specified. When neither nrow nor ncol is specified, the data are read in as a one-column matrix.

Specifying "byrow=T" forces S-Plus to read the data in row by row:

     > size.2 <- matrix(c(130,26,110,24,118,25,112,25),ncol=2,byrow=T)
     > size.2
          [,1] [,2] 
      [1,]  130   26
      [2,]  110   24
      [3,]  118   25
      [4,]  112   25

The list() function is used to combine two vectors of different lengths:

     > size.names <- list(c("Abe","Bob","Carol","Deb"),c("Weight","Waist"))

The list is therefore made up of two components: the first component corresponds to the row names, and the second component corresponds to the column names.

The dimnames() function then assigns names to the dimensions of a data object (in this case, the rows and columns of object "size.2"):

     > dimnames(size.2) <- size.names
     > size.2
           Weight Waist
       Abe    130    26
       Bob    110    24
     Carol    118    25
       Deb    112    25

Extracting data from a matrix:

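To extract one value from a matrix, it is necessary to use two elements in the subscript: the first element applies to the rows, the second element applies to the columns. The examples below use "size", a matrix with five rows and the three columns Weight, Waist, and heights.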

     > size[2,3]
      heights
       155

The full subscript expression applies to the elements of the matrix that satisfy both the row and the column condition. In this case, the element in the second row, third column of the matrix "size" is printed.

If one dimension is not specified in the subscript, all elements in that dimension are extracted. In the case below, the columns are not specified, so all the columns are included:

     > size[2,]
       Weight Waist heights
          110    24     155

To print the third column of the matrix "size":

     > size[,3]
      [1] 140 155 142 175 170

In both examples above, the comma must be kept in as a marker to indicate which dimension is specified. NOTE: In both of the above examples, S-Plus drops the extra dimension so that the result is a vector.

The c() function is used in matrix subscripts in the same way as it is used in vector subscripts. Here, the first and third columns of the matrix size are printed out:

     > size[,c(1,3)]
          Weight heights
     [1,]    130     140
     [2,]    110     155
     [3,]    118     142
     [4,]    112     175
     [5,]    128     170
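
If the result should stay a matrix, the dropping of dimensions can be turned off with the drop argument of the subscript. A minimal sketch, rebuilding the "size.2" matrix from earlier so that the example is self-contained:

```r
size.2 <- matrix(c(130,26,110,24,118,25,112,25), ncol=2, byrow=T)
dimnames(size.2) <- list(c("Abe","Bob","Carol","Deb"), c("Weight","Waist"))
row.vec <- size.2[2, ]            # dimension dropped: a named vector of length 2
row.mat <- size.2[2, , drop=F]    # dimension kept: a 1 x 2 matrix
```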

Batch Mode:


You can also run S-Plus in batch mode. (BATCH must be all caps.)

       $ splus BATCH [source file] [output file]

You can only run in BATCH mode on the two cunix machines where S-Plus is actually running. These machines are fozimane.cc.columbia.edu and monire.cc.columbia.edu. The output file takes a while to close; be patient.


Opening and Closing Graphics Devices:

This only works in X-windows. To open a graphics window to display plotting and other graphics output, type:

      > motif()

To close a graphics window:

      > dev.off()

If a graphic window doesn't open when you need one, try the command:

      > options(gui ="motif")

before you type motif().


Editing Objects:

Use the fix function. The data.ed function no longer exists.

      > fix(myfunk1)

By default this opens the vi editor. You can change this to emacs with the command options(editor="/opt/local/bin/emacs").


The .First Function:

You can create a .First function to customize your session environment, for example:

   > .First <-
   function()
   {
      options(editor="/opt/local/bin/emacs")
      options(length=24)
      help.start(browser="/opt/netscape4.6-us/netscape")
   }

The above commands set the default editor to emacs, set the length of the terminal page to 24, and start help in Netscape.

Here's a fun graphics function for you to try. Call it butterfly.

   > butterfly <-
   function()
   {
      # polar curve traced over 12 full turns
      theta <- seq(0, 24 * pi, len = 2000)
      radius <- exp(cos(theta)) - 2 * cos(4 * theta) + sin(theta/12)^5
      plot(radius * sin(theta),  - radius * cos(theta), type = "l", axes = F,
                    xlab = "", ylab = "")
   }

Cleaning up .Data:

All the objects you create are stored in your subdirectory, .Data. If you are not careful, you can accumulate a lot of junk. To see a list of all your objects, give the command:

      > objects()

To remove an object, for example, test1, give the command:

      > rm(test1)