Stata for CUNIX: Brief Information


Abstract: Stata is a general-purpose statistical package. This document offers a brief introduction to Stata on CUNIX, with several examples of reading ASCII data.

Stata 9 and Stata 9 SE (Special Edition) are currently available on the CUNIX cluster, along with their X-Windows versions. Stata 10 and Stata 10 SE are also available, but the X versions of Stata 10 are not currently available.

Individual copies of Stata for Windows, Macintosh, and Linux are available to Columbia University students, faculty, and staff at a significant discount from their normal prices. See this URL for site license information.


Documentation:

There are many Stata manuals. The most important are:

  1. Getting Started: a brief overview.
  2. Stata User's Guide: a thorough step-by-step overview of Stata's features.
  3. Stata Reference Manual: a logically organized description of all Stata commands in 3 volumes.

See the Stata Bookstore for a list of other documentation and books on Stata and statistics.

A set of current manuals is in the Electronic Data Service in 215 Lehman Library. Older versions are also available for reference in the Lehman Library Permanent Reserves section.

NOTE: Just about everything in the printed manuals, and a lot more, is available with the "help" command.
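As a minimal sketch of looking things up from the Stata prompt (the command name and search keywords here are only illustrations):

    * Show the help entry for a command (here, summarize)
    help summarize

    * Search the help index by keyword when you do not know the command name
    search missing values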

Stata Help:

Stata has extensive interactive help.

Using and Saving Stata Datasets:

A "Stata Dataset" is one in the special Stata format. Stata datasets have the extension ".dta". To open a Stata dataset, type the use command followed by the name of the dataset.

    use survey1

If you only need some of the variables from a Stata Dataset, you can just read in those variables with this variant of the use command:

    use age sex status using survey1
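The same variant of use also accepts if and in qualifiers, so you can limit the observations as well as the variables. A short sketch, assuming age is a numeric variable in survey1 (the cutoff of 18 and the range 1/100 are arbitrary illustrations):

    * Read three variables, keeping only observations where age is at least 18
    use age sex status if age >= 18 using survey1

    * Read the same variables for the first 100 observations only;
    * clear discards the data already in memory
    use age sex status in 1/100 using survey1, clear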

To save a Stata Dataset, type the save command plus a filename. You do not have to type the file extension. The extension will be ".dta" by default. If the file already exists, you will need the replace option.

    save survey2, replace

A note about versions of Stata: a dataset created by the most recent version of Stata cannot be read by versions earlier than version 8. To create a file that can be read by version 7 of Stata, use the saveold command.

      saveold survey2 

If you are using Stata/SE and want to save the dataset for use in the smaller Intercooled version of Stata, add the intercooled option to the save command:

      save survey1, intercooled

String variables must be no longer than 80 characters to be saved in Intercooled format.

Some Useful Stata Commands:

Reading ASCII Data into Stata:

The two most common commands to read data from an ASCII file into Stata are insheet and infile:

  1. insheet - Use insheet if the file was created by a spreadsheet or a database program, with one observation per line and a comma or a tab character as the variable delimiter. If you are coming from Excel, create a .csv file. The first line can be a list of variable names. A period (.) is understood to mean a numeric missing value; double quotes ("") mean a missing string value.
  2. infile - Use infile to read fixed-format raw data without delimiters, using a dictionary file. See the example below.

The syntax and examples are below.

Insheet:
    Syntax: insheet using filename, options

where "filename" is the name of the ASCII file created by the spreadsheet or database program. By default, Stata will assign the names v1,v2,...,vn to the variables. If you saved the spreadsheet file with variable names in the first row, Stata can use them if you specify the option "names", for example:

 
    insheet using mystuff.dat, names

If you didn't save the spreadsheet file with variable names, you can assign names later with the rename command and attach descriptive labels with the label variable command. If you have a lot of variables, make up a .do file with all the rename and label commands.
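A minimal sketch of such a .do file, assuming the file was read in with four columns and the default names v1 through v4 (the new names and label text are hypothetical):

    * Give the default variables meaningful names
    rename v1 id
    rename v2 age
    rename v3 sex
    rename v4 status

    * Attach descriptive labels
    label variable id "Respondent ID"
    label variable age "Age in years"

If these lines are saved in a file called, say, names.do, typing "do names" in Stata runs them all at once.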


Infile with a Dictionary File:
    Syntax: infile using dictionary-file

where "dictionary-file" is the dictionary file containing the specifications for reading the variables. If the variables in your data file are not delimited, you need a dictionary file to describe the positions of your variables to Stata. Here's an example.

   dictionary using dump.dat { 
       _column(1)        id %5f  
       _column(6)        age %2f 
       _column(8) str1   sex %1s 
       _column(9) str1   status %1s 
        }
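As a sketch of how the dictionary above might be used, assume it has been saved under the hypothetical name survey.dct in the same directory as dump.dat, and that each record in dump.dat is laid out like the made-up line shown in the comment:

    * A record in dump.dat would look like this
    * (id in columns 1-5, age in 6-7, sex in 8, status in 9):
    *   0000134MS

    * Read the data through the dictionary (.dct is the default extension)
    infile using survey

    * Check what was read
    describe
    list in 1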

For more information on writing a dictionary file, type "help infile" in Stata.


Increasing Memory:

If you get the message "No more room for observations" (as opposed to variables), you don't have enough memory to read in your entire Stata dataset. The "memory" command reports memory usage. To increase memory, give the command:

    set memory #m

where "#" is a number and "m" stands for megabytes.
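For example, a short sketch that raises the allocation before loading a large file (the figure of 200 megabytes is an arbitrary illustration; choose a value that suits your data):

    * Memory cannot be resized while a dataset is loaded, so clear it first
    clear
    set memory 200m

    * Confirm the new allocation, then load the data
    memory
    use survey1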

You may be able to reduce your memory requirements by storing your data more efficiently. Stata's default numeric storage type is float, which takes 4 bytes per value. This is unnecessarily large for most social science data. Use Stata's "compress" command to reduce your data to its most efficient format and then resave your file.

Note: The "compress" command does not create a compressed version of your file in the way that compression utilities such as gzip or pkzip do. Rather, the Stata "compress" command changes the storage type of each variable to the smallest type that holds its values without loss. See the Stata manual for more information on Stata variable types.
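A minimal sketch of the sequence, using the survey1 dataset from the earlier examples:

    use survey1

    * Look at the storage types before and after compressing
    describe
    compress
    describe

    * Resave the file in its smaller form
    save survey1, replace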

Further note: If you get the message "No more room for variables" (as opposed to observations), you have too many variables. Intercooled Stata has an absolute limit of 2,047 variables (2**11 - 1). Stata SE has a limit of 32,767 (2**15 - 1). If you know the names of the variables, you can read in only the ones you need. Since Stata works almost entirely in memory, the fewer the variables (and observations), the faster it runs.
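If you do not remember the variable names, one way to find them without loading the data is sketched below (the variables id and age are assumed to exist in survey1):

    * List the variables stored in survey1.dta without reading the data
    describe using survey1

    * Then read in only the variables you need
    use id age using survey1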

The log File:

To start logging your session on CUNIX, give the Stata command:

    log using filename

where "filename" is the name of the file. It can be any name. The file extension will be ".smcl", which is a Stata-proprietary format. To log to an ordinary ASCII file, use the t option:

    log using filename, t

This will save the log to "filename.log".

If the file already exists, Stata will not overwrite it unless told to: add the append option to add to the existing file, or the replace option to overwrite it. Logging can be turned off and on any number of times during a Stata session with the log off and log on commands. The log file closes automatically when you exit Stata. If you use the t option, the log file is an ASCII file, so you can edit it with any Unix editor (pico, emacs, vi).
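A short sketch of a logged session, assuming a hypothetical log name session1 and the survey1 dataset from the earlier examples:

    * Start an ASCII log, overwriting any earlier session1.log
    log using session1, t replace

    use survey1
    summarize age

    * Suspend logging; the next command is not recorded
    log off
    describe

    * Resume logging
    log on
    tabulate sex

    * Close the log explicitly
    log close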


Stata in Batch Mode:

It is possible to run Stata in batch mode on CUNIX. Prepare a file with the Stata commands you want executed. Be sure to turn paging off with the set more off command. Save the file with the extension ".do". Then run Stata in the background with this command:

    stata -q -b do mybatch > /dev/null &

where "mybatch" is the name of the file containing your Stata commands. Output will be in a file with the same name as the input "do" file and with the file extension "log". (The "> /dev/null" gets rid of the "running on computer" message.)
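As an illustration, mybatch.do might contain something like the following (the dataset and variable names are taken from the earlier examples and are only assumptions about your data):

    * mybatch.do: a small batch job
    set more off
    use survey1
    summarize age
    tabulate sex status

After the job finishes, the results of summarize and tabulate will be in mybatch.log.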