Stata for a PC or Mac: Brief Information


Abstract: Stata is a general-purpose statistical package. This document offers a brief introduction to Stata on a PC or Mac, with several examples of reading ASCII data.

Stata is available to Columbia University students, faculty, and staff at a significant discount from the normal price. See this URL for site license information.

Documentation:

There are many Stata manuals. The most important are:

  1. Getting Started: a brief overview.
  2. Stata User's Guide: a thorough step-by-step overview of Stata's features.
  3. Stata Reference Manual: a logically organized description of all Stata commands in 3 volumes.

See the Stata Bookstore for a list of other documentation and books on Stata and statistics.

A set of current manuals is in the Electronic Data Service in 215 Lehman Library. Older versions are also available for reference in the Lehman Library Permanent Reserves section.

NOTE: Just about everything in the printed manuals, and a lot more, is available with the interactive "help".

Stata Interactive Help:

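Stata has extensive interactive help. Type help followed by a command name (for example, help insheet) in the Command window, or use the Help menu.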

The Stata Interface:

On the PC and Mac, Stata has a number of windows. You can switch between them with the Window menu or by clicking the one you want.

You can close any window except the Command and Results windows, which must stay open for the entire session. Windows can overlap as you work, and it is easy to lose track of them, particularly on the Mac, where they all float freely. Use the Window menu to find a window or to move from one to another.

Many of the most common commands can be entered either by typing the command syntax into the Command window or by choosing from the program's pull-down menus. This guide gives the syntax for each command and, for some commands, the menu technique as well.

Using and Saving Stata Datasets:

A Stata dataset is a file stored in Stata's own format, with the extension ".dta". To open a Stata dataset, either choose Open from the File menu or, in the Command window, type the use command followed by the name of the dataset:

    use survey1

If you only need some of the variables from a Stata Dataset, you can just read in those variables with this variant of the use command:

    use age sex status using survey1
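A minimal sketch of reading a subset of variables. Because survey1 is not available here, it assumes the auto dataset that ships with Stata (loaded with sysuse) as sample data. It saves that data under a temporary file name so that no real survey1.dta in your working directory is overwritten.

```stata
* Sample data: the auto dataset that ships with Stata
sysuse auto, clear

* A temporary file name stands in for survey1
tempfile survey1
save "`survey1'"

* Read back only three of its variables; clear discards the data in memory
use make price mpg using "`survey1'", clear
describe
```

Only make, price, and mpg are loaded, so the dataset in memory is smaller than the file on disk.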

To save a Stata dataset, either choose Save or Save As from the File menu or, in the Command window, type the save command followed by a filename. If the file already exists, you must add the replace option (, replace). You do not have to type the file extension, which defaults to ".dta".

    save survey2, replace
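The sketch below shows why replace matters. It assumes the bundled auto dataset as sample data and uses a temporary file name in place of survey2. A second save without replace stops with return code 602 ("file already exists"), and capture is used here only so that the sketch keeps running after that error.

```stata
* Sample data: the auto dataset that ships with Stata
sysuse auto, clear

* A temporary file name stands in for survey2
tempfile survey2
save "`survey2'"                 // first save: the file does not exist yet

capture save "`survey2'"         // saving again without replace fails
local rc = _rc
display "return code without replace: `rc'"

save "`survey2'", replace        // with replace, the existing file is overwritten
```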

Note about versions of Stata: a dataset saved by the current version of Stata cannot be read by versions earlier than version 8. To create a file that version 7 of Stata can read, use the saveold command:

      saveold survey2 

If you are using Stata/SE and want to save the dataset for use in the smaller Intercooled version of Stata, add the intercooled option to the save command:

      save survey1, intercooled 

String variables must be no longer than 80 characters to be saved in Intercooled format.

Some Useful Stata Commands:
Increasing Memory:

If you get the message "No more room for observations" (as opposed to variables), you don't have enough memory to read in your entire Stata dataset. The memory command reports on memory usage. To increase memory, give the command:

 set memory #m 

where # is a number and m means megabytes (for example, set memory 100m).

You may be able to reduce your memory requirements by storing your data more efficiently. Stata's default numeric storage type, float, uses 4 bytes per value, and double uses 8. That is more than most social science variables need, because codes and counts fit in a byte (1 byte) or an int (2 bytes). Use Stata's compress command to convert your data to its most efficient storage types, and then save your file again.

Note: The compress command does not compress your file the way utilities such as gzip or pkzip do. Instead, it changes each variable's storage type to the smallest type that holds all of its values without loss. See the Stata manual for more information on Stata's storage types.
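The following sketch illustrates what compress does. It builds a hypothetical dataset of 1,000 observations in which small integers are deliberately stored as double (8 bytes each). It then compares the width of one observation, as reported by describe in r(width), before and after compress.

```stata
clear
set obs 1000

* Hypothetical variables: small integers wastefully stored as double
generate double id   = _n            // 1 to 1000, fits in an int
generate double code = mod(_n, 5)    // 0 to 4, fits in a byte

quietly describe
local before = r(width)              // bytes per observation

compress                             // demote each variable to the smallest lossless type

quietly describe
local after = r(width)
display "bytes per observation: `before' before compress, `after' after"
```

The values are unchanged. Only their storage types differ, and the saved file shrinks by the same proportion once you save it again.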

Further note: If you get the message "No more room for variables" (as opposed to observations), you have too many variables. Intercooled Stata has an absolute limit of 2,047 variables (2^11 - 1), and Stata/SE has a limit of 32,767 (2^15 - 1). If you know the names of the variables you need, you can read in only those. Because Stata works almost entirely in memory, the fewer variables (and observations) you load, the faster it runs.

Reading ASCII Data into Stata:

The two most common commands to read data from an ASCII file into Stata are insheet and infile:

  1. insheet - Use insheet if the file was created by a spreadsheet or database program, with one observation per line and the variables separated by commas or tab characters. If you are coming from Excel, save the sheet as a .csv file. The first line can hold the variable names. A period (.) is read as a missing numeric value, and a pair of double quotes ("") is read as a missing string value.
  2. infile - Use infile with a dictionary file to read fixed-format raw data that has no delimiters. See the example below.

The command-window syntax and examples for each command are given below. You can also start these commands from the File menu, under the Import option.

Insheet:
    Syntax: insheet using filename [, options]

where filename is the name of the ASCII file created by the spreadsheet or database program. By default, Stata assigns the names v1, v2, ..., vn to the variables. If you saved the spreadsheet file with variable names in the first row, Stata can use them if you specify the names option, for example:

    insheet using mystuff.dat, names
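A minimal round-trip sketch follows. It assumes a Stata release that still accepts insheet; newer releases prefer import delimited but continue to run insheet. The file name example.csv and its contents are hypothetical. The code writes the file to your working directory with Stata's file commands and then reads it back. The period in the second data row shows how a missing numeric value is read.

```stata
* Write a small comma-separated file (hypothetical sample data)
file open fh using example.csv, write text replace
file write fh "id,age,sex" _n
file write fh "1,35,F" _n
file write fh "2,.,M" _n
file close fh

* Read it back, taking the variable names from the first line
insheet using example.csv, comma names clear
list
```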

If you didn't save the spreadsheet file with variable names, you can name the variables afterward with the rename command and describe them with the label variable command. If you have a lot of variables, put all of these commands in a .do file.
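As a sketch of that approach, the code below writes a hypothetical header-less file, nohead.csv, and reads it with the nonames option so that the variables arrive as v1, v2, and v3. The rename and label variable lines are the part you would keep in a do-file (a hypothetical labels.do run with do labels.do). The new names and labels are illustrative only.

```stata
* Write a CSV file with no header line (hypothetical sample data)
file open fh using nohead.csv, write text replace
file write fh "1,35,F" _n
file write fh "2,42,M" _n
file close fh

* Without names, insheet calls the variables v1, v2, v3
insheet using nohead.csv, comma nonames clear

* --- the commands below could live in a do-file such as labels.do ---
rename v1 id
rename v2 age
rename v3 sex
label variable id  "Respondent ID"
label variable age "Age in years"
label variable sex "Sex"
describe
```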


Infile with a Dictionary File:
    Syntax: infile using dictionary-file

where dictionary-file is the dictionary file containing the specifications for reading the variables. If the variables in your data file are not delimited, you need a dictionary file to describe their positions to Stata. Here's an example:

   dictionary using dump.dat { 
       _column(1)        id %5f  
       _column(6)        age %2f 
       _column(8) str1   sex %1s 
       _column(9) str1   status %1s 
        }
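The sketch below runs the dictionary above end to end. It assumes two hypothetical fixed-format records, written to dump.dat in the working directory. The same dictionary is saved as dump.dct and then read with infile. Each record packs id into columns 1-5, age into 6-7, sex into 8, and status into 9, exactly as the _column() entries specify.

```stata
* Two hypothetical fixed-format records: id(1-5) age(6-7) sex(8) status(9)
file open dat using dump.dat, write text replace
file write dat "0000135FM" _n
file write dat "0000242MS" _n
file close dat

* Save the dictionary shown above as dump.dct
file open dct using dump.dct, write text replace
file write dct "dictionary using dump.dat {" _n
file write dct "    _column(1)        id %5f" _n
file write dct "    _column(6)        age %2f" _n
file write dct "    _column(8) str1   sex %1s" _n
file write dct "    _column(9) str1   status %1s" _n
file write dct "}" _n
file close dct

* Read the data through the dictionary; clear discards the data in memory
infile using dump.dct, clear
list
```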

Click here for more information on writing a dictionary file.


Import Option in the File Menu:

You can also read in ASCII datasets using the pull-down File menu under the "Import" option. If the data isn't delimited, you will still need a dictionary file or the information on the column location and data type of your variables.


Jane Weintrop, Sue Zayac
Electronic Data Service