Finding the Right Data

The EDS web site provides guidance to those looking for data.

Most Importantly
EDS staff can help with the process, whether you visit us in person in 215 Lehman or contact us remotely by email at eds@columbia.edu. Below is an outline of the help you can expect to receive.
Finding Data

Finding data is not always easy. The more complex the question, the more difficult finding appropriate data can be. The less experience the user has working with data, the longer it can take to help them refine the initial perspective and expectations to a feasible scope.

The [data] consultant's job is three-fold:

Some Useful Steps in Finding the "Right" Data
1. Identify the intended use of the "data".

"Data" may mean a few statistics to be put in a paper or millions of Census records to be run through sophisticated statistical analyses.

Don't assume that "data" means "computer analysis". With the increase in CD-ROM distribution, fewer statistics are available in print form, so the user may simply need to consult an electronic book. Alternatively, the user may have been misdirected when the statistics are best found in a printed book.

A good first question:

"What will you do with the data once you have it?"

The context of the research is also important. The amount of effort the user "should" spend will depend on the final product and its schedule:

2. How much computer and analytical experience does the user have?

The size and complexity of a project need to match the user's capabilities as well as the time frame.

3. Define the topic precisely enough to narrow the search.

Every project has a goal: a topic to be addressed or a question to be answered. The user needs to define the topic precisely enough to identify appropriate data, and may already have done so.

4. Determine the amount of data involved and its practicality.
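A rough size estimate is often enough to judge whether a data set is practical to handle. The sketch below assumes a fixed-width file with one record per line (record length in characters plus one newline byte); the record counts and lengths used are hypothetical, and the real figures should come from the study's codebook.

```python
def estimate_size_bytes(n_records: int, record_length: int) -> int:
    """Approximate raw size of a fixed-width file, assuming one line per
    record: record_length characters plus one newline byte."""
    return n_records * (record_length + 1)


def human_readable(n_bytes: int) -> str:
    """Format a byte count using binary (1024-based) units."""
    units = ["B", "KB", "MB", "GB", "TB"]
    size = float(n_bytes)
    for unit in units:
        # Stop at the first unit where the number drops below 1024,
        # or at the largest unit available.
        if size < 1024 or unit == units[-1]:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} {units[-1]}"  # not reached; keeps type checkers happy


# Hypothetical example: 5 million records of 300 characters each.
size = estimate_size_bytes(5_000_000, 300)
print(human_readable(size))
```

With these assumed figures the raw file is about 1.4 GB, before any extraction of just the variables the project needs.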

5. Note the need to understand the data itself.

Measurement, method of collection, and quality of the data are all important in determining whether a set of data is appropriate to the problem, the context, and the proposed techniques.

This information is important in choosing a study, in analyzing it, and in understanding and presenting the results of the analysis.

The user needs to understand how data measurement and quality are documented. Quality and other collection issues should be addressed in the codebook. Within the data themselves, many studies use "missing" data indicators of various kinds. Census data, however, use separate flag variables that indicate when a data point has been "allocated" (estimated) or suppressed; these flags should be extracted along with the actual data points.
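A minimal sketch of why the flags matter once the data are extracted. The variable names (`INCOME`, `INCOME_FLAG`), the missing-value codes (-9, -8), and the flag values (0 = reported, 1 = allocated, 2 = suppressed) are all hypothetical; each study's codebook defines its own conventions.

```python
# Hypothetical codebook conventions: check the real codebook for each study.
MISSING_CODES = {-9, -8}   # e.g. "refused", "not applicable"
FLAG_REPORTED = 0          # value reported by the respondent
FLAG_ALLOCATED = 1         # value "allocated" (estimated)
FLAG_SUPPRESSED = 2        # value suppressed


def clean_record(raw: dict) -> dict:
    """Recode missing-value codes and suppressed values to None, and keep
    the allocation/suppression status alongside the value."""
    value = raw["INCOME"]
    flag = raw["INCOME_FLAG"]
    if value in MISSING_CODES or flag == FLAG_SUPPRESSED:
        value = None
    return {
        "income": value,
        "allocated": flag == FLAG_ALLOCATED,
        "suppressed": flag == FLAG_SUPPRESSED,
    }


raw_records = [
    {"INCOME": 42000, "INCOME_FLAG": FLAG_REPORTED},
    {"INCOME": 38000, "INCOME_FLAG": FLAG_ALLOCATED},
    {"INCOME": -9, "INCOME_FLAG": FLAG_REPORTED},
    {"INCOME": 0, "INCOME_FLAG": FLAG_SUPPRESSED},
]

cleaned = [clean_record(r) for r in raw_records]

# All usable values, including allocated (estimated) ones.
usable = [r["income"] for r in cleaned if r["income"] is not None]
# Only values actually reported, excluding allocated ones.
reported_only = [
    r["income"] for r in cleaned
    if r["income"] is not None and not r["allocated"]
]
print(usable, reported_only)
```

Without the flag column, the allocated value would be indistinguishable from a reported one, and the analyst could not choose whether to include or exclude it.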