The EDS web site provides guidance to those looking for data.
- DataGate serves as a gateway to studies from many different suppliers.
- Links to our key suppliers are provided so that those who choose to can use the suppliers' own interfaces to studies.
- Links to sources not in DataGate give assistance for those who need to look elsewhere in the Libraries' collection or on the Internet.
- Web pages, organized by major topic, are used to introduce and provide quick access to our most frequently used products.
Most Importantly
EDS staff can help with the process, whether you visit us in person at 215 Lehman or contact us remotely by email at eds@columbia.edu. Below is an outline of the help you can expect to receive.
Finding Data
Finding data is not always easy. The more complex the question, the more difficult finding appropriate data can be. The less experience the user has working with data, the longer it can take to help them refine the initial perspective and expectations to a feasible scope.
The [data] consultant's job is three-fold:
- To understand the user's needs clearly,
- To help the user understand what the question/problem/project entails in the world of data analysis, and, then,
- To help the user find data appropriate to the refined view of the project.
It may take several iterations of each step in the consultation process for a user to refine a project appropriately.
Some Useful Steps in Finding the "Right" Data
1. Identify the intended use of the "data".
"Data" may mean a few statistics to be put in a paper or millions of Census records to be run through sophisticated statistical analyses.
Don't assume that "data"="computer analysis". With the increase in CD-ROM distribution, fewer statistics are available in print form. The user may just need to use an electronic book, or may have been misdirected when the statistics are best found in a printed book.
A good first question:
"What will you do with the data once you have it?"
The context of the research is also important. The amount of effort the user "should" spend will depend on the final product and its schedule:
- What is it for? A course? Term paper? Thesis? Dissertation? Research project? Career?
- Who is doing the work? Is it a professor's, researcher's or student's project? Is the user the principal investigator?
- When does the project have to be done? In days? Weeks? Months? Years?
2. How much computer and analytical experience does the user have?
The size and complexity of a project need to match the user's capabilities as well as the time frame.
- "What program do you plan to use for the analysis?" SPSS? SAS? STATA?
- "How familiar are you with the program? How much have you used it? What have you been taught about it in the course? ..."
- "What procedures do you plan to use for this project?"
- "How much data manipulation is involved? What did you do last time?"
3. Define the topic precisely enough to narrow the search.
Every project has a goal: a topic to be addressed, a question to be answered. The user needs to define the topic precisely enough to identify appropriate data, and may have done so already.
- "What question are you trying to answer? What hypothesis are you trying to test?"
- "Have you written a proposal? Outlined the project? Listed the variables/measures/... that you will use? Defined the analytical model?"
- "How does this project/problem/question fit into the discipline? What methods are used in articles/papers about this or related topics?"
- One useful way to help specify a topic more precisely is to have the user search DataGate, reading the descriptions of the studies and of the data (variables, techniques). This can help by showing what questions have been posed before, what questions can be readily answered, etc. The ICPSR study descriptions sometimes contain bibliographic references.
- If time is critically short, choose studies that can be accessed right away (that applies to most of our studies) and that have documentation that is easy to use. Consider using studies that come in a format that is ready to use by a statistical package (look for listings, on the DataGate search page, of easily accessible datasets).
4. Determine the amount of data involved and its practicality.
- Schedule, user experience, and the goal of the project set limits on the amount and complexity of the data that can or should be used.
- This can cut both ways. A term paper project should not use too much data, but a dissertation should not use too little.
5. Note the need to understand the data itself.
Measurement, method of collection, and quality of the data are all important in determining whether a set of data is appropriate to the problem, the context, and the proposed techniques.
This information is important in choosing a study, in analyzing it, and in understanding and presenting the results of the analysis.
The user needs to understand how data measurement and quality are documented. Quality and other collection issues should be addressed in the codebook. In the data itself, many studies use "missing" data indicators of various kinds. Census data, however, has separate flag variables that indicate when a data point has been "allocated" (estimated) or suppressed; these flags should be extracted along with the actual data points.
- What kind of data is required to answer the question?
- Will this dataset answer the question? Does it measure the necessary properties of the object of analysis?
- Is the level of measurement of this data appropriate to the techniques used? "Continuous" variables for regression?
- Are there quality issues with this data? How will the user tell, for instance, how missing data is indicated? How much valid data is required for a meaningful analysis?
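The last question above, how much valid data a variable actually has, can be checked with a few lines of code once the codebook's missing-data codes are known. The following is a minimal sketch in Python, not a procedure from any particular study: the variable names, the missing codes (-9, 99999), and the "income_flag" allocation flag are all hypothetical stand-ins for whatever the codebook documents. Whether allocated values should count as valid is the analyst's decision; here it is left as an option.

```python
# Hypothetical codebook entries: variable name -> codes that mean "missing".
MISSING_CODES = {"age": {-9}, "income": {-9, 99999}}

# Hypothetical records. "income_flag" is an illustrative allocation flag:
# 1 = the income value was allocated (estimated), 0 = reported as collected.
records = [
    {"age": 34, "income": 42000, "income_flag": 0},
    {"age": -9, "income": 38000, "income_flag": 1},
    {"age": 51, "income": 99999, "income_flag": 0},
    {"age": 27, "income": 51000, "income_flag": 0},
]


def valid_share(rows, var, missing_codes, flag_var=None):
    """Return the fraction of rows whose value for `var` is usable.

    A value is treated as unusable if it is absent, if it matches one of
    the codebook's missing codes, or (only when `flag_var` is given) if
    its flag variable is nonzero.
    """
    if not rows:
        return 0.0
    usable = 0
    for row in rows:
        value = row.get(var)
        if value is None or value in missing_codes.get(var, set()):
            continue  # absent or coded as missing
        if flag_var is not None and row.get(flag_var, 0) != 0:
            continue  # allocated or suppressed, per the flag
        usable += 1
    return usable / len(rows)


age_share = valid_share(records, "age", MISSING_CODES)
income_share = valid_share(records, "income", MISSING_CODES)
income_reported_share = valid_share(
    records, "income", MISSING_CODES, flag_var="income_flag"
)
```

Comparing the share with and without the flag shows how much of a variable rests on estimated values, which is one reason to extract the flags together with the data points.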

