AIDS Cost and Services Utilization Survey
Public Use Tapes #4 and #5
Users Manual

Submitted to:
Agency for Health Care Policy and Research
2101 East Jefferson Street
Rockville, Maryland 20852

Submitted by:
Westat, Inc.
1650 Research Boulevard
Rockville, Maryland 20850-3129

Contract No.: 282-89-0020
Deliverable No.: 49
June 30, 1994

TABLE OF CONTENTS

Chapter

1  INTRODUCTION
2  RESEARCH DESIGN AND ANALYSIS OBJECTIVES
   2.1  Objectives
   2.2  Study Design
   2.3  Strengths and Limitations of Data
3  DATA STRUCTURE OVERVIEW
   3.1  Content of Public Use Volumes
        3.1.1  Adult Public Use Volume (Tape #4)
        3.1.2  Pediatric Public Use Volume (Tape #5)
   3.2  ACSUS Public Use File Data Structure
        3.2.1  Patient Reported Data
        3.2.2  Medical Records Abstract Data
        3.2.3  Provider Billing Data
        3.2.4  General Analytic Usage Guidance
4  DATA PREPARATION, TYPES OF VARIABLES, AND DATA ANOMALIES
   4.1  Data Preparation
        4.1.1  Range Specifications
        4.1.2  Consistency Checks
        4.1.3  Frequency Review
   4.2  Types of Variables
        4.2.1  Edited Variables
        4.2.2  Derived Variables
        4.2.3  Imputed Variables
        4.2.4  Flag Variables
   4.3  Data Anomalies
        4.3.1  Self-Reported Marital Status
        4.3.2  Utilization Information Reported During the Wrong Reference Period
        4.3.3  Dates of Service Utilization After the Subject's Date of Death
        4.3.4  Loss of Medicare Coverage
        4.3.5  Self-Reported Date of Positive HIV Test
        4.3.6  Self-Reported Conditions of HIV Illness
5  GUIDE TO THE DATA FILES, SPECIFIC VARIABLES, CODEBOOKS, AND ANNOTATED QUESTIONNAIRES
   5.1  Patient Data Files
        5.1.1  Patient Level Files
        5.1.2  Service Utilization Files
   5.2  Medical Record Abstract Data Files
        5.2.1  Overview
        5.2.2  Patient Level File
        5.2.3  Medical Abstract Repeating Record Files
   5.3  Provider Billing Survey Data Files
   5.4  Guide to Codebooks and Annotated Questionnaires
6  DATA IMPUTATION
   6.1  Introduction to Imputation Strategies
   6.2  General Imputation Strategy for ACSUS Data
        6.2.1  Auxiliary Variables Used to Define Imputation Cells
        6.2.2  Group 1 Item Nonresponse Imputation Procedures
   6.3  Limitations of the Imputed ACSUS Data
   6.4  Imputation Variables
        6.4.1  Original Variables
        6.4.2  Imputed Variables
        6.4.3  Flag Variables

List of Tables

Table
2-1  Number of Adult and Pediatric Patients: AIDS Cost and Services Utilization Survey
3-1  Contents of Adult Public Use Volume (Tape #4)
3-2  Contents of Pediatric Public Use Volume (Tape #5)
5-1  Categories of Information on Adult Time Specific Files
5-2  Categories of Information on Pediatric Time Specific Files
6-1  Imputed Charge Components by Care Type

List of Figures

Figure
3-1  Patient Files
3-2  Medical Record Abstract Files
3-3  Provider Billing Files
5-1  Example of the Codebook Format
5-2  Example of the Annotated Questionnaire Format

List of Appendices

Appendix
A  PATIENT INTERVIEW SURVEY INSTRUMENTS (Annotated)
   A-1   Adult/Adolescent Screener Questionnaire
   A-2   Time 1 Adult/Adolescent Patient Questionnaire
   A-3   Time 2 Adult/Adolescent Patient Questionnaire
   A-4   Time 3 Adult/Adolescent Patient Questionnaire
   A-5   Time 4 Adult/Adolescent Patient Questionnaire
   A-6   Time 5 Adult/Adolescent Patient Questionnaire
   A-7   Time 6 Adult/Adolescent Patient Questionnaire
   A-8   Pediatric Screener Questionnaire
   A-9   Time 1 Pediatric Patient Questionnaire
   A-10  Time 2 Pediatric Patient Questionnaire
   A-11  Time 3 Pediatric Patient Questionnaire
   A-12  Time 4 Pediatric Patient Questionnaire
   A-13  Time 5 Pediatric Patient Questionnaire
   A-14  Time 6 Pediatric Patient Questionnaire
B  PROVIDER SURVEY MEDICAL RECORD ABSTRACT FORM (Annotated)
C  PROVIDER SURVEY BILLING FORMS (Annotated)
   C-1  Inpatient Stay Form
   C-2  Ambulatory Billing Form
   C-3  Home Health Form
   C-4  Prescribed Medicine Form
D  CODEBOOK APPENDICES
   D-1  Name of Physician/Doctor Specialty Codes
   D-2  Prescription Drug Codes
   D-3  Nonprescription Drug/Non-Traditional Substance Codes
   D-4  Relationship to Patient Codes

1. INTRODUCTION

This manual provides documentation and guidance for users of the public release of the complete data files for the AIDS Cost and Services Utilization Survey (ACSUS). These data files are contained in two volumes: ACSUS Public Use Tape #4 (Complete Adult Patient Questionnaires, Adult Provider Billing Questionnaires, and Adult Medical Record Abstracts) and ACSUS Public Use Tape #5 (Complete Pediatric Patient Questionnaires, Pediatric Provider Billing Questionnaires, and Pediatric Medical Record Abstracts).

Partial information from the study was released previously in ACSUS Public Use Tapes 1-3. The ACSUS data have since undergone further editing, and some values may have changed since the publication of Tapes 1-3. Therefore, Tapes 4 and 5 should be considered the complete and final set of public use data from ACSUS. Note also that Tapes 4 and 5 are self-contained: they cannot be linked with the previous tapes because patient and provider identifiers are not consistent with those on Tapes 1-3.

ACSUS was a national survey of persons infected with HIV that was conducted by Westat, Inc., for the Agency for Health Care Policy and Research (Contract No. 282-89-0020). The study was designed to provide estimates of the use of health care, social, and support services, and of the costs of those services, by a sample of HIV-infected persons in ten geographic locations in the United States.

The chapters that follow provide information about the research design (Chapter 2), the structure of the public use data files (Chapter 3), data preparation (Chapter 4), and the use of the ACSUS data files and codebooks (Chapter 5). Imputation techniques are described in Chapter 6. The appendices provide copies of all data collection instruments and the codebook appendices.
The main body of this users manual (Chapters 1-6) is also contained as a text file on the public use tapes. All figures, tables, and appendices are available in hard copy only.

2. RESEARCH DESIGN AND ANALYSIS OBJECTIVES*

____________________
*This chapter is excerpted from Berk, M.L., Maffeo, C., and Schur, C.L. (1993). Research Design and Analysis Objectives, AIDS Cost and Services Utilization Survey (ACSUS) Report No. 1 (AHCPR Publication No. 93-0019). Rockville, MD: Agency for Health Care Policy and Research.

2.1 Objectives

The increase in the number of reported cases of AIDS (acquired immunodeficiency syndrome) and the expanded demands on the health care system for treating persons with illness related to HIV (human immunodeficiency virus) have been widely reported. Despite the growing problem, data on the use of and expenditures for health care services for this population are surprisingly scarce. Thus, public policy related to the HIV epidemic has often been formulated on the basis of limited case studies and cross-sectional research.

ACSUS is the largest data collection effort targeting the population of persons infected with HIV. In addition, its design addresses three major limitations of existing data. First, data from other sources do not permit examination of early stages of HIV illness, such as periods of asymptomatic infection, or HIV illness that does not satisfy the diagnostic criteria for AIDS. The ACSUS sample includes more than 400 adults who reported being HIV positive but having no HIV-related symptoms or conditions. Second, although the disease is increasingly treated in outpatient settings, current information on AIDS does not permit analysis of the use of a wide variety of outpatient services.
ACSUS data, on the other hand, include the use of and charges for the full range of ambulatory physician settings, use of both formal and informal home care services, use of a number of mental health and support services, and use of and charges associated with various drug therapies.

Finally, perhaps of greatest importance is the longitudinal nature of ACSUS. Length of survival after infection has been increasing and may continue to do so as prophylactic treatment for HIV infection becomes the norm. Yet there is little information on how levels and mix of service use, sources of payment for care, or quality of life change over the course of the illness. With six interviews over the course of an 18-month period, ACSUS provides a wealth of information on factors affecting the use of services over different stages of the illness.

Data collected through ACSUS will be key to the formulation of Federal strategies for the allocation of resources to the care of persons with HIV-related illness and to a broader understanding of the effect of HIV on segments of the U.S. health care system. Moreover, the information needs of State and local governments parallel those of the Federal Government as they deliver care to persons with HIV-related illness.

2.2 Study Design

Sample Design

The ACSUS sample was drawn using a multistage design that involved selection of geographic areas, sampling of patient care sites in the selected areas, and a probability sample of HIV-infected persons who sought treatment from those providers over a study enrollment period of approximately 4 months. During the first stage, the sample of geographic areas was selected so as to provide regional diversity in third-party payment mechanisms, service delivery systems, and type of HIV-infected individuals (e.g., injecting drug users, homosexual men, pediatric cases, and women).
The first step in the geographic area selection process was to focus on cities with the highest number of reported AIDS cases to ensure a large enough sample of HIV-infected persons. Using data provided by the Centers for Disease Control (CDC), the 25 cities with the greatest number of cases (accounting for more than 62 percent of all cumulative AIDS cases in the United States) were identified. From these 25 cities, 10 were selected using the following criteria:

- At least six cities with a high prevalence of HIV infection.
- At least one city in a State with a Medicaid waiver in place that permits reimbursement for services not usually covered by Medicaid.
- At least one city in a State with restrictive Medicaid policies.
- At least five cities with pediatric cases.
- At least one city with a moderate HIV prevalence rate.

The final selection of the 10 geographic areas was based on data supplied by CDC on the race and ethnicity of persons with AIDS, number of persons by exposure category, and number of pediatric cases. The 10 geographic areas are New York, Newark, Philadelphia, Baltimore, Miami, Tampa, Chicago, Houston, Los Angeles, and San Francisco.

The second stage of sampling was selection of patient care sites in each area. A partial listing of hospitals and clinics (but not private practitioners) that provided services to HIV-infected persons was obtained from CDC's National AIDS Information Clearinghouse. Calls were made to the city or county department of health in each of the 10 areas to update the list of patient care sites and to attempt to obtain some information on the number of persons treated by each. In general, accurate information on the number of patients treated (as distinct from the number of visits) was unavailable. Although some hospitals made the information public, others considered it confidential.
Most health departments considered information on the names of private practitioners treating AIDS patients too sensitive to release. However, it was possible to rank hospitals and outpatient clinics in each area in terms of estimated caseload. In general, the patient care sites that treated the largest number of patients in the area were selected, although attempts were made to include both private and public sites. In four geographic areas, private practitioners who were affiliated with the hospitals providing substantial amounts of inpatient care were included in the sample. A sample of 32 patient care sites was selected; 26 of these participated in the study.

In the third stage of sampling, a probability sample of patients was selected from each patient care site. The sampling frame of patients was identified through the use of a self-administered screener questionnaire. Fifty-five site coordinators, some of whom were clinic employees, were trained to distribute the screening form to all patients visiting the site for care during the sampling period of 2-4 months. Use of the screener questionnaire allowed for the collection of information necessary for sampling purposes without requiring review of medical records or collection of any personal identifying information. Coordinators were trained to help patients who needed assistance in completing the form. After the screening form was completed, the coordinator used the information on the form to determine whether the patient was eligible for the study and placed the patient in the appropriate sampling stratum. The coordinator then selected a systematic sample within each stratum. Approximately 6,000 persons both completed the screening form and were designated eligible for the study; of these, 2,487 were sampled (Table 2-1).

Table 2-1. Number of Adult and Pediatric Patients: AIDS Cost and Services Utilization Survey

The primary sampling criterion was illness stage, defined as AIDS, HIV-related illness, or asymptomatic. In addition, pediatric patients and women were oversampled, regardless of illness stage. (Pediatric patients who had clinically defined AIDS were enrolled in the study regardless of age; those who were HIV positive but did not have AIDS were required to be at least 15 months of age to preclude the chance of seroconversion.)

The secondary goal in selecting patients was to obtain a sample whose distribution by exposure group and payment source was roughly proportionate to that of the HIV-infected population in the targeted geographic areas. Thus, patients were stratified by exposure group and, in the case of homosexual or bisexual males, by insurance status as well. This stratification not only allowed for a more representative sample of HIV-infected patients but also made it possible to obtain as many patients as possible in some of the rare strata (for example, pediatrics).

The study includes some primarily public hospitals, where AIDS patients with higher socioeconomic status may be underrepresented. Male homosexuals and bisexuals were stratified by source of payment in an attempt to control the distribution of patients by socioeconomic status and reduce potential bias. It was anticipated that the number of female and pediatric cases with private insurance would be small. It also was expected that nearly all injecting drug users (IDUs) would be uninsured, with a small percentage receiving public assistance. Target sampling rates were then computed for each provider.

If, after reviewing the screening form, the site coordinator selected the patient into the sample, the coordinator immediately reapproached the patient to initiate formal recruitment into the study.
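The within-stratum selection described above can be illustrated with a short sketch. It assumes a simple 1-in-k systematic design with a random start: each eligible patient is placed in a stratum, and every k-th eligible patient in screening order is selected. The function name, stratum labels, and intervals are hypothetical and are not taken from the ACSUS field procedures, in which target rates were computed for each provider and could be adjusted as sample yield was monitored.

```python
import random
from collections import defaultdict

def systematic_sample_by_stratum(arrivals, intervals, seed=None):
    """Select a 1-in-k systematic sample within each stratum (hypothetical helper).

    arrivals  -- (patient_id, stratum) pairs in the order patients were screened
    intervals -- dict mapping stratum -> sampling interval k (hypothetical values)
    seed      -- optional seed for the random starts, for reproducibility
    Returns the selected patient IDs in screening order.
    """
    rng = random.Random(seed)
    # Draw a random start between 1 and k for each stratum.
    starts = {stratum: rng.randint(1, k) for stratum, k in intervals.items()}
    seen = defaultdict(int)  # eligible patients screened so far, per stratum
    selected = []
    for patient_id, stratum in arrivals:
        seen[stratum] += 1
        position = seen[stratum] - starts[stratum]
        # Take the start-th patient and every k-th patient after it.
        if position >= 0 and position % intervals[stratum] == 0:
            selected.append(patient_id)
    return selected

# Hypothetical example: 1 in 3 adults with AIDS, every pediatric patient.
arrivals = [(f"ADULT{i:02d}", "adult_aids") for i in range(10)]
arrivals += [(f"PED{i:02d}", "pediatric") for i in range(4)]
chosen = systematic_sample_by_stratum(
    arrivals, {"adult_aids": 3, "pediatric": 1}, seed=7)
print(chosen)
```

An interval of k = 1 takes every eligible patient in a stratum, which is one simple way to express the oversampling of rare strata such as pediatric cases. Fractional target rates and per-provider rate adjustments are not modeled here.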
During recruitment, the site coordinator explained the study purpose and requirements, and the patient completed a consent form and a patient location form. Coordinators reported weekly on sampling activities so that sample yield could be monitored and sampling rates adjusted.

Of the 2,487 sampled patients, 88 percent, or 2,197, were successfully recruited; that is, they signed a consent form and provided location information (Table 2-1). Enrollment response rates by exposure category, illness stage, and insurance status were fairly uniform. The lowest rate was for asymptomatic homosexual men who had private insurance, about 80 percent of whom agreed to participate. Given the prognosis for the disease, the stigma the disease still carries, and the socioeconomic status of this group, this 80-percent response rate is not unexpected. The response rates for IDUs, women (many of whom are IDUs), and pediatric cases were higher than the mean response rate for the entire sample. These groups were of concern not only from the point of view of initial enrollment into the sample but also with respect to continuing participation in the study. Consequently, a number of procedures and incentives were introduced in an attempt to maximize participation. These are described in the next section.

As shown in Table 2-1, 2,090 persons, or 95 percent of those agreeing to participate, actually completed the initial (Time 1) patient interview. Of these, 141 were pediatric patients. Of the adults (including 361 women), 678 were patients with AIDS, 843 were patients with other HIV-related illnesses, and 422 were asymptomatic patients. The illness stage for six patients was unknown at Time 1.

Data Collection Design

The data collection design for the patient survey component of the study involved conducting six in-person interviews (Time 1-Time 6) with study subjects over an 18-month study reference period, March 1, 1991, through August 31, 1992. Subjects were contacted every 12 to 14 weeks.
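How successive interview reference periods can tile the 18-month study window is sketched below. The sketch assumes one boundary convention that is not spelled out in this manual: each period runs from the day after the previous interview through the current interview, the first period opens on March 1, 1991, and the last is closed on August 31, 1992. The interview dates are hypothetical, placed 13 weeks apart to fall within the 12- to 14-week contact schedule, and the function name is illustrative.

```python
from datetime import date, timedelta

STUDY_START = date(1991, 3, 1)   # Time 1 reference period begins here
STUDY_END = date(1992, 8, 31)    # Time 6 reference period ends here

def reference_periods(interview_dates):
    """Return inclusive (start, end) reference periods, one per interview.

    Assumed convention (hypothetical): each period runs from the day after the
    previous interview through the current interview; the first period opens
    on STUDY_START and the last is closed on STUDY_END.
    """
    periods = []
    start = STUDY_START
    for i, interview in enumerate(interview_dates):
        end = STUDY_END if i == len(interview_dates) - 1 else interview
        periods.append((start, end))
        start = interview + timedelta(days=1)
    return periods

# Hypothetical schedule: first interview June 3, 1991, then every 13 weeks.
interviews = [date(1991, 6, 3) + timedelta(weeks=13 * i) for i in range(6)]

for t, (start, end) in enumerate(reference_periods(interviews), start=1):
    print(f"Time {t}: {start} through {end} ({(end - start).days + 1} days)")
```

Because the periods are contiguous and non-overlapping, utilization reported at successive interviews can be summed over the full study window without double counting, provided the boundary convention assumed here matches the one used in the data files.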
It is believed that this 12- to 14-week interval minimizes potential sample attrition due to death and tracking problems, as well as possible recall bias in reporting. Although information on insurance coverage, employment, and income was collected at every interview, the content of the interview varied somewhat over the six time periods, with detailed segments on functional status, quality of life, and access and barriers to care included in three periods.

During each interview, the patient was asked to name every health care provider from which a service was received during the time since the previous interview, referred to as the interview reference period. For the Time 1 interview, the interview reference period begins on March 1, 1991; for the Time 6 interview, the period ends on August 31, 1992. During each interview, the patient was also asked to sign permission forms that allowed contact with every medical provider from which the patient received a service. Providers included private practitioners, outpatient departments of hospitals, freestanding clinics, pharmacies, and home health agencies. The set of signed provider permission forms represents the sample for the provider survey component of ACSUS. As of the first interview, almost 2,500 medical care providers had been identified. These providers were contacted twice during the 18-month patient survey reference period to obtain information on services rendered, charges for services, and sources of payment for charges, as well as medical data from those providers named by the patient as the usual source of care.

The decision to obtain charge information from providers rather than from patients was made in recognition that data obtained from individuals may suffer from a number of biases, including recall error (Berk, Horgan, and Meyers, 1986; Berk, Schur, and Mohr, 1990; Cox and Cohen, 1985; National Center for Health Statistics, 1961).
No attempt was made to collect charge data from patients, with the exception of out-of-pocket expenses and dental service charges. Dental providers were specifically excluded from the list of providers from which charge data were to be collected because of concern that awareness of a patient's HIV status might jeopardize receipt of services. Providers sampled for the provider survey component were asked either to furnish a printout of the patient's bill or to complete a data form for each reported visit made during the reference period. The information gathered from billing records should provide an accurate count of the services received at the particular medical facility and the charges for those services.

It should be noted that charge data, rather than cost or expenditure data, were collected. Charges refer to billed amounts and may be greater than expenditures because of unpaid bills, bad debt, or uncompensated care. Costs should measure actual resource use but, in fact, are rarely available or even known. Different methods of allocating fixed costs, as well as cross-subsidization among hospital departments, make cost comparisons difficult (Prospective Payment Assessment Commission, 1985).

In addition to medical providers, approximately 2,500 nonmedical providers were identified in the first interview. These included community-based organizations providing social support services, podiatrists, and providers of alternative therapy. For these providers, respondents were asked to provide information about the type and amount of services received.

Because of concerns about maintaining participation among a group of patients who are very ill and, in many cases, very transient, data collection procedures were designed to maximize continuing participation.
Interviewers received not only extensive training in administering the questionnaire but also specialized instruction in understanding the sensitive nature of the subject matter and in relating to persons with a serious and debilitating illness. Initial and followup interviews were conducted at a location specified by the study subject. IDUs and homeless patients, however, completed Time 1 interviews at the provider site because of the potential tracking problems associated with these persons. Subsequent interviews for these participants were also held at the patient care site if the subject preferred that location.

Upon recruitment into the study, respondents were given an 800 number and told to call with any questions or problems at any time during the data collection. Participants were given a new card with the 800 number on it each time they were interviewed. About 15 calls a week were received on this number. Subjects were also paid $50.00 each time they completed an interview.

Patient location data used for tracking purposes were collected initially when the subject consented to participate in the study and were updated during each of the Time 1-Time 5 interviews. Study subjects were asked to provide traditional tracking information, such as the name of a person who would know how to find them if they moved or could not be located. They were also asked about places they frequented, names of friends, names of social workers or parole workers, other names they used, and any nicknames or street names. Using patient location data, interviewers telephoned or visited subjects to set up interview appointments. However, some patients remained difficult to locate, and interviewers obtained additional assistance from clinic staff at the enrollment sites. When difficult-to-locate patients visited the clinic, staff would inform them that interviewers were trying to contact them.
If these patients had scheduled a future clinic appointment, staff would inform interviewers of the time. Finally, study subjects were asked to provide the name of a proxy respondent who would know about their health care use and expenses and could be contacted if study subjects were too ill to complete an interview. More than 82 percent of the study subjects provided the name of a proxy.

2.3 Strengths and Limitations of Data

ACSUS is the largest and most comprehensive study of the cost and use of services by persons with HIV-related illness. Although it was designed to overcome many of the limitations of existing data described earlier in this chapter, the difficulties inherent in identifying persons with HIV infection necessitated a survey design with certain limitations of its own. The implications of these limitations for data analysis can be minimized, however, if the analytic plan is designed with the particular strengths and weaknesses of the data base in mind. Thus, analyses using the ACSUS data base should emphasize those areas of inquiry in which it is strongest and should not attempt to examine issues for which the survey is ill suited.

Sources of Bias

Like all surveys, ACSUS has methodological limitations. As Fowler (1988, p. 145) has indicated, "the cost of trying to achieve error-free estimates is too high for most research purposes; some potential for error exists in virtually all survey plans. Total survey design involves considering all aspects of the survey and choosing a level of rigor appropriate to the purpose of the particular project." Groves (1989) describes four major types of error: sampling error, coverage error, nonresponse error, and measurement error. Within each of these categories, there can be systematic as well as random error. The former will produce biased estimates; the latter will not. Random, or variable, error is said to be unbiased because, although it affects the precision of the estimate, it is as likely to produce an overestimate as an underestimate.
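The distinction between systematic and random error can be illustrated with a small simulation. The sketch below is purely illustrative: the true mean, the spread of true values, the error magnitudes, and the sample sizes are hypothetical and are not derived from ACSUS. Each simulated survey estimates a mean from reports that carry no measurement error, zero-mean random error, or random error plus a constant under-report.

```python
import random
import statistics

TRUE_MEAN = 1000.0   # hypothetical population mean (e.g., charges per patient)
TRUE_SD = 400.0      # hypothetical spread of true values across patients
N_PER_SURVEY = 100   # hypothetical respondents per simulated survey
N_SURVEYS = 1000     # number of simulated survey replications

def simulate(error_sd, bias, seed):
    """Return the estimated mean from each of N_SURVEYS simulated surveys.

    Each respondent's report = true value + random error + systematic error.
    error_sd -- standard deviation of zero-mean (random) reporting error
    bias     -- constant (systematic) error added to every report
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(N_SURVEYS):
        reports = []
        for _ in range(N_PER_SURVEY):
            true_value = rng.gauss(TRUE_MEAN, TRUE_SD)
            reports.append(true_value + rng.gauss(0.0, error_sd) + bias)
        estimates.append(statistics.fmean(reports))
    return estimates

no_error = simulate(error_sd=0.0, bias=0.0, seed=1)
random_only = simulate(error_sd=300.0, bias=0.0, seed=2)
systematic = simulate(error_sd=300.0, bias=-100.0, seed=3)

for label, est in [("no measurement error", no_error),
                   ("random error only", random_only),
                   ("systematic under-report", systematic)]:
    print(f"{label:>24}: mean of estimates {statistics.fmean(est):7.1f}, "
          f"SD of estimates {statistics.stdev(est):5.1f}")
```

With these settings, the random-error estimates center near the true mean but are more dispersed than the error-free estimates, while the systematic-error estimates center near 900. Increasing the sample size would shrink the spread of every set of estimates but would not move the systematic-error estimates toward the true value.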
Variable error thus leaves the estimate unbiased but increases its variance. Three types of potential error are discussed here as they apply to ACSUS.

Coverage bias. Coverage bias occurs when some members of the target population have no chance of sample selection. Coverage bias is probably the most important source of error in ACSUS. Because it is a sample of patients, ACSUS does not include any persons who have not entered the medical care system. ACSUS also excludes persons who receive care only at settings where a small number of persons with AIDS are seen. These persons may differ systematically from those surveyed in terms of income and source of payment for care.

Nonresponse bias. Nonresponse bias occurs as a result of the failure to collect data from all eligible persons in the sample. This can occur at several points in ACSUS. Some persons refused to complete the ACSUS eligibility screener, and others declined participation at the time of enrollment. A small number of respondents declined to participate in later rounds of the survey, and some declined to allow contact with their medical providers. Moreover, refusals are not the only source of nonresponse. Some patients died, and it was not always possible to locate an acceptable proxy respondent who could provide the necessary utilization and cost information. Other patients moved away and could not be located. Field procedures, including efforts to interview relocated respondents by telephone, were designed to minimize nonresponse bias, and field experience suggests that it is not an important source of error.

Measurement bias. Measurement bias can stem from at least four different sources: the respondent, the interviewer, the questionnaire, and the mode of interview. For example, measurement error can occur when respondents give inaccurate answers or when an accurate answer is incorrectly coded.
Some self-reports of illness stage, for example, may be erroneous. Inaccurate responses may result from a respondent's inability to recall events correctly. They also may result from a respondent's deliberately incorrect answer, as could occur if a patient does not wish to reveal to an interviewer that he or she is receiving psychiatric counseling or drug detoxification. Measurement bias also occurs as a result of interviewer effects. Interviewers may ask questions incorrectly or may fail to record the respondent's answer accurately. Interviewer error tends to be variable, but if interviewers systematically probe incorrectly or if codes fail to correctly categorize particular kinds of responses, the error may result in biased estimates.

Implications for Analysis

In assessing the analytic utility of ACSUS, several elements of the study design need to be considered. These are reviewed with an eye to specific policy and health services research issues.

Statistical representation of population. The most important limitation is the lack of a national probability sample. Given the sensitive nature of the illness as well as its low prevalence, a national probability sample of households would have been unlikely to achieve acceptable results in terms of participation or cost. Although a national probability sample was justifiably ruled out, the resulting sample is not statistically representative of the U.S. population with HIV-related illness. Specifically, certain population subgroups are undercounted. ACSUS is limited to major metropolitan centers and does not include low-prevalence or rural areas. Persons who used health care services infrequently, and thus did not use services during the enrollment period, or who did not use services at all are undercounted. It should be noted that the primary objective of ACSUS is to provide data on the use and cost of services for persons with HIV-related illness.
ACSUS was not intended to count the number of HIV-infected persons or to measure the number of persons unable to obtain services.

Geographic diversity. Relative to other studies, ACSUS has several strengths critical to the usefulness of the resulting data. Although the sample population is not strictly representative of the national population with HIV-related illness, ACSUS is one of the few surveys of HIV-related illness that was conducted in multiple geographic and provider sites. It is the first multiple-site survey in which site selection was driven primarily by analytical needs rather than by the necessity of evaluating particular programs, as was the case in the evaluation of the AIDS/HIV Service Demonstration Grants Program conducted for the Health Resources and Services Administration and the evaluation of the Robert Wood Johnson Foundation's AIDS Health Services Program (Mor, Fleishman, Allen, and Piette, 1994). The ACSUS geographic and patient care sites were selected to ensure diversity with respect to ethnicity, exposure category, and source of payment, so that there would be a sufficient number of persons in each analytical cell of interest.

Length of reference period. One of the most important design features of ACSUS is the use of multiple contacts with respondents over an 18-month period. This substantially adds to the analytic utility of the ACSUS data base by allowing for the examination of changes over time. As more effective drug therapies are found and prophylactic treatment of HIV infection becomes more widespread, length of survival is likely to continue to increase. A number of critical changes take place over the course of the disease that affect the infected person's ability to function and his or her use of services. For example, as the illness progresses, individuals may become unable to work, thus losing their means of financial support as well as their access to private health insurance.
These changes have implications for public financing programs such as Medicaid, which bear a disproportionately large share of the cost of care for persons with HIV. In addition, the level and mix of services used are likely to change with the course of illness; in some cases, caregiving arrangements may need to be supplemented with formal home care as individuals become sicker.

REFERENCES

Berk, M. L., Horgan, C., and Meyers, S.: The reporting of stigmatizing conditions: A comparison of proxy and self-reporting. Journal of Economic and Social Measurement 14:197-205, 1986.

Berk, M. L., Schur, C. L., and Mohr, P.: Using survey data to estimate prescription drug costs. Health Affairs, Fall 1990.

Cox, B., and Cohen, S.: Methodological Issues of Health Care Surveys. New York: Marcel Dekker, 1985.

Fowler, F. J., Jr.: Survey Research Methods. Applied Social Research Methods Series, Vol. 1. Beverly Hills: Sage Publications, 1988.

Groves, R. M.: Survey Errors and Survey Costs. New York: John Wiley and Sons, 1989.

Mor, V., Fleishman, J. A., Allen, S. M., and Piette, J. D.: Networking AIDS Services. Ann Arbor: Health Administration Press, 1994.

National Center for Health Statistics: Reporting of hospitalization in the Health Interview Survey. Series D, No. 4. Washington: U.S. Government Printing Office, 1961.

Prospective Payment Assessment Commission: Technical Appendixes to the Report and Recommendations to the Secretary of the U.S. Department of Health and Human Services by the Prospective Payment Assessment Commission. Washington, D.C., 1985.

3. DATA STRUCTURE OVERVIEW

As described in Chapter 2, this study involved data collection for two groups of HIV-infected persons: adult and pediatric. A public use volume has been created for each of these groups: ACSUS Public Use Tapes #4 and #5.
Data collected for each group include up to six patient interviews, conducted once every three months over a period of 18 months; two abstracts of medical records data from the provider(s) the patient indicated as his or her "usual source of care"; and billing data collected from medical providers identified during the patient interviews over the 18-month period. Confidentiality requirements prohibit identification of respondents, where respondents include providers as well as patients. Therefore, all information on the geographic location of respondents has been deleted from these tapes. This chapter of the public use documentation describes the content of the public use volumes and the structure of the data files, and their correspondence to the data collection instruments, for each group (adults and pediatrics).

3.1 Content of Public Use Volumes

Two public use volumes have been developed for the ACSUS study. The first volume contains data collected for the adult sample (Tape #4), and the second contains data collected for the pediatric sample (Tape #5). Each volume is contained on a 9-track tape, recorded at 6250 bytes per inch using the EBCDIC character set. These volumes have been recorded using IBM standard labels. Each volume contains four types of data sets: fixed length documentation files, fixed length EBCDIC data files, fixed length codebook print files corresponding to the data files, and fixed length SAS source statement files, one for each data file. Specific information regarding the content of each public use volume is provided in the following subsections.

3.1.1 Adult Public Use Volume (Tape #4)

The adult public use volume contains data files, documentation files, printable codebook files, and the SAS source statements required to read each of the data files. The SAS source code consists of input statements, variable labels, and variable value format statements.
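For users working outside SAS, the following is a minimal Python sketch of turning a fixed length EBCDIC data file into text records. It assumes the file has already been copied from tape to disk as a plain byte stream with no record delimiters, that EBCDIC code page 037 (Python codec "cp037") matches the tape's character set, and that the record length (the hypothetical parameter lrecl) is taken from the codebook; none of these details is confirmed by this manual.

```python
# Sketch only: cp037 and the record length are assumptions, not ACSUS specs.

def read_fixed_records(raw: bytes, lrecl: int, codec: str = "cp037") -> list[str]:
    """Decode EBCDIC bytes and cut them into records of lrecl characters."""
    if lrecl <= 0:
        raise ValueError("record length must be positive")
    if len(raw) % lrecl != 0:
        raise ValueError("file length is not a multiple of the record length")
    text = raw.decode(codec)  # single-byte code page: 1 byte -> 1 character
    return [text[i:i + lrecl] for i in range(0, len(text), lrecl)]


# Usage with two made-up 8-byte records, encoded here only for the demo.
sample = "00010001".encode("cp037") + "00020001".encode("cp037")
records = read_fixed_records(sample, lrecl=8)
print(records)  # ['00010001', '00020001']
```

The SAS source statement files supplied on each volume remain the authoritative description of how each file is read; a sketch like this is only a starting point for other environments.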
Three major types of data were collected during the study: patient interview, medical record abstract, and provider billing data. These data were supplemented with vital statistics data, which are included on the patient characteristics file. The content of the adult public use volume (Tape #4) is provided in Table 3-1.

Table 3-1. Contents of Adult Public Use Volume (Tape #4)

Further detail regarding the content of the data files and specific instructions for using the codebooks can be found in Chapter 5 of this document.

3.1.2 Pediatric Public Use Volume (Tape #5)

The pediatric public use volume contains data files, documentation files, printable codebook files, and the SAS source statements required to read each of the data files. The SAS source code consists of input statements, variable labels, and variable value format statements. Three major types of data were collected during the study: patient interview, medical record abstract, and provider billing data. These data were supplemented with vital statistics data, which are included on the patient characteristics file. The content of the pediatric public use volume (Tape #5) is provided in Table 3-2.

Table 3-2. Contents of Pediatric Public Use Volume (Tape #5)

Further detail regarding the content of the data files and specific instructions for using the codebooks can be found in Chapter 5 of this document.

3.2 ACSUS Public Use File Data Structure

To facilitate analytic use of the ACSUS data, and to enable use of current microprocessor technology, we have delivered the public use data in a relational structure as a group of fixed length, normalized data files. Figures 3-1 through 3-3 show the relationships between these files and their correspondence to the data collection instruments used. The figures depict both the longitudinal and the logical relationships between the entities represented by the study data (i.e., patient, service(s) utilized, provider billing, and medical record(s)).
Three major types of data are represented by this structure: patient, medical record, and provider billing. A brief description of each data type is presented in the following subsections. Detailed guidance for using these data is provided in Chapter 5 of this document.

Figure 3-1. Patient Files
Figure 3-2. Medical Record Abstract Files
Figure 3-3. Provider Billing Files

3.2.1 Patient Reported Data

As shown in Figure 3-1, the patient data are composed of a patient level file, time specific patient questionnaire data for each interview period (Time 1 through Time 6), and service utilization files, which contain data collected during all six rounds of data collection. Service utilization files correspond to distinct service utilization sections in the patient questionnaire and are not included in the time specific files. A patient will have one record in the patient characteristics file, uniquely identified by the patient identifier (PATID). In addition, a patient will have one record in each of the Time 1 through Time 6 specific data files (if he or she completed an interview in each of these periods), uniquely identified by the patient identifier (PATID), and any number of records in each of the service utilization files, each record representing use of specific types of services over the entire 18-month study period. Patients will have zero records in a specific service utilization file if they did not report receiving that type of care. These service utilization records represent either a visit or use of a specific provider by the patient, depending on the information required by the particular section of the questionnaire. Each record is identified by the patient identifier (PATID), the interview time period (1 through 6) (SFORM), a questionnaire section identifier (SFPART), and a service utilization sequence number (SSUBREC).
For example, data regarding inpatient hospital stays were collected for each stay, and data regarding use of a particular physician/doctor were collected by provider. Further detail regarding the usage of these data files can be found in Chapter 5 of this document.

3.2.2 Medical Records Abstract Data

As shown in Figure 3-2, the Medical Record Abstract Data comprise four separate files: patient, inpatient stay, checklist conditions, and T-Cell reports. Twice during the study period, medical records data were collected from providers identified by patients as their usual source of care. More than one provider may have been identified as the usual source of care for a particular patient, in which case data were collected from more than one provider for that patient. The Patient Level File contains one record for each patient, containing data derived from all abstracts received for that patient. Each record on this file is uniquely identified by the patient identifier (PATID). The Inpatient Stays File contains information regarding inpatient stays reported by the usual source of care providers identified by the patient. A record is uniquely identified by the patient identifier (PATID) plus the provider identifier (USCID) plus a record sequence number (USREC02). The Checklist Conditions File contains information regarding medical conditions commonly found in HIV-infected persons. A record on this file is uniquely identified by the patient identifier (PATID) plus the provider identifier (USCID) plus a record sequence number (USREC03). The T-Cell Reports File contains information abstracted from medical records regarding laboratory tests reporting T-Cell counts. Depending upon available information, absolute counts and percentages were recorded for both CD4 and CD8 cells. A record in this file is uniquely identified by the patient identifier (PATID) plus the provider identifier (USCID) plus a record sequence number (USREC04).
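The one-to-many structure described above can be checked and used with a few lines of code. The Python sketch below, which assumes records have already been parsed into dictionaries keyed by variable name, verifies that a composite key such as PATID + SFORM + SFPART + SSUBREC (or PATID + USCID + USREC02 for the medical abstract inpatient stays file) is unique, and attaches a possibly empty list of service records to each patient. The helper names and all record values, including the SFPART code "A", are hypothetical.

```python
from collections import Counter, defaultdict

def duplicate_keys(records, key_fields):
    """Return composite keys that occur more than once (expected: none)."""
    counts = Counter(tuple(r[f] for f in key_fields) for r in records)
    return [k for k, n in counts.items() if n > 1]

def group_by_patient(patients, detail_records):
    """Map each PATID on the patient file to its list of detail records.

    A patient who reported no use of this service gets an empty list."""
    grouped = defaultdict(list)
    for rec in detail_records:
        grouped[rec["PATID"]].append(rec)
    return {p["PATID"]: grouped.get(p["PATID"], []) for p in patients}

# Composite keys named in the text; the abstract key would be used the same way.
SU_KEY = ("PATID", "SFORM", "SFPART", "SSUBREC")
ABSTRACT_STAY_KEY = ("PATID", "USCID", "USREC02")

# Made-up records for illustration only (not ACSUS data).
patients = [{"PATID": "0001"}, {"PATID": "0002"}]
stays = [
    {"PATID": "0001", "SFORM": 1, "SFPART": "A", "SSUBREC": 1},
    {"PATID": "0001", "SFORM": 3, "SFPART": "A", "SSUBREC": 1},
]
print(duplicate_keys(stays, SU_KEY))  # []
by_patient = group_by_patient(patients, stays)
print({pid: len(recs) for pid, recs in by_patient.items()})  # {'0001': 2, '0002': 0}
```

Keeping patients with zero service records (rather than dropping them in an inner join) matters for per-patient utilization averages, since a missing record means no reported use of that service.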
Further detail regarding the use of Medical Record Abstract Data is provided in Chapter 5 of this document.

3.2.3 Provider Billing Data

Provider billing data were collected twice during the conduct of the study for medical providers reported by the patient during the patient interviews. As shown in Figure 3-3, four provider billing data files are provided, which contain billing data covering the 18-month period represented by the patient interviews: Ambulatory, Inpatient, Home Health, and Pharmacy. The Ambulatory, Inpatient, and Pharmacy billing data files each contain one record per event: an ambulatory visit, an inpatient stay, or a prescription medication obtained. A record in these files can be uniquely identified by the Provider Identifier (PROVID) plus the Patient Identifier (PATID) plus the Form Identifier (PFORM) plus the Event Sequence Number (PSUBREC). A record in the Home Health billing data file may contain information regarding more than one event, or visit, during a period of time. Further information regarding substantive usage is presented in Chapter 5 of this document.

3.2.4 General Analytic Usage Guidance

Effective use of these data requires familiarity with the ACSUS study, the data collection instruments, and the structure and content of the data files. In conducting analyses with these data, we have found it useful to begin with the data collection instruments to identify the questions of interest and the associated variable names on the data files required to support the analysis. Users should refer to the annotated questionnaires to identify the questions and variable names appropriate for their analyses. Once variables and files have been identified, the codebook(s) can be used to identify variable values, variable position, length, type, and applicable formats. Chapter 5 includes an explanation of how to use the codebooks and annotated questionnaires. Appendices A-C contain copies of the annotated questionnaires.
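Once a codebook gives each variable's starting position, length, and type, extracting values from a decoded fixed length record is a matter of slicing. The Python sketch below assumes 1-based starting columns (as a printed codebook would typically list them) and integer-coded numeric fields; the Field class, the parse_record helper, and the column layout shown are hypothetical and must be replaced with the actual codebook entries.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Field:
    name: str      # variable name from the codebook
    start: int     # 1-based starting column (assumed codebook convention)
    length: int    # field width in characters
    numeric: bool  # True for numeric variables

def parse_record(record: str, layout: list[Field]) -> dict:
    """Slice one decoded fixed length record into named values.

    Blank fields come back as None (e.g., an unset flag). Numeric fields
    are parsed as integers; implied decimals would need extra handling."""
    values = {}
    for f in layout:
        raw = record[f.start - 1 : f.start - 1 + f.length].strip()
        if raw == "":
            values[f.name] = None
        elif f.numeric:
            values[f.name] = int(raw)
        else:
            values[f.name] = raw
    return values

# Hypothetical layout: these columns are invented for illustration only.
LAYOUT = [Field("PATID", 1, 4, False), Field("SFORM", 5, 1, True),
          Field("SSUBREC", 6, 3, True)]
print(parse_record("00013002", LAYOUT))
# {'PATID': '0001', 'SFORM': 3, 'SSUBREC': 2}
```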
Patient and provider identifiers included on patient reported service utilization files are consistent with those used in the provider billing data files. Patient reported service utilization can therefore be associated with provider reported services and the associated billing data through the patient and provider identifiers. In many cases, however, the level and type of service provided are inconsistent between the patient-reported data and the provider billing data. These inconsistencies are due to patient recall problems, provider misidentification, and the availability of billing records at the provider for the patient. Imputation of services and charges was performed in some cases. Details regarding the situations for which imputations were performed are provided in Chapter 6 of this document.

4. DATA PREPARATION, TYPES OF VARIABLES, AND DATA ANOMALIES

4.1 Data Preparation

Various data preparation and editing techniques were employed to ensure accuracy and consistency in the ACSUS data. These techniques are described in the following sections.

4.1.1 Range Specifications

Acceptable ranges for all data items were defined, and computerized edits were conducted to identify all items falling outside the predetermined parameters. For closed-ended items, the ranges were determined by the codes available for the responses. For open-ended items, such as the out-of-pocket dollar amount paid for a doctor's visit, reasonable ranges were defined. Data items that failed the range edits were reviewed for coding and data entry errors and corrected as necessary. Following review by data preparation and project staff, out-of-range values were retained if no coding or data entry errors were found.

4.1.2 Consistency Checks

Consistency or logic checks were conducted to examine the relationships between responses, to ensure that they did not conflict with one another and that the response to one item did not make the response to another unlikely.
Logic checks were conducted both within and across the various data files. Checks within files examined the data for skip pattern errors and other types of logical inconsistencies. Logic checks across the ACSUS data files examined the relationship between information reported during one patient interview and similar information gathered at a subsequent interview. Logic checks also inspected the data for discrepancies in information across the different data collection instruments, for example, the medical record abstract and the patient interview. To the extent that either type of logic check uncovered coding or data entry errors, appropriate corrections were made to the files. However, edit checks uncovered a variety of inconsistent items which, following review by project staff, could not be resolved. Such inconsistencies remain in the data files and are discussed in Section 4.3.

4.1.3 Frequency Review

The frequencies of responses to all data items were reviewed to ensure that appropriate skip patterns were followed and that the correct number of responses was represented for all items. If a discrepancy was discovered, the problem case was identified and the hard copy of the record was reviewed to determine the appropriate response. If the hard copy revealed no additional information, the item was coded as "not ascertained".

4.2 Types of Variables

Data items on the ACSUS public use tapes can be classified into four major categories: edited, derived, imputed, and flag variables. The sections that follow describe each variable type.

4.2.1 Edited Variables

The majority of data items on the ACSUS data files are original variables, which contain information corresponding to individual items or questions on the data collection instruments. All original variables have been edited for skip pattern errors, range outliers, and logic checks, as described in Section 4.1, and have therefore been categorized as edited variables.
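The range and skip pattern edits of Section 4.1 can be illustrated with two small predicates. This Python sketch flags values for review rather than deleting them, mirroring the practice of retaining out-of-range values when no coding or data entry error is found. The item, the 0-500 dollar range, and the gate code 2 are hypothetical examples, not ACSUS edit specifications.

```python
def range_edit(value, low, high):
    """True if a non-missing value falls outside [low, high] (flag for review)."""
    return value is not None and not (low <= value <= high)

def skip_violation(gate_answer, follow_up, skip_code):
    """True if a follow-up item was answered although the gate item skipped it.

    skip_code is the (hypothetical) gate response that routes past the item."""
    return gate_answer == skip_code and follow_up is not None

# Hypothetical item: out-of-pocket payment with an assumed 0-500 range,
# behind a gate question where code 2 ("No") should skip the payment item.
print(range_edit(750, 0, 500))             # True: reviewed, kept if no error found
print(range_edit(40, 0, 500))              # False
print(skip_violation(2, 40, skip_code=2))  # True: follow-up should be blank
print(skip_violation(1, 40, skip_code=2))  # False
```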
4.2.2 Derived Variables

A limited number of derived variables are included on the files to assist the user in analyzing the data. A variable is defined as derived if it is constructed from one or more original data items. Because the analytic needs of each user are unique, most analysts will choose to derive their own set of analysis variables. Accordingly, derived data items on these tapes are restricted either to items the user cannot easily construct without extensive knowledge of the data files or to items thought to be of interest to a majority of users.

4.2.3 Imputed Variables

In ACSUS, as in most surveys, the responses to some data items were not obtained. For this study, missing item imputation was conducted for a limited set of variables on the provider billing survey files. In addition, entire service event records were imputed on the provider billing survey files. As a result of these imputation processes, several new variables were created and reside on the provider billing survey files. A description of ACSUS imputation procedures is provided in Chapter 6.

4.2.4 Flag Variables

Three types of flag variables are present in the ACSUS public use data files. First, a series of imputation flags were created to enable users to identify imputed values and events. Imputation flags are discussed more fully in Chapter 6. Second, flag variables were created to identify and alert the user to analytic issues that might otherwise be overlooked. Third, flag variables are used to highlight discrepant data elements. The second and third types of flag variables are discussed in Chapter 5, in the sections corresponding to the specific data files in which they reside.

4.3 Data Anomalies

The purpose of this section is to assist the user in making informed decisions about conducting analyses with the ACSUS public use data tapes.
It is intended to bring to the user's attention certain data considerations and anomalies so that the user may take them into account when analyzing the ACSUS data. Although the data have been subjected to rigorous editing processes, not all data inconsistencies or problems could be resolved, and some inconsistencies therefore remain in the files. Remaining inconsistencies occur primarily for two reasons. First, much of the ACSUS data is self-reported and therefore subject to recall error and similar reporting problems. Second, the ACSUS study draws upon multiple sources to obtain similar kinds of data, and the sources sometimes report discrepant information. Accordingly, to conduct the most meaningful analysis for their purpose, users should read this section and thoroughly familiarize themselves with the various data instruments prior to undertaking any analysis.

4.3.1 Self-Reported Marital Status

In each of the six adult patient interviews, subjects were asked to indicate their marital status. Review of self-reported marital status across time uncovered subjects who indicated during the current interview that they were never married but in a previous questionnaire had reported their marital status as divorced, separated, or married. Investigation of these cases found no recurring patterns: inconsistent cases included male and female subjects, respondents with and without children, and subject and proxy interviews. Inconsistent marital status data were not removed from the tapes. In view of these inconsistencies, users interested in incorporating marital status in their analyses may want to supplement or cross-reference marital status with the series of questions on household composition in the time specific patient characteristics files.

4.3.2 Utilization Information Reported During the Wrong Reference Period

The patient files contain self-reported information on inpatient stays, nursing home stays, and dental visits.
In some instances, respondents reported utilization of these services that occurred in the period of time covered by a previous reference period. Such events were examined and deleted if found to be duplicates of previously reported events. If they were found to be new events reported for the first time, they were retained. The variable which indicates the source of the information, SFORM, was coded to reflect the interview period in which the information was collected; SFORM was not changed to represent the period in which the utilization actually occurred. This data anomaly may have a bearing on analyses that link patient-reported utilization to the specific time period in which it occurred. Users interested in such analyses should link utilization by comparing the dates of service to the reference dates for each interview period rather than selecting events based on SFORM.

4.3.3 Dates of Service Utilization After the Subject's Date of Death

Editing procedures included a series of logical edits which compared all dates of service utilization in the provider billing data files for deceased individuals to the subject's reported date of death. These edits identified all unimputed records where the date of service was after the date of death. Review of these cases found some coding or data entry errors, which were corrected. Investigation by project staff suggests that the remaining inconsistencies are a function of provider billing patterns. It appears that some health care providers generated bills following the death of a subject for services rendered prior to death; these bills contain only the billing date and do not include the actual dates of service. Analysts may choose to remove all charge data for events with associated dates occurring after a subject's date of death.
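Both recommendations above (assigning events to interview periods by date rather than by SFORM, and optionally excluding events dated after death) reduce to simple date comparisons. The Python sketch below assumes each subject's reference dates are available as a mapping from period number to (start, end) dates; because adjacent periods share a boundary date, it treats each period as start <= date < end, except that the final period's end date is included. That boundary convention and the helper names are assumptions, not documented ACSUS rules. The reference dates used are those of Case 2 under Total Observation Days (TOBSDAYS) in Chapter 5.

```python
from datetime import date

def period_for(service_date, periods):
    """Return the interview period whose reference dates contain service_date.

    periods maps period number -> (start, end). Assumed convention:
    start <= d < end, with the last period's end date included."""
    last = max(periods)
    for number, (start, end) in sorted(periods.items()):
        if start <= service_date < end or (number == last and service_date == end):
            return number
    return None  # outside every reference period

def after_death(service_date, death_date):
    """True when a dated event (imputed or not) falls after the date of death."""
    return death_date is not None and service_date > death_date

# Reference dates for Times 1 and 2 of Case 2 in Chapter 5.
periods = {1: (date(1991, 3, 1), date(1991, 6, 30)),
           2: (date(1991, 6, 30), date(1991, 10, 6))}
print(period_for(date(1991, 7, 15), periods))            # 2, whatever SFORM says
print(after_death(date(1992, 3, 10), date(1992, 3, 4)))  # True
```

An analyst who chooses to drop post-death charges could filter billing records on after_death; the same test also catches imputed events that inherited a donor's date of service.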
Due to the imputation methods used to impute events on the provider billing files, there are also situations in which an imputed event has an associated date of service that is after the subject's date of death. This occurs because the entire record from the donor event, including the date of service, was imputed to the recipient event.

4.3.4 Loss of Medicare Coverage

Editing procedures uncovered cases in which a subject reported Medicare coverage in one interview but failed to report the coverage in a subsequent interview. Unlike private insurance or some forms of public insurance, where coverage may well fluctuate over time, once an individual becomes eligible for Medicare benefits, he or she should continue to receive this coverage. Review of these cases found several instances where respondents confused Medicare and Medicaid coverage. In these situations, the data were corrected to reflect the appropriate insurance coverage. However, the user should be aware that a small number of these inconsistencies could not be resolved and remain in the data files.

4.3.5 Self-Reported Date of Positive HIV Test

In the screener questionnaire, subjects were asked to report whether they had tested positive for HIV and the date of the test. Several subjects reported HIV test dates that are clearly incorrect (i.e., prior to 1986). Because the screener data reflect self-reported information, these outliers have not been removed. However, the user should be aware that these anomalies remain in the data files and that the medical record abstract may be a better source for determining an HIV-positive diagnosis.

4.3.6 Self-Reported Conditions of HIV Illness

Subjects were asked in the Time 1, Time 5, and Time 6 questionnaires to indicate whether they had been told they had any of a list of conditions commonly associated with HIV illness.
Review of these data items found numerous instances where the subject's response in Time 5 or Time 6 contradicted what was reported in an earlier questionnaire. These inconsistencies remain in the files, and users interested in conducting analyses with these data are encouraged to cross-reference the self-reported conditions with the clinical data in the medical record abstract files.

5. GUIDE TO THE DATA FILES, SPECIFIC VARIABLES, CODEBOOKS, AND ANNOTATED QUESTIONNAIRES

As discussed in Chapter 3, Data Structure Overview, ACSUS Public Use Tape #4 contains complete study data for adult ACSUS subjects, and Tape #5 includes similar information for pediatric study subjects. The study data have been organized into three major components: patient data files, medical record abstract files, and provider billing survey data files. Each component contains multiple data files. The following sections provide a general discussion of the type of information contained on each set of files, followed by detailed descriptions of derived, imputed, and flag variables specific to each data set. Although the adult and pediatric study data are on separate tapes, the two tapes are structured similarly, with only slight variations between the adult and pediatric versions. Unless otherwise noted, the information in this chapter is applicable to both adult and pediatric files.

5.1 Patient Data Files

The patient data files contain self-reported information collected for each of the six interview periods for which an interview was completed, and several items collected from a screener questionnaire completed at the time of study enrollment. They also contain selected information from other sources, such as death certificates. The patient files are classified into two types: patient level and service level. A series of 14 files (7 adult and 7 pediatric) constitutes the complete set of patient level files.
There is an overall patient characteristics file and a patient level file corresponding to each of the six interview periods. The service level files are a series of 29 files (15 adult and 14 pediatric). Each service level file contains complete data for a particular type of health care service, such as hospital inpatient stays, collected across the six patient interviews.

5.1.1 Patient Level Files

5.1.1.1 Overall Patient Characteristics File

The overall patient level adult file contains data for the 5,898 adult subjects who completed the screener questionnaire and were eligible to participate in the study. This includes 2,327 subjects who were sampled, of whom 2,050 agreed to participate. The pediatric version of this file contains similar information for the 224 eligible pediatric subjects. This includes 160 subjects who were sampled, of whom 146 agreed to participate. The overall patient characteristics files include data items in four major categories: sociodemographics, survey administration, clinical and vital status, and service utilization. These four categories are described below.

Sociodemographic Variables: The overall patient characteristics file contains four sociodemographic variables: gender, race/ethnicity, age, and mode of exposure.

Gender (SEX): The subject's gender was collected in the screener and medical abstract instruments. The variable SEX is a composite of two original variables, edited for inconsistencies across these two sources.

Race/Ethnicity (RACE): Information on race/ethnicity was collected in the screener. RACE is a recoded version of the original variable; it has been collapsed into fewer categories for purposes of confidentiality.

Age at Start of Study (AGE): This is a derived variable which reflects the age of the subject in years as of March 1, 1991. For purposes of confidentiality, the small number of adult subjects who were 60 years of age or older at the start of the study have been grouped into one age category.
Similarly, adult subjects 15 through 19 years of age have been grouped into one age category. Pediatric subjects who were 10 through 12 years of age have been grouped together.

Mode of Exposure (EXPROUTE): EXPROUTE is the suspected mode of exposure to HIV as reported in the subject's medical record. Because medical records were collected from multiple providers, more than one mode of exposure may have been reported for an individual. If multiple modes were reported, EXPROUTE is coded to reflect all modes. EXPROUTE for the 3,848 adult subjects who were screened for participation in the study but not enrolled is defined using self-reported information from the screener instrument. Screener data are also used to define EXPROUTE for the small group of adult subjects who were enrolled in the study but for whom no medical abstract data were collected. The pediatric screener instrument did not collect information on suspected mode of exposure. Therefore, EXPROUTE is missing for the 78 screened pediatric subjects who were not enrolled.

Survey Administration Variables (T1_STAT through T6_STAT): There is a series of six interview status variables, one corresponding to each interview period, on the overall patient characteristics file. These variables indicate whether a patient interview was completed for that time period and whether the completed interview was with a proxy respondent. If an interview was not completed, the status variables indicate the reason for nonresponse (e.g., subject deceased with no proxy available, subject refused, subject could not be located). Status codes are not assigned for interview periods following the death of a subject. Status codes are also blank when a final nonresponse code (e.g., subject unlocatable) was assigned at the previous interview. The interview status variables were generated for survey administrative purposes and are intended to reflect a subject's status at the time of contact by the interviewer.
As a result, there is a small number of subjects in the adult files with a Time 6 interview status indicating that the interview was completed by proxy and the subject was deceased (T6_STAT is coded as DD), but for whom the vital status (VITSTAT) indicates the subject is alive. This situation occurs because vital status is reported as of the end of the study period (8/31/92), but efforts to conduct Time 6 patient interviews continued through November 1992.

Clinical/Vital Status Variables: Several derived variables related to the subject's clinical and vital status are present on the overall patient level file. These are described below.

Illness Stage (ILLSTAGE): This variable was derived to indicate the stage of an individual's HIV infection at the time the screener was administered. It is based on self-reported information; it does not reflect information from subjects' medical records. This variable was intended to stratify subjects into approximate illness categories for sampling purposes; it was not an attempt to apply the full CDC AIDS case definition. Analysts interested in applying the complete CDC classification scheme should construct their own definitions using data from the medical record abstract. Subjects were grouped into three disease stages based upon their responses regarding the type of conditions or symptoms they had experienced. Persons were classified as having AIDS if they had been diagnosed with any of the following: PCP, Kaposi's sarcoma, lymphoma, wasting syndrome, tuberculosis, cryptococcosis, cytomegalovirus, MAI, cryptosporidiosis, dementia, histoplasmosis, toxoplasmosis, isosporiasis, leukoencephalopathy, or salmonellosis. In addition, some individuals volunteered that they had been diagnosed with AIDS but had not been diagnosed with any of the listed conditions. These persons were also classified as having AIDS.
Subjects were classified as HIV-ill if they reported no AIDS-qualifying conditions but did report one of the following: swollen glands, persistent fever, diarrhea, weight loss, candidiasis, or herpes simplex. Subjects who reported no AIDS or HIV-ill qualifying conditions were considered asymptomatic, and those who did not know whether they had any of the listed conditions were classified as unknown.

Vital Status (VITSTAT): This variable indicates a subject's vital status (alive, dead) as of August 31, 1992.

Date Last Known Alive (VSLIVEMO, VSLIVEDY, VSLIVEYR): These variables are coded with the most recent date (on or before 8/31/92) on which the study data confirmed that a subject was alive. These variables are blank if a subject died during the study period.

Date of Death (DODMO, DODDY, DODYR): These variables are coded with the date the subject was reported deceased. They are blank if the subject was not reported deceased.

Source of Death Date (DODSOURC): Three sources of information were used to determine whether or not a subject was deceased. A death certificate was used as the primary source if one was obtained for the subject. The secondary source was the medical record abstract. Finally, dates reported by a proxy respondent were used if information was not received from the other two sources.

Service Utilization Variables: Aggregated patient level self-reported counts of the major types of health care services utilized were calculated for the 18-month study period. Specific variables are described below.

Total Inpatient Admissions (ADMTOT): This is a count of the total number of hospital inpatient stays reported by the patient across the six interview periods. A hospital stay that begins in one interview period and continues into the next is counted as one stay.

Total Inpatient Nights (IPNGTTOT): This variable sums the total number of nights a subject spent in a hospital during the 18-month study period.
Some subjects may have reported the number of nights they were in the hospital for some, but not all, inpatient stays. IPNGTTOT sums the number of nights for only those stays with complete information.

Total Emergency Room Visits (ERVSTOT): ERVSTOT is the total number of visits the subject reported making to hospital emergency rooms during the study period.

Total Hospital Clinic Visits (HCVSTOT): HCVSTOT represents the total number of self-reported visits by a subject to hospital clinics during the study period.

Total Other Clinic Visits (OCVSTOT): OCVSTOT is the number of visits the subject reported making to other clinics during the study period.

Total Private Physician Visits (MDVSTOT): MDVSTOT represents the total number of self-reported visits by a subject to private physicians during the study period.

Total Ambulatory Visits (AMBVSTOT): AMBVSTOT represents the sum of all visits to hospital clinics (HCVSTOT), other clinics (OCVSTOT), and private physicians (MDVSTOT) reported by a subject during the study period. It excludes visits to hospital emergency rooms.

Total Observation Days (TOBSDAYS): This variable reflects the total number of days during the 18-month study period that the subject was observed and for which patient interview information was collected. It does not include any gaps in coverage during periods of ineligibility (i.e., periods when the respondent was out of the country or in jail). Total observation days can be used to standardize 18-month utilization counts to account for variation in the length of observation periods across individuals.

Examples:

Case 1: Time 1 and Time 2 interviews were completed for this individual. The subject was in jail for the entire Time 3 interview period, so no interview was completed. The subject died in Time 4, and an interview was completed by a proxy respondent.
Time 1: covers 3/1/91-5/7/91; 67 observation days
Time 2: covers 5/7/91-8/21/91; 106 observation days
Time 3: not completed, subject ineligible; 0 observation days
Time 4: covers 1/27/92-3/4/92 (date subject deceased); 36 observation days

Total observation days for this individual equal the sum of the observation days for the individual interview periods (209).

Case 2: Interviews were completed in each of the six periods. The person was out of the country for 5 days during the second interview period and for 9 days during the third interview period.

Time 1: covers 3/1/91-6/30/91; 121 observation days
Time 2: covers 6/30/91-10/6/91 with 5-day gap from 8/14/91-8/18/91; 93 observation days
Time 3: covers 10/6/91-1/12/92 with 9-day gap from 12/22/91-12/30/91; 89 observation days
Time 4: covers 1/12/92-4/15/92; 93 observation days
Time 5: covers 4/15/92-7/4/92; 79 observation days
Time 6: covers 7/4/92-8/31/92; 58 observation days

Total observation days for this individual equal 533.

5.1.1.2 Time Specific Patient Level Files

There are six adult and six pediatric patient level files, which contain patient information specific to each of the interview periods. These files include all patient interview data contained on nonrepeating records. The adult and pediatric files differ somewhat in the types of information they contain, and minor differences also occur across the six interview periods for both adults and pediatrics. Table 5-1 outlines the general categories of information contained on the adult files and indicates the presence (denoted by an X) or absence of each category in each of the time specific files. Table 5-2 presents similar information for the pediatric files.

Table 5-1. Categories of Information on Adult Time Specific Files

Table 5-2. Categories of Information on Pediatric Time Specific Files

Derived and flag variables in the time specific patient level files fall into two general categories: survey administration and service utilization. In general, each of these variables is named with a base name followed by an integer corresponding to the time specific file in which the variable resides. For example, ADM1 resides on the Time 1 file, ADM2 on the Time 2 file, and so on. Individual variables are described below.

Time Specific Interview Status (T1_STAT - T6_STAT): Unlike the overall patient characteristics file, the time specific files contain records only for persons with a completed interview. Accordingly, only three status codes are valid in these files: interview completed with subject; interview completed with proxy, subject living; and interview completed with proxy, subject deceased.

Time Gap Flag (GAP1FLAG - GAP6FLAG): Beginning with the Time 4 interview, adult and pediatric subjects were asked whether they had traveled outside of the country for a period of 2 weeks or longer. Adult subjects were also asked about periods of incarceration exceeding 2 weeks. In Time 4, subjects provided information on all time gaps occurring since the start of the study period; subsequent interviews gathered information only on time gaps occurring during the current interview period. For study purposes, subjects were considered ineligible during periods of travel or incarceration of 2 weeks or longer, and information on health care utilization was not collected for these periods. The time gap flag was created to help the user identify interview periods in which a time gap occurred. If a gap occurred in a particular interview period, the flag variable is set to 1; otherwise, it is blank.
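The observation-day arithmetic behind TOBSDAYS and the per-period observation days can be sketched in Python. This is a minimal illustration, not the study's processing code. It assumes that a period's observation days are the days elapsed from the reference period begin date to the end date, minus each period of ineligibility counted inclusively of its first and last days, and that TOBSDAYS is the sum of the per-period values. Under that convention the sketch reproduces the Time 1 and Time 2 figures in Case 1 and the Time 1 through Time 3 figures in Case 2. The function names and the 30-day rate used to standardize a count are hypothetical.

```python
from datetime import date


def observation_days(begin, end, gaps=()):
    """Observation days for one interview period (OBSDAYSn-style).

    begin, end: reference period begin and end dates.
    gaps: (first_day, last_day) pairs of ineligibility (travel or jail),
          each counted inclusively of both days.
    """
    elapsed = (end - begin).days
    ineligible = sum((last - first).days + 1 for first, last in gaps)
    return elapsed - ineligible


def total_observation_days(period_days):
    """TOBSDAYS-style total: periods with no interview contribute 0."""
    return sum(period_days)


def rate_per_days(count, obs_days, per_days=30):
    """Standardize a utilization count to a rate per `per_days` observed days."""
    if obs_days <= 0:
        raise ValueError("no observation days")
    return count * per_days / obs_days


# Case 2, Time 2: 6/30/91-10/6/91 with a 5-day gap from 8/14/91-8/18/91
t2 = observation_days(date(1991, 6, 30), date(1991, 10, 6),
                      [(date(1991, 8, 14), date(1991, 8, 18))])
print(t2)                                        # 93
print(total_observation_days([67, 106, 0, 36]))  # 209 (Case 1)
```

For example, a subject with 12 ambulatory visits over 360 observation days has rate_per_days(12, 360) equal to 1.0 visit per 30 observed days, which can be compared directly with a subject observed for a shorter period.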
Number of Inpatient Admissions, Per Interview (ADM1 - ADM6): These variables contain the number of inpatient hospital admissions reported by a subject in each interview period. An inpatient stay that begins in one interview period and continues into the next is counted as a stay in each period. Therefore, users who want the total number of inpatient admissions for a subject during the 18-month study period should use the variable created for this purpose (ADMTOT) rather than sum ADM1 through ADM6; that sum counts each continuing stay more than once and may therefore slightly overstate total admissions.

Number of Inpatient Nights, Per Interview (IPNGT1 - IPNGT6): IPNGT1 through IPNGT6 contain the number of nights the subject spent in the hospital in each interview period. For each hospital admission, subjects were asked how many nights they spent in the hospital; the IPNGT variables sum these nights across all stays a patient reported in a particular interview period. If an inpatient stay began during the current interview period and continued into the next, only the portion of the stay that falls in the current period is counted.

Number of Emergency Room Visits, Per Interview (ERVS1 - ERVS6): These variables contain the number of visits a subject made to emergency rooms in each interview period.

Hospital Clinic Visits, Per Interview (HCVS1 - HCVS6): HCVS1 through HCVS6 contain the number of visits a subject made to hospital clinics in each interview period.

Other Clinic Visits, Per Interview (OCVS1 - OCVS6): These variables contain the number of visits a subject made to other clinics in each interview period.

Private Physician Visits, Per Interview (MDVS1 - MDVS6): MDVS1 through MDVS6 contain the number of private physician visits a subject made in each interview period.
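A small sketch shows why summing ADM1 through ADM6 can exceed ADMTOT. It assumes, hypothetically, that each stay is known by its unique stay number and by the first and last interview periods it touched. A stay is then counted once in every period it touches, as the per-interview admission counts are, but only once overall, as ADMTOT is. The function names and the example stays are illustrative only.

```python
def per_period_admissions(stays, n_periods=6):
    """ADMn-style counts: a stay is counted in every period it touches."""
    counts = [0] * n_periods
    for _stay_id, first, last in stays:
        for period in range(first, last + 1):
            counts[period - 1] += 1
    return counts


def total_admissions(stays):
    """ADMTOT-style count: each distinct stay is counted once."""
    return len({stay_id for stay_id, _first, _last in stays})


# Hypothetical subject: stay 15 spans Times 1-2; stays 22 and 31 fall in one period each.
stays = [(15, 1, 2), (22, 3, 3), (31, 5, 5)]
adm = per_period_admissions(stays)
print(adm)                                # [1, 1, 1, 0, 1, 0]
print(sum(adm), total_admissions(stays))  # 4 3
```

The difference between the two totals equals the number of extra periods touched by continuing stays, which is why the overstatement is usually slight.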
Number of Ambulatory Visits, Per Interview (AMBVS1 - AMBVS6): These variables contain the number of visits to hospital clinics, other clinics, and private physicians made by the subject in each interview period. They do not include visits to emergency rooms.

Interview Specific Observation Days (OBSDAYS1 - OBSDAYS6): These variables contain the number of days in a given interview period on which a subject was observed and for which patient information was collected. Each is calculated as the number of days elapsed from the reference period begin date through the reference period end date, minus any days on which the subject was ineligible during the period. Observation days can be used to standardize utilization counts for differences in the length of observation periods across individuals.

Examples:

Case 1:
Time 1 Begin Reference Date - 3/1/91
Time 1 End Reference Date - 5/20/91
Periods of Ineligibility - none
Time 1 Observation Days (OBSDAYS1) - 80 days

Case 2:
Time 3 Begin Reference Date - 10/20/91
Time 3 End Reference Date - 1/5/92
Periods of Ineligibility - 11/2/91-11/20/91, subject in jail
Time 3 Observation Days (OBSDAYS3) - 57 days

5.1.2 Service Utilization Files

Self-reported information on the amount and types of health care services received resides on the service utilization files as a series of repeating records. Information on out-of-pocket payments and other sources of payment for these services is also available on these files. The service data are divided into separate files by type of service. Each file includes complete self-reported information on a particular service for the entire study period for either adult or pediatric subjects. For example, the adult inpatient hospital stay file contains all information on this type of service collected during the six patient interviews.
With the exception of the dental file, the service utilization files are structured so that similar data items are assigned the same variable names across files. For example, the variable SRE_DOL always contains the dollar amount a subject paid out of pocket for a service, regardless of the type of service. This format makes it easier for the user to aggregate information about all services received by a particular patient. Derived and flag variables on the service utilization files are described below.

Source Questionnaire (SFORM): This variable indicates the interview period in which the data were collected and whether the questionnaire was for an adult or pediatric case. For example, an SFORM value of A indicates that the information was collected in the Time 1 questionnaire for an adult case.

Type of Service Utilization (SFPART): SFPART distinguishes the type of service used.

Unique Stay Flag (SHSTAYFG): Each reported inpatient hospital and nursing home stay is assigned a unique stay number, stored in SHSTAYFG. The stay numbers are unique across all six interviews, but they are not necessarily assigned in order from the first stay reported to the last.

This variable is also assigned to all events in the separately billing doctor (SBD) files. A separately billing doctor (also called a "separate billing doctor") is one who provides care to a patient during an inpatient or nursing home stay but bills the patient separately for the services rendered. Anesthesiologists and radiologists commonly fall into this category. Although most SBDs are medical doctors, SBDs may also include other types of medical practitioners. Including SHSTAYFG on these files enables the user to link an SBD event to the inpatient hospital or nursing home stay in which the service was provided. For example:

The subject has 3 inpatient stays during the study period. The first stay is assigned an SHSTAYFG of 7.
The second stay is assigned an SHSTAYFG of 8, and the third stay an SHSTAYFG of 108. The subject also has 2 separately billing doctor (SBD) events during the study period. The first SBD event is assigned an SHSTAYFG of 8, and the second SBD event an SHSTAYFG of 108. Using the values in SHSTAYFG, the user can link the first SBD event to the second inpatient stay and the second SBD event to the third inpatient stay.

Continuous Stay Flag (ICTMFLG): This variable can be used in conjunction with SHSTAYFG to identify the components of stays that begin in one interview period and continue into the next. ICTMFLG indicates the interview periods in which the components of the stay can be found. If useful for a particular analysis, these variables enable the user to aggregate the component records into one stay. For example:

The subject was interviewed for the Time 1 interview while in the hospital and was discharged during the Time 2 interview period. The subject therefore has a stay that spans the first and second interview periods. This stay is represented in the inpatient stay file as follows:

First component record:
Provider ID for the Stay: Coded 3198671. This is the randomly assigned sequential ID number of the hospital where the stay occurred.
Date of Discharge: Coded 95/95/95, which indicates that the subject was still in the hospital at the time of the interview.
Number of Nights in Hospital: Coded as 5. This is the number of nights the subject spent in the hospital up to the Time 1 interview.
Unique Stay Flag: Coded as 15.
Continuous Stay Flag: Coded as AB. The A indicates that one component of the stay was collected in the Time 1 adult questionnaire, and the B indicates that the second component was collected in the Time 2 questionnaire. A complete explanation of these codes is included in the inpatient stay file codebook.

Second component record:
Provider ID for the Stay: Coded 3198671.
Date of Discharge: Coded 8/15/91, which is the date the subject was actually discharged.
Number of Nights in Hospital: Coded as 10. This is the number of nights the subject spent in the hospital after the Time 1 interview.
Unique Stay Flag: Coded as 15.
Continuous Stay Flag: Coded as AB.

To identify all components of a continuous stay, users should first check whether the continuous stay flag (ICTMFLG) is set. If ICTMFLG is blank, the stay is not a continuous stay. If ICTMFLG is set, its value indicates the interview periods in which the remaining components are located. Users should then search the subject's inpatient stays that fall within the identified interview periods for records with the same unique stay number (SHSTAYFG) as the initial component.

Overlapping Inpatient Hospital and Nursing Home Stay Flag (ANOSTYF1, ANOSTYF2): These flags denote situations in which subjects were admitted to a hospital from a nursing home, residential care facility, or hospice. In these cases, subjects were not discharged from the long-term care facility, and their bed was held until they returned from the hospital. As a result, the hospital stay dates overlap with the dates of the stay in the nursing home or other long-term care facility. These flag variables allow users to take this overlap into account if appropriate for their analysis. ANOSTYF1 and ANOSTYF2 are set only for hospital and nursing home stays that overlap, and their values are the unique stay numbers (SHSTAYFG) of the overlapping stays. When one nursing home stay overlaps with two hospital stays, both ANOSTYF1 and ANOSTYF2 are set on the nursing home record. For example:

Case 1: Hospital stay number 43 overlaps with nursing home stay number 111.
ANOSTYF1 on the hospital stay record is set to 111 to indicate that it overlaps with nursing home stay 111. Similarly, ANOSTYF1 on the nursing home stay record is set to 43 to indicate that it overlaps with hospital stay 43.

Case 2: Hospital stays number 75 and 81 overlap with nursing home stay number 26. ANOSTYF1 is set to 26 on both hospital stay records to indicate that they overlap with nursing home stay 26, and ANOSTYF2 is blank on both. On the nursing home record, ANOSTYF1 is set to 75 and ANOSTYF2 is set to 81.

Discrepant Source of Payment Flag (INSURFLG): In each interview period, subjects were asked what types of medical insurance coverage they had during the reference period. They were also asked for the source of payment for each episode of medical care they received. Editing identified situations in which a subject reported that a specific type of insurance covered a particular event but did not report that type of coverage in the overall set of questions on medical insurance. After these cases were reviewed for coding and data entry errors, some discrepancies remained in the data. INSURFLG highlights these discrepancies: it is set to 1 on the service record if the reported source of payment for that service is inconsistent with the type of medical insurance reported at the overall patient level for the interview period in which the service was received.

5.2 Medical Record Abstract Data Files

5.2.1 Overview

The medical abstract data files contain selected clinical information abstracted from study subjects' medical records. The ACSUS study design included the medical record data collection component in order to confirm specific HIV-related conditions and diseases used in determining the stage of subjects' HIV disease.
Therefore, these files are intended to provide the user with information about a subject's clinical history; they are not an appropriate source of patient level utilization counts. Users interested in such analyses should use the patient interview or provider billing files instead.

During three of the six patient interviews, subjects were asked to name their usual source of medical care and to sign permission forms authorizing study personnel to contact the named providers to obtain access to their medical records. If a subject reported more than one usual source of care, an attempt was made to obtain medical records from each of these sources. If a subject did not report a usual source of care, an attempt was made to obtain the subject's medical record from the provider through which the subject was sampled into the study. The data in these files reflect information abstracted from the subjects' medical records and may represent data collected from more than one medical provider.

Although the ACSUS patient interviews and provider billing survey collected data for the 18-month period from March 1, 1991, through August 31, 1992, medical record data were collected for a broader interval. Medical abstractors were instructed to collect clinical information for health care services provided from January 1, 1990, through August 31, 1992. In addition, abstractors went as far back in the medical record as necessary to confirm an AIDS diagnosis or to find evidence of positive HIV serostatus.

Four separate data files make up the complete set of medical record abstract files. The first is a patient record file, which compiles information collected from all usual sources of care and has been edited across these multiple data sources.
The three remaining files (the inpatient stay, checklist conditions, and T-cell reports files) contain the information as collected from each individual usual source of care. Because these data have not been compiled into one record, two different usual sources of care may report similar information about a subject.

5.2.2 Patient Level File

The medical abstract files contain one patient level record for each subject. This record includes data items derived from all abstracts received for the patient. The specific variables and how they were derived are described below.

Record Review Period (URMOF, URDYF, URYRF, URMOL, URDYL, URYRL): These date variables contain the earliest and latest dates of the medical records reviewed for an individual. They do not reflect gaps in medical record coverage between these dates. An individual's review period may span the full period from 1/1/90 through 8/31/92 or only a portion of it. Factors such as the completeness of a subject's medical record, the provider response rate, and the subject's survival time affect the length of the review period. For example:

Case 1: Medical abstracts for this subject were received from three providers covering the following time periods:
Provider 1: 2/8/90-7/17/91
Provider 2: 1/1/90-11/20/91
Provider 3: 9/21/91-6/5/92
The record review period for this individual is set to 1/1/90 through 6/5/92.

Case 2: Medical abstracts for this subject were received from two providers covering the following time periods:
Provider 1: 3/1/90-12/13/91
Provider 2: 2/3/92-8/31/92
The record review period for this individual is set to 3/1/90 through 8/31/92. Note that the record review dates do not take into account the 2-month gap in coverage between providers (12/13/91 through 2/3/92).
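The derivation of the record review period, and the kind of coverage gap it ignores, can be sketched as follows. This is an illustrative sketch only: it assumes each provider abstract is summarized by a (begin, end) date pair, and the coverage_gaps helper is hypothetical rather than a study variable. It simply makes visible the gaps that the record review dates do not reflect.

```python
from datetime import date, timedelta


def review_period(abstracts):
    """Earliest begin and latest end date across all abstracts.

    abstracts: (begin, end) date pairs, one per provider abstract.
    Like the record review period variables, this ignores gaps between providers.
    """
    return (min(begin for begin, _ in abstracts),
            max(end for _, end in abstracts))


def coverage_gaps(abstracts):
    """(end of earlier coverage, start of later coverage) pairs for uncovered spans."""
    intervals = sorted(abstracts)
    gaps = []
    covered_to = intervals[0][1]
    for begin, end in intervals[1:]:
        # A gap exists only if at least one whole day is uncovered.
        if begin > covered_to + timedelta(days=1):
            gaps.append((covered_to, begin))
        covered_to = max(covered_to, end)
    return gaps


case2 = [(date(1990, 3, 1), date(1991, 12, 13)),
         (date(1992, 2, 3), date(1992, 8, 31))]
print(review_period(case2))  # (datetime.date(1990, 3, 1), datetime.date(1992, 8, 31))
print(coverage_gaps(case2))  # [(datetime.date(1991, 12, 13), datetime.date(1992, 2, 3))]
```

For Case 1 the same functions return 1/1/90 through 6/5/92 and no gaps, because the three providers' periods overlap.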
HIV Positive Diagnosis (UHIV), Diagnosis Date (UHDIAGMO, UHDIAGDY, UHDIAGYR) and Report Date (UHRDMO, UHRDDY, UHRDYR): These variables were derived to represent the best information available across all medical abstract forms for an individual. A subject is coded as having a positive HIV diagnosis if at least one medical abstract confirmed a diagnosis, and the corresponding diagnosis and report dates from that abstract are coded in UHDIAGMO, UHDIAGDY, UHDIAGYR and UHRDMO, UHRDDY, UHRDYR. If multiple abstracts indicated a positive serostatus, the earliest diagnosis and report dates are coded. A subject is coded as not having a positive HIV diagnosis if none of the subject's medical abstracts confirmed a diagnosis. A subject's serostatus is coded as unknown if all medical abstracts reported an unknown status.

AIDS Diagnosis (UAIDS), Diagnosis Date (UADIAGMO, UADIAGDY, UADIAGYR) and Report Date (UAIDRDMO, UAIDRDDY, UAIDRDYR): These variables follow the same approach as the HIV positive diagnosis and report date variables. If at least one medical abstract stated that the individual had been diagnosed with AIDS, the subject is coded as having an AIDS diagnosis. If multiple abstracts indicated an AIDS diagnosis, the earliest diagnosis and report dates are coded in UADIAGMO, UADIAGDY, UADIAGYR and UAIDRDMO, UAIDRDDY, UAIDRDYR. A subject is coded as not having an AIDS diagnosis if none of the medical abstracts confirmed a diagnosis. If all medical records stated that it was unknown whether the subject had a diagnosis of AIDS, UAIDS is coded as unknown.

5.2.3 Medical Abstract Repeating Record Files

Information on inpatient stays, outpatient/clinic checklist conditions, and laboratory reports of T-cell counts is stored in these files as a series of repeating records. Because data for a subject may have been collected from more than one usual source of care, each repeating record has a provider identification number that indicates the source of the information.
The provider identification number, USCID0n, identifies the provider from whom the information was collected, which is not necessarily the provider from whom the care was received. USCID0n is a randomly assigned sequential number, where n is an integer used to distinguish two records from the same provider.

Because the medical data represent information collected from multiple providers, data sometimes overlap, are duplicated, or are discrepant across providers. For example, if a subject reported both a hospital and a private physician as usual sources of care, the patient charts from both sources may contain information on a single inpatient stay, and the data collected on that stay may or may not agree across providers. The hospital chart may contain more detailed information on admitting or discharge diagnoses than the chart maintained by the private physician, which may record only a primary diagnosis. The admission or discharge dates may also differ. In cases of apparent overlap or discrepancy, the data have been edited for keying and coding errors only. Additional editing, which would have required either extensive follow-up beyond the scope of this study or numerous assumptions without sufficient supporting evidence, was not undertaken. Discrepant or overlapping items across medical records have not been flagged. Users should therefore be aware that such items exist and, if they are important to an analysis, should examine the data files for them before undertaking the analysis.

5.2.3.1 Inpatient Stays File

This file contains data on all inpatient stays reported in the medical record. For each inpatient stay, information was collected on the admission and discharge dates as well as on the admitting and discharge diagnoses. The inpatient stays file contains one flag variable, which is described below.
Inpatient Stay Flag (UIPSFLG): This flag denotes inpatient stays for which the medical record contained more than three admitting or more than ten discharge diagnoses. Because the data files were initially structured to accommodate no more than three admitting and ten discharge diagnosis codes, the inpatient stay flag was introduced to alert the user that additional diagnoses were reported and are contained in a "secondary" record for the stay. The flag is set to 1 on both the primary and the secondary record so that the user can identify both components of the stay. The secondary record contains the same admission and discharge dates as the primary record. If more than three admitting diagnoses are reported, the additional admitting diagnoses are recorded on the secondary record, and the discharge diagnoses on the secondary record are set to blank. Similarly, if more than ten discharge diagnoses are reported, the additional discharge diagnoses are recorded on the secondary record, the first admitting diagnosis on the secondary record is set to "99999", and the remaining admitting diagnoses are set to blank.
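The overflow rule just described can be sketched in Python. This is a hypothetical illustration of the record layout, not code used to build the files. It assumes the diagnoses for a stay are available as ordered lists of code strings and represents a record as a dictionary. Because this section does not describe a stay in which both the admitting and discharge lists overflow, the sketch declines to handle that case. Run on the diagnoses of the stay in the example that follows, it produces the same primary and secondary layouts shown there.

```python
MAX_ADMITTING = 3    # admitting diagnosis slots per record
MAX_DISCHARGE = 10   # discharge diagnosis slots per record
BLANK = ""           # stands in for a blank code field


def _pad(codes, width):
    """Fill unused diagnosis slots with blanks."""
    return list(codes) + [BLANK] * (width - len(codes))


def split_stay(admitting, discharge):
    """Lay out one stay's diagnoses as a primary and, if needed, a secondary record.

    Returns (primary, secondary); secondary is None when nothing overflows.
    UIPSFLG is 1 on both records of a split stay and None (blank) otherwise.
    """
    extra_admitting = admitting[MAX_ADMITTING:]
    extra_discharge = discharge[MAX_DISCHARGE:]
    flag = 1 if (extra_admitting or extra_discharge) else None
    primary = {
        "admitting": _pad(admitting[:MAX_ADMITTING], MAX_ADMITTING),
        "discharge": _pad(discharge[:MAX_DISCHARGE], MAX_DISCHARGE),
        "UIPSFLG": flag,
    }
    if flag is None:
        return primary, None
    if extra_admitting and extra_discharge:
        # Not described in the manual; the sketch declines to guess.
        raise NotImplementedError("both diagnosis lists overflow")
    if extra_admitting:
        # Extra admitting codes go to the secondary record; its discharge codes stay blank.
        sec_admitting, sec_discharge = extra_admitting, []
    else:
        # Extra discharge codes go to the secondary record; first admitting code is 99999.
        sec_admitting, sec_discharge = ["99999"], extra_discharge
    secondary = {
        "admitting": _pad(sec_admitting, MAX_ADMITTING),
        "discharge": _pad(sec_discharge, MAX_DISCHARGE),
        "UIPSFLG": 1,
    }
    return primary, secondary


primary, secondary = split_stay(
    ["486XX", "1363X"],
    ["1120X", "7832X", "5589X", "7994X", "7806X",
     "2859X", "0389X", "01190", "690XX", "0420X", "1363X"],
)
print(secondary["admitting"])     # ['99999', '', '']
print(secondary["discharge"][0])  # 1363X
```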
For example:

                          Primary Record    Secondary Record
Provider ID               000001            000001
Admission Date            6/10/92           6/10/92
Discharge Date            7/3/92            7/3/92
Admission Diagnoses:
  Diagnosis #1            486XX             99999
  Diagnosis #2            1363X             blank
  Diagnosis #3            blank             blank
Discharge Diagnoses:
  Diagnosis #1            1120X             1363X
  Diagnosis #2            7832X             blank
  Diagnosis #3            5589X             blank
  Diagnosis #4            7994X             blank
  Diagnosis #5            7806X             blank
  Diagnosis #6            2859X             blank
  Diagnosis #7            0389X             blank
  Diagnosis #8            01190             blank
  Diagnosis #9            690XX             blank
  Diagnosis #10           0420X             blank
UIPSFLG                   1                 1

A primary record and its secondary counterpart can be identified in the data by searching for two inpatient stay records with the same patient ID number, the same provider ID number, and the same admission and discharge dates, on which UIPSFLG is set to 1.

5.2.3.2 Checklist Conditions File

The outpatient/clinic checklist records contain information on medical conditions commonly associated with HIV infection. For purposes of this study, a checklist of 75 conditions was developed, and hospital outpatient, emergency room, clinic, and physician records were reviewed for any mention of these conditions. The checklist conditions file is structured so that the earliest reported date of diagnosis is coded in the date of diagnosis variables (UCCDXMO, UCCDXDY, UCCDXYR). Subsequent diagnosis dates or reports of the condition are contained in UCCRDAT1 through UCCRDAT8. If a subject's medical record indicated numerous visits over a brief period for treatment of the same condition, only one report date per month may have been recorded.
Each outpatient/clinic record has space for 8 report dates of a particular condition; if more than 8 dates were present in the medical record, the additional dates are recorded on another record. In addition, information on the same condition may have been collected from multiple providers and appear on separate records. Users who want all of the information collected on a particular condition should therefore search all outpatient records for the patient.

5.2.3.3 T-Cell Reports File

Medical abstractors were instructed to search the medical record for all laboratory reports of T-cell counts. Depending on the available information, absolute counts, percentages, and the date of the report were recorded for both CD4 and CD8 cells.

5.3 Provider Billing Survey Data Files

In each of the six patient interviews, subjects were asked to name all medical care providers from whom they received health care during the interview period and to grant study personnel permission to access their billing records from those providers. Providers were contacted in two rounds of data collection; whether a provider was contacted in a given round depended on whether, during that round's period, the patient reported services from the provider and gave consent. Providers were asked for information on the services rendered, the charges for these services, and the sources of payment for these services. The provider billing survey files contain all information gathered in this data collection effort. The files also include imputed data for nonrespondent providers and for charges not reported by respondent providers.

Information collected in the provider billing survey is contained in four files: the ambulatory visit, inpatient stay, home health visit, and prescription medicine files. The ambulatory, inpatient, and prescription files contain one record per bill for a health care event.
An event is defined as an ambulatory visit, an inpatient stay, or a prescription purchase. The home health file, by contrast, reports multiple events on one record, which contains billing information for one or more home health visits by a particular type of caregiver over a period of time.

The provider billing files share a master record structure in which each record contains the same set of variables regardless of the event type. Because not all variables apply to every file, some data fields are blank for all records in a file. For example, the variable NUMNIGHT, which measures the length of an inpatient stay, applies only to the inpatient stay file; NUMNIGHT is therefore always blank on the ambulatory, home health, and prescription medicine files.

The provider billing survey files contain edited, derived, flag, and imputed variables. The following paragraphs describe all derived variables and some flag variables. All imputed variables and any flag variables related to imputation are described in Chapter 6.

Questionnaire Form (PFORM): This variable indicates the type of provider billing survey form on which the data were collected: the inpatient stay, ambulatory, home health, or pharmacy billing form.

Hospital Ownership Code (OWNSHP2): OWNSHP2 indicates whether the hospital where the care was received is publicly, privately, or federally owned. OWNSHP2 is blank for all providers that are not hospitals.

Type of Care Provided (CARETYPE): Each of the four provider billing survey data files contains records that may represent more than one type of health care service. For example, the inpatient stay file contains information on both hospital inpatient and nursing home stays, the prescription medicine file includes data on prescription medications as well as on medical equipment and supplies, and so on. CARETYPE is a derived variable used to distinguish among these types of services.
Imputation for nonresponse was performed within the categories defined by this care type variable.

Length of Stay (NUMNIGHT): This variable is the duration of an inpatient stay in nights, calculated from the admission and discharge dates reported by the provider.

Inpatient Stay Identifier (PHSTAYFG): Each inpatient stay is assigned a unique stay number (PHSTAYFG). Emergency room visits that resulted in an inpatient admission, and separately billing provider events in the ambulatory visit file, are assigned the stay number of the associated inpatient stay. If appropriate for their analyses, users can link inpatient, emergency room, and separately billing provider events using PHSTAYFG.

5.4 Guide to Codebooks and Annotated Questionnaires

A series of codebooks contains complete descriptions of the contents of the data files on the tape. In general, there is a separate codebook for each data file. The two exceptions are the Provider Survey Billing files, which share one master codebook applicable to all four component files, and the Non-Medical Services file, which has eight codebooks, one for each type of non-medical service.

Printable codebooks are provided on the tape as a series of text data files; this users manual does not contain hard copy versions of the codebooks. File names for the individual codebooks are listed in Chapter 3. Figure 5-1 describes each of the items appearing in a codebook. In addition, each codebook includes an index of variables, located at the end of the codebook, which lists all variables in the codebook alphabetically with the corresponding codebook page number.

Appendices A-C contain copies of all data collection instruments for the ACSUS study. The instruments are annotated with the actual variable names that appear in the data files.
Annotated variable names are listed in brackets under the question to which they refer. Questionnaire items that are not included in the ACSUS public use tapes are not annotated. Figure 5-2 is an example of the annotated questionnaire format.

Figure 5-1. Example of the Codebook Format

Figure 5-2. Example of the Annotated Questionnaire Format

6. DATA IMPUTATION

This chapter is an overview of the imputation procedures associated with the ACSUS data. Section 6.1 briefly summarizes imputation strategies for handling different types of nonresponse in surveys. Section 6.2 outlines the general imputation approach for the ACSUS provider survey data. Section 6.3 discusses some limitations in the use of the imputed ACSUS data. Section 6.4 describes the types of imputation variables in the data files.

6.1 Introduction to Imputation Strategies

The problem of missing data is pervasive in survey research. Estimates derived from datasets that contain missing items may be biased when respondents differ from nonrespondents with respect to the characteristics being analyzed. By carefully assigning values to missing items, imputation can produce a complete dataset that helps compensate for nonresponse bias, simplifies analyses, and yields consistent results across analyses, with minimal loss of precision.

Nonresponse in surveys is typically categorized as either unit nonresponse or item nonresponse. Unit nonresponse occurs when none of the survey items are obtained from a surveyed unit. In the ACSUS study, for example, no information could be obtained for some patient-reported events because the provider refused, or was otherwise unable, to participate in the survey. Item nonresponse occurs when some, but not all, of the responses are missing for an otherwise cooperating survey respondent. In this study, for example, some provider data for an event were missing because the provider omitted the data, refused to give them, was unable to locate them, and so on.
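The two definitions can be expressed as a small classification rule. In this sketch a record is a dictionary, None or an absent key marks a missing response, and the item names in the usage example are hypothetical. It only illustrates the definitions and is not part of the ACSUS procedures.

```python
def nonresponse_type(record, items):
    """Classify a surveyed unit's record for the listed items.

    Returns "unit" if none of the items were obtained, "item" if some but
    not all are missing, and "complete" if none are missing.
    None (or an absent key) marks a missing response.
    """
    missing = sum(record.get(item) is None for item in items)
    if missing == len(items):
        return "unit"
    if missing > 0:
        return "item"
    return "complete"


ITEMS = ["service_date", "charge", "payment_source"]  # hypothetical item names
print(nonresponse_type({}, ITEMS))                                           # unit
print(nonresponse_type({"service_date": "5/1/91", "charge": None}, ITEMS))  # item
```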
The distinction between types of nonresponse is useful because different situations may require different imputation strategies. The most common imputation procedures use the survey data themselves as the source of information for the imputed values. Surveyed units are matched on auxiliary variables that make the events similar with respect to the data being imputed, and the respondents' information (known as donor data) is then used to derive imputed values for the nonrespondents (known as recipients). An imputation procedure can be either stochastic or deterministic; the most common examples of these two types are the hot-deck method and the mean method, respectively.

Typically, both the hot-deck and mean methods begin by sorting the surveyed units into a set of imputation cells. Cells are formed by combining the values of the auxiliary variables used to match surveyed units, and adjacent cells may be collapsed if a cell has too few donors. The value for a recipient's missing variable is then either chosen at random from the donors within its cell (hot-deck method) or set to the mean value calculated across the donors in its cell (mean method).

6.2 General Imputation Strategy for ACSUS Data

Both unit and item nonresponse imputation were performed on the ACSUS provider data, and only billing data from medical providers were imputed. Information collected from patients was not imputed because its nonresponse rate was low. For imputation, the patient-reported health care services and provider bills for services (events) were arranged into imputation categories according to the following criteria:

§ Whether the data were collected in Round 1 or Round 2. (Imputation was necessary when data were collected from a provider in one round but, although requested, were not collected in the other.)
§ Whether the patient was adult or pediatric.
§ Whether the records for a patient/provider pair fell into one of the following care types:
  - Inpatient Hospital and Nursing Home Stays.
  - Visits by Separately Billing Doctors Associated with Inpatient Stays (adult patients only).
  - Emergency Room Visits, classified as associated with an inpatient stay or not.
  - Ambulatory Visits, including visits to Hospital Clinics, Community Clinics, Private Medical Doctors, Mental Health Services for Psychological Counseling, and Medical Practitioners.
  - Home Health Visits from Medical Personnel, Social Workers/Case Managers, or Helpers and other types of caregivers.
  - Prescription Medications, classified by cost or, for the most frequently purchased medications, by the drug itself.

If a provider event did not fall into one of these care types, it was excluded from imputation. For instance, imputation was not performed on Medical Equipment/Supplies data or on events for which no care type could be assigned.

Within each category, the events were further organized into two groups, based on whether data had been obtained from the provider for a patient and whether imputation was to be performed when data were not available.

§ Group 1 consisted of events for which the provider had supplied records of events reported by the patient.
§ Group 2 consisted of events for which the provider was a unit nonrespondent and met one of these criteria:
  - The provider refused;
  - The provider did not respond before data collection was closed;
  - The provider was found to be out of business at the time of the survey;
  - The provider had purged all records for the reference period;
  - The provider had a language barrier, or some other reason existed for no data; or
  - There was no permission form from the patient to collect provider data.

If a patient/provider pair did not meet the criteria for either Group 1 or Group 2, its events were excluded from imputation.
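The cell-based hot-deck and mean methods described in Section 6.1, which are applied to the groups defined above, can be sketched as follows. This is a minimal illustration, not the ACSUS production code: the record layout (dictionaries), the auxiliary variable names passed in cell_vars, and the helper names are all hypothetical, and cell collapsing is not shown, so an empty cell simply raises an error.

```python
import random
from collections import defaultdict
from statistics import mean


def build_cells(donors, cell_vars):
    """Group donor records into imputation cells keyed by auxiliary-variable values."""
    cells = defaultdict(list)
    for rec in donors:
        cells[tuple(rec[v] for v in cell_vars)].append(rec)
    return dict(cells)


def donors_for(recipient, cells, cell_vars):
    """Return the donors in the recipient's cell; deficient cells must be collapsed first."""
    key = tuple(recipient[v] for v in cell_vars)
    donors = cells.get(key, [])
    if not donors:
        raise ValueError(f"no donors in cell {key}; collapse cells before imputing")
    return donors


def hot_deck(recipient, cells, cell_vars, items, rng):
    """Stochastic method: copy the listed items from ONE randomly chosen donor,
    which keeps the donor's items consistent with one another."""
    donor = rng.choice(donors_for(recipient, cells, cell_vars))
    return {**recipient, **{item: donor[item] for item in items}}


def mean_impute(recipient, cells, cell_vars, item):
    """Deterministic method: set the item to its mean across the donors in the cell."""
    donors = donors_for(recipient, cells, cell_vars)
    return {**recipient, item: mean(d[item] for d in donors)}
```

Copying every billing item from a single donor in hot_deck mirrors the Group 2 procedure, in which all billing information from one randomly selected Group 1 event is imputed to the recipient event. Both functions return a new record rather than modifying the recipient in place.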
The ACSUS imputation process proceeded separately within each imputation category. First, imputation cells were defined by auxiliary variables, such as the geographic region of the health provider and the disease stage of the patient. The choice of variables depended on the care type and on whether adult or pediatric data were being imputed; see Section 6.2.1 for details on cell definitions. Then, provider records in Group 1 received item nonresponse imputation for missing charge components through a combination of hot-deck and mean imputation methods, as outlined in Section 6.2.2. These methods varied by care type, but not across rounds or between adult and pediatric patients. Table 6-1 lists the charge components imputed for each care type.

Table 6-1. Imputed Charge Components by Care Type

After item nonresponse imputation was completed for Group 1, unit nonresponse imputation was performed for Group 2 by the hot-deck method, using Group 1 events as donors. For each Group 2 patient-reported event, a similar provider survey event from Group 1 (as defined by the imputation cells) was randomly selected, and all of the billing information associated with that event was imputed to the Group 2 event. The imputed events were added to the Group 1 event file as separate records. This strategy was used to preserve the relationships between billing data and provider care types, as well as the multivariate relationships among the components of the billing information.

6.2.1 Auxiliary Variables Used To Define Imputation Cells

The variables listed below were used to define the imputation cells. They were selected separately within each imputation category on the basis of the source data (charge information), so their use varied among imputation categories. For example, seven different auxiliary variables were used to define similar events for adult Inpatient Stays.
In contrast, the imputation process for pediatric patients used only three auxiliary variables for the same care type.

In the hot-deck imputations, cells were defined as having either hard or soft boundaries. A hard boundary cell could not be collapsed automatically with another cell during imputation. A soft boundary cell, by contrast, could be collapsed with an adjacent cell within the same hard boundary cell when there were too few donors. Cells with fewer donor events than recipients were combined automatically with events in the adjacent soft boundary cell until the number of donors was sufficient or until all the events in the hard boundary cell had been used. Deficient hard boundary cells were collapsed manually according to a predefined protocol.

The auxiliary variables used during ACSUS imputation were the following:

§ Geographic Group (classified by provider charges)
§ Type of Provider Ownership (hospitals only)
§ Patient's Insurance Status
§ Patient's Disease Stage
§ Patient's Exposure Route
§ Association of an Emergency Room Visit with an Inpatient Stay
§ Type of Reported Ambulatory Care
§ Type of Home Health Care
§ Type of Prescription Medication
§ Length of Stay (for Inpatient Stays)
§ Total Charge, or Total Charge per Day or per Visit
§ Number of Days of Home Health Care Reported by the Patient
§ Number of Prescription Medication Purchases Reported by the Patient

6.2.2 Group 1 - Item Nonresponse Imputation Procedures

Imputation for this group was broken down into cases based on the patterns of missing information and the particular care types. The imputation methods did not differ between adult and pediatric patients. Provider records that contained no missing charge components were used as donors, and imputation cells were defined according to care type. The same imputation procedures were used for Inpatient Stays, Separately Billing Doctors, Emergency Room Visits, and Ambulatory Visits.
If the total charge appeared to be a copayment (zero to ten dollars), all billing information was set to missing for imputation purposes and all of the charge components were imputed. For inpatient stays whose total charge was imputed, length of stay was also imputed. Three cases of missing data were imputed as follows.

Case 1: Total and component charges missing. The hot-deck method was used to impute the total and all charge components from a single donor. Payment data were set to missing.

Case 2: Total charge present, but all component charges missing. The hot-deck method was used to impute all charge components from a single donor. The imputed components were then adjusted to the original, nonimputed total by keeping the same proportions of components to total that were on the donor event. Payment data were retained as collected.

Case 3: Total and some component charges missing. The mean method was used. For each component charge, the mean (or mean per diem) charge was calculated from those donors whose value for that component was greater than zero, and this mean (or mean per diem) charge was used to impute the missing component. The total charge was imputed by summing across components. Payment data were set to missing.

For Home Health Visits, item nonresponse imputation was accomplished by the mean method. If the total charge was zero dollars, all billing information was set to missing for imputation purposes and the total charge was imputed. The imputed total charge was obtained by calculating the mean charge per visit from the donors in the cell and multiplying it by the recipient's number of visits. There were no components of the total charge.

Item nonresponse imputation for Prescription Medications was broken down into two cases, both using the mean method. If the total charge was zero dollars, all billing information was set to missing for imputation purposes and the total charge was imputed.
There were no components of the total charge.

Case 1: The top 30 most frequently used medications for adults, or the top 10 for pediatric patients, with at least one donor matching on dosage. If a recipient event was matched to at least one donor with the same medication and exact dosage, the imputed value was calculated as the mean charge per unit quantity from the matching donor(s) multiplied by the recipient's quantity. Both donors and recipients had to have dosage and quantity present. Payment data were set to missing.

Case 2: Medications not covered by Case 1. Within an imputation cell, the recipient events received the mean charge across the donors. The dosage, quantity, and payment data were set to missing.

6.3 Limitations of the Imputed ACSUS Data

The imputation methods for the ACSUS data attempted to preserve the relationships among items subject to imputation by grouping records on characteristics related to the item being imputed. Furthermore, attempts were made to check for outliers in the data before and after imputation. However, caution should be exercised if provider survey data are used to explore relationships between charge and payment data. The imputation process considered only the charge components, as described in Section 6.2. Billing information other than charges was set to missing during ACSUS item nonresponse imputation, because the imputed charges had no real relationship to the other original billing information. Original charge and payment data are more appropriate for exploring these relationships.

The imputation schemes used here assume that, after controlling for the available auxiliary information, the missing values are missing at random. If this assumption does not hold, estimates using the imputed values could still be biased due to nonresponse.
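To make the arithmetic of the Group 1 Case 2 and Case 3 rules in Section 6.2.2 concrete, the sketch below implements both under stated assumptions. It is illustrative only: component names are hypothetical dictionary keys, the per diem variant of Case 3 is not shown, and Case 2 assumes the donor's component charges sum to the donor's total charge.

```python
from statistics import mean


def case2_scale_components(donor_components, recipient_total):
    """Case 2: take every component charge from a single donor and rescale it so the
    components keep the donor's proportions but sum to the recipient's reported total.

    Assumption: the donor's components sum to the donor's total charge.
    """
    donor_total = sum(donor_components.values())
    if donor_total <= 0:
        raise ValueError("donor must have a positive total charge")
    return {name: recipient_total * amount / donor_total
            for name, amount in donor_components.items()}


def case3_mean_components(recipient_components, donors):
    """Case 3: fill each missing (None) component with the mean of that component over
    donors whose value is greater than zero; the total is the sum of the components."""
    filled = {}
    for name, value in recipient_components.items():
        if value is None:
            positive = [d[name] for d in donors if d[name] > 0]
            if not positive:
                raise ValueError(f"no donor with a positive {name} charge")
            filled[name] = mean(positive)
        else:
            filled[name] = value
    return filled, sum(filled.values())
```

For example, a donor with room, laboratory, and drug charges of 600, 200, and 200 dollars, applied to a recipient whose reported total is 2,000 dollars, yields imputed components of 1,200, 400, and 400 dollars, which preserves the donor's 60/20/20 split.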
6.4 Imputation Variables

The provider billing files contain three types of variables: original, imputed, and imputation flag variables. This section describes each of these variable types and is intended to help users conduct analyses with the imputed data.

6.4.1 Original Variables

Original variables are the data elements as they appear in the data collection instruments; they do not contain any imputed data. Users interested in examining only unimputed charge data should use these variables in their analyses.

The provider billing survey collected information on total charges for each type of service event and, for some services, on component charges. Total charges for an event are contained in the variable PTE_CHRG. Component charges reside in a series of variables with the naming convention PEC_*, where * refers to the specific charge component. For example, PEC_LAB is the amount charged for laboratory services and PEC_RM is the room charge.

The provider billing survey also collected information on total payments and, for some services, the source of payment and the reason that no payment was received. The amount coded in variable PTE_PAY represents the total payment for an event. The amounts paid by specific sources are contained in a series of variables with the naming convention PEP_*, where * refers to the source. For example, PEP_PRVI is the amount paid by private insurance and PEP_MED is the amount paid by Medicare/Medicaid. The reason that no payment was received is represented by a series of variables PEN_*, where * refers to the reason. For example, PEN_CARE represents Medicare assignment.

6.4.2 Imputed Variables

For each original variable that may have been imputed, the provider billing files have a counterpart variable that contains either the imputed or the original value. For example, PTE_ICHR is the counterpart of PTE_CHRG.
If total charges were imputed for an event, the imputed total charge is coded in PTE_ICHR. If total charges were not imputed, the total charge in PTE_CHRG is copied to PTE_ICHR, and the two variables have the same value. Imputed charge components reside in a series of variables with the naming convention PEC_I*, where * refers to the specific charge component. For example, PEC_ILAB is the imputed/original amount charged for laboratory services and PEC_IRM is the imputed/original room charge. As with PTE_ICHR, each PEC_I* variable contains the same value as its original variable if no imputation was done.

The imputed amounts paid by specific sources are contained in a series of variables with the naming convention PEP_I*, where * refers to the source. For example, PEP_IPRV is the imputed/original amount paid by private insurance and PEP_IMED is the imputed/original amount paid by Medicare/Medicaid. The imputed reason that no payment was received is represented by the series of variables PEN_I*.

The provider billing files also contain a series of miscellaneous imputation variables. These include an imputed length of stay (ILOS2) for inpatient stays and the imputed drug type (IDRUGCD), quantity (IQTY), and dose (IDOSE) for pharmacy events.

6.4.3 Flag Variables

Each imputed variable has a counterpart flag variable that is coded to reflect the type of imputation that was done. The imputation flag variables use five naming conventions, as follows.

FGT_I* - These are a series of imputation flags that are assigned to total amount variables. For example, FGT_ICHR is the flag variable assigned to the imputed total charge (PTE_ICHR), FGT_IPAY is assigned to the imputed total amount paid (PTE_IPAY), and so on.

FGC_I* - These imputation flags are assigned to imputed component charge variables. For example, FGC_IRM is the imputation flag assigned to the imputed room charge variable (PEC_IRM).
FGP_I* - These flags are assigned to imputed source of payment variables. For example, FGP_IPRV is assigned to the imputed amount paid by private insurance (PEP_IPRV).

FGN_I* - These are a series of imputation flags that are assigned to the imputed reason that no payment was received. For example, FGN_ICAR is assigned to the imputed reason for non-payment due to Medicare assignment variable (PEN_ICARE).

FLG_I* - These flags are assigned to a series of miscellaneous imputed variables. For example, FLG_IDOS is assigned to the imputed drug dosage variable (IDOSE).

Each of the imputation flag variables described above is coded using a series of alphabetic and numeric codes that reflect the type of imputation that was conducted. Valid codes for each flag variable and their meanings are described in the provider billing codebook. An imputation flag is set to zero if no imputation was done for a particular variable. A flag may contain multiple codes if more than one type of imputation was conducted for a particular variable (e.g., both item and event imputation).

In addition to the imputation flag variables described above, an overall event flag (IEVENTFG) indicates whether the entire record was imputed as a unit nonresponse event. IEVENTFG is coded Y when the record was imputed and is blank otherwise. Analysts who want to exclude imputed events from their analyses can therefore use this variable to identify which records to remove.
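As a sketch of how an analyst might apply IEVENTFG and the flag variables, the functions below assume the provider billing records have already been read from the tape into Python dictionaries keyed by variable name; the reading step, and whether flags are stored as numbers or strings, are assumptions not covered by this manual. The nonzero flag code "1" used in the usage example is hypothetical; consult the provider billing codebook for the valid codes.

```python
def drop_imputed_events(records):
    """Keep only events that were not created by unit nonresponse imputation.

    IEVENTFG is 'Y' for an imputed record and blank otherwise.
    """
    return [r for r in records if str(r.get("IEVENTFG", "")).strip() != "Y"]


def was_imputed(record, flag_name):
    """True if an imputation flag (e.g. FGT_ICHR) holds any code other than zero.

    Assumption: a blank flag is treated like zero (no imputation).
    """
    code = str(record.get(flag_name, "0")).strip()
    return code not in ("0", "")
```

Filtering on IEVENTFG removes whole imputed events, while a flag such as FGT_ICHR identifies individual imputed values within events that were actually reported by the provider.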