/* SHORT TITLE:  Users Manual */

                AIDS Cost and Services
                Utilization Survey
                Public Use Tapes #4 and #5

                Users Manual

                Submitted to:
                Agency for Health Care Policy and Research
                2101 East Jefferson Street
                Rockville, Maryland  20852

                Submitted by:
                Westat, Inc.
                1650 Research Boulevard
                Rockville, Maryland  20850-3129

                Contract No:  282-89-0020

                Deliverable No.: 49

                June 30, 1994


Chapter                                                 Page

1       INTRODUCTION                                    1-1


2.1     Objectives                                      2-1
2.2     Study Design                                    2-2
2.3     Strengths and Limitations of Data               2-7

3       DATA STRUCTURE OVERVIEW                         3-1

3.1     Content of Public Use Volumes                   3-1

3.1.1   Adult Public Use Volume (Tape #4)               3-2
3.1.2   Pediatric Public Use Volume (Tape #5)           3-5

3.2     ACSUS Public Use File Data Structure            3-8

3.2.1   Patient Reported Data                           3-14
3.2.2   Medical Records Abstract Data                   3-14
3.2.3   Provider Billing Data                           3-15
3.2.4   General Analytic Usage Guidance                 3-16

        AND DATA ANOMALIES                              4-1

4.1     Data Preparation                                4-1

4.1.1   Range Specifications                            4-1
4.1.2   Consistency Checks                              4-1
4.1.3   Frequency Review                                4-2

4.2     Types of Variables                              4-2

4.2.1   Edited Variables                                4-2
4.2.2   Derived Variables                               4-2
4.2.3   Imputed Variables                               4-3
4.2.4   Flag Variables                                  4-3

4.3     Data Anomalies                                  4-3

4.3.1   Self-Reported Marital Status                    4-4
4.3.2   Utilization Information Reported During the
        Wrong Reference Period                          4-4
4.3.3   Dates of Service Utilization After the
        Subject's Date of Death                         4-5
4.3.4   Loss of Medicare Coverage                       4-5
4.3.5   Self-Reported Date of Positive HIV Test         4-6
4.3.6   Self-Reported Conditions of HIV Illness         4-6


5.1     Patient Data Files                              5-1

5.1.1   Patient Level Files                             5-2
5.1.2   Service Utilization Files                       5-8

5.2     Medical Record Abstract Data Files              5-12

5.2.1   Overview                                        5-12
5.2.2   Patient Level File                              5-13
5.2.3   Medical Abstract Repeating Record Files         5-14

5.3     Provider Billing Survey Data Files              5-17
5.4     Guide to Codebooks and Annotated 
        Questionnaires                                  5-18

6       DATA IMPUTATION                                 6-1

6.1     Introduction to Imputation Strategies           6-1
6.2     General Imputation Strategy for ACSUS Data      6-2

6.2.1   Auxiliary Variables Used to Define Imputation
        Cells                                           6-5
6.2.2   Group 1 Item Nonresponse Imputation Procedures  6-6

6.3     Limitations of the Imputed ACSUS Data           6-7

6.4     Imputation Variables                            6-8

6.4.1   Original Variables                              6-8
6.4.2   Imputed Variables                               6-8
6.4.3   Flag Variables                                  6-9

List of Tables


2-1     Number of Adult and Pediatric Patients:  
        AIDS Cost and Services Utilization Survey       2-4
3-1     Contents of Adult Public Use Volume (Tape # 4)  3-2
3-2     Contents of Pediatric Public Use Volume 
        (Tape # 5)                                      3-5
5-1     Categories of Information on Adult Time
        Specific Files                                  5-6
5-2     Categories of Information on Pediatric Time 
        Specific Files                                  5-6
6-1     Imputed Charge Components by Care Type          6-4

List of Figures


3-1     Patient Files                                  3-9
3-2     Medical Record Abstract Files                  3-12
3-3     Provider Billing Files                         3-13
5-1     Example of the Codebook Format                 5-20
5-2     Example of the Annotated Questionnaire
        Format                                         5-21

List of Appendices


        (Annotated)                                    A-1

A-1     Adult/Adolescent Screener Questionnaire
A-2     Time 1 Adult/Adolescent Patient Questionnaire
A-3     Time 2 Adult/Adolescent Patient Questionnaire
A-4     Time 3 Adult/Adolescent Patient Questionnaire
A-5     Time 4 Adult/Adolescent Patient Questionnaire
A-6     Time 5 Adult/Adolescent Patient Questionnaire
A-7     Time 6 Adult/Adolescent Patient Questionnaire
A-8     Pediatric Screener Questionnaire
A-9     Time 1 Pediatric Patient Questionnaire
A-10    Time 2 Pediatric Patient Questionnaire
A-11    Time 3 Pediatric Patient Questionnaire
A-12    Time 4 Pediatric Patient Questionnaire
A-13    Time 5 Pediatric Patient Questionnaire
A-14    Time 6 Pediatric Patient Questionnaire

        (Annotated)                                    B-1

C       PROVIDER SURVEY BILLING FORMS (Annotated)      C-1

C-1     Inpatient Stay Form
C-2     Ambulatory Billing Form
C-3     Home Health Form
C-4     Prescribed Medicine Form

D       CODEBOOK APPENDICES                            D-1

D-1     Name of Physician/Doctor Specialty Codes
D-2     Prescription Drug Codes
D-3     Nonprescription Drug/Non-Traditional Substance Codes
D-4     Relationship to Patient Codes

                                  1.   INTRODUCTION

                  This manual provides documentation and guidance for users

          of the  public release  of the  complete data  files for the AIDS

          Cost and  Services Utilization  Survey (ACSUS).  These data files

          are contained in two volumes:  ACSUS Public Use Tape #4 (Complete

          Adult   Patient    Questionnaires,   Adult    Provider    Billing

          Questionnaires, and  Adult Medical  Record Abstracts)  and  ACSUS

          Public Use  Tape #5  (Complete Pediatric  Patient Questionnaires,

          Pediatric Provider  Billing Questionnaires  and Pediatric Medical

          Record Questionnaires).

                  Partial  information   from  the   study   was   released

          previously in  ACSUS Public  Use Tapes 1-3.  The ACSUS study data

          have undergone  subsequent editing and some changes may have been

          made to  the data since the publication of Tapes 1-3.  Therefore,

          Tape 4 and Tape 5 should be considered the complete and final set

          of public use data from the ACSUS study.  It should also be noted

          that Tape  4 and  Tape 5  are self-contained and cannot be linked

          with previous  tapes since  patient and  provider identifiers are

          not consistent with Tapes 1-3.

                  ACSUS was  a national survey of persons infected with HIV

          that was  conducted by  Westat, Inc.,  for the  Agency for Health

          Care Policy  and Research  (Contract No. 282-89-0020).  The study


          was designed  to provide  estimates of  the use  of health  care,

          social and  support services,  and the  costs of  services  by  a

          sample of HIV-infected persons in ten geographic locations in the

          United States.

                  The chapters  that follow  provide information  about the

          research design  (Chapter 2),  structure of  the public  use data

          files (Chapter  3), data  preparation (Chapter 4), and the use of

          the ACSUS  data files  and codebooks  (Chapter  5).    Imputation

          techniques are  described in  Chapter 6.   The appendices provide

          copies of all data collection instruments and codebook

          appendices.

                  The main body of this users manual (Chapters 1-6) is also

          contained as  a text  file on the public use tapes.  All figures,

          tables, and appendices are available in hard copy only.


2.1     Objectives

        The increase  in the  number of  reported cases  of  AIDS

(acquired immunodeficiency  syndrome) and the expanded demands on

the health  care system for treating persons with illness related

to HIV  (human immunodeficiency virus) have been widely reported.

Despite the  growing problem, data on use of and expenditures for

health care services for this population are surprisingly scarce.

Thus, public  policy related  to the  HIV epidemic has often been

formulated on  the basis  of  limited  case  studies  and  cross-

sectional research.

*This chapter is excerpted from Berk, M.L., Maffeo, C. and Schur,
C.L. (1993).  Research Design and Analysis Objectives, AIDS Cost
and Services Utilization Survey (ACSUS) Report No. 1 (AHCPR
Publication No. 93-0019).  Rockville, MD: Agency for Health Care
Policy and Research.

        ACSUS is the largest data collection effort targeting the

population of persons infected with HIV.  In addition, its design

addresses three major limitations of current data.  First, data

from other sources do not permit examination of early stages of

HIV illness, such as periods of asymptomatic infection, or HIV

illness that does not satisfy diagnostic criteria for AIDS.  The

ACSUS sample includes more than 400 adults who reported being HIV

positive but having no HIV-related symptoms or conditions.

Second, despite the fact that the disease is increasingly treated

in outpatient  settings, current  information on  AIDS  does  not

permit analysis  of the  use of  a  wide  variety  of  outpatient

services.   ACSUS data, on the other hand, include the use of and

charges for  the full range of ambulatory physician settings, use

of both  formal and  informal home care services, use of a number

of mental  health and  support services,  and use  of and charges

associated with  various drug  therapies.   Finally,  perhaps  of

greatest importance  is the longitudinal nature of ACSUS.  Length

of survival  after infection has been increasing and may continue

to do  so as prophylactic treatment for HIV infection becomes the

norm.   Yet there  is little information on how levels and mix of

service use,  sources of  payment for  care, or  quality of  life

change over  the course of the illness.  With six interviews over

the course  of an  18-month period,  ACSUS provides  a wealth  of

information  on  factors  affecting  the  use  of  services  over

different stages of the illness.

        Data  collected   through  ACSUS   will  be  key  to  the

formulation of Federal strategies for the allocation of resources

to the  care of persons with HIV-related illness and to a broader

understanding of the effect of HIV on segments of the U.S. health

care system.   Moreover, the information needs of State and local


governments parallel  those of  the Federal  Government  as  they

deliver care to persons with HIV-related illness.

2.2     Study Design

        Sample Design

        The ACSUS sample was drawn using a multistage design that

involved selection  of geographic areas, sampling of patient care

sites in  the selected  areas, and  a probability  sample of HIV-

infected persons who sought treatment from those providers over a

study enrollment period of approximately 4 months.

        During the  first stage,  the sample  of geographic areas

was selected  so as  to provide regional diversity in third-party

payment mechanisms,  service delivery  systems, and  type of HIV-

infected individuals (e.g., injecting drug users, homosexual men,

pediatric cases,  and women).   The  first step in the geographic

area selection  process was  to focus  on cities with the highest

number of  reported AIDS cases to ensure obtaining a large enough

sample of  HIV-infected persons.   Using  data  provided  by  the

Centers for  Disease  Control  (CDC),  the  25  cities  with  the

greatest number  of cases (accounting for more than 62 percent of

all cumulative  AIDS cases in the United States) were identified.


From these 25 cities, 10 were selected, using the following

criteria:

           At least  six cities  with a  high prevalence  of HIV

           At least  one city  in a State with a Medicaid waiver
            in place  that permits reimbursement for services not
            usually covered by Medicaid.

           At  least  one  city  in  a  State  with  restrictive
            Medicaid policies.

           At least five cities with pediatric cases.

           At least  one city  with a  moderate  HIV  prevalence

        The final  selection of the 10 geographic areas was based

on data supplied by CDC on the race and ethnicity of persons with

AIDS, number  of persons  by exposure  category,  and  number  of

pediatric cases.   The  10 geographic areas are New York, Newark,

Philadelphia, Baltimore,  Miami,  Tampa,  Chicago,  Houston,  Los

Angeles, and San Francisco.

        The second  stage of  sampling was  selection of  patient

care sites  in each  area.   A partial  listing of  hospitals and

clinics (but not private practitioners) that provided services to

HIV-infected  persons  was  obtained  from  CDC's  National  AIDS

Information Clearinghouse.  Calls were made to the city or county

department of  health in  each of the 10 areas to update the list


of patient  care sites  and to attempt to obtain some information

on the number of persons treated by each.

        In  general,   accurate  information  on  the  number  of

patients treated  (as distinct  from the  number of  visits)  was

unavailable.    Although  some  hospitals  made  the  information

public, in  some cases  it was  deemed to  be confidential.  Most

health departments  considered information  on names  of  private

practitioners treating  AIDS patients  too sensitive  to release.

However, it was possible to rank hospitals and outpatient clinics

in each  area in  terms of  estimated caseload.   In general, the

patient care sites that treated the largest number of patients in

the area  were selected,  although attempts  were made to include

both private and public sites.  In four geographic areas, private

practitioners who  were affiliated  with the  hospitals providing

substantial amounts  of  inpatient  care  were  included  in  the

sample.   A sample  of 32  patient care sites was selected; 26 of

these participated in the study.

        In the  third stage  of sampling, a probability sample of

patients was  selected from each patient care site.  The sampling

frame of  patients was  identified through  the use  of  a  self-

administered  screener   questionnaire.    A  group  of  55  site

coordinators, some of whom were clinic employees, were trained to

distribute the  screening form  to all patients visiting the site


for care  during the  sampling period  of 2-4 months.  Use of the

screener questionnaire  allowed for the collection of information

necessary for  sampling  purposes  without  requiring  review  of

medical  records   or  collection  of  any  personal  identifying

information.   Coordinators  were  trained  to  provide  help  to

patients requiring  assistance in completing the form.  After the

screening  form   was  completed,   the  coordinator   used   the

information in  the form  to determine  whether the  patient  was

eligible for  the study and placed the patient in the appropriate

sampling stratum.   The  coordinator then  selected a  systematic

sample within  each stratum.   Approximately  6,000 persons  both

completed the screening form and were designated eligible for the

study; of these, 2,487 were sampled (Table 2-1).
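
        The within-stratum selection step described above can be
illustrated with a short sketch.  It is not the procedure the site
coordinators actually followed: the random start, the sampling
intervals, the stratum labels, and the function names
(systematic_sample, sample_by_stratum) are all assumptions made for
illustration only.

```python
import random
from collections import defaultdict


def systematic_sample(items, interval, rng):
    """Select every `interval`-th item after a random start in [0, interval)."""
    if interval < 1:
        raise ValueError("interval must be >= 1")
    start = rng.randrange(interval)
    return items[start::interval]


def sample_by_stratum(patients, intervals, seed=0):
    """Group eligible patients by stratum, then sample each group.

    `patients` is a list of (patient_id, stratum) pairs in screening order.
    `intervals` maps each stratum to a hypothetical sampling interval.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for patient_id, stratum in patients:
        by_stratum[stratum].append(patient_id)
    return {
        stratum: systematic_sample(ids, intervals[stratum], rng)
        for stratum, ids in by_stratum.items()
    }


# Hypothetical screening log: 12 eligible patients in two strata.
screened = [(i, "AIDS" if i % 2 == 0 else "asymptomatic") for i in range(12)]
selected = sample_by_stratum(screened, {"AIDS": 2, "asymptomatic": 3})
print(selected)
```

        A smaller interval yields a higher sampling rate, which is
one way a stratum such as pediatric patients could be oversampled.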


Table 2-1.     Number of  Adult and Pediatric Patients: AIDS Cost
               and Services Utilization Survey

        The primary  sampling criterion  used was  illness stage,

defined as  AIDS, HIV-related  illness,  and  asymptomatic.    In

addition,  pediatric   patients  and   women  were   oversampled,

regardless  of  illness  stage.    (Pediatric  patients  who  had

clinically defined  AIDS were enrolled in the study regardless of

age; those who were HIV-positive but non-AIDS were required to be

at  least   15  months   of  age   to  preclude   the  chance  of


        The secondary  goal in selecting patients was to obtain a

sample whose  distribution by  exposure group  and payment source

was roughly  proportionate to that of the HIV-infected population

in the geographic areas targeted.  Thus, patients were stratified

by exposure  group and,  in the  case of  homosexual or  bisexual

males, by insurance status as well.  This stratification not only

allowed for a more representative sample of HIV-infected patients

but also  made it possible to obtain as many patients as possible

in some of the rare strata (for example, pediatrics).   The study

includes some  primarily public  hospitals, where  AIDS  patients

with higher socioeconomic status may be underrepresented.  Male

homosexuals and bisexuals were stratified by source of payment in

an  attempt   to  control   the  distribution   of  patients   by


socioeconomic  status   and  reduce   potential  bias.    It  was

anticipated that  the number  of female  and pediatric cases with

private insurance  would be  small.   It also  was expected  that

nearly all injecting drug users (IDUs) would be uninsured, with a

small percent receiving public assistance.  Target sampling rates

were then computed for each provider.

        If,  after   reviewing  the   screening  form,  the  site

coordinator selected the patient into the sample, the coordinator

then immediately  reapproached the  patient  to  initiate  formal

recruitment  into  the  study.    During  recruitment,  the  site

coordinator explained the study purpose and requirements, and the

patient completed  a consent  form  and  patient  location  form.

Coordinators reported  weekly  on  sampling  activities  so  that

sample yield could be monitored and sampling rates adjusted.

        Of the 2,487 sampled patients, 88 percent, or 2,197, were

successfully recruited,  that is,  they signed a consent form and

provided location  information (Table  2-1).  Enrollment response

rates by  exposure category,  illness stage, and insurance status

were rather  uniform.   The  lowest  rate  was  for  asymptomatic

homosexual men  who had  private insurance,  about 80  percent of

whom agreed to participate.  Given the prognosis for the disease,

the stigma  the disease  still  carries,  and  the  socioeconomic

status of  this group,  this  80-percent  response  rate  is  not


unexpected.  The response rates for IDUs, women (many of whom are

IDUs), and  pediatric cases  were higher  than the  mean response

rate for  the entire  sample.   These groups  were of concern not

only from the point of view of initial enrollment into the sample

but also  with respect  to continuing participation in the study.

Consequently,  a   number  of   procedures  and  incentives  were

introduced in  an attempt  to maximize  participation.  These are

described in the next section.

        As shown  in Table  2-1, 2,090  persons, or 95 percent of

those agreeing  to participate,  actually completed  the  initial

(Time 1)  patient  interview.    Of  these,  141  were  pediatric

patients.  Of the adults (including 361 women), 678 were patients

with AIDS, 843 patients with other HIV-related illnesses, and 422

asymptomatic patients.   The  illness stage  for six patients was

unknown at Time 1.
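
        The counts quoted in the preceding paragraphs can be checked
against one another with a few lines of arithmetic.  The figures
below are taken from the text; the variable names are illustrative
only.

```python
# Counts reported in the text (see Table 2-1 in the hard-copy manual).
sampled = 2487
recruited = 2197
completed_time1 = 2090
pediatric = 141
adults_by_stage = {
    "AIDS": 678,
    "other HIV-related illness": 843,
    "asymptomatic": 422,
    "unknown at Time 1": 6,
}

recruitment_rate = recruited / sampled
completion_rate = completed_time1 / recruited
adults = sum(adults_by_stage.values())

print(f"recruitment rate: {recruitment_rate:.1%}")   # 88.3%
print(f"Time 1 completion: {completion_rate:.1%}")   # 95.1%
print(f"adults: {adults}, total: {adults + pediatric}")
```

        The adult illness-stage counts sum to 1,949, which together
with the 141 pediatric patients gives the 2,090 completed Time 1
interviews.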

        Data Collection Design

        The  data   collection  design  for  the  patient  survey

component  of   the  study   involved  conducting  six  in-person

interviews (Time  1-Time 6)  with study subjects over an 18-month

study reference  period, March  1, 1991, through August 31, 1992.

Subjects were  contacted every  12 to  14 weeks.   It is believed


that this  timeframe minimizes potential sample attrition because

of  death,   tracking  problems,  and  possible  recall  bias  in

reporting.     Although  information   on   insurance   coverage,

employment, and  income was  collected at  every  interview,  the

content of  the interview  varied  somewhat  over  the  six  time

periods, with  detailed segments on functional status, quality of

life, and access and barriers to care included in three periods.

        During each interview the patient was asked to name every

health care provider from which a service was received during the

time since  the previous  interview, referred to as the interview

reference period.   For  the  Time  1  interview,  the  interview

reference period  begins on  March 1,  1991, and  for the  Time 6

interview, the  period ends  on August  31, 1992.    During  each

interview the  patient was  asked to  sign permission  forms that

allowed contact  with  every  medical  provider  from  which  the

patient  received   a  service.     Providers   included  private

practitioners, outpatient  departments of hospitals, freestanding

clinics, pharmacies, and home health agencies.  The set of signed

provider permission  forms represents the sample for the provider

survey component  of ACSUS.   As  of the  first interview, almost

2,500  medical   care  providers  had  been  identified.    These

providers were  contacted two  times during  the 18-month patient

survey  reference   period  to  obtain  information  on  services

rendered,  charges  for  services,  and  source  of  payment  for


charges, as  well as  medical data  from those providers named by

the patient as the usual source of care.
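
        Analysts may want to confirm that a reported service date
falls inside the study reference period described above (March 1,
1991, through August 31, 1992).  The following is a minimal sketch;
the record layout and the helper names (months_spanned,
in_reference_period) are hypothetical and do not correspond to
variables on the public use files.

```python
from datetime import date

# Study reference period stated in the text.
STUDY_START = date(1991, 3, 1)
STUDY_END = date(1992, 8, 31)


def months_spanned(start, end):
    """Count calendar months from start's month through end's month."""
    return (end.year - start.year) * 12 + (end.month - start.month) + 1


def in_reference_period(service_date, start=STUDY_START, end=STUDY_END):
    """Return True if the service date lies inside the period (inclusive)."""
    return start <= service_date <= end


# Hypothetical service records: (record_id, date of service).
records = [
    ("r1", date(1991, 2, 27)),
    ("r2", date(1991, 3, 1)),
    ("r3", date(1992, 8, 31)),
    ("r4", date(1992, 9, 2)),
]
out_of_period = [rid for rid, d in records if not in_reference_period(d)]

print(months_spanned(STUDY_START, STUDY_END))  # 18
print(out_of_period)                            # ['r1', 'r4']
```

        The same check could be applied to a single interview
reference period by passing that interview's start and end dates.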

        The decision  to obtain charge information from providers

rather than  from patients  was made  in  recognition  that  data

obtained from  individuals may  suffer from  a number  of biases,

including recall  error (Berk,  Horgan, and  Meyers, 1986;  Berk,

Schur, and  Mohr, 1990;  Cox and Cohen, 1985; National Center for

Health Statistics,  1961).  No attempt was made to collect charge

data from  patients, with the exception of out-of-pocket expenses

and dental  service charges.   Dental providers were specifically

excluded from  the list  of providers from which charge data were

to be  collected because of concern that awareness of a patient's

HIV status might jeopardize receipt of services.

        Providers sampled  for the provider survey component were

asked either  to furnish  a printout  of the patient's bill or to

complete a  data form  for each  reported visit  made during  the

reference period.   The information gathered from billing records

should provide  an accurate count of the services received at the

particular medical  facility and  the charges for those services.

It should  be  noted  that  charge  data,  rather  than  cost  or

expenditure data,  were  collected.    Charges  refer  to  billed

amounts and  may be  greater than  expenditures because of unpaid

bills, bad  debt, or  uncompensated care.   Costs  should measure


actual resource  use but,  in  fact,  are  rarely  likely  to  be

available or  even known.   Different methods of allocating fixed

costs as  well as  cross-subsidization among hospital departments

make cost  comparisons difficult  (Prospective Payment Assessment

Commission, 1985).

        In addition  to medical  providers,  approximately  2,500

nonmedical providers  were identified  in  the  first  interview.

These included  community-based  organizations  providing  social

support  services,  podiatrists,  and  providers  of  alternative

therapy.   For these providers, respondents were asked to provide

information about the type and amount of services received.

        Because   of    concerns   about    maintaining   patient

participation among  a group of patients who are very ill and, in

many cases,  very  transient,  data  collection  procedures  were

designed to  maximize  continuing  participation.    Interviewers

received  not   only  extensive  training  in  administering  the

questionnaire but  also specialized  instruction in understanding

the sensitive  nature of  the subject  matter and  in relating to

persons with  a serious  and debilitating  illness.   Initial and

followup interviews were conducted at a location specified by the

study subject.   IDUs  and homeless  patients, however, completed

Time 1  interviews at  the provider site because of the potential

tracking problems  associated with  these  persons.    Subsequent


interviews for  these participants  were also held at the patient

care site if the subject preferred that location over another.

        Upon  recruitment   into  the   study,  respondents  were

provided with  an 800  number and told to call with any questions

or problems at any time during the data collection.  Participants

were given  a new  card with  the 800 number on it each time they

were interviewed.   About  15 calls  a week were received on this

number.   Subjects also were paid $50.00 each time they completed

an interview.

        Patient location  data used  for tracking  purposes  were

collected initially  when the subject consented to participate in

the  study   and  updated  during  each  of  the  Time  1-Time  5

interviews.   Study subjects  were asked  to provide  traditional

tracking information, such as the name of a person who would know

how to  find them  if they  moved or  could not be located.  They

were also  asked about  places they frequented, names of friends,

names of social workers or parole workers, other names they used,

and nicknames or street names that they had.

        Using patient  location data,  interviewers telephoned or

visited subjects  to set  up interview appointments.  However, in

some  cases,   patients  remained   difficult  to   locate,   and

interviewers obtained  additional assistance from clinic staff at


the enrollment  sites.  When difficult-to-locate patients visited

the clinic, staff would inform them that interviewers were trying

to contact them.  If these patients had scheduled a future clinic

appointment,  staff  would  apprise  interviewers  of  the  time.

Finally, study subjects were asked to provide the name of a proxy

respondent who  would  know  about  their  health  care  use  and

expenses and could be contacted if study subjects were too ill to

complete an  interview.   More  than  82  percent  of  the  study

subjects provided the name of a proxy.

2.3     Strengths and Limitations of Data

        ACSUS is  the largest and most comprehensive study of the

cost and  use of  services by  persons with  HIV-related illness.

Although it was designed to overcome many of the data limitations

described below, the difficulties inherent in identifying persons

with HIV  infection necessitated  a survey  design  with  certain

limitations.   The implications  of these  limitations  for  data

analysis can  be minimized,  however, if  the  analytic  plan  is

designed with the particular strengths and weaknesses of the data

base in  mind.   Thus, analysis  using the ACSUS data base should

emphasize those  areas of  inquiry in  which it  is strongest and

should not attempt to examine issues for which the survey is ill

suited.


        Sources of Bias

        Like all  surveys, ACSUS  has methodological limitations.

As Fowler  (1988, p.  145) has  indicated, "the cost of trying to

achieve error-free  estimates  is  too  high  for  most  research

purposes; some potential for error exists in virtually all survey

plans.   Total survey  design involves considering all aspects of

the survey  and choosing  a level  of rigor  appropriate  to  the

purpose of the particular project."

        Groves  (1989)  describes  four  major  types  of  error:

sampling  error,   coverage   error,   nonresponse   error,   and

measurement error.  Within each of these categories, there can be

systematic as  well as  random error.   The  former will  produce

biased estimates;  the latter  will not.   Random,  or  variable,

error is  said to  be unbiased  because, although  it affects the

precision of  the  estimate,  it  is  as  likely  to  produce  an

overestimate as  an underestimate.   Thus,  the estimate  will be

unbiased when  there is  variable error,  but the variance of the

estimate will  be increased.   Three types of potential error are

discussed here as they apply to ACSUS.
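
        The distinction between random and systematic error can be
illustrated with a small simulation.  This is a sketch under
simplifying assumptions, not an analysis of ACSUS data: every
respondent is given the same hypothetical true value, random error
is drawn from a normal distribution, and systematic error is a
constant offset applied to every report.

```python
import random
import statistics


def simulate_estimates(true_mean, noise_sd, bias, n_respondents=200,
                       n_surveys=500, seed=1):
    """Return one sample-mean estimate per simulated survey.

    Each report = true value + systematic error (bias) + random error.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_surveys):
        reports = [true_mean + bias + rng.gauss(0.0, noise_sd)
                   for _ in range(n_respondents)]
        estimates.append(statistics.fmean(reports))
    return estimates


TRUE_MEAN = 10.0  # hypothetical true mean number of visits

small_random = simulate_estimates(TRUE_MEAN, noise_sd=1.0, bias=0.0)
large_random = simulate_estimates(TRUE_MEAN, noise_sd=4.0, bias=0.0)
systematic = simulate_estimates(TRUE_MEAN, noise_sd=4.0, bias=-1.5)

for label, est in [("small random error", small_random),
                   ("large random error", large_random),
                   ("systematic error", systematic)]:
    print(label,
          "mean of estimates:", round(statistics.fmean(est), 2),
          "spread:", round(statistics.stdev(est), 3))
```

        With random error alone, the estimates stay centered on the
true value and only their spread grows.  With the constant offset,
the estimates center on the wrong value no matter how many surveys
are averaged.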


        Coverage bias.  Coverage bias occurs when some members of

the  target  population  have  no  chance  of  sample  selection.

Coverage bias  is probably  the most important source of error in

ACSUS.   Because it  is a  sample of  patients,  ACSUS  does  not

include any persons who have not entered the medical care system.

ACSUS excludes  persons who receive care only at settings where a

small number  of persons  with AIDS  are seen.  These persons may

differ systematically  from those surveyed in terms of income and

source of payment for care.

        Nonresponse bias.  Nonresponse bias occurs as a result of

the failure  to collect  data from  all eligible  persons in  the

sample.  This can occur at several points in ACSUS.  Some persons

refused to  complete the  ACSUS eligibility  screener, and others

declined participation at the time of enrollment.  A small number

of respondents  declined to  participate in  later rounds  of the

survey.  Some declined to allow contact with their medical

providers.

        Moreover,  refusals   are  not   the   only   source   of

nonresponse.   Some patients died, and it was not always possible

to locate  an acceptable  proxy respondent  who could provide the

necessary utilization and cost information.  Other patients moved

away and were not locatable.  Field procedures, including efforts

to interview  by telephone  respondents who  had relocated,  were


designed to  minimize  nonresponse  bias,  and  field  experience

suggests it is not an important source of error.

        Measurement bias.   Measurement  bias can  stem  from  at

least four  different sources:   the respondent, the interviewer,

the questionnaire,  and the  mode of  interview.    For  example,

measurement error  can occur  when  respondents  give  inaccurate

answers or  when an  accurate answer  is incorrectly coded.  Some

self-reports of  illness stage,  for example,  may be  erroneous.

Inaccurate responses  may result from a respondent's inability to

correctly  recall   events.     They  also   may  result  from  a

respondent's deliberately  incorrect answer,  as could occur if a

patient does  not wish to reveal to an interviewer that he or she

is receiving psychiatric counseling or drug detoxification.

        Measurement bias  also occurs  as a result of interviewer

effects.   Interviewers may ask questions incorrectly or may fail

to accurately  record the respondent's answer.  Interviewer error

tends to  be variable,  but if  interviewers systematically probe

incorrectly or  if codes  fail to correctly categorize particular

kinds of responses, the error may result in biased estimates.


        Implications for Analysis

        In considering  the analytic  utility of  ACSUS,  several

elements of  study design  need to  be  considered.    These  are

reviewed with  an eye  to specific  policy  and  health  services

research issues.

        Statistical  representation  of  population.    The  most

important limitation  is  the  lack  of  a  national  probability

sample.  Given the sensitive nature of the illness as well as the

low  rate   of  prevalence,  a  national  probability  sample  of

households would have been unlikely to achieve acceptable results

in terms  of participation  or cost.   Thus,  although a national

probability sample  was  justifiably  ruled  out,  the  resulting

sample is not statistically representative of the U.S. population

with HIV-related illness.

        Specifically,   certain    population    subgroups    are

undercounted.  ACSUS is limited to major metropolitan centers and

does not include low-prevalence or rural areas.  Persons who used

health care  services infrequently, and thus did not use services

during the  enrollment period, or who did not use services at all

are undercounted.   It should be noted that the primary objective

of ACSUS  is to  provide data on the use and cost of services for

persons with  HIV-related illness.   ACSUS  was not  intended  to


count the number of HIV-infected persons or to measure the number

of persons unable to obtain services.

        Geographic diversity.   Relative  to other studies, ACSUS

has several strengths critical to the usefulness of the resulting

data.     Although  the   sample  population   is  not   strictly

representative  of   the  national  population  with  HIV-related

illness, ACSUS  is one  of the few surveys of HIV-related illness

that was conducted in multiple geographic and provider sites.  It

is the  first multiple-site  survey in  which site  selection was

driven primarily by analytical needs rather than by the necessity

of evaluating  particular  programs,  as  was  the  case  in  the

evaluation of  the AIDS/HIV  Service Demonstration Grants Program

conducted for  the Health  Resources and  Services Administration

and the  evaluation of  the Robert Wood Johnson Foundation's AIDS

Health Services Program (Mor, Fleishman, Allen and Piette, 1994).

The ACSUS  geographic and  patient care  sites were  selected  in

order to  ensure diversity  with respect  to ethnicity,  exposure

category, and  source of  payment  such  that  there  will  be  a

significant  number   of  persons  in  each  analytical  cell  of


        Length of  reference period.   One  of the most important

design features  of ACSUS  is the  use of  multiple contacts with

respondents over  an 18-month period.  This substantially adds to


the analytic  utility of  the ACSUS data base by allowing for the

examination of  changes  over  time.    As  more  effective  drug

therapies are  found and  prophylactic treatment of HIV infection

becomes more widespread, length of survival is likely to continue

to increase.   A  number of  critical changes take place over the

course of  the disease  that affect the infected person's ability

to function  and use  of services.   For  example, as the illness

progresses, individuals  may become  unable to  work, thus losing

their means  of financial  support as  well as  their  access  to

private health  insurance.   These changes  have implications for

public  financing   programs  such  as  Medicaid,  which  bear  a

disproportionately large  share of  the cost  of care for persons

with HIV.  In addition, the level of services used and the mix of

services are likely to change with the course of illness; in some

cases, caregiving  arrangements may  need to be supplemented with

formal home care as individuals become sicker.



Berk, M.  L., Horgan,  C., and  Meyers, S.:    The  reporting  of
    stigmatizing conditions:   A  comparison of  proxy and  self-
    reporting.    Journal  of  Economic  and  Social  Measurement
    14:197-205, 1986.

Berk, M.  L., Schur,  C. L.,  and Mohr, P.:  Using survey data to
    estimate prescription drug costs.  Health Affairs, Fall 1990.

Cox, B.,  and Cohen,  S.:   Methodological Issues  of Health Care
    Surveys.  New York:  Marcel Dekker, 1985.

Fowler, F.  J., Jr.:   Survey  Research Methods.   Applied Social
    Research Methods  Series, Vol.  1.    Beverly  Hills:    Sage
    Publications, 1988.

Groves, R.  M.:  Survey Errors and Survey Costs.  New York:  John
    Wiley and Sons, 1989.

Mor,  V.,   Fleishman,  J.A.,  Allen,  S.M.,  and  Piette,  J.D.:
    Networking AIDS  Services.  Ann Arbor:  Health Administration
    Press, 1994.

National  Center   for   Health   Statistics:      Reporting   of
    hospitalization in  the Health  Interview Survey.   Series D,
    No. 4. Washington:  U.S. Government Printing Office, 1961.

Prospective Payment  Assessment Commission:  Technical Appendixes
    to the  Report and  Recommendations to  the Secretary  of the
    U.S.  Department   of  Health   and  Human  Services  by  the
    Prospective Payment Assessment Commission.  Washington, D.C.,


                            3.   DATA STRUCTURE OVERVIEW

                  As described  in Chapter  2,  this  study  involved  data

          collection for  two groups  of HIV-infected  persons:  adult  and

          pediatric.   A public  use volume  has been  created for  each of

          these groups -- ACSUS Public Use Tapes #4 and #5.  Data collected

          for each group include up to six patient interviews, conducted

          once every three months over a period of 18 months, two abstracts

          of  medical   records  data  from  the  provider(s)  the  patient

          indicated as  his  "usual  source  of  care,"  and  billing  data

          collected from  medical providers  identified during  the patient

          interviews over the 18-month period.

                  Confidentiality   issues   prohibit   identification   of

          respondents, where  respondents  include  providers  as  well  as

          patients.   Therefore, all information on the geographic location

          of respondents has been deleted from these tapes.

                  This chapter  of the  public use  documentation describes

          the content of the public use volumes, and provides a description

          of the  structure of  the data  files and their correspondence to

          data  collection   instruments  for  each  group  --  adults  and



          3.1     Content of Public Use Volumes

                  Two public  use volumes have been developed for the ACSUS

          study.   The first  volume contains  data collected for the adult

          sample (Tape  #4), and the second contains data collected for the

          pediatric sample  (Tape #5).   Each  volume is  contained on  a 9

          track tape,  recorded at  6250 bytes  per inch  using the  EBCDIC

          character set.   These  volumes  have  been  recorded  using  IBM

          standard labels.   Each  volume contains four types of data sets:

          fixed length documentation files, fixed length EBCDIC data files,

          fixed length  codebook print  files  corresponding  to  the  data

          files, and  fixed length SAS source statement files, one for each

          data file.
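
          The fixed length EBCDIC layout described above can be read
          outside SAS as well.  The following is a minimal Python sketch,
          not part of the delivered tapes.  It assumes the data file has
          been copied to disk as raw fixed-length records, and that the
          IBM code page "cp037" matches the tape's EBCDIC variant (the
          manual states only "EBCDIC").  The field positions in
          EXAMPLE_LAYOUT are hypothetical; actual positions and lengths
          must be taken from the codebook for each data file.

```python
# Hypothetical layout: (name, start, length) with 0-based start offsets.
# Real positions and lengths come from the codebook for each data file.
EXAMPLE_LAYOUT = [
    ("PATID", 0, 8),
    ("SFORM", 8, 1),
]


def parse_record(raw, layout=EXAMPLE_LAYOUT, encoding="cp037"):
    """Decode one fixed-length EBCDIC record and slice it into named fields."""
    text = raw.decode(encoding)
    return {name: text[start:start + length].strip()
            for name, start, length in layout}


def read_records(path, record_length, layout=EXAMPLE_LAYOUT):
    """Yield parsed records from a file of fixed-length records."""
    with open(path, "rb") as handle:
        while True:
            raw = handle.read(record_length)
            if len(raw) < record_length:
                break  # end of file (or a trailing partial record)
            yield parse_record(raw, layout)
```

          The SAS source statement files supplied on each volume remain
          the authoritative description of each record layout.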

                  Specific information regarding the content of each public

          use volume is provided in the following subsections.

          3.1.1   Adult Public Use Volume (Tape #4)

                  The  adult   public  use   volume  contains  data  files,

          documentation files,  printable codebook  files, and  SAS  source

          code statements  required to  read each  of the  data files.  SAS

          source code  included  consists  of  input  statements,  variable

          labels, and  variable value format statements.  Three major types


          of data  were collected  during the  study:   patient  interview,

          medical record  abstract, and  provider billing data.  These data

          were supplemented  with vital  statistics data which are included

          on the patient characteristics file.

                  The content  of the  adult public use volume (Tape #4) is

          provided in Table 3-1.

          Table 3-1.  Contents of Adult Public Use Volume (Tape #4)

                  Further detail  regarding the  content of  the data files

          and specific instructions for using the codebooks can be found in

          Chapter 5 of this document.

          3.1.2   Pediatric Public Use Volume (Tape #5)

                  The pediatric  public use  volume  contains  data  files,

          documentation files,  printable codebook  files, and  SAS  source

          code statements  required to  read each  of the  data files.  SAS

          source code  included  consists  of  input  statements,  variable

          labels, and  variable value format statements.  Three major types

          of data  were collected  during the  study:   patient  interview,

          medical record  abstract, and  provider billing data.  These data

          were supplemented  with vital  statistics data which are included

          on the patient characteristics file.


                  The content  of the pediatric public use volume (Tape #5)

          is provided in Table 3-2.

          Table 3-2.  Contents of Pediatric Public Use Volume (Tape #5)

                  Further detail  regarding the  content of  the data files

          and specific instructions for using the codebooks can be found in

          Chapter 5 of this document.

          3.2     ACSUS Public Use File Data Structure

                  To facilitate  analytic use of the ACSUS data, and enable

          use of  current microprocessor  technology, we have delivered the

          public use  data in  a relational  structure as  a group of fixed

          length normalized  data files.   Figures 3-1 through 3-3 show the

          relationships between these files and their correspondence to the

          data collection  instruments used.   The  figures depict both the

          longitudinal and  the logical  relationships between the entities

          represented  by   the  study   data  (i.e.,  patient,  service(s)

          utilized, provider billing, and medical record(s)).

                  Three  major  types  of  data  are  represented  by  this

          structure:   patient, medical  record, and  provider billing.   A


          brief description of each data type is presented in the following

          subsections.   Detailed guidance for using these data is provided

          in Chapter 5 of this document.

                  Figure 3-1.  Patient Files

                  Figure 3-2.  Medical Record Abstract Files

                  Figure 3-3.  Provider Billing Files

          3.2.1   Patient Reported Data

                  As shown  on Figure 3-1, the patient data are composed of

          a  patient   level  file,   interview   time   specific   patient

          questionnaire data for each interview period (Time 1 through Time

          6), and  service utilization  files which  contain data collected

          during all  six rounds  of data  collection.  Service utilization

          files correspond  to distinct service utilization sections in the

          patient questionnaire  and are  not included in the time specific


                  A  patient   will  have   one  record   in  the   patient

          characteristics  file,   uniquely  identified   by  the   patient

          identifier (PATID).   In addition, a patient will have one record

          in each of the Time 1 through Time 6 specific data files (if they


          completed  an  interview  in  each  of  these  periods)  uniquely

          identified by  patient identifier  (PATID),  and  any  number  of

          records in  each of  the service  utilization files,  each record

          representing use  of specific  types of  services over the entire

          18-month study  period.   Patients will  have zero  records in  a

          specific  service   utilization  file  if  they  did  not  report

          receiving that type of care.

                  These service  utilization  records  represent  either  a

          visit or  use of  a specific provider by the patient depending on

          the  information  required  by  the  particular  section  of  the

          questionnaire.     Each  record  is  identified  by  the  patient

          identifier (PATID),  the interview  time  period  (1  through  6)

          (SFORM), a  questionnaire  section  identifier  (SFPART),  and  a

          service utilization sequence number (SSUBREC).  For example, data

          regarding inpatient  hospital stays were collected for each stay,

          and data  regarding use  of a  particular  physician/doctor  were

          collected by provider.
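
          The record identification rules above can be checked once a
          service utilization file has been read into memory.  The
          sketch below assumes each record is a Python dictionary keyed
          by variable name (an assumption about how the user has loaded
          the file, not a feature of the tapes).  It reports composite
          keys that appear more than once and counts records per
          patient, including patients with zero records.

```python
from collections import Counter

# Identifying variables for service utilization records, as named in the text.
KEY_FIELDS = ("PATID", "SFORM", "SFPART", "SSUBREC")


def duplicate_keys(records, key_fields=KEY_FIELDS):
    """Return the composite keys that appear on more than one record."""
    counts = Counter(tuple(rec[f] for f in key_fields) for rec in records)
    return [key for key, n in counts.items() if n > 1]


def records_per_patient(records, patient_ids):
    """Count utilization records per patient, keeping patients with none."""
    counts = Counter(rec["PATID"] for rec in records)
    return {patid: counts.get(patid, 0) for patid in patient_ids}
```

          Passing the full list of PATID values from the patient
          characteristics file as patient_ids keeps patients who
          reported no care of a given type in the result with a count
          of zero.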

                  Further detail  regarding the  usage of  these data files

          can be found in Chapter 5 of this document.


          3.2.2   Medical Records Abstract Data

                  As shown  on Figure 3-2, the Medical Record Abstract Data

          comprise 4  separate files:   patient, inpatient stay, check-list

          conditions, and  T-Cell reports.   Twice during the study period,

          medical records  data were collected from providers identified by

          patients as  their usual  source of care.  More than one provider

          may have  been identified  as the  usual source  of  care  for  a

          particular patient, and thus data may have been collected from more

          than one provider for one patient.

                  The Patient  Level File  contains  one  record  for  each

          patient containing data derived from all abstracts received for a

          patient.   Each record  on this  file is uniquely identified by a

          patient identifier (PATID).

                  The Inpatient  Stays File  contains information regarding

          inpatient stays  reported by  the usual  source of care providers

          identified by  the patient.   A  record is uniquely identified by

          patient identifier  (PATID) plus provider identifier (USCID) plus

          a record sequence number (USREC02).

                  The  Checklist   Conditions  File   contains  information

          regarding  medical   conditions  commonly  associated  with  HIV-

          infected persons.   A  record on this file is uniquely identified


          by patient identifier (PATID) plus provider identifier (USCID)

          plus a record sequence number (USREC03).

                  The T-Cell  Reports File  contains information abstracted

          from medical  records regarding laboratory tests reporting T-Cell

          counts.   Depending upon  available information,  absolute counts

          and percentages  were recorded  for both  CD4 and  CD8 cells.   A

          record in  this file is uniquely identified by patient identifier

          (PATID) plus  provider identifier  (USCID) plus a record sequence

          number (USREC04).

                  Further  detail  regarding  the  use  of  Medical  Record

          Abstract Data is provided in Chapter 5 of this document.

          3.2.3   Provider Billing Data

                  Provider billing  data were  collected twice  during  the

          conduct of  the study  for  medical  providers  reported  by  the

          patient during  the patient  interviews.  As shown on Figure 3-3,

          four provider  billing data  files  are  provided  which  contain

          billing data  covering the  18-month period  represented  by  the

          patient interviews:   Ambulatory,  Inpatient,  Home  Health,  and

          Pharmacy.  Each of the Ambulatory, Inpatient, and Pharmacy

          billing data files contains one record per event:  an ambulatory


          visit, an  inpatient stay, or a prescription medication obtained.

          A record  in these  files can  be uniquely identified by Provider

          Identifier (PROVID) plus Patient Identifier (PATID) plus the Form

          Identifier (PFORM)  plus the  Event Sequence Number (PSUBREC).  A

          record  in   the  Home  Health  billing  data  file  may  contain

          information regarding  more than  one event,  or visit,  during a

          period of time.

                  Further  information   regarding  substantive   usage  is

          presented in Chapter 5 of this document.

          3.2.4   General Analytic Usage Guidance

                  Effective use of these data requires familiarity with the

          ACSUS study,  data collection  instruments and  the structure and

          content of  the data  files.   In conducting  analyses with these

          data we  have found  it useful  to begin with the data collection

          instruments to  identify the questions of interest and associated

          variable names on the data file required to support the analysis.

          Users should  reference the  annotated questionnaires to identify

          the questions  and variable names appropriate for their analyses.

          Once variables  and files  have been  identified, the codebook(s)

          are used  to identify variable values, variable position, length,

          type, and  applicable formats.  Chapter 5 includes an explanation


          of  how  to  use  the  codebooks  and  annotated  questionnaires.

          Appendices A-C contain copies of the annotated questionnaires.

                  Patient and  provider  identifiers  included  on  patient

          reported service utilization files are consistent with those used

          in the  provider billing  data files.   Patient  reported service

          utilization can  be associated with provider reported service and

          associated  billing   data  through   the  patient  and  provider

          identifiers.   In many  cases, the  level  and  type  of  service

          provided are  inconsistent between  the patient-reported data and

          the provider  billing data.   These  inconsistencies are  due  to

          patient  recall   issues,  provider  misidentification,  and  the

          availability of  billing records at the provider for the patient.

          Imputation of services and charges was performed in some cases.

          Details regarding situations for which imputations were performed

          are provided in Chapter 6 of this document.
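
          The linkage described above can be sketched as a join on the
          patient and provider identifiers.  In this illustration,
          records are Python dictionaries.  PATID and PROVID are named
          in this manual for the provider billing files; using PROVID
          as the provider identifier on the patient reported files is
          an assumption made here for illustration, and the codebook
          should be consulted for the actual variable name.

```python
from collections import defaultdict


def link_by_patient_and_provider(reported, billed):
    """Pair patient-reported records with billing records on (PATID, PROVID)."""
    billing_index = defaultdict(list)
    for rec in billed:
        billing_index[(rec["PATID"], rec["PROVID"])].append(rec)

    linked = []
    for rec in reported:
        key = (rec["PATID"], rec["PROVID"])
        linked.append({
            "key": key,
            "reported": rec,
            "billed": billing_index.get(key, []),  # empty when no bills match
        })
    return linked
```

          Reported records with an empty "billed" list are to be
          expected, given the recall, misidentification, and record
          availability issues noted above.  They do not necessarily
          indicate a processing error.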



          4.1     Data Preparation

                  Various data  preparation  and  editing  techniques  were

          employed to  ensure accuracy  and consistency  in the ACSUS data.

          These techniques are described in the following sections.

          4.1.1   Range Specifications

                  Acceptable ranges  for all  data items  were defined  and

          computerized  edits  conducted  to  identify  all  items  falling

          outside the  predetermined parameters.  For close-ended items the

          ranges were  determined by the codes available for the responses.

          For open-ended  items,  for  example,  the  out-of-pocket  dollar

          amount paid for a doctor's visit, reasonable ranges were defined.

                  Data items  that failed the range edits were reviewed for

          coding  and   data  entry  errors  and  corrected  as  necessary.

          Following review  by data  preparation and project staff, out-of-

          range values  were retained  if no  coding and  data entry errors

          were found.
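
          A range edit of the kind described above can be sketched as
          follows.  This is an illustration only, not the edit program
          used in preparing the tapes.  The variable name OOPAMT and
          its range of 0 to 500 dollars are hypothetical.

```python
def range_check(records, ranges):
    """Return (record index, variable, value) for values outside their range."""
    failures = []
    for index, rec in enumerate(records):
        for variable, (low, high) in ranges.items():
            value = rec.get(variable)
            if value is None:
                continue  # missing values are not range failures
            if not low <= value <= high:
                failures.append((index, variable, value))
    return failures


# Hypothetical open-ended item with an assumed reasonable range.
example_ranges = {"OOPAMT": (0, 500)}
example_records = [{"OOPAMT": 25}, {"OOPAMT": 9000}, {"OOPAMT": None}]
flagged = range_check(example_records, example_ranges)
```

          As in the procedure described above, a flagged value is a
          candidate for review rather than an automatic deletion.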


          4.1.2   Consistency Checks

                  Consistency or logic checks were conducted to examine the

          relationships between  responses to  ensure  that  they  did  not

          conflict with  one another  or that  the response to one item did

          not make  the response  to another  unlikely.   Logic checks were

          conducted both  within and across the various data files.  Checks

          within files  examined the data for skip patterns and other types

          of logical inconsistencies.

                  Logic checks  across the  ACSUS data  files examined  the

          relationship between  information  reported  during  one  patient

          interview  and  similar  information  gathered  at  a  subsequent

          interview.     Logic  checks   also  inspected   the   data   for

          discrepancies in information across the different data collection

          instruments, for  example, the  medical record  abstract and  the

          patient interview.

                  To the  extent that  either type of logic check uncovered

          coding or data entry errors, appropriate corrections were made to

          the  files.     However,  edit  checks  uncovered  a  variety  of

          inconsistent items  which, following  review  by  project  staff,

          could not  be resolved.   Such inconsistencies remain in the data

          files and are discussed in Section 4.3.


          4.1.3   Frequency Review

                  The frequencies  of responses  to  all  data  items  were

          reviewed to  ensure that  appropriate skip patterns were followed

          and that  the correct number of responses was represented for all

          items.   If a  discrepancy was  discovered, the  problem case was

          identified and  hard copy of the record was reviewed to determine

          the  appropriate   response.    If  the  hard  copy  revealed  no

          additional information, the item was coded as "not ascertained".

          4.2     Types of Variables

                  Data  items   in  the  ACSUS  public  use  tapes  can  be

          classified into  four major categories: edited, derived, imputed,

          and flag  variables.   The sections  that  follow  describe  each

          variable type.

          4.2.1   Edited Variables

                  The majority  of data  items on  the ACSUS data files are

          original variables  which contain  information  corresponding  to

          individual items or questions on the data collection instruments.


          All original  variables have been edited for skip pattern errors,

          range outliers, and logic checks, as described in Section 4.1 and

          have therefore been categorized as edited variables.

          4.2.2   Derived Variables

                  A limited number of derived variables are included on the

          files to  assist the  user in  analyzing the data.  A variable is

          defined as derived if it is constructed from one or more original

          data items.   Because the analytic needs of each user are unique,

          most analysts  will choose  to derive  their own  set of analysis

          variables.   Accordingly, derived  data items  on these tapes are

          restricted to either items the user cannot easily construct

          without extensive  knowledge of  the  data  files,  or  to  items

          thought to be of interest to a majority of users.

          4.2.3   Imputed Variables

                  In ACSUS,  as in most surveys, the responses to some data

          items were not obtained.  For this study, missing item imputation

          was conducted  for a  limited set  of variables  on the  provider

          billing survey  files.  In addition, entire service event records

          were imputed  on the  provider billing survey files.  As a result


          of these imputation processes, several new variables were created

          and reside  on the  provider billing survey files.  A description

          of ACSUS imputation procedures is provided in Chapter 6.

          4.2.4   Flag Variables

                  Three types  of flag  variables are  present in the ACSUS

          public use data files.  First, a series of imputation flags was

          created to  enable users  to identify  imputed values and events.

          Imputation flags  are discussed more fully in Chapter 6.  Second,

          flag variables  were created to identify and to alert the user to

          analytic issues  that might otherwise be overlooked.  Third, flag

          variables are  used to  highlight discrepant  data elements.  The

          second and third types of flag variables are discussed in Chapter

          5 in  the sections  corresponding to  the specific  data files in

          which they reside.

          4.3     Data Anomalies

                  The purpose  of this  section is  to assist  the user  in

          making informed  decisions about  conducting  analyses  with  the

          ACSUS public  use data  tapes.   It is  intended to  bring to the

          user's attention  certain data  considerations and  anomalies  so


          that the user may take them into account when analyzing the ACSUS


                  Although the data have been subjected to rigorous editing

          processes, not  all data  inconsistencies or  problems  could  be

          resolved and  therefore some inconsistencies remain in the files.

          Remaining  inconsistencies   occur  primarily  for  two  reasons.

          First, much of the data from the ACSUS study is self-reported and

          therefore subject to problems of recall error and so on.  Second,

          the ACSUS study data draw upon multiple sources to obtain similar

          kinds  of  data  and  the  sources  sometimes  report  discrepant

          information.     Accordingly,  to  conduct  the  most  meaningful

          analysis for  their purpose,  users should  read this section and

          thoroughly  familiarize   themselves  with   the   various   data

          instruments prior to undertaking any analysis.

          4.3.1   Self-Reported Marital Status

                  In each  of the  six adult  patient interviews,  subjects

          were asked  to indicate  their marital  status.   Review of self-

          reported  marital  status  across  time  uncovered  subjects  who

          indicated during  the current  interview  that  they  were  never

          married but  in a  previous  questionnaire  they  reported  their

          marital status as divorced, separated, or married.


                  Investigation of  these cases found no recurring patterns

          --  inconsistent   cases  included   male  and  female  subjects,

          respondents with  and without  children, and  subject  and  proxy

          interviews.   Inconsistent marital  status data  were not removed

          from  the  tapes.    In  view  of  these  inconsistencies,  users

          interested in  incorporating marital status in their analyses may

          want to  supplement or  cross-reference marital  status with  the

          series of questions on household composition in the time specific

          patient characteristics files.
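
          Users who wish to locate the inconsistent cases themselves
          might proceed as in the sketch below.  It assumes the marital
          status responses for one patient have been collected into a
          dictionary keyed by interview period (1 through 6).  The
          text labels are hypothetical stand-ins for the coded values
          listed in the codebook.

```python
# Hypothetical labels; the data files carry coded values (see codebook).
NEVER_MARRIED = "never married"
EVER_MARRIED = {"married", "divorced", "separated"}


def marital_reversals(status_by_period):
    """Return periods where 'never married' follows an ever-married report."""
    reversals = []
    seen_ever_married = False
    for period in sorted(status_by_period):
        status = status_by_period[period]
        if status in EVER_MARRIED:
            seen_ever_married = True
        elif status == NEVER_MARRIED and seen_ever_married:
            reversals.append(period)
    return reversals
```

          Periods with no completed interview can simply be left out
          of the dictionary.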

          4.3.2   Utilization Information Reported During the Wrong

                  Reference Period

                  The patient  files contain  self-reported information  on

          inpatient stays,  nursing home stays, and dental visits.  In some

          instances, respondents  reported utilization  for these  services

          that occurred  in the  period  of  time  covered  by  a  previous

          reference period.

                  Events  were   examined  and   deleted  if  found  to  be

          duplicates of  previously reported events.  If they were found to

          be new  events  reported  for  the  first  time,  the  event  was

          retained.   The  variable  which  indicates  the  source  of  the


          information, SFORM,  was coded to reflect the interview period in

          which the  information was  collected; SFORM  was not  changed to

          represent the period in which the utilization actually occurred.

                  This data  anomaly may  have a  bearing on  analyses that

          link patient-reported  utilization to the specific time period in

          which they  occur.  Users interested in such analyses should link

          utilization by  comparing the  dates of  service to the reference

          dates for each interview period rather than selecting events based

          on SFORM.
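
          The date comparison recommended above can be sketched as
          follows.  The sketch assumes the user has assembled, for one
          patient, the start and end reference dates of each interview
          period; the dates shown are invented for illustration and do
          not come from the data files.

```python
from datetime import date


def period_for_date(service_date, reference_periods):
    """Return the interview period whose reference dates contain service_date."""
    for period, (start, end) in sorted(reference_periods.items()):
        if start <= service_date <= end:
            return period
    return None  # the date falls outside every reference period supplied


# Invented reference dates for one hypothetical patient.
example_periods = {
    1: (date(1991, 1, 1), date(1991, 3, 31)),
    2: (date(1991, 4, 1), date(1991, 6, 30)),
}
# An event collected at the Time 2 interview (SFORM = 2) that actually
# occurred during the Time 1 reference period.
assigned = period_for_date(date(1991, 2, 15), example_periods)
```

          Comparing the assigned period with SFORM identifies events
          that were reported in a later interview than the period in
          which they occurred.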

          4.3.3   Dates of Service Utilization After the Subject's Date of

                  Death
                  Editing procedures  included a  series of  logical  edits

          which compared  all dates  of service utilization in the provider

          billing data  files for  deceased individuals  to  the  subject's

          reported date  of death.   These  edits identified  all unimputed

          records where  the date  of service  was after the date of death.

          Review of  these cases  found some  coding or  data entry  errors

          which were  corrected.   Investigation by  project staff seems to

          indicate that  the remaining  inconsistencies are  a function  of

          provider billing  patterns.   It appears  that some  health  care

          providers generated  bills following  the death  of a subject for


          services rendered  prior to  death.   However, the  bills contain

          only the  billing date  and do  not include  the actual  dates of

          service.   Analysts may  choose to  remove all  charge  data  for

          events with associated dates occurring after a subject's date of

          death.
                  Due to  the imputation  methods which were used to impute

          events on  the provider  billing files, there are also situations

          where an  imputed event has an associated date of service that is

          after a  subject's date of death.  This occurs because the entire

          record from  the donor  event, including the date of service, was

          imputed to the recipient event.
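
          Analysts who want to identify such events before deciding how
          to treat them might use a check like the one sketched below.
          The field names "service_date" and "imputed" are hypothetical;
          the actual date variables and imputation flags are documented
          in the codebooks and in Chapter 6.

```python
from datetime import date


def events_after_death(events, death_dates):
    """Split events dated after the patient's death into (unimputed, imputed)."""
    unimputed, imputed = [], []
    for event in events:
        died = death_dates.get(event["PATID"])
        if died is None or event["service_date"] <= died:
            continue  # patient not known to be deceased, or event not after death
        (imputed if event["imputed"] else unimputed).append(event)
    return unimputed, imputed


# Invented example data for one deceased patient.
example_deaths = {"A": date(1992, 5, 1)}
example_events = [
    {"PATID": "A", "service_date": date(1992, 4, 1), "imputed": False},
    {"PATID": "A", "service_date": date(1992, 6, 1), "imputed": False},
    {"PATID": "A", "service_date": date(1992, 7, 1), "imputed": True},
]
late_unimputed, late_imputed = events_after_death(example_events, example_deaths)
```

          Keeping imputed and unimputed events separate reflects the
          two different causes described in this section.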

          4.3.4   Loss of Medicare Coverage

                  Editing  procedures   uncovered  cases  where  a  subject

          reported Medicare  in the  current questionnaire  but  failed  to

          report the  coverage in  a subsequent  interview.  Unlike private

          insurance or  some forms  of public  insurance where it is likely

          that coverage may fluctuate over time, once an individual becomes

          eligible for  Medicare benefits  they should  continue to receive

          this coverage.   Review  of these  cases found  several instances

          where respondents  confused Medicare  and Medicaid  coverage.  In

          this  situation,   the  data   were  corrected   to  reflect  the


          appropriate insurance  coverage.   However, the  user  should  be

          aware that  a small  number of these inconsistencies could not be

          resolved and remain in the data files.
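
          The remaining cases can be located with a check such as the
          one sketched below.  It assumes the user has built, for one
          patient, a dictionary keyed by interview period whose value
          is True when Medicare coverage was reported in that period.
          How that indicator is derived from the coded insurance items
          is left to the user and the codebook.

```python
def medicare_dropouts(coverage_by_period):
    """Return periods lacking Medicare after an earlier period reported it."""
    dropouts = []
    reported_before = False
    for period in sorted(coverage_by_period):
        if coverage_by_period[period]:
            reported_before = True
        elif reported_before:
            dropouts.append(period)
    return dropouts
```

          Periods in which no interview was completed should be omitted
          from the dictionary so that they are not mistaken for a loss
          of coverage.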

          4.3.5   Self-Reported Date of Positive HIV Test

                  In the  screener questionnaire,  subjects were  asked  to

          report whether they had tested positive for the HIV virus and the

          date of the test.  Several subjects reported HIV test dates which

          are clearly  incorrect  (i.e.,  prior  to  1986).    Because  the

          screener data reflect self-reported information, these outliers

          have not  been removed.   However,  the user should be aware that

          these anomalies  remain in  the data  files and  that the medical

          record abstract  may be  a better  source for  determining an HIV

          positive diagnosis.

          4.3.6   Self-Reported Conditions of HIV Illness

                  Subjects were  asked in  the Time  1, Time  5, and Time 6

          questionnaires to  indicate whether  they had  been told they had

          any of a list of conditions commonly associated with HIV illness.

          Review of  these data  items found  numerous instances  where the

          subject's response  in Time  5 or  Time 6  contradicted what  was


          reported in  an earlier  questionnaire.    These  inconsistencies

          remain in  the files  and users interested in conducting analysis

          with those  data are  encouraged  to  cross-reference  the  self-

          reported conditions  with the clinical data in the medical record

          abstract files.


                          AND ANNOTATED QUESTIONNAIRES

                  As discussed in Chapter 3, Data Structure Overview, ACSUS

          Public Use  Tape #4  contains complete study data for adult ACSUS

          subjects and  Tape #5  includes similar information for pediatric

          study subjects.   The  study data  have been organized into three

          major components:  patient data  files, medical  record  abstract

          files, and  provider billing  survey data  files.  Each component

          contains multiple data files.

                  The following  sections provide  a general  discussion of

          the type  of information  contained on each set of files followed

          by detailed descriptions of  derived, imputed, and flag variables

          specific to  each data  set.   Although the  adult and  pediatric

          study data  are on  separate tapes,  the two tapes are structured

          similarly with  only slight  variations  between  the  adult  and

          pediatric versions.   Unless  otherwise noted, the information in

          this chapter is applicable to both adult and pediatric files.

          5.1     Patient Data Files

                  The patient  data files contain self-reported information

          collected for  each of  the six  interview periods  for which  an


          interview was  completed  and  several  items  collected  from  a

          screener questionnaire completed at the time of study enrollment.

          They also  contain selected  information from other sources, such

          as death certificates.

                  The patient  files are classified into two types: patient

          level and  service level.   A  series of  14 files (7 adult and 7

          pediatric) constitute  the complete  set of  patient level files.

          There is  an overall  patient characteristics  file and a patient

          level file  corresponding to  each of  the six interview periods.

          The service level files are a series of 29 files (15 adult and 14

          pediatric).  Each service level file contains complete data for a

          particular  type   of  health  care  service,  such  as  hospital

          inpatient stays, collected across the six patient interviews.

          5.1.1   Patient Level Files

 Overall Patient Characteristics File

                  The overall  patient level  adult file  contains data for

          the 5,898 adult subjects who completed the screener questionnaire

          and were  eligible to  participate in  the study.   This includes

          2,327 subjects  who  were  sampled,  of  which  2,050  agreed  to

          participate.  The pediatric version of this file contains similar


          information for  the  224  eligible  pediatric  subjects.    This

          includes 160  subjects who  were sampled,  of which 146 agreed to

          participate.

                  The overall  patient characteristics  files include  data

          items  in   four  major   categories:  sociodemographics,  survey

          administration,  clinical   and   vital   status,   and   service

          utilization.  These four categories are described below.

                  Sociodemographic  Variables:      The   overall   patient
                  characteristics  file   contains  four   sociodemographic
                  variables:  gender,  race/ethnicity,  age,  and  mode  of
                  exposure.

                      Gender (SEX):   The subject's gender was collected in
                      the screener  and medical  abstract instruments.  The
                      variable SEX is a composite of two original variables
                      edited for inconsistencies across these two sources.

                      Race/Ethnicity (RACE):  Information on race/ethnicity
                      was collected  in the  screener.   RACE is  a recoded
                      version of  the  original  variable.    It  has  been
                      collapsed  into  fewer  categories  for  purposes  of

                      Age at  Start of  Study (AGE):   This  is  a  derived
                      variable which  reflects the  age of  the subject  in
                      years  as   of  March  1,  1991.    For  purposes  of
                      confidentiality, the  small number  of adult subjects
                      who were 60 years of age or older at the start of the
                      study  have  been  grouped  into  one  age  category.
                      Similarly, adult  subjects 15 through 19 years of age
                      have been  grouped into  one age category.  Pediatric
                      subjects who  were 10  through 12  years of  age have
                      been grouped together.

                      Mode  of   Exposure  (EXPROUTE):    EXPROUTE  is  the
                      suspected mode  of  exposure  to  the  HIV  virus  as
                      reported in  the subject's  medical record.   Because
                      medical  records   were   collected   from   multiple
                      providers, more  than one  mode of  exposure may have
                      been reported  for an  individual.  If multiple modes
                      were reported,  EXPROUTE  is  coded  to  reflect  all
                      modes reported.

                      EXPROUTE  for  the  3,848  adult  subjects  who  were
                      screened for  participation  in  the  study  but  not
                      enrolled is  defined using  self-reported information
                      from the  screener instrument.  Screener data are also
                      used to  define EXPROUTE for the small group of adult
                      subjects who  were enrolled in the study but for whom
                      no medical abstract data were collected.

                      The pediatric  screener instrument  did  not  collect
                      information   on    suspected   mode   of   exposure.
                      Therefore,  EXPROUTE   is  missing  for  78  screened
                      pediatric subjects who were not enrolled.

                  Survey   Administration    Variables   (T1_STAT   through
                  T6_STAT):   There are  a series  of six  interview status
                  variables, one corresponding to each interview period, on
                  the  overall   patient  characteristics   file.     These
                  variables  indicate   whether  a  patient  interview  was
                  completed for  that time period and whether the completed
                  interview was  conducted with  a proxy respondent.  If an
                  interview  was   not  completed,   the  status  variables
                  indicate  the   reason  for  nonresponse  (e.g.,  subject
                  deceased  with   no  proxy  available,  subject  refused,
                  subject could  not be  located, etc.).   Status codes are
                  not assigned for interview periods following the death of
                  a subject.   Status  codes are  blank also  when a  final
                  nonresponse  code   (e.g.,  subject   unlocatable)   was
                  assigned at the previous interview.

                      The interview  status variables  were  generated  for
                      survey administrative  purposes and  are intended  to
                      reflect a  subject's status at the time of contact by
                      the interviewer.   As  a result,  there are  a  small
                      number of  subjects in the adult files with a Time 6
                      interview status  that indicates  the  interview  was
                      completed by  proxy  and  the  subject  was  deceased
                      (T6_STAT is  coded as  DD), but  for whom  the  vital
                      status (VITSTAT)  indicates  the  subject  is  alive.
                      This  situation   occurs  because   vital  status  is
                      reported as  of the end of the study period (8/31/92)
                      but efforts  to conduct  Time  6  patient  interviews
                      continued through November 1992.

                  Clinical/Vital  Status   Variables:     Several   derived
                  variables related  to the  subject's clinical  and  vital
                  status are  present on  the overall  patient level  file.
                  These are described below.


                      Illness Stage  (ILLSTAGE):  This variable was derived
                      to  indicate   the  stage   of  an  individual's  HIV
                      infection at  the time the screener was administered.
                      It is based on self-reported information; it does not
                      reflect information  from subjects'  medical records.
                      This variable  was intended to stratify subjects into
                      approximate illness categories for sampling purposes.
                      It was  not an effort to apply the full CDC AIDS case
                      definition.   Analysts  interested  in  applying  the
                      complete CDC  classification scheme  should construct
                      their own  definitions using  data from  the  medical
                      record abstract.

                      Subjects were grouped into three disease stages based
                      upon their responses regarding the type of conditions
                      or symptoms  they  had  experienced.    Persons  were
                      classified as  having AIDS if they had been diagnosed
                      with any  of the  following: PCP,  Kaposi's  sarcoma,
                      lymphoma,     wasting     syndrome,     tuberculosis,
                      cryptococcosis,         cytomegalovirus,         MAI,
                      cryptosporidiosis,     dementia,      histoplasmosis,
                      toxoplasmosis, isosporiasis,  leukoencephalopathy, or
                      salmonellosis.     In  addition,   some   individuals
                      voluntarily reported  that they  had  been  diagnosed
                      with AIDS  but had not been diagnosed with any of the
                      listed  conditions.     These   persons   were   also
                      classified as having AIDS.

                      Subjects were  classified as  being HIV-ill  if  they
                      reported no AIDS qualifying conditions but did report
                      one of  the  following:  swollen  glands,  persistent
                      fever, diarrhea,  weight loss, candidiasis, or herpes

                      Subjects who  reported no  AIDS or HIV-ill qualifying
                      conditions were considered asymptomatic and those who
                      did not  know whether  they had  any  of  the  listed
                      conditions were classified as unknown.
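
                      The grouping rules above can be sketched in code.
                      This is only an illustration: the condition labels,
                      the function name, and its arguments are assumptions
                      and are not codes from the screener.  The HIV-ill
                      list is reproduced as printed above and may be
                      incomplete, and the order in which "unknown" and
                      "asymptomatic" are tested is also an assumption.

```python
# Illustrative sketch of the ILLSTAGE grouping rules described in the text.
# Condition labels and argument names are hypothetical, not ACSUS codes.
AIDS_CONDITIONS = {
    "PCP", "Kaposi's sarcoma", "lymphoma", "wasting syndrome",
    "tuberculosis", "cryptococcosis", "cytomegalovirus", "MAI",
    "cryptosporidiosis", "dementia", "histoplasmosis", "toxoplasmosis",
    "isosporiasis", "leukoencephalopathy", "salmonellosis",
}

# Reproduced as listed in the manual; the printed list may be incomplete.
HIV_ILL_CONDITIONS = {
    "swollen glands", "persistent fever", "diarrhea", "weight loss",
    "candidiasis", "herpes",
}


def illness_stage(reported, volunteered_aids=False, did_not_know=False):
    """Return an approximate illness stage from self-reported conditions."""
    reported = set(reported)
    # Any AIDS-qualifying condition, or a volunteered AIDS diagnosis.
    if volunteered_aids or reported & AIDS_CONDITIONS:
        return "AIDS"
    # No AIDS-qualifying condition, but at least one HIV-ill condition.
    if reported & HIV_ILL_CONDITIONS:
        return "HIV-ill"
    # Assumed precedence: "did not know" is checked before asymptomatic.
    if did_not_know:
        return "unknown"
    return "asymptomatic"
```

                      For instance, a subject reporting both PCP and
                      diarrhea would fall in the AIDS group under this
                      sketch, because AIDS-qualifying conditions are
                      tested first.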

                      Vital Status  (VITSTAT):   This variable  indicates a
                      subject's vital status (alive, dead) as of August 31,
                      1992.

                      Date Last Known Alive (VSLIVEMO, VSLIVEDY, VSLIVEYR):
                      These variables  are coded  with the most recent date
                      (on or  before 8/31/92) that the study data confirmed
                      that a  subject was alive.  These variables are blank
                      if a subject died during the study period.

                      Date of Death (DODMO, DODDY, DODYR):  These variables
                      are coded  with the  date the  subject  was  reported
                      deceased.   They are  blank if  the subject  was  not
                      reported deceased.

                      Source of  Death Date  (DODSOURC):   Three sources of
                      information were  used to  determine whether or not a
                      subject was  deceased.   A death certificate was used
                      as the  primary source  if one  was obtained  for the
                      subject.   The secondary  source of  this information
                      was the  medical record  abstract.    Finally,  dates
                      reported  by   a  proxy   respondent  were   used  if
                      information was  not  received  from  the  other  two
                      sources.

                  Service Utilization  Variables:  Aggregated patient level
                  self-reported counts  of the  major types  of health care
                  services utilized  were calculated for the 18-month study
                  period.  Specific variables are described below.

                      Total Inpatient Admissions (ADMTOT):  This is a count
                      of the  total  number  of  hospital  inpatient  stays
                      reported by  the patient  across  the  six  interview
                      periods.   In situations where a hospital stay begins
                      in one  interview period  and continues into the next
                      period, it is counted as one stay.

                      Total Inpatient  Nights (IPNGTTOT):    This  variable
                      sums the  total number of nights a subject spent in a
                      hospital during  the 18-month  study  period.    Some
                      subjects may  have reported the number of nights they
                      were in the hospital for some, but not all, inpatient
                      stays.   IPNGTTOT sums  the number of nights for only
                      those stays with complete information.

                      Total Emergency  Room Visits  (ERVSTOT):   ERVSTOT is
                      the total  number  of  visits  the  subject  reported
                      making to  hospital emergency  rooms during the study
                      period.

                      Total Hospital  Clinic  Visits  (HCVSTOT):    HCVSTOT
                      represents the  total number  of self-reported visits
                      by a  subject to  hospital clinics  during the  study
                      period.

                      Total Other  Clinic Visits (OCVSTOT):  OCVSTOT is the
                      number of visits the subject reported making to other
                      clinics during the study period.

                      Total Private  Physician Visits  (MDVSTOT):   MDVSTOT
                      represents the  total number  of self-reported visits
                      by a  subject to  private physicians during the study
                      period.

                      Total  Ambulatory   Visits  (AMBVSTOT):      AMBVSTOT
                      represents the  sum of all visits to hospital clinics
                      (HCVSTOT),  other   clinics  (OCVSTOT),  and  private
                      physicians (MDVSTOT) reported by a subject during the
                      study  period.     It  excludes  visits  to  hospital
                      emergency rooms.

                      Total Observation  Days (TOBSDAYS):    This  variable
                      reflects the total number of days during the 18-month
                      study period  that the  subject was  observed and for
                      which patient  interview information  was  collected.
                      It does  not include  any  gaps  in  coverage  during
                      periods of  ineligibility  (i.e.,  periods  when  the
                      respondent was out of the country or in jail).  Total
                      observation days  can be used to standardize 18-month
                      utilization counts  to account  for variation  in the
                      length of observation periods across individuals.


                          Case 1:   Time  1  and  Time  2  interviews  were
                          completed for  this individual.   The subject was
                          in jail for the entire Time 3 interview period so
                          no interview  was completed.  The subject died in
                          Time 4  and an  interview was  completed by proxy.

                              Time 1:  covers 3/1/91-5/7/91; 67 observation
                              days
                              Time   2:    covers    5/7/91-8/21/91;    106
                              observation days
                              Time 3:  not completed, subject ineligible; 0
                              observation days
                              Time 4:  covers 1/27/92-3/4/92  (date subject
                              died); 36 observation days

                              Total observation days for this individual is
                              equal to  the sum of observation days for the
                              individual interview periods (209).

                          Case 2:  Interviews were completed in each of the
                          six periods.   The  person was out of the country
                          for 5 days during the second interview period and
                          for 9 days during the third interview period.

                              Time   1:    covers    3/1/91-6/30/91;    121
                              observation days
                              Time 2: covers 6/30/91-10/6/91 with 5 day gap
                              from 8/14/91-8/18/91; 93 observation days
                              Time 3: covers 10/6/91-1/12/92 with 9 day gap
                              from 12/22/91-12/30/91; 89 observation days
                              Time   4:      covers   1/12/92-4/15/92;   93
                              observation days
                              Time   5:       covers   4/15/92-7/4/92;   79
                              observation days
                              Time   6:       covers   7/4/92-8/31/92;   58
                              observation days

                              Total observation  days for  this  individual
                              equals 533.
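
                          The arithmetic in the two cases above, and the
                          use of total observation days to standardize a
                          utilization count, can be sketched as follows.
                          The function names are hypothetical, and scaling
                          to a 365-day year is an assumption made for
                          illustration; the manual does not prescribe a
                          particular standardization.

```python
# Sketch: total observation days and a standardized utilization rate.
# Function names and the 365-day scaling are illustrative assumptions.

def total_observation_days(period_days):
    """Sum the observation days of the individual interview periods."""
    return sum(period_days)


def visits_per_year(visit_count, observation_days, days_per_year=365.0):
    """Scale a utilization count to a common length of observation."""
    if observation_days <= 0:
        raise ValueError("observation_days must be positive")
    return visit_count * days_per_year / observation_days


# Per-period observation days taken from Case 1 and Case 2 above.
case1_periods = [67, 106, 0, 36]
case2_periods = [121, 93, 89, 93, 79, 58]

case1_total = total_observation_days(case1_periods)  # 209
case2_total = total_observation_days(case2_periods)  # 533
```

                          Dividing a count such as AMBVSTOT by TOBSDAYS in
                          this way puts subjects with short and long
                          observation periods on a comparable footing.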

 Time Specific Patient Level Files

                  There are six adult and six pediatric patient level files

          which contain information specific to the patient for each of the

          interview periods.   These  files include  all patient  interview

          data contained  on nonrepeating records.  There is some variation

          between the adult and pediatric files in the types of information

          they contain.   Minor differences also occur across the six times

          for both  adults and  pediatrics.  Table 5-1 outlines the general

          categories of  information  contained  on  the  adult  files  and

          indicates the  presence (denoted  by an  X) or  absence  of  such

          information in  each of  the time  specific  files.    Table  5-2

          presents similar information for the pediatric files.

          Table 5-1.     Categories of  Information on  Adult Time Specific
                  Files


          Table 5-2.     Categories  of   Information  on   Pediatric  Time
                  Specific Files

                  Derived and  flag variables  in the time specific patient

          level  files   fall   into   two   general   categories:   survey

          administration and  service utilization.   In general, the naming

          convention for  these variables is to use a base name followed by

          an integer  corresponding to the particular time specific file in

          which the  variable resides.   For  example, ADM1  resides on the

          Time 1  file, ADM2  on the  Time 2  file, and  so on.  Individual

          variables are described below.

                  Time  Specific  Interview  Status  (T1_STAT  -  T6_STAT):
                  Unlike the overall patient characteristics file, the time
                  specific files  contain records  for only  those  persons
                  with a completed interview.  Accordingly, there are three
                  valid  status  codes  in  these  files.    These  include
                  interview completed  with  subject;  interview  completed
                  with proxy,  subject living; and interview completed with
                  proxy, subject deceased.

                  Time Gap  Flag (GAP1FLAG - GAP6FLAG):  Beginning with the
                  Time 4 interview, adult and pediatric subjects were asked
                  to indicate  whether they  had travelled  outside of  the
                  country for  a period  of  2  weeks  or  longer.    Adult
                  subjects were also requested to provide information about
                  periods of  incarceration exceeding  2 weeks.  In Time 4,
                  subjects  provided information on all time gaps occurring
                  since  the   start  of  the  study  period.    Subsequent
                  interviews gathered  information on  only those time gaps
                  occurring during the current interview period.

                  For study  purposes, subjects  were considered ineligible
                  for periods  of travel  or  incarceration  two  weeks  or
                  longer  in   duration.     Information  on   health  care
                  utilization was  not collected  for these  periods.   The
                  time  gap   flag  was  created  to  assist  the  user  in
                  identifying  interview   periods  in  which  a  time  gap
                  occurred.  If a gap occurred in a particular time period,
                  the flag  variable is  set to  1.  Otherwise, the flag is
                  set to blank.

                  Number of  Inpatient Admissions,  Per Interview  (ADM1  -
                  ADM6):   This variable  contains the  number of inpatient
                  hospital admissions reported by a subject in the specific
                  interview period.   Inpatient  admissions which  begin in
                  one interview  period and  continue into  the next period
                  are counted  as a  stay in  each period.   Therefore,  if
                  users want  to determine  the total  number of  inpatient
                  admissions for a subject during the 18-month study period
                  they should  use the  variable created  for this  purpose
                  (ADMTOT), and  not attempt  to  sum  ADM1  through  ADM6.
                  Summing ADM1 through ADM6 may result in a slightly higher
                  number of total admissions.
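
                  The overcount can be illustrated with a small made-up
                  example.  The records, the field names "period" and
                  "stay_id", and the idea of counting distinct stay
                  identifiers are assumptions for illustration only; they
                  do not describe how ADMTOT was actually constructed.

```python
# Sketch: why summing per-interview admission counts can overcount stays.
# All records and field names here are hypothetical.
from collections import Counter

# One record per stay component; stay 15 spans interview periods 1 and 2.
stay_components = [
    {"period": 1, "stay_id": 15},
    {"period": 2, "stay_id": 15},
    {"period": 2, "stay_id": 16},
    {"period": 4, "stay_id": 17},
]

# Per-interview counts, analogous to ADM1 through ADM6.
per_period = Counter(rec["period"] for rec in stay_components)
adm_by_period = [per_period.get(p, 0) for p in range(1, 7)]

# Summing the per-interview counts counts the spanning stay twice.
sum_of_adm = sum(adm_by_period)

# Counting distinct stays, analogous to ADMTOT, counts it once.
distinct_stays = len({rec["stay_id"] for rec in stay_components})
```

                  In this example the per-interview counts sum to 4,
                  while the subject had only 3 distinct stays.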

                  Number of  Inpatient  Nights,  Per  Interview  (IPNGT1  -
                  IPNGT6):   IPNGT1 through  IPNGT6 represent the number of
                  nights the  subject was in the hospital in each interview
                  period.  For each hospital admission, subjects were asked
                  to indicate  how many  nights they spent in the hospital.
                  The IPNGT  variables are calculated by summing the number
                  of nights  across all  stays reported  by a  patient in a
                  particular interview  period.  If an inpatient stay began
                  during the  current interview  period and  continued into
                  the next period, only the portion of the stay which falls
                  in the current period is counted.

                  Number of  Emergency Room  Visits, Per Interview (ERVS1 -
                  ERVS6):   This variable  contains the  number of visits a
                  subject made to emergency rooms in the specific interview
                  period.

                  Hospital Clinic  Visits, Per  Interview (HCVS1  - HCVS6):
                  HCVS1 through  HCVS6 represent  the number  of  visits  a
                  subject  made  to  hospital  clinics  in  each  interview
                  period.

                  Other Clinic Visits, Per Interview (OCVS1 - OCVS6):  This
                  variable indicates  the number  of visits  an  individual
                  made to other clinics in the specific interview period.

                  Private Physician  Visits, Per Interview (MDVS1 - MDVS6):
                  The number  of private physician visits a subject made in
                  each interview  period  is  contained  in  MDVS1  through
                  MDVS6.

                  Number of  Ambulatory Visits,  Per  Interview  (AMBVS1  -
                  AMBVS6):  This variable indicates the number of visits to
                  hospital clinics,  other clinics,  and private physicians
                  made by the subject in the specific interview period.  It
                  does not include visits to emergency rooms.

                  Interview   Specific   Observation   Days   (OBSDAYS1   -
                  OBSDAYS6):  This variable indicates the number of days in
                  a given  interview period that a subject was observed and
                  for which  patient information  was  collected.    It  is
                  calculated by  counting the  number of  days elapsed from
                  the reference  period begin  date through  the  reference
                  period  end   date,  minus   any  days  the  subject  was
                  ineligible during  this period.   Observation days can be
                  used to  standardize utilization  counts by  taking  into
                  account variation  in the  length of  observation periods
                  across individuals.


                      Case 1:

                      Time 1 Begin Reference Date - 3/1/91
                      Time 1 End Reference Date - 5/20/91
                      Periods of Ineligibility - none
                      Time 1 Observation Days (OBSDAYS1) - 80 days

                      Case 2:

                      Time 3 Begin Reference Date - 10/20/91
                      Time 3 End Reference Date - 1/5/92
                      Periods of  Ineligibility - 11/2/91-11/20/91, subject
                      in jail
                      Time 3 Observation Days (OBSDAYS3) - 57 days
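
                      The calculation can be sketched as elapsed days
                      between the two reference dates minus ineligible
                      days.  The function name is hypothetical.  Because
                      the manual does not state how the first and last
                      days of an ineligible period are counted, the
                      sketch takes the number of ineligible days as an
                      input rather than deriving it from dates.

```python
# Sketch: interview-specific observation days.
# The function name and the ineligible_days argument are assumptions.
from datetime import date


def observation_days(begin, end, ineligible_days=0):
    """Days elapsed from begin to end, minus days of ineligibility."""
    elapsed = (end - begin).days
    return elapsed - ineligible_days


# Case 1 above: 3/1/91 through 5/20/91 with no periods of ineligibility.
case1_obsdays = observation_days(date(1991, 3, 1), date(1991, 5, 20))
```

                      With the Case 1 dates this sketch yields 80 days,
                      matching OBSDAYS1 in the example.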

          5.1.2   Service Utilization Files

                  Self-reported information  on the  amount  and  types  of

          health care  services received  resides on the service utilization

          files as  a series  of repeating records.  Information on out-of-

          pocket payments  and other  sources of payment for these services

          is also  available on  these files.  The service data are divided


          into separate  files by  type of  service.   Each  file  includes

          complete self-reported  information on  a particular  service for

          the entire  study period  for either adult or pediatric subjects.

          For example, there is an adult inpatient hospital stay file which

          contains all information on this type of service collected during

          the six patient interviews.

                  With the  exception  of  the  dental  file,  the  service

          utilization files are structured such that similar data items are

          assigned the  same variable  names across the various files.  For

          example, the  variable SRE_DOL  is always  used to  indicate  the

          dollar amount  a subject  paid  out  of  pocket  for  a  service,

          regardless of  the type  of service.  This format makes it easier

          for the user to aggregate information about all services received

          by a particular patient.
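
                  As a rough sketch of this kind of aggregation, the
                  shared variable name lets one routine total out-of-
                  pocket dollars across several service files.  Only
                  SRE_DOL comes from the files; the "patient_id" key, the
                  function name, and the use of None for a missing amount
                  are assumptions.  Actual missing-value codes should be
                  taken from the codebooks, and the dental file would
                  need separate handling.

```python
# Sketch: total out-of-pocket dollars per patient across service files.
# "patient_id" and the None-for-missing convention are hypothetical.
from collections import defaultdict


def out_of_pocket_by_patient(*service_files):
    """Sum SRE_DOL per patient over any number of service-level files."""
    totals = defaultdict(float)
    for records in service_files:
        for rec in records:
            amount = rec.get("SRE_DOL")
            if amount is not None:  # skip records with no reported amount
                totals[rec["patient_id"]] += amount
    return dict(totals)


# Made-up records standing in for two service utilization files.
inpatient_file = [
    {"patient_id": "P1", "SRE_DOL": 100.0},
    {"patient_id": "P2", "SRE_DOL": None},
]
emergency_room_file = [
    {"patient_id": "P1", "SRE_DOL": 25.0},
    {"patient_id": "P2", "SRE_DOL": 40.0},
]

totals = out_of_pocket_by_patient(inpatient_file, emergency_room_file)
```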

                  Derived  and   flag  variables  present  on  the  service
                  utilization files are described below.

                  Source Questionnaire  (SFORM):   This variable  indicates
                  the particular  interview period  in which  the data were
                  collected and  whether the questionnaire was for an adult
                  or pediatric  case.  For example, if SFORM has a value of
                  A, this  indicates that  the information was collected in
                  the Time 1 questionnaire, and was an adult case.

                  Type of  Service Utilization (SFPART):  SFPART is used to
                  distinguish the type of service utilized.

                  Unique Stay  Flag (SHSTAYFG):   Each  reported  inpatient
                  hospital and  nursing home stay is assigned a unique stay
                  number using the variable SHSTAYFG.  The stay numbers are
                  unique throughout  all six  interviews but  they are  not
                  necessarily  assigned   in  order  from  the  first  stay
                  reported to the last.


                  This variable  is also  assigned to  all  events  in  the
                  separately billing  doctor (SBD)  files.    A  separately
                  billing doctor  is one  who provides  care to  a  patient
                  during an  inpatient or  nursing home  stay but bills the
                  patient    separately     for     services     rendered.
                  Anesthesiologists     and      radiologists      commonly
                  fall into  this category.   Although the majority of SBDs
                  are medical doctors, they may also include other types of
                  medical practitioners.   Inclusion  of SHSTAYFG  on these
                  files enables  the user  to link  an  SBD  event  to  the
                  inpatient hospital  or nursing  home stay  in  which  the
                  service was provided.

                  For example:

                      Subject  has  3  inpatient  stays  during  the  study
                      period.

                          The first stay is assigned an SHSTAYFG of 7.
                          The second stay is assigned an SHSTAYFG of 8.
                          The third stay is assigned an SHSTAYFG of 108.

                      Subject has  2 separately billing doctor (SBD) events
                      during the study period.

                          The first SBD event is assigned an SHSTAYFG of 8.
                          The second  SBD event  is assigned an SHSTAYFG of
                          108.

                      Using the  values in  SHSTAYFG the  user can link the
                      first SBD  event to the second inpatient stay and the
                      second SBD event to the third inpatient stay.
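
                      The linkage in this example can be sketched as a
                      lookup on SHSTAYFG.  The record layout and the
                      "label" field are hypothetical.  The second SBD
                      event is given the value 108, which is what the
                      linkage described above implies.  With real files
                      the lookup would presumably also be restricted to
                      records for the same subject.

```python
# Sketch: linking separately billing doctor (SBD) events to stays.
# Record layout and the "label" field are illustrative assumptions.
stays = [
    {"SHSTAYFG": 7, "label": "first stay"},
    {"SHSTAYFG": 8, "label": "second stay"},
    {"SHSTAYFG": 108, "label": "third stay"},
]
sbd_events = [
    {"SHSTAYFG": 8, "label": "first SBD event"},
    {"SHSTAYFG": 108, "label": "second SBD event"},
]

# Index the stays by their unique stay number.
stay_by_flag = {stay["SHSTAYFG"]: stay for stay in stays}

# Look up the stay in which each SBD service was provided.
links = {
    event["label"]: stay_by_flag[event["SHSTAYFG"]]["label"]
    for event in sbd_events
}
```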

                  Continuous Stay  Flag (ICTMFLG):   This  variable can  be
                  used  in   conjunction  with  SHSTAYFG  to  identify  the
                  components of  stays that  begin in  one interview period
                  and continue  into the next period.  ICTMFLG indicates in
                  which interview periods the components of the stay can be
                  found.    If  useful  to  a  particular  analysis,  these
                  variables enable  the user  to  aggregate  the  component
                  records into one stay.

                  For example:

                      The subject  was interviewed for the Time 1 interview
                      while in the hospital and was subsequently discharged
                      during the  Time 2  interview period.  Therefore, the
                      subject has  a stay  that spans  the first and second
                      interview period.   This  stay is  represented in the
                      inpatient stay file as follows:


                          First component record:

                          Provider ID  for the  Stay:  Coded 3198671.  This
                          represents the  randomly assigned  sequential  ID
                          number for the hospital where the stay occurred.

                          Date of Discharge: Coded 95/95/95 which indicates
                          the subject was still in the hospital at the time
                          of interview.

                          Number of  Nights in  Hospital: Coded as 5.  This
                          represents the number of nights the subject spent
                          in the  hospital up until they completed the Time
                          1 interview.

                          Unique Stay Flag: Coded as 15.

                          Continuous Stay  Flag: Coded  as AB.  A indicates
                          that one  component of  the stay was collected in
                          the  Time  1  adult  questionnaire  while  the  B
                          indicates that the second component was collected
                          in  the   Time  2   questionnaire.    A  complete
                          explanation of  these codes  is included  in  the
                          inpatient stay file codebook.

                          Second component record:

                          Provider ID for the Stay:  Coded 3198671.

                          Date of  Discharge: Coded 8/15/91 which indicates
                          the date the subject was actually discharged.

                          Number of  Nights in Hospital: Coded as 10.  This
                          represents the number of nights the subject spent
                          in the hospital since the Time 1 interview.

                          Unique Stay Flag: Coded as 15.

                          Continuous Stay Flag: Coded as AB.

                      Therefore, if users are interested in identifying all
                      components of  a continuous  stay they  should  first
                      look at  the continuous  stay flag  (ICTMFG)  to  see
                      whether it  is set.   If  ICTMFG is  not  set  (i.e.,
                      blank) then  the stay  is not  a continuous stay.  If
                      ICTMFG is  set, users  should examine  the values  in
                      ICTMFG to  determine in  which interview  periods the
                      remaining component  pieces are  located.   Then, the
                      users should  search through  all inpatient stays for
                      that  subject   that  fall   within  the   identified
                      interview periods  and that have the same unique stay
                      numbers (SHSTAYFG) as the initial component.
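
                      A minimal Python sketch of this aggregation is shown
                      below.  It assumes the inpatient stay records are held
                      as dictionaries; the keys "subject", "nights", and
                      "discharge", the subject ID "S1", and stay number 16
                      are hypothetical.  The flag is keyed here by the name
                      ICTMFLG.  Summing the nights across components is an
                      assumption based on the example above, in which the
                      two components count different nights.

```python
STILL_IN_HOSPITAL = "95/95/95"  # discharge date code used in the example

def combine_components(records):
    """Aggregate component records that share a subject and SHSTAYFG."""
    combined = {}
    for rec in records:
        if not rec["ICTMFLG"].strip():
            continue  # blank flag: not a continuous stay
        key = (rec["subject"], rec["SHSTAYFG"])
        stay = combined.setdefault(
            key, {"nights": 0, "discharge": None, "periods": rec["ICTMFLG"]})
        stay["nights"] += rec["nights"]
        # Keep the actual discharge date, not the "still in hospital" code.
        if rec["discharge"] != STILL_IN_HOSPITAL:
            stay["discharge"] = rec["discharge"]
    return combined

records = [
    {"subject": "S1", "SHSTAYFG": 15, "ICTMFLG": "AB",
     "nights": 5, "discharge": "95/95/95"},
    {"subject": "S1", "SHSTAYFG": 15, "ICTMFLG": "AB",
     "nights": 10, "discharge": "8/15/91"},
    {"subject": "S1", "SHSTAYFG": 16, "ICTMFLG": "",
     "nights": 2, "discharge": "9/1/91"},
]
combined_stays = combine_components(records)
```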

                  Overlapping Inpatient Hospital and Nursing Home Stay Flag
                  (ANOSTYF1, ANOSTYF2):   These  flags are  used to  denote
                  situations where  subjects were  admitted to  a  hospital
                  from  a  nursing  home,  residential  care  facility,  or
                  hospice.   In these  cases, subjects  were not discharged
                  from the  long-term care  facility and their bed was held
                  for them  until they  returned from  the  hospital  stay.
                  This results in hospital stay dates that overlap with the
                  dates of stay in the nursing home or other long-term care
                  facility.   These flag variables allow users to take this
                  overlap into account if appropriate for their analysis.

                  ANOSTYF1 and  ANOSTYF2 are  set  only  for  hospital  and
                  nursing home  stays that overlap.  The values in ANOSTYF1
                  and ANOSTYF2  correspond to  the  unique  stay  number(s)
                  (SHSTAYFG) of the stays with which the record overlaps.  In
                  situations where  one nursing home stay overlaps with two
                  hospital stays, both ANOSTYF1 and ANOSTYF2 are set.

                  For example:

                      Case 1: Hospital stay number 43 overlaps with nursing
                      home stay number 111.  ANOSTYF1 for the hospital stay
                      record is  set to  111 to  indicate that  it overlaps
                      with nursing  home stay 111.  Similarly, ANOSTYF1 for
                      the nursing home stay record is set to 43 to indicate
                      that it overlaps with hospital stay 43.

                      Case 2:  Hospital stays number 75 and 81 overlap with
                      nursing home  stay number  26.  ANOSTYF1 is set to 26
                      for both  hospital stay records to indicate that they
                      overlap with  nursing home  stay 26.  ANOSTYF2 is set
                      to blank  for both  hospital stay  records.   On  the
                      nursing home  record,  ANOSTYF1  is  set  to  75  and
                      ANOSTYF2 is set to 81.
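
                      The lookup implied by Case 2 can be sketched in Python
                      as follows.  The dictionary layout is hypothetical,
                      and a blank flag is represented here by None, which is
                      an assumption about how the file has been read.

```python
def overlapping_stays(record):
    """Return the SHSTAYFG values of the stays this record overlaps with."""
    flags = (record.get("ANOSTYF1"), record.get("ANOSTYF2"))
    return [f for f in flags if f is not None]

# Case 2 from the text: two hospital stays overlap one nursing home stay.
hospital_75 = {"SHSTAYFG": 75, "ANOSTYF1": 26, "ANOSTYF2": None}
hospital_81 = {"SHSTAYFG": 81, "ANOSTYF1": 26, "ANOSTYF2": None}
nursing_26 = {"SHSTAYFG": 26, "ANOSTYF1": 75, "ANOSTYF2": 81}

nursing_overlaps = overlapping_stays(nursing_26)
hospital_overlaps = overlapping_stays(hospital_75)
```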

                  Discrepant Source  of Payment  Flag (INSURFLG):   In each
                  interview period  subjects were  asked to  indicate  what
                  type of  medical insurance  coverage they  had during the
                  reference period.   Subjects  were also asked to indicate
                  the source  of payment  for each  episode of medical care
                  they received.  Editing procedures found situations where
                  the subject  reported that  a specific  type of insurance
                  covered a particular event but failed to report that type
                  of coverage  in the  overall set  of questions on medical
                  insurance.   Following review  of these  cases for coding
                  and data entry errors, some discrepancies remained in the
                  data.  INSURFLG is used to highlight these discrepancies.


                  INSURFLG is  set to  1  on  the  service  record  if  the
                  reported source of payment for that particular service is
                  inconsistent with  the type of medical insurance reported
                  at the  overall patient level for the interview period in
                  which the service was received.

          5.2     Medical Record Abstract Data Files

          5.2.1   Overview

                  The medical abstract data files contain selected clinical

          information abstracted from study subjects' medical records.  The

          ACSUS study  design included  the medical  record data collection

          component for  the purpose  of  confirming  specific  HIV-related

          conditions and  diseases to  be used  in determining the stage of

          subjects' HIV  disease.   Therefore, these  files are intended to

          provide the  user with  information about  a  subject's  clinical

          history and  are not  an appropriate source for obtaining patient

          level utilization  counts.   Users interested  in  such  analyses

          should use  the  patient  interview  or  provider  billing  files.


                  During three  of six  patient interviews,  subjects  were

          asked to  indicate their  usual source of medical care.  Subjects

          were asked  to sign  permission forms authorizing study personnel

          to contact  the named  providers for purposes of obtaining access

          to subjects'  medical records.  If a subject reported having more


          than one  usual source  of care,  an attempt  was made  to obtain

          medical records  from the multiple sources of care.  If a subject

          did not  report a  usual source  of care,  an attempt was made to

          obtain the  patient's medical record from the provider where they

          were sampled  into the  study.  The data contained in these files

          reflect information abstracted from the subjects' medical records

          and may  represent data  collected from  more  than  one  medical


                  Although  the   ACSUS  patient  interviews  and  provider

          billing survey  collected data  for the 18-month period beginning

          March 1,  1991, and  ending August  31, 1992, medical record data

          were  collected   for  a  broader  interval  of  time.    Medical

          abstractors were  instructed to  collect clinical information for

          health care services provided during the period beginning January

          1, 1990,  and ending  August 31,  1992.  In addition, abstractors

          went as far back in the medical record as necessary to confirm an

          AIDS diagnosis or for evidence of a positive HIV serostatus.

                  Four separate  data files  make up  the complete  set  of

          medical record  abstract files.   The  first of  these files is a

          patient  record  file  which  is  a  compilation  of  information

          collected from  all usual  sources of  care which has been edited

          across these  multiple data  sources.  The three remaining files,

          the inpatient  stay, check-list  conditions  and  T-cell  reports


          files,  contain  whatever  information  was  collected  from  the

          individual usual  sources of  care.   Because these data have not

          been compiled  into one  record it is possible that two different

          usual sources  of care  may report  similar information  about  a


          5.2.2   Patient Level File

                  The medical  abstract files  contain  one  patient  level

          record for each subject.  This record includes data items that

          were derived using information drawn from all abstracts received

          for a patient.  Specific variables and how they were derived are

          described below.

                  Record Review  Period (URMOF, URDYF, URYRF, URMOL, URDYL,
                  URYRL):  This set of date variables reflects the earliest
                  and latest medical record reviewed for an individual.  It
                  does  not   reflect  gaps   in  medical  record  coverage
                  occurring between  these dates.  The review period for an
                  individual may  span as  wide a  period as 1/1/90 through
                  8/31/92 or  some portion of this period.  Factors such as
                  the completeness  of a subject's medical record, provider
                  response rate, and study subject survival time affect the
                  length of the review period.  For example:

                      Case 1:   Medical  abstracts for  this  subject  were
                      received from  three providers covering the following
                      time periods:

                          Provider 1: 2/8/90-7/17/91
                          Provider 2: 1/1/90-11/20/91
                          Provider 3: 9/21/91-6/5/92

                      The record  review period  for this individual is set
                      to 1/1/90 through 6/5/92.


                      Case 2:   Medical  abstracts for  this  subject  were
                      received from  two providers  covering the  following
                      time periods:

                          Provider 1: 3/1/90-12/13/91
                          Provider 2: 2/3/92-8/31/92

                      The record  review period  for this individual is set
                      to 3/1/90  through 8/31/92.   Note  that  the  record
                      review dates do not take into account the 2-month gap
                      in  coverage   between  providers  (12/13/91  through
                      2/3/92).
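
                      The two cases can be reproduced with a short Python
                      sketch.  It assumes each abstract is represented by a
                      (start, end) pair of dates.  The function names are
                      hypothetical, and the gap check is not part of the
                      delivered files; it is included only to show how a
                      user might look for uncovered intervals.

```python
from datetime import date, timedelta

def review_period(abstract_periods):
    """Return (earliest start, latest end) across a subject's abstracts."""
    starts = [start for start, _ in abstract_periods]
    ends = [end for _, end in abstract_periods]
    return min(starts), max(ends)

def coverage_gaps(abstract_periods):
    """List intervals inside the review period covered by no abstract."""
    gaps = []
    covered_through = None
    for start, end in sorted(abstract_periods):
        # A gap exists when the next abstract starts more than one day
        # after the latest date covered so far.
        if covered_through is not None and start > covered_through + timedelta(days=1):
            gaps.append((covered_through, start))
        if covered_through is None or end > covered_through:
            covered_through = end
    return gaps

case_1 = [(date(1990, 2, 8), date(1991, 7, 17)),
          (date(1990, 1, 1), date(1991, 11, 20)),
          (date(1991, 9, 21), date(1992, 6, 5))]
case_2 = [(date(1990, 3, 1), date(1991, 12, 13)),
          (date(1992, 2, 3), date(1992, 8, 31))]
```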

                  HIV Positive  Diagnosis (UHIV), Diagnosis Date (UHDIAGMO,
                  UHDIAGDY, UHDIAGYR)  and  Report  Date  (UHRDMO,  UHRDDY,
                  UHRDYR):   These variables  were derived to represent the
                  best information  available across  all medical  abstract
                  forms for  an individual.  A subject is coded as having a
                  positive HIV  diagnosis if  at least one medical abstract
                  confirmed a  diagnosis.   The corresponding diagnosis and
                  report dates  from that  abstract are  coded in UHDIAGMO,
                  UHDIAGDY,  UHDIAGYR  and  UHRDMO,  UHRDDY,  UHRDYR.    If
                  multiple abstracts  indicated a positive serostatus, then
                  the earliest  diagnosis and  report dates  are coded.   A
                  subject is  coded as  not having a positive HIV diagnosis
                  if  none  of  the  medical  abstracts  for  that  patient
                  confirmed a diagnosis.  A subject's serostatus is unknown
                  if all medical abstracts reported an unknown status.

                  AIDS  Diagnosis   (UAIDS),  Diagnosis   Date   (UADIAGMO,
                  UADIAGDY, UADIAGYR)  and Report Date (UAIDRDMO, UAIDRDDY,
                  UAIDRDYR):   These data items follow the same approach as
                  the HIV positive diagnosis and report date variables.  If
                  at least  one medical abstract stated that the individual
                  was diagnosed  with AIDS,  the subject is coded as having
                  an AIDS  diagnosis.   If multiple  abstracts indicated an
                  AIDS diagnosis,  then the  earliest diagnosis  and report
                  dates are  coded  in  UADIAGMO,  UADIAGDY,  UADIAGYR  and
                  UAIDRDMO, UAIDRDDY,  UAIDRDYR.  A subject is coded as not
                  having an AIDS diagnosis if none of the medical abstracts
                  confirmed a  diagnosis.   If all  medical records  stated
                  that it  was unknown  whether the subject had a diagnosis
                  of AIDS, UAIDS is unknown.
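
                  The derivation rule shared by the HIV and AIDS diagnosis
                  variables can be sketched in Python as follows.  This is
                  not the production code used to build the files.  The
                  status labels "yes", "no", and "unknown", the dictionary
                  keys, and the representation of dates as (year, month,
                  day) tuples are hypothetical.  Taking the earliest
                  diagnosis date and the earliest report date independently
                  of each other is also an assumption.

```python
def derive_status(abstracts):
    """Combine per-abstract findings into one patient-level result.

    Each abstract is a dict with 'status' ('yes', 'no', or 'unknown') and,
    for confirmed diagnoses, sortable 'diagnosis_date' and 'report_date'.
    """
    confirmed = [a for a in abstracts if a["status"] == "yes"]
    if confirmed:
        # At least one abstract confirmed the diagnosis: use earliest dates.
        return {"status": "yes",
                "diagnosis_date": min(a["diagnosis_date"] for a in confirmed),
                "report_date": min(a["report_date"] for a in confirmed)}
    if abstracts and all(a["status"] == "unknown" for a in abstracts):
        return {"status": "unknown"}
    # No abstract confirmed the diagnosis.
    return {"status": "no"}

abstracts = [
    {"status": "unknown"},
    {"status": "yes", "diagnosis_date": (1990, 5, 2), "report_date": (1990, 6, 1)},
    {"status": "yes", "diagnosis_date": (1989, 11, 20), "report_date": (1991, 1, 15)},
]
result = derive_status(abstracts)
```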


          5.2.3   Medical Abstract Repeating Record Files

                  Information   on   inpatient   stays,   outpatient/clinic

          checklist conditions,  and laboratory  reports of  T-cell  counts

          exist in  the files  as a  series of  repeating records.  Because

          data for  a subject  may have  been collected  from more than one

          usual source  of care,  each  repeating  record  has  a  provider

          identification  number   that  indicates   the  source   of   the

          information.     The  provider  identification  number,  USCID0n,

          corresponds  to  the  provider  from  whom  the  information  was

          collected.   This is  not necessarily the provider from whom care

          was received.   USCID0n  is a randomly assigned sequential number

          where n  is an  integer that uniquely identifies two records from

          the same provider.

                  Since the  medical data  represent information  collected

          from multiple  providers there are situations where data overlap,

          are duplicative, or discrepant across providers.  For example, if

          a subject  reported having  a hospital and a private physician as

          usual sources  of care,  patient charts  from  both  sources  may

          contain information  on a  single inpatient  stay; however,  data

          collected on  that stay  may  or  may  not  be  the  same  across

          providers.   It is possible that the hospital chart contains more

          detailed information on admitting or discharge diagnoses than the

          chart maintained  by the private physician, which may record only


          a primary  diagnosis.   The admission or discharge dates may also

          differ across providers.

                  In cases  of apparent overlaps or discrepancies, the data

          have been  edited for  keying and coding errors only.  Additional

          editing, which  would have  involved either  extensive  follow-up

          beyond the  scope of  this study,  or making numerous assumptions

          without  sufficient  supporting  evidence,  was  not  undertaken.

          Discrepant or  overlapping items  across medical records have not

          been flagged.   Therefore, users should be aware that these items

          exist and,  if important  to their  analysis, they should examine

          the data files for those items prior to undertaking analysis.

          Inpatient Stays File

                  This file  contains data  on all inpatient stays reported

          in the  medical record.  For each inpatient stay, information was

          collected on  the  admission  and  discharge  dates  as  well  as

          admitting and discharge diagnoses.

                  The inpatient stays file contains one flag variable which
                  is described below.

                  Inpatient Stay  Flag (UIPSFLG):   This is a flag variable
                  that is  used to denote inpatient stays where the medical
                  record  contained   more  than  three  admitting  or  ten
                  discharge  diagnoses.     Because  the  data  files  were
                  initially structured  to accommodate  no  more  than  ten
                  codes, the  inpatient stay  flag was  introduced to alert
                  the user  that additional diagnoses were reported and are
                  contained in a "secondary" record for the stay.

                  The inpatient stay flag is set to 1 on the primary record
                  and on the secondary record so that the user can identify
                  both components  of  the  stay.    The  secondary  record
                  contains the  same discharge  and admission  dates as the
                  primary record.   If  more than three admitting diagnoses
                  are reported,  the additional  diagnoses are  recorded in
                  the secondary stay and the discharge diagnoses are set to
                  blank on the secondary stay.  Similarly, if more than ten
                  discharge  diagnoses   are   reported,   the   additional
                  discharge diagnoses  are recorded  in the  secondary stay
                  and the  first admitting  diagnosis is  set to "99999" on
                  the secondary  stay.   Subsequent admitting diagnoses are
                  set to blank.

                  For example:

                  Primary Record:            Secondary Record:

                  Provider ID - 000001       Provider ID - 000001
                  Admission Date - 6/10/92   Admission Date - 6/10/92
                  Discharge Date - 7/3/92    Discharge Date - 7/3/92

                  Admission Diagnoses:       Admission Diagnoses:
                    Diagnosis #1 - 486XX       Diagnosis #1 - 99999
                    Diagnosis #2 - 1363X       Diagnosis #2 - blank
                    Diagnosis #3 - blank       Diagnosis #3 - blank

                  Discharge Diagnoses:       Discharge Diagnoses:
                    Diagnosis #1 - 1120X       Diagnosis #1 - 1363X
                    Diagnosis #2 - 7832X       Diagnosis #2 - blank
                    Diagnosis #3 - 5589X       Diagnosis #3 - blank
                    Diagnosis #4 - 7994X       Diagnosis #4 - blank
                    Diagnosis #5 - 7806X       Diagnosis #5 - blank
                    Diagnosis #6 - 2859X       Diagnosis #6 - blank
                    Diagnosis #7 - 0389X       Diagnosis #7 - blank
                    Diagnosis #8 - 01190       Diagnosis #8 - blank
                    Diagnosis #9 - 690XX       Diagnosis #9 - blank
                    Diagnosis #10 - 0420X      Diagnosis #10 - blank

                  UIPSFLG - 1                UIPSFLG - 1

                  A primary  record and  the secondary counterpart for that
                  stay can  be identified  in the data by searching for two
                  inpatient stay  records with  the same patient ID number,
                  the same  provider ID  number, and the same admission and
                  discharge dates, and where UIPSFLG is set to 1.
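
                  The search described above can be sketched in Python as
                  follows.  The dictionary keys other than UIPSFLG, the
                  patient ID "P001", and the unflagged third record are
                  hypothetical.  The sketch also assumes the primary record
                  precedes its secondary record in the file, so that the
                  combined diagnosis list stays in reported order.

```python
from collections import defaultdict

def pair_flagged_stays(records):
    """Group flagged stay records that describe the same inpatient stay."""
    groups = defaultdict(list)
    for rec in records:
        if rec["UIPSFLG"] == 1:
            key = (rec["patient_id"], rec["provider_id"],
                   rec["admit"], rec["discharge"])
            groups[key].append(rec)
    # Keep only keys that matched exactly a primary and a secondary record.
    return {k: v for k, v in groups.items() if len(v) == 2}

def all_discharge_dx(pair):
    """Concatenate the discharge diagnoses of both component records."""
    return [dx for rec in pair for dx in rec["discharge_dx"]]

primary = {"patient_id": "P001", "provider_id": "000001",
           "admit": "6/10/92", "discharge": "7/3/92", "UIPSFLG": 1,
           "discharge_dx": ["1120X", "7832X", "5589X", "7994X", "7806X",
                            "2859X", "0389X", "01190", "690XX", "0420X"]}
secondary = {"patient_id": "P001", "provider_id": "000001",
             "admit": "6/10/92", "discharge": "7/3/92", "UIPSFLG": 1,
             "discharge_dx": ["1363X"]}
unflagged = {"patient_id": "P001", "provider_id": "000001",
             "admit": "1/5/92", "discharge": "1/9/92", "UIPSFLG": None,
             "discharge_dx": ["486XX"]}

pairs = pair_flagged_stays([primary, secondary, unflagged])
stay_key = ("P001", "000001", "6/10/92", "7/3/92")
diagnoses = all_discharge_dx(pairs[stay_key])
```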

          Checklist Conditions File

                  The   outpatient/clinic    checklist   records    contain

          information on  medical conditions  commonly associated  with HIV

          infected persons.   For purposes of this study, a checklist of 75

          conditions was developed and hospital outpatient, emergency room,

          clinic, and  physician records  were reviewed  for any mention of

          these conditions.

                  The checklist conditions file is structured such that the

          earliest reported  date of  diagnosis is  coded in  the  date  of

          diagnosis variables  (UCCDXMO,  UCCDXDY,  UCCDXYR).    Subsequent

          diagnosis dates  or reports  of the  condition are  contained  in

          UCCRDAT1  through  UCCRDAT8.    If  a  subject's  medical  record


          indicated numerous  visits  over  a  brief  period  of  time  for

          treatment of the same medical condition, only one report date per

          month may have been recorded.

                  Each outpatient/clinic record contains space for 8 report

          dates of  a particular  condition.   If more  than 8  dates  were

          present in  the medical  data, additional  dates are  recorded on

          another record.   In  addition, information on the same condition

          may have  been collected  from multiple  providers and appears on

          separate records.   Therefore,  it is important that users search

          all outpatient  records for  a patient  if they are interested in

          obtaining all  information that  was collected  on  a  particular

          condition.
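
                  A search across all checklist records for a patient might
                  look like the following Python sketch.  The dictionary
                  keys, the condition codes "C12" and "C30", and the dates
                  are hypothetical; in the files the dates are held in
                  UCCDXMO, UCCDXDY, UCCDXYR and UCCRDAT1 through UCCRDAT8.
                  Treating identical dates from different providers as one
                  report is an assumption that may not suit every analysis.

```python
def condition_dates(records, patient_id, condition):
    """Collect every diagnosis or report date for one patient and condition."""
    dates = set()
    for rec in records:
        if rec["patient_id"] == patient_id and rec["condition"] == condition:
            if rec["diagnosis_date"] is not None:
                dates.add(rec["diagnosis_date"])
            # Unused report date slots are represented here by None.
            dates.update(d for d in rec["report_dates"] if d is not None)
    return sorted(dates)

records = [
    {"patient_id": "P001", "condition": "C12",
     "diagnosis_date": (1991, 3, 4),
     "report_dates": [(1991, 4, 2), (1991, 5, 6)]},
    {"patient_id": "P001", "condition": "C12",       # second provider
     "diagnosis_date": (1991, 4, 2),
     "report_dates": [(1991, 7, 9), None]},
    {"patient_id": "P001", "condition": "C30",
     "diagnosis_date": (1990, 1, 1), "report_dates": []},
]
c12_dates = condition_dates(records, "P001", "C12")
```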


          T-Cell Reports File

                  Medical abstractors were instructed to search the medical

          record for  all laboratory  reports of  T-cell counts.  Depending

          upon the available information, absolute counts, percentages, and

          date of  report were  recorded for  monitoring both  CD4 and  CD8



          5.3     Provider Billing Survey Data Files

                  In each of six patient interviews, subjects were asked to

          supply the  names of  all medical  care providers  from whom they

          received health  care during the interview period.  Subjects were

          also requested  to grant  permission  to  allow  study  personnel

          access  to  their  billing  records  from  the  named  providers.

          Medical providers were contacted in two rounds of data collection

          depending on  whether the  patient  reported  services  from  the

          provider and  gave consent  during the  round period.   Providers

          were contacted  to obtain  information on  the services rendered,

          charges for  these services,  and the source of payment for these

          services.     The  provider  billing  survey  files  contain  all

          information gathered as part of this data collection effort.  The

          files also  include imputed  data for nonrespondent providers and

          imputed data for charges not reported by respondent providers.

                  Information collected  as part  of the  provider  billing

          survey  is   contained  on  four  files:  the  ambulatory  visit,

          inpatient stay,  home health  visit,  and  prescription  medicine

          files.   The file  structure is  one record per bill for a health

          care event  on the ambulatory, inpatient, and prescription files.

          An event is defined as an ambulatory visit, an inpatient stay, or

          a medical  prescription purchase.   The  home health file reports

          multiple events  on one record which contains billing information


          for one  or more  home health  visits by  a  particular  type  of

          caregiver during a period of time.

                  The records  of the  provider billing files have a master

          structure in which each record contains the same set of variables

          regardless of  the event  type.   Because not  all variables  are

          applicable to  each of  the files, some data fields are blank for

          all records  in the  file.   For example,  the variable NUMNIGHT,

          which refers  to  the  length  of  an  inpatient  stay,  is  only

          applicable to  the inpatient  stay file.   Therefore, NUMNIGHT is

          always blank  on the  ambulatory, home  health, and  prescription

          medicine files.

                  The  provider   billing  survey   files  contain  edited,

          derived, flag  and imputed  variables.   The following paragraphs

          describe all  derived variables  and some  flag variables.    All

          imputed variables  and any  flag variables  related to imputation

          are highlighted in Chapter 6.

                  Questionnaire Form  (PFORM):  This variable indicates the
                  type of  provider billing  survey form  on which the data
                  were  collected,   that  is,   the  inpatient  stay,  the
                  ambulatory, home health, or pharmacy billing form.

                  Hospital Ownership  Code (OWNSHP2):    OWNSHP2  indicates
                  whether the  hospital where  the  care  was  received  is
                  publicly, privately,  or federally  owned.    OWNSHP2  is
                  blank for all providers that are not hospitals.

                  Type of  Care Provided  (CARETYPE):   Each  of  the  four
                  provider billing  survey data  files contains records that
                  may represent  more than one type of health care service.
                  For example, the inpatient stay file contains information
                  about both  hospital  inpatient  and  nursing  home  stay
                  events, the  prescription medicine  file includes data on
                  prescription medications as well as medical equipment and
                  supplies, and  so on.   CARETYPE  is a  derived  variable
                  which is  used to  distinguish among the various types of
                  services.   Nonresponses were  imputed by  this care type

                  Length of  Stay (NUMNIGHT):   This  variable measures the
                  duration  of   an  inpatient  stay  in  nights.    It  is
                  calculated  from   the  admission   and  discharge  dates
                  reported by the provider.
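
                  As a rough illustration, the calculation can be written
                  in Python as a date difference.  The function name and
                  the dates are hypothetical, and the exact rule used to
                  build NUMNIGHT (for example, for same-day discharges) is
                  not stated here, so this is an assumption.

```python
from datetime import date

def nights_between(admission, discharge):
    """Number of nights between an admission and a discharge date."""
    return (discharge - admission).days

# Illustrative dates: admitted 6/10/92, discharged 7/3/92.
nights = nights_between(date(1992, 6, 10), date(1992, 7, 3))
```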

                  Inpatient Stay  Identifier (PHSTAYFG):    Each  inpatient
                  stay  is   assigned  a  unique  stay  number  (PHSTAYFG).
                  Similarly, all  emergency room visits that resulted in an
                  inpatient  admission   and  separately  billing  provider
                  events in the ambulatory visit file are assigned a unique
                  stay number  which corresponds to the stay number for the
                  associated inpatient  event.   If appropriate  for  their
                  analyses, users  can link  the inpatient, emergency room,
                  and separately billing provider events using PHSTAYFG.

          5.4     Guide to Codebooks and Annotated Questionnaires

                  There are  a series  of codebooks  which contain complete

          descriptions of  the contents  of the data files on the tape.  In

          general, there  is a  separate codebook  for each data file.  Two

          exceptions are  the codebooks  for the  Provider  Billing  Survey

          files and  the Non-Medical  Services file.   The Provider Billing

          Survey files  have one master codebook that is applicable to the

          four component  files.  Eight codebooks exist for the Non-Medical

          Services file, one codebook for each type of non-medical service.


                  Printable codebooks are available on the tape as a series

          of text  data files.  The users manual does not contain hard copy

          versions of  the  codebooks.    File  names  for  the  individual

          codebooks are listed in Chapter 3.

                  Figure 5-1  provides a  description of  each of the items

          appearing in  the codebook.   In addition, each codebook includes

          an index of variables which alphabetically lists all variables in

          the codebook  and the  corresponding codebook  page number.   The

          index of variables is located at the end of the codebook.

                  Appendices A-C  contain copies  of  all  data  collection

          instruments for  the ACSUS  study.  The instruments are annotated

          with the  actual variable  names that  appear in  the data files.

          Annotated  variable  names  are  listed  in  brackets  under  the

          question to  which they  refer.  Questionnaire items that are not

          included in the ACSUS public use tapes are not annotated.  Figure

          5-2 is an example of the annotated questionnaire format.


                     Figure 5-1.  Example of the Codebook Format

             Figure 5-2.  Example of the Annotated Questionnaire Format


                      6.   DATA IMPUTATION

        This chapter  is an overview of the imputation procedures

associated with  the ACSUS  data.  Section 6.1 briefly summarizes

imputation   strategies   for   handling   different   types   of

nonresponses in  surveys.    Section  6.2  outlines  the  general

imputation approach  for the ACSUS provider survey data.  Section

6.3 discusses  some limitations  in the  use of the imputed ACSUS

data.  Section 6.4 describes the types of imputation variables in

the data files.

6.1     Introduction to Imputation Strategies

        The problem  of  missing  data  is  pervasive  in  survey

research.   Estimates derived  from datasets that contain missing

items may  be biased  when respondents differ from nonrespondents

with respect to the characteristics being analyzed.  By carefully

assigning responses  to missing  data, imputation  can produce  a

complete  dataset   which  compensates   for  nonresponse   bias,

simplifies  analyses,  and  produces  consistent  results  across

analyses, with minimal loss of precision.

        Nonresponse in surveys is typically categorized as either

unit nonresponse  or item nonresponse.  A unit nonresponse occurs


when none  of the survey items are obtained from a surveyed unit.

In the  ACSUS  study,  for  example,  information  could  not  be

obtained for  some patient-reported  events,  when  the  provider

refused or for some other reason was unable to participate in the

survey.   Item nonresponse  occurs when some, but not all, of the

responses  are  missing  from  an  otherwise  cooperating  survey

respondent.   In this  study, item  nonresponse occurred when the

provider data  for an  event were  missing because  the  provider

omitted the  data, refused  to give it, was unable to locate it,

etc.  The distinction between

different  types  of  nonresponse  is  useful  because  different

imputation  strategies   may  be   required,  depending   on  the


        The most  common imputation procedures utilize the survey

data as  a source  of information  to derive  the imputed values.

Surveyed units  are matched  by auxiliary variables that make the

events similar  with respect to the data being imputed, and then

the respondents' information (known as donor data) is used to derive

the imputed  result for the nonrespondents (known as recipients).

The imputation  procedure  can  either  be  of  a  stochastic  or

deterministic  type.    The  most  common  of  these  types  are,

respectively,  the   hot-deck  imputation  method  and  the  mean

imputation method.


        Typically  the  procedure  for  both  hot-deck  and  mean

imputation methods  begins by  sorting the  surveyed units into a

set of  imputation cells.   Cells  are formed  by  combining  the

values of  the auxiliary  variables used to match surveyed units.

Adjacent cells  may be  collapsed if  a cell  has a deficiency in

donors.  Then the value for a recipient variable may be chosen at

random from  the donors  within its cell (hot-deck method), or be

set to  the mean  value calculated  across the donors in its cell

(mean method).
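
The two methods can be contrasted with a short Python sketch.  It is a
generic illustration of cell-based imputation, not the procedure used
for the ACSUS files.  The function name, the variables "care" and
"group", and the charge values are made up, and cell collapsing is
noted in a comment but not implemented.

```python
import random
from collections import defaultdict
from statistics import mean

def impute(units, cell_vars, target, method="mean", rng=None):
    """Fill missing `target` values (None) from donors in the same cell."""
    rng = rng or random.Random()
    donors = defaultdict(list)
    for u in units:
        if u[target] is not None:
            donors[tuple(u[v] for v in cell_vars)].append(u[target])
    completed = []
    for u in units:
        filled = dict(u)
        if u[target] is None:
            pool = donors[tuple(u[v] for v in cell_vars)]
            # A real procedure would collapse adjacent cells when the pool
            # is empty; this sketch leaves the value missing instead.
            if pool:
                if method == "hot-deck":
                    filled[target] = rng.choice(pool)   # random donor value
                else:
                    filled[target] = mean(pool)         # cell mean
        completed.append(filled)
    return completed

units = [
    {"care": "ambulatory", "group": "adult", "charge": 100.0},
    {"care": "ambulatory", "group": "adult", "charge": 140.0},
    {"care": "ambulatory", "group": "adult", "charge": None},
    {"care": "inpatient", "group": "adult", "charge": 5000.0},
    {"care": "inpatient", "group": "adult", "charge": None},
]
mean_filled = impute(units, ["care", "group"], "charge")
hot_filled = impute(units, ["care", "group"], "charge",
                    method="hot-deck", rng=random.Random(0))
```

The mean method gives every recipient in a cell the same value, while
the hot-deck method draws an observed donor value, which preserves more
of the variability within the cell.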

6.2     General Imputation Strategy for ACSUS Data

        Both unit  and item nonresponse imputation were performed

on the  ACSUS provider  data.   Only  billing  data  for  medical

providers were imputed.  Information collected from patients was

not imputed, due to a low nonresponse rate.

        For imputation, the patient-reported health care services

and provider  bills for  services  (events)  were  arranged  into

imputation categories according to the following:

           Whether the  data were  collected in  Round 1  or  2.
            (Imputation was  necessary when  data were  collected
            from a provider in one round but, although requested,
            data were not collected in the other.)

           Whether the patient was adult or pediatric.


           Whether the  records for a patient/provider pair fell
            into one of the following care types:

            -   Inpatient Hospital and Nursing Home Stays.

            -   Visits by  Separately Billing  Doctors Associated
                with Inpatient Stays.  (Only for adult patients.)

            -   Emergency Room  Visits, classified  as associated
                with an inpatient stay or not.

            -   Ambulatory Visits,  including visits  to Hospital
                Clinics,  Community   Clinics,  Private   Medical
                Doctors, Mental Health Services for Psychological
                Counseling, and Medical Practitioners.

            -   Home Health Visits from Medical Personnel, Social
                Workers/Case Managers, or Helpers and other types
                of caregivers.

            -   Prescription Medications,  classified by  cost or
                by the drug itself for the most frequently
                purchased medications.

If a  provider event  did not  fall into  one of  these care type

categories, then  this event  was excluded  from  any  imputation

procedure.  For instance, imputation was not performed on Medical

Equipment/Supplies data or on events for which no care type could

be assigned.

        For each category, the events were further organized into

two groups  based on  whether or  not data had been obtained from

the provider  for a  patient and  whether imputation  was  to  be

performed when data were not available.

           Group 1  consisted of  events where  the provider had
            supplied records of events reported by the patient.


           Group 2  consisted of events where the provider was a
            unit nonresponse and met these criteria:

            -   The provider refused;

            -   The  provider   did  not   respond  before   data
                collection was closed;

            -   The provider  was found  to be out of business at
                the time of the survey;

            -   The provider  had purged  all the records for the
                reference period;

            -   The provider had a language barrier or some other
                reason existed for no data; or

            -   There was  no permission form from the patient to
                collect provider data.

If a  patient/provider pair  did not meet one of the criteria for

either Group  1 or  Group 2,  then the  events for  the pair were

excluded from any imputation procedure.

        The ACSUS  imputation  process  proceeded  separately  by

imputation category.   First,  imputation cells  were defined  by

auxiliary variables, such as geographic region of the health

provider and disease stage of the patient.  The choice of

variables was  dependent on  the care  type and  whether adult or

pediatric data were being imputed.  See Section 6.2.1 for details

on cell definitions.

        Then provider  records associated  with Group  1 received

item nonresponse  imputation for  missing charge  components by a

combination of  hot-deck and  mean imputation  methods.   Section


6.2.2 generally outlines this procedure.  These methods varied by

care type,  but not  across rounds or between adult and pediatric

patients.  Table 6-1 lists the charge components imputed for each

care type.

Table 6-1.  Imputed Charge Components by Care Type

        After the  item nonresponse  imputation was completed for

Group 1, a unit nonresponse imputation by the hot-deck method was

completed for  Group 2,  using Group  1 as donors.  For a Group 2

patient-reported event,  a similar  provider  survey  event  from

Group 1  (as defined  by imputation cells) was randomly selected,

and all  the billing  information associated  with this event was

imputed for the Group 2 patient-reported event.  Events that were

imputed were added to the Group 1 event file as separate records.

This imputation  strategy was  used to preserve the relationships

between billing  data and  provider care  types, as  well  as  to

preserve the  multivariate relationships within the components of

the billing information.
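
        The unit nonresponse step can be illustrated with the sketch
below.  It assumes events are held as dictionaries, that Group 1 and
Group 2 have already been separated, and that the billing fields to be
copied are passed in by name; the function name and these arguments are
hypothetical.  The point of the sketch is that every billing field comes
from the same randomly chosen donor event, which is what preserves the
relationships among the components.

```python
import random


def impute_unit_nonresponse(group1, group2, cell_keys, billing_fields, rng=None):
    """Hot-deck whole events: copy all billing fields from one Group 1 donor.

    Hypothetical helper.  Returns the Group 1 events followed by the
    imputed Group 2 events as separate records.
    """
    rng = rng or random.Random(0)
    imputed = []
    for event in group2:
        cell = tuple(event[k] for k in cell_keys)
        pool = [d for d in group1 if tuple(d[k] for k in cell_keys) == cell]
        if not pool:
            # Assumption: events with no donor in their cell are skipped
            # here; the actual procedure collapsed cells instead.
            continue
        donor = rng.choice(pool)
        new_event = dict(event)
        for field in billing_fields:
            new_event[field] = donor[field]  # all fields from the same donor
        new_event["IEVENTFG"] = "Y"          # mark the record as imputed
        imputed.append(new_event)
    return group1 + imputed
```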


6.2.1   Auxiliary Variables Used To Define Imputation Cells

        The variables  listed  below  were  used  to  define  the

imputation cells.   These  variables were  defined by  the source

data   (charge    information)   separately   within   imputation

categories and therefore varied in use among imputation

categories.   For example,  for the  adult Inpatient Stays, seven

different auxiliary variables were used to define similar events.

In  contrast,  the  imputation  process  for  pediatric  patients

utilized only three auxiliary variables for the same care type.

        In the imputations using the hot-deck methods, cells were

defined as  having either  hard  or  soft  boundaries.    A  hard

boundary cell  could not  be collapsed  with another  cell during

imputation.  On the other hand, a soft boundary cell would be

collapsed with another if, within a hard boundary cell,

there was  a deficiency in donors.  Cells with fewer donor events

than recipients  were combined  automatically with  events in the

adjacent soft  boundary cell  until  the  number  of  donors  was

sufficient or  until all  the events  in the  hard boundary cells

were used.   Deficient  cells with hard boundaries were collapsed

manually according to a predefined protocol.
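
        One possible reading of the automatic collapsing rule is
sketched below.  The input is an ordered list of (donor count, recipient
count) pairs for the adjacent soft boundary cells inside a single hard
boundary cell.  The direction of merging and the treatment of a
deficient run at the end of the list are assumptions; the manual
protocol for hard boundary cells is not modeled.

```python
def collapse_soft_cells(cells):
    """Group adjacent soft-boundary cells until donors cover recipients.

    Hypothetical helper.  `cells` is an ordered list of
    (n_donors, n_recipients) tuples within one hard-boundary cell.
    Returns a list of groups, each a list of cell indices.
    """
    groups = []
    current, donors, recipients = [], 0, 0
    for index, (n_donors, n_recipients) in enumerate(cells):
        current.append(index)
        donors += n_donors
        recipients += n_recipients
        if donors >= recipients:
            # Enough donors: close this group and start a new one.
            groups.append(current)
            current, donors, recipients = [], 0, 0
    if current:
        # Assumption: a deficient run at the end joins the previous group,
        # or stands alone if the whole hard-boundary cell is deficient.
        if groups:
            groups[-1].extend(current)
        else:
            groups.append(current)
    return groups
```

A group that is still deficient after all soft boundary cells have been
used corresponds to the situation the text describes as requiring manual
collapsing.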

        Auxiliary variables used during ACSUS imputation were the

following:

           Geographic Group (classified by provider charges)

           Type of Provider Ownership (hospitals only)

           Patient's Insurance Status

           Patient's Disease Stage

           Patient's Exposure Route

           Association  of  an  Emergency  Room  Visit  with  an
            Inpatient Stay

           Type of Reported Ambulatory Care

           Type of Home Health Care

           Type of Prescription Medication

           Length of Stay (for Inpatient Stays)


           Total Charge, or Total Charge per Day or per Visit

           Number of Days of Home Health Care Reported by the
            Patient

           Number of  Prescription Medication Purchases Reported
            by the Patient

6.2.2   Group 1 - Item Nonresponse Imputation Procedures

        Imputation for  this group  was broken  down  into  cases

based on  the patterns of missing information and particular care

types.  The imputation methods did not differ between adult and

pediatric patients.   Provider records which contained no missing

charge components  were used  as donors.   Imputation  cells were

defined according to care type.

        For  Inpatient   Stays,   Separately   Billing   Doctors,

Emergency Room  Visits, and Ambulatory Visits the same imputation

procedures were  used.   If the  total charge  appeared to  be  a

copayment (zero to ten dollars), then all billing information was

set to  missing  for  imputation  purposes  and  all  the  charge

components were  imputed.  For inpatient stays whose total charge

was imputed,  length of  stay was  also imputed.   Three cases of

missing data were imputed as follows.

        Case 1:  Total and  component charges  missing.  The hot-
        deck method  was utilized  to impute  the total  and  all
        charge components from a single donor.  Payment data were
        set to missing.


        Case 2:  Total charge  present, but all component charges
        missing.   The hot-deck method was utilized to impute all
        charge components  from a  single  donor.    The  imputed
        components were adjusted to the original nonimputed total
        by keeping  the same  proportions of  components to total
        that were on the donor event.  Payment data were retained
        as collected.

        Case 3:  Total and  some component  charges missing.  The
        mean method  was utilized.  For each respective component
        charge, the mean (or mean per diem) charge was calculated
        from those  donors where the component charge was greater
        than zero.   This mean (or mean per diem) charge was used
        to impute  the missing  component.   The total charge was
        imputed by  summing across components.  Payment data were
        set to missing.
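
        The arithmetic for Cases 2 and 3 can be sketched as follows.
The function names and the dictionary layout (component name mapped to
charge, with None for a missing component) are assumptions for
illustration; donor selection within the imputation cell is taken as
already done.

```python
from statistics import mean


def impute_case2(recipient_total, donor_components):
    """Case 2: scale the donor's components to the recipient's known total.

    Each component keeps the share of the total it had on the donor event.
    """
    donor_total = sum(donor_components.values())
    if donor_total <= 0:
        raise ValueError("donor total charge must be positive")
    return {
        name: recipient_total * value / donor_total
        for name, value in donor_components.items()
    }


def impute_case3(recipient_components, donors):
    """Case 3: fill missing components with the donor mean, then sum.

    The mean for a component uses only donors whose charge for that
    component is greater than zero.
    """
    filled = dict(recipient_components)
    for name, value in recipient_components.items():
        if value is None:
            positive = [d[name] for d in donors
                        if d.get(name) is not None and d[name] > 0]
            if positive:
                filled[name] = mean(positive)
    total = sum(v for v in filled.values() if v is not None)
    return filled, total
```

For example, a recipient with a reported total of 500 and a donor with
components of 200 and 800 would receive components of 100 and 400 under
Case 2, since the donor's proportions are 20 and 80 percent.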

        For Home  Health Visits  item nonresponse  imputation was

accomplished by  the mean  method.   If the total charge was zero

dollars, then  all billing  information was  set to  missing  for

imputation purposes  and the  total  charge  was  imputed.    The

imputed total  charge was  obtained by  calculating the  mean per

visit charge  from the  donors in the cell and multiplying by the

recipient's number  of visits.   There  were no components of the

total charge.
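
        As an illustration, the Home Health calculation might look like
the sketch below.  The text does not say whether the mean per visit
charge is the average of each donor's per visit charge or the ratio of
summed charges to summed visits; the sketch assumes the former.  The
function name and the (total charge, number of visits) donor layout are
hypothetical.

```python
from statistics import mean


def impute_home_health_total(recipient_visits, donors):
    """Impute a total charge as mean per-visit donor charge times visits.

    Hypothetical helper.  `donors` is a list of (total_charge, n_visits)
    pairs from the recipient's imputation cell.  Assumption: the mean is
    taken over each donor's own per-visit charge.
    """
    per_visit = [charge / visits for charge, visits in donors if visits > 0]
    return mean(per_visit) * recipient_visits
```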

        Item nonresponse  imputation for Prescription Medications

was broken  down into two cases.  Imputation in both cases was by

the mean  method.  If the total charge was zero dollars, then all

billing information  was set  to missing  for imputation purposes

and the  total charge  was imputed.   There were no components of

the total charge.


        Case 1: Top 30 most frequently used medications for adults
        or top  10 for  pediatric patients,  with  at  least  one
        matching donor  on dosage.   If  a  recipient  event  was
        matched to  at least  one donor  with the same medication
        and exact dosage, the imputed value was calculated as the
        mean charge  per  quantity  from  the  matching  donor(s)
        multiplied by the recipient's quantity.  Note that donors
        and recipients  must have  had both  dosage and  quantity
        present.  The payment data were set to missing.

        Case 2:  Medications not in Case 1.  Within an imputation
        cell the recipient events received the mean charge across
        the donors.  The dosage and quantity of the drug were set
        to missing as well as the payment.
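
        The two Prescription Medication cases can be sketched as a
single function.  The field names (drug, dose, qty, charge), the
function name, and the top_drugs argument are hypothetical, and the
donors passed in are assumed to be those in the recipient's imputation
cell.

```python
from statistics import mean


def impute_rx_charge(recipient, donors, top_drugs):
    """Return (imputed_charge, case) for one prescription event.

    Hypothetical helper.  `top_drugs` stands for the set of most
    frequently used medications.  Raises statistics.StatisticsError if
    `donors` is empty.
    """
    has_dose_qty = recipient["dose"] is not None and recipient["qty"] is not None
    if recipient["drug"] in top_drugs and has_dose_qty:
        matches = [
            d for d in donors
            if d["drug"] == recipient["drug"]
            and d["dose"] == recipient["dose"]
            and d["qty"]  # donor quantity must be present and nonzero
        ]
        if matches:
            # Case 1: mean charge per unit of quantity, times recipient quantity.
            per_unit = mean(d["charge"] / d["qty"] for d in matches)
            return per_unit * recipient["qty"], "case1"
    # Case 2: mean charge across all donors in the cell.
    return mean(d["charge"] for d in donors), "case2"
```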

6.3     Limitations of the Imputed ACSUS Data

        The imputation  methods for  the ACSUS  data attempted to

preserve the  relationships between  response  items  subject  to

imputation by  sorting records  in such a way as to group records

by  characteristics   related  to   the   item   being   imputed.

Furthermore, attempts were made to check for outliers in the data

before and after imputation.

        However, if  provider survey  data  are  to  be  used  to

explore relationships  between charge  and payment  data, caution

should be  exercised.  The imputation process considered only the

charge  components   as  described   in  Section  6.2.    Billing

information, other than charges, was set to missing during

the ACSUS item nonresponse imputation process.  This was done

because the imputed charges did not have any real


relationship to  other original  billing information.    Original

charge and  payment data  would be  more appropriate  to  use  in

exploring these relationships.

        The assumption behind the imputation schemes used here is

that after  controlling for  available auxiliary information, the

missing values  are missing  at random.   If this assumption does

not hold,  then estimates using the imputed values could still be

biased due to nonresponse.

6.4     Imputation Variables

        The  provider   billing  files  contain  three  types  of

variables: original,  imputed,  and  imputation  flag  variables.

This section  describes each  of  these  variable  types  and  is

intended to  provide the  user with  an understanding  of how  to

conduct analyses with the imputed data.

6.4.1   Original Variables

        Original variables  are  defined  as  the  original  data

elements as  they appear  in  the  data  collection  instruments.

Original variables  do not  contain  any  imputed  data.    Users


interested in  examining only  unimputed charge  data should  use

these variables for their analyses.

        The provider  billing  survey  collected  information  on

total charges  for a  particular type  of service  event and, for

some  services,   it  also  collected  information  on  component

charges.   Total charges  for  an  event  are  contained  in  the

variable PTE_CHRG.   Component  charges reside  in  a  series  of

variables with the naming convention PEC_*, where * refers to the

specific charge  component.   For example,  PEC_LAB is the amount

charged for laboratory services and PEC_RM is the room charge.

        The provider billing survey also collected information on

total payments  and, for some services, the source of the payment

and the reason that no payment was received.  The amount coded in

variable PTE_PAY  represents the total payment for an event.  The

amounts paid  by specific  sources are  contained in  a series of

variables with the naming convention PEP_*, where * refers to the

source.   For example,  PEP_PRVI is  the amount  paid by  private

insurance and  PEP_MED is  the amount  paid by Medicare/Medicaid.

The reason  that no  payment was  received is  represented  by  a

series of  variables PEN_*,  where *  refers to  the reason.  For

example, PEN_CARE represents Medicare assignment.


6.4.2   Imputed Variables

        For each  original variable  which may have been imputed,

the provider  billing files  have a  counterpart  variable  which

contains either  the imputed  or original  value.   For  example,

PTE_ICHR is  the counterpart  of PTE_CHRG.  If total charges were

imputed for  an event  then the  total imputed charge is coded in

PTE_ICHR.   If total  charges were  not imputed for an event then

the total  charge in  PTE_CHRG is  copied to PTE_ICHR and the two

variables have the same value.

        Imputed charge components reside in a series of variables

with the naming convention PEC_I*, where * refers to the specific

charge component.   For example, PEC_ILAB is the imputed/original

amount  charged  for  laboratory  services  and  PEC_IRM  is  the

imputed/original room charge.  Similar to PTE_ICHR, the PEC_I*

variables contain the same value as the original variable if no

imputation was done.

        The  imputed   amounts  paid   by  specific  sources  are

contained in  a series  of variables  with the  naming convention

PEP_I*, where  * refers  to the source.  For example, PEP_IPRV is

the  imputed/original   amount  paid  by  private  insurance  and

PEP_IMED    is    the    imputed/original    amount    paid    by


Medicare/Medicaid.   The  imputed  reason  that  no  payment  was

received is represented by the series of variables PEN_I*.

        The  provider   billing  files   contain  a   series   of

miscellaneous imputation  variables.   These include  an  imputed

length of  stay (ILOS2) for inpatient stays and imputed drug type

(IDRUGCD), quantity (IQTY) and dose (IDOSE) for pharmacy events.

6.4.3   Flag Variables

        Each imputed  variable has  a counterpart  flag  variable

which is  coded to  reflect the type of imputation that was done.

The imputation flag variables use five different naming

conventions, as follows.

        FGT_I* -  These are a series of imputation flags that are
        assigned  to   total  amount  variables.    For  example,
        FGT_ICHR is  the flag  variable assigned  to the  imputed
        total charge  (PTE_ICHR), FGT_IPAY  is  assigned  to  the
        imputed total amount paid (PTE_IPAY), and so on.

        FGC_I* -  These imputation  flags are assigned to imputed
        component charge  variables.  For example, FGC_IRM is the
        imputation flag  assigned  to  the  imputed  room  charge
        variable (PEC_IRM).

        FGP_I* -  These flags  are assigned  to imputed source of
        payment variables.   For example, FGP_IPRV is assigned to
        the imputed amount paid by private insurance (PEP_IPRV).

        FGN_I* -  These are a series of imputation flags that are
        assigned to  the  imputed  reason  that  no  payment  was
        received.   For example,  FGN_ICAR  is  assigned  to  the
        imputed reason for non-payment due to Medicare assignment
        variable (PEN_ICARE).


        FLG_I*  -  These  flags  are  assigned  to  a  series  of
        miscellaneous imputed  variables.   For example, FLG_IDOS
        is assigned to the imputed drug dosage variable (IDOSE).

        Each of  the imputation flag variables described above is

coded using  a series of alpha and numeric codes that reflect the

type of imputation that was conducted.  Valid codes for each flag

variable and their meanings are described in the provider billing

codebook.  An imputation flag is set to zero if no imputation was

done for  a particular variable.  An imputation flag variable may

contain multiple  codes if  more than  one type of imputation was

conducted (e.g.,  both missing  item and  event imputation) for a

particular variable.
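
        The sketch below shows one way an analyst might use a flag
variable to see how much of a total rests on imputed values.  It assumes
the billing records have been read into dictionaries keyed by variable
name and that a flag of zero may be stored either as the number 0 or as
the character "0"; the valid nonzero codes are in the provider billing
codebook and are not interpreted here.  The function name is
hypothetical.

```python
def summarize_total_charges(records):
    """Sum PTE_ICHR and report the portion whose flag shows imputation.

    Hypothetical helper.  Any FGT_ICHR value other than zero is treated
    as "imputed"; records with a missing PTE_ICHR are skipped.
    """
    total = 0.0
    imputed_total = 0.0
    for rec in records:
        charge = rec.get("PTE_ICHR")
        if charge is None:
            continue
        total += charge
        # Assumption: zero may appear as 0 or "0" depending on how the
        # file was read.
        if str(rec.get("FGT_ICHR", "0")).strip() != "0":
            imputed_total += charge
    share = imputed_total / total if total else 0.0
    return {"total": total, "imputed": imputed_total, "share_imputed": share}
```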

        In addition  to the  imputation flag  variables described

above, there is an overall event flag (IEVENTFG) which indicates

whether or not the entire record was imputed for unit nonresponse.

IEVENTFG is  coded Y  when the  record was  imputed and  is blank

otherwise.   Therefore, analysts  interested in excluding imputed

events from  their analyses  can use  this variable  to  identify

which records to remove.
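
        A minimal sketch of that filtering step follows, assuming the
records are dictionaries keyed by variable name and that a blank
IEVENTFG may arrive as an empty string, spaces, or a missing value
depending on how the file was read.  The function name is hypothetical.

```python
def drop_imputed_events(records):
    """Keep only records whose IEVENTFG is not coded Y.

    Hypothetical helper.  Blank, empty, or missing IEVENTFG values are
    all treated as "not imputed".
    """
    return [
        rec for rec in records
        if (rec.get("IEVENTFG") or "").strip() != "Y"
    ]
```

Records retained by this filter may still contain item-level imputed
values in the imputed variables; the individual flag variables identify
those.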