Chapter 12

Good Research Practices for the Conduct of Observational Database Studies

Bradley C. Martin
Brenda Motheral
John Brooks
Bill Crown
Peter Davey
Dave Hutchins
Paul Stang

Abstract

12.1 Introduction

12.2 Checklist and Discussion

Acknowledgments

References

 

Abstract

Retrospective databases describing health information have become a common source of data for investigators to explore a wide range of economic, epidemiologic, safety, and effectiveness studies. This chapter describes an abridged checklist containing 10 of the most important points to consider when evaluating or designing a retrospective database study. As a quick guide, the 10 points are described summarily in table form followed by a more detailed discussion of each point. This checklist can be viewed as a guide to assess the nuances commonly encountered in retrospective observational studies in the health care arena. Some familiarity with general research principles is assumed and, in order to adequately assess some questions, relevant research training or additional reading will be required.

 

12.1 Introduction

Retrospective databases describing health information have become a common source of data for investigators to explore a wide range of economic, epidemiologic, safety, and effectiveness studies. An important strength of most retrospective databases is that they allow researchers to examine medical care utilization as it occurs in routine clinical care. They often provide large study populations and longer observation periods, allowing for examination of specific subpopulations. In addition, retrospective databases provide a relatively inexpensive and expedient approach for answering the time-sensitive questions posed by decision makers. Two recent studies have suggested that adequately controlled observational studies produce results similar to randomized controlled trials (Concato et al., 2000; Benson and Hartz, 2000). Analyses derived from retrospective observational sources also present an array of limitations and factors that must be considered when conducting an investigation. Because treatment patterns and outcome measures are only observed and never randomized, investigators must overcome selection bias, or endogeneity, that influences treatment selection and the propensity to have some outcome of interest. In addition to the issues of selection bias, researchers using retrospective data sources, particularly those derived from paid claims not collected for research purposes, must address other factors such as data quality including missing data, HIPAA, timeliness, developing appropriate operational definitions, and local coding conventions and practice patterns.

In 2000, the International Society for Pharmacoeconomics and Outcomes Research (ISPOR) convened an expert panel to develop good research practice guidelines for retrospective database studies. The panel members met, developed several drafts, and presented prior versions of the checklist to the ISPOR membership at its U.S. and European meetings to solicit feedback, resulting in a checklist accepted by the ISPOR Board of Directors in 2002. The complete checklist has been published (Motheral et al., 2003) and is available on the Internet (http://www.ispor.org/workpaper/healthscience/ret_dbTFR0203.asp). In 2008–2009, both ISPOR and the U.S. Food and Drug Administration (FDA) were developing additional guidelines on the conduct of observational studies to estimate treatment effects derived from retrospective data sources. Interested readers should follow up with these organizations to identify the latest guidance. The guidelines being developed at the time of this writing will offer readers more clarity and detail regarding specific research design and statistical issues common in these types of studies.

The checklist presented in this chapter represents an abridged checklist containing 10 of the most important points to consider when evaluating a retrospective database study. The complete checklist contains 27 points or questions and should be used when designing a retrospective study or when a more thorough and detailed review of a retrospective database study is warranted, such as when one is serving as a journal referee.

Numerous databases are available for use by researchers, particularly within the U.S. Because the databases have varying purposes, their content can vary dramatically. Accordingly, the unique advantages and disadvantages of a particular database must be considered. In conducting or reviewing a database study, it is important to assess whether the database is suitable for addressing the research question and whether the investigators have used an appropriate methodology in reaching the study conclusions. While the checklist was written in the form of 10 questions to guide readers and decision-makers as they consider a database study, it can also serve as a guide to researchers designing, analyzing, or reporting retrospective studies. This checklist is intended to raise general issues, not to offer detailed prescriptive recommendations. Some familiarity with general research principles is assumed and, in order to adequately assess some questions, relevant research training or additional reading will be required. This is particularly true for assessing questions relating to statistics. Other important issues in the use of retrospective databases, including patient confidentiality, credibility, and study sponsorship, are not addressed in this checklist. Readers should consult with their own professional society’s guides and their own institutional guidelines on conflicts of interest, research involving human subjects, or other guidelines to inform them on these important issues. Other chapters in this text describe the statistical techniques and the SAS programming steps that can be used to implement some of the issues covered in this chapter; however, this checklist does not offer specific SAS programming steps.

 

12.1.1 How Should the Checklist Be Used?

This checklist was developed primarily for the commonly used medical claims or encounter-based databases but could potentially be used to assess retrospective studies that employ other types of databases, such as disease registries and national survey data. The checklist is meant to serve as a supplement to already available checklists for economic evaluations (Clemens et al., 1995; Weinstein et al., 1996). Only those issues that are unique to database studies or are particularly problematic in database research were included in the checklist. Not every question will be applicable to every study, but the checklist should prompt researchers to at least consider important factors affecting the quality of the study.

 

12.2 Checklist and Discussion

1. Database Relevance: Has the database content and study population been described in sufficient detail to determine the rationale for using the database to answer the research question and to assess how the findings can be interpreted in the context of other organizations?
2. Database Quality: Have the reliability and validity of the data been described, including any data quality checks and handling of missing data?
3. Research Plan: Has an a priori research plan been developed, including a rationale for selecting the particular research design, and have the potential limitations of that design been acknowledged?
4. Sample Selection: Have inclusion and exclusion criteria been used to derive the final sample from the initial population, has the rationale for their use been described, and has the impact of these criteria on sample representativeness been discussed?
5. Variable Definitions: Has a rationale and/or supporting literature for the selection criteria and variable definitions been provided and were sensitivity analyses performed for definitions or criteria that are controversial, uncertain, or novel?
6. Resource Valuation: For studies that examine costs, has a method and rationale for valuing resources (costs, charges, payments, fee schedules) been described, and is it consistent with the study perspective?
7. Confounding: If the goal of the study is to examine treatment effects, have the authors adequately controlled for confounding variables through use of a comparison group and i) multivariate statistical techniques or ii) stratification of the sample by different levels of the confounding variables to compare outcomes?
8. Statistical Analysis: Have the appropriate statistical techniques been used, taking into account the particular nuances of utilization and cost data, such as skewness and correlations within and among population subgroups?
9. Practical Significance: Has the practical significance of the findings been explained by discussing the statistical versus clinical or economic significance of the results and the variance explained/goodness of fit of the statistical models?
10. Theoretical Basis: Has a theory for the findings been provided and have alternative explanations for the observed findings been discussed?

1. Relevance: Has the database content and study population been described in sufficient detail to determine the rationale for using the database to answer the research question and to assess how the findings can be interpreted in the context of other organizations?

Each database represents a particular situation in terms of study population, benefit coverage, and service organization. To appropriately interpret a study, key attributes should be described, including the sociodemographic and health care profile of the population, limitations on available services, such as those imposed by socialized medicine, plan characteristics, and benefit design (for example, physician reimbursement approach, cost-sharing for office visits, drug exclusions, mental health carve outs). For example, in an economic evaluation that compares two drugs, it would be important to know the formulary status of the drugs as well as any other pharmacy benefit characteristics that could affect the use of the drugs, such as step therapy, compliance programs, and drug utilization review programs.

2. Data Quality: Have the reliability and validity of the data been described, including any data quality checks and handling of missing data?

With any research data set, quality assurance checks are necessary to determine the reliability and validity of the data, keeping in mind that reliability and validity are not static attributes of a database but can vary dramatically depending on the questions asked and the analyses performed. Quality checks are particularly important with administrative databases from health care payers and providers because the data were originally collected for purposes other than research, most often for claims processing and payment. This fact creates a number of potential challenges for conducting research.

First, services may not be captured in the claims database because the particular service is not covered by the plan sponsor or because the service is carved out and not captured in the data set (for example, mental health). Second, data fields that are not required for reimbursement may be particularly unreliable. Similarly, data from providers who are paid on a capitated basis often have limited utility because providers may not be required to report detailed utilization information. Third, changes in reporting/coding over time or differences across study groups can result in unreliable data as well. It is common for procedure codes and drug codes, among others, to change over time, and the frequency with which particular codes are used can change over time as well, often in response to changes in health plan reimbursement policies.

For all these reasons, investigators should describe the quality assurance checks performed and any steps taken to normalize the data or otherwise eliminate data suspected to be unreliable or invalid, particularly when there is the potential to bias results to favor one study group over another (for example, outliers). The authors should describe any relevant changes in reporting/coding that may have occurred over time and how such variation affects the study findings. Data quality should be addressed even when the data have been pre-processed (for example, grouped into episodes) prior to use by the researcher. Examples of important quality checks include missing and out-of-range values, consistency of data (for example, patient age), claim duplicates, and comparison of data figures to established norms (for example, rates of asthma diagnosis compared with prevalence figures). Some studies cite previous literature in which the database’s reliability and validity have been examined.
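As a minimal illustration, quality checks such as these can be automated before any analysis begins. The Python sketch below is illustrative only; the field names (claim_id, age, paid_amount) and the plausible age range are assumptions, not values from any particular database.

```python
# Sketch of basic quality-assurance checks on claim records. Field names and
# thresholds are hypothetical; real databases require database-specific rules.
from collections import Counter

claims = [
    {"claim_id": "C1", "age": 34, "paid_amount": 120.50},
    {"claim_id": "C2", "age": 210, "paid_amount": 80.00},   # out-of-range age
    {"claim_id": "C1", "age": 34, "paid_amount": 120.50},   # duplicate claim
    {"claim_id": "C3", "age": None, "paid_amount": 45.25},  # missing age
]

def quality_report(records):
    """Count missing ages, implausible ages, and duplicate claim IDs."""
    missing = sum(1 for r in records if r["age"] is None)
    out_of_range = sum(
        1 for r in records
        if r["age"] is not None and not 0 <= r["age"] <= 115  # assumed range
    )
    id_counts = Counter(r["claim_id"] for r in records)
    duplicates = sum(n - 1 for n in id_counts.values() if n > 1)
    return {"missing_age": missing,
            "out_of_range_age": out_of_range,
            "duplicate_claims": duplicates}

print(quality_report(claims))
```

Reporting counts like these alongside the study results lets readers judge whether suspect records could bias one study group over another.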

3. A Priori Research Plan: Has an a priori research plan been developed, including a rationale for selecting the particular research design, and have the potential limitations of that design been acknowledged?

One of the easiest ways to drive results in a certain direction would be to impose post hoc changes in the research plan or design. A research plan describing sample inclusion criteria, variable definitions, model specifications, and statistical approaches should be developed prior to initiating the research and the results based on the a priori plan should be described. Naturally, investigators acquire new knowledge through the conduct of a study, and it is often important to implement post hoc design changes. The research report, however, should clearly describe any post hoc decisions and, when relevant, report the results of both the a priori plan and after applying any post hoc decisions.

Many research designs (for example, pre-post with control group) are available to the investigator, each with particular strengths and weaknesses, depending on setting, research question, and data. The investigator should provide a clear rationale for the selection of the design and describe the salient strengths and weaknesses of the design, including how potential biases will be addressed.

4. Sample Selection: Have inclusion and exclusion criteria been used to derive the final sample from the initial population, has the rationale for their use been described, and has the impact of these criteria on sample representativeness been discussed?

The inclusion/exclusion criteria are the minimum rules that are applied to each potential subject’s data in an effort to define a study group(s). Regardless of the database used, the inclusion/exclusion criteria can dramatically change the composition of the study group(s). First, has the number of subjects been reported for the initial population and for the sample remaining after the application of each inclusion and exclusion criterion? In other words, is it clear who and how many individuals were excluded and why?

Second, was there a discussion of the impact of study inclusion and exclusion criteria on study findings, because the inclusion/exclusion criteria can bias the selection of the population and distort the applicability of the study findings? For example, continuous eligibility during the study period is a common inclusion criterion for database studies. However, in government entitlement programs where eligibility is determined monthly, limiting the study population to only those with continuous eligibility would tend to include the sickest patients because they would most likely remain in conditions that make them eligible for coverage. The extent to which this would affect the applicability of study findings depends upon the study question.
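One way to make exclusions transparent is to track attrition as each criterion is applied in sequence. The following sketch is hypothetical: the subject fields and the three criteria are invented for illustration, not drawn from the checklist.

```python
# Sketch of sequential inclusion/exclusion with a sample count after each
# criterion. All subject records and criteria are hypothetical.
subjects = [
    {"id": 1, "age": 45, "continuously_enrolled": True,  "has_diagnosis": True},
    {"id": 2, "age": 17, "continuously_enrolled": True,  "has_diagnosis": True},
    {"id": 3, "age": 60, "continuously_enrolled": False, "has_diagnosis": True},
    {"id": 4, "age": 52, "continuously_enrolled": True,  "has_diagnosis": False},
]

criteria = [
    ("age >= 18",             lambda s: s["age"] >= 18),
    ("continuous enrollment", lambda s: s["continuously_enrolled"]),
    ("qualifying diagnosis",  lambda s: s["has_diagnosis"]),
]

remaining = subjects
attrition = [("initial population", len(remaining))]
for label, rule in criteria:
    remaining = [s for s in remaining if rule(s)]
    attrition.append((label, len(remaining)))

for label, n in attrition:
    print(f"{label}: n = {n}")
```

An attrition table in this form shows readers exactly how many subjects each criterion removed, which is the information the checklist question asks for.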

5. Variable Definitions: Has a rationale and/or supporting literature for the selection criteria and variable definitions been provided and were sensitivity analyses performed for definitions or criteria that are controversial, uncertain, or novel?

Operational definitions are required to identify cases (subjects) and endpoints (outcomes), often using diagnosis codes, medication use, and/or procedure codes to indicate the presence or absence of a disease or treatment. The operational definition(s) for all variables should be provided because different definitions can potentially lead to different results and interpretations. For example, investigators attempting to identify group(s) of persons with a particular disorder (for example, Alzheimer’s disease) should provide a rationale and, when possible, cite evidence that a particular set of coding criteria (ICD-9-CM, CPT-4, drug codes) is valid. Ideally, this evidence would take the form of validation against a primary source but more often will involve the citation of previous research.

When there is controversial evidence or uncertainty about such definitions, the investigator should perform a sensitivity analysis using alternative definitions to examine the impact of these different ways of defining events. Sensitivity analysis tests different values or combinations of factors that define a critical measure in an effort to determine how those differences in definition affect the results and interpretation. Databases allow investigators to perform sensitivity analyses in a hierarchical fashion (sometimes termed “caseness”), in which the analysis is conducted using different definitions or levels of certainty (for example, definite, probable, and possible cases).
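A hierarchical case definition can be sketched as a set of progressively looser rules applied to the same patients. The thresholds below (counts of diagnosis and drug claims) are hypothetical, chosen only to show how case counts shift across levels of certainty.

```python
# Sketch of a hierarchical ("caseness") sensitivity analysis: case counts
# under progressively looser definitions. All thresholds are hypothetical.
patients = [
    {"dx_claims": 3, "drug_claims": 2},
    {"dx_claims": 2, "drug_claims": 0},
    {"dx_claims": 1, "drug_claims": 0},
    {"dx_claims": 0, "drug_claims": 1},
]

definitions = {
    "definite": lambda p: p["dx_claims"] >= 2 and p["drug_claims"] >= 1,
    "probable": lambda p: p["dx_claims"] >= 2,
    "possible": lambda p: p["dx_claims"] >= 1,
}

case_counts = {name: sum(rule(p) for p in patients)
               for name, rule in definitions.items()}
print(case_counts)
```

Re-running the full analysis under each definition then shows whether the study conclusions are robust to the choice of case definition.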

For economic evaluations, a particularly challenging issue is the identification of disease-related costs in a claims database. For example, when studying depression, does one include only services with a depression diagnosis, those with a depression-related code (for example, anxiety), or all services regardless of the accompanying diagnosis code? As mentioned earlier, sensitivity analyses of varying operational definitions are important in these situations.

6. Resource Valuation: For studies that examine costs, has a method and rationale for valuing resources (costs, charges, payments, fee schedules) been described, and is it consistent with the study perspective?

As with any economic evaluation, reviewers should ensure that the resource costs included in the analysis match the responsibilities of the decision-maker whose perspective is taken in the research. For example, if the study is from the perspective of the insurer, the resource list should include only those resources that will be paid for by the insurer, which would exclude member co-pays and noncovered services (for example, over-the-counter medications).

Likewise, the resource should be valued in a manner that is consistent with the perspective. For a variety of reasons, the resource price information available within retrospective databases may provide an imperfect measure of the actual resource price. Typically, claims data provide a number of cost figures, including the submitted charge, eligible charge, amount paid, and member co-pay. The perspective of the study determines which cost figure to use. Rarely would the submitted charge be used, because few payers, if any, actually pay this price. When the perspective is the insurer or plan sponsor, one typically expects the amount paid to be used to value the resource consumed. However, if trying to generalize findings beyond a specific plan, an investigator may use an average discount off the charge minus an average member co-pay to arrive at an amount paid. This standardized amount is then applied to actual utilization.

That said, reported costs may not always reflect additional discounts, rebates, and other negotiated arrangements. These additional price considerations can be particularly important for economic evaluations of drug therapies, where rebates can represent a significant portion of the drug cost. In addition, prices will vary over time with inflation and across geographic areas with differences in the cost of living. In most cases, prices should be adjusted to a reference year and place using relevant price indexes.
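Adjusting prices to a reference year is a simple ratio of index values. The sketch below uses invented index numbers, not actual price-index figures, purely to show the mechanics.

```python
# Sketch of adjusting observed costs to a common reference year with a
# price index. Index values below are illustrative, not real CPI figures.
price_index = {2005: 100.0, 2006: 103.5, 2007: 107.1}
REFERENCE_YEAR = 2007

def adjust_to_reference(cost, year):
    """Inflate/deflate a cost from its observation year to the reference year."""
    return cost * price_index[REFERENCE_YEAR] / price_index[year]

print(round(adjust_to_reference(100.0, 2005), 2))
```

The same pattern applies to geographic adjustment, with an area wage or cost-of-living index replacing the year index.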

7. Confounding: If the goal of the study is to examine treatment effects, have the authors adequately controlled for confounding variables through use of a comparison group and i) multivariate statistical techniques or ii) stratification of the sample by different levels of the confounding variables to compare outcomes?

One of the greatest dangers in retrospective database studies is incorrectly attributing an effect to a treatment that is actually due, at least partly, to some other variable. If the investigation attempts to make inferences about a particular intervention, a design in which there is no control group is rarely adequate. Without a control group (persons not exposed to an intervention) or comparison group (persons exposed to a different intervention), there often exist too many potential biases that could otherwise account for an observed treatment effect. Even with a control group, failure to account for the effects of all variables that have an important influence on the outcome of interest can lead to biased estimates of treatment effects. Two common approaches for addressing this problem include using regression modeling techniques and stratifying the sample by different levels of the confounding variables, comparing treatments within strata of potential confounders (for example, age, gender). Each of these approaches has strengths and weaknesses.
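The stratification approach can be sketched briefly: outcomes are compared between treatment arms within each level of the confounder rather than in the pooled sample. The records and stratum labels below are hypothetical.

```python
# Sketch of comparing mean outcomes within strata of a potential confounder
# (age group) rather than in the pooled sample. All records are illustrative.
from statistics import mean
from collections import defaultdict

records = [
    {"treated": True,  "age_group": "18-44", "cost": 900.0},
    {"treated": False, "age_group": "18-44", "cost": 800.0},
    {"treated": True,  "age_group": "45-64", "cost": 1500.0},
    {"treated": False, "age_group": "45-64", "cost": 1400.0},
]

by_stratum = defaultdict(lambda: {"treated": [], "control": []})
for r in records:
    arm = "treated" if r["treated"] else "control"
    by_stratum[r["age_group"]][arm].append(r["cost"])

stratum_diffs = {g: mean(arms["treated"]) - mean(arms["control"])
                 for g, arms in by_stratum.items()}
print(stratum_diffs)
```

Stratification is transparent but becomes impractical with many confounders, which is when regression modeling is typically preferred.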

8. Statistical Analysis: Have the appropriate statistical techniques been used, taking into account the particular nuances of utilization and cost data, such as skewness and correlations within and among population subgroups?

Statistical methods are based upon a variety of underlying assumptions. Often these stem from the distributional characteristics of the data being analyzed. As a result, in any given retrospective analysis, some statistical methods will be more appropriate than others. Authors should explain the reasons why they chose the statistical methods that were used in the analysis. There is rarely, if ever, a statistical estimation approach that is singularly the most appropriate. When there is uncertainty in selecting various statistical or modeling approaches, sensitivity analyses should ideally be conducted to explore the impact the modeling approach has on study findings. There are instances when the modeling approach can have profound impacts on the study results, particularly when contrasting instrumental variable approaches with traditional regression-based approaches (Stukel et al., 2007).
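The skewness the checklist question refers to is easy to demonstrate: a few catastrophic cases pull the mean of cost data far above the median, violating the assumptions of methods built for symmetric distributions. The figures below are invented for illustration.

```python
# Illustrative sketch of the right skew typical of cost data: one
# catastrophic case pulls the mean far above the median.
from statistics import mean, median

costs = [100, 150, 200, 250, 300, 12000]  # hypothetical per-patient costs
print(f"mean = {mean(costs):.2f}, median = {median(costs):.2f}")
```

A gap of this size between mean and median is one signal that a transformation, a generalized linear model, or a nonparametric approach may be more appropriate than ordinary least squares.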

9. Practical Significance: Has the practical significance of the findings been explained by discussing the statistical versus clinical or economic significance of the results and the variance explained/goodness of fit of the statistical models?

In retrospective database studies, the sample sizes are often extremely large, which can render practically meaningless differences statistically significant. Conversely, in studies with relatively small sample sizes, the large variance in cost data can render meaningful differences statistically insignificant. Accordingly, it is imperative that both the statistical and the clinical or economic relevance of the findings be discussed.
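The large-sample point can be shown with a simple equal-variance t statistic: a difference of half a percent of the mean, which few clinicians would call meaningful, becomes highly significant once each group has 100,000 subjects. The numbers are illustrative.

```python
# Sketch of how very large samples make trivially small differences
# statistically significant. All numbers are illustrative.
from math import sqrt

def t_stat(mean1, mean2, sd, n):
    """Two-sample t statistic with equal group sizes n and a common SD."""
    return (mean1 - mean2) / (sd * sqrt(2 / n))

# A 0.5-unit difference on a mean of 100 (SD = 10):
print(round(t_stat(100.5, 100.0, 10.0, 100_000), 2))  # huge n: significant
print(round(t_stat(100.5, 100.0, 10.0, 50), 2))       # small n: not significant
```

The same difference is nowhere near significance at n = 50, which is why effect sizes and economic magnitudes, not p-values alone, must anchor the discussion.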

In addition, authors should provide the reader with information about how well the model predicts what it is intended to predict. Numerous approaches, such as goodness of fit or split samples, can be used. For example, in ordinary least squares regression models, the adjusted R-square, which measures the proportion of the variance in the dependent variable explained by the model, is a useful measure. Nonlinear models have less intuitive goodness-of-fit measures. Models based on micro-level data (for example, patient episodes) can be good fits even if the proportion of the variance in the outcome variable that they explain is 10% or less. In fact, models based on micro-level data that explain more than 50% of the variation in the dependent variable should be viewed with suspicion.
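For ordinary least squares, the adjusted R-square mentioned above has a closed form that penalizes the raw R-square for the number of predictors. The values passed in below are illustrative.

```python
# Sketch of the adjusted R-square formula, which penalizes R-square for
# the number of predictors; the inputs are illustrative.
def adjusted_r2(r2, n, k):
    """Adjusted R-square for n observations and k predictors (excluding intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# A micro-level model explaining 12% of the variance in 10,000 observations:
print(round(adjusted_r2(0.12, 10_000, 8), 4))
```

With large n the penalty is small, so a modest R-square on micro-level data is essentially unchanged after adjustment, consistent with the point that such models can fit well while explaining 10% or less of the variance.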

10. Theoretical Basis: Has a theory for the findings been provided and have alternative explanations for the observed findings been discussed?

Because large sample sizes render many statistically significant findings of questionable meaning, it is essential that the investigator provide a theory (economic, clinical, behavioral, and so on) that explains the observed findings. The examination of causal relationships is a particular challenge with retrospective database studies because subjects are not randomized to treatments. Accordingly, the burden is on the author to rule out plausible alternative explanations to the findings when examining relationships between two variables.

 

Acknowledgments

We would like to recognize the efforts of Fredrik Berggren, James Chan, Mary Ann Clark, Sueellen Curkendall, Bill Edell, Shelah Leader, Marianne McCollum, Newell McElwee, and John Walt, reference group members who provided comments on earlier drafts.

 

References

Benson, K., and A. J. Hartz. 2000. “A comparison of observational studies and randomized, controlled trials.” The New England Journal of Medicine 342(25): 1878–1886.

Clemens, K., R. Townsend, F. Luscombe, et al. 1995. “Methodological and conduct principles for pharmacoeconomic research.” PharmacoEconomics 8(2): 169–174.

Concato, J., N. Shah, and R. I. Horwitz. 2000. “Randomized, controlled trials, observational studies, and the hierarchy of research designs.” The New England Journal of Medicine 342(25): 1887–1892.

Motheral, B., J. Brooks, M. A. Clark, et al. 2003. “A checklist for retrospective database studies: report of the ISPOR task force on retrospective databases.” Value in Health 6(2): 90–97.

Stukel, T. A., E. S. Fisher, D. E. Wennberg, et al. 2007. “Analysis of observational studies in the presence of treatment selection bias: effects of invasive cardiac management on AMI survival using propensity score and instrumental variable methods.” The Journal of the American Medical Association 297 (3): 278–285.

Weinstein, M. C., J. E. Siegel, M. R. Gold, et al. 1996. “Recommendations of the panel on cost-effectiveness in health and medicine.” The Journal of the American Medical Association 276(15): 1253–1258.
