Preparing for Analysis

Problem Formulation

Problems in health care arise from the desire to increase efficiency, contain costs, improve the quality of care, and gain new clinical knowledge. All too often, data is collected and analysis begins before the problem has been clearly posed. This inevitably results in rework, frustrated stakeholders, project delays, and wasted resources. Time spent on developing a clear, complete problem statement is time well spent. Many requests for data analysis start as an idea or question that has not been fully thought through. A good first step is to write the problem down as you understand it and have the stakeholders review it. Incorporate their comments, and once you are satisfied that the problem has been captured correctly, obtain consensus from all stakeholders before proceeding. Several iterations may be required before the problem is properly captured.
Central to the problem statement is the research question, which is a concise statement of the desired inquiry. A research question must be specific so that data can be obtained and analyzed to answer the question. For example, “what is a good hospital stay?” is not specific enough. What must be specified are the characteristics of a “good” hospital stay (e.g., quality of care, cost, food, condition of the facility) and from whose perspective (e.g., patient, clinician, insurer). From the research question, the data needed can be identified and the appropriate statistical methods selected. Everything rests on the problem statement and research question.
Consider the following example. A hospital board has set a goal of improving patient satisfaction with hospital stays while maintaining quality of care and keeping costs low. A patient survey is first conducted to assess current patient satisfaction. The survey will address patient satisfaction with overall quality of care, overall cost to the patient, food quality, and condition of the facility. In addition to Likert-scale rating questions, the survey will include some choice-model questions to determine patients’ priorities among the potential improvement areas. For example, “Which of the following is more important to you: improvements in overall cost, improvements in qualifications/training/prestige of health care professionals, or improvements in patient comfort?” At the conclusion of this survey phase, the board will identify the areas of poor patient satisfaction and rank them according to patients’ priorities. In the second phase of the study, the board will propose potential improvements for the identified areas and investigate the costs associated with these changes. A recommendation report will then be drafted and presented to the hospital administration.
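The end of the survey phase can be sketched in a few lines: flag areas with low average ratings, then order the flagged areas by how often respondents named them as most important. This is a minimal sketch under stated assumptions. The ratings, the area names, and the 3.0 cutoff are hypothetical, and a simple tally of first choices stands in for a formal choice-model analysis.

```python
from collections import Counter
from statistics import mean

# Hypothetical Likert ratings (1 = very dissatisfied, 5 = very satisfied)
ratings = {
    "quality_of_care": [4, 5, 4, 3, 4],
    "cost": [2, 1, 3, 2, 2],
    "food": [2, 3, 2, 1, 3],
    "facility": [4, 3, 4, 4, 5],
}

# Hypothetical choice responses: the area each respondent said matters most
top_choices = ["cost", "cost", "food", "quality_of_care", "cost"]


def poor_areas(ratings, threshold=3.0):
    """Return areas whose mean rating falls below a hypothetical threshold."""
    return [area for area, scores in ratings.items() if mean(scores) < threshold]


def rank_by_priority(areas, choices):
    """Order areas by how often respondents chose them as most important.

    A plain tally of first choices; a real choice-model analysis would
    fit a statistical model to the choice data instead.
    """
    counts = Counter(choices)
    return sorted(areas, key=lambda area: counts[area], reverse=True)


flagged = poor_areas(ratings)
print(rank_by_priority(flagged, top_choices))  # ['cost', 'food']
```

With these made-up numbers, cost (mean 2.0) and food (mean 2.2) are flagged, and cost ranks first because three of the five respondents chose it.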

Data Acquisition

Once the problem statement is complete, data must be obtained that can be analyzed to address the research question. There are three choices: collect the data, use existing data, or combine both.
Data can be collected by an experiment, observation, or survey. Such studies must be carefully designed to obtain high-quality data and can require considerable time, effort, and funding. When human subjects are involved, approval must be obtained from an institutional review board. Studies are designed to generate the data that will meet the needs of your specific problem.
Data from an existing source may be obtained quickly, but since it was not generated for your specific purpose, additional processing may be needed. Another concern when using existing sources is the quality of the data and the potential for bias arising from how the data was collected. When using existing data, strive to obtain data from reputable, independent sources. For example, if you are studying mortality due to gun violence, data from a government or independent organization is preferable to data from a political or special interest group. To select appropriate statistical methods for existing data, you often need to understand the study design and the conditions under which the data was obtained.
Whether you collect the data yourself or obtain it from someone else will depend on the problem, your project timeline, and the resources available. If the cost of obtaining the necessary data exceeds your available resources, you should reduce the scope of your inquiry or delay the project until sufficient funding is available. Your statistical analysis, the resulting conclusions, and any actions taken all depend on the quality of your data. More detail on study designs encountered in health care can be found in the text by Rossner (2015). Fowler (2013) provides information on conducting survey research.

Data Preparation and Management

Inevitably, some data processing must be performed regardless of the way in which your data was obtained. In some cases, extensive effort is needed to prepare the data for analysis. It is common to encounter data errors, missing values, and data not in the required format. Project plans frequently underestimate the time and effort required for such data preparation.
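Assessing how much data is missing is usually one of the first preparation steps. The sketch below counts missing values per column and reports them as a percentage of rows. It is an illustration of the idea only, not how JMP's Count Missing Responses works internally; the records, the column names, and the convention that `None` or a blank string means "missing" are all assumptions.

```python
# Hypothetical patient records; None or a blank string marks a missing value
records = [
    {"patient_id": "P001", "age": 67, "los_days": 4, "discharge_code": 1},
    {"patient_id": "P002", "age": None, "los_days": 2, "discharge_code": ""},
    {"patient_id": "P003", "age": 54, "los_days": None, "discharge_code": 2},
    {"patient_id": "P004", "age": 71, "los_days": 6, "discharge_code": 1},
]


def is_missing(value):
    """Treat None and empty or whitespace-only strings as missing."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def missing_summary(rows):
    """Return {column: (number missing, percent missing)} across all rows."""
    # Collect every column name seen in any row, preserving first-seen order
    columns = list(dict.fromkeys(key for row in rows for key in row))
    summary = {}
    for col in columns:
        # A row that lacks the column entirely also counts as missing
        n_missing = sum(is_missing(row.get(col)) for row in rows)
        summary[col] = (n_missing, 100.0 * n_missing / len(rows))
    return summary


for col, (n, pct) in missing_summary(records).items():
    print(f"{col}: {n} missing ({pct:.0f}%)")
```

Here `age`, `los_days`, and `discharge_code` are each missing in one of four rows (25%), which would prompt a closer look before any analysis that depends on those columns.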
Defining data elements and their associated units is a vital part of data preparation. Such documentation helps the analyst report results and conduct future investigations, and it helps other analysts who use the data later. Definitions are essential in large data collections, where data elements are often “encoded,” meaning that a numeric code is stored instead of a more descriptive entry. Encoding reduces the size of the data repository, minimizes data entry errors, and allows descriptions to be modified over time. Codebooks provide the detailed data definitions. While codes are efficient from a data management perspective, they are generally not meaningful to stakeholders, so the code descriptions will be needed when preparing reports and visualizations for stakeholders. When collecting your own data, be sure that data definitions and units are documented in the initial data preparation phase. JMP provides features to facilitate documentation, such as column notes, value labels, and the ability to store documents and data together in a JMP Project.
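To make the role of a codebook concrete, here is a minimal sketch of decoding stored numeric codes into descriptions for a report, in the spirit of JMP value labels. The codebook entries, field names, and units are hypothetical. Undefined codes are flagged rather than guessed, since an unexpected code usually signals a data error.

```python
# Hypothetical codebook: numeric codes stored in the data -> descriptions
CODEBOOK = {
    "admit_type": {1: "Emergency", 2: "Urgent", 3: "Elective"},
    "sex": {1: "Male", 2: "Female", 9: "Unknown"},
}

# Hypothetical units documented alongside the data elements
UNITS = {"los_days": "days", "weight": "kg"}


def decode(record, codebook=CODEBOOK):
    """Replace coded values with their descriptions; leave uncoded fields as-is."""
    decoded = {}
    for field, value in record.items():
        labels = codebook.get(field)
        if labels is None:
            decoded[field] = value
        else:
            # Flag codes that the codebook does not define instead of guessing
            decoded[field] = labels.get(value, f"UNDEFINED CODE {value}")
    return decoded


def column_label(field):
    """Build a report label that carries the documented unit, if any."""
    unit = UNITS.get(field)
    return f"{field} ({unit})" if unit else field


row = {"admit_type": 1, "sex": 2, "los_days": 3}
print(decode(row))              # {'admit_type': 'Emergency', 'sex': 'Female', 'los_days': 3}
print(column_label("los_days"))  # los_days (days)
```

Keeping the codebook and units as data, separate from the records, mirrors the point above: descriptions can change over time without touching the stored codes.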
JMP provides many features for manipulating data. Table 1.1, Data Operations and JMP Features in this Casebook, summarizes some of the commonly used data operations that are illustrated in this casebook. It is by no means an exhaustive list. More detail on data management can be found in the DAMA Guide to the Data Management Body of Knowledge (DAMA International, 2017).
Table 1.1 Data Operations and JMP Features in this Casebook

| Operation | JMP Feature |
|---|---|
| Data element definition and documentation | Column Information > Notes |
| Assessing the extent of missing data | Consumer Research > Categorical > Count Missing Responses |
| Arithmetic transformations/units conversion | Formula Editor |
| Assign correct measurement level | Data and Modeling Type |
| Combine data from two data sets, linking by common data element(s) | Tables > Join |
| Subsetting data | Rows > Data Filter |
| Concatenating data elements | Formula Editor function Concat |
| Create a new variable from existing data elements (derived data) | Formula Editor; Cols > Recode |
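Several of the operations in Table 1.1 can be illustrated outside JMP as well. The following sketch joins two tables on a common data element, converts units, concatenates name fields, recodes length of stay into a derived variable, and subsets the result. The tables, field names, and 7-day cutoff are hypothetical, and this only parallels the JMP features listed; it does not reproduce their options.

```python
# Hypothetical example tables (illustrative values, not real data)
patients = [
    {"patient_id": "P001", "first": "Ana", "last": "Lopez", "weight_lb": 154.0},
    {"patient_id": "P002", "first": "Sam", "last": "Reed", "weight_lb": 198.0},
]
encounters = [
    {"patient_id": "P001", "ward": "Cardiology", "los_days": 4},
    {"patient_id": "P002", "ward": "Oncology", "los_days": 9},
    {"patient_id": "P003", "ward": "Cardiology", "los_days": 2},
]

LB_TO_KG = 0.45359237  # 1 lb is defined as exactly 0.45359237 kg


def inner_join(left, right, key):
    """Keep right-hand rows whose key matches a left-hand row (assumes unique keys in left)."""
    index = {row[key]: row for row in left}
    return [{**index[r[key]], **r} for r in right if r[key] in index]


def derive(row, long_stay_cutoff=7):
    """Add derived data elements; the 7-day cutoff is a hypothetical choice."""
    out = dict(row)
    out["weight_kg"] = round(row["weight_lb"] * LB_TO_KG, 1)  # units conversion
    out["full_name"] = f'{row["first"]} {row["last"]}'        # concatenation
    out["long_stay"] = "Yes" if row["los_days"] > long_stay_cutoff else "No"  # recode
    return out


joined = [derive(r) for r in inner_join(patients, encounters, "patient_id")]
cardiology = [r for r in joined if r["ward"] == "Cardiology"]  # subsetting

for r in joined:
    print(r["patient_id"], r["full_name"], r["weight_kg"], r["long_stay"])
```

Note that encounter P003 has no matching demographic record, so this inner join silently drops it. Comparing row counts before and after any join is a simple way to catch such mismatches.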
The selection of an appropriate statistical method depends on the measurement levels of the response and predictor variables. Measurement levels are expressed in JMP by defining the data type and modeling type of each column. In some circumstances, data elements imported into JMP will not default to the correct data or modeling type. Before beginning a JMP analysis, check that each data element is assigned the correct data and modeling types.
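Why the modeling type matters can be shown with a small sketch: numeric codes for admission type imported as continuous data would yield a meaningless mean (1.6 for the codes below), whereas treating them as nominal yields counts. The column names and metadata dictionary are hypothetical; only the three type names mirror JMP's Continuous, Ordinal, and Nominal modeling types, and choosing one summary per type is a deliberate simplification of method selection.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical column metadata using JMP's modeling-type names
MODELING_TYPE = {
    "los_days": "Continuous",
    "pain_score": "Ordinal",   # ordered ratings, not necessarily equally spaced
    "admit_type": "Nominal",   # codes 1, 2, 3 are labels, not quantities
}


def summarize(column, values):
    """Pick a summary that respects the column's declared modeling type."""
    mtype = MODELING_TYPE[column]
    if mtype == "Continuous":
        return {"mean": mean(values)}
    if mtype == "Ordinal":
        return {"median": median(values)}
    if mtype == "Nominal":
        return {"counts": dict(Counter(values))}
    raise ValueError(f"Unknown modeling type for {column}: {mtype}")


admit_codes = [1, 3, 1, 2, 1]
print(summarize("admit_type", admit_codes))  # {'counts': {1: 3, 3: 1, 2: 1}}
print(mean(admit_codes))                     # 1.6, a number with no clinical meaning
```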
Another important aspect of data management in health care settings is protecting and controlling access to data, particularly patient-level data containing personal health information. Proprietary organizational data, such as personnel and intellectual property information, will also require protection and access controls.
Last updated: October 12, 2017