CHAPTER 3

Sampling

Sampling is the process of selecting a representative subset of a population for analysis. Sampling and analysis are parts of a process for answering research questions. Populations and sampling methods differ depending on the size and variation of the population and on the question to be answered. For example, in a nationwide study of political opinions, people answer questions about their tendency to vote for particular candidates or to support a political party. In a study of potential customers for a new baby stroller, people answer questions about their interest in the product and their tendency to purchase it. Both studies are nationwide, but the target populations are not the same. In the first example, the population consists of all registered voters and those who plan to register before election day. Residents younger than 18 and noncitizen residents are not part of the relevant population. Market research for the baby stroller has a different population. It is limited to families who expect children. An obvious reason for sampling is that measuring characteristics of the whole population is often not feasible, and in some cases impossible.

Definitions and Concepts

The sampled population is the entire group of people or entities from which samples are taken. Individuals and entities, the elements of the population, hold characteristics of interest to researchers. By measuring these characteristics on a sample, we can estimate population parameters such as the mean, the standard deviation, or the proportion of a group.

An individual asked about voting in a particular election is a citizen, old enough to vote, and is an element of the sample. Each element has an answer (characteristic) we are interested in. We can estimate proportions of different groups of voters by estimating population parameters from a sample. The population of voters is a finite population since it is limited to registered voters.

The population of fuel injection units (elements) in an auto parts manufacturing facility is all fuel injection units manufactured on a specific production line of that facility. These fuel injection units have characteristics that determine their quality. Characteristics can be physical measurements, like size and blemishes, or performance outcomes, such as unit functionality or frequency of failure during a test period. We measure these characteristics in order to estimate population parameters such as the average or the proportion defective. These numbers help us understand and improve the overall quality of the production system.

While it would be ideal to ask everybody about voting, or to put every single unit of production through a measurement test, it is not feasible to do so. In most cases, the costs of examining the entire population are prohibitive. Some populations, like the list of households in the country, change every day. Keeping track of all households all the time is not feasible either. We use a sampling frame, which is the pool of elements in a population that we can sample. Populations like registered voters are finite. There is a known list of potential voters. In each state, county, and city we have a known number of people in this pool. Randomly selecting elements from this population is different from sampling the population of fuel injection units, which is infinite: production continues to add new elements. We sample only those elements produced during a specific period.

Representative samples can provide a good estimate of population parameters. There are different methods of sampling depending on the nature of the population and characteristics of interest.

Simple random sampling is a method of randomly selecting elements such that each element has an equal chance of being chosen. It is appropriate when the characteristic of interest is not expected to differ across subgroups defined by other characteristics of the elements. An example of this method is randomly selecting 500 individuals of a particular ethnicity for medical research. This method is used when we are not aware of other differences among the group. Having more data about the population may encourage using a different method of sampling.
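As a minimal sketch of the idea, the following Python snippet draws a simple random sample without replacement from a sampling frame. The frame of ID numbers 1 to 10,000, the function name, and the seed are hypothetical and serve only to make the example reproducible.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct elements from a sampling frame, each equally likely."""
    rng = random.Random(seed)          # a seed makes the draw reproducible
    return rng.sample(list(frame), n)  # sampling without replacement

# Hypothetical frame: ID numbers standing in for a list of individuals.
frame = range(1, 10_001)
sample = simple_random_sample(frame, 500, seed=42)
print(len(sample), len(set(sample)))   # sample size and number of distinct IDs
```

Because the draw is made without replacement, no individual can appear twice in the sample.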

Stratified random sampling is used when the population has subgroups that have a meaningful effect on the value of the characteristics we measure. In our example of voters, if there is an expected difference between the tendencies of voters in one state vs. another, we need to consider the proportion of these subgroups in our sample. Sampling for demographics or student opinion about a topic at a college of business should include the same proportions of finance, accounting, supply chain management, and other majors as the population of students of the college.
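One way to keep subgroup proportions the same in the sample as in the population is proportional allocation. The sketch below is illustrative only: the enrollment figures and the function name are hypothetical, and leftover units are assigned by largest remainder, which is one reasonable rounding rule among several.

```python
def proportional_allocation(strata_sizes, sample_size):
    """Split a total sample size across strata in proportion to their sizes."""
    total = sum(strata_sizes.values())
    raw = {k: sample_size * v / total for k, v in strata_sizes.items()}
    alloc = {k: int(r) for k, r in raw.items()}   # round each share down first
    remaining = sample_size - sum(alloc.values())
    # Give leftover units to the strata with the largest fractional remainders.
    by_remainder = sorted(raw, key=lambda k: raw[k] - alloc[k], reverse=True)
    for k in by_remainder[:remaining]:
        alloc[k] += 1
    return alloc

# Hypothetical enrollment by major at a college of business.
enrollment = {"Finance": 400, "Accounting": 300,
              "Supply Chain Management": 200, "Other": 100}
allocation = proportional_allocation(enrollment, 50)
print(allocation)
```

Within each stratum, the allocated number of students would then be chosen by simple random sampling.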

Random sampling is a process in which we select elements from an infinite population. Random sampling is used when simple random sampling is not possible. One example is a very large population in which tracking all elements is not feasible. Another example is an ongoing process, manufacturing or service, which regularly produces new elements. Developing a frame and taking a representative simple random sample is not possible in this situation. Random sampling is similar to simple random sampling. The difference is that not all elements have an equal chance of being chosen.

Systematic sampling is a method in which choosing elements starts from a random point or a specific time and continues at equal intervals. The interval can be a time (e.g., every hour) or one out of a given number of elements in sequence (e.g., every 200th unit). The interval can be calculated by dividing the population size by the desired sample size. If daily production is 10,000 units and we plan for a sample size of 50, every 200th unit on the production line is sampled.
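The interval calculation can be sketched in a few lines of Python. The function name and seed are hypothetical, and the random starting point within the first interval is one common convention rather than the only option.

```python
import random

def systematic_sample_positions(population_size, sample_size, seed=None):
    """Return 1-based positions of sampled units at equal intervals."""
    interval = population_size // sample_size          # e.g., 10,000 // 50 = 200
    start = random.Random(seed).randint(1, interval)   # random start in first interval
    return [start + i * interval for i in range(sample_size)]

positions = systematic_sample_positions(10_000, 50, seed=1)
print(positions[:3], positions[-1])
```

With a daily production of 10,000 units and a sample size of 50, consecutive sampled positions are always 200 units apart.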

Confidence Level: In a quality control process, we consider a level of confidence (e.g., 95%) for our results. Assuming a normal distribution of observations, for a 95% confidence level there is an α = 5% probability of error (see Figure 3.1). We assign α/2 = 2.5% of this probability to the upper (right) tail and 2.5% to the lower (left) tail of the distribution. The confidence interval is the range of values around the mean that contains 95% of the area under the curve. The margin of error is the distance between the mean and the boundary of each error region.

Figure 3.1 Confidence interval in a normal distribution

A margin of error equal to 3 standard deviations is a popular limit for control charts, which translates to about 99.73% confidence.
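The link between a Z score and a confidence level can be checked numerically. The sketch below uses only the Python standard library; the function names are hypothetical, and a normal distribution is assumed, as in the text.

```python
import math
from statistics import NormalDist

def two_sided_coverage(z):
    """Probability that a normal observation falls within z standard deviations of the mean."""
    return math.erf(z / math.sqrt(2))

def z_for_confidence(confidence):
    """Z score that leaves alpha/2 in each tail for the given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

print(round(two_sided_coverage(3), 4))    # coverage of 3-standard-deviation limits
print(round(z_for_confidence(0.95), 3))   # Z for a 95% confidence level
print(round(z_for_confidence(0.90), 3))   # Z for a 90% confidence level
```

The second function gives the Z value to substitute into the control limit formulas when a confidence level other than 3 standard deviations is required.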

Sample size is determined before we start sampling. There are multiple methods for determining sample size. Since our sample mean ($\bar{x}$) is not identical to the population mean (µ), we consider the difference between the two as the sampling error (E). The standard error of the sample mean is determined by dividing the standard deviation of the population (σ) by the square root of the sample size:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

We have the following relationship between margin of error, confidence level, population standard deviation, and the sample size:

$$E = Z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$

Solving for sample size “n”:

$$n = \left( \frac{Z_{\alpha/2} \cdot \sigma}{E} \right)^2$$
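As an illustration, the sample size that follows from a given margin of error, confidence level, and population standard deviation (the squared ratio of Z times σ to E) can be computed as below. The function name and the input values σ = 0.5 and E = 0.1 are hypothetical, and rounding up to the next whole element is an assumption made so that the margin of error is not exceeded.

```python
import math
from statistics import NormalDist

def required_sample_size(sigma, margin_of_error, confidence=0.95):
    """Smallest n for which Z * sigma / sqrt(n) does not exceed the margin of error."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided Z score
    return math.ceil((z * sigma / margin_of_error) ** 2)

# Hypothetical inputs: sigma = 0.5, E = 0.1, 95% confidence.
n = required_sample_size(0.5, 0.1, confidence=0.95)
print(n)
```

Halving the margin of error roughly quadruples the required sample size, because n depends on the square of the ratio.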

We subgroup samples by adding up or averaging characteristics’ measurements on elements of our samples.

Quality Control Sampling

Quality control charts display characteristics measured on samples, together with their confidence intervals. In these visual representations of samples, we can easily distinguish trends and variations, and see whether there are samples with unacceptable characteristics. These diagnostic steps indicate the presence of potential or existing problems in the process. There might be a large difference between an output parameter, such as average size, and its target value, or too much variation in output.

Two types of characteristics are measured in quality control: variables and attributes. Variables such as time, weight, length, and volume are measured with continuous numbers. Attributes, like the number of defects in a lot, the number of blemishes on a car, and the number of failures during a 100-hour test of a unit, are measured by discrete values. These measurements follow different distributions. Control charts, which show the position of samples relative to control limits, vary based on these distributions. We introduce a few popular control charts in this chapter.

Average and Range Control Charts

Using continuous numbers to measure a variable provides more detailed information than using discrete values to measure attributes. In variable measurement, each sample includes a number of elements. Using the characteristic of interest measured on these elements, we can calculate two important statistics for each sample: mean and range. Sample means are the data points for the average chart. We can observe this chart for indications of variation in the mean, and to see whether the process produces elements with the expected characteristics. A sample range is the difference between the maximum and minimum measurements of the characteristic of interest. Ranges are the data points for the range chart, which shows the extent of variation in the measurements of each sample.

Upper and lower control limits for the average chart are developed based on the overall mean of sample means ($\bar{\bar{x}}$) and the desired confidence level (see Figure 3.2). We will establish the confidence interval within 3 standard deviations of the overall mean. A control chart for the average of samples has a horizontal line representing the overall average, or a given target value for the characteristic of interest, as its centerline. This chart has upper and lower control limits at a distance of 3 standard errors above and below the mean line. Researchers plot the average value of each sample on this chart. For an “in control” process, all these points are between the upper and lower control limits. Any point beyond the two control limits indicates the process is “out of control.”

Figure 3.2 Control limits

Formulas for average chart upper and lower control limits:

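$$UCL_{\bar{x}} = \bar{\bar{x}} + Z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}, \qquad LCL_{\bar{x}} = \bar{\bar{x}} - Z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$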

Note that these formulas divide the standard deviation by the square root of n, which is appropriate when σ is the standard deviation of the individual observations. Batching is different: if the standard deviation is computed directly from the batch averages, it is NOT divided by the square root of n again, because the batching already takes care of that.

In range charts, we have only one observation as “range” for each sample. The formulas for upper and lower control limits of range chart simplify as the following:

$$UCL_R = \bar{R} + Z_{\alpha/2} \cdot \sigma_R, \qquad LCL_R = \bar{R} - Z_{\alpha/2} \cdot \sigma_R$$

Standard deviation used in range chart formula is the standard deviation of sample ranges.
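The following Python sketch puts the two pairs of limits together for equally sized subgroups. It follows the approach described in the text: the average chart uses the standard deviation of all individual observations divided by the square root of the subgroup size, and the range chart uses the standard deviation of the sample ranges. The function name and the three small subgroups are hypothetical illustration data, not values from the chapter's data file.

```python
import math
from statistics import mean, stdev

def average_and_range_limits(subgroups, z=3.0):
    """Return (LCL, centerline, UCL) for the average chart and the range chart."""
    n = len(subgroups[0])                          # assumes equal subgroup sizes
    all_obs = [x for g in subgroups for x in g]
    grand_mean = mean(mean(g) for g in subgroups)  # overall mean of sample means
    sigma = stdev(all_obs)                         # sample std. dev. of observations
    ranges = [max(g) - min(g) for g in subgroups]
    r_bar = mean(ranges)                           # average range
    sigma_r = stdev(ranges)                        # std. dev. of sample ranges
    half_x = z * sigma / math.sqrt(n)
    half_r = z * sigma_r
    return {
        "xbar": (grand_mean - half_x, grand_mean, grand_mean + half_x),
        "range": (r_bar - half_r, r_bar, r_bar + half_r),
    }

# Hypothetical subgroups of four bar lengths each.
subgroups = [
    [10.0, 10.2, 9.8, 10.0],
    [10.1, 9.9, 10.0, 10.0],
    [9.9, 10.1, 10.2, 9.8],
]
limits = average_and_range_limits(subgroups)
print(limits["xbar"])
print(limits["range"])
```

The `stdev` function computes the sample standard deviation, which corresponds to Excel's STDEV.S.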

In the example data file (extract shown in Table 3.1), we have recordings of samples taken from a steel forging production line. Steel bars are made to a designed length. A quality control person takes samples and measures their length every hour, starting at 8:30 a.m., followed by seven other samples through 3:30 p.m. There are four machines: A, B, C, and D. Four operators, Liam, Mike, Jacob, and Luke, work on these machines. The sampling plan is to take one bar from each machine at every sampling time. Each bar is measured and logged, along with the day of the sampling, the sampling time, the machine, and the operator working on that machine at that time. We have eight samples per day, each including four elements (bars), for a total of 32 bars a day. Sampling records for 20 days, 640 bars in total, are available.

Table 3.1 Extract of control sampling data

image

We are interested in overall performance of the production process. Consistency of the characteristic (length) is our priority. Length is a variable, so we will develop a pair of average and range charts.

We can subgroup elements in different ways. Average and range of sample measurements in each “day,” or each “sampling time” are two different ways of subgrouping. We will start with subgrouping by day.

Use the “Bar_Defects” worksheet. Insert column headers “Day,” “Average_day,” and “Range_day” in the first row of columns G, H, and I. Column “Day” requires a sequence of 1 to 20. Column “Average_day” lists the average measurement for each day. You can create this list by entering the following formula in cell H2 and copying it through range H2:H21. This formula adds up the lengths grouped by day and divides the result by 32, the number of bars sampled per day. The absolute and relative cell references ensure that each copied formula returns the average length of all samples for its day:

=SUMIF($A$2:$A$641, G2,$C$2:$C$641)/32

Another way of calculating average by group is to store the data in a database and run a query for average of lengths, grouped by day. An SQL script for calculating average daily measurements from data stored in Access table named “Defects_Data” is:

SELECT Day, AVG(Length) AS Average_Length

FROM Defects_Data

GROUP BY Day;

The following query on the same table returns the range of measurements for each day:

SELECT Day, (MAX(Length) - MIN(Length)) AS Range

FROM Defects_Data

GROUP BY Day;

Copy results into your Excel worksheet (Table 3.2).
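For readers working outside Excel or Access, the same grouping can be sketched in Python using only the standard library. The function name and the short list of (day, length) records are hypothetical stand-ins for the Day and Length columns of the data.

```python
from collections import defaultdict

def summarize_by_day(records):
    """Return {day: (average length, range of lengths)} from (day, length) pairs."""
    by_day = defaultdict(list)
    for day, length in records:
        by_day[day].append(length)
    return {
        day: (sum(values) / len(values), max(values) - min(values))
        for day, values in sorted(by_day.items())
    }

# Hypothetical records: (day, measured length).
records = [(1, 10.0), (1, 10.4), (2, 9.8), (2, 10.0), (2, 10.2)]
summary = summarize_by_day(records)
for day, (average, value_range) in summary.items():
    print(day, round(average, 3), round(value_range, 3))
```

Each entry plays the role of one row in the results of the two queries: the average and the range of measurements for a day.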

Use the list of days, averages, and ranges in columns G, H, and I for calculating control limits and creating charts. The following formulas calculate the upper and lower control limits for the average chart:

image

Table 3.2 Results copied

image

Upper and lower control limits for range chart are calculated by the following formulas:

$UCL_R = \bar{R} + Z_{\alpha/2} \cdot \sigma_R = 0.311 + Z_{\alpha/2}(0.0574)$. If a Z of 3 were used, this = 0.483.

$LCL_R = \bar{R} - Z_{\alpha/2} \cdot \sigma_R = 0.311 - Z_{\alpha/2}(0.0574)$. If a Z of 3 were used, this = 0.139.
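The arithmetic for the range chart limits can be reproduced with a short Python check, using the average range of 0.311 and the standard deviation of ranges of 0.0574 quoted above. The function name is hypothetical.

```python
def range_chart_limits(r_bar, sigma_r, z=3.0):
    """Return (LCL, UCL) for a range chart at z standard deviations."""
    half_width = z * sigma_r
    return r_bar - half_width, r_bar + half_width

lcl, ucl = range_chart_limits(0.311, 0.0574, z=3.0)
print(round(ucl, 3), round(lcl, 3))
```

Passing a different `z`, such as 1.96 for 95% limits, narrows the limits accordingly.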

Use Excel function STDEV.S( ) for calculating the standard deviation of length measurements. Figure 3.3 demonstrates complete control limit calculations in Excel (this example uses a Z of 3—you can replace that with the appropriate normal distribution Z score for α/2 to obtain limits at specified probabilities in assignments):

Build the Average Chart: average chart has five components:

1. Mean line: a straight line projecting calculated average of the measurements

2. Upper control limit: a straight line demonstrating the upper control limit

3. Lower control limit: a straight line demonstrating the lower control limit

4. Observations: average points calculated. We demonstrate these points as dots on the chart

5. Subgroup markers: mark the values of subgroups on the horizontal axis of chart

Figure 3.3 Control chart calculation in Excel

Table 3.3 Excel input to generate average charts

image

Build charts by copying values in five columns (Table 3.3):

Column A: Days, listed 1 to 20 and used for marking subgroups. This column does not have a header.

Column B: Average measurements for all days.

Column C: Overall average ($\bar{\bar{x}}$).

Column D: Upper Control Limit.

Column E: Lower Control Limit.

All values in each column of C, D, and E are the same, creating a straight horizontal line.

Select the entire data range and insert a line chart. The resulting average chart includes upper and lower control limits, the mean line, subgroup markers on the horizontal axis, and projection of daily averages as a line. We can identify the observations where average is beyond the control limits (see Figure 3.4).

For illustrating averages as points, do the following steps:

Select Average_day line and right click on the selected line

Screen menu pops up. Select “Change Series Chart Type”

“Change Chart Type” dialog box opens. Change “Average_day” type from line to scatter

Click OK

Figure 3.5 demonstrates the sequence of changing one data series chart type.

image

Figure 3.4 Average chart control limit plot

image

Figure 3.5 Control limit interpretation

Range Chart: Create a table similar to the one used for the average chart. The list of days in the first column provides the subgroup markers. The second column lists the ranges of measurements for all days. The third column is filled with the average of the ranges in the second column. The fourth and fifth columns are the upper and lower control limits. Insert a line chart on this data and change the “Range_day” data series chart type to Scatter, as was done for the average chart. The range chart in Figure 3.6 shows no out-of-control observations:

image

Figure 3.6 Range chart plot

Control Charts for Attributes (C-Charts)

In quality control, we do not always measure a variable. Tracking all details may not be required or not feasible in all cases. For example when controlling for blemishes on car bodies after painting, or imperfections of carpets out of a production line, we do not need to measure the exact size or color of the spots. Counting and logging the number of nonconformities are practical solutions in these cases. Size, shape, and color of these blemishes may not play a significant role in quality control decisions. The important data we need is the frequency of nonconformities.

We discuss two methods for recording and analyzing nonconformities. The first method is counting the number of nonconformities in each element of the sample. We use C-Chart in this method. The second method sets acceptance criteria for elements. Any element conforming to all expected specifications is acceptable, otherwise considered a defect. Proportion of defects in samples can determine how the production process conforms to specifications. P-Chart is used in this method.

C-Charts: Counting the number of quality problems in each sampled unit or time interval is one of the inspection methods. Examples include inspection of 50-yard carpet rolls, 10-mile segments of a highway checked for maintenance spots, and inspection of one element, typically a complex product sampled from a production line. A car may have many problems right off the assembly line, but the manufacturer tries to fix these problems rather than calling it a defect and scrapping it. A roll of carpet may have a variety of problems such as discolored spots and pieces of fabric that do not match the designed pattern. Frequency of these problems may downgrade the carpet, but it still holds some value. Tracking these problems is valuable for improving the production process. C-Charts are useful for assessing the overall performance of a process and its stability. They are a tool for comparing the system performance before and after a process change.

Finding each instance of imperfection in a time interval or in each element sampled, is an event. Frequency of events follows a Poisson distribution. C-Chart is used for tracking and demonstrating frequency of defects on more complex, and usually more expensive products and services that cannot be simply ruled out as defective. Similar to other control charts, C-Chart projects the sampling results around a centerline. Upper and lower control limit formulas establish limits at +/− 3 standard deviations around the mean. In a Poisson distribution, mean and variance are equal:

$\bar{c}$ = Average number of events in one interval = Variance of the number of events in intervals.

Given “Ci” is the number of defects counted for the ith element and “k” represents the number of elements in our sample:

$$\bar{c} = \frac{\sum_{i=1}^{k} C_i}{k}$$

Upper and lower control limit formulas for a C-Chart are:

$$UCL_c = \bar{c} + Z_{\alpha/2} \sqrt{\bar{c}}, \qquad LCL_c = \bar{c} - Z_{\alpha/2} \sqrt{\bar{c}}$$

In “C-Chart” worksheet, we have an example data set of 50 cars inspected for cosmetic, electrical, and mechanical problems. Each car received a sample number, used for a log of the number of problems observed. In this sampling we track the number of problems regardless of size, type, intensity, or effect on the status of that element. Calculating average number of defects is the first step for creating a C-Chart for this sample. If we want 90% control limits, Za/2 = 1.645. We can plug in the calculated average to upper and lower control limit formulas:

Table 3.4 Excel data for C-chart

image

image

Should the lower control limit come out to be a negative number, use zero instead. Table 3.4 demonstrates formulas in Excel. The only change we need is substituting all results in column “E” with zero and inserting a line chart. Change chart type for data series “Observed Defects” to scatter.
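A minimal Python sketch of the C-Chart calculation is shown below. It relies on the Poisson property that the variance equals the mean, so the standard deviation is the square root of the average count, and it replaces a negative lower limit with zero as described above. The function name and the eight defect counts are hypothetical and are not taken from the “C-Chart” worksheet.

```python
import math

def c_chart_limits(defect_counts, z=1.645):
    """Return (LCL, centerline, UCL) for a C-Chart, with LCL floored at zero."""
    c_bar = sum(defect_counts) / len(defect_counts)  # average defects per element
    half_width = z * math.sqrt(c_bar)                # Poisson: std. dev. = sqrt(mean)
    return max(0.0, c_bar - half_width), c_bar, c_bar + half_width

# Hypothetical defect counts for eight inspected cars.
counts = [2, 0, 3, 1, 4, 2, 1, 3]
lcl, center, ucl = c_chart_limits(counts, z=1.645)
out_of_control = [i for i, c in enumerate(counts, start=1) if c < lcl or c > ucl]
print(lcl, center, round(ucl, 3), out_of_control)
```

The list `out_of_control` holds the sample numbers that fall outside the limits; it is empty for this illustration data.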

Probability Control Charts (P-Charts)

We can inspect elements of samples and count them as acceptable or defective based on their conformance to expected specifications (P-charts). Consider finding a defective element a “success” for the inspection process. Failure to find a nonconformity results in the element passing the test as acceptable. This is a Bernoulli process. We try each element in the sample and mark it as success or failure (found a problem or passed). The number of “successes,” that is, nonconforming elements found in a sample, follows a binomial distribution. We have the sample size and the percentage of defective elements, the two key factors for calculating the mean and standard deviation of a binomial distribution. Data collected from multiple samples can provide enough information to develop the control limits and plot each sample’s percentage defective. The mean and standard deviation of binomial distribution formulas are:

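$$\bar{p} = \frac{\sum_i n_i P_i}{\sum_i n_i}, \qquad \sigma_p = \sqrt{\frac{\bar{p}\,(1 - \bar{p})}{\bar{n}}}$$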

In these formulas:

$n_i$ = Sample size (may be constant or vary for different samples)

$\bar{n}$ = Average sample size

$P_i$ = Proportion nonconforming (defective) in sample i

$\bar{p}$ = Average proportion defective

$\sigma_p$ = Standard deviation of the proportion nonconforming

Upper and lower control limit formulas:

$$UCL_p = \bar{p} + Z_{\alpha/2} \cdot \sigma_p, \qquad LCL_p = \bar{p} - Z_{\alpha/2} \cdot \sigma_p$$

In the “P-Chart” worksheet of the Excel exercise file, we have records of eCity online customer orders processed at the central warehouse. One hundred orders from each week were sampled and inspected for errors during the past 25 weeks. Sample sizes are listed in column “B” and the number of orders found to be nonconforming in column “C.” In this case, we do not have a list of the types of problems in customer order processing. All we know is the sample size and the number of nonconforming orders. We can calculate the proportion of nonconforming orders in each sample, then the average proportion across all observations in cell “F2” with this formula:

=SUM(C2:C26)/SUM(B2:B26)

If $\bar{p}$ were 0.3 and n were 25, the upper and lower control limit formulas at a 90% confidence level would be:

=0.3+1.645*SQRT(0.3*(1-0.3)/25), which returns 0.451

=0.3-1.645*SQRT(0.3*(1-0.3)/25), which returns 0.149

Control limits could turn out to be greater than one (upper limit) or negative (lower limit). In such cases, set the limit to one or zero, respectively, since we cannot have probabilities outside these values.
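The P-Chart calculation, including the clamping of limits to the range from zero to one, can be sketched in Python as follows. The function name is hypothetical, and the two samples of 25 orders with 7 and 8 nonconforming orders are illustration data chosen only so that the average proportion is 0.3 and the average sample size is 25, matching the worked numbers above.

```python
import math

def p_chart_limits(sample_sizes, nonconforming, z=1.645):
    """Return (LCL, centerline, UCL) for a P-Chart, clamped to [0, 1]."""
    p_bar = sum(nonconforming) / sum(sample_sizes)   # pooled proportion defective
    n_bar = sum(sample_sizes) / len(sample_sizes)    # average sample size
    sigma_p = math.sqrt(p_bar * (1 - p_bar) / n_bar)
    lcl = max(0.0, p_bar - z * sigma_p)              # a proportion cannot be negative
    ucl = min(1.0, p_bar + z * sigma_p)              # or greater than one
    return lcl, p_bar, ucl

# Hypothetical samples: two samples of 25 orders with 7 and 8 nonconforming.
sizes = [25, 25]
defects = [7, 8]
lcl, p_bar, ucl = p_chart_limits(sizes, defects, z=1.645)
print(round(lcl, 3), round(p_bar, 3), round(ucl, 3))
```

The pooled proportion mirrors the worksheet formula that divides the sum of nonconforming orders by the sum of sample sizes.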

Conclusions

Control charts are a fundamental tool for quality control. From the operations management perspective, this is an important tool that has been used in average, range, count, and probability contexts, each demonstrated in this chapter. Probably more fundamental is the concept of the probability distribution upon which control charts rely. Quality data very often show that variation in output is approximately normally distributed. That is not the only possible distribution, but it is the assumption made in the control limit formulas used in this chapter. The key point is that you can specify any probability level, with half of the error probability expected above the upper limit and half below the lower limit (because normally distributed data is symmetric).
