Chapter 7

Sampling

Abstract

In statistics, the population is the whole of the information that comes under the purview of a statistical investigation. A sample is a selected portion of the population. A sample drawn from a population provides valuable information about its parent population. Sampling is a tool that enables us to draw conclusions about the characteristics of the population after studying only those items that are included in the sample. In this chapter we discuss types of sampling and sampling distributions. The method of determining the size of a sample is also discussed.

Keywords

Population; sample; systematic sampling; finite correction factor; sampling with replacement; sampling without replacement; central limit theorem

7.1 Introduction

In this chapter we briefly introduce sampling theory. The aim of the theory is to get as much information as possible, ideally the whole of the information, about the population from which the sample has been drawn. We begin our discussion by defining the term population.

7.2 Population

The aggregate of all units pertaining to a study is called the population or universe.

It is a collection of items, individuals, or observations having common fundamental characteristics.

If a population consists of a finite or fixed number of items or values, it is called a finite population.

A population that is not finite is called an infinite population. It contains an endless succession of values.

7.3 Sample

A part of the population is called a sample. It is a subset of a population.

A population may be finite or infinite according to the number of observations in it. The number of observations included in a finite sample is called the size of the sample and is denoted by n.

If the size of the sample is less than or equal to 30 (i.e., n≤30), the sample is called a small sample.

If n>30, the sample is known as a large sample.

A sample must be representative of the population from which it is selected. It should be free from any influence that causes a difference between the sample value and the population value. The sample should yield precise estimates. A good sample must also be adequate in size in order to be reliable. Hence the requirements of a good sample are representativeness, adequacy, and freedom from bias.

7.4 Sampling

The process of drawing a sample from a population is called sampling.

Sampling reduces the time and cost of a study, and it saves labor. Sampling demands thorough knowledge of sampling methods and procedures. Sampling methods may be classified into two types:

1. Probability or random sampling.

2. Nonprobability or nonrandom sampling.

Types of sampling methods: There are various methods of sampling. We briefly explain some of these methods in the following sections.

7.5 Random Sampling

Sampling in which every member of the parent population has an equal chance of being included is called random sampling. According to Moser and Kalton, “a random method of selection is one which gives each of the N units in the population to be covered a calculable probability of being selected.”

Under random sampling, the universe is clearly defined, every element in the population has an equal chance of being represented, and the scope for bias is limited. To avoid bias one can use the lottery method or a table of random numbers.

7.6 Simple Random Sampling

In this method each member of the population has an equal chance of being selected as a subject. The entire process of sampling is done in a single step, with each subject selected independently of the other members of the population.

We can define a simple random sample as follows: When a sample of size n is drawn from a population of size N in such a way that every possible sample of size n has the same chance (probability) of being selected, the sample is known as a simple random sample.

There are many ways to carry out simple random sampling. The most primitive and mechanical is the lottery method. We can also use a table of random numbers. One of the best things about simple random sampling is the ease of assembling the sample. It is also considered a fair way of selecting a sample from a given population. Another key feature of simple random sampling is its representativeness of the population. If the sampling is done with replacement, there are $N^n$ possible samples of size n (samples that differ only in order are counted separately). If the sampling is done without replacement, the number of possible samples of size n is ${}^{N}C_{n}$ (order is ignored).
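As a minimal sketch of the two selection schemes, the following Python fragment draws a simple random sample with and without replacement and counts the possible samples of each kind. The population (the integers 1 to 10), the sample size, and the seed are illustrative assumptions, not values taken from the text.

```python
import math
import random

# Hypothetical population of N = 10 labelled units (assumed for illustration).
population = list(range(1, 11))
N = len(population)
n = 3  # sample size

rng = random.Random(42)  # fixed seed so the draw is repeatable

# Without replacement: no unit can appear twice in the sample.
srs_without = rng.sample(population, n)

# With replacement: every draw is made from the full population.
srs_with = rng.choices(population, k=n)

# Number of possible samples of size n under each scheme.
count_with = N ** n              # ordered samples, repeats allowed
count_without = math.comb(N, n)  # unordered samples, no repeats
```

The pseudo-random generator plays the role of the lottery or the table of random numbers; any source of uniform random digits would serve the same purpose.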

7.7 Stratified Sampling

Stratification is the process of dividing the members of the population into homogeneous subgroups called “strata” before sampling. The strata should be mutually exclusive, so that every element of the population is assigned to only one stratum. The strata should also be collectively exhaustive, so that no element is left out. From each stratum, a sample of the size determined for it is drawn, and the sample can be drawn randomly. Stratified sampling allows the use of a smaller sample than simple random sampling for the same or greater precision, with a consequent saving in time and money.
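The procedure can be sketched in Python as follows. The text does not say how the sample size for each stratum is determined, so this sketch assumes proportional allocation (each stratum contributes in proportion to its size). The function name `stratified_sample` and the three strata are hypothetical.

```python
import random

def stratified_sample(strata, total_n, seed=0):
    """Draw a simple random sample from each stratum, sized in proportion
    to the stratum's share of the population (proportional allocation)."""
    rng = random.Random(seed)
    population_size = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        # Rounded proportional share; with awkward proportions the rounded
        # sizes may not add up exactly to total_n.
        k = round(total_n * len(units) / population_size)
        sample[name] = rng.sample(units, k)  # random draw without replacement
    return sample

# Hypothetical strata: mutually exclusive and collectively exhaustive.
strata = {
    "urban": [f"u{i}" for i in range(60)],
    "suburban": [f"s{i}" for i in range(30)],
    "rural": [f"r{i}" for i in range(10)],
}
chosen = stratified_sample(strata, total_n=20)
```

Other allocation rules are possible; only the line that computes `k` would change.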

7.8 Systematic Sampling

Systematic sampling is a method of selecting sample members from a larger population according to a random starting point and a fixed periodic interval. This method is applied when a complete list of the population is available. The most common form of systematic sampling is an equal-probability method. In this approach progression through the list is treated circularly, with a return to the top once the end of the list is passed. The sampling starts by selecting an element from the list at random, and then every kth element in the frame is selected, where k, the sampling interval, is calculated as:

$$k = \frac{N}{n}$$

where n is the sample size and N is the population size.
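A minimal Python sketch of the circular procedure described above is given here. The function name `systematic_sample`, the frame of 100 units, and the use of the integer part of N/n as the interval are assumptions made for illustration.

```python
import random

def systematic_sample(frame, n, seed=0):
    """Circular systematic sample: random start, then every k-th unit,
    returning to the top of the list when the end is passed."""
    N = len(frame)
    k = N // n  # sampling interval (integer part of N/n); assumes N >= n
    rng = random.Random(seed)
    start = rng.randrange(N)  # random starting position in the list
    return [frame[(start + i * k) % N] for i in range(n)]

frame = list(range(1, 101))  # hypothetical list of N = 100 units
sample = systematic_sample(frame, n=10)
```

Because only the starting point is random, the whole sample is fixed once the start is chosen, which is why the ordering of the list matters so much for this method.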

Systematic sampling should be applied only when the population from which the sample is selected is logically homogeneous. This makes systematic sampling functionally similar to simple random sampling. The main advantage of systematic sampling is its simplicity. The disadvantage of this method is that the process of selecting the sample can interact with a hidden periodic trait within the population.

Cluster sampling: The population is divided into various “clusters” or groups, and from these groups or clusters the sample is selected randomly. Ideally the clusters chosen should be dissimilar so that the sample is as representative of the population as possible. Cluster sampling is used when “natural” but relatively homogeneous groupings are evident in a population. Clustering saves traveling time and consequently reduces cost. It is useful for surveying employees in a particular industry, where individual companies can form clusters. Clustering has some disadvantages. Units close to each other may be similar and so less likely to represent the whole population. Cluster sampling also has a larger sampling error than simple random sampling.
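The clustering idea can be sketched in Python as shown here. The text does not spell out the selection scheme, so the sketch assumes a two-stage design: some clusters are chosen at random, and then a random sample of units is taken within each chosen cluster. The function name `cluster_sample` and the company data are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, n_per_cluster, seed=0):
    """Two-stage cluster sample: pick clusters at random, then pick
    units at random inside each selected cluster."""
    rng = random.Random(seed)
    # Stage 1: choose which clusters (e.g., companies) to visit.
    chosen_names = rng.sample(sorted(clusters), n_clusters)
    # Stage 2: choose units (e.g., employees) within each chosen cluster.
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in chosen_names
    }

# Hypothetical industry in which each company forms one cluster of employees.
clusters = {
    "company_a": [f"a{i}" for i in range(40)],
    "company_b": [f"b{i}" for i in range(25)],
    "company_c": [f"c{i}" for i in range(35)],
    "company_d": [f"d{i}" for i in range(20)],
}
picked = cluster_sample(clusters, n_clusters=2, n_per_cluster=5)
```

Only the selected companies need to be visited, which is where the saving in travel time and cost comes from.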

One of the most common quantities used to summarize a set of data is its center. The center is a single value, chosen in such a way that it gives a reasonable approximation of normality. There are many ways to approximate the center of a set of data. One of the most familiar and useful measures of center is the mean; however, using only the mean to approximate normality can often be misleading. To obtain a better understanding of what is considered normal, other measures of central tendency, such as the median, the trimmed mean, and the trimean, may be used in addition to the mean.
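The following Python sketch compares these measures on a small made-up data set that contains one extreme value. The data, the choice to trim one observation from each end, and the quartile convention (the "inclusive" method of `statistics.quantiles`) are assumptions; the trimean is computed here as (Q1 + 2·median + Q3)/4, and other quartile conventions can give slightly different values.

```python
from statistics import mean, median, quantiles

# Hypothetical data with one extreme value (50).
data = [2, 4, 4, 5, 6, 7, 8, 9, 50]

def trimmed_mean(values, k):
    """Mean after dropping the k smallest and k largest observations."""
    ordered = sorted(values)
    return mean(ordered[k:len(ordered) - k])

def trimean(values):
    """Weighted average of the quartiles: (Q1 + 2*Q2 + Q3) / 4."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return (q1 + 2 * q2 + q3) / 4

center_mean = mean(data)                # pulled upward by the value 50
center_median = median(data)            # middle observation
center_trimmed = trimmed_mean(data, 1)  # ignores the smallest and largest value
center_trimean = trimean(data)
```

On these data the mean is well above 10, while the other three measures all lie close to 6, which illustrates why the mean alone can be misleading.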

Parameters: The statistical constants of the population viz., the mean, the variance, etc., are known as the parameters.

Statistics: The statistical constants of the sample, computed from the members of the sample and used to estimate the parameters of the population from which the sample has been drawn, are known as statistics.

7.9 Sample Size Determination

Sample size determination is the act of choosing the number of observations included in a statistical sample. The sample size is an important feature of any empirical study in which the goal is to make inferences about a population from a sample. In practice the sample size used in a study is determined based on the expense of data collection and the need to have sufficient statistical power.

Determining sample size is a very important issue because samples that are too large may waste time, resources, and money, while samples that are too small may lead to inaccurate results. In many cases, we can easily determine the minimum sample size needed to estimate a process parameter, such as the population mean μ.

When sample data are collected and the sample mean $\bar{x}$ is calculated, that sample mean is typically different from the population mean μ. This difference between the sample and population means can be thought of as an error. The margin of error E is the maximum difference, at the chosen confidence level, between the observed sample mean $\bar{x}$ and the true value of the population mean μ:

$$E = z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

where $z_{\alpha/2}$ is known as the critical value, the positive z value that is at the vertical boundary for the area of α/2 in the right tail of the standard normal distribution; σ is the population standard deviation; n is the sample size.

Rearranging this formula, we can solve for the sample size necessary to produce results accurate to a specified confidence and margin of error.

$$n = \left[\frac{z_{\alpha/2}\,\sigma}{E}\right]^2$$

This formula can be used when you know σ and want to determine the sample size necessary to establish, with a confidence of 1−α, the mean value μ to within ±E. You can still use this formula if you do not know the population standard deviation σ, provided an estimate of it is available. Although it is unlikely that you know σ when the population mean is not known, you may be able to determine σ from a similar process.
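As a worked illustration of the sample size formula, the following Python sketch computes the critical value from the standard normal distribution and rounds the result up to the next whole observation. The function name `required_sample_size` and the numerical inputs (σ = 15, E = 3, 95% confidence) are assumptions chosen for the example.

```python
import math
from statistics import NormalDist

def required_sample_size(sigma, margin, confidence=0.95):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= margin."""
    alpha = 1.0 - confidence
    # Critical value: area alpha/2 lies to its right under the standard normal curve.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Round up, since a fractional observation cannot be collected.
    return math.ceil((z * sigma / margin) ** 2)

# Hypothetical inputs: sigma = 15, margin of error E = 3, confidence 95%.
n_needed = required_sample_size(sigma=15.0, margin=3.0, confidence=0.95)
n_tighter = required_sample_size(sigma=15.0, margin=1.5, confidence=0.95)
```

With these inputs the formula gives about 96.04, so 97 observations are needed. Halving the margin of error roughly quadruples the required sample size, because E enters the formula squared.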

Sampling theory is based on sampling. It deals with statistical inferences drawn from sampling results, which are of the following types:

1. Statistical estimation

2. Tests of significance

3. Statistical inference

Statistical inference is the procedure by which we reach a conclusion about a population on the basis of the information contained in a sample drawn from that population.

7.10 Sampling Distribution

The distribution of all possible values that can be assumed by some statistic, computed from samples of the same size randomly drawn from the same population, is called the sampling distribution of that statistic.

To construct a sampling distribution, we randomly draw all possible samples of size n from a finite population of size N and compute the statistic of interest for each sample. We then list in one column the various distinct observed values of the statistic and in another column the corresponding frequency of occurrence of each distinct observed value. We are usually interested in knowing the mean, the variance, and the functional form of the sampling distribution.

When a population element can be selected more than one time, the sampling is known as sampling with replacement and when a population element can be selected only one time, the sampling is known as sampling without replacement.

In general, when sampling is with replacement, the number of possible samples is equal to $N^n$.

When sampling is without replacement, the number of possible samples is ${}^{N}C_{n}$.

Standard error: The square root of the variance of the sampling distribution of the mean, i.e., $\sigma/\sqrt{n}$, is called the standard error of the mean or simply the standard error.

Example 7.1: Consider the five numbers 6, 8, 10, 12, and 14 representing a population of size N=5.

The mean μ of the population is

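$$\mu = \frac{\sum x_i}{N} = \frac{6+8+10+12+14}{5} = \frac{50}{5} = 10$$

The variance of the population is

$$\sigma^2 = \frac{\sum (x_i-\mu)^2}{N} = \frac{(-4)^2+(-2)^2+(0)^2+(2)^2+(4)^2}{5} = \frac{40}{5} = 8$$

and the corresponding quantity with divisor $N-1$ is

$$S^2 = \frac{\sum (x_i-\mu)^2}{N-1} = \frac{(-4)^2+(-2)^2+(0)^2+(2)^2+(4)^2}{4} = \frac{40}{4} = 10$$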

All possible samples of size 2 from the population with replacement are shown below:

(6, 6) (6, 8) (6, 10) (6, 12) (6, 14)
(8, 6) (8, 8) (8, 10) (8, 12) (8, 14)
(10, 6) (10, 8) (10, 10) (10, 12) (10, 14)
(12, 6) (12, 8) (12, 10) (12, 12) (12, 14)
(14, 6) (14, 8) (14, 10) (14, 12) (14, 14)

Number of samples with replacement $= N^n = 5^2 = 25$

The sample means of the above samples are shown below:

6 7 8 9 10
7 8 9 10 11
8 9 10 11 12
9 10 11 12 13
10 11 12 13 14

The table above is known as the sampling distribution of means.

The mean of the sampling distribution of means

$$\mu_{\bar{x}} = \frac{\sum \bar{x}_i}{N^n} = \frac{6+7+7+\cdots+14}{25} = \frac{250}{25} = 10$$

We observe that the mean of the population equals the mean of the sampling distribution of means.

The variance of the sampling distribution of means is

$$\sigma_{\bar{x}}^2 = \frac{(6-10)^2+(7-10)^2+\cdots+(14-10)^2}{25} = \frac{100}{25} = 4$$
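The with-replacement part of Example 7.1 can be checked by brute-force enumeration. The short Python sketch below lists all 25 ordered samples of size 2, computes their means, and compares the mean and variance of those means with μ and σ²/n. The variable names are illustrative.

```python
from itertools import product
from statistics import mean, pvariance

population = [6, 8, 10, 12, 14]
n = 2

# All N**n ordered samples of size n, drawn with replacement.
samples = list(product(population, repeat=n))
sample_means = [mean(s) for s in samples]

mu = mean(population)            # population mean
sigma2 = pvariance(population)   # population variance (divisor N)

mean_of_means = mean(sample_means)       # mean of the sampling distribution
var_of_means = pvariance(sample_means)   # variance of the sampling distribution
```

The enumeration gives a mean of 10 and a variance of 4 for the sample means. The variance agrees with σ²/n = 8/2 = 4, the square of the standard error.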

We now consider all samples without replacement that can be drawn from the given population:

(6, 8) (6, 10) (6, 12) (6, 14)
(8, 10) (8, 12) (8, 14)
(10, 12) (10, 14)
(12, 14)

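There are ${}^{N}C_{n} = {}^{5}C_{2} = 10$ such samples, with sample means 7, 8, 9, 10, 9, 10, 11, 11, 12, and 13. The mean of this sampling distribution is $\mu_{\bar{x}} = 100/10 = 10$, again equal to the population mean. The variance of this sampling distribution is

$$\sigma_{\bar{x}}^2 = \frac{\sum (\bar{x}_i-\mu_{\bar{x}})^2}{{}^{N}C_{n}} = \frac{30}{10} = 3$$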

When sampling is without replacement from a finite population, the sampling distribution of $\bar{x}$ will have mean μ and variance

$$\sigma_{\bar{x}}^2 = \frac{\sigma^2}{n}\cdot\frac{N-n}{N-1}$$

where the factor $(N-n)/(N-1)$ is called the finite correction factor, which can be ignored when the sample is small compared to the population size.
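The without-replacement result can be checked in the same brute-force way. This Python sketch enumerates the 10 unordered samples of size 2 from the population of Example 7.1 and compares the variance of their means with the value given by the finite correction factor. The variable names are illustrative.

```python
from itertools import combinations
from statistics import mean, pvariance

population = [6, 8, 10, 12, 14]
N, n = len(population), 2

# All unordered samples of size n, drawn without replacement.
samples = list(combinations(population, n))
sample_means = [mean(s) for s in samples]

mean_of_means = mean(sample_means)
var_of_means = pvariance(sample_means)  # variance found by enumeration

sigma2 = pvariance(population)          # population variance (divisor N)
# Variance predicted by sigma^2/n times the finite correction factor.
var_from_formula = (sigma2 / n) * (N - n) / (N - 1)
```

Both routes give a variance of 3: the enumeration directly, and the formula as (8/2)·(3/4).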

Sampling from normally distributed population: When sampling is from a normally distributed population, the distribution of the sample mean will be normal. The mean $\mu_{\bar{x}}$ of the distribution of $\bar{x}$ will be equal to the mean of the population from which the samples were drawn, and the variance $\sigma_{\bar{x}}^2$ of the distribution of $\bar{x}$ will be equal to the variance of the population divided by the sample size.

Central limit theorem: When sampling is from a population that is not normally distributed, we refer to an important theorem known as the central limit theorem. The theorem is stated as follows:

“Given a population of any nonnormal functional form with mean μ and finite variance $\sigma^2$, the sampling distribution of $\bar{x}$, computed from samples of size n from this population, will have the mean μ and variance $\sigma^2/n$ and will be approximately normally distributed when the sample size is large.”

The mathematical formulation of the central limit theorem is that the distribution of $(\bar{x}-\mu)/(\sigma/\sqrt{n})$ approaches a normal distribution with mean 0 and variance 1 as $n \to \infty$.
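A small simulation can make the theorem concrete. The Python sketch below repeatedly draws samples of size n from an exponential population (a strongly skewed, nonnormal distribution with μ = 1 and σ = 1) and standardizes each sample mean. The choice of population, the sample size of 50, the 4000 repetitions, and the seed are all assumptions made for illustration; this is an empirical check, not a proof.

```python
import math
import random
from statistics import mean, pvariance

def standardized_means(n, reps, seed=0):
    """Return reps values of (x_bar - mu) / (sigma / sqrt(n)) for samples
    of size n drawn from an exponential population with mu = sigma = 1."""
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0  # mean and standard deviation of Exponential(1)
    values = []
    for _ in range(reps):
        x_bar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        values.append((x_bar - mu) / (sigma / math.sqrt(n)))
    return values

z = standardized_means(n=50, reps=4000)
z_mean = mean(z)      # should be close to 0
z_var = pvariance(z)  # should be close to 1
# Share of standardized means inside (-1.96, 1.96); about 0.95 under normality.
coverage = sum(abs(v) < 1.96 for v in z) / len(z)
```

Repeating the run with a smaller n (say 2 or 5) shows a visibly skewed distribution of standardized means, which is why the theorem is stated for large samples.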

Exercise 7.1

1. Define the terms

a. Population

b. Sample

2. Define

a. Random sampling

b. Stratified sampling

3. Define

a. Systematic Sampling

b. Sampling with replacement

c. Sampling without replacement

4. Explain the central limit theorem.

5. What is a sampling distribution?

6. Consider the population consisting of the numbers 3, 5, 7, 9, and 11. Find all the samples of size 2 and show that the mean of the population is equal to the mean of the sampling distribution of means.
