CHAPTER 2
OBSERVATIONS AND THEIR ANALYSIS

2.1 INTRODUCTION

Sets of data can be represented and analyzed using either graphical or numerical methods. Simple graphical analyses to depict trends commonly appear in newspapers or on television. A plot of the daily variation of the closing Dow Jones industrial average over the past year is an example. A bar chart showing daily high temperatures over the past month is another. Also, data can be presented in numerical form and be subjected to numerical analysis. Instead of using the bar chart, the daily high temperatures could be tabulated and their mean computed. In surveying, observational data can also be represented and analyzed either graphically or numerically. In this chapter, some rudimentary methods for doing so are discussed.

2.2 SAMPLE VERSUS POPULATION

Due to time and financial constraints in statistical analyses, generally, only a small sample of data is collected from a much larger, possibly infinite population. For example, political parties may wish to know the percentage of voters who support their candidate. It would be prohibitively expensive to query the entire voting population to obtain the desired information. Instead, polling agencies select a subset of voters from the voting population. This is an example of population sampling.

As another example, suppose that an employer wishes to determine the relative measuring capabilities of two prospective new employees. The candidates could theoretically spend days or even weeks demonstrating their abilities. Obviously, this would not be very practical, so instead, the employer could have each person record a sample of readings, and from the readings predict the person's abilities. For instance, the employer could have each candidate read a micrometer 30 times. The 30 readings would represent a sample of the entire population of possible readings. In fact, in surveying, every time that distances, angles, or elevation differences are measured, samples are being collected from an infinite population of measurements.

From the preceding discussion, the following definitions can be made:

  1. Population. A population consists of all possible measurements that can be made on a particular item or procedure. Often, a population has an infinite number of data elements.
  2. Sample. A sample is a subset of data selected from the population.

TABLE 2.1 Fifty Readings

22.7 25.4 24.0 20.5 22.5
22.3 24.2 24.8 23.5 22.9
25.5 24.7 23.2 22.0 23.8
23.8 24.4 23.7 24.1 22.6
22.9 23.4 25.9 23.1 21.8
22.2 23.3 24.6 24.1 23.2
21.9 24.3 23.8 23.1 25.2
26.1 21.2 23.0 25.9 22.8
22.6 25.3 25.0 22.8 23.6
21.7 23.9 22.3 25.3 20.1

2.3 RANGE AND MEDIAN

Suppose that a one-second (1″) micrometer theodolite is used to read a direction 50 times. The second's portions of the readings are shown in Table 2.1. These readings constitute what is called a data set. How can these data be organized to make them more meaningful? How can one answer the question: Are the data representative of readings that should reasonably be expected with this instrument and a competent operator? What statistical tools can be used to represent and analyze this data set?

One quick numerical method used to analyze data is to compute its range, also called dispersion. Range is the difference between the highest and lowest values. It provides an indication of the precision of the data. From Table 2.1, the lowest value is 20.1 and the highest is 26.1. Thus, the range is 26.1−20.1, or 6.0. The range for this data set can be compared with ranges of other sets, but this comparison has little value when the two sets differ in size. For instance, would a set of 100 data points with a range of 8.5 be better than the set in Table 2.1? Clearly, other methods of statistically analyzing data sets would be useful.

To assist in analyzing data, it is often helpful to list the values in order of increasing size. This was done with the data of Table 2.1 to produce the results shown in Table 2.2. By looking at this ordered set, it is possible to determine quickly the data's middle value or midpoint. In this example, it lies between the values of 23.4 and 23.5. The midpoint value is also known as the median. Since there is an even number of values in this example, the median is given by the average of the two values closest to (which straddle) the midpoint. That is, the median is assigned the average of the 25th and 26th entries in the ordered set of 50 values, and thus for the data set of Table 2.2, the median is the average of 23.4 and 23.5 or 23.45.
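These two computations can be checked with a short script. The following sketch (Python; the variable names are my own, and the 50 readings of Table 2.1 are entered directly) computes the range and the median:

```python
# The 50 micrometer readings (seconds' portions) from Table 2.1.
readings = [
    22.7, 25.4, 24.0, 20.5, 22.5, 22.3, 24.2, 24.8, 23.5, 22.9,
    25.5, 24.7, 23.2, 22.0, 23.8, 23.8, 24.4, 23.7, 24.1, 22.6,
    22.9, 23.4, 25.9, 23.1, 21.8, 22.2, 23.3, 24.6, 24.1, 23.2,
    21.9, 24.3, 23.8, 23.1, 25.2, 26.1, 21.2, 23.0, 25.9, 22.8,
    22.6, 25.3, 25.0, 22.8, 23.6, 21.7, 23.9, 22.3, 25.3, 20.1,
]

# Range (dispersion): difference between the highest and lowest values.
data_range = max(readings) - min(readings)

# Median: midpoint of the ordered set; with an even number of values,
# the average of the two entries that straddle the midpoint.
ordered = sorted(readings)
n = len(ordered)
median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2
```

For this data set the script gives a range of 6.0 and a median of 23.45, agreeing with the values found in the text.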

TABLE 2.2 Data in Ascending Order

20.1 20.5 21.2 21.7 21.8
21.9 22.0 22.2 22.3 22.3
22.5 22.6 22.6 22.7 22.8
22.8 22.9 22.9 23.0 23.1
23.1 23.2 23.2 23.3 23.4
23.5 23.6 23.7 23.8 23.8
23.8 23.9 24.0 24.1 24.1
24.2 24.3 24.4 24.6 24.7
24.8 25.0 25.2 25.3 25.3
25.4 25.5 25.9 25.9 26.1

2.4 GRAPHICAL REPRESENTATION OF DATA

Although an ordered numerical tabulation of data allows for some data distribution analysis, it can be improved with a frequency histogram, usually simply called a histogram. Histograms are bar graphs that show the frequency distributions in data. To create a histogram, the data are divided into classes. These are subregions of data that usually have a uniform range in values, or class width. Although there are no universally applicable rules for the selection of class width, generally 5 to 20 classes are used.

As a rule of thumb, a data set of 30 values may have only 5 or 6 classes, whereas a data set of 100 values may have as many as 15 to 20 classes. In general, the smaller the data set, the lower the number of classes used.

The histogram class width (range of data represented by each histogram bar) is determined by dividing the total range by the number of classes to be used. For example, consider the data of Table 2.2. If they were divided into seven classes, the class width would be the range divided by the number of classes, or 6.0/7 = 0.857, or 0.86. The first class interval is found by adding the class width to the lowest data value. For the data in Table 2.2, the first class interval is from 20.1 to (20.1 + 0.86), or 20.96. This class interval includes all data from 20.1 up to, but not including, 20.96. The next class interval is from 20.96 up to (20.96 + 0.86), or 21.82. Remaining class intervals are found by adding the class width to the upper boundary value of the preceding class. The class intervals for the data of Table 2.2 are listed in column (1) of Table 2.3.

TABLE 2.3 Frequency Table

(1) Class Interval    (2) Class Frequency    (3) Class Relative Frequency
20.10 – 20.96                  2                     2/50 = 0.04
20.96 – 21.82                  3                     3/50 = 0.06
21.82 – 22.67                  8                     8/50 = 0.16
22.67 – 23.53                 13                    13/50 = 0.26
23.53 – 24.38                 11                    11/50 = 0.22
24.38 – 25.24                  6                     6/50 = 0.12
25.24 – 26.10                  7                     7/50 = 0.14
                          ∑ = 50                ∑ = 50/50 = 1

After creating class intervals, the number of data values in each interval is tallied. This is called the class frequency. Obviously, having data ordered consecutively as shown in Table 2.2 aids greatly in this counting process. Column (2) of Table 2.3 shows the class frequency for each class interval of the data in Table 2.2.

Often, it is also useful to calculate the class relative frequency for each interval. This is found by dividing the class frequency by the total number of observations. For the data in Table 2.2, the class relative frequency for the first class interval is 2/50 = 0.04. Similarly, the class relative frequency of the fourth interval (from 22.67 to 23.53) is 13/50 = 0.26. The class relative frequencies for the data of Table 2.2 are given in column (3) of Table 2.3. Notice that the sum of all class relative frequencies is always 1. The class relative frequency enables easy determination of percentages, which are called percentage points. For instance, the class interval from 21.82 to 22.67 contains 16% (0.16 × 100%) of the sample observations.
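The class frequencies and relative frequencies of Table 2.3 can be tallied programmatically. The sketch below is one way of doing it in Python (the data and the choice of seven classes come from the text; the helper names are arbitrary):

```python
# Tally the frequency table of Table 2.3: seven classes spanning the range.
readings = [
    22.7, 25.4, 24.0, 20.5, 22.5, 22.3, 24.2, 24.8, 23.5, 22.9,
    25.5, 24.7, 23.2, 22.0, 23.8, 23.8, 24.4, 23.7, 24.1, 22.6,
    22.9, 23.4, 25.9, 23.1, 21.8, 22.2, 23.3, 24.6, 24.1, 23.2,
    21.9, 24.3, 23.8, 23.1, 25.2, 26.1, 21.2, 23.0, 25.9, 22.8,
    22.6, 25.3, 25.0, 22.8, 23.6, 21.7, 23.9, 22.3, 25.3, 20.1,
]

num_classes = 7
low = min(readings)                                    # 20.1
width = (max(readings) - low) / num_classes            # 6.0/7, about 0.857

class_freq = [0] * num_classes
for y in readings:
    # Each class runs from its lower bound up to, but not including, its
    # upper bound; the top class also includes the maximum value.
    k = min(int((y - low) / width), num_classes - 1)
    class_freq[k] += 1

# Class relative frequency: class frequency over total observations.
rel_freq = [f / len(readings) for f in class_freq]
```

The resulting frequencies, [2, 3, 8, 13, 11, 6, 7], match column (2) of Table 2.3, and the relative frequencies sum to 1.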

A histogram is a bar graph plotted with either class frequencies or relative class frequencies on the ordinate, versus values of the class interval bounds on the abscissa. Using the data from Table 2.3, the histogram shown in Figure 2.1 was constructed. Notice that in this figure, relative frequencies have been plotted as ordinates.

FIGURE 2.1 Frequency histogram.

Histograms drawn with the same ordinate and abscissa scales can be used to compare two different data sets. If one data set is more precise than the other, it will have comparatively tall bars in the center of the histogram, with relatively short bars near its edges. Conversely, the less precise data set will yield a wider range of abscissa values, with shorter bars at the center.

Items easily seen on a histogram include:

  • Whether the data are symmetrical about a central value
  • The range or dispersion in the measured values
  • The frequency of occurrence of the measured values
  • The steepness of the histogram, which is an indication of measurement precision

Figure 2.2 shows several possible histogram shapes. Figure 2.2(a) depicts a histogram that is symmetric about its central value with a single peak in the middle. Figure 2.2(b) is also symmetric about the center but has a steeper slope than Figure 2.2(a), with a higher peak for its central value. Assuming the ordinate and abscissa scales to be equal, the data used to plot Figure 2.2(b) are more precise than those used for Figure 2.2(a). Symmetric histogram shapes are common in surveying practice, as well as in many other fields. In fact, they are so common that the shapes are said to be examples of a normal distribution. In Chapter 3, the reasons why these shapes are so common are discussed.

FIGURE 2.2 Common histogram shapes.

Figure 2.2(c) has two peaks and is said to be a bimodal histogram. In the histogram of Figure 2.2(d), there is a single peak with a long tail to the left; this results from a skewed data set, and in particular, these data are said to be skewed to the left (negatively skewed). The data of the histogram in Figure 2.2(e) are skewed to the right.

In surveying, the varying histogram shapes just described result from variations in personnel, physical conditions, and equipment. For example, consider repeated observations of a long distance made both with an EDM instrument and by taping. The EDM procedure would probably produce data having a very narrow range, and thus the resulting histogram would be narrow and steep with a tall central bar, like that in Figure 2.2(b). The histogram of the same distance measured by tape and plotted at the same scales would probably be wider, with neither its sides as steep nor its central bar as tall, like that shown in Figure 2.2(a). Since observations in surveying practice tend to be normally distributed, bimodal or skewed histograms from measured data are not expected, and the appearance of such a histogram should prompt an investigation into its cause. For instance, if a data set from an EDM calibration plots as a bimodal histogram, it could raise questions about whether the instrument or reflector was moved during the measuring process, or whether atmospheric conditions changed dramatically during the session. Similarly, a skewed histogram in EDM work may indicate the arrival of a weather front that stabilized over time. The existence of multipath errors in GNSS observations could also produce these types of histogram plots.

2.5 NUMERICAL METHODS OF DESCRIBING DATA

Numerical descriptors are values computed from a data set that are used to interpret the data's precision or quality. Numerical descriptors fall into three categories: (1) measures of central tendency, (2) measures of data variation, and (3) measures of relative standing. These categories are all called statistics. Simply described, a statistic is a numerical descriptor computed from sample data.

2.6 MEASURES OF CENTRAL TENDENCY

Measures of central tendency are computed statistical quantities that give an indication of the value within a data set that tends to exist at the center. The arithmetic mean, median, and mode are three such measures. They are described as follows:

  1. Arithmetic mean: For a set of n observations, y1, y2, …, yn, the arithmetic mean is the average of the observations. Its value, ȳ, is computed from the following equation:
    (2.1) ȳ = (y1 + y2 + ⋯ + yn) / n = Σyi / n

    Typically, the symbol ȳ is used to represent the sample's arithmetic mean, and the symbol μ is used to represent the population mean; otherwise, the same equation applies. Using Equation (2.1), the mean of the observations in Table 2.2 is 23.5.

  2. Median: As mentioned previously, this is the midpoint of a sample set when arranged in ascending or descending order. One-half of the data are above the median and one-half are below it. When there are an odd number of quantities, only one such value satisfies this condition. For a data set with an even number of quantities, the average of the two observations that straddle the midpoint is used to represent the median. Due to the relatively small number of observations in surveying, it is seldom used.
  3. Mode: Within a sample of data, the mode is the most frequently occurring value. It is seldom used in surveying because of the relatively small number of values observed in a typical set of observations. And in small sample sets, several different values may occur with the same frequency, and hence, the mode can be meaningless as a measure of central tendency. The mode for the data in Table 2.2 is 23.8. It is possible for a set of data to have more than one mode. A common example is a data set with two modes, which is said to be bimodal.
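For sample data, these three measures are available directly in Python's standard statistics module; the short sketch below applies them to the 50 readings of Table 2.1:

```python
import statistics

readings = [
    22.7, 25.4, 24.0, 20.5, 22.5, 22.3, 24.2, 24.8, 23.5, 22.9,
    25.5, 24.7, 23.2, 22.0, 23.8, 23.8, 24.4, 23.7, 24.1, 22.6,
    22.9, 23.4, 25.9, 23.1, 21.8, 22.2, 23.3, 24.6, 24.1, 23.2,
    21.9, 24.3, 23.8, 23.1, 25.2, 26.1, 21.2, 23.0, 25.9, 22.8,
    22.6, 25.3, 25.0, 22.8, 23.6, 21.7, 23.9, 22.3, 25.3, 20.1,
]

mean = statistics.mean(readings)      # arithmetic mean, Equation (2.1)
median = statistics.median(readings)  # midpoint of the ordered set
mode = statistics.mode(readings)      # most frequently occurring value
```

This returns a mean of 23.5, a median of 23.45, and a mode of 23.8, matching the values cited above.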

2.7 ADDITIONAL DEFINITIONS

Nine other terms, pertinent to the study of observations and their analysis, are listed and defined here:

  1. True value, μ: A quantity's theoretically correct or exact value. As noted in Section 1.3, the true value can never be determined.
  2. Error, ε: The error is the difference between any individual observed quantity and its true value. The true value is simply the population's arithmetic mean if all repeated observations have equal precision. Since the true value of an observed quantity is indeterminate, errors are also indeterminate and are therefore only theoretical quantities. As given in Equation (1.1), and repeated here for convenience, errors are expressed as
    (2.2) εi = yi − μ

    where yi is the individual observation associated with εi, and μ is the true value for that quantity.

  3. Most probable value, ȳ: The most probable value is the value for a measured quantity that, based upon the observations, has the highest probability of occurrence. It is derived from a sample set of data rather than from the population, and it is simply the mean if the repeated observations have the same precision.
  4. Residual, v: A residual is the difference between the most probable value for a quantity and any individual observation of that quantity. Residuals are the values used in adjustment computations, since most probable values can be determined but true values cannot. The term error is frequently used when residual is meant; although the two are very similar and behave in the same manner, the theoretical distinction above remains. The mathematical expression for a residual is
    (2.3) vi = ȳ − yi

    where vi is the residual in the ith observation, yi, and ȳ is the most probable value for the unknown.

  5. Degrees of freedom: Also called redundancies, the degrees of freedom are the number of observations that are in excess of the number necessary to solve for the unknowns. In other words, the number of degrees of freedom equals the number of redundant observations (see Section 1.6). As an example, if a distance between two points is measured three times, one observation would determine the unknown distance and the other two are redundant. These redundant observations reveal the discrepancies and inconsistencies in observed values. This, in turn, makes possible the practice of adjustment computations for obtaining the most probable values based on the measured quantities.
  6. Variance, σ²: This is a value by which the precision of a data set is given. The population variance applies to a data set consisting of an entire population. It is the mean of the squares of the errors and is given by
    (2.4) σ² = Σεi² / n

    The sample variance applies to a sample set of data. It is an unbiased estimate of the population variance given in Equation (2.4) and is calculated as

    (2.5) S² = Σvi² / (n − 1)

    Note that Equations (2.4) and (2.5) are identical except that ε has been changed to v, and n has been changed to (n − 1), in Equation (2.5). The validity of these modifications is demonstrated in Section 2.11.

    It is important to note that the simple algebraic average of all errors in a data set cannot be used as a meaningful precision indicator. This is because random errors are as likely to be positive as negative, and thus the algebraic average will equal zero. This fact is shown for a population of data in the following simple proof. Summing Equation (2.2) over n values gives

    (a) Σεi = Σ(yi − μ) = Σyi − nμ

    Then substituting the expression for the mean from Equation (2.1) into Equation (a) yields

    (b) Σεi = Σyi − n(Σyi / n) = Σyi − Σyi = 0

    Similarly, it can be shown that the mean of all residuals of a sample data set equals zero.

  7. Standard error, σ: This is the square root of the population variance. From Equation (2.4) and this definition, the following equation is written for the standard error:
    (2.6) σ = √(Σε² / n)

    where n is the number of observations and Σε² is the sum of the squares of the errors. Note that the population variance, σ², and the standard error, σ, are indeterminate because true values, and hence errors, are indeterminate.

    As will be explained in Section 3.5, 68.3% of all observations in a population data set lie within ±σ of the true value, μ. Thus, the larger the standard error, the more dispersed are the values in the data set and the less precise is the measurement.

  8. Standard deviation, S: This is the square root of the sample variance. It is calculated using the expression
    (2.7) S = √(Σv² / (n − 1))

    where S is the standard deviation, n − 1 is the degrees of freedom, and Σv² is the sum of the squared residuals. The standard deviation is an estimate for the standard error of the population. Since the standard error cannot be determined, the standard deviation is a practical expression for the precision of a sample set of data. Residuals are used rather than errors because residuals can be calculated from most probable values, whereas errors cannot be determined. As discussed in Section 3.5, for a sample set of data, 68.3% of the observations will theoretically lie between the most probable value plus and minus the standard deviation, S. The meaning of this statement will be clarified in the example that follows.

  9. Standard deviation of the mean: Because all observed values contain errors, the mean, which is computed from a sample set of measured values, will also contain error. The standard deviation of the mean is computed from the sample standard deviation according to the equation
    (2.8) Sȳ = S / √n

    Notice that as n approaches infinity, Sȳ approaches zero. This illustrates that as the size of the sample set approaches the total population, the computed mean ȳ will approach the true mean, μ. This equation is derived in Section 6.2.3.
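As a compact illustration of items 6, 8, and 9, the following Python sketch computes the sample variance, standard deviation, and standard deviation of the mean for a small hypothetical set of five distance observations (the values are invented for illustration only):

```python
import math

# Hypothetical sample: five observations of a distance, in meters.
obs = [100.005, 100.003, 100.008, 100.004, 100.005]
n = len(obs)

y_bar = sum(obs) / n                     # Equation (2.1): most probable value
v = [y_bar - y for y in obs]             # Equation (2.3): residuals
S2 = sum(vi * vi for vi in v) / (n - 1)  # Equation (2.5): sample variance
S = math.sqrt(S2)                        # Equation (2.7): standard deviation
S_mean = S / math.sqrt(n)                # Equation (2.8): std. dev. of the mean

# As shown by Equation (b), the residuals always sum to zero.
assert abs(sum(v)) < 1e-9
```

For these invented values, S is about ±0.0019 m and the standard deviation of the mean is about ±0.0008 m, illustrating how repetition tightens the estimate of the mean.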

2.8 ALTERNATIVE FORMULA FOR DETERMINING VARIANCE

From the definition of residuals, Equation (2.5) is rewritten as

(2.9) S² = Σ(ȳ − yi)² / (n − 1)

Expanding Equation (2.9) yields

(c) S² = [(ȳ − y1)² + (ȳ − y2)² + ⋯ + (ȳ − yn)²] / (n − 1)

Substituting Equation (2.1) for ȳ into Equation (c), and dropping the bounds for the summation,

(d) S² = [(Σy/n − y1)² + (Σy/n − y2)² + ⋯ + (Σy/n − yn)²] / (n − 1)

Expanding Equation (d),

(e) S² = [(Σy)²/n² − 2y1(Σy/n) + y1² + (Σy)²/n² − 2y2(Σy/n) + y2² + ⋯ + (Σy)²/n² − 2yn(Σy/n) + yn²] / (n − 1)

Rearranging Equation (e) and recognizing that (Σy)²/n² occurs n times in Equation (e) yields

(f) S² = [n(Σy)²/n² − 2(Σy/n)(y1 + y2 + ⋯ + yn) + (y1² + y2² + ⋯ + yn²)] / (n − 1)

Adding the summation symbol to Equation (f) yields

(g) S² = [(Σy)²/n − (2/n)(Σy)(Σy) + Σy²] / (n − 1)

Factoring and regrouping similar summations in Equation (g) produces

(h) S² = [Σy² − (Σy)²/n] / (n − 1)

Multiplying the last term in Equation (h) by n/n yields

(i) S² = [Σy² − n(Σy/n)²] / (n − 1)

Finally, by substituting Equation (2.1) in Equation (i), the following expression for the variance results:

(2.10) S² = (Σy² − nȳ²) / (n − 1)

Using Equation (2.10), the variance of a sample data set can be computed by subtracting n times the square of the data's mean from the summation of the squared individual observations. With this equation, the variance and the standard deviation can be computed directly from the data. However, it should be stated that with large numerical values, Equation (2.10) may overwhelm a hand-held calculator or a computer working in single precision. The data should be centered or Equation (2.5) used when this problem arises. Centering a data set involves subtracting a constant value (usually, the arithmetic mean or something near the mean) from all values in a data set. By doing this, the values are modified to a smaller, more manageable size.
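The equivalence of Equations (2.5) and (2.10), and the effect of centering, can be verified numerically. In the sketch below (Python; the function names are arbitrary), both formulas are applied to the Table 2.1 readings, and Equation (2.10) is applied again after centering the data on 23.5:

```python
readings = [
    22.7, 25.4, 24.0, 20.5, 22.5, 22.3, 24.2, 24.8, 23.5, 22.9,
    25.5, 24.7, 23.2, 22.0, 23.8, 23.8, 24.4, 23.7, 24.1, 22.6,
    22.9, 23.4, 25.9, 23.1, 21.8, 22.2, 23.3, 24.6, 24.1, 23.2,
    21.9, 24.3, 23.8, 23.1, 25.2, 26.1, 21.2, 23.0, 25.9, 22.8,
    22.6, 25.3, 25.0, 22.8, 23.6, 21.7, 23.9, 22.3, 25.3, 20.1,
]

def variance_eq25(data):
    # Equation (2.5): sum of squared residuals over (n - 1).
    n = len(data)
    m = sum(data) / n
    return sum((m - y) ** 2 for y in data) / (n - 1)

def variance_eq210(data):
    # Equation (2.10): (sum of y^2 minus n times the squared mean) over (n - 1).
    n = len(data)
    m = sum(data) / n
    return (sum(y * y for y in data) - n * m * m) / (n - 1)

s2_definition = variance_eq25(readings)
s2_alternative = variance_eq210(readings)

# Centering: subtract a constant near the mean to keep the squared terms
# small before applying Equation (2.10); the variance is unchanged.
s2_centered = variance_eq210([y - 23.5 for y in readings])
```

All three results agree: S² ≈ 1.885, so S ≈ 1.37.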

2.9 NUMERICAL EXAMPLES

By demonstration in Example 2.1, it can be seen that Equations (2.7) and (2.10) will yield the same standard deviation for a sample set. Notice that the number of observations within a single standard deviation from the mean, that is, between (23.5″ − 1.37″) and (23.5″ + 1.37″), or between 22.13″ and 24.87″, is 34. This represents 34/50 × 100% or 68% of all observations in the sample and matches the theory noted earlier. Also note that the algebraic sum of residuals is zero as was earlier demonstrated by Equation (b).
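Although Example 2.1 itself is not reproduced here, its key computation can be sketched as follows (Python): the mean and standard deviation are computed from the Table 2.1 readings, and the observations lying within one standard deviation of the mean are counted.

```python
import math

readings = [
    22.7, 25.4, 24.0, 20.5, 22.5, 22.3, 24.2, 24.8, 23.5, 22.9,
    25.5, 24.7, 23.2, 22.0, 23.8, 23.8, 24.4, 23.7, 24.1, 22.6,
    22.9, 23.4, 25.9, 23.1, 21.8, 22.2, 23.3, 24.6, 24.1, 23.2,
    21.9, 24.3, 23.8, 23.1, 25.2, 26.1, 21.2, 23.0, 25.9, 22.8,
    22.6, 25.3, 25.0, 22.8, 23.6, 21.7, 23.9, 22.3, 25.3, 20.1,
]
n = len(readings)

mean = sum(readings) / n
S = math.sqrt(sum((mean - y) ** 2 for y in readings) / (n - 1))

# Count the observations lying within one standard deviation of the mean,
# i.e., between 22.13 and 24.87 for this data set.
low, high = mean - S, mean + S
within = sum(1 for y in readings if low <= y <= high)
percent = 100 * within / n
```

This gives a mean of 23.5, S ≈ 1.37, and 34 observations (68% of the sample) within ȳ ± S, matching the figures quoted above.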

The histogram, shown in Figure 2.1, plots class relative frequencies versus class values. Notice how the values tend to be grouped about the central point. This is an example of a precise data set.

2.10 ROOT MEAN SQUARE ERROR AND MAPPING STANDARDS

Today, maps can be obtained in hardcopy and digital form. Thus, modern map accuracy standards are often based on a statistical quantity known as the root mean square error (RMSE). To check the accuracy of a map, the discrepancies between the coordinates of points determined from the map and the coordinates of those same points observed with a higher-order check survey must be determined. Obviously, these points must be well defined both on the map and on the surface of the earth so that coordinate values can be obtained at the same locations during the field survey.

Since the check survey must be of higher accuracy than the map, these discrepancies are often treated as residuals, where the results of the check survey are considered to be without error and thus represent the true values. This concept of values determined from a higher-order survey being considered true values is not without precedent. For example, the distances listed on the coordinate datasheet for a station, or the lengths listed on an EDM calibration baseline report, are often considered the true values, since the surveys determining these values are more accurate than what a typical field survey would yield.

RMSE is defined as the square root of the average of squared residuals for values tested. Thus, for map accuracy standards, the differences between coordinates and elevations of points obtained from a map and their values as determined by a check survey are used to determine the accuracies of the map. Mathematically, RMSE is denoted as

(2.11) RMSE = √( Σ[f(xi) − xi]² / n )

where n is the number of tested samples from the map, f(xi) is the position of the point obtained from the map, xi is the position obtained from the check survey, and f(xi) − xi is the residual error for the point.

As an example consider the data shown in Table 2.7. The map coordinate values for the well-defined points are shown in columns (1) – (3). The check survey coordinate values are shown in columns (4) – (6). The discrepancies shown in columns (7) – (9) are computed as the difference between the mapped and surveyed coordinate values, which are listed as residuals. Thus, the residual for the x coordinate of point 1 is computed as 672,571.819 – 672,571.777 = 0.042 m. Similarly all other coordinate differences are determined.

TABLE 2.7 Map Coordinates versus Surveyed Checkpoint Coordinates

Mapped Points Check Points Residuals
Point (1) x (m) (2) y (m) (3) z (m) (4) E (m) (5) N (m) (6) H (m) (7) ΔE (8) ΔN (9) ΔH
  1 672,571.819 410,943.912 79.832 672,571.777 410,943.930 79.865 0.042 −0.018 −0.033
  2 671,203.830 418,741.450 72.483 671,203.869 418,741.425 72.457 −0.039 0.025 0.026
  3 671,203.830 426,812.590 91.565 671,203.847 426,812.566 91.627 −0.017 0.024 −0.062
  4 660,396.717 427,222.982 99.340 660,396.666 427,222.983 99.377 0.051 −0.001 −0.037
  5 637,824.897 425,170.999 77.340 637,824.849 425,170.984 77.306 0.048 0.015 0.034
  6 638,372.093 409,165.527 71.839 638,372.105 409,165.507 71.830 −0.012 0.020 0.009
  7 638,782.490 416,963.064 70.133 638,782.487 416,963.064 70.137 0.003 0.000 −0.004
  8 651,504.788 426,128.591 78.531 651,504.796 426,128.575 78.571 −0.008 0.016 −0.040
  9 643,980.848 422,571.819 81.486 643,980.840 422,571.835 81.414 0.008 −0.016 0.072
10 645,212.038 408,755.130 86.276 645,212.063 408,755.129 86.282 −0.025 0.001 −0.006
11 667,236.662 409,575.923 93.552 667,236.659 409,575.935 93.514 0.003 −0.012 0.038
12 664,911.081 422,982.216 75.410 664,911.068 422,982.213 75.387 0.013 0.003 0.023
13 655,198.358 415,595.075 71.369 655,198.395 415,595.052 71.433 −0.037 0.023 −0.064
14 647,674.419 414,774.282 80.002 647,674.397 414,774.273 80.041 0.022 0.009 −0.039
15 654,787.962 421,340.629 89.366 654,787.971 421,340.648 89.315 −0.009 −0.019 0.051
16 663,269.494 416,279.070 78.303 663,269.486 416,279.044 78.328 0.008 0.026 −0.025
17 656,984.354 408,996.255 81.205 656,984.379 408,996.226 81.209 −0.025 0.029 −0.004
18 668,113.208 431,698.113 76.001 668,113.253 431,698.112 76.087 −0.045 0.001 −0.086
19 655,660.377 431,797.820 72.424 655,660.433 431,797.795 72.446 −0.056 0.025 −0.022
20 643,962.264 430,943.396 84.189 643,962.266 430,943.429 84.179 −0.002 −0.033 0.010
∑v² 0.017378 0.007031 0.033783
S ±0.030 ±0.019 ±0.042
RMSE ±0.029 ±0.019 ±0.041
RMSEr ±0.035

The standard deviations for each coordinate type are computed from Equation (2.7) using the sums of the squared residuals listed in Table 2.7:

S_ΔE = ±√(0.017378 / (20 − 1)) = ±0.030 m
S_ΔN = ±√(0.007031 / (20 − 1)) = ±0.019 m
S_ΔH = ±√(0.033783 / (20 − 1)) = ±0.042 m

Using Equation (2.11), the RMSE in ΔE (RMSE_ΔE), ΔN (RMSE_ΔN), and ΔH (RMSE_ΔH) are computed from their corresponding residuals as

RMSE_ΔE = ±√(0.017378 / 20) = ±0.029 m
RMSE_ΔN = ±√(0.007031 / 20) = ±0.019 m
RMSE_ΔH = ±√(0.033783 / 20) = ±0.041 m

From RMSE_ΔE and RMSE_ΔN, a radial value for the horizontal positional accuracy is determined as

RMSE_r = ±√(RMSE_ΔE² + RMSE_ΔN²) = ±√(0.029² + 0.019²) = ±0.035 m
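These computations can be reproduced directly from the residual columns of Table 2.7. The sketch below (Python; the names are my own) computes the three RMSE values and the radial value; because the residuals are listed only to the millimeter, the results agree with the tabulated values at that level.

```python
import math

# Residuals from Table 2.7, columns (7)-(9), in meters.
dE = [0.042, -0.039, -0.017, 0.051, 0.048, -0.012, 0.003, -0.008, 0.008,
      -0.025, 0.003, 0.013, -0.037, 0.022, -0.009, 0.008, -0.025, -0.045,
      -0.056, -0.002]
dN = [-0.018, 0.025, 0.024, -0.001, 0.015, 0.020, 0.000, 0.016, -0.016,
      0.001, -0.012, 0.003, 0.023, 0.009, -0.019, 0.026, 0.029, 0.001,
      0.025, -0.033]
dH = [-0.033, 0.026, -0.062, -0.037, 0.034, 0.009, -0.004, -0.040, 0.072,
      -0.006, 0.038, 0.023, -0.064, -0.039, 0.051, -0.025, -0.004, -0.086,
      -0.022, 0.010]

def rmse(residuals):
    # Equation (2.11): square root of the mean of the squared residuals.
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))

rmse_E, rmse_N, rmse_H = rmse(dE), rmse(dN), rmse(dH)

# Radial horizontal positional accuracy from the two horizontal components.
rmse_r = math.hypot(rmse_E, rmse_N)
```

Rounded to the millimeter, this yields ±0.029 m, ±0.019 m, and ±0.041 m for the three components, and ±0.035 m for the radial value.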

Following these computations, standards usually require a certain confidence level for the horizontal and vertical components of the map.

2.11 DERIVATION OF THE SAMPLE VARIANCE (BESSEL'S CORRECTION)

Recall from Section 2.7 that the denominator of the equation for the sample variance was n − 1, while the denominator of the population variance was n. A simple explanation for this difference is that one observation is needed to compute the mean, ȳ, and thus only n − 1 observations remain for the variance's computation. A derivation of Equation (2.5) will clarify this point.

Consider a sample of size n drawn from a population with mean μ and standard error σ. Let yi be an observation from the sample; then

(j) yi − μ = (yi − ȳ) + (ȳ − μ) = (yi − ȳ) + ε̄

where ε̄ = ȳ − μ is the error, or deviation, of the sample mean. Squaring and expanding Equation (j) yields

(yi − μ)² = (yi − ȳ)² + 2ε̄(yi − ȳ) + ε̄²

Summing over all the observations in the sample, from i equaling 1 to n, yields

(k) Σ(yi − μ)² = Σ(yi − ȳ)² + 2ε̄ Σ(yi − ȳ) + nε̄²

Since, by definition of the sample mean ȳ,

(l) Σ(yi − ȳ) = Σyi − nȳ = 0

Equation (k) becomes

(m) Σ(yi − μ)² = Σ(yi − ȳ)² + nε̄²

Repeating this calculation for many samples, the mean value of the left-hand side of Equation (m) will, by definition of σ², tend to nσ². Similarly, by Equation (2.8), the mean value of nε̄² will tend to n times the variance of ȳ, since ε̄ represents the deviation of the sample mean from the population mean; that is, nε̄² tends to nσȳ² = n(σ²/n) = σ². The above discussion and Equation (m) result in

(n) nσ² = Σ(yi − ȳ)² + σ²

Rearranging Equation (n) produces

(o) Σ(yi − ȳ)² / (n − 1) = σ²

Thus, from Equation (o), and recognizing the left side of the equation as S² for a sample set of data, it follows that

(p) S² → σ²

In other words, for a large number of random samples, the value of Σ(yi − ȳ)² / (n − 1) tends to σ². That is, S² is an unbiased estimate of the population's variance.
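The unbiasedness of S² can also be illustrated by simulation. The following Python sketch (the population parameters, sample size, seed, and trial count are arbitrary choices) draws many small samples from a normal population and averages the two competing variance estimates:

```python
import random

random.seed(42)
mu, sigma = 23.5, 1.4        # hypothetical population parameters
n, trials = 5, 20000         # small samples make the bias easy to see

sum_var_n1 = 0.0             # Equation (2.5): divisor n - 1
sum_var_n = 0.0              # biased alternative: divisor n
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    m = sum(sample) / n
    ss = sum((y - m) ** 2 for y in sample)
    sum_var_n1 += ss / (n - 1)
    sum_var_n += ss / n

avg_var_n1 = sum_var_n1 / trials   # tends to sigma**2
avg_var_n = sum_var_n / trials     # tends to (n - 1) * sigma**2 / n
```

With these settings, the n − 1 estimate averages near σ² = 1.96, while the n-divisor estimate averages near (n − 1)σ²/n ≈ 1.57, demonstrating the bias that Bessel's correction removes.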

2.12 SOFTWARE

A Windows-based, statistical software package called STATS is available on the companion website for this book. It can be used to quickly perform statistical analysis of data sets as presented in this chapter. The data file used in STATS is simply a listing of the individual observations. For example, in Example 2.1, the data file can be entered as it is shown in Table 2.1. After saving this file, the “Histogram data” option under the programs menu is selected. After entering the appropriate file into the software, the software performs the computations discussed in this chapter and plots a frequency histogram of the data using the user-specified class interval or the desired number of classes.

Additionally, an electronic book is provided on the companion website for this book. To view the electronic book interactively, the Mathcad® software is required. However, for those who do not have a copy of Mathcad®, HTML files of the electronic book are also on the website. The electronic book demonstrates most of the numerical examples provided in this book. In particular, the electronic book c2.xmcd demonstrates the use of Mathcad® to solve Examples 2.1 and 2.2.

Also, a spreadsheet can be used to perform the computations in this chapter. For example, Microsoft Excel® has functions for determining the mean, median, mode, standard deviation, and histogram data. The average function computes the mean for a selected set of data. The stdev.s function computes the standard deviation for a selected sample set of data. Similarly, the mode and median functions determine these values for a set of data, and the min and max functions determine the minimum and maximum values. Additionally, with an available plug-in, the software can automatically tally histogram data based on user-specified class intervals, which are known as bins. These functions are demonstrated for Example 2.1 in the file c2.xls on the companion website.

Many of the chapters have programming problems listed at the end of each chapter. The electronic book demonstrates the rudiments of programming these problems. Other programs on the companion website include MATRIX and ADJUST. MATRIX can be used to solve problems involving matrices in this book. ADJUST has working least squares adjustment examples discussed in this book. ADJUST can be used to check solutions of many of the problems in this book.

The installation software for the programs ADJUST, MATRIX, and STATS is available in the zip file on the website. This software is provided as an aid in learning the material in this book, and purchasers of this book may install it on their computers. The spreadsheet and worksheet files discussed in this book can be copied from the companion website to your computer. The Mathcad® e-book should be copied to the handbook subdirectory of the Mathcad program. If you do not own Mathcad, HTML files of the e-book are provided, which can be copied and viewed once you have unpacked the zipped archive from the companion website. Readers should refer to Appendix G for specific details about the software on the website.

PROBLEMS

Note: Partial answers to problems marked with an asterisk are given in Appendix H.

  1. *2.1 The optical micrometer of a precise differential level is set and read 10 times as 8.801, 8.803, 8.798, 8.801, 8.799, 8.802, 8.802, 8.804, 8.800, and 8.802. What value would you assign to the operator's ability to set the micrometer on this instrument?
  2. 2.2 A distance measured in units of meters is observed 10 times as 186.499, 186.498, 186.495, 186.499, 186.498, 186.489, 186.489, 186.498, 186.500, and 186.491. What is the:
    1. (a) Range of the data?
    2. (b) Mean?
    3. (c) Median?
    4. *(d) Mode?
  3. 2.3 Using the data in Problem 2.2 tabulate the residuals and compute the variance, standard deviation, and standard deviation of the mean.
  4. 2.4 The seconds portions of 10 pointings and readings for a particular direction are 26.5, 27.4, 31.5, 27.4, 24.8, 25.7, 33.1, 29.0, 28.8, and 26.0. What is the:
    1. *(a) Largest discrepancy in the data?
    2. (b) Mean?
    3. (c) Median?
    4. (d) Mode?
  5. 2.5 Using the data in Problem 2.4, tabulate the residuals and compute the variance, standard deviation, and standard deviation of the mean.
  6. 2.6 The seconds portions of 32 pointings and readings for a particular direction made using a 1″ total station with a 0.1″ display are: 48.9, 48.8, 48.6, 49.0, 48.9, 47.8, 47.8, 48.8, 49.1, 48.0, 48.0, 48.2, 48.9, 48.6, 48.8, 48.9, 48.2, 48.5, 48.5, 49.1, 48.6, 47.8, 47.8, 48.1, 49.0, 48.0, 49.1, 48.4, 47.9, 48.2, 47.9, and 48.1.
    1. (a) What is the mean of the data set?
    2. (b) Construct a frequency histogram of the data using seven uniform-width class intervals.
    3. *(c) What are the variance and standard deviation of the data?
    4. (d) What is the standard deviation of the mean?
  7. 2.7 An EDM instrument and reflector are set at the ends of a baseline that is 200.014 m long. Its length is measured 21 times, with the following results: 200.014, 200.013, 200.007, 200.016, 200.011, 200.015, 200.012, 200.018, 200.014, 200.012, 200.011, 200.009, 200.019, 200.016, 200.009, 200.016, 200.015, 200.018, 200.016, 200.007, and 200.014.
    1. (a) What are the mean, median, and standard deviation of the data?
    2. (b) Construct a histogram of the data with seven intervals and describe its properties. On the histogram lay off the sample standard deviation from both sides of the mean.
    3. (c) How many observations fall within one sample standard deviation of the mean, and what percentage of the observations does this represent?
  8. 2.8 Answer Problem 2.7 with the following additional observations: 200.009, 200.015, 200.010, 200.016, 200.010, and 200.011.
  9. 2.9 Answer Problem 2.8 with the following additional observations: 200.010, 200.016, 200.016, and 200.015.
  10. 2.10 A distance was measured in two parts with a 100-ft. steel tape and then in its entirety with a 200-ft. steel tape. Five repetitions were made by each method. What are the mean, variance, and standard deviation for each method of measurement?
    Distances measured with 100-ft tape:
    Section 1: 100.006, 100.004, 100.001, 100.006, 100.005
    Section 2: 86.777, 86.779, 86.785, 86.778, 86.774
    Distance measured with 200-ft tape:
    186.778, 186.776, 186.781, 186.786, 186.782
  11. 2.11 Repeat Problem 2.10 using the following additional data for the 200-ft taped distance. 186.781, 186.784, 186.779, 186.778, and 186.776.
  12. 2.12 During a triangulation project, an observer made 16 readings for each direction. The seconds portions of the directions to Station Orion are listed as 26.9, 27.5, 27.1, 26.5, 25.6, 27.2, 27.4, 26.6, 26.9, 26.1, 27.4, 27.3, 27.7, 26.4, 28.4, and 27.4.
    1. (a) Using a 0.5″ class interval, plot the histogram using relative frequencies for the ordinates.
    2. (b) Analyze the data and note any abnormalities.
    3. (c) As a supervisor, would you recommend reobservation of the station?
  13. 2.13 A particular line in a survey is measured three times on four separate occasions. The resulting 12 observations in units of meters are 536.191, 536.189, 536.187, 536.202, 536.200, 536.203, 536.202, 536.201, 536.199, 536.196, 536.205, and 536.202.
    1. (a) Compute the mean, median, and mode of the data.
    2. (b) Compute the variance and standard deviation of the data.
    3. (c) Using a class width of 0.004 m, plot a histogram of the data, and note any abnormalities that may be present.
  14. 2.14 Repeat Problem 2.13, but use a class width of 0.003 m in part (c).
  15. 2.15 During a triangulation project, an observer made 32 readings for each direction using a 3″ total station. The seconds portions of the directions are listed below. Using seven class intervals, plot the histogram with relative frequencies for the ordinates. Analyze the data and state whether this set appears to be reasonable. 18, 17, 19, 13, 23, 18, 14, 18, 22, 21, 22, 17, 17, 20, 20, 16, 24, 16, 17, 20, 21, 20, 22, 16, 26, 21, 17, 21, 25, 24, 25, and 20.
  16. 2.16 Two students have an argument over who can turn an angle better. To resolve the argument, they agree to each measure a single angle 10 times. The results of the observations are:
    Student A: 108°26′10″, 108°26′10″, 108°26′08″, 108°26′10″, 108°26′10″, 108°26′05″, 108°26′04″, 108°26′10″, 108°26′11″, 108°26′05″
    Student B: 108°26′12″, 108°26′11″, 108°26′09″, 108°26′13″, 108°26′12″, 108°26′01″, 108°26′01″, 108°26′11″, 108°26′14″, 108°26′03″
    1. (a) What are the means and variances of both data sets?
    2. (b) Construct a histogram of each data set using a 3″ class width.
    3. (c) Which student performed better in this situation?

    Use the program STATS to do Problems 2.17–2.21.

  17. 2.17 Use the program STATS to compute the mean, median, mode, and standard deviation of the data in Table 2.2 and plot a centered histogram of the data using nine intervals.
  18. 2.18 Problem 2.6.
  19. 2.19 Problem 2.7.
  20. 2.20 Problem 2.10.
  21. 2.21 Problem 2.16.
  22. 2.22 Compute the standard deviation, root mean square error, and horizontal root mean square error for the map and check survey coordinates listed in the following table.
    Map Coordinates Check Survey
    Point e (m) n (m) h (m) E (m) N (m) H (m)
      1 643,012.990 382,012.235 151.012 643,012.978 382,012.236 151.029
      2 643,018.605 382,008.065 145.000 643,018.602 382,008.045 144.986
      3 643,018.538 382,001.677 157.675 643,018.525 382,001.672 157.676
      4 643,027.819 382,002.114 152.290 643,027.813 382,002.111 152.295
      5 643,025.532 382,007.696 148.788 643,025.534 382,007.700 148.776
      6 643,033.905 382,006.250 150.051 643,033.894 382,006.239 150.047
      7 643,034.443 382,002.517 159.903 643,034.430 382,002.489 159.916
      8 643,028.021 382,015.161 154.465 643,028.029 382,015.162 154.455
      9 643,034.510 382,010.184 154.197 643,034.501 382,010.175 154.200
    10 643,034.510 382,019.938 150.347 643,034.506 382,019.951 150.333
    11 643,026.138 382,022.895 153.941 643,026.142 382,022.899 153.959
    12 643,020.959 382,014.623 151.639 643,020.953 382,014.629 151.654
    13 643,014.268 382,022.626 147.369 643,014.250 382,022.639 147.359
    14 643,011.308 382,018.725 153.243 643,011.313 382,018.714 153.231
    15 643,002.498 382,020.575 151.955 643,002.502 382,020.556 151.954
    16 643,003.137 382,013.614 155.297 643,003.141 382,013.629 155.285
    17 643,008.080 382,009.344 145.239 643,008.076 382,009.349 145.241
    18 643,002.532 382,005.880 153.661 643,002.534 382,005.885 153.669
    19 643,006.231 382,002.988 149.267 643,006.240 382,002.997 149.270
    20 643,002.061 382,001.206 150.317 643,002.054 382,001.211 150.315
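The quantities asked for in Problem 2.22 can be sketched as follows. The definitions used here, RMSE as the root mean square of the map-minus-check coordinate differences and horizontal RMSE as the combination of the easting and northing components, are the common NSSDA-style ones and should be checked against the chapter's own formulas. Only the first three points of the table are used, for brevity.

```python
# Hedged sketch for Problem 2.22 using NSSDA-style definitions;
# only the first three table points are included here.
import math
import statistics

map_e = [643012.990, 643018.605, 643018.538]
chk_E = [643012.978, 643018.602, 643018.525]
map_n = [382012.235, 382008.065, 382001.677]
chk_N = [382012.236, 382008.045, 382001.672]

dE = [m - c for m, c in zip(map_e, chk_E)]   # easting differences
dN = [m - c for m, c in zip(map_n, chk_N)]   # northing differences

sd_E = statistics.stdev(dE)                  # sample standard deviation
sd_N = statistics.stdev(dN)
rmse_E = math.sqrt(sum(d * d for d in dE) / len(dE))
rmse_N = math.sqrt(sum(d * d for d in dN) / len(dN))
rmse_horizontal = math.sqrt(rmse_E**2 + rmse_N**2)

print(sd_E, sd_N, rmse_E, rmse_N, rmse_horizontal)
```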

PRACTICAL EXERCISES

  1. 2.23 Using a total station, point at a well-defined target and read the horizontal circle. With the tangent screw or jog-shuttle mechanism, move the instrument off the point and then repoint on the same target. Record this reading. Repeat this process 50 times. Perform the calculations of Problem 2.6 using this data set.
  2. 2.24 Determine your EDM/reflector constant, K, by observing the distances between three points that are on line, as shown in the figure. Distance AB should be roughly 60 m long and BC roughly 90 m long, with B situated between A and C. From the measured values AC, AB, and BC, the constant K can be determined as follows:
    Illustration of the distances between the three points A, B, and C on a line.

    Since

    AC + K = (AB + K) + (BC + K)

    then

    K = AC − (AB + BC)

    When establishing the line, be sure that AB ≠ BC and that all three points are precisely on a straight line. Use three tripods and tribrachs to minimize setup errors and be sure that all are in adjustment. Measure each line 20 times with the instrument in the metric mode. Be sure to adjust the distances for the appropriate temperature and pressure and for differences in elevation. Determine the 20 values of K and analyze the sample set. What is the mean value for K, and what is its standard deviation?
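    The reduction described above can be sketched in a few lines. The measured distances below are invented for illustration; the relation K = AC − (AB + BC) follows from AC + K = (AB + K) + (BC + K).

    ```python
    # Hypothetical sketch of the EDM/reflector constant reduction.
    # Since AC + K = (AB + K) + (BC + K), it follows that
    # K = AC - (AB + BC). The measured values below are invented.
    import statistics

    AB = [59.985, 59.986, 59.984, 59.985, 59.987]
    BC = [89.985, 89.984, 89.986, 89.985, 89.983]
    AC = [149.985, 149.986, 149.984, 149.985, 149.986]

    Ks = [ac - (ab + bc) for ab, bc, ac in zip(AB, BC, AC)]
    K_mean = statistics.mean(Ks)    # mean value of K
    K_stdev = statistics.stdev(Ks)  # its sample standard deviation

    print(K_mean, K_stdev)
    ```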
