Chapter 10. Mathematical and Statistical Charting Techniques

In This Chapter

  • Drawing circles with a scatter chart

  • Connecting data points to the chart axes or the origin

  • Creating frequency distributions and histograms

  • Plotting a normal curve

  • Calculating the area under a curve

  • Creating a box plot

  • Plotting 1– and 2–variable mathematical functions

  • Simulating a 3–D scatter plot

This chapter describes some charting techniques that may be useful for those who are mathematically or statistically inclined. The charts in this chapter use techniques that might not be obvious.

Drawing a Circle with an XY Series

This section describes how to create a scatter chart that displays a perfect circle. To do so, you need two ranges, one for the x values and another for the y values. The number of data points in the series determines the smoothness of the circle.

The example in Figure 10-1 uses 13 points to create a circle with an origin of 0, 0 and a radius of 1. This series uses the Smoothed Line option (on the Marker Line Style tab of the Format Data Series dialog box). When this option is not set, the circle is not very smooth, and its component lines are clearly visible.

This scatter chart uses 13 data points to define a circle.

Figure 10-1. This scatter chart uses 13 data points to define a circle.

Note

The examples in this section (plus a few additional examples) are available on the companion CD–ROM. The filename is scatter chart circles.xlsx.

To draw a circle on a chart, generate a series of values such as the ones shown in column A. The numeric series starts with 0 and has 30–degree increments. The ranges that are used in the chart are in columns C and D. The formula in C2 is as follows:

=COS(RADIANS(A2))

The formula in D2 is as follows:

=SIN(RADIANS(A2))

The formulas in C2 and D2 are copied down to subsequent rows.

Note

To plot a circle with more data points, you need to adjust the increment value in column A (the final value should always be 360). The increment is 360 divided by the number of data points minus 1. The more data points used, the smoother the circle.

Tip

The circle won't look accurate unless the plot area is adjusted so that a vertical unit is exactly the same length as a horizontal unit. The examples in this section all use the same major unit for both axes.

The next example, which builds on the previous example, demonstrates how to use a scatter chart series to draw circles around data points in a chart. Figure 10-2 shows a scatter chart that contains four series: The first series (range A2:B4) plots the three data points (as markers only, no line). Three additional series (ranges B9:C21, D9:E21, and F9:G21) plot a circle around each point (as lines, not markers).

Using XY series to draw circles around data points.

Figure 10-2. Using XY series to draw circles around data points.

The three circle series use formulas similar to those described in the previous section, but these formulas allow a specific origin and radius. The radius of each circle is defined by the entries in row 7.

Connecting Scatter Chart Data Points to the Axes with Error Bars

The example in Figure 10-3 shows a scatter chart in which each data point is projected to the x–and y–axis. These lines are created with error bars. After you create the basic scatter chart using the data in A2:B9, choose Chart Tools

Connecting Scatter Chart Data Points to the Axes with Error Bars
Using error bars to project each data point to the axes.

Figure 10-3. Using error bars to project each data point to the axes.

You need to adjust each set of error bars separately. For the vertical error bars, specify the Percentage option and enter 100 Repeat this for the horizontal error bars.

Note

The examples in this section and the section that follows are available on the companion CD–ROM. The filename is connecting data points.xlsx.

Connecting XY Points to the Origin

Figure 10-4 shows a scatter chart in which each data point is connected to the origin. This type of chart requires an additional series. The chart shown in the figure displays five data points (as markers), and the chart has an additional data series displayed as lines.

Using additional data series to connect each data point with the chart's origin.

Figure 10-4. Using additional data series to connect each data point with the chart's origin.

The first series uses the data in B3:C7 and is set to display markers (but no line) and data labels. The second series uses the data in B10:C23. This range duplicates the data in B3:C7, but with an additional point at the chart's origin.

After creating the scatter chart using the data in B3:C7, follow these steps to add the additional data series:

  1. Select the range B10:C23 and press Ctrl+C.

  2. Select the chart and choose Home

    Using additional data series to connect each data point with the chart's origin.
  3. In the Paste Special dialog box, select New Series, Values (Y) in Columns, and Categories (X Values) in First Column. The result will be a new series overlaid on top of the original series.

  4. Use the Line Color tab in the Format Data Series dialog box to specify a line for that series. Then format the line any way you like.

Creating Frequency Distributions and Histograms

A frequency distribution is a summary table that shows the frequency of each value in a range. For example, an instructor may create a frequency distribution of test scores. The table would show the count of test scores in various numeric ranges. A chart created from a frequency distribution is often referred to as a histogram.

Excel provides a number of ways to create frequency distributions. You can:

  • Use the FREQUENCY function

  • Use the Analysis ToolPak add–in

  • Use a pivot table

Note

The following sections cover the FREQUENCY function and the Analysis ToolPak options. Refer to Chapter 11 for examples of using a pivot table to create a histogram.

Note

The examples in the following sections are available on the companion CD–ROM. The filename is frequency distributions.xlsx.

Using the FREQUENCY Function

Excel's FREQUENCY function provides a relatively easy way to create a frequency distribution. This function always returns an array, so you must use it in an array formula entered into a multicell range.

Figure 10-5 shows a workbook with data in range A2:A1001 (named Data). These values range from 43 to 100. The range C5:C14 contains the bins used for the frequency distribution. Each cell in this bin range contains the upper limit for the bin. In this case, the bins consist of <=55, 56-60, 61-65, and so on. See the nearby sidebar, "Creating bins for a frequency distribution," to discover an easy way to create a bin range.

To create the frequency distribution, select a range of cells that correspond to the number of cells in the bin range — in this example, range D5:D14. Then enter the following array formula into the selected range:

=FREQUENCY(Data, C5:C14)

Note

Enter an array formula by pressing Ctrl+Shift+Enter.

Creating a frequency distribution for the data in column A.

Figure 10-5. Creating a frequency distribution for the data in column A.

The array formula is entered into all the selected cells and returns the count of values in the Data range that fall into each bin. You can then create other formulas that make use of the frequencies. For example, column E displays cumulative frequencies, and column F shows cumulative percent.

The combination chart was created using the frequency data in column D (the column series) and the cumulative percent data in column F (the line series). The histogram suggests that the data approximates a normal distribution.

Using the Analysis ToolPak to Create a Frequency Distribution

If you install the Analysis ToolPak add–in, you can use the Histogram option to create a frequency distribution. Select Data

Using the Analysis ToolPak to Create a Frequency Distribution
The Analysis ToolPak's Histogram dialog box.

Figure 10-6. The Analysis ToolPak's Histogram dialog box.

Note

If the Data

The Analysis ToolPak's Histogram dialog box.

Specify the range for your data (Input Range). If you've created a bin range, specify that range — otherwise, leave it blank and the program will generate bins automatically. Specify the upper–left cell for the results (Output Range) and then select any options. Figure 10-7 shows a frequency distribution (and chart) created with the Histogram option.

A frequency distribution and chart generated by the Analysis ToolPak's Histogram option.

Figure 10-7. A frequency distribution and chart generated by the Analysis ToolPak's Histogram option.

Warning

A potential problem with using this technique is that the frequency distribution consists of values, not formulas. Therefore, if you make any changes to your input data, you need to rerun the Histogram procedure to update the results.

Using Adjustable Bins to Create a Histogram

Figure 10-8 shows a worksheet with student grades listed in column B (67 students total). Columns D and E contain formulas that calculate the upper and lower limits for bins, based on the entry in cell E1 (named BinSize). For example, if BinSize is 10 (as in the figure), each bin contains 10 scores (1-10, 11-20, and so on).

The chart uses two named formulas. The name Categories is defined as follows:

=OFFSET(Sheet3!$E$4, 0,0, ROUNDUP(100/BinSize, 0))
The chart displays a histogram; the contents of cell E1 determine the number of categories.

Figure 10-8. The chart displays a histogram; the contents of cell E1 determine the number of categories.

The name Frequencies is defined as follows:

=OFFSET(Sheet3!$F$4, 0,0, ROUNDUP(100/BinSize, 0))

The net effect is that the chart adjusts automatically when you change the BinSize cell. Figure 10-9 shows the chart when the bin size is 6.

The previous chart, after changing the bin size.

Figure 10-9. The previous chart, after changing the bin size.

Note

See Chapter 7 for more about creating charts that use named formulas in their SERIES formulas.

Plotting a Normal Curve

Figure 10-10 shows two scatter charts that display a normal distribution and the cumulative normal distribution. The top chart uses the data in columns A and B. The bottom chart uses the data in columns A and C. Cell B1, named Mean, controls the mean of the distribution, and cell B2, named SD, controls the standard deviation.

The scatter charts display the normal distribution and the cumulative normal distribution.

Figure 10-10. The scatter charts display the normal distribution and the cumulative normal distribution.

Note

The examples in this section (plus several other examples) are available on the companion CD–ROM. The filename is normal distribution.xlsx.

Column A contains formulas that generate values ranging from —3 SD units to +3 SD units. The formula in cell A5, which was copied to the cells below, is as follows:

=(–SD*3)+Mean

Column B contains formulas that generate the height of the curve for a given mean and standard deviation. The formula in cell B5 is as follows:

=NORMDIST(A5, Mean,SD, FALSE)

The formulas in column C also use the NORMDIST function, but the fourth argument is set to TRUE. The formula in C5 is as follows:

=NORMDIST(A5, Mean,SD, TRUE)

In some cases, you may want to compare a histogram created from your data with the theoretical normal distribution. Figure 10-11 shows an example of how this can be done. The chart is a combination chart with two value axes. The data consists of 2, 600 data points in column A. Simple formulas in column D calculate key statistics for the data.

This combination chart displays a histogram (columns) along with the normal distribution curve.

Figure 10-11. This combination chart displays a histogram (columns) along with the normal distribution curve.

The histogram is generated from the data in column G, using the FREQUENCY function (see the section "Using the FREQUENCY Function," earlier in this chapter). The normal distribution curve uses the data in column H. The formula in cell H2, which is copied to the cells below, is as follows:

=NORMDIST(F2,$D$4,$D$5, FALSE)

The companion CD–ROM contains another example that applies a scaling factor to the theoretical values. The theoretical data is multiplied by the number of data points (2, 600) times the bin size (6). After this transformation, both data series can use a single value axis.

Plotting Z–Scores with Standard Deviation Bands

Figure 10-12 shows a scatter chart that plots 500 values. Each data point in column A is converted to a z–score (column B), and these values are used in the chart. A z–score is a way of standardizing data, such that the transformed data has a mean of 0 and a standard deviation of 1. The midpoint on the vertical axis corresponds to the average data value, and the gridlines correspond to standard deviation units.

This combination chart uses a bar chart series to display horizontal bands that correspond to standard deviation units.

Figure 10-12. This combination chart uses a bar chart series to display horizontal bands that correspond to standard deviation units.

Note

This example is available on the companion CD–ROM. The filename is z–score plot.xlsx.

Formulas calculate the mean and standard deviation of the data, and these cells are given names (Mean and SD). The z–score calculation is done with simple formulas. Cell B2, for example, contains this formula:

=(A2–Mean)/SD

Because the chart plots transformed data, the chart can be used for any data set without modification.

Calculating the Area under a Curve

If you use a scatter chart to generate a curve, you may need to calculate the area under the curve. I start with an elementary example, shown in Figure 10-13. The gridlines in this chart are separated by one unit, so calculating the area under this curve can be done manually. It consists of 10.5 square units (nine complete squares plus three half–squares).

Calculating the area under this curve is a no–brainer.

Figure 10-13. Calculating the area under this curve is a no–brainer.

Note

The examples in this section are available on the companion CD–ROM. The filename is area under a curve.xlsx.

If you don't feel like counting squares, you can take a more calculated approach and use formulas to calculate the area under a curve. This is known as the "trapezoid" method. A trapezoid, as you may recall, is a four–sided figure with two parallel sides. This method essentially divides the area under the curve into a series of trapezoids and then calculates the area of each one. The area under the curve is the sum of the trapezoid areas.

To calculate the area of a trapezoid, multiply the "average" height by the base. In the preceding example (Figure 10-13), the left side of the first trapezoid has a height of 1 and the right side has a height of 2. The average height is 1.5. The base is one unit, so the area of the first trapezoid is 1.5. The area of the second trapezoid is 2, and so on.

The formulas in column C calculate the area for each trapezoid. Cell C2, for example, contains the following formula:

=((B2+B3)/2)*(A3–A2)

This formula is copied down to accommodate the number of data points. Note that the last cell (cell C7) is empty. That's because each formula refers to the subsequent row, and the formula is not valid for the last row of data. The formula in 99 simply adds these segment areas together.

This formula works fine — except when negative values are involved. In such a case, the formula gets much more complex because triangles (as well as trapezoids) enter the picture. The curve shown in Figure 10-14 presents more of a challenge because it has negative values. When using the previous formula to calculate trapezoid areas for this chart, the result is 3.5, which is clearly incorrect.

Calculating the area under this curve requires complex formulas.

Figure 10-14. Calculating the area under this curve requires complex formulas.

When negative numbers are involved, a more complex formula is required. The following formula (entered in cell C2) is a general–purpose formula that works in all situations:

=IF(B2*B3>=0, ABS(((B2+B3)/2)*(A3–A2)),ABS(((B2^2+B3^2)/(B2–B3)/2)*(A3–A2)))

The formula uses an IF function that determines whether the calculation returns the area of a trapezoid or the area of two triangles. In this example, the formula is used four times to yield the final result. The first, second, and third calculations compute the area of a trapezoid. The fourth calculation, however, computes the area of the two triangles that result from the line crossing the x–axis. The sum of the areas of these two triangles is 0.83. The total area under the curve is 4.83.

It's important to understand that the area calculation is approximate. Generally, the accuracy of the calculation increases with the number of data points that define the curve. Figure 10-15 shows three charts, all of which plot a sine curve. The charts vary, however, in the number of data points used and, subsequently, in the number of area calculations performed. The calculated area under the curve ranges from 220.01 to 229.16, the latter being the most accurate.

Calculating the area under a sine curve, with varying numbers of data points.

Figure 10-15. Calculating the area under a sine curve, with varying numbers of data points.

Creating a Box Plot

A box plot (sometimes known as a quartile plot or a box and whisker plot) is often used to summarize data. Figure 10-16 shows a box plot created for four groups of subjects. Each group has a diagram, the height of which represents the numerical range of the data (minimum and maximum values). The "boxes" represent the 25th through the 75th percentile. The horizontal line inside the box is the median value (or 50th percentile). This type of chart enables the viewer to make quick comparisons among groups of data.

Note

The example in this section is available on the companion CD–ROM. The CD contains an additional example that creates a box plot using a line chart with high–low lines and up/down bars. The filename is box plot.xlsx.

The raw data appears in columns A:D. The data is summarized in range F1:J7, with simple formulas. The following table lists the formulas for Group 1 (G3:G7). These formulas were copied to the three columns to the right.

This box plot summarizes the data in columns A through D.

Figure 10-16. This box plot summarizes the data in columns A through D.

Cell

Calculation

Formula

G3

Minimum

=MIN(A$2:A$26)–

G4

25th Percentile

=PERCENTILE(A$2:A$26, 0.25)

G5

Median

=MEDIAN(A$2:A$26)

G6

75th Percentile

=PERCENTILE(A$2:A$26, 0.75)

G7

Maximum

=MAX(A$2:A$26)

The summary data must be transformed to create the box plot. This transformation is done in the section labeled Chart Data (F11:J14). This range contains simple formulas that calculate the difference between the row values in the Summary Range section. For example, the formula in cell G12 is as follows:

=G4–G3

Follow these steps to create the box plot:

  1. Select the range F10:J14 and create a stacked column chart.

    Notice that Series 5 is not included in the range select. However, this range is used in Step 5.

  2. Select Series 1 and set the border color to No Line and the fill to No Fill.

  3. Select Series 2 and set the border color to No Line and the fill to No Fill.

  4. With Series 2 still selected, choose Chart Tools

    This box plot summarizes the data in columns A through D.
  5. Select Series 4 and choose Chart Tools

    This box plot summarizes the data in columns A through D.

You'll probably want to delete the legend because it provides no meaningful information.

The only two bars that remain visible are Series 3 and Series 4. Vertical error bars extend to cover the space occupied by the hidden Series 2 and Series 5 (which is not plotted). You can adjust the gap width to adjust the width of the boxes.

You can also create a horizontal box plot by starting with a stacked bar chart and using the same series of steps.

Plotting Mathematical Functions

The examples in the following sections demonstrate how to plot mathematical functions that use one variable (a 2–D line chart) and two variables (a 3–D surface chart).

Note

The examples make use of Excel's Data Table feature, which enables you to evaluate a formula using varying input values. Coverage of this feature is beyond the scope of this book. Excel's Help provides a good overview.

Plotting Functions with One Variable

A scatter chart is useful for plotting various mathematical and trigonometric functions. For example, Figure 10-17 shows a plot of the SIN function. The chart plots y for values of x (expressed in radians) from −5 to +5 in increments of 0.5. Each pair of x and y values appears as a data point in the chart, and the points connect with a line.

Note

This workbook, named function plots.xlsx, is available on the companion CD–ROM.

This chart plots the SIN(x).

Figure 10-17. This chart plots the SIN(x).

The function is expressed as follows:

y = SIN(x)

The corresponding formula in cell B2 (which is copied to the cells below) is as follows:

=SIN(A2)

Figure 10-18 shows a general–purpose, single–variable plotting application. The data for the chart is calculated by a data table in columns I:J and is not shown in the figure.

Note

This example, named function plot 2D.xlsx, is available on the companion CD–ROM.

A general–purpose, single–variable plotting workbook.

Figure 10-18. A general–purpose, single–variable plotting workbook.

To use this application, follow these steps:

  1. Enter a formula in cell B7. The formula should contain at least one x variable. In the figure, the formula in cell B3 is as follows:

    =SIN(PI()*x)*(PI()*x)
  2. Enter the minimum value for x in cell B8.

  3. Enter the maximum value for x in cell B9.

The formula in cell B7 displays the value of y for the minimum value of x. The data table, however, evaluates the formula for 200 equally spaced values of x, and these values appear in the chart. In addition, the chart's title displays the function that's plotted.

Plotting Functions with Two Variables

The preceding section describes how to plot functions that use a single variable (x). You also can plot functions that use two variables. For example, the following function calculates a value of z for various values of two variables (x and y):

z = SIN(x)*COS(y)

Figure 10-19 shows a surface chart that plots the value of z for 21 x values ranging from 2 to 5 (in 0.15 increments), and for 21 y values ranging from −3 to 0 (also in 0.15 increments).

Using a surface chart to plot a function with two variables.

Figure 10-19. Using a surface chart to plot a function with two variables.

Figure 10-20 shows a general–purpose, two–variable plotting application, similar to the 2–D function plot workbook described in the previous section. The data for the chart is a 25 × 25 data table range in columns K:AL (not shown in the figure).

Note

This example, named function plot 3D.xlsm, is available on the companion CD–ROM. The workbook contains VBA macros to assist in rotating the chart.

A general–purpose, two–variable plotting workbook.

Figure 10-20. A general–purpose, two–variable plotting workbook.

Follow these steps to use this application:

  1. Enter a formula in cell B3. The formula should contain at least one x variable and at least one y variable. In the figure, the formula in cell B3 is as follows:

    =SIN(x)*COS(y*x)
  2. Enter the minimum x value in cell B4 and the maximum x value in cell B5.

  3. Enter the minimum y value in cell B6 and the maximum y value in cell B7.

The formula in cell B3 displays the value of z for the minimum values of x and y. The data table evaluates the formula for 25 equally spaced values of x and 25 equally spaced values of y. These 625 values are plotted in the surface chart.

Creating a 3–D Scatter Plot

One of the most requested chart types for Excel is a 3–D scatter plot. Unfortunately, however, Microsoft has not paid attention to these requests. This type of chart is like an XY scatter chart but with an added "depth" dimension (Z).

Andy Pope, a charting expert from the U.K., sent me an interesting workbook that simulates a 3–D scatter plot. Each data point is entered as three coordinate values (x, y, and z). Formulas then transform the data to make it appear in a 3–D space. The chart includes additional series to display the x–, y–, and z–axes, as well as the cube that encompasses the 3–D space.

I took Andy's idea and spent many hours refining it and (I hope) making it easier to understand. The result, shown in Figure 10-21, has quite a few accoutrements. The three scroll bars control the angle of each of the three axes, and the buttons perform animations.

A simulated 3–D scatter plot.

Figure 10-21. A simulated 3–D scatter plot.

Note

This example, named 3D scatterplot.xlsm, is available on the companion CD–ROM. The workbook contains VBA macros to assist in rotating the chart and hiding various elements.

The x values range from 8.0 to 43.4. The y values are calculated by this formula:

=SIN(A3)/(A3*0.5)

The z values are generated by this formula:

=COS(A3)/(A3*0.5)

The resulting curve is a spiraling corkscrew.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset