Chapter 18. Performing Magic with Array Formulas

<feature><title>In This Chapter</title> </feature>

The preceding chapter provides an introduction to arrays and array formulas and presented some basic examples to whet your appetite. This chapter continues the saga and provides many useful examples that further demonstrate the power of this feature.

I selected the examples in this chapter to provide a good assortment of the various uses for array formulas. You can use most of them as-is. You will, of course, need to adjust the range names or references used. Also, you can modify many of the examples easily to work in a slightly different manner.

Working with Single-Cell Array Formulas

As I describe in the preceding chapter, you enter single-cell array formulas into a single cell (not into a range of cells). These array formulas work with arrays contained in a range or that exist in memory. This section provides some additional examples of such array formulas.

On the CD-ROM

The examples in this section are available on the companion CD-ROM. The file is named single-cell array formulas.xlsx.

Summing a range that contains errors

You may have discovered that the SUM function doesn’t work if you attempt to sum a range that contains one or more error values (such as #DIV/0! or #N/A). Figure 18.1 shows an example. The formula in cell C11 returns an error value because the range that it sums (C4:C10) contains errors.

An array formula can sum a range of values, even if the range contains errors.

Figure 18.1. An array formula can sum a range of values, even if the range contains errors.

The following array formula, in cell C13, overcomes this problem and returns the sum of the values, even if the range contains error values:

{=SUM(IF(ISERROR(C4:C10),"",C4:C10))}

This formula works by creating a new array that contains the original values but without the errors. The IF function effectively filters out error values by replacing them with an empty string. The SUM function then works on this “filtered” array. This technique also works with other functions, such as AVERAGE, MIN, and MAX.

New Feature

If only Excel 2007 users will use your worksheet, you can use this more efficient version, which uses the new IFERROR function:

{=SUM(IFERROR(G1:G7,""))}

Counting the number of error values in a range

The following array formula is similar to the previous example, but it returns a count of the number of error values in a range named Data:

{=SUM(IF(ISERROR(Data),1,0))}

This formula creates an array that consists of 1s (if the corresponding cell contains an error) and 0s (if the corresponding cell does not contain an error value).

You can simplify the formula a bit by removing the third argument for the IF function. If this argument isn’t specified, the IF function returns FALSE if the condition is not satisfied (that is, the cell does not contain an error value). In this context, Excel treats FALSE as a 0 value. The array formula shown here performs exactly like the previous formula, but it doesn’t use the third argument for the IF function:

{=SUM(IF(ISERROR(Data),1))}

Actually, you can simplify the formula even more:

{=SUM(ISERROR(Data)*1)}

This version of the formula relies on the fact that:

TRUE * 1 = 1

and

FALSE * 1 = 0

Summing the n largest values in a range

The following array formula returns the sum of the 10 largest values in a range named Data:

{=SUM(LARGE(Data,ROW(INDIRECT("1:10"))))}

The LARGE function is evaluated 10 times, each time with a different second argument (1, 2, 3, and so on up to 10). The results of these calculations are stored in a new array, and that array is used as the argument for the SUM function.

To sum a different number of values, replace the 10 in the argument for the INDIRECT function with another value.

If the number of cells to sum is contained in cell C17, use the following array formula, which uses the concatenation operator (&) to create the range address for the INDIRECT function:

{=SUM(LARGE(Data,ROW(INDIRECT("1:"&C17))))}

To sum the n smallest values in a range, use the SMALL function instead of the LARGE function.

Computing an average that excludes zeros

Figure 18.2 shows a simple worksheet that calculates average sales. The formula in cell B14 is

=AVERAGE(B5:B12)
The calculated average includes cells that contain a 0.

Figure 18.2. The calculated average includes cells that contain a 0.

Two of the sales staff had the week off, however, so including their 0 sales in the calculated average doesn’t accurately describe the average sales per representative.

Note

The AVERAGE function ignores blank cells, but it does not ignore cells that contain 0.

The following array formula returns the average of the range but excludes the cells containing 0:

{=AVERAGE(IF(B5:B12<>0,B5:B12))}

This formula creates a new array that consists only of the nonzero values in the range. The AVERAGE function then uses this new array as its argument.

You also can get the same result with a regular (non-array) formula:

=SUM(B5:B12)/COUNTIF(B5:B12,"<>0")

This formula uses the COUNTIF function to count the number of nonzero values in the range. This value is divided into the sum of the values.

Determining whether a particular value appears in a range

To determine whether a particular value appears in a range of cells, you can choose Edit Determining whether a particular value appears in a range Find and do a search of the worksheet. But you also can make this determination by using an array formula.

Figure 18.3 shows a worksheet with a list of names in A5:E24 (named NameList). An array formula in cell D3 checks the name entered into cell C3 (named TheName). If the name exists in the list of names, the formula displays the text Found. Otherwise, it displays Not Found.

Using an array formula to determine whether a range contains a particular value.

Figure 18.3. Using an array formula to determine whether a range contains a particular value.

The array formula in cell D3 is

{=IF(OR(TheName=NameList),"Found","Not Found")}

This formula compares TheName to each cell in the NameList range. It builds a new array that consists of logical TRUE or FALSE values. The OR function returns TRUE if any one of the values in the new array is TRUE. The IF function uses this result to determine which message to display.

A simpler form of this formula follows. This formula displays TRUE if the name is found and returns FALSE otherwise.

{=OR(TheName=NameList)}

Yet another approach uses the COUNTIF function in a non-array formula:

=IF(COUNTIF(NameList,TheName)>0,"Found","Not Found")

Counting the number of differences in two ranges

The following array formula compares the corresponding values in two ranges (named MyData and YourData) and returns the number of differences in the two ranges. If the contents of the two ranges are identical, the formula returns 0.

{=SUM(IF(MyData=YourData,0,1))}

Note

The two ranges must be the same size and of the same dimensions.

This formula works by creating a new array of the same size as the ranges being compared. The IF function fills this new array with 0s and 1s. (0 if a difference is found, and 1 if the corresponding cells are the same.) The SUM function then returns the sum of the values in the array.

The following formula, which is simpler, is another way of calculating the same result:

{=SUM(1*(MyData<>YourData))}

This version of the formula relies on the fact that:

TRUE * 1 = 1

and

FALSE * 1 = 0

Returning the location of the maximum value in a range

The following array formula returns the row number of the maximum value in a single-column range named Data:

{=MIN(IF(Data=MAX(Data),ROW(Data), ""))}

The IF function creates a new array that corresponds to the Data range. If the corresponding cell contains the maximum value in Data, the array contains the row number; otherwise, it contains an empty string. The MIN function uses this new array as its second argument, and it returns the smallest value, which corresponds to the row number of the maximum value in Data.

If the Data range contains more than one cell that has the maximum value, the row of the first maximum cell is returned.

The following array formula is similar to the previous one, but it returns the actual cell address of the maximum value in the Data range. It uses the ADDRESS function, which takes two arguments: a row number and a column number.

{=ADDRESS(MIN(IF(Data=MAX(Data),ROW(Data), "")),COLUMN(Data))}

The previous formulas work only with a single-column range. The following variation works with any sized range and returns the address of the smallest value in the range named Data:

{=ADDRESS(MIN(IF(Data=MAX(data),ROW(Data), "")),
MIN(IF(Data=MAX(data),COLUMN(Data), "")))}

Finding the row of a value’s nth occurrence in a range

The following array formula returns the row number within a single-column range named Data that contains the nth occurrence of the value in a cell named Value:

{=SMALL(IF(Data=Value,ROW(Data), ""),n)}

The IF function creates a new array that consists of the row number of values from the Data range that are equal to Value. Values from the Data range that aren’t equal to Value are replaced with an empty string. The SMALL function works on this new array and returns the nth smallest row number.

The formula returns #NUM! if the Value is not found or if n exceeds the number of the values in the range.

Returning the longest text in a range

The following array formula displays the text string in a range (named Data) that has the most characters. If multiple cells contain the longest text string, the first cell is returned.

{=INDEX(Data,MATCH(MAX(LEN(Data)),LEN(Data),FALSE),1)}

This formula works with two arrays, both of which contain the length of each item in the Data range. The MAX function determines the largest value, which corresponds to the longest text item. The MATCH function calculates the offset of the cell that contains the maximum length. The INDEX function returns the contents of the cell containing the most characters. This function works only if the Data range consists of a single column.

Determining whether a range contains valid values

You may have a list of items that you need to check against another list. For example, you may import a list of part numbers into a range named MyList, and you want to ensure that all the part numbers are valid. You can do so by comparing the items in the imported list to the items in a master list of part numbers (named Master).

The following array formula returns TRUE if every item in the range named MyList is found in the range named Master. Both ranges must consist of a single column, but they don’t need to contain the same number of rows.

{=ISNA(MATCH(TRUE,ISNA(MATCH(MyList,Master,0)),0))}

The array formula that follows returns the number of invalid items. In other words, it returns the number of items in MyList that do not appear in Master.

{=SUM(1*ISNA(MATCH(MyList,Master,0)))}

To return the first invalid item in MyList, use the following array formula:

{=INDEX(MyList,MATCH(TRUE,ISNA(MATCH(MyList,Master,0)),0))}

Summing the digits of an integer

I can’t think of any practical application for the example in this section, but it’s a good demonstration of the power of an array formula. The following array formula calculates the sum of the digits in a positive integer, which is stored in cell A1. For example, if cell A1 contains the value 409, the formula returns 13 (the sum of 4, 0, and 9).

{=SUM(MID(A1,ROW(INDIRECT("1:"&LEN(A1))),1)*1)}

To understand how this formula works, start with the ROW function, as shown here:

{=ROW(INDIRECT("1:"&LEN(A1)))}

This function returns an array of consecutive integers beginning with 1 and ending with the number of digits in the value in cell A1. For example, if cell A1 contains the value 409, the LEN function returns 3, and the array generated by the ROW functions is

{1,2,3}

Cross-Ref

For more information about using the INDIRECT function to return this array, see Chapter 17.

This array is then used as the second argument for the MID function. The MID part of the formula, simplified a bit and expressed as values, is the following:

{=MID(409,{1,2,3},1)*1}

This function generates an array with three elements:

{4,0,9}

By simplifying again and adding the SUM function, the formula looks like this:

{=SUM({4,0,9})}

This formula produces the result of 13.

Note

The values in the array created by the MID function are multiplied by 1 because the MID function returns a string. Multiplying by 1 forces a numeric value result. Alternatively, you can use the VALUE function to force a numeric string to become a numeric value.

Notice that the formula doesn’t work with a negative value because the negative sign is not a numeric value. The following formula solves this problem by using the ABS function to return the absolute value of the number. Figure 18.4 shows a worksheet that uses this formula in cell B4.

{=SUM(VALUE(MID(ABS(A4),ROW(INDIRECT("1:"&LEN(ABS(A4)))),1)))}
An array formula calculates the sum of the digits in an integer.

Figure 18.4. An array formula calculates the sum of the digits in an integer.

The formula was copied down to calculate the sum of the digits for other values in column A.

Summing rounded values

Figure 18.5 shows a simple worksheet that demonstrates a common spreadsheet problem: rounding errors. As you can see, the grand total in cell E7 appears to display an incorrect amount. (That is, it’s off by a penny.) The values in column E use a number format that displays two decimal places. The actual values, however, consist of additional decimal places that do not display due to rounding (as a result of the number format). The net effect of these rounding errors is a seemingly incorrect total. The total, which is actually $168.320997, displays as $168.32.

Using an array formula to correct rounding errors.

Figure 18.5. Using an array formula to correct rounding errors.

The following array formula creates a new array that consists of values in column E, rounded to two decimal places:

{=SUM(ROUND(E4:E6,2))}

This formula returns $168.31.

You also can eliminate these types of rounding errors by using the ROUND function in the formula that calculates each row total in column E (which does not require an array formula).

Summing every nth value in a range

Suppose that you have a range of values and you want to compute the sum of every third value in the list—the first, the fourth, the seventh, and so on. One solution is to hard-code the cell addresses in a formula. But a better solution is to use an array formula.

Note

In Figure 18.6, the values are stored in a range named Data, and the value of n is in cell D2 (named n).

An array formula returns the sum of every nth value in the range.

Figure 18.6. An array formula returns the sum of every nth value in the range.

The following array formula returns the sum of every nth value in the range:

{SUM(IF(MOD(ROW(INDIRECT("1:"&COUNT(Data)))-1,n)=0,Data,""))}

This formula returns 70, which is the sum of every third value in the range.

This formula generates an array of consecutive integers, and the MOD function uses this array as its first argument. The second argument for the MOD function is the value of n. The MOD function creates another array that consists of the remainders when each row number is divided by n. When the array item is 0 (that is, the row is evenly divisible by n), the corresponding item in the Data range will be included in the sum.

You find that this formula fails when n is 0 (that is, when it sums no items). The modified array formula that follows uses an IF function to handle this case:

{=IF(n=0,0,SUM(IF(MOD(ROW(INDIRECT("1:"&COUNT(data)))-
1,n)=0,data,"")))}

This formula works only when the Data range consists of a single column of values. It does not work for a multicolumn range or for a single row of values.

To make the formula work with a horizontal range, you need to transpose the array of integers generated by the ROW function. Excel’s TRANPOSE function is just the ticket. The modified array formula that follows works only with a horizontal Data range:

{=IF(n=0,0,SUM(IF(MOD(TRANSPOSE(ROW(INDIRECT("1:"&COUNT(Data))))-
1,n)=0,Data,"")))}

Removing non-numeric characters from a string

The following array formula extracts a number from a string that contains text. For example, consider the string ABC145Z. The formula returns the numeric part, 145.

{=MID(A1,MATCH(0,(ISERROR(MID(A1,ROW(INDIRECT
("1:"&LEN(A1))),1)*1)*1),0),LEN(A1)-SUM((ISERROR
(MID(A1,ROW(INDIRECT("1:"&LEN(A1))),1)*1)*1)))}

This formula works only with a single embedded number. For example, it fails with a string like X45Z99 because the string contains two embedded numbers.

Determining the closest value in a range

The array formula that follows returns the value in a range named Data that is closest to another value (named Target):

{=INDEX(Data,MATCH(SMALL(ABS(Target-Data),1),ABS(Target-Data),0))}

If two values in the Data range are equidistant from the Target value, the formula returns the first one in the list. Figure 18.7 shows an example of this formula. In this case, the Target value is 45. The array formula in cell D5 returns 48—the value closest to 45.

An array formula returns the closest match.

Figure 18.7. An array formula returns the closest match.

Returning the last value in a column

Suppose that you have a worksheet that you update frequently by adding new data to columns. You may need a way to reference the last value in column A (the value most recently entered). If column A contains no empty cells, the solution is relatively simple and doesn’t require an array formula:

=OFFSET(A1,COUNTA(A:A)-1,0)

This formula uses the COUNTA function to count the number of nonempty cells in column A. This value (minus 1) is used as the second argument for the OFFSET function. For example, if the last value is in row 100, COUNTA returns 100. The OFFSET function returns the value in the cell 99 rows down from cell A1 in the same column.

If column A has one or more empty cells interspersed, which is frequently the case, the preceding formula won’t work because the COUNTA function doesn’t count the empty cells.

The following array formula returns the contents of the last nonempty cell in the first 500 rows of column A:

{=INDEX(A1:A500,MAX(ROW(A1:A500)*(A1:A500<>"")))}

You can, of course, modify the formula to work with a column other than column A. To use a different column, change the four column references from A to whatever column you need. If the last nonempty cell occurs in a row beyond row 500, you need to change the two instances of 500 to a larger number. The fewer rows referenced in the formula, the faster the calculation speed.

Caution

You can[‘t use this formula, as written, in the same column with which it’s working. Attempting to do so generates a circular reference. You can, however, modify it. For example, to use the function in cell A1, change the references so that they begin with row 2 instead of row 1.

Returning the last value in a row

The following array formula is similar to the previous formula, but it returns the last nonempty cell in a row (in this case, row 1):

{=INDEX(1:1,MAX(COLUMN(1:1)*(1:1<>"")))}

To use this formula for a different row, change the 1:1 reference to correspond to the row.

Ranking data with an array formula

Often, computing the rank orders for the values in a range of data is helpful. If you have a worksheet containing the annual sales figures for 20 salespeople, for example, you may want to know how each person ranks, from highest to lowest.

If you’ve used the Excel program’s RANK function, you may have noticed that the ranks produced by this function don’t handle ties the way that you may like. For example, if two values are tied for third place, the RANK function gives both of them a rank of 3. You may prefer a commonly-used approach that assigns each an average (or midpoint) of the ranks—in other words, a rank of 3.5 for both values tied for third place.

Figure 18.8 shows a worksheet that uses two methods to rank a column of values (named Sales). The first method (column C) uses the Excel RANK function. Column D uses array formulas to compute the ranks.

Ranking data with the Excel program’s RANK function and with array formulas.

Figure 18.8. Ranking data with the Excel program’s RANK function and with array formulas.

The following is the array formula in cell D5:

{=SUM(1*(B5<=Sales))-(SUM(1*(B5=Sales))-1)/2}

This formula is copied to the cells below it.

Note

Each ranking is computed with a separate array formula, not with an array formula entered into multiple cells.

Each array function works by computing the number of higher values and subtracting one half of the number of equal values minus 1.

Working with Multicell Array Formulas

The preceding chapter introduced array formulas entered into multicell ranges. In this section, I present a few more array multicell formulas. Most of these formulas return some or all of the values in a range, but rearranged in some way.

On the CD-ROM

The examples in this section are available on the companion CD-ROM. The file is named multi-cell array formulas.xlsx.

Returning only positive values from a range

The following array formula works with a single-column vertical range (named Data). The array formula is entered into a range that’s the same size as Data and returns only the positive values in the Data range. (Zeroes and negative numbers are ignored.)

{=INDEX(Data,SMALL(IF(Data>0,ROW(INDIRECT("1:"&ROWS(Data)))),
ROW(INDIRECT("1:"&ROWS(Data)))))}

As you can see in Figure 18.9, this formula works, but not perfectly. The Data range is A5:A24, and the array formula is entered into C5:C24. However, the array formula displays #NUM! error values for cells that don’t contain a value.

Using an array formula to return only the positive values in a range.

Figure 18.9. Using an array formula to return only the positive values in a range.

This modified array formula, entered into range E5:E24, use the IFERROR function to avoid the error value display:

{=IFERROR(INDEX(Data,SMALL(IF(Data>0,ROW(INDIRECT("1:"&ROWS(Data)))),RO
W(INDIRECT("1:"&ROWS(Data))))),"")}

The IFERROR function is new to Excel 2007. For compatibility with older versions, use this formula:

{=IF(ISERR(SMALL(IF(Data>0,ROW(INDIRECT("1:"&ROWS(Data)))),
ROW(INDIRECT("1:"&ROWS(Data))))),"",INDEX(Data,SMALL(IF
(Data>0,ROW(INDIRECT("1:"&ROWS(Data)))),ROW(INDIRECT
("1:"&ROWS(Data))))))}

Returning nonblank cells from a range

The following formula is a variation on the formula in the preceding section. This array formula works with a single-column vertical range named Data. The array formula is entered into a range of the same size as Data and returns only the nonblank cell in the Data range.

{=IFERROR(INDEX(Data,SMALL(IF(Data<>"",ROW(INDIRECT("1:"&ROWS(Data)))),
ROW(INDIRECT("1:"&ROWS(Data))))),"")}

For compatibility with versions prior to Excel 2007, use this formula:

{=IF(ISERR(SMALL(IF(Data<>"",ROW(INDIRECT("1:"&ROWS(Data)))),
ROW(INDIRECT("1:"&ROWS(Data))))),"",INDEX(Data,SMALL(IF(Data
<>"",ROW(INDIRECT("1:"&ROWS(Data)))),ROW(INDIRECT("1:"&ROWS
(Data))))))}

Returning a list of unique items in a range

If you have a single-column range named Data, the following array formula returns a list of the unique items in the range (the list with no duplicated items):

{=INDEX(Data,SMALL(IF(MATCH(Data,Data,0)=ROW(INDIRECT
("1:"&ROWS(Data))),MATCH(Data,Data,0),""),ROW(INDIRECT
("1:"&ROWS(Data)))))}

This formula doesn’t work if the Data range contains any blank cells. The unfilled cells of the array formula display #NUM!.

The following modified version eliminates the #NUM!. display by using the Excel 2007 IFERROR function.

{=IFERROR(INDEX(Data,SMALL(IF(MATCH(Data,Data,0)=ROW(INDIRECT
("1:"&ROWS(data))),MATCH(Data,Data,0),""),ROW(INDIRECT
("1:"&ROWS(Data))))),"")}

Figure 18.10 shows an example. Range A5:A23 is named Data, and the array formula is entered into range C5:C23. Range E5:E23 contains the array formula that uses the IFERROR function.

Using an array formula to return unique items from a list.

Figure 18.10. Using an array formula to return unique items from a list.

Displaying a calendar in a range

Figure 18.11 shows the results of one of my favorite multicell array formulas, a “live” calendar displayed in a range of cells. If you change the date at the top, the calendar recalculates to display the dates for the month and year.

Displaying a calendar by using a single array formula.

Figure 18.11. Displaying a calendar by using a single array formula.

On the CD-ROM

This workbook is available on the companion CD-ROM. The file is named array formula calendar.xlsx.

To create this calendar in the range B2:H9, follow these steps:

  1. Select B2:H2 and merge the cells by choosing Home On the CD-ROM Alignment On the CD-ROM Merge & Center.

  2. Enter a date into the merged range. The day of the month isn’t important.

  3. Enter the abbreviated day names in the range B3:H3.

  4. Select B4:H9 and enter this array formula. Remember, to enter an array formula, use Ctrl+Shift+Enter (not just Enter).

    {=IF(MONTH(DATE(YEAR(B2),MONTH(B2),1))<>MONTH(DATE(YEAR(B2),
    MONTH(B2),1)-(WEEKDAY(DATE(YEAR(B2),MONTH(B2),1))-1)+
    {0;1;2;3;4;5}*7+{1,2,3,4,5,6,7}-1),"",
    DATE(YEAR(B2),MONTH(B2),1)-(WEEKDAY(DATE(YEAR(B2),MONTH(B2),1))-1)+
    {0;1;2;3;4;5}*7+{1,2,3,4,5,6,7}-1)}
  5. Format the range B4:H9 to use this custom number format: d. This step formats the dates to show only the day. Use the Custom category in the Number tab of the Format Cells dialog box to specify this custom number format.

  6. Adjust the column widths and format the cells as you like.

Change the month and year in cell B2, and the calendar will update automatically. After creating this calendar, you can copy the range to any other worksheet or workbook.

The array formula actually returns date values, but the cells are formatted to display only the day portion of the date. Also, notice that the array formula uses array constants.

Cross-Ref

See Chapter 17 for more information about array constants.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset