Chapter 5
Getting Started with Reading and Writing
In This Chapter
Representing textual data with character vectors
Working with text
Creating, converting, and working with factors
There’s a good reason that reading and writing count as two of the three Rs of elementary education (reading, ’riting, and ’rithmetic). In this chapter, you get to work with words in R.
You assign text to variables. You manipulate these variables in many different ways, including finding text within text and concatenating different pieces of text into a single vector. You also use R functions to sort text and to find words in text with some powerful pattern search functions, called regular expressions. Finally, you work with factors, the R way of representing categories (or categorical data, as statisticians call it).
Using Character Vectors for Text Data
Text in R is represented by character vectors. A character vector is — you guessed it! — a vector consisting of characters. In Figure 5-1, you can see that each element of a character vector is a bit of text.
Figure 5-1: Each element of a character vector is a bit of text, also known as a string.
In this section, you take a look at how R uses character vectors to represent text. You assign some text to a character vector and extract subsets of that data. You also get familiar with the very powerful concept of named vectors, vectors in which each element has a name. This is useful because you can then refer to the elements by name as well as by position.
Assigning a value to a character vector
You assign a value to a character vector by using the assignment operator (<-), the same way you do for all other variables. You test whether a variable is of class character, for example, by using the is.character() function as follows:
> x <- "Hello world!"
> is.character(x)
[1] TRUE
Notice that x is a character vector of length 1. To find out how many characters are in the text, use nchar():
> length(x)
[1] 1
> nchar(x)
[1] 12
These results tell you that x has length 1 and that the single element in x has 12 characters.
Creating a character vector with more than one element
To create a character vector with more than one element, use the combine function, c():
> x <- c("Hello", "world!")
> length(x)
[1] 2
> nchar(x)
[1] 5 6
Notice that this time, R tells you that your vector has length 2 and that the first element has five characters and the second element has six characters.
Extracting a subset of a vector
To illustrate how to work with vectors, and specifically how to create subsets, we use the built-in datasets letters and LETTERS. Both are character vectors consisting of the letters of the alphabet, in lowercase (letters) and uppercase (LETTERS). Try it:
> letters
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k"
[12] "l" "m" "n" "o" "p" "q" "r" "s" "t" "u" "v"
[23] "w" "x" "y" "z"
> LETTERS
 [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K"
[12] "L" "M" "N" "O" "P" "Q" "R" "S" "T" "U" "V"
[23] "W" "X" "Y" "Z"
Let’s return to the topic of creating subsets. To extract a specific element from a vector, use square brackets. To get the tenth element of letters, for example, use the following:
> letters[10]
[1] "j"
To get the last three elements of LETTERS, use the following:
> LETTERS[24:26]
[1] "X" "Y" "Z"
The tail() function also returns the trailing elements of a vector. To get the last five elements of LETTERS, use the following:
> tail(LETTERS, 5)
[1] "V" "W" "X" "Y" "Z"
Similarly, you can use the head() function to get the first elements of a variable. By default, both head() and tail() return six elements, but you can tell them to return any specific number of elements in the second argument. Try extracting the first ten letters:
> head(letters, 10)
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
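As a complement to head() and tail(), the following sketch shows two related idioms: a negative count, which drops elements from the opposite end, and negative indices inside square brackets, which exclude positions. The variable names are made up for this illustration.

```r
# Last three letters without hard-coding positions 24:26
last_three <- tail(LETTERS, 3)

# A negative n keeps everything except that many elements at the other end:
# head(x, -n) drops the last n; tail(x, -n) drops the first n
all_but_last_three <- head(LETTERS, -3)
all_but_first_three <- tail(LETTERS, -3)

# Negative indices exclude the listed positions (here: a, e, i, o, u)
no_vowels <- letters[-c(1, 5, 9, 15, 21)]

print(last_three)
print(length(all_but_last_three))
print(no_vowels)
```

Using tail(LETTERS, 3) rather than LETTERS[24:26] means the code keeps working if the length of the vector changes.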
Naming the values in your vectors
Until this point in the book, we’ve referred to the elements of vectors by their positions; that is, x[5] refers to the fifth element in vector x. One very powerful feature in R, however, lets you give names to the elements of a vector, which allows you to refer to the elements by name.
Looking at how named vectors work
To illustrate named vectors, take a look at the built-in dataset islands, a named vector that contains the surface area of the world’s 48 largest land masses (continents and large islands). You can investigate its structure with str(), as follows:
> str(islands)
 Named num [1:48] 11506 5500 16988 2968 16 ...
 - attr(*, "names")= chr [1:48] "Africa" "Antarctica" "Asia" "Australia" ...
R reports the structure of islands as a named vector with 48 elements. In the first line of the results of str(), you see the values of the first few elements of islands. On the second line, R reports that the named vector has an attribute containing names and reports that the first few elements are "Africa", "Antarctica", "Asia", and "Australia".
Because each element in the vector has a value as well as a name, now you can subset the vector by name. To retrieve the sizes of Asia, Africa, and Antarctica, use the following:
> islands[c("Asia", "Africa", "Antarctica")]
      Asia     Africa Antarctica 
     16988      11506       5500 
You use the names() function to retrieve the names in a named vector:
> names(islands)[1:9]
[1] "Africa" "Antarctica" "Asia"
[4] "Australia" "Axel Heiberg" "Baffin"
[7] "Banks" "Borneo" "Britain"
This function allows you to do all kinds of interesting things. Imagine you wanted to know the names of the six largest islands. To do this, you would retrieve the names of islands after sorting it in decreasing order:
> names(sort(islands, decreasing=TRUE)[1:6])
[1] "Asia" "Africa" "North America"
[4] "South America" "Antarctica" "Europe"
Creating and assigning named vectors
Imagine you want to create a named vector with the number of days in each month. First, create a numeric vector containing the number of days in each month. Then use the built-in dataset month.name for the month names, as follows:
> month.days <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
> names(month.days) <- month.name
> month.days
  January  February     March     April 
       31        28        31        30 
      May      June      July    August 
       31        30        31        31 
September   October  November  December 
       30        31        30        31 
Now you can use this vector to find the names of the months with 31 days:
> names(month.days[month.days==31])
[1] "January" "March" "May"
[4] "July" "August" "October"
[7] "December"
This technique works because you subset month.days to return only those values for which month.days equals 31, and then you retrieve the names of the resulting vector.
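To make the two steps of that technique visible, here is a minimal sketch that stores the intermediate logical vector before using it as a subset. The names is.long, long.months, and leap.days are invented for this illustration, and the leap-year change is simply an example of replacing an element by name.

```r
# Rebuild the named vector from the text
month.days <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
names(month.days) <- month.name

# Step 1: the comparison yields one TRUE or FALSE per month
is.long <- month.days == 31

# Step 2: use that logical vector to pick the matching names
long.months <- names(month.days)[is.long]

# Elements can also be replaced by name (here: February in a leap year)
leap.days <- month.days
leap.days["February"] <- 29

print(long.months)
print(sum(leap.days))
```

Working on a copy (leap.days) leaves the original month.days untouched.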
Manipulating Text
When you have text, you need to be able to manipulate it, for example by splitting or combining words. You also may want to analyze your text to find out whether it contains certain keywords or patterns.
In this section, you work with the string splitting and concatenation functions of R. Concatenating (combining) strings is something that programmers do very frequently. For example, when you create a report of your results, it’s customary to combine descriptive text with the actual results of your analysis so that the reader of your results can easily digest it.
Finally, you start to work with finding words and patterns inside text, and you meet regular expressions, a powerful way of doing a wildcard search of text.
String theory: Combining and splitting strings
A collection of combined letters and words is called a string. Whenever you work with text, you need to be able to concatenate words (string them together) and split them apart. In R, you use the paste() function to concatenate and the strsplit() function to split. In this section, we show you how to use both functions.
Splitting text
First, create a character vector called pangram, and assign it the value "The quick brown fox jumps over the lazy dog", as follows:
> pangram <- "The quick brown fox jumps over the lazy dog"
> pangram
[1] "The quick brown fox jumps over the lazy dog"
To split this text at the word boundaries (spaces), you can use strsplit() as follows:
> strsplit(pangram, " ")
[[1]]
[1] "The" "quick" "brown" "fox" "jumps" "over" "the" "lazy" "dog"
Notice the [[1]] on the first line of the output: strsplit() returns a list, and [[1]] marks the first element of that list. In the preceding example, the list has only a single element, but that element is a vector of words.
To extract an element from a list, you have to use double square brackets. Split your pangram into words, and assign the first element to a new variable called words, using double-square-brackets ([[]]) subsetting, as follows:
> words <- strsplit(pangram, " ")[[1]]
> words
[1] "The" "quick" "brown" "fox" "jumps" "over" "the" "lazy" "dog"
The word the appears twice in words: once capitalized and once in lowercase. To get the unique words regardless of case, first convert words to lowercase with tolower() and then pass the result to unique():
> unique(tolower(words))
[1] "the" "quick" "brown" "fox" "jumps" "over" "lazy"
[8] "dog"
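The reason strsplit() returns a list becomes clearer when you split more than one string at a time. The following sketch uses a made-up two-element vector called sentences; each input string produces its own list element.

```r
# Two sentences in one character vector (illustrative data)
sentences <- c("The quick brown fox", "jumps over the lazy dog")

# strsplit() returns one list element per input string
pieces <- strsplit(sentences, " ")

# Count the words in each sentence
word.counts <- sapply(pieces, length)

# unlist() flattens the list into a single character vector
all.words <- unlist(pieces)

print(word.counts)
print(all.words)
```

Here pieces[[1]] holds the words of the first sentence and pieces[[2]] the words of the second.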
Concatenating text
Now that you’ve split text, you can concatenate these elements so that they again form a single text string.
To concatenate text, you use the paste() function:
> paste("The", "quick", "brown", "fox")
[1] "The quick brown fox"
By default, paste() separates the elements with a blank space. The separator is controlled by the sep argument, which defaults to a space (" ") unless you tell it otherwise.
Now pass the same words to paste() as a single vector:
> paste(c("The", "quick", "brown", "fox"))
[1] "The" "quick" "brown" "fox"
What’s happening here? Why doesn’t paste() paste the words together? The reason is that, by using c(), you passed a vector as a single argument to paste(). The c() function combines elements into a vector. By default, paste() concatenates separate vectors element by element; it doesn’t collapse the elements of a single vector.
For the same reason, paste(words) doesn’t join the elements of words. The results that follow assume that you’ve first converted the fourth and ninth words to uppercase with toupper():
> words[c(4, 9)] <- toupper(words[c(4, 9)])
> paste(words)
[1] "The" "quick" "brown" "FOX" "jumps" "over" "the" "lazy" "DOG"
When you want to concatenate the elements of a vector by using paste(), you use the collapse argument, as follows:
> paste(words, collapse=" ")
[1] "The quick brown FOX jumps over the lazy DOG"
The collapse argument of paste() can take any character value. If you want to paste together text by using an underscore, use the following:
> paste(words, collapse="_")
[1] "The_quick_brown_FOX_jumps_over_the_lazy_DOG"
You can use sep and collapse in the same paste() call. In this case, the vectors are first pasted with sep and then collapsed with collapse. Try this:
> paste(LETTERS[1:5], 1:5, sep="_", collapse="---")
[1] "A_1---B_2---C_3---D_4---E_5"
What happens here is that you first concatenate the elements of each vector with an underscore (that is, A_1, B_2, and so on), and then you collapse the results into a single string with --- between each element.
Suppose that you have five objects, and you want to label them "Sample 1", "Sample 2", and so on. You can do this by passing a short vector with the value "Sample" and a longer vector with the values 1:5 to paste(). In this example, the shorter vector is recycled (repeated) five times:
> paste("Sample", 1:5)
[1] "Sample 1" "Sample 2" "Sample 3" "Sample 4" "Sample 5"
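If you want labels without the space, you can set sep to an empty string or use paste0(), which is shorthand for the same thing. The sketch below also shows recycling with a shorter vector of length 2. The names ids.a, ids.b, and plate are hypothetical.

```r
# sep = "" removes the separator; paste0() does the same with less typing
ids.a <- paste("Sample", 1:3, sep = "")
ids.b <- paste0("Sample", 1:3)

# Recycling: the two-element vector is repeated to match the four numbers
plate <- paste0(c("A", "B"), 1:4)

print(ids.a)
print(identical(ids.a, ids.b))
print(plate)
```

The last call pairs A with 1, B with 2, then starts over with A for 3 and B for 4.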
Sorting text
What do league tables, telephone directories, dictionaries, and the index pages of a book have in common? They present data in some sorted manner. Data can be sorted alphabetically or numerically, in ascending or descending order. Like any programming language, R makes it easy to compile lists of sorted and ordered data.
Because text in R is represented as character vectors, you can sort these vectors using the same functions as you use with numeric data. For example, to get R to sort the alphabet in reverse, use the sort() function:
> sort(letters, decreasing=TRUE)
 [1] "z" "y" "x" "w" "v" "u" "t" "s" "r" "q" "p"
[12] "o" "n" "m" "l" "k" "j" "i" "h" "g" "f" "e"
[23] "d" "c" "b" "a"
Here you used the decreasing argument of sort().
Try it on the words vector that you created earlier in this chapter:
> sort(words)
[1] "brown" "DOG" "FOX" "jumps" "lazy"
[6] "over" "quick" "the" "The"
R sorts character vectors lexicographically, following the collation rules of your locale, so the exact result can differ between systems. Beware of making any assumptions about the collation order: for example, in Estonian, Z comes between S and T, and collation is not necessarily character by character; in Danish, aa sorts as a single letter, after z.
In most cases, lexicographic sorting simply means that the sort order is independent of whether the string is in lowercase or uppercase. For more details, read the help text in ?sort as well as ?Comparison.
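When the locale-dependent order matters, two approaches can make the result more predictable. The following is a sketch, assuming R 3.3.0 or later for the method = "radix" option, which sorts character vectors in the C locale. The vector mixed is made-up data.

```r
mixed <- c("the", "The", "brown", "DOG", "FOX")

# Case-insensitive ordering: order a lowercase copy, then index the original
by.letter <- mixed[order(tolower(mixed))]

# method = "radix" sorts in the C locale, where all uppercase
# letters come before all lowercase letters
c.locale <- sort(mixed, method = "radix")

print(by.letter)
print(c.locale)
```

The first approach ignores case explicitly; the second gives the same answer on every system, at the cost of separating uppercase from lowercase words.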
Finding text inside text
When you’re working with text, often you can solve problems if you’re able to find words or patterns inside text. Imagine you have a list of the states in the United States, and you want to find out which of these states contain the word New. Or, say you want to find out which state names consist of two words.
To solve the first problem, you need to search for individual words (in this case, the word New). And to solve the second problem, you need to search for multiple words. We cover both problems in this section.
Searching for individual words
To investigate this problem, you can use the built-in dataset state.name, which contains (you guessed it) the names of the states of the United States:
> head(state.name)
[1] "Alabama" "Alaska" "Arizona"
[4] "Arkansas" "California" "Colorado"
Broadly speaking, you can find substrings in text in two ways:
By position: For example, you can tell R to get three letters starting at position 5.
By pattern: For example, you can tell R to get substrings that match a specific word or pattern.
A pattern is a bit like a wildcard. In some card games, you may use the Joker card to represent any other card. Similarly, a pattern in R can contain words or certain symbols with special meanings.
Searching by position
If you know the exact position of a subtext inside a text element, you use the substr() function to return the value. To extract the subtext that starts at the third position and stops at the sixth position of each element of state.name, use the following:
> head(substr(state.name, start=3, stop=6))
[1] "abam" "aska" "izon" "kans" "lifo" "lora"
Searching by pattern
To find substrings, you can use the grep() function, which takes two essential arguments:
pattern: The pattern you want to find.
x: The character vector you want to search.
Suppose you want to find all the states that contain the pattern New. Do it like this:
> grep("New", state.name)
[1] 29 30 31 32
The result of grep() is a numeric vector with the positions of each of the elements that contain the matching pattern. In other words, the 29th element of state.name contains the word New.
> state.name[29]
[1] "New Hampshire"
Phew, that worked! But typing in the position of each matching text is going to be a lot of work. Fortunately, you can use the results of grep() directly to subset the original vector:
> state.name[grep("New", state.name)]
[1] "New Hampshire" "New Jersey"
[3] "New Mexico" "New York"
The grep() function is case sensitive by default, so searching for new in lowercase finds no matches:
> state.name[grep("new", state.name)]
character(0)
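The grep() function has a few optional arguments that address the two inconveniences you just met: case sensitivity and the need to subset by position. The sketch below uses ignore.case and value, both documented arguments of grep(), together with its sibling grepl(). The variable names are made up.

```r
# ignore.case = TRUE makes the search case insensitive
positions <- grep("new", state.name, ignore.case = TRUE)

# value = TRUE returns the matching elements instead of their positions
matches <- grep("new", state.name, ignore.case = TRUE, value = TRUE)

# grepl() returns one TRUE or FALSE per element, handy for counting
n.matches <- sum(grepl("New", state.name))

print(positions)
print(matches)
print(n.matches)
```

With value = TRUE you get the same result as state.name[grep("New", state.name)] in a single call.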
Searching for multiple words
So, how do you find the names of all the states with more than one word? This is easy when you realize that you can frame the question as finding all those states that contain a space:
> state.name[grep(" ", state.name)]
 [1] "New Hampshire" "New Jersey"
 [3] "New Mexico" "New York"
 [5] "North Carolina" "North Dakota"
 [7] "Rhode Island" "South Carolina"
 [9] "South Dakota" "West Virginia"
The results include all the states that have two-word names, such as New Jersey, New York, North Carolina, South Dakota, and West Virginia.
You can see from this list that there are no state names that contain East. You can confirm this by doing another find:
> state.name[grep("East", state.name)]
character(0)
Substituting text
The sub() function (short for substitute) searches for a pattern in text and replaces this pattern with replacement text. You use sub() to replace only the first occurrence of a pattern in each element, and you use its cousin gsub() to replace all occurrences. (The g in gsub() stands for global.)
Suppose you have the sentence A wolf in cheap clothing, which is clearly a mistake. You can fix it with a gsub() substitution. The gsub() function takes three main arguments: the pattern to find, the replacement text, and the text to modify:
> gsub("cheap", "sheep's", "A wolf in cheap clothing")
[1] "A wolf in sheep's clothing"
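The difference between sub() and gsub() only shows up when a pattern occurs more than once. Here is a minimal sketch with a made-up string in which the letter a appears three times:

```r
fruit <- "banana"

# sub() replaces only the first match
first.only <- sub("a", "A", fruit)

# gsub() replaces every match
every.one <- gsub("a", "A", fruit)

print(first.only)  # "bAnana"
print(every.one)   # "bAnAnA"
```

In the wolf example, the word cheap occurs only once, so sub() and gsub() give the same result there.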
Another common type of problem that can be solved with text substitution is removing substrings. Removing substrings is the same as replacing the substring with empty text (that is, nothing at all).
Imagine a situation in which you have three file names in a vector: file_a.csv, file_b.csv, and file_c.csv. Your task is to extract the a, b, and c from those file names. You can do this in two steps: First, replace the pattern "file_" with nothing, and then replace the ".csv" with nothing. You’ll be left with your desired vector:
> x <- c("file_a.csv", "file_b.csv", "file_c.csv")
> y <- gsub("file_", "", x)
> y
[1] "a.csv" "b.csv" "c.csv"
> gsub(".csv", "", y)
[1] "a" "b" "c"
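One caveat about the pattern ".csv": gsub() treats its pattern as a regular expression by default, and in a regular expression a dot matches any single character. That makes no difference for the three file names above, but it can surprise you with other input. The following sketch uses a hypothetical file name to show the effect and two ways around it.

```r
tricky <- "my_csv_file.csv"   # hypothetical file name

# Unescaped: "." matches any character, so "_csv" is removed as well
loose <- gsub(".csv", "", tricky)

# fixed = TRUE treats the pattern as literal text
literal <- gsub(".csv", "", tricky, fixed = TRUE)

# Escaping the dot and anchoring with $ removes only a trailing ".csv"
anchored <- sub("\\.csv$", "", tricky)

print(loose)     # "my_file"
print(literal)   # "my_csv_file"
print(anchored)  # "my_csv_file"
```

For plain text replacement, fixed = TRUE is usually the simplest choice.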
Revving up with regular expressions
Until this point, you’ve worked only with fixed expressions to find or substitute text. This is useful but also limited. R supports the concept of regular expressions, which allows you to search for patterns inside text.
You may never have heard of regular expressions, but you’re probably familiar with the broad concept. If you’ve ever used an * or a ? to indicate any letter in a word, then you’ve used a form of wildcard search. Regular expressions support the idea of wildcards and much more.
Regular expressions allow three ways of making a search pattern more general than a single, fixed expression:
Alternatives: You can search for instances of one pattern or another, indicated by the | symbol. For example, beach|beech matches both beach and beech.
On English and American English keyboards, you can usually find the | on the same key as backslash (\).
Grouping: You group patterns together using parentheses ( ). For example, you write be(a|e)ch to find both beach and beech.
Quantifiers: You specify whether an element in the pattern must be repeated or not by adding * (occurs zero or more times) or + (occurs one or more times). For example, to find either bach or beech (zero or more of a or e, but not both), you use b(e*|a*)ch.
Try the following examples. First, create a new variable with five words:
> rwords <- c("bach", "back", "beech", "beach", "black")
Find either beach or beech using alternative matching:
> grep("beach|beech", rwords)
[1] 3 4
This means the search string was found in elements 3 and 4 of rwords. To extract the actual elements, you can use subsetting with square brackets:
> rwords[grep("beach|beech", rwords)]
[1] "beech" "beach"
Now use the grouping rule to extract the same words:
> rwords[grep("be(a|e)ch", rwords)]
[1] "beech" "beach"
Lastly, use the quantifier modification to extract bach and beech but not beach:
> rwords[grep("b(e*|a*)ch", rwords)]
[1] "bach" "beech"
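To see exactly which words a pattern accepts, it can help to print a TRUE or FALSE next to each word. The sketch below does that for the quantifier pattern and then contrasts * with +. The names hits and one.or.more are invented for this example.

```r
rwords <- c("bach", "back", "beech", "beach", "black")

# grepl() shows, word by word, whether the pattern matches;
# setNames() attaches the words as labels
hits <- setNames(grepl("b(e*|a*)ch", rwords), rwords)

# "+" requires at least one e, so bach no longer qualifies
one.or.more <- grep("be+ch", rwords, value = TRUE)

print(hits)
print(one.or.more)
```

The word beach fails both patterns because it mixes an e and an a between the b and the ch.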
Factoring in Factors
In real-world problems, you often encounter data that can be described using words rather than numerical values. For example, cars can be red, green, or blue (or any other color); people can be left-handed or right-handed, male or female; energy can be derived from coal, nuclear, wind, or wave power. You can use the term categorical data to describe these examples — or anything else that can be classified in categories.
R has a special data structure for categorical data, called factors. Factors are closely related to characters because any character vector can be represented by a factor.
Factors are special types of objects in R. They’re neither character vectors nor numeric vectors, although they have some attributes of both. Factors behave a little bit like character vectors in the sense that the unique categories often are text. Factors also behave a little bit like integer vectors because R encodes the levels as integers.
Creating a factor
To create a factor in R, you use the factor() function. The first three arguments of factor() warrant some exploration:
x: The input vector that you want to turn into a factor.
levels: An optional vector of the values that x might have taken. The default is lexicographically sorted, unique values of x.
labels: Another optional vector that, by default, takes the same values as levels. You can use this argument to rename your levels, as we explain later in this section.
Consider the following example of a vector consisting of compass directions:
> directions <- c("North", "East", "South", "South")
Notice that this vector contains the value "South" twice and lacks the value "West". First, convert directions to a factor:
> factor(directions)
[1] North East South South
Levels: East North South
Notice that the levels of your new factor do not contain the value "West", which is as expected. In practice, however, it makes sense to have all the possible compass directions as levels of your factor. To add the missing level, you specify the levels argument of factor():
> factor(directions, levels=c("North", "East", "South", "West"))
[1] North East South South
Levels: North East South West
As you can see, the values are still the same but this time the levels also contain "West".
Now imagine that you actually prefer to have abbreviated names for the levels. To do this, you make use of the labels argument:
> factor(directions, levels=c("North", "East", "South", "West"), labels=c("N", "E", "S", "W"))
[1] N E S S
Levels: N E S W
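One practical reason to declare all possible levels is that summaries then report the categories that never occur. The following sketch tabulates the compass factor and then removes the unused level again with droplevels(). The variable names are made up for this illustration.

```r
directions <- c("North", "East", "South", "South")
compass <- factor(directions, levels = c("North", "East", "South", "West"))

# table() counts every level, including those that never occur
counts <- table(compass)

# droplevels() removes levels that are not used
used.only <- droplevels(compass)

print(counts)
print(levels(used.only))
```

The table shows a count of zero for West, which would be missing entirely if you tabulated the plain character vector.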
Converting a factor
Sometimes you need to explicitly convert factors to either text or numbers. To do this, you use the functions as.character() or as.numeric().
First, convert your directions vector into a factor called directions.factor (as you saw earlier):
> directions <- c("North", "East", "South", "South")
> directions.factor <- factor(directions)
> directions.factor
[1] North East South South
Levels: East North South
Use as.character() to convert a factor to a character vector:
> as.character(directions.factor)
[1] "North" "East" "South" "South"
Use as.numeric() to convert a factor to a numeric vector. Note that this will return the numeric codes that correspond to the factor levels. For example, "East" corresponds to 1, "North" corresponds to 2, and so forth:
> as.numeric(directions.factor)
[1] 2 1 3 3
Be careful when you convert a factor whose levels are numbers, because the result may not be what you expect. For example, imagine you have a vector that indicates some test score results with the values c(9, 8, 10, 8, 9), which you convert to a factor:
> numbers <- factor(c(9, 8, 10, 8, 9))
To look at the internal representation of numbers, use str():
> str(numbers)
 Factor w/ 3 levels "8","9","10": 2 1 3 1 2
This indicates that R stores the values as c(2, 1, 3, 1, 2) with associated levels of c("8", "9", "10"). Figure 5-2 gives a graphical representation of this difference between the levels and the internal representation.
Figure 5-2: A visual comparison between a numeric vector and a factor.
If you want to convert numbers to a character vector, the results are pretty much as you would expect:
> as.character(numbers)
[1] "9" "8" "10" "8" "9"
However, if you simply use as.numeric(), your result is a vector of the internal level representations of your factor and not the original values:
> as.numeric(numbers)
[1] 2 1 3 1 2
To recover the original values, first convert the factor to a character vector and then convert that result to numeric:
> as.numeric(as.character(numbers))
[1] 9 8 10 8 9
This is an example of nested functions in R, in which you pass the results of one function to a second function. Nested functions are a bit like the Russian nesting dolls, where each toy is inside the next:
The inner function, as.character(numbers), returns the text c("9", "8", "10", "8", "9").
The outer function, as.numeric(...), does the final conversion to c(9, 8, 10, 8, 9).
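The help page ?factor mentions a second way to recover the original numbers: convert the levels to numeric once and then index them with the factor. The sketch below compares it with the nested conversion. The names via.character and via.levels are made up.

```r
numbers <- factor(c(9, 8, 10, 8, 9))

# Nested conversion: factor -> character -> numeric
via.character <- as.numeric(as.character(numbers))

# Alternative from ?factor: convert the three levels once, then
# index them with the factor's integer codes
via.levels <- as.numeric(levels(numbers))[numbers]

print(via.character)
print(via.levels)
```

Both approaches return the same values. The second converts only the unique levels, which can matter for very long factors.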
Looking at levels
To look a little bit under the hood of the structure of a factor, use the str() function:
> str(state.region)
 Factor w/ 4 levels "Northeast","South",..: 2 4 4 2 4 4 1 2 2 2 ...
R reports the structure of state.region as a factor with four levels. You can see that the first two levels are "Northeast" and "South", and that R represents the levels internally as the integers 1, 2, 3, and 4.
Factors are a convenient way to describe categorical data. Internally, a factor is stored as a vector of integer codes, where each code refers to one of the levels. This means you can set and investigate the levels of a factor separately from the values of the factor.
To look at the levels of a factor, you use the levels() function. For example, to extract the factor levels of state.region, use the following:
> levels(state.region)
[1] "Northeast" "South"
[3] "North Central" "West"
Because the values of the factor are linked to the levels, when you change the levels, you also indirectly change the values themselves. To make this clear, change the levels of state.region to the values "NE", "S", "NC", and "W":
> levels(state.region) <- c("NE", "S", "NC", "W")
> head(state.region)
[1] S W W S W W
Levels: NE S NC W
Sometimes it’s useful to know the number of levels of a factor. The convenience function nlevels() extracts the number of levels from a factor:
> nlevels(state.region)
[1] 4
Because the levels of a factor are internally stored by R as a vector, you also can extract the number of levels using length():
> length(levels(state.region))
[1] 4
For the very same reason, you can index the levels of a factor using standard vector subsetting rules. For example, to extract the second and third factor levels, use the following:
> levels(state.region)[2:3]
[1] "S" "NC"
Distinguishing data types
In the field of statistics, being able to distinguish between variables of different types is very important. The type of data very often determines the type of analysis that can be performed. As a result, R offers the ability to explicitly classify data as follows:
Nominal data: This type of data, which you represent in R using factors, distinguishes between different categories, but there is no implied order between categories. Examples of nominal data are colors (red, green, blue), gender (male, female), and nationality (British, French, Japanese).
Ordinal data: Ordinal data is distinguished by the fact that there is some kind of natural order between elements but no indication of the relative size difference. Any kind of data that is possible to rank in order but not give exact values to is ordinal. For example, low < medium < high describes data that is ordered with three levels.
In market research, it’s very common to use a five-point scale to measure perceptions: strongly disagree < disagree < neutral < agree < strongly agree. This is also an example of ordinal data.
Another example is the use of the names of colors to indicate order, such as red < amber < green to indicate project status.
In R, you use ordered factors to describe ordinal data. For more on ordered factors, see the “Working with ordered factors” section, later in this chapter.
Numeric data: You have numeric data when you can describe your data with numbers (for example, length, weight, or count). Numeric data has two subcategories.
• Interval scaled data: You have interval scaled data when the interval between adjacent units of measurement is the same, but the zero point is arbitrary. An everyday example of interval scaled data is our calendar system. Each year has the same length, but the zero point is arbitrary. In other words, time didn’t start in the year zero — we simply use a convenient year to start counting. This means you can add and subtract dates (and all other types of interval scaled data), but you can’t meaningfully divide dates. Other examples include longitude, as well as anything else where there can be disagreement about where the starting point is.
Other examples of interval scaled data can be found in social science research such as market research.
In R you can use integer or numeric objects to represent interval scaled data.
• Ratio scaled data: This is data where all kinds of mathematical operations are allowed, in particular the ability to multiply and divide (in other words, take ratios). Most data in physical sciences are ratio scaled — for example, length, mass, and speed. In R, you use numeric objects to represent ratio scaled data.
Working with ordered factors
Sometimes data has some kind of natural order in which some elements are in some sense “better” or “worse” than other elements, but at the same time it’s impossible to ascribe a meaningful value to these. An example is any situation where project status is described as low, medium, or high. A similar example is a traffic light that can be red, yellow, or green.
The name for this type of data, where rank ordering is important, is ordinal data. In R, there is a special data type for ordinal data. This type is called ordered factors and is an extension of the factors that you’re already familiar with.
To create an ordered factor in R, you have two options:
Use the factor() function with the argument ordered=TRUE.
Use the ordered() function.
Say you want to represent the status of five projects. Each project has a status of low, medium, or high:
> status <- c("Lo", "Hi", "Med", "Med", "Hi")
Now create an ordered factor with this status data:
> ordered.status <- factor(status, levels=c("Lo", "Med", "Hi"), ordered=TRUE)
> ordered.status
[1] Lo Hi Med Med Hi
Levels: Lo < Med < Hi
You can tell an ordered factor from an ordinary factor by the < signs between its levels. R functions such as table() also present their results in the order of the levels. To see the difference, first tabulate the original character vector:
> table(status)
status
 Hi  Lo Med 
  2   1   2 
Notice that the results are ordered alphabetically. However, performing the same function on the ordered factor yields results that are easier to interpret because they’re now sorted in the order Lo, Med, Hi:
> table(ordered.status)
ordered.status
 Lo Med  Hi 
  1   2   2 
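Ordered factors also support comparisons, which ordinary factors do not. The following sketch rebuilds the status example and then filters, summarizes, and sorts by rank. The names at.least.medium, highest, and sorted.status are hypothetical.

```r
status <- c("Lo", "Hi", "Med", "Med", "Hi")
ordered.status <- factor(status, levels = c("Lo", "Med", "Hi"), ordered = TRUE)

# Comparison operators respect the level order
at.least.medium <- ordered.status >= "Med"

# max() and min() also work on ordered factors
highest <- max(ordered.status)

# sort() arranges values by level, not alphabetically
sorted.status <- sort(ordered.status)

print(at.least.medium)
print(highest)
print(sorted.status)
```

If you try the same comparison on an unordered factor, R returns NA with a warning, because it has no ranking to work with.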
R preserves the ordering information inherent in ordered factors. In Part V, you see how this becomes an essential tool to gain control over the appearance of bar charts.
Also, in statistical modeling, R applies the appropriate statistical transformation (of contrasts) when you have factors or ordered factors in your model. In Chapter 15, you do some statistical modeling with categorical variables.