Chapter 2. Basic Objects

The first step of learning R programming is getting familiar with basic R objects and their behavior. In this chapter, you will learn the following topics:

  • Creating and subsetting atomic vectors (for example, numeric vectors, character vectors, and logical vectors), matrices, arrays, lists, and data frames.
  • Defining and working with functions

"Everything that exists is an object. Everything that happens is a function." -- John Chambers

For example, in statistical analysis, we often feed a set of data to a linear regression model and obtain a group of linear coefficients.

R has different types of objects, so what basically happens when we do this is the following: we provide a data frame object that holds the set of data, pass it to the linear model function, and get back a list object consisting of the properties of the regression results. Finally, we extract a numeric vector, which is another type of object, from the list to represent the linear coefficients.
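The workflow described above can be sketched in a few lines. This is only an illustration: it assumes the built-in mtcars dataset and an arbitrary choice of columns (mpg and wt), neither of which comes from this chapter, and the names fit and coefs are chosen here for convenience:

```r
# mtcars is a data frame that ships with R; mpg and wt are two of its columns
is.data.frame(mtcars)
## [1] TRUE

# lm() takes the data frame and returns the fitted model, stored as a list
fit <- lm(mpg ~ wt, data = mtcars)
is.list(fit)
## [1] TRUE

# coef() extracts the linear coefficients as a named numeric vector
coefs <- coef(fit)
is.numeric(coefs)
## [1] TRUE
names(coefs)
## [1] "(Intercept)" "wt"
```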

Every task involves different types of objects. Each object has a different goal and behavior. It's important to understand how a basic object works in order to solve real-world problems, especially with more elegant code and fewer steps. More importantly, a more concrete understanding of object behavior allows you to spend more time on working out the solution to your problem than on getting stuck on countless minor problems while producing the right code.

In the following sections, we will see a variety of basic objects in R that represent different types of data and make it easy to analyze and visualize datasets. You will have a basic understanding of how these objects work and how they interact with each other.

Vector

A vector is a group of primitive values of the same type. It can be a group of numbers, true/false values, texts, or values of some other type. It is one of the building blocks of all R objects.

There are several types of vectors in R. They are distinct from each other in the type of elements they store. In the following sections, we will see the most commonly used types of vectors including numeric vectors, logical vectors, and character vectors.

Numeric vector

A numeric vector is a vector of numeric values. A scalar number is the simplest numeric vector. An example is shown as follows:

1.5
## [1] 1.5

A numeric vector is the most frequently used data type and is the foundation of nearly all kinds of data analysis. In other popular programming languages, there are some scalar types such as integer, double, and string, and these scalar types are the building blocks of the container types such as vectors. In R, however, there is no formal definition of scalar types. A scalar number is only a special case of numeric vector, and it's special only because its length is 1.

When we create a value, it is natural to think of how to store it for future use. To store the value, we can use <- to assign the value to a symbol. In other words, we create a variable named x of the value 1.5:

x <- 1.5

Then, the value is assigned to symbol x, and we can use x to represent the value from now on:

x
## [1] 1.5

There are multiple ways to create a numeric vector. We can call numeric() to create a zero vector of a given length:

numeric(10)
## [1] 0 0 0 0 0 0 0 0 0 0

We can also use c() to combine several vectors to make one vector. The simplest case is, for example, to combine several single-element vectors to be a multi-element vector:

c(1, 2, 3, 4, 5)
## [1] 1 2 3 4 5

We can also combine a mixture of single-element vectors and multi-element vectors and obtain a vector with the same elements as we previously created:

c(1, 2, c(3, 4, 5))
## [1] 1 2 3 4 5

To create a series of consecutive integers, the : operator will easily do the trick:

1:5
## [1] 1 2 3 4 5

Precisely speaking, the preceding code produces an integer vector instead of a numeric vector. In many cases, their difference is not that important. We will cover this topic later.
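The distinction can be inspected directly. The following is a small sketch using only base R functions; the variable names ints and nums are chosen here for illustration:

```r
ints <- 1:5               # created with `:`, stored as integers
nums <- c(1, 2, 3, 4, 5)  # created with c(), stored as doubles

class(ints)
## [1] "integer"
class(nums)
## [1] "numeric"

# Both count as numeric for most purposes, and their values compare equal
is.numeric(ints)
## [1] TRUE
all(ints == nums)
## [1] TRUE
```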

A more general way to produce a numeric sequence is seq(). For example, the following code produces a numeric vector of a sequence from 1 to 10 by increment 2:

seq(1, 10, 2)
## [1] 1 3 5 7 9

Functions like seq() have many arguments. We can call such a function by supplying all the arguments, but it is not necessary in most cases. Most functions provide reasonable default values for some arguments, which makes it easier for us to call them. In this case, we only need to specify the argument that we would like to modify from its default value.

For example, we can create another numeric vector that starts from 3 with the length 10 by specifying the length.out argument:

seq(3, length.out = 10)
## [1] 3 4 5 6 7 8 9 10 11 12

A function call like the above uses a named argument length.out so that other arguments are kept default and only this argument is modified.
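As a further sketch of named arguments, the same function can be called with every argument named, or with the endpoints fixed and the increment left for seq() to work out. The names by_two and quarters are illustrative only:

```r
# Naming every argument makes the call self-explanatory
by_two <- seq(from = 1, to = 10, by = 2)
by_two
## [1] 1 3 5 7 9

# Fix the endpoints and let seq() compute the increment
quarters <- seq(0, 1, length.out = 5)
quarters
## [1] 0.00 0.25 0.50 0.75 1.00
```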

There are many ways in which we can define numeric vectors, but we should always be careful when we use the : operator. An example is shown as follows:

1 + 1:5
## [1] 2 3 4 5 6

As the result shows, 1 + 1:5 does not mean a sequence from 2 to 5, but from 2 to 6. It is because : has higher priority than +, which results in evaluating 1:5 first and adding 1 to each entry, yielding the sequence you see in the result. We will cover the priority of operators later.
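When the other reading is intended, parentheses make the order explicit. The sketch below also shows a closely related slip with subtraction; the variable n is a hypothetical length introduced only for this example:

```r
1 + 1:5      # `:` binds tighter, so this is 1 + (1:5)
## [1] 2 3 4 5 6
(1 + 1):5    # parentheses force the addition first, giving 2:5
## [1] 2 3 4 5

# With a length n, 1:n - 1 means (1:n) - 1, not 1:(n - 1)
n <- 4
1:n - 1
## [1] 0 1 2 3
1:(n - 1)
## [1] 1 2 3
```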

Logical vector

In contrast to numeric vectors, a logical vector stores a group of TRUE or FALSE values. Each value is basically a yes or no answer to a logical question.

The simplest logical vectors are TRUE and FALSE themselves:

TRUE
## [1] TRUE

A more usual way to obtain a logical vector is to ask logical questions about R objects. For example, we can ask R whether 1 is greater than 2:

1 > 2
## [1] FALSE

The answer is no, represented by FALSE. Sometimes, it is verbose to write TRUE and FALSE, so we can use T as an abbreviation for TRUE and F for FALSE. Note that T and F are ordinary variables that can be reassigned, so the full names are safer in code you intend to keep. If we want to perform multiple comparisons at the same time, we can directly use numeric vectors in the question:

c(1, 2) > 2
## [1] FALSE FALSE

R interprets this expression as the element-wise comparison between c(1, 2) and 2. In other words, it is equivalent to c(1 > 2, 2 > 2).

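We can compare two multi-element numeric vectors of the same length, or of different lengths as long as the length of the longer vector is a multiple of that of the shorter one (otherwise, R still performs the comparison but issues a warning):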

c(1, 2) > c(2, 1)
## [1] FALSE TRUE

The preceding code is equivalent to c(1 > 2, 2 > 1). To demonstrate how two vectors of different lengths are compared, see the following example:

c(2, 3) > c(1, 2, -1, 3)
## [1] TRUE TRUE TRUE FALSE

This may confuse you a bit. The computing mechanism recycles the shorter vector and works like c(2 > 1, 3 > 2, 2 > -1, 3 > 3). More specifically, the shorter vector will be recycled to finish all the comparisons for each element in the longer vector.
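Recycling can be made visible by performing it by hand. The following sketch uses the base function rep_len() to repeat the shorter vector up to the length of the longer one; the names short, long, and recycled are illustrative:

```r
short <- c(2, 3)
long  <- c(1, 2, -1, 3)

# rep_len() repeats `short` until it reaches the requested length
recycled <- rep_len(short, length(long))
recycled
## [1] 2 3 2 3

# Comparing the explicitly recycled vector gives the same answer
identical(short > long, recycled > long)
## [1] TRUE
```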

In R, several logical binary operators are defined, such as == to denote equality, > for greater-than, >= for greater-or-equals-to, < for less-than, and <= for less-than-or-equals-to. Moreover, R provides some other additional logical operators like %in% to tell whether each element in the left-hand side vector is contained by the right-hand side vector:

1 %in% c(1, 2, 3)
## [1] TRUE
c(1, 4) %in% c(1, 2, 3)
## [1] TRUE FALSE

You may notice that all the comparison operators perform recycling but %in% does not. Instead, it always works by iterating over the vector on the left and works like c(1 %in% c(1, 2, 3), 4 %in% c(1, 2, 3)) in the preceding example.
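One way to see this behavior is to look at match(), the base function that %in% is documented to be built on. The sketch below uses illustrative names lhs and rhs:

```r
lhs <- c(1, 4)
rhs <- c(1, 2, 3)

# match() returns the position of each left-hand element in rhs, or NA
match(lhs, rhs)
## [1] 1 NA

# The result of %in% always has the length of the left-hand side
length(lhs %in% rhs)
## [1] 2

# any() and all() summarize the element-wise answers
any(lhs %in% rhs)
## [1] TRUE
all(lhs %in% rhs)
## [1] FALSE
```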

Character vector

A character vector is a group of strings. Here, a character does not literally mean a single letter or symbol in a language; it means a string such as "this is a string". Both double quotation marks and single quotation marks can be used to create a character vector, as follows:

"hello, world!"
## [1] "hello, world!"
'hello, world!'
## [1] "hello, world!"

We can also use the combine function c() to construct a multi-element character vector:

c("Hello", "World")
## [1] "Hello" "World"

We can use == to tell whether two vectors have equal values in corresponding positions; this applies to character vectors too:

c("Hello", "World") == c('Hello', 'World')
## [1] TRUE TRUE

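The character vectors are equal because " and ' both work to create a string and do not affect its value. In contrast, comparing the same vector with a single, different string yields FALSE for each element: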

c("Hello", "World") == "Hello, World"
## [1] FALSE FALSE

The previous expression yields FALSE for both elements because neither "Hello" nor "World" equals "Hello, World". The only difference between the two quotation marks is the behavior when you create a string containing quotation marks.

If you use " to create a string (a single-element character vector) containing " itself, you need to type \" to escape the " inside the string, which prevents the interpreter from treating it as the closing quotation mark of the string.

The following examples demonstrate the escaping of quotation marks. The code uses cat() to print the given text:

cat("Is \"You\" a Chinese name?")
## Is "You" a Chinese name?

If you feel that this is not easy to read, you may well use ' to create the string, which can be easier:

cat('Is "You" a Chinese name?')
## Is "You" a Chinese name?

In other words, " allows ' in the string without escaping, and ' allows " in the string without escaping.
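A short check, using only base R, suggests that the two ways of writing the string produce exactly the same value and that the backslash is not stored in it. The names escaped and single are illustrative:

```r
escaped <- "Is \"You\" a Chinese name?"
single  <- 'Is "You" a Chinese name?'

identical(escaped, single)
## [1] TRUE

# The backslash belongs to the source code only, not to the stored string
nchar("\"")
## [1] 1

# print() shows the escaped form, while cat() shows the raw text
print(single)
## [1] "Is \"You\" a Chinese name?"
cat(single, "\n")
## Is "You" a Chinese name?
```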

Now we know the basic things about creating numeric vectors, logical vectors, and character vectors. In fact, we also have complex vectors and raw vectors in R. Complex vectors are vectors of complex values, such as c(1 + 2i, 2 + 3i). Raw vectors basically store raw binary data that is represented in the hexadecimal form. These two types of vectors are much less frequently used, but they share many behaviors with the three types of vectors we have covered.

In the next section, you will learn several ways to access part of a vector. By subsetting vectors, you should begin to understand how different types of vectors can be related to each other.

Subsetting vectors

Subsetting a vector means accessing some specific entries or a subset of the vector. In this section, we'll demonstrate various ways to subset a vector.

First, we create a simple numeric vector and assign it to v1:

v1 <- c(1, 2, 3, 4)

Each of the following lines gets a specific subset of v1.

For example, we can get the second element:

v1[2]
## [1] 2

We can get the second to fourth elements:

v1[2:4]
## [1] 2 3 4

We can get all elements except the third one:

v1[-3]
## [1] 1 2 4

The patterns are clear—we can put any numeric vector inside the square brackets after the vector to extract a corresponding subset:

a <- c(1, 3)
v1[a]
## [1] 1 3

All the preceding examples perform subsetting by position, that is, we get a subset of a vector by specifying the positions of elements. Using negative numbers will exclude those elements. One thing to notice is that you can't use positive numbers and negative numbers together:

v1[c(1, 2, -3)]
## Error in v1[c(1, 2, -3)]: only 0's may be mixed with negative subscripts

What if we subset the vector using positions beyond the range of the vector? The following example tries to get a subset of v1 from the third element to the nonexisting sixth element:

v1[3:6]
## [1] 3 4 NA NA

As we can see, the nonexisting positions result in missing values represented by NA. In real-world data, missing values are common. The good part is that arithmetic calculations with NA generally also result in NA for consistency. On the other hand, it takes extra effort to deal with such data because it may not be safe to assume that the data contains no missing values.
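The following sketch shows how NA propagates through calculations and a few base R tools for handling it. The vector readings is hypothetical data invented for this example:

```r
readings <- c(3, NA, 5)   # hypothetical data with one missing value

# NA propagates through element-wise arithmetic and through summaries
readings + 1
## [1] 4 NA 6
sum(readings)
## [1] NA

# Detect missing values, then either skip them or drop them
is.na(readings)
## [1] FALSE TRUE FALSE
sum(readings, na.rm = TRUE)
## [1] 8
readings[!is.na(readings)]
## [1] 3 5
```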

Another way to subset a vector is using logical vectors. We can supply an equal-length logical vector to determine whether each entry should be extracted:

v1[c(TRUE, FALSE, TRUE, FALSE)]
## [1] 1 3

More than subsetting, we can overwrite a specific subset of a vector like this:

v1[2] <- 0

In this case, v1 becomes the following:

v1
## [1] 1 0 3 4

We can also overwrite multiple elements at different positions at the same time:

v1[2:4] <- c(0, 1, 3)

Now, v1 becomes the following:

v1
## [1] 1 0 1 3

Like subsetting, logical selectors are also accepted for overwriting:

v1[c(TRUE, FALSE, TRUE, FALSE)] <- c(3, 2)

As you may expect, v1 becomes the following:

v1
## [1] 3 0 2 3

A useful implication of this operation is selecting entries by logical criterion. For example, the following code picks out all elements that are not greater than 2 in v1:

v1[v1 <= 2]
## [1] 0 2

A more complex selection criterion also works. The following example picks out all elements of v1 that satisfy x^2 - x + 1 >= 0:

v1[v1 ^ 2 - v1 + 1 >= 0]
## [1] 3 0 2 3
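Several conditions can also be combined into one selector. The sketch below uses a separate, hypothetical vector named scores so that v1 is left unchanged; & and | are the element-wise "and" and "or" operators of base R:

```r
scores <- c(3, 0, 2, 3, 5)   # hypothetical values

# Combine conditions element-wise with & (and) and | (or)
scores[scores >= 2 & scores < 5]
## [1] 3 2 3
scores[scores == 0 | scores == 5]
## [1] 0 5

# which() converts the logical vector to the matching positions
which(scores >= 3)
## [1] 1 4 5
```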

To replace all entries that satisfy x <= 2 with 0, we can call the following:

v1[v1 <= 2] <- 0

As you may expect, v1 becomes the following:

v1
## [1] 3 0 0 3

If we overwrite the vector at a nonexisting entry, the vector will automatically expand, and the unassigned entries in between are filled with NA as missing values:

v1[10] <- 8
v1
## [1] 3 0 0 3 NA NA NA NA NA 8

Named vectors

A named vector is not a specific type of vector parallel to a numeric or logical vector. It is a vector with names corresponding to the elements. We can give names to a vector when we create it:

x <- c(a = 1, b = 2, c = 3)
x
## a b c
## 1 2 3

Then, we can access the elements with a single-valued character vector:

x["a"]
## a
## 1

We can also get multiple elements with a character vector:

x[c("a", "c")]
## a c
## 1 3

If the character vector has duplicate elements, the selection will result in selecting duplicate elements:

x[c("a", "a", "c")]
## a a c
## 1 1 3

In addition to this, all other vector operations also work for named vectors.

We can get the names of a vector with names():

names(x)
## [1] "a" "b" "c"

The names of a vector are not fixed. We can change the names of a vector by assigning another character vector to its names.

names(x) <- c("x", "y", "z")
x["z"]
## z
## 3

If the names are no longer needed, we can simply remove the vector's names using NULL, a special object that represents an undefined value:

names(x) <- NULL
x
## [1] 1 2 3

You may wonder what happens when the name does not exist at all. Let's experiment with the original x value:

x <- c(a = 1, b = 2, c = 3)
x["d"]
## <NA>
## NA

By intuition, accessing a nonexisting element should produce an error. However, the result is not an error but a vector of a single missing value with a missing name:

names(x["d"])
## [1] NA

If you provide a character vector in which some names exist but others do not, the resulting vector will preserve the length of the selection vector:

x[c("a", "d")]
## a <NA>
## 1 NA
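Because a missing name silently produces NA rather than an error, it can be worth checking names before subsetting. This is one possible sketch; the names wanted and present are illustrative:

```r
x <- c(a = 1, b = 2, c = 3)
wanted <- c("a", "d")          # "d" is not a name in x

# Keep only the requested names that actually exist
present <- wanted[wanted %in% names(x)]
x[present]
## a
## 1

# Or report what is missing before subsetting
setdiff(wanted, names(x))
## [1] "d"
```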

Extracting an element

While [] creates a subset of a vector, [[]] extracts an element from a vector. Think of a vector as ten boxes of candy: [] gets you three boxes of candy, but [[]] opens a box and gets you a candy from it.

For simple vectors, using [] and [[]] to get one element will produce the same result. However, in some cases, they have different behaviors. For example, subsetting a named vector using one entry and extracting an element from it will produce different results:

x <- c(a = 1, b = 2, c = 3)
x["a"]
## a
## 1
x[["a"]]
## [1] 1

The metaphor of candy boxes makes it easier to understand. The x["a"] expression gives you the box of candy labeled "a", while x[["a"]] gives you the candy in the box labeled "a".

Since [[]] only extracts one element, it does not work with vectors of more than one element:

x[[c(1, 2)]]
## Error in x[[c(1, 2)]]: attempt to select more than one element

Also, it does not work with negative integers, which in [] mean excluding elements at certain positions:

x[[-1]]
## Error in x[[-1]]: attempt to select more than one element

We already know that subsetting a vector with a nonexisting position or name will produce missing values. However, [[]] simply does not work when we extract an element with a position beyond the range, nor does it work with a nonexisting name:

x[["d"]]
## Error in x[["d"]]: subscript out of bounds

For many beginners, it may be confusing to see both [[]] and [] used in the code and it is easy to misuse them. Just remember the metaphor of the candy boxes.
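The difference between the box and the candy shows up in the names of the result. A brief sketch with base R functions only:

```r
x <- c(a = 1, b = 2, c = 3)

names(x["a"])     # the subset keeps its label
## [1] "a"
names(x[["a"]])   # the extracted element has none
## NULL

# unname() drops the labels of a subset when only the values matter
unname(x[c("a", "c")])
## [1] 1 3
```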

Telling the class of vectors

Sometimes we need to tell which kind of vector we are dealing with before taking an action. The class() function tells us the class of any R object:

class(c(1, 2, 3))
## [1] "numeric"
class(c(TRUE, TRUE, FALSE))
## [1] "logical"
class(c("Hello", "World"))
## [1] "character"

If we need to ensure that an object is indeed a vector of a specific class, we can use is.numeric(), is.logical(), is.character(), and some other functions with similar names:

is.numeric(c(1, 2, 3))
## [1] TRUE
is.numeric(c(TRUE, TRUE, FALSE))
## [1] FALSE
is.numeric(c("Hello", "World"))
## [1] FALSE
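A few more checks, plus one way such a check might guard a computation, are sketched below. The vector input is hypothetical data standing in for values that arrived as text, and avg is an illustrative name:

```r
is.numeric(1:3)        # integer vectors also count as numeric
## [1] TRUE
is.character("1")      # a quoted number is still a string
## [1] TRUE
is.logical(NA)         # a bare NA is a logical value
## [1] TRUE

# Check the type first and convert only when needed
input <- c("1", "2", "3")   # hypothetical data read in as text
avg <- if (is.numeric(input)) mean(input) else mean(as.numeric(input))
avg
## [1] 2
```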

Converting vectors

Different classes of vectors can be coerced to a specific class of vector. For example, some data are string representations of numbers, such as "1" and "20". If we leave these strings as they are, we won't be able to perform numeric calculations with them. Fortunately, these two strings can be converted to numeric vectors. This will make R regard them as numbers rather than strings so that we can do the math with them.

To demonstrate a typical conversion, we first create a character vector:

strings <- c("1", "2", "3")
class(strings)
## [1] "character"

As I mentioned, strings cannot be used to do maths directly:

strings + 10
## Error in strings + 10: non-numeric argument to binary operator

We can use as.numeric() to convert the character vector to a numeric vector:

numbers <- as.numeric(strings)
numbers
## [1] 1 2 3
class(numbers)
## [1] "numeric"

Now we can do maths with numbers:

numbers + 10
## [1] 11 12 13

Similar to the is.* functions (for example, is.numeric(), is.logical(), and is.character()) that check the class of a given object, we can use the as.* function family to convert a vector from its original class to another:

as.numeric(c("1", "2", "3", "a"))
## Warning: NAs introduced by coercion
## [1] 1 2 3 NA
as.logical(c(-1, 0, 1, 2))
## [1] TRUE FALSE TRUE TRUE
as.character(c(1, 2, 3))
## [1] "1" "2" "3"
as.character(c(TRUE, FALSE))
## [1] "TRUE" "FALSE"

It seems that each type of vector can be somehow converted to all other types. However, the conversion follows a set of rules.

The first line in the preceding block of code attempts to convert the character vector to a numeric vector, just as we did in the previous example. Obviously, the last element, "a", cannot be converted to a number. The conversion succeeds for every element except the last one, for which a missing value is produced instead.

As for converting a numeric vector to a logical vector, the rule is that only 0 corresponds to FALSE and all non-zero numbers will produce TRUE.
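A few more conversions, sketched with base R functions only, illustrate that each direction follows its own rule. The names flags, counts, and whole are illustrative:

```r
# Strings convert to logical only when they spell a recognized truth value
flags <- as.logical(c("TRUE", "true", "T", "yes"))
flags
## [1] TRUE TRUE TRUE NA

# Logical values convert to 1 and 0, which makes counting easy
counts <- as.numeric(c(TRUE, FALSE, TRUE))
counts
## [1] 1 0 1
sum(c(TRUE, FALSE, TRUE))
## [1] 2

# as.integer() drops the fractional part rather than rounding
whole <- as.integer(c(1.9, -1.9))
whole
## [1] 1 -1
```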

Each kind of vector can be converted to a character vector since everything has a character representation. However, if a numeric vector or a logical vector is coerced to a character vector, it cannot be directly involved in the arithmetic operations with other numeric or logical vectors unless it is converted back. That is why the following code does not work, as I have just mentioned:

c(2, 3) + as.character(c(1, 2))
## Error in c(2, 3) + as.character(c(1, 2)): non-numeric argument to binary operator

The preceding examples stress that although R does not impose strong typing rules, this does not mean that R is smart enough to do exactly what you want automatically. In most cases, it is better to ensure that the types of the vectors are correct in computations; otherwise, an unexpected error may occur. In other words, only when you get the right type of data objects can you do the right math.

Arithmetic operators for numeric vectors

The arithmetic operations of numeric vectors are very simple. They basically follow two rules: computing in an element-wise manner and recycling the shorter vector. The following examples demonstrate the behavior of the operators working with numeric vectors:

c(1, 2, 3, 4) + 2
## [1] 3 4 5 6
c(1, 2, 3) - c(2, 3, 4)
## [1] -1 -1 -1
c(1, 2, 3) * c(2, 3, 4)
## [1] 2 6 12
c(1, 2, 3) / c(2, 3, 4)
## [1] 0.5000000 0.6666667 0.7500000
c(1, 2, 3) ^ 2
## [1] 1 4 9
c(1, 2, 3) ^ c(2, 3, 4)
## [1] 1 8 81
c(1, 2, 3, 14) %% 2
## [1] 1 0 1 0

Although vectors can have names, arithmetic operations do not match elements by name; they still work by position. Only the names of the vector on the left-hand side remain, and the names of the vector on the right-hand side are ignored:

c(a = 1, b = 2, c = 3) + c(b = 2, c = 3, d = 4)
## a b c
## 3 5 7
c(a = 1, b = 2, 3) + c(b = 2, c = 3, d = 4)
## a b
## 3 5 7
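Two consequences of these rules are sketched below: recycling with lengths that do not divide evenly, and a manual way to align two named vectors before adding them. The names result, p, q, and aligned are illustrative, and suppressWarnings() is used here only to keep the output short:

```r
# Lengths 3 and 2: R still recycles, but warns that 3 is not a multiple of 2
result <- suppressWarnings(c(1, 2, 3) + c(10, 20))
result
## [1] 11 22 13

# To add by name rather than by position, reorder one vector first
p <- c(a = 1, b = 2, c = 3)
q <- c(c = 30, a = 10, b = 20)
aligned <- p + q[names(p)]
aligned
## a b c
## 11 22 33
```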

We saw some basic behaviors of numeric vectors, logical vectors, and character vectors. They are the most commonly used data structures and are the building blocks of a wide variety of other useful objects. One of them is the matrix, which is used intensively in the formulation of statistical and econometric theories, and it is very useful in representing two-dimensional data and solving linear systems. In the next chapter, we will see how we can create a matrix in R and how it is deeply rooted in vectors.
