Classes in depth

R has the following six fundamental or atomic classes:

  • Character: When assigning a character value to a variable, the corresponding string must be quoted.
  • Numeric: Decimal numbers.
  • Integer: Whole numbers (written with an L suffix, as in 5L; a whole number typed without the suffix is stored as numeric).
  • Complex: Complex numbers.
  • Logical: TRUE/FALSE values.
  • Raw: Intended to hold raw bytes, as described in the R help pages. This class is very rarely used.

All other classes in R are built from combinations of these six. The most common ones are listed in the following sections.
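As a quick check of the list above, the following sketch creates one value of each atomic class and asks R for its class with class(). The variable names (values, classes) are made up for this illustration.

```r
# One example value for each of the six atomic classes
values <- list(
  character = "some text",
  numeric   = 3.14,
  integer   = 42L,          # the L suffix requests an integer
  complex   = 1 + 2i,
  logical   = TRUE,
  raw       = as.raw(255)
)

# class() reports the class of each value
classes <- sapply(values, class)
print(classes)

# A whole number typed without the L suffix is stored as numeric
class(42)
```

Each value should report the class named on the left-hand side of the list.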

Vectors

Vectors are objects that contain elements of only one atomic class. The type of the vector is the same as that of the elements it contains (for example, a numeric vector or a character vector). Keep in mind that, as with variables, if a value that does not match the vector's type is added, R does not throw an error; instead, it coerces the whole vector to a type that can hold all of its values:

> aaa <- numeric(length=5)
> aaa[1] <- 6
> aaa[2] <- 2
> class(aaa)
[1] "numeric"

> aaa[3] <- "a string"
> class(aaa)
[1] "character"

As shown in the preceding example, R coerces the whole vector to a class that can represent every element. As a consequence, the first two elements are now strings and can no longer be used as numbers:

> aaa[1] - aaa[2]
Error in aaa[1] - aaa[2] : non-numeric argument to binary operator

Note

The c() function stands for combine and is usually the most convenient way of generating vectors in R; it generates a vector of the least general class that can support all the input values. For example, if at least one character string is passed (and the other elements are, for instance, numbers), the vector will be of the character class and the numbers will all be treated as strings.
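The coercion performed by c() can be observed directly. The following is a minimal sketch; the variable names are hypothetical and chosen only for this example.

```r
# Mixing classes in c(): the result takes the least general class
# that can still represent every input value
mixed.logical <- c(TRUE, 2L)        # logical + integer
mixed.numeric <- c(2L, 3.5)         # integer + numeric
mixed.char    <- c(1, TRUE, "a")    # any character input wins

class(mixed.logical)
class(mixed.numeric)
class(mixed.char)
print(mixed.char)

# Numbers stored as text must be converted back before doing arithmetic
as.numeric(mixed.char[1]) + 1
```

The last line shows one way to recover a number from a character vector with as.numeric() when arithmetic is needed.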

Lists

Lists are vectors that can hold objects of any class, atomic or not. It is very common for lists to contain other lists. Unlike the elements of an atomic vector, the elements of a list preserve their original class:

> aaa <- list()
> aaa[1] <- 4
> aaa[2] <- 5
> aaa[3] <- "a string"
> aaa
[[1]]
[1] 4

[[2]]
[1] 5

[[3]]
[1] "a string"

> aaa[[2]] - aaa[[1]]
[1] 1

Although the selection of elements in lists is covered in detail in the Selecting elements over lists section of this chapter, it is worth mentioning that double brackets are needed to access an element of a list by index (the first element, the second element, and so on). Single brackets return a list containing the selected elements rather than the elements themselves.
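The difference between single and double brackets can be illustrated with a short sketch. The names items, single, double, and nested are assumptions made for this example only.

```r
items <- list(4, 5, "a string")

single <- items[2]     # single brackets: a list of length 1
double <- items[[2]]   # double brackets: the element itself

class(single)
class(double)

# Arithmetic works on the extracted element, not on the sublist
double - items[[1]]

# Lists can hold other lists
nested <- list(inner = list(1, "two"), flag = TRUE)
class(nested[["inner"]])
nested[["inner"]][[2]]
```

Trying items[2] - items[1] instead would fail, because both operands are lists.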

Matrices and arrays

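Matrices and arrays are special types of vectors: they are vectors with a dimension attribute. This can be easily tested as follows. The output shown in this section is from a version of R earlier than 4.0.0; from R 4.0.0 onward, class() returns both "matrix" and "array" for a matrix: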

> numeric.vector <- 1:20
> attr(numeric.vector, "dim") <- c(10,2)
> class(numeric.vector)
[1] "matrix"

A matrix is an array with exactly two dimensions: rows and columns. It can also be created directly with the matrix() function:

> numeric.vector <- 1:20
> numeric.vector <- matrix(numeric.vector,10,2)
> class(numeric.vector)
[1] "matrix"

Like vectors, matrices and arrays can only contain elements of the same type. As already explained, attr(object, "dim") is equivalent to dim(object).
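The equivalence between attr(object, "dim") and dim(object), and the difference between a matrix and a higher-dimensional array, can be sketched as follows. The variable names are hypothetical, and is.matrix() and is.array() are used instead of class() so that the checks do not depend on the R version.

```r
values <- 1:24

# Setting the attribute directly and using dim<- give the same result
a <- values
attr(a, "dim") <- c(12, 2)
b <- values
dim(b) <- c(12, 2)
identical(a, b)

# Three dimensions produce an array that is not a matrix
arr <- array(values, dim = c(2, 3, 4))
dim(arr)
is.matrix(arr)
is.array(arr)

# A two-dimensional object is both a matrix and an array
is.matrix(a)
is.array(a)
```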

Data frames

A data frame is a special type of list in which all the elements have the same length. For this reason, it is presented as an object consisting of n observations of m variables. This resembles a two-dimensional matrix structure, with the difference that, because it is a list, its columns can be of different classes. By default, all the functions that read table structures (see the upcoming Reading data section) return a data frame.
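The list nature of a data frame can be checked with a small sketch. The data frame pets and its columns are invented for this illustration, and stringsAsFactors is set explicitly because its default differs across R versions.

```r
# A small data frame with one character column and one numeric column
pets <- data.frame(
  name   = c("Rex", "Tom", "Star"),
  weight = c(30.5, 4.2, 450),
  stringsAsFactors = FALSE
)

is.list(pets)          # a data frame is a list
class(pets)
sapply(pets, class)    # each column keeps its own class
sapply(pets, length)   # all columns have the same length
dim(pets)              # observations (rows) and variables (columns)
```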

Additional characteristics of data frame objects can be found at http://cran.r-project.org/doc/manuals/r-release/R-intro.html#Data-frames. For further information about indexing data frames, see the Selecting elements over data frames section of this chapter.

Factors

A factor is a special class designed for categorical variables. It can be thought of as a numeric code that stands for a character label. Technically, it consists of a vector of integers and a set of corresponding levels (the labels). In R versions before 4.0.0, strings are coerced to factors by default when data frames are created (the stringsAsFactors argument); since R 4.0.0, the default is to keep them as character. For this reason, it is important to have a clear understanding of how factors are structured.

In the following example, a character vector named animals is created and coerced to a factor. Every distinct value becomes a level, and each element is replaced by the integer code of its level, starting from 1. Levels are sorted alphabetically by default, so here the three levels are cat (1), dog (2), and horse (3). When the variable is printed, R shows the labels of the elements followed by the full set of levels:

> animals <- c("dog","cat","dog","horse")
> animals <- as.factor(animals)
> animals
[1] dog   cat   dog   horse
Levels: cat dog horse

However, if animals is passed to cat(), the codes are displayed as follows:

> cat(animals)
2 1 2 3

When a factor is coerced to character, the result is a character vector of the labels; when it is coerced to numeric, the result is the vector of codes:

> as.character(animals)
[1] "dog"   "cat"   "dog"   "horse"
> as.numeric(animals)
[1] 2 1 2 3
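
The relationship between codes and levels has a practical consequence when the labels themselves look like numbers. The following sketch shows it; the sizes factor is a made-up example.

```r
animals <- factor(c("dog", "cat", "dog", "horse"))

levels(animals)       # the labels, sorted alphabetically by default
as.integer(animals)   # integer codes that index into levels()

# The labels can be rebuilt from the codes
levels(animals)[as.integer(animals)]

# Pitfall: a factor whose labels look like numbers
sizes <- factor(c("10", "20", "10"))
as.numeric(sizes)                  # returns the codes, not the values
as.numeric(as.character(sizes))    # converts the labels to numbers
```

Converting to character first is one common way to recover the original numeric values from such a factor.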