This chapter covers basic techniques for reorganizing data (cleaning, processing, and so on). These techniques are key when developing web applications with R and Shiny because, unlike traditional commercial web application software, R and Shiny let you perform any operation on your data and, consequently, display it exactly as you imagined. By presenting several useful tools, this chapter will help the reader gain skills in data manipulation.
The chapter is divided into the following seven sections:
plyr package
data.table package
reshape2 package
There are mainly two functions in the base R package (that is, the package that comes by default when installing R) to display ordered elements: sort() and order().
sort(): This function returns the passed vector in increasing order by default, or in decreasing order if decreasing=TRUE is passed:
> vector1 <- c(2,5,3,4,1)
> sort(vector1)
[1] 1 2 3 4 5
If the vector passed is of the character type, the function returns it in alphabetical order, and if it is logical, it first returns the FALSE elements and then the TRUE elements:
> sort(c(TRUE,TRUE,FALSE,FALSE))
[1] FALSE FALSE TRUE TRUE
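The character case can be sketched in the same way. The fruits vector below is made up for illustration, and because the ordering of mixed-case or accented strings depends on the session locale, the example sticks to lowercase ASCII words:

```r
# Hypothetical character vector, used only for illustration
fruits <- c("pear", "apple", "banana")

# Character vectors are returned in alphabetical order
sorted_fruits <- sort(fruits)
print(sorted_fruits)

# Logical vectors: FALSE elements come before TRUE elements
sorted_flags <- sort(c(TRUE, FALSE, TRUE, FALSE))
print(sorted_flags)
```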
order(): This returns the index numbers of the elements, arranged according to their values:
> vector1 <- c(2,5,3,4,1)
> order(vector1)
[1] 5 1 3 4 2
In the preceding example, for the vector1 object, the function returns the index of the smallest element (the fifth) first, then the first, then the third, and so on. For character or logical vectors, the criterion is the same as in sort(). Both functions also accept a decreasing argument that reverses the order:
> sort(vector1,decreasing=TRUE)
[1] 5 4 3 2 1
To obtain the same result as sort(object) with order(), the object can be indexed by the output of its order() call:
> vector1[order(vector1)]
[1] 1 2 3 4 5
As explained in the previous chapter, the elements of a vector can be accessed by index numbers, and as order() returns the index numbers according to their value, indexing a vector by its order() output will result in an ordered vector.
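The equivalence described above can be checked directly. The following is a minimal sketch that reuses vector1; the names idx and by_order are introduced here only for illustration, and identical() is used simply to confirm that both approaches agree:

```r
vector1 <- c(2, 5, 3, 4, 1)

# Positions of the elements, from the smallest value to the largest
idx <- order(vector1)

# Indexing by those positions yields the sorted vector
by_order <- vector1[idx]

# Both approaches give the same result
identical(by_order, sort(vector1))

# The same holds in decreasing order
identical(vector1[order(vector1, decreasing = TRUE)],
          sort(vector1, decreasing = TRUE))
```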
Unlike sort(), order() can handle multiple input vectors, where the ordering criteria are applied in the order in which the vectors are passed. For example, if there is a tie in the ordering by the first criterion, the second vector is used. Here, vector1 is ordered decreasingly and ties are broken by vector2 in increasing order, so the fourth element comes before the third:
> vector1 <- c(2,2,3,3,1)
> vector2 <- c(2,5,4,3,1)
> order(vector1,vector2,decreasing = c(TRUE,FALSE))
[1] 4 3 1 2 5
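Note that, with multiple vectors, a logical vector can be passed to the decreasing argument, with one value per vector being ordered. This per-vector form relies on the radix method, which current versions of R select automatically for short numeric, integer, logical, and factor vectors. As it happens with element selection, if the logical vector is shorter than the number of vectors being ordered, it is recycled, as explained in Chapter 2, First Steps towards Programming in R. A single logical value applies to all the vectors.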
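For numeric keys, a common alternative to a per-vector decreasing argument is to negate the key that should be sorted in decreasing order. The sketch below redefines vector1 and vector2 with the values used in the preceding example and orders by vector1 from largest to smallest, breaking ties by vector2 from smallest to largest. This idiom only works for numeric vectors, and the name idx is introduced here only for illustration:

```r
vector1 <- c(2, 2, 3, 3, 1)
vector2 <- c(2, 5, 4, 3, 1)

# Negating vector1 sorts it from largest to smallest;
# ties are broken by vector2 in increasing order
idx <- order(-vector1, vector2)

# Show both vectors rearranged with the resulting index
cbind(vector1 = vector1[idx], vector2 = vector2[idx])
```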
The order() function is particularly useful for ordering matrices or data frames: the rows are indexed by the output of order() applied to one or more of the columns:
> data(iris)
> names(iris)
[1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
> iris.ordered <- iris[order(iris$Sepal.Length, iris$Sepal.Width),]
After loading the iris data frame object, a new iris.ordered object is created. It is the same dataset as iris but ordered by Sepal.Length and then by Sepal.Width: the order() function returns the corresponding row indexes, which are then applied to the iris dataset. Note that, as it is a data frame, the indexing has a comma that separates rows from columns (in the case of data frames, observations from variables). As there is nothing after the comma, R returns all the variables.
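As a further sketch along the same lines, the row index can be combined with a column selection after the comma, and a numeric column can be negated to sort it in decreasing order. The object names idx and iris.top are hypothetical and used only for illustration:

```r
data(iris)

# Order by Species (increasing) and, within each species,
# by Petal.Length from largest to smallest
idx <- order(iris$Species, -iris$Petal.Length)

# Keep only two variables by naming them after the comma
iris.top <- iris[idx, c("Species", "Petal.Length")]

# First rows of the result
head(iris.top)
```

Because the original row names are kept, the leftmost column of the printed output shows where each observation was located in iris before ordering.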
In conclusion, order() is a very useful function, especially because it is the standard way in base R to order data frames and matrices by one or more columns.
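The same row-indexing technique applies to matrices. The following is a minimal sketch; the matrix m, its column names, and its values are made up for illustration:

```r
# Hypothetical 4 x 2 matrix (filled column by column)
m <- matrix(c(3, 1, 2, 1,
              10, 40, 20, 30),
            ncol = 2,
            dimnames = list(NULL, c("key", "value")))

# Order the rows by the "key" column, breaking ties with "value"
m.ordered <- m[order(m[, "key"], m[, "value"]), ]
m.ordered
```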