reshape2

reshape2 is a package built mainly around two functions: melt() and dcast()/acast(). Broadly speaking, melt() turns one wide row into several shorter rows, while dcast() and acast() do the opposite; dcast() returns a data frame, whereas acast() returns a vector, matrix, or array.
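The wide-to-long-and-back idea can be sketched on a tiny table before turning to a real dataset. The data frame below (wide, with its subject, height, and weight columns) is made up purely for illustration and is not part of the original text.

```r
library(reshape2)

# Hypothetical wide table: one row per subject, one column per measurement.
wide <- data.frame(
  subject = c("a", "b"),
  height  = c(1.70, 1.82),
  weight  = c(65, 80),
  stringsAsFactors = FALSE
)

# melt(): every wide row becomes one row per measure variable.
long <- melt(wide, id.vars = "subject")
dim(wide)   # 2 3
dim(long)   # 4 3  (columns: subject, variable, value)

# dcast(): spread the 'variable' column back into one column per measurement.
back <- dcast(long, subject ~ variable, value.var = "value")
names(back) # "subject" "height" "weight"
```

Because each subject/variable pair occurs only once in long, dcast() has nothing to aggregate here and simply restores the original layout.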

The melt() function transforms one row of data into many by stacking a set of variables (the measure variables) against a set of other variables (the id variables). The function is called as follows:

melt(data, id.vars, measure.vars, variable.name = "variable")

The id variables are usually factors or character vectors, while the measure variables are the numeric ones. In fact, this is the default behavior when neither id.vars nor measure.vars is specified. variable.name is the name given to the column that records which measure variable each value came from ("variable" by default):

> library(reshape2)
> data(iris)
> melt(iris)
Using Species as id variables
  Species     variable value
1  setosa Sepal.Length   5.1
2  setosa Sepal.Length   4.9
3  setosa Sepal.Length   4.7
4  setosa Sepal.Length   4.6
5  setosa Sepal.Length   5.0
6  setosa Sepal.Length   5.4
7  setosa Sepal.Length   4.6
8  setosa Sepal.Length   5.0

(The output is truncated here.)

As can be seen, for the iris dataset melt() by default chooses Species as the id variable and all the other columns as measure variables. The output then contains one row for each pairing of an original row's id value with a measure variable, together with the corresponding value. The name of the variable column was left at its default, but it could have been changed.
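For comparison, here is a minimal sketch of the same call with the arguments spelled out rather than guessed. The column names "measurement" and "cm" and the choice of two measure variables are illustrative assumptions, not taken from the original text.

```r
library(reshape2)
data(iris)

# Name the id and measure variables explicitly and rename the two
# generated columns ("measurement" and "cm" are arbitrary labels).
molten <- melt(
  iris,
  id.vars       = "Species",
  measure.vars  = c("Sepal.Length", "Petal.Length"),
  variable.name = "measurement",
  value.name    = "cm"
)

names(molten)               # "Species" "measurement" "cm"
nrow(molten)                # 300: 150 flowers x 2 measure variables
levels(molten$measurement)  # "Sepal.Length" "Petal.Length"
```

Stating id.vars explicitly also suppresses the "Using Species as id variables" message, which can be convenient in scripts.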

As mentioned previously, dcast() does the opposite: it transforms multiple rows into one row. How it combines them depends on the specified aggregating function. Although dcast() accepts many other arguments, its call can be summarized as follows:

dcast(data, formula, fun.aggregate = NULL,
      value.var = guess_value(data))

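formula is a formula object, written much as in aggregate(): the variables on the left-hand side define the rows of the output and those on the right-hand side define its columns. fun.aggregate is a function that reduces a vector of values to a single value, like the functions passed to the apply family; it can be a named function or one defined inline in the call. If it is not specified and aggregation is needed, it defaults to length (that is, a count of the values). value.var is the name of the column from which the values to aggregate are taken. When not specified, the function makes a guess: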

data(iris)
molten.iris <- melt(iris)
## Using Species as id variables
dcast(molten.iris, variable ~ Species, fun.aggregate = sum)
##       variable  setosa  versicolor  virginica
## 1 Sepal.Length  250.3      296.8     329.4
## 2  Sepal.Width  171.4      138.5     148.7
## 3 Petal.Length   73.1      213.0     277.6
## 4  Petal.Width   12.3       66.3     101.3

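The dataset passed to this dcast() call is the molten data, and the aggregating function is sum. The value.var argument is guessed; since the molten data has only one value column (value), the guess is unambiguous. The aggregating function can also be written as an anonymous function: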

data(iris)
molten.iris <- melt(iris)
## Using Species as id variables
dcast(molten.iris, variable ~ Species, fun.aggregate = function(x) sum(x + 2))
##       variable  setosa   versicolor  virginica
## 1 Sepal.Length  350.3      396.8     429.4
## 2  Sepal.Width  271.4      238.5     248.7
## 3 Petal.Length  173.1      313.0     377.6
## 4  Petal.Width  112.3      166.3     201.3

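In this case, the aggregating function is defined inline, as an anonymous function, within the dcast() call. It adds 2 to each of the 50 values in a cell before summing, so every total is 100 higher than in the previous output.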
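Two points mentioned earlier but not yet shown are the default aggregation by length and the acast() variant. The following is a rough sketch on the same molten data; passing value.var = "value" explicitly is a choice made here for clarity, and the object names counts, means, and mat are illustrative.

```r
library(reshape2)
data(iris)

molten.iris <- melt(iris, id.vars = "Species")

# No fun.aggregate: each cell holds 50 values, so dcast() falls back to
# length (and prints a message saying so), giving a table of counts.
counts <- dcast(molten.iris, variable ~ Species, value.var = "value")
counts$setosa   # 50 50 50 50

# A named aggregating function: per-species means as a data frame.
means <- dcast(molten.iris, variable ~ Species,
               fun.aggregate = mean, value.var = "value")

# acast() takes the same arguments but returns a matrix, with the
# row variable stored in the row names instead of a column.
mat <- acast(molten.iris, variable ~ Species,
             fun.aggregate = mean, value.var = "value")
is.matrix(mat)  # TRUE
dim(mat)        # 4 3
```

The matrix form is handy when the result is going straight into numeric code, while the data frame form keeps the variable names as an ordinary column.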
