Chapter 3. R Packages

Perhaps the biggest reason for R’s phenomenally ascendant popularity is its collection of user-contributed packages. As of mid-September 2013, there were 4,845 packages available on CRAN,¹ written by an estimated 2,000 different people. Odds are good that if a statistical technique exists, it has been written in R and contributed to CRAN. Not only is there an incredibly large number of packages, but many are written by authorities in their fields, such as Andrew Gelman, Trevor Hastie, Dirk Eddelbuettel and Hadley Wickham.

1. http://cran.r-project.org/web/packages/

A package is essentially a library of prewritten code designed to accomplish some task or a collection of tasks. The survival package is used for survival analysis, ggplot2 is used for plotting and sp is for dealing with spatial data.

It is important to remember that not all packages are of the same quality. Some are built to be very robust and are well-maintained, while others are built with good intentions but can fail with unforeseen errors and others still are just plain poor. Even with the best packages, it is important to remember that most were written by statisticians for statisticians, so they may differ from what a computer engineer would expect.

This book will not attempt to provide an exhaustive list of good packages to use because that is constantly changing. However, there are some packages that are so pervasive that they will be used in this book as if they were part of base R. Some of these are ggplot2, reshape2 and plyr by Hadley Wickham; glmnet by Trevor Hastie, Robert Tibshirani and Jerome Friedman; Rcpp by Dirk Eddelbuettel; and knitr by Yihui Xie. We have written a package on CRAN, coefplot, with more to follow.

3.1. Installing Packages

As with many tasks in R, there are multiple ways to install packages. The simplest is to install them using the GUI provided by RStudio and shown in Figure 3.1. Access the Packages pane shown in this figure either by clicking its tab or by pressing Ctrl+7 on the keyboard.


Figure 3.1 RStudio’s Packages pane.

In the upper-left corner, click the Install Packages button to bring up the dialog in Figure 3.2.


Figure 3.2 RStudio’s package installation dialog.

From here simply type the name of a package (RStudio has a nice autocomplete feature for this) and click Install. Multiple packages can be specified, separated by commas. This downloads and installs the desired package, which is then available for use. Selecting the Install dependencies checkbox will automatically download and install all packages that the desired package requires to work. For example, our coefplot package depends on ggplot2, plyr, useful, stringr and reshape2, and each of those may have further dependencies.

An alternative is to type a very simple command into the console:

> install.packages("coefplot")

This will accomplish the same thing as working in the GUI.
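When a script is shared with others, it can be convenient to install only the packages that are not already present. The following is a minimal sketch of that idea; the helper name install_if_missing is hypothetical and not part of base R, and it assumes a CRAN mirror is reachable whenever something actually needs to be installed.

```r
# Hypothetical helper: install only those packages that are not yet available.
install_if_missing <- function(pkgs, ...) {
  # requireNamespace returns TRUE when a package can be loaded, FALSE otherwise
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing_pkgs <- pkgs[!available]
  if (length(missing_pkgs) > 0) {
    # Extra arguments such as dependencies = TRUE are passed along unchanged
    install.packages(missing_pkgs, ...)
  }
  # Return (invisibly) the names that had to be installed
  invisible(missing_pkgs)
}

# stats and utils ship with R, so nothing is downloaded here
install_if_missing(c("stats", "utils"))
```

Calling install_if_missing("coefplot") would fall through to install.packages only on a machine where coefplot is not yet installed.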

There has been a movement recently to install packages directly from GitHub or BitBucket repositories, especially to get the development versions of packages. This can be accomplished using devtools.

> require(devtools)
> install_github("jaredlander/coefplot")

If the package being installed from a repository contains source code for a compiled language—generally C++ or FORTRAN—then the proper compilers must be installed. More information is in Section 24.6.

Sometimes there is a need to install a package from a local file, either a zip of a prebuilt package or a tar.gz of package code. This can be done using the installation dialog mentioned before but switching the Install from: option to Package Archive File as shown in Figure 3.3. Then browse to the file and install. Note that this will not install dependencies, and if they are not present the installation will fail. Be sure to install dependencies first.


Figure 3.3 RStudio’s package installation dialog to install from an archive file.

As before, this can also be accomplished using install.packages.

> install.packages("coefplot_1.1.7.zip")
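To be explicit that the argument is a file rather than the name of a package on CRAN, the repos argument can be set to NULL. The sketch below wraps this in a hypothetical helper, install_local_archive, which is not part of base R. It assumes that a zip is a prebuilt Windows binary and that anything else is a source archive.

```r
# Hypothetical helper: install a package from a local archive file.
install_local_archive <- function(path) {
  if (!file.exists(path)) {
    message("Archive not found: ", path)
    return(invisible(FALSE))
  }
  # Assumption: a zip is a prebuilt Windows binary, anything else is source
  pkg_type <- if (grepl("\\.zip$", path)) "win.binary" else "source"
  # repos = NULL tells install.packages to treat 'path' as a local file
  install.packages(path, repos = NULL, type = pkg_type)
  invisible(TRUE)
}

# Reports that the archive is missing unless the file is in the working directory
install_local_archive("coefplot_1.1.7.zip")
```

As with the dialog, dependencies are not resolved for a local file, so they need to be installed beforehand.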

3.1.1. Uninstalling Packages

In the rare instance when a package needs to be uninstalled, it is easiest to click the white X inside a grey circle on the right of the package description in RStudio’s Packages pane shown in Figure 3.1. Alternatively, this can be done with remove.packages where the first argument is a character vector naming the packages to be removed.
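A short sketch of the remove.packages approach follows. The package name someUnneededPackage is a placeholder, not a real package, and the check against installed.packages is an added precaution rather than something remove.packages requires.

```r
# "someUnneededPackage" is a placeholder name used only for illustration
pkgs_to_remove <- c("someUnneededPackage")

# Keep only the names that are actually installed
installed <- rownames(installed.packages())
to_remove <- intersect(pkgs_to_remove, installed)

if (length(to_remove) > 0) {
  # The first argument is a character vector of package names
  remove.packages(to_remove)
} else {
  message("Nothing to remove.")
}
```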

3.2. Loading Packages

Now that packages are installed they are almost ready to use and just need to be loaded first. There are two commands that can be used, either library or require. They both accomplish the same thing—loading the package—but require will return TRUE if it succeeds and FALSE with a warning if it cannot find the package. This returned value is useful when loading a package from within a function, a practice considered acceptable to some, improper to others. In general usage there is not much of a difference, so it comes down to personal preference. The argument to either function is the name of the desired package, with or without quotes. So loading the coefplot package would look like:

> require(coefplot)

Loading required package: coefplot
Loading required package: ggplot2

It prints out the dependent packages that get loaded as well. This can be suppressed by setting the argument quietly to TRUE.

> require(coefplot, quietly = TRUE)

A package only needs to be loaded when starting a new R session. Once loaded, it remains available until either R is restarted or the package is unloaded, as described in Section 3.2.1.
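The difference between library and require is easiest to see when a package cannot be found. The following sketch uses a deliberately nonexistent package name, notARealPackage123, which is hypothetical, and passes character.only = TRUE so the name can be supplied as a string.

```r
# "notARealPackage123" is a made-up name chosen so that loading fails
pkg <- "notARealPackage123"

# require returns FALSE (with a warning, suppressed here) instead of stopping
ok <- suppressWarnings(require(pkg, character.only = TRUE, quietly = TRUE))

# library raises an error, which is caught here so the script can continue
res <- tryCatch(
  library(pkg, character.only = TRUE),
  error = function(e) "library() raised an error"
)

ok
res
```

Inside a function, the returned value of require can drive an if statement, whereas a failed library call halts execution unless the error is caught.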

An alternative to loading a package through code is to select the checkbox next to the package name in RStudio’s Packages pane, seen on the left of Figure 3.1. This will load the package by running the code just shown.

3.2.1. Unloading Packages

Sometimes a package needs to be unloaded. This is simple enough either by clearing the checkbox in RStudio’s Packages pane or by using the detach function. The function takes the package name preceded by package: all in quotes.

> detach("package:coefplot")
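Whether a package is currently loaded can be checked with search, which lists the attached packages. The sketch below uses splines, a package that ships with R, purely so that it runs without installing anything. It assumes splines is not already attached in the session.

```r
# Attach a package that ships with R
library(splines)
attached_before <- "package:splines" %in% search()

# Detach it again using the "package:" prefix, all in quotes
detach("package:splines")
attached_after <- "package:splines" %in% search()

attached_before   # TRUE
attached_after    # FALSE
```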

It is not uncommon for functions in different packages to have the same name. For example, coefplot is in both arm (by Andrew Gelman) and coefplot.² If both packages are loaded, the function in the package loaded last will be invoked when calling that function. A way around this is to precede the function with the name of the package, separated by two colons (::).

2. This particular instance is because we built coefplot as an improvement on the one available in arm. There are other instances where the names have nothing in common.

> arm::coefplot(object)
> coefplot::coefplot(object)

Not only does this call the appropriate function, it also allows the function to be called without even loading the package beforehand.
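The same operator works for any installed package. As a small illustration that runs without extra installations, the sketch below calls file_ext from the tools package, which ships with R, without loading tools first. The file name is the one used earlier in this chapter.

```r
# Call a function from the tools package without library(tools)
ext <- tools::file_ext("coefplot_1.1.7.zip")
ext   # "zip"
```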

3.3. Building a Package

Building a package is one of the more rewarding parts of working with R, especially sharing that package with the community through CRAN. Chapter 24 discusses this process in detail.

3.4. Conclusion

Packages make up the backbone of the R community and experience. They are often considered what makes working with R so desirable. This is how the community makes its work, and so many of the statistical techniques, available to the world. With such a large number of packages, finding the right one can be overwhelming. CRAN Task Views (http://cran.r-project.org/web/views/) offer a curated listing of packages for different needs. However, the best way to find a new package might just be to ask the community. Appendix A gives some resources for doing just that.
