Chapter 10. Packages

R is not limited to the code provided by the R Core Team. It is very much a community effort, and there are thousands of add-on packages available to extend it. The majority of R packages are currently hosted in an online repository called CRAN (the Comprehensive R Archive Network[28]), which is maintained by the R Core Team. Installing and using these add-on packages is an important part of the R experience.

We’ve just seen the plyr package for advanced looping. Throughout the rest of the book, we’ll see many more common packages: lubridate for date and time manipulation, xlsx for importing Excel files, reshape2 for manipulating the shape of data frames, ggplot2 for plotting, and dozens of others.[29]

Chapter Goals

After reading this chapter, you should:

  • Be able to load packages that are installed on your machine
  • Know how to install new packages from local files and via the Internet
  • Understand how to manage the packages on your machine

Loading Packages

To load a package that is already installed on your machine, you call the library function. It is widely agreed that calling this function library was a mistake, and that calling it load_package would have saved a lot of confusion, but the function has existed long enough that it is too late to change it now. To clarify the terminology, a package is a collection of R functions and datasets, and a library is a folder on your machine that stores the files for one or more packages.[30]

If you have a standard version of R—that is, you haven’t built some custom version from the source code—the lattice package should be installed, but it won’t automatically be loaded. We can load it with the library function:

library(lattice)

We can now use all the functions provided by lattice. For example, Figure 10-1 displays a fancy dot plot of the famous Immer’s barley dataset:

dotplot(
  variety ~ yield | site,
  data   = barley,
  groups = year
)

Note

The lattice package is covered in detail in Chapter 14.

Notice that the name of the package is passed to library without being enclosed in quotes. If you want to programmatically pass the name of the package to library, then you can set the argument character.only = TRUE. This is mildly useful if you have a lot of packages to load:

pkgs <- c("lattice", "utils", "rpart")
for(pkg in pkgs)
{
  library(pkg, character.only = TRUE)
}

Figure 10-1. A dot plot of Immer’s barley data using lattice

If you use library to try to load a package that isn’t installed, then it will throw an error. Sometimes you might want to handle the situation differently, in which case the require function provides an alternative. Like library, require loads a package, but rather than throwing an error it returns TRUE or FALSE, depending upon whether or not the package was successfully loaded:

if(!require(apackagethatmightnotbeinstalled))
{
  warning("The package 'apackagethatmightnotbeinstalled' is not available.")
  #perhaps try to download it
  #...
}
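
As a minimal sketch of that pattern, the helper below wraps require so that a missing package produces a message and a FALSE return value rather than an error. The name load_or_report is hypothetical and not part of R. Because the package name is passed as a string, the helper relies on character.only = TRUE.

```r
# Hypothetical helper: try to attach a package given its name as a string.
# Returns TRUE if the package was loaded and FALSE otherwise.
load_or_report <- function(pkg)
{
  ok <- suppressWarnings(
    require(pkg, character.only = TRUE, quietly = TRUE)
  )
  if(!ok)
  {
    message("The package '", pkg, "' is not available.")
  }
  invisible(ok)
}

# stats ships with R, so this should succeed
loaded_stats   <- load_or_report("stats")
# this package is assumed not to exist, so this should fail gracefully
loaded_missing <- load_or_report("apackagethatmightnotbeinstalled")
```

Returning the result invisibly keeps the console quiet while still letting calling code branch on whether the package is available.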

The Search Path

You can see the packages that are loaded with the search function:

search()
## [1] ".GlobalEnv"        "package:stats"     "package:graphics"
## [4] "package:grDevices" "package:utils"     "package:datasets"
## [7] "package:methods"   "Autoloads"         "package:base"

This list shows the order of places that R will look to try to find a variable. The global environment always comes first, followed by the most recently loaded packages. The last two values are always a special environment called Autoloads, and the base package. If you define a variable called var in the global environment, R will find that before it finds the usual variance function in the stats package, because the global environment comes first in the search list. If you attach any environments (see Chapter 6) with the attach function, they will also appear on the search path.
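
The following sketch illustrates the lookup order. It attaches splines (chosen only because it ships with R) and defines a throwaway function named var to show masking. The names masked_result and stats_result are made up for the example.

```r
# The global environment is first and base is last on the search path
search()[1]
tail(search(), 1)

# Attaching a package that ships with R adds it to the search path
library(splines)
"package:splines" %in% search()

# A function named var defined here masks the one in stats...
var <- function(x) "my own var"
masked_result <- var(c(1, 2, 3, 4))

# ...but the :: operator always reaches the package's version
stats_result <- stats::var(c(1, 2, 3, 4))

# Remove the masking definition to restore normal behavior
rm(var)
```

Using the package::name form is a simple way to sidestep masking when two places on the search path define the same name.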

Libraries and Installed Packages

The function installed.packages returns a character matrix, with one row per package, containing information about all the packages that R knows about on your machine. If you’ve been using R for a while, this can easily be several hundred packages, so it is often best to view the results away from the console:

View(installed.packages())

installed.packages gives you information about which version of each package is installed, where it lives on your hard drive, and which other packages it depends upon, amongst other things. The LibPath column that provides the file location of the package tells you the library that contains the package. At this point, you may be wondering how R decides which folders are considered libraries.
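
As a small illustration, the snippet below picks out a few of those columns. The choice of the stats package is an assumption made for the example; any installed package would do.

```r
pkgs <- installed.packages()

# Version, library location, and dependencies of the first few packages
head(pkgs[, c("Package", "Version", "LibPath", "Depends")])

# Details for a single package (stats ships with every R install)
stats_info <- pkgs[
  pkgs[, "Package"] == "stats",
  c("Version", "LibPath"),
  drop = FALSE
]
stats_info
```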

Note

The following explanation is a little bit technical, so don’t worry about remembering the minutiae of how R finds its packages. This information can save you administration effort when you choose to upgrade R, or when you have problems with packages loading, but it isn’t required for day-to-day use of R.

The packages that come with the R install (base, stats, and nearly 30 others) are stored in the library subdirectory of wherever you installed R. You can retrieve the location of this with:

R.home("library")   #or
## [1] "C:/PROGRA~1/R/R-devel/library"
.Library
## [1] "C:/PROGRA~1/R/R-devel/library"

You also get a user library for installing packages that will only be accessible by you. (This is useful if you install on a family PC and don’t want your six-year-old to update packages and break compatibility in your code.) The location is OS dependent. Under Windows, for R version x.y.z, it is in the R/win-library/x.y subfolder of the home directory, where the home directory can be found via:

path.expand("~")    #or
## [1] "C:\\Users\\richie\\Documents"
Sys.getenv("HOME")
## [1] "C:\\Users\\richie\\Documents"

Under Linux, the folder is similarly located in the R/R.version$platform-library/x.y subfolder of the home directory. R.version$platform will typically return a string like “i686-pc-linux-gnu,” and the home directory is found in the same way as under Windows. Under Mac OS X, it is found in Library/R/x.y/library.
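
The pieces of those paths can be inspected from within R. The following sketch builds the x.y version string and prints the platform. It also reads R_LIBS_USER, the environment variable in which R records the user library location; this may be empty on some setups, and the variable name xy is made up for the example.

```r
# Build the "x.y" part of the path from the running version of R
xy <- paste(
  R.version$major,
  sub("\\..*$", "", R.version$minor),
  sep = "."
)
xy

# The platform string used in the Linux folder name
R.version$platform

# The user library location that R has been configured with, if any
Sys.getenv("R_LIBS_USER")
```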

One problem with the default setup of library locations is that when you upgrade R, you need to reinstall all your packages. This is the safest behavior, since different versions of R will often need different versions of packages. In practice, on a development machine the convenience of not having to reinstall packages often outweighs versioning worries.[31] To make life easier for yourself, it’s a very good idea to create your own library that can be used by all versions of R. The simplest way of doing this is to define an environment variable named R_LIBS that contains a path[32] to your desired library location. Although you can define environment variables programmatically with R, they are only available to R, and only for the rest of the session—define them from within your operating system instead.

You can see a character vector of all the libraries that R knows about using the .libPaths function:

.libPaths()
## [1] "D:/R/library"
## [2] "C:/Program Files/R/R-devel/library"

The first value in this vector is the most important, as this is where packages will be installed by default.
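
For experimentation, a library can also be added for the current session only, by passing a vector of paths to .libPaths. This is a sketch using a temporary folder (the folder name mylib is made up for the example). Unlike setting R_LIBS in the operating system, the change is lost when R is closed.

```r
# Create a throwaway folder to act as a library
my_lib <- file.path(tempdir(), "mylib")
dir.create(my_lib, showWarnings = FALSE)

# Put it at the front of the library paths for this session
old_paths <- .libPaths()
.libPaths(c(my_lib, old_paths))

# The new folder is now the default install location
.libPaths()[1]
```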

Installing Packages

Factory-fresh installs of R are set up to access the CRAN package repository (via a mirror—you’ll be prompted to pick the one nearest to you), and CRANextra if you are running Windows. CRANextra contains a handful of packages that need special attention to build under Windows, and cannot be hosted on the usual CRAN servers. To access additional repositories, type setRepositories() and select the repositories that you want. Figure 10-2 shows the available options.

Figure 10-2. List of available package repositories

Bioconductor contains packages related to genomics and molecular biology, while R-Forge and RForge.net mostly contain development versions of packages that eventually appear on CRAN. You can see information about all the packages that are available in the repositories that you have set using available.packages (be warned—there are thousands, so this takes several seconds to run):

View(available.packages())

As well as these repositories, there are many R packages in online repositories such as GitHub, Bitbucket, and Google Code. Retrieving packages from GitHub is particularly easy, as discussed below.

Many IDEs have a point-and-click method of installing packages. In R GUI, the Packages menu has the option “Install package(s)…” to install from a repository and “Install package(s) from local zip files…” to install packages that you downloaded earlier. Figure 10-3 shows the R GUI menu.

You can also install packages using the install.packages function. Calling it without any arguments gives you the same GUI interface as if you’d clicked the “Install package(s)…” menu option. Usually, you would want to specify the names of the packages that you want to download and the URL of the repository to retrieve them from. A list of URLs for CRAN mirrors is available on the main CRAN site.

Figure 10-3. Installing packages in R GUI

This command will (try to) download the time-series analysis packages xts and zoo and all the dependencies for each, and then install them into the default library location (the first value returned by .libPaths):

install.packages(
  c("xts", "zoo"),
  repos = "http://www.stats.bris.ac.uk/R/"
)

To install to a different location, you can pass the lib argument to install.packages:

install.packages(
  c("xts", "zoo"),
  lib   = "some/other/folder/to/install/to",
  repos = "http://www.stats.bris.ac.uk/R/"
)
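
When a script depends on several packages, it can be convenient to install only those that are missing. The function below is a sketch of that idea rather than a standard part of R: install_if_missing is a hypothetical name, and the default repository URL is simply the mirror used in the examples above.

```r
# Hypothetical helper: install only the packages that are not yet installed.
# Returns (invisibly) the names of the packages it tried to install.
install_if_missing <- function(pkgs,
  repos = "http://www.stats.bris.ac.uk/R/",
  lib   = .libPaths()[1])
{
  missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))
  if(length(missing_pkgs) > 0)
  {
    install.packages(missing_pkgs, lib = lib, repos = repos)
  }
  invisible(missing_pkgs)
}

# stats and utils ship with R, so nothing is downloaded here
attempted <- install_if_missing(c("stats", "utils"))
```

Checking against installed.packages first avoids repeated downloads when the same script is run many times.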

Obviously, you need a working Internet connection for R to be able to download packages, and you need sufficient permissions to be able to write files to the library folder. Inside corporate networks, R’s access to the Internet may be restricted. Under Windows, you can get R to use internet2.dll to access the Internet, making it appear as though it is Internet Explorer and often bypassing restrictions. To achieve this, type:

setInternet2()

If all else fails, you can visit http://<cran mirror>/web/packages/available_packages_by_name.html and manually download the packages that you want (remember to download all the dependencies too), then install the resultant tar.gz/tgz/zip file:

install.packages(
  "path/to/downloaded/file/xts_0.8-8.tar.gz",
  repos = NULL,       #NULL repo means "package already downloaded"
  type = "source"     #this means "build the package now"
)

install.packages(
  "path/to/downloaded/file/xts_0.8-8.zip",
  repos = NULL,       #still need this
  type = "win.binary" #Windows only!
)

To install a package directly from GitHub, you first need to install the devtools package:

install.packages("devtools")

The install_github function accepts the location of the package in the form "username/repository", where the username is that of the user who maintains the GitHub repository, and the repository name is usually the same as the name of the package itself. (Older versions of devtools took the repository and the username as two separate arguments.) For example, to get the development version of the reporting package knitr, type:

library(devtools)
install_github("yihui/knitr")

Maintaining Packages

After your packages are installed, you will usually want to update them in order to keep up with the latest versions. This is done with update.packages. By default, this function will prompt you before updating each package. This can become unwieldy after a while (having several hundred packages installed is not uncommon), so setting ask = FALSE is recommended:

update.packages(ask = FALSE)
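
Before or after updating, it can be useful to check which version of a package you currently have. A minimal sketch using packageVersion from the utils package follows; the stats package and the version number compared against are arbitrary choices for the example.

```r
# Version of an installed package, as an object that can be compared
stats_version <- packageVersion("stats")
stats_version

# Comparisons against a version string work as expected
is_recent_enough <- stats_version >= "2.0.0"
is_recent_enough
```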

Very occasionally, you may want to delete a package. It is possible to do this by simply deleting the folder containing the package contents from your filesystem, or you can do it programmatically:

remove.packages("zoo")

Summary

  • There are thousands of R packages available from online repositories.
  • You can install these packages with install.packages, and load them with library or require.
  • When you load packages, they are added to the search path, which lets R find their variables.
  • You can view the installed packages with installed.packages, keep them up-to-date with update.packages, and clean your system with remove.packages.

Test Your Knowledge: Quiz

Question 10-1
What are the names of some R package repositories?
Question 10-2
What is the difference between the library and require functions?
Question 10-3
What is a package library?
Question 10-4
How might you find the locations of package libraries on your machine?
Question 10-5
How do you get R to pretend that it is Internet Explorer?

Test Your Knowledge: Exercises

Exercise 10-1
Using R GUI, install the Hmisc package. [10]
Exercise 10-2
Using the install.packages function, install the lubridate package. [10]
Exercise 10-3
Count the number of packages that are installed on your machine in each library. [5]


[28] Named after CPAN, the Comprehensive Perl Archive Network.

[29] You can find a list of the most popular packages on the R-statistics blog.

[30] Be warned that some people on the R-help mailing list consider confusing the two pieces of terminology a capital offense.

[31] If you are deploying R code as part of an application, then the situation is different. Compatibility trumps convenience in that case.

[32] You can have multiple locations, separated by semicolons on Windows or by colons on Unix-like systems, but this is usually overkill.
