R is not limited to the code provided by the R Core Team. It is very much a community effort, and there are thousands of add-on packages available to extend it. The majority of R packages are currently hosted in an online repository called CRAN (the Comprehensive R Archive Network[28]), which is maintained by the R Core Team. Installing and using these add-on packages is an important part of the R experience.
We’ve just seen the plyr package for advanced looping. Throughout the rest of the book, we’ll see many more common packages: lubridate for date and time manipulation, xlsx for importing Excel files, reshape2 for manipulating the shape of data frames, ggplot2 for plotting, and dozens of others.[29]
After reading this chapter, you should:
To load a package that is already installed on your machine, you call the library function. It is widely agreed that calling this function library was a mistake, and that calling it load_package would have saved a lot of confusion, but the function has existed long enough that it is too late to change it now. To clarify the terminology, a package is a collection of R functions and datasets, and a library is a folder on your machine that stores the files for a package.[30]
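To make the distinction concrete, here is a minimal sketch using the base function find.package. The stats package is used only because it ships with every R installation, and the variable names pkg_dir and lib_dir are illustrative.

```r
# Folder holding the files of one package (here, stats).
pkg_dir <- find.package("stats")

# The library is the folder that contains that package folder.
lib_dir <- dirname(pkg_dir)

pkg_dir
lib_dir
```

The first path ends in the package name; the second is the library that stores it.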
If you have a standard version of R (that is, you haven’t built some custom version from the source code), the lattice package should be installed, but it won’t automatically be loaded. We can load it with the library function:
library(lattice)
We can now use all the functions provided by lattice. For example, Figure 10-1 displays a fancy dot plot of the famous Immer’s barley dataset:
dotplot(
  variety ~ yield | site,
  data   = barley,
  groups = year
)
The lattice package is covered in detail in Chapter 14.
Notice that the name of the package is passed to library without being enclosed in quotes. If you want to programmatically pass the name of the package to library, then you can set the argument character.only = TRUE. This is mildly useful if you have a lot of packages to load:
pkgs <- c("lattice", "utils", "rpart")
for(pkg in pkgs)
{
  library(pkg, character.only = TRUE)
}
If you use library to try to load a package that isn’t installed, then it will throw an error. Sometimes you might want to handle the situation differently, in which case the require function provides an alternative. Like library, require loads a package, but rather than throwing an error it returns TRUE or FALSE, depending upon whether or not the package was successfully loaded:
if(!require(apackagethatmightnotbeinstalled))
{
  warning("The package 'apackagethatmightnotbeinstalled' is not available.")
  #perhaps try to download it
  #...
}
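Because require returns a logical value, it combines naturally with character.only = TRUE when you want to try several packages and see which ones failed. The following is a sketch only; the helper name try_load is hypothetical, and stats stands in for any package you expect to be present.

```r
# Try to load each package by name; returns a named logical vector.
try_load <- function(pkgs) {
  vapply(
    pkgs,
    function(pkg) {
      # Suppress the "there is no package called ..." warning;
      # the FALSE return value already tells us what happened.
      suppressWarnings(require(pkg, character.only = TRUE, quietly = TRUE))
    },
    logical(1)
  )
}

loaded <- try_load(c("stats", "apackagethatmightnotbeinstalled"))

# Names of the packages that could not be loaded.
not_loaded <- names(loaded)[!loaded]
not_loaded
```

Collecting the failures in one vector lets you decide afterwards whether to warn, stop, or attempt an install.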
You can see the packages that are loaded with the search function:
search()
## [1] ".GlobalEnv"        "package:stats"     "package:graphics"
## [4] "package:grDevices" "package:utils"     "package:datasets"
## [7] "package:methods"   "Autoloads"         "package:base"
This list shows the order of places that R will look to try to find a variable. The global environment always comes first, followed by the most recently loaded packages. The last two values are always a special environment called Autoloads, and the base package. If you define a variable called var in the global environment, R will find that before it finds the usual variance function in the stats package, because the global environment comes first in the search list. Environments that you create (see Chapter 6) also appear on the search path once you attach them with the attach function.
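The masking behavior described here can be seen in a short sketch. It relies on two standard tools that the text has not introduced: the find function from utils, which reports where on the search path a name is defined, and the :: operator, which retrieves a function from a named package regardless of masking. The variable names are illustrative.

```r
# Define a variable with the same name as the variance function.
var <- "not the variance function"

# Every place on the search path where something called var exists.
where_found <- find("var")
where_found

# The package-qualified name still reaches the real function.
true_variance <- stats::var(c(1, 2, 3, 4))
true_variance

# Remove our variable so that var refers to stats::var again.
rm(var)
```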
The function installed.packages returns a matrix with information about all the packages that R knows about on your machine. If you’ve been using R for a while, this can easily be several hundred packages, so it is often best to view the results away from the console:
View(installed.packages())
installed.packages gives you information about which version of each package is installed, where it lives on your hard drive, and which other packages it depends upon, amongst other things. The LibPath column that provides the file location of the package tells you the library that contains the package. At this point, you may be wondering how R decides which folders are considered libraries.
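Since the full table is wide, it can help to pull out just a few columns. The sketch below assumes the column names Package, Version, and LibPath, which are the names installed.packages uses, and looks up the stats package only because it is always present.

```r
pkg_info <- installed.packages()

# Name, version, and containing library for the first few packages.
head(pkg_info[, c("Package", "Version", "LibPath")])

# Rows are named by package, so a single value can be looked up directly.
stats_version <- pkg_info["stats", "Version"]
stats_version
```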
The following explanation is a little bit technical, so don’t worry about remembering the minutiae of how R finds its packages. This information can save you administration effort when you choose to upgrade R, or when you have problems with packages loading, but it isn’t required for day-to-day use of R.
The packages that come with the R install (base, stats, and nearly 30 others) are stored in the library subdirectory of wherever you installed R. You can retrieve the location of this with:
R.home("library")      #or
## [1] "C:/PROGRA~1/R/R-devel/library"
.Library
## [1] "C:/PROGRA~1/R/R-devel/library"
You also get a user library for installing packages that will only be accessible by you. (This is useful if you install on a family PC and don’t want your six-year-old to update packages and break compatibility in your code.) The location is OS dependent. Under Windows, for R version x.y.z, it is in the R/win-library/x.y subfolder of the home directory, where the home directory can be found via:
path.expand("~")       #or
## [1] "C:\\Users\\richie\\Documents"
Sys.getenv("HOME")
## [1] "C:\\Users\\richie\\Documents"
Under Linux, the folder is similarly located in the R/R.version$platform-library/x.y subfolder of the home directory. R.version$platform will typically return a string like “i686-pc-linux-gnu,” and the home directory is found in the same way as under Windows. Under Mac OS X, it is found in Library/R/x.y/library.
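Rather than assembling the path by hand, you can usually ask R for it. The sketch below assumes that your session has the R_LIBS_USER environment variable set, which R normally does at startup. The default location differs between R versions and operating systems, so the value you see may not match the layout described above.

```r
# Platform string that appears in the Linux user library path.
platform <- R.version$platform
platform

# User library location as R sees it (an empty string if it is unset).
user_lib <- Sys.getenv("R_LIBS_USER")
user_lib
```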
One problem with the default setup of library locations is that when you upgrade R, you need to reinstall all your packages. This is the safest behavior, since different versions of R will often need different versions of packages. In practice, on a development machine the convenience of not having to reinstall packages often outweighs versioning worries.[31] To make life easier for yourself, it’s a very good idea to create your own library that can be used by all versions of R. The simplest way of doing this is to define an environment variable named R_LIBS that contains a path[32] to your desired library location. Although you can define environment variables programmatically with R, they are only available to R, and only for the rest of the session, so define them from within your operating system instead.
You can see a character vector of all the libraries that R knows about using the .libPaths function:
.libPaths()
## [1] "D:/R/library"
## [2] "C:/Program Files/R/R-devel/library"
The first value in this vector is the most important, as this is where packages will be installed by default.
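.libPaths can also set the library locations for the current session when it is given a character vector. The following sketch puts a throwaway folder at the front of the list and then restores the original setting. The folder name my-r-library is made up for illustration, and a temporary directory is used so that nothing permanent is changed.

```r
# Create a scratch folder to act as a library (illustrative name).
my_lib <- file.path(tempdir(), "my-r-library")
dir.create(my_lib, showWarnings = FALSE)

# Remember the current libraries, then put the new one first.
old_paths <- .libPaths()
.libPaths(c(my_lib, old_paths))

# This is now where install.packages would install by default.
first_path <- .libPaths()[1]
first_path

# Restore the original library locations.
.libPaths(old_paths)
```

Like environment variables set from within R, this change lasts only for the session.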
Factory-fresh installs of R are set up to access the CRAN package repository (via a mirror; you’ll be prompted to pick the one nearest to you), and CRANextra if you are running Windows. CRANextra contains a handful of packages that need special attention to build under Windows, and cannot be hosted on the usual CRAN servers. To access additional repositories, type setRepositories() and select the repositories that you want. Figure 10-2 shows the available options.
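If you would rather choose a CRAN mirror in code than through a dialog, one approach is to change the repos option. This is a sketch rather than the only way to do it; the URL is an assumption about which mirror you want, and the change applies to the current session only.

```r
# Start from whatever repositories are currently configured.
repos <- getOption("repos")

# Point the CRAN entry at a specific mirror, keeping any other entries.
repos["CRAN"] <- "https://cloud.r-project.org"
options(repos = repos)

current_repos <- getOption("repos")
current_repos
```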
Bioconductor contains packages related to genomics and molecular biology, while R-Forge and RForge.net mostly contain development versions of packages that eventually appear on CRAN. You can see information about all the packages that are available in the repositories that you have set using available.packages (be warned: there are thousands, so this takes several seconds to run):
View(available.packages())
As well as these repositories, there are many R packages in online repositories such as GitHub, Bitbucket, and Google Code. Retrieving packages from GitHub is particularly easy, as discussed below.
Many IDEs have a point-and-click method of installing packages. In R GUI, the Packages menu has the option “Install package(s)…” to install from a repository and “Install package(s) from local zip files…” to install packages that you downloaded earlier. Figure 10-3 shows the R GUI menu.
You can also install packages using the install.packages function. Calling it without any arguments gives you the same GUI interface as if you’d clicked the “Install package(s)…” menu option. Usually, you would want to specify the names of the packages that you want to download and the URL of the repository to retrieve them from. A list of URLs for CRAN mirrors is available on the main CRAN site.
This command will (try to) download the time-series analysis packages xts and zoo and all the dependencies for each, and then install them into the default library location (the first value returned by .libPaths):
install.packages(
  c("xts", "zoo"),
  repos = "http://www.stats.bris.ac.uk/R/"
)
To install to a different location, you can pass the lib argument to install.packages:
install.packages(
  c("xts", "zoo"),
  lib   = "some/other/folder/to/install/to",
  repos = "http://www.stats.bris.ac.uk/R/"
)
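A common pattern in scripts is to install only the packages that are not already present, so that nothing is downloaded needlessly. The sketch below is one way to do that; the function name install_if_missing and the default mirror URL are assumptions for illustration, not part of R.

```r
# Install any of pkgs that are not yet in a known library.
# Invisibly returns the names of the packages it tried to install.
install_if_missing <- function(pkgs, repos = "https://cloud.r-project.org") {
  already_installed <- rownames(installed.packages())
  missing_pkgs <- setdiff(pkgs, already_installed)
  if (length(missing_pkgs) > 0) {
    install.packages(missing_pkgs, repos = repos)
  }
  invisible(missing_pkgs)
}

# stats and utils ship with R, so nothing is downloaded here.
attempted <- install_if_missing(c("stats", "utils"))
attempted
```

Calling it with c("xts", "zoo") would download those two packages only if they were absent.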
Obviously, you need a working Internet connection for R to be able to download packages, and you need sufficient permissions to be able to write files to the library folder. Inside corporate networks, R’s access to the Internet may be restricted. Under Windows, you can get R to use internet2.dll to access the Internet, making it appear as though it is Internet Explorer and often bypassing restrictions. To achieve this, type:
setInternet2()
If all else fails, you can visit http://<cran mirror>/web/packages/available_packages_by_name.html and manually download the packages that you want (remember to download all the dependencies too), then install the resultant tar.gz/tgz/zip file:
install.packages(
  "path/to/downloaded/file/xts_0.8-8.tar.gz",
  repos = NULL,     #NULL repo means "package already downloaded"
  type  = "source"  #this means "build the package now"
)
install.packages(
  "path/to/downloaded/file/xts_0.8-8.zip",
  repos = NULL,         #still need this
  type  = "win.binary"  #Windows only!
)
To install a package directly from GitHub, you first need to install the devtools package:
install.packages("devtools")
The install_github function accepts the location of the GitHub repository as a single string of the form "username/repository": the name of the user that maintains the repository, followed by the name of the repository that contains the package (usually the same as the name of the package itself). For example, to get the development version of the reporting package knitr, type:
library(devtools)
install_github("yihui/knitr")
After your packages are installed, you will usually want to update them in order to keep up with the latest versions. This is done with update.packages. By default, this function will prompt you before updating each package. This can become unwieldy after a while (having several hundred packages installed is not uncommon), so setting ask = FALSE is recommended:
update.packages(ask = FALSE)
Very occasionally, you may want to delete a package. It is possible to do this by simply deleting the folder containing the package contents from your filesystem, or you can do it programmatically:
remove.packages("zoo")
install.packages, and load them with library or require.
search path, which lets R find their variables.
installed.packages, keep them up-to-date with update.packages, and clean your system with remove.packages.
library and require functions?
Hmisc package. [10]
install.packages function, install the lubridate package. [10]
[28] Named after CPAN, the Comprehensive Perl Archive Network.
[29] You can find a list of the most popular packages on the R-statistics blog.
[30] Be warned that some people on the R-help mailing list consider confusing the two pieces of terminology a capital offense.
[31] If you are deploying R code as part of an application, then the situation is different. Compatibility trumps convenience in that case.
[32] You can have multiple locations separated by semicolons (under Windows; Unix-like systems use colons), but this is usually overkill.