Chapter 6
Going on a Date with R
In This Chapter
Working with dates in R
Understanding the different ways of representing dates
Throwing time into the mix
Formatting dates and times for pretty printing
Operating on dates and times
All kinds of real-world data are associated with a specific date or instant in time. Companies report results each quarter. Stock markets report closing prices daily. Network analysts measure traffic by the hour (if not by the minute). And of course, scientists measure air temperature, sometimes by the minute, sometimes by the day, and have done so for decades.
Dealing with dates accurately can be a complicated task. You have to account for time-zone differences, leap years, and regional differences in holidays. In addition, people report data differently in different places. For example, what an American would write as “May 12, 2010” or “05-12-10” would be written by someone from the United Kingdom as “12 May 2010” or “12-05-10.” Working with a time instant on a specific day isn’t any easier. The same time may be written as 9:25 p.m., 21:25, or 21h25 — not to mention time zones!
In this chapter, we look at the different ways of representing dates and times using R. You take control of the format of dates and times for pretty printing. Then you do some math with dates: addition and subtraction. Finally, you use some tricks to extract specific elements, such as the month, from a date.
Working with Dates
R has a range of functions that allow you to work with dates and times. The easiest way of creating a date is to use the as.Date() function. For example, you write the opening day of the 2012 London Olympic Games as:
> xd <- as.Date("2012-07-27")
> xd
[1] "2012-07-27"
> str(xd)
Date[1:1], format: "2012-07-27"
To find out what day of the week this is, use weekdays():
> weekdays(xd)
[1] "Friday"
You can add or subtract numbers from dates to create new dates. For example, to calculate the date that is seven days in the future, use the following:
> xd + 7
[1] "2012-08-03"
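Subtraction works the same way, and subtracting one Date from another gives the number of days between them. The sketch below assumes a closing date of August 12, 2012, which is not stated above and serves only as an illustration:

```r
# Opening day, as in the example above
xd <- as.Date("2012-07-27")

# Subtract days to step back in time
week.before <- xd - 7
week.before

# Assumed closing date, used here only for illustration
closing <- as.Date("2012-08-12")

# Subtracting two Date objects gives a time difference in days
games.length <- closing - xd
as.numeric(games.length)   # 16
```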
In the same way as with numbers or text, you can put multiple dates into a vector. To create a vector of seven days starting on July 27, add 0:6 to the starting date. (Remember: The colon operator generates integer sequences.)
> xd + 0:6
[1] "2012-07-27" "2012-07-28" "2012-07-29" "2012-07-30"
[5] "2012-07-31" "2012-08-01" "2012-08-02"
Because the weekdays() function takes vector input, it returns the days of the week for this sequence:
> weekdays(xd + 0:6)
[1] "Friday" "Saturday" "Sunday" "Monday"
[5] "Tuesday" "Wednesday" "Thursday"
For sequences with other intervals, pass a Date object to seq() and describe the step with the by argument. For example, to create a sequence of every second month of 2012, starting on January 1, use the following:
> startDate <- as.Date("2012-01-01")
> xm <- seq(startDate, by="2 months", length.out=6)
> xm
[1] "2012-01-01" "2012-03-01" "2012-05-01" "2012-07-01"
[5] "2012-09-01" "2012-11-01"
In addition to weekdays(), you also can get R to report on months() and quarters():
> months(xm)
[1] "January" "March" "May" "July"
[5] "September" "November"
> quarters(xm)
[1] "Q1" "Q1" "Q2" "Q3" "Q3" "Q4"
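The by argument also accepts other units, and you can give seq() an end date instead of a length. A minimal sketch, using dates chosen purely for illustration:

```r
# Weekly sequence between two dates (both endpoints are arbitrary examples)
from <- as.Date("2012-01-01")
to <- as.Date("2012-01-29")
xw <- seq(from, to, by = "week")
xw
length(xw)   # 5

# All five dates fall in January, so every element is in the first quarter
quarters(xw)
```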
The names that weekdays() and months() return depend on your locale. To view the locale settings on your machine, try the following:
> Sys.localeconv()
Table 6-1 summarizes some useful functions for working with dates.
Table 6-1 Useful Functions with Dates
Function | Description
as.Date() | Converts character string to Date
weekdays() | Full weekday name in the current locale (for example, Sunday, Monday, Tuesday)
months() | Full month name in the current locale (for example, January, February, March)
quarters() | Quarter numbers (Q1, Q2, Q3, or Q4)
seq() | Generates date sequences if you pass it a Date object
Presenting Dates in Different Formats
You’ve probably noticed that as.Date() is fairly prescriptive in its defaults: It expects the date to be formatted in the order of year, month, and day. Fortunately, R allows you flexibility in specifying the date format.
> as.Date("27 July 2012", format="%d %B %Y")
[1] "2012-07-27"
This rather cryptic line of code indicates that the date format consists of the day (%d), full month name (%B), and the year with century (%Y), with spaces between each element.
Table 6-2 lists some of the many date formatting elements that you can use to specify dates. You can access the full list by typing ?strptime in your R console.
Table 6-2 Some Format Codes for Dates (For Use with as.Date, POSIXct, POSIXlt, and strptime)
Format | Description
%Y | Year with century.
%y | Year without century (00–99). Values 00 to 68 are prefixed by 20, and values 69 to 99 are prefixed by 19.
%m | Month as decimal number (01–12).
%B | Full month name in the current locale. (Also matches abbreviated name on input.)
%b | Abbreviated month name in the current locale. (Also matches full name on input.)
%d | Day of the month as a decimal number (01–31). You don’t need to add the leading zero when converting text to Date.
%A | Full weekday name in the current locale. (Also matches abbreviated name on input.)
%a | Abbreviated weekday name in the current locale. (Also matches full name on input.)
%w | Weekday as decimal number (0–6, with Sunday being 0).
Try the formatting codes with another common date format, “27/7/12” (that is, day, month, and two-digit year separated by slashes):
> as.Date("27/7/12", format="%d/%m/%y")
[1] "2012-07-27"
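The same codes handle other orderings, such as the month-first American style mentioned at the start of the chapter. The following sketch also shows what to expect when the format string doesn't match the text: as.Date() returns NA rather than stopping with an error. The input strings are made up for illustration.

```r
# Month-first input with a four-digit year
us <- as.Date("07/27/2012", format = "%m/%d/%Y")
us

# Reading day-first text with a month-first format fails: there is no month 27
bad <- as.Date("27/07/2012", format = "%m/%d/%Y")
is.na(bad)   # TRUE
```

Checking for NA after a conversion is a cheap way to catch dates that were written in a different order than you expected.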
Adding Time Information to Dates
Often, referring only to dates isn’t enough. You also need to indicate a specific time in hours and minutes.
According to Wikipedia, the time of the Apollo 11 moon landing was July 20, 1969, at 20:17:39 UTC. (UTC is the acronym for Coordinated Universal Time. It’s how the world’s clocks are regulated.) To express this date and time in R, try the following:
> apollo <- "July 20, 1969, 20:17:39"
> apollo.fmt <- "%B %d, %Y, %H:%M:%S"
> xct <- as.POSIXct(apollo, format=apollo.fmt, tz="UTC")
> xct
[1] "1969-07-20 20:17:39 UTC"
Table 6-3 lists additional formatting codes that are useful when working with time information in dates.
Table 6-3 Formatting Codes for the Time Element of POSIXct and POSIXlt Datetimes
Format | Description
%H | Hours as a decimal number (00–23)
%I | Hours as a decimal number (01–12)
%M | Minutes as a decimal number (00–59)
%S | Seconds as a decimal number (00–61)
%p | AM/PM indicator
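As a small sketch of the %H and %M codes, the following parses a 24-hour timestamp that has no seconds. The input string is made up for illustration; seconds that are absent from the input are taken as zero.

```r
# Day/month/year followed by a 24-hour time, without seconds
landing <- as.POSIXct("20/07/1969 20:17", format = "%d/%m/%Y %H:%M", tz = "UTC")
landing

# The result differs from the full timestamp only by the 39 missing seconds
full <- as.POSIXct("1969-07-20 20:17:39", tz = "UTC")
as.numeric(full) - as.numeric(landing)   # 39
```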
Formatting Dates and Times
To format a date for pretty printing, you use format(), which takes a POSIXct or POSIXlt datetime as input, together with a formatting string. You’ve already encountered a formatting string when creating a date.
Continuing with the example where the object xct is the day and time of the Apollo landing, you can format this date and time in many different ways. For example, to format it as DD/MM/YY, try:
> format(xct, "%d/%m/%y")
[1] "20/07/69"
A formatting string can also contain ordinary text around the codes. For example, to format xct as part of a sentence, try:
> format(xct, "%M minutes past %I %p, on %d %B %Y")
[1] "17 minutes past 08 PM, on 20 July 1969"
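Because format() always returns text, it also offers a quick way to pull individual pieces out of a datetime. A minimal sketch that sticks to numeric codes, so the results don't depend on your locale:

```r
xct <- as.POSIXct("1969-07-20 20:17:39", tz = "UTC")

format(xct, "%H:%M")       # "20:17"
format(xct, "%Y-%m-%d")    # "1969-07-20"

# Convert the text to a number when you need to calculate with it
landing.year <- as.integer(format(xct, "%Y"))
landing.year               # 1969
```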
Performing Operations on Dates and Times
Because R stores datetime objects as numbers, you can do various operations on dates, including addition, subtraction, comparison, and extraction.
Addition and subtraction
Because R stores objects of class POSIXct
as the number of seconds since the epoch (usually the start of 1970), you can do addition and subtraction by adding or subtracting seconds. It’s more common to add or subtract days from dates, so it’s useful to know that each day has 86,400 seconds.
> 24*60*60
[1] 86400
So, to add seven days to the Apollo moon landing date, use addition; just remember to multiply the number of days by the number of seconds per day:
> xct + 7*86400
[1] "1969-07-27 20:17:39 UTC"
Once you know how to convert a duration to seconds, you can add any duration to a datetime object or subtract it. For example, add three hours to the time of the Apollo moon landing:
> xct + 3*60*60
[1] "1969-07-20 23:17:39 UTC"
Similarly, to get a date seven days earlier, use subtraction:
> xct - 7*86400
[1] "1969-07-13 20:17:39 UTC"
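Objects of class Date count days rather than seconds, so with a Date you can subtract the days directly. Try that yourself, first converting xct to a Date object, then subtracting 7:
> as.Date(xct) - 7
[1] "1969-07-13"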
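To measure the interval between two datetimes without doing the unit conversion by hand, base R provides difftime(), which takes a units argument. A short sketch using the landing time from above:

```r
xct <- as.POSIXct("1969-07-20 20:17:39", tz = "UTC")
later <- xct + 7 * 86400   # seven days later, expressed in seconds

# Ask for the difference in days, then in hours
difftime(later, xct, units = "days")
as.numeric(difftime(later, xct, units = "hours"))   # 168
```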
Comparison of dates
Similar to the way that you can add or subtract dates, you can also compare dates with the comparison operators, such as less than (<) or greater than (>), covered in Chapter 5.
Say you want to compare the current time with any fixed time. In R, you use the Sys.time() function to get the current system time:
> Sys.time()
[1] "2012-03-24 10:12:52 GMT"
Now you know the exact time when we wrote this sentence. Clearly, when you try the same command, you will get a different result!
Now you can compare your current system time with the time of the Apollo landing:
> Sys.time() < xct
[1] FALSE
If your system clock is accurate, then obviously you would expect the result to be false, because the moon landing happened more than 40 years ago.
As we cover in Chapter 5, the comparison operators are vectorized, so you can compare an entire vector of dates with the moon landing date. Try to use all your knowledge of dates, sequences of dates, and comparison operators to compare the start of several decades to the moon landing date.
Start by creating a POSIXct object containing the first day of 1950. Then use seq() to create a sequence with intervals of ten years:
> dec.start <- as.POSIXct("1950-01-01")
> dec <- seq(dec.start, by="10 years", length.out=4)
> dec
[1] "1950-01-01 GMT" "1960-01-01 GMT" "1970-01-01 GMT"
[4] "1980-01-01 GMT"
Finally, you can compare your new vector dec with the moon landing date:
> dec > xct
[1] FALSE FALSE TRUE TRUE
As you can see, the first two results (comparing to 1950 and 1960) are FALSE, and the last two values (comparing to 1970 and 1980) are TRUE.
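A logical vector like this one can be used for subsetting, as with any other vector. The sketch below keeps only the decades that start after the landing. It sets tz="UTC" explicitly, which is an assumption made here so that the result doesn't depend on your system's time zone:

```r
xct <- as.POSIXct("1969-07-20 20:17:39", tz = "UTC")
dec.start <- as.POSIXct("1950-01-01", tz = "UTC")
dec <- seq(dec.start, by = "10 years", length.out = 4)

# Keep the elements for which the comparison is TRUE
after <- dec[dec > xct]
format(after, "%Y")   # "1970" "1980"

# Count how many decades started before the landing
sum(dec < xct)        # 2
```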
Extraction
Another thing you may want to do is to extract specific elements of the date, such as the day, month, or year. For example, scientists may want to compare the weather in a specific month (say, January) for many different years. To do this, they first have to determine the month, by extracting the months from the datetime object.
An easy way to achieve this is to work with dates in the POSIXlt class, because this type of data is stored internally as a named list, which enables you to extract elements by name. To do this, first convert the POSIXct object xct to the POSIXlt class:
> xlt <- as.POSIXlt(xct)
> xlt
[1] "1969-07-20 20:17:39 UTC"
Next, use the $ operator to extract the different elements. For example, to get the year, use the following:
> xlt$year
[1] 69
And to get the month, use the following:
> xlt$mon
[1] 6
To see how a POSIXlt object is stored internally, use unclass():
> unclass(xlt)
If you run this line of code, you’ll see that POSIXlt objects are really just named lists. You get to work with lists in much more detail in Chapter 7.
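One detail about the extracted values is worth noting: they are offsets rather than calendar numbers. A POSIXlt object stores the year as years since 1900 and counts months from 0, which is why the landing year appears as 69 and July as 6. A short sketch of converting them back:

```r
xlt <- as.POSIXlt(as.POSIXct("1969-07-20 20:17:39", tz = "UTC"))

xlt$year + 1900   # 1969
xlt$mon + 1       # 7

# The day of the month needs no adjustment
xlt$mday          # 20

# These elements are among the names of the underlying list
c("year", "mon", "mday") %in% names(unclass(xlt))
```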