Chapter 6

Going on a Date with R

In This Chapter

- Working with dates in R
- Understanding the different ways of representing dates
- Throwing time into the mix
- Formatting dates and times for pretty printing
- Operating on dates and times

All kinds of real-world data are associated with a specific date or instant in time. Companies report results each quarter. Stock markets report closing prices daily. Network analysts measure traffic by the hour (if not by the minute). And of course, scientists measure air temperature, sometimes by the minute, sometimes by the day, and have done so for decades.

Dealing with dates accurately can be a complicated task. You have to account for time-zone differences, leap years, and regional differences in holidays. In addition, people report data differently in different places. For example, what an American would write as “May 12, 2010” or “05-12-10” would be written by someone from the United Kingdom as “12 May 2010” or “12-05-10.” Working with a time instant on a specific day isn’t any easier. The same time may be written as 9:25 p.m., 21:25, or 21h25 — not to mention time zones!

In this chapter, we look at the different ways of representing dates and times using R. You take control of how dates and times are formatted for pretty printing. Then you do some math with dates: addition and subtraction. Finally, you use some tricks to extract specific elements, such as the month, from a date.

Working with Dates

R has a range of functions that allow you to work with dates and times. The easiest way of creating a date is to use the as.Date() function. For example, you write the opening day of the 2012 London Olympic Games as:

> xd <- as.Date("2012-07-27")

> xd

[1] "2012-07-27"

> str(xd)

Date[1:1], format: "2012-07-27"

Tip: This works because the default format for dates in as.Date() is YYYY-MM-DD: four digits for the year and two digits each for the month and day, separated by hyphens. In the next section, you learn how to specify dates in other formats.

To find out what day of the week this is, use weekdays():

> weekdays(xd)

[1] "Friday"

You can add or subtract numbers from dates to create new dates. For example, to calculate the date that is seven days in the future, use the following:

> xd + 7

[1] "2012-08-03"

In the same way as with numbers or text, you can put multiple dates into a vector. To create a vector of seven days starting on July 27, add 0:6 to the starting date. (Remember: The colon operator generates integer sequences.)

> xd + 0:6

[1] "2012-07-27" "2012-07-28" "2012-07-29" "2012-07-30"

[5] "2012-07-31" "2012-08-01" "2012-08-02"

Because the weekdays() function takes vector input, it returns the days of the week for this sequence:

> weekdays(xd + 0:6)

[1] "Friday"    "Saturday"  "Sunday"    "Monday"

[5] "Tuesday"   "Wednesday" "Thursday"
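Because weekdays() returns names in the current locale, testing its output against English strings such as "Saturday" can fail on a machine set to another language. A minimal sketch of one workaround, assuming you want only the weekend days of the week above: format() with the %u code (documented in ?strptime but not listed in this chapter's tables) returns the weekday as a number from 1 (Monday) to 7 (Sunday), independent of the locale. The names week, day_number, and weekend are illustrative only.

```r
# The week starting on the opening day of the 2012 London Olympics
xd <- as.Date("2012-07-27")
week <- xd + 0:6

# %u gives the weekday as 1 (Monday) to 7 (Sunday), whatever the locale
day_number <- as.integer(format(week, "%u"))

# Keep only Saturdays (6) and Sundays (7)
weekend <- week[day_number >= 6]
weekend
```

Here weekend holds July 28 and July 29, 2012, the Saturday and Sunday that follow the Friday opening day.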

Tip: You can use the seq() function to create sequences of dates in a far more flexible way. As with numeric vectors, you have to specify at least three of the arguments (from, to, by, and length.out). However, in the case of Date objects, the by argument is very flexible: you can specify by as a string consisting of a number followed by days, weeks, months, quarters, or years. Imagine you want to create a sequence of every second month of 2012, starting on January 1:

> startDate <- as.Date("2012-01-01")

> xm <- seq(startDate, by="2 months", length.out=6)

> xm

[1] "2012-01-01" "2012-03-01" "2012-05-01" "2012-07-01"

[5] "2012-09-01" "2012-11-01"
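The by string also accepts "week", "quarter", and "year" (optionally preceded by a number and followed by an s), and you can give an end date with to instead of a length. A short sketch that reuses the 2012 dates from this section; the variable names are illustrative:

```r
startDate <- as.Date("2012-01-01")

# The first day of each quarter in 2012, stopping at the date given by 'to'
quarterStarts <- seq(startDate, to = as.Date("2012-12-31"), by = "quarter")
quarterStarts

# Three consecutive Fridays, starting with the Olympic opening day
fridays <- seq(as.Date("2012-07-27"), by = "week", length.out = 3)
fridays
```

quarterStarts contains January 1, April 1, July 1, and October 1, 2012; fridays steps forward seven days at a time.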

In addition to weekdays(), you also can get R to report on months() and quarters():

> months(xm)

[1] "January"   "March"     "May"       "July"

[5] "September" "November"

> quarters(xm)

[1] "Q1" "Q2" "Q3" "Q3" "Q4" "Q4"

Technical Stuff: The results of many date functions, including weekdays() and months(), depend on the locale of the machine you're working on. The locale describes elements of international customization on a specific installation of R, including date formats, language settings, and currency settings. To find out some of the locale settings on your machine, use Sys.localeconv(); to see the setting that controls day and month names, use Sys.getlocale("LC_TIME"). R takes its locale settings from the operating system when a session starts, and you can change them during a session with Sys.setlocale().

To view the locale settings on your machine, try the following:

> Sys.localeconv()
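If you need English day and month names regardless of how a machine is set up, one option is to switch the LC_TIME category to the "C" locale briefly and then switch back. A minimal sketch, assuming only that the "C" locale is available (it is on standard R installations); the variable names are hypothetical:

```r
# Remember the current time locale so that it can be restored afterwards
old_time_locale <- Sys.getlocale("LC_TIME")

# Switch to the "C" locale, which uses English day and month names
invisible(Sys.setlocale("LC_TIME", "C"))
day_in_c <- weekdays(as.Date("2012-07-27"))
month_in_c <- months(as.Date("2012-07-27"))

# Restore the original setting
invisible(Sys.setlocale("LC_TIME", old_time_locale))

day_in_c
month_in_c
```

Restoring the saved value keeps the change local to this piece of code, so the rest of your session still uses your usual locale.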

Table 6-1 summarizes some useful functions for working with dates.

Table 6-1 Useful Functions with Dates

Function      Description

as.Date()     Converts a character string to a Date
weekdays()    Full weekday name in the current locale (for example, Sunday, Monday, Tuesday)
months()      Full month name in the current locale (for example, January, February, March)
quarters()    Quarter numbers (Q1, Q2, Q3, or Q4)
seq()         Generates date sequences if you pass it a Date object as its first argument

Presenting Dates in Different Formats

You’ve probably noticed that as.Date() is fairly prescriptive in its defaults: It expects the date to be formatted in the order of year, month, and day. Fortunately, R allows you flexibility in specifying the date format.

Tip: By using the format argument of as.Date(), you can convert dates written in almost any format into Date objects. For example, to convert "27 July 2012" into a date, use the following:

> as.Date("27 July 2012", format="%d %B %Y")

[1] "2012-07-27"

This rather cryptic line of code indicates that the date format consists of the day (%d), full month name (%B), and the year with century (%Y), with spaces between each element.
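A format string works on a whole vector at once, and a string that doesn't match the format becomes NA instead of raising an error. A minimal sketch using day.month.year strings separated by dots (the opening and closing days of the 2012 Olympics); the variable names are illustrative:

```r
# as.Date() is vectorized: one format string converts every element
opening_closing <- as.Date(c("27.07.2012", "12.08.2012"), format = "%d.%m.%Y")
opening_closing

# There is no month 13, so this string doesn't match and becomes NA
bad <- as.Date("27.13.2012", format = "%d.%m.%Y")
is.na(bad)
```

Because mismatches turn into NA quietly, it's worth checking any(is.na(...)) after converting a large column of text dates.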

Table 6-2 lists some of the many date formatting elements that you can use to specify dates. You can access the full list by typing ?strptime in your R console.

Table 6-2 Some Format Codes for Dates (For Use with as.Date(), as.POSIXct(), as.POSIXlt(), and strptime())

Format   Description

%Y       Year with century.
%y       Year without century (00–99). Values 00 to 68 are prefixed by 20, and values 69 to 99 are prefixed by 19.
%m       Month as decimal number (01–12).
%B       Full month name in the current locale. (Also matches abbreviated name on input.)
%b       Abbreviated month name in the current locale. (Also matches full name on input.)
%d       Day of the month as a decimal number (01–31). You don't need to add the leading zero when converting text to Date, but when you format a Date as text, R adds the leading zero.
%A       Full weekday name in the current locale. (Also matches abbreviated name on input.)
%a       Abbreviated weekday name in the current locale. (Also matches full name on input.)
%w       Weekday as decimal number (0–6, with Sunday being 0).

Try the formatting codes with another common date format, "27/7/12" (that is, day, month, and two-digit year separated by slashes):

> as.Date("27/7/12", format="%d/%m/%y")

[1] "2012-07-27"
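The %y rule in Table 6-2 can surprise you with older dates: a two-digit year of 68 or less lands in the 2000s. A quick check of both sides of the cutoff; the variable names are illustrative:

```r
# Two-digit years 00-68 are read as 20xx ...
d68 <- as.Date("1/1/68", format = "%d/%m/%y")
d68

# ... while 69-99 are read as 19xx
d69 <- as.Date("1/1/69", format = "%d/%m/%y")
d69
```

d68 is January 1, 2068, and d69 is January 1, 1969. If your data contains birth dates or other historical dates, prefer four-digit years with %Y.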

Adding Time Information to Dates

Often, referring only to dates isn’t enough. You also need to indicate a specific time in hours and minutes.

Tip: To specify time information in addition to dates, you can choose between two functions in R: as.POSIXct() and as.POSIXlt(). These two datetime functions differ in the way that they store date information internally, as well as in the way that you can extract date and time elements. (For more on these two functions, see the nearby sidebar, "The two datetime functions.")

Technical Stuff: POSIX is the name of a family of standards for UNIX-like operating systems. POSIXct refers to a time that is stored internally as the number of seconds since the start of 1970 (the epoch, 1970-01-01 00:00:00 UTC). When you convert a plain number to a datetime with as.POSIXct(), the origin argument tells R which time that number counts from. POSIXlt refers to a time stored as a named list of vectors for the year, month, day, hours, minutes, seconds, and a few other components.
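To see both representations side by side, you can look at the Apollo 11 landing time as a number and as a list. A minimal sketch; it writes the time in the ISO-style YYYY-MM-DD HH:MM:SS form, which as.POSIXct() understands without a format argument, so that it works in any locale:

```r
# The Apollo 11 landing time, in UTC
xct <- as.POSIXct("1969-07-20 20:17:39", tz = "UTC")

# POSIXct: a single number of seconds since 1970-01-01 00:00:00 UTC.
# The landing came before 1970, so the number is negative.
secs <- as.numeric(xct)
secs

# The origin argument matters when you build a POSIXct from a number
back <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
back == xct

# POSIXlt: the same instant broken into named components
xlt <- as.POSIXlt(xct)
xlt$hour
xlt$min
```

secs is -14182941, and converting it back with origin = "1970-01-01" recovers the same instant.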

According to Wikipedia, the time of the Apollo 11 moon landing was July 20, 1969, at 20:17:39 UTC. (UTC is the acronym for Coordinated Universal Time. It’s how the world’s clocks are regulated.) To express this date and time in R, try the following:

> apollo <- "July 20, 1969, 20:17:39"

> apollo.fmt <- "%B %d, %Y, %H:%M:%S"

> xct <- as.POSIXct(apollo, format=apollo.fmt, tz="UTC")

> xct

[1] "1969-07-20 20:17:39 UTC"

Tip: As you can see, as.POSIXct() takes similar arguments to as.Date(), but you also specify the time zone with the tz argument. If you leave out tz, R uses the current time zone of your machine.
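The time zone affects both how an instant is displayed and which instant a piece of text refers to. A minimal sketch, assuming your R installation knows the IANA zone name America/New_York (see ?timezones for the names available on your system); the variable names are illustrative:

```r
xct <- as.POSIXct("1969-07-20 20:17:39", tz = "UTC")

# The same instant displayed as New York local time (daylight time in July)
ny_time <- format(xct, "%Y-%m-%d %H:%M:%S", tz = "America/New_York", usetz = TRUE)
ny_time

# Parsing the same text in a different time zone gives a different instant
xny <- as.POSIXct("1969-07-20 20:17:39", tz = "America/New_York")
difftime(xny, xct, units = "hours")
```

The landing happened at 16:17:39 New York time, and reading the clock time 20:17:39 as New York time instead of UTC shifts the instant by four hours.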

Table 6-3 lists additional formatting codes that are useful when working with time information in dates.

Table 6-3 Formatting Codes for the Time Element of POSIXct and POSIXlt Datetimes

Format   Description

%H       Hours as a decimal number (00–23)
%I       Hours as a decimal number (01–12)
%M       Minutes as a decimal number (00–59)
%S       Seconds as a decimal number (00–61)
%p       AM/PM indicator


Formatting Dates and Times

To format a date for pretty printing, you use format(), which takes a Date, POSIXct, or POSIXlt object as input, together with a formatting string. You already encountered formatting strings when creating dates.

Continuing with the example where the object xct is the day and time of the Apollo landing, you can format this date and time in many different ways. For example, to format it as DD/MM/YY, try:

> format(xct, "%d/%m/%y")

[1] "20/07/69"

Tip: In addition to the formatting codes, the formatting string can contain any other text, which R copies into the output unchanged. If you want to format the xct datetime as a sentence, try the following:

> format(xct, "%M minutes past %I %p, on %d %B %Y")

[1] "17 minutes past 08 PM, on 20 July 1969"
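Purely numeric codes are handy when the output has to be machine-friendly, for example a sortable timestamp in a file name, and they don't depend on the locale. A short sketch; %j (day of the year, 001 to 366) is documented in ?strptime but isn't in this chapter's tables, and the variable names are illustrative:

```r
xct <- as.POSIXct("1969-07-20 20:17:39", tz = "UTC")

# A compact timestamp that sorts correctly as text
stamp <- format(xct, "%Y%m%d_%H%M%S")
stamp

# %j is the day of the year; %H:%M gives the 24-hour clock time
doy <- format(xct, "day %j of %Y, at %H:%M")
doy

# format() works on Date objects too
dmy <- format(as.Date(xct), "%d/%m/%Y")
dmy
```

This produces "19690720_201739", "day 201 of 1969, at 20:17", and "20/07/1969".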

Remember: You can find the formatting codes in Tables 6-2 and 6-3, as well as on the Help page ?strptime.

Performing Operations on Dates and Times

Because R stores datetime objects as numbers, you can do various operations on dates, including addition, subtraction, comparison, and extraction.
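Date objects follow the same idea as POSIXct, except that they count days rather than seconds since 1970-01-01. A quick way to see the number behind a date, and to go back again; the variable name xd matches the earlier examples:

```r
xd <- as.Date("2012-07-27")

# A Date is stored as the number of days since 1970-01-01
as.numeric(xd)

# unclass() shows the same number without the Date class attached
unclass(xd)

# Going the other way: turn a day count back into a Date
as.Date(15548, origin = "1970-01-01")
```

The opening day of the 2012 Olympics is day 15548 after January 1, 1970.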

Addition and subtraction

Because R stores objects of class POSIXct as the number of seconds since the epoch (the start of 1970, in UTC), you do addition and subtraction by adding or subtracting seconds. It's more common to add or subtract days, so it's useful to know that each day has 86,400 seconds:

> 24*60*60

[1] 86400

So, to add seven days to the Apollo moon landing date, use addition; just remember to multiply the number of days by the number of seconds per day:

> xct + 7*86400

[1] "1969-07-27 20:17:39 UTC"

Because you can convert any duration to seconds, you can add any duration to a datetime object or subtract it. For example, add three hours to the time of the Apollo moon landing:

> xct + 3*60*60

[1] "1969-07-20 23:17:39 UTC"

Similarly, to get a date seven days earlier, use subtraction:

> xct - 7*86400

[1] "1969-07-13 20:17:39 UTC"
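If multiplying by 86,400 by hand feels error-prone, base R's as.difftime() lets you state the unit explicitly; when you add a difftime to a POSIXct, R converts it to seconds for you. A minimal sketch (the names one_week and three_hours are illustrative):

```r
xct <- as.POSIXct("1969-07-20 20:17:39", tz = "UTC")

# A difftime object carries its own units
one_week <- as.difftime(7, units = "days")
three_hours <- as.difftime(3, units = "hours")

xct + one_week      # same instant as xct + 7*86400
xct + three_hours   # same instant as xct + 3*60*60
xct - one_week      # same instant as xct - 7*86400
```

The results match the seconds-based calculations above, but the intent is easier to read.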

Warning: There is an important difference between Date objects and POSIXct or POSIXlt objects. When you add a number to a Date object or subtract one from it, the unit is days; with POSIXct and POSIXlt objects, the unit is seconds.

Try that yourself, first converting xct to a Date object, then subtracting 7:

> as.Date(xct) - 7

[1] "1969-07-13"
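The difference in units is easy to check side by side, and subtracting one Date from another gives you a time difference (a difftime object) in days. A short sketch; the dates in games are the opening and closing days of the 2012 Olympics:

```r
xct <- as.POSIXct("1969-07-20 20:17:39", tz = "UTC")

# Adding 1 to a POSIXct moves it forward one second ...
xct + 1
# ... but adding 1 to a Date moves it forward one day
as.Date(xct) + 1

# Subtracting two Dates gives a difftime measured in days
games <- as.Date("2012-08-12") - as.Date("2012-07-27")
games
as.numeric(games)
```

games prints as a time difference of 16 days, and as.numeric() strips the units when you need a plain number.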

Comparison of dates

Just as you can add or subtract dates, you can also compare dates with the comparison operators, such as less than (<) or greater than (>), covered in Chapter 5.

Say you want to compare the current time with any fixed time. In R, you use the Sys.time() function to get the current system time:

> Sys.time()

[1] "2012-03-24 10:12:52 GMT"

Now you know the exact time when we wrote this sentence. Clearly, when you try the same command, you will get a different result!

Now you can compare your current system time with the time of the Apollo landing:

> Sys.time() < xct

[1] FALSE

If your system clock is accurate, you would expect the result to be FALSE, because the moon landing happened more than 40 years ago.
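A comparison only tells you which time came first. To find out how far apart two times are, difftime() lets you pick the units of the answer. A minimal sketch; dividing by 365.25 is only an approximation of the number of years, and the variable name elapsed is illustrative:

```r
xct <- as.POSIXct("1969-07-20 20:17:39", tz = "UTC")

# Time elapsed since the moon landing, measured in days
elapsed <- difftime(Sys.time(), xct, units = "days")
elapsed

# Approximate number of years (365.25 days per year on average)
as.numeric(elapsed) / 365.25
```

The exact figure depends on when you run it, but with an accurate clock it comes out well above 40.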

As we cover in Chapter 5, the comparison operators are vectorized, so you can compare an entire vector of dates with the moon landing date. Try to use all your knowledge of dates, sequences of dates, and comparison operators to compare the start of several decades to the moon landing date.

Start by creating a POSIXct object containing the first day of 1950. Then use seq() to create a sequence with intervals of ten years:

> dec.start <- as.POSIXct("1950-01-01")

> dec <- seq(dec.start, by="10 years", length.out=4)

> dec

[1] "1950-01-01 GMT" "1960-01-01 GMT" "1970-01-01 GMT"

[4] "1980-01-01 GMT"

Finally, you can compare your new vector dec with the moon landing date:

> dec > xct

[1] FALSE FALSE  TRUE  TRUE

As you can see, the first two results (comparing to 1950 and 1960) are FALSE, and the last two values (comparing to 1970 and 1980) are TRUE.
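The logical vector that a comparison returns can also select elements directly, or be summed to count the TRUE values. A short sketch; it gives the decade starts an explicit tz = "UTC" (an addition to the example above) so the result doesn't depend on your machine's time zone, and after_landing is an illustrative name:

```r
xct <- as.POSIXct("1969-07-20 20:17:39", tz = "UTC")
dec <- seq(as.POSIXct("1950-01-01", tz = "UTC"), by = "10 years", length.out = 4)

# Keep only the decade starts that came after the landing
after_landing <- dec[dec > xct]
after_landing

# TRUE counts as 1, so sum() counts the matches
sum(dec > xct)
```

after_landing contains the starts of 1970 and 1980, and sum() returns 2.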

Extraction

Another thing you may want to do is to extract specific elements of the date, such as the day, month, or year. For example, scientists may want to compare the weather in a specific month (say, January) across many different years. To do this, they first have to extract the month from each datetime object.

An easy way to achieve this is to work with dates in the POSIXlt class, because this type of data is stored internally as a named list, which enables you to extract elements by name. To do this, first convert xct from POSIXct to POSIXlt:

> xlt <- as.POSIXlt(xct)

> xlt

[1] "1969-07-20 20:17:39 UTC"

Next, use the $ operator to extract the different elements. For example, to get the year, use the following:

> xlt$year

[1] 69

And to get the month, use the following:

> xlt$mon

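[1] 6

Notice that these aren't the calendar values: POSIXlt stores the year as the number of years since 1900 (so 69 means 1969) and the month as a number from 0 (January) to 11 (December), so 6 means July.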

Technical Stuff: You can use the unclass() function to expose the internal structure of POSIXlt objects.

> unclass(xlt)

If you run this line of code, you’ll see that POSIXlt objects are really just named lists. You get to work with lists in much more detail in Chapter 7.
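Putting the pieces together for the scientists' January question, here is a sketch that turns the POSIXlt fields into calendar values and then keeps only the January dates from a series. The monthly dates in obs_dates are made up for illustration, and all variable names are hypothetical:

```r
xct <- as.POSIXct("1969-07-20 20:17:39", tz = "UTC")
xlt <- as.POSIXlt(xct)

# Add the offsets back to get calendar values
year  <- xlt$year + 1900   # years are counted from 1900
month <- xlt$mon + 1       # months are counted from 0 (January)
day   <- xlt$mday          # day of the month needs no offset
wday  <- xlt$wday          # 0 is Sunday, 6 is Saturday

# obs_dates is a made-up monthly series, for illustration only
obs_dates <- seq(as.Date("2010-01-15"), as.Date("2012-12-15"), by = "month")

# Keep the dates whose month field is 0, that is, January
january <- obs_dates[as.POSIXlt(obs_dates)$mon == 0]
january
```

The landing works out to July 20, 1969, a Sunday, and january holds the three January dates from 2010, 2011, and 2012.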
