In data analysis, it is common to encounter date and time data types. Perhaps the simplest date-related functions are Sys.Date(), which returns the current date, and Sys.time(), which returns the current time.
As the book is being rendered, the date is printed as follows:
Sys.Date()
## [1] "2016-02-26"
And the time is:
Sys.time()
## [1] "2016-02-26 22:12:25 CST"
From the output, the date and time look like character vectors, but actually they are not:
current_date <- Sys.Date()
as.numeric(current_date)
## [1] 16857
current_time <- Sys.time()
as.numeric(current_time)
## [1] 1456495945
They are, in essence, numeric values relative to an origin, with special methods for date/time calculations. For a date, the numeric value is the number of days elapsed since 1970-01-01. For a time, it is the number of seconds elapsed since 1970-01-01 00:00:00 UTC.
We can create a date relative to a customized origin:
as.Date(1000, origin = "1970-01-01")
## [1] "1972-09-27"
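To make the numeric representation concrete, the following sketch converts a fixed date and a fixed time to numbers and back. The variable names (fixed_date, day_count, fixed_time) and the use of tz = "UTC" are assumptions made for illustration, so that the result does not depend on the time zone of the machine.

```r
# Use a fixed date instead of Sys.Date() so the result is reproducible
fixed_date <- as.Date("2016-02-26")
class(fixed_date)                          # "Date", not "character"

day_count <- as.numeric(fixed_date)
day_count                                  # 16857 days since 1970-01-01

# Convert the number of days back to a date
restored_date <- as.Date(day_count, origin = "1970-01-01")

# The same idea for a time: seconds since 1970-01-01 00:00:00 UTC
fixed_time <- as.POSIXct(1456495945, origin = "1970-01-01", tz = "UTC")
format(fixed_time, "%Y-%m-%d %H:%M:%S")    # "2016-02-26 14:12:25"
```

Printed in UTC, the time is eight hours behind the 22:12:25 CST shown earlier, which is the same instant expressed in a different time zone.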
More often, however, we create dates and times from a standard text representation:
my_date <- as.Date("2016-02-10")
my_date
## [1] "2016-02-10"
But if we can represent a date as a string such as 2016-02-10, why do we need to create a Date object as we did earlier? Because a Date object offers more: it supports date arithmetic. Given a date object, we can add or subtract a number of days to get a new date:
my_date + 3
## [1] "2016-02-13"
my_date + 80
## [1] "2016-04-30"
my_date - 65
## [1] "2015-12-07"
We can directly subtract a date from another to get the difference in number of days between two dates:
date1 <- as.Date("2014-09-28")
date2 <- as.Date("2015-10-20")
date2 - date1
## Time difference of 387 days
The output of date2 - date1 looks like a message, but it is actually a numeric value. We can make it explicit using as.numeric():
as.numeric(date2 - date1)
## [1] 387
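The result of subtracting two dates is an object of class difftime, which carries its unit along with the number. As a minimal sketch (the names gap, gap_weeks, and gap_hours are chosen here for illustration), the same difference can be expressed in other units:

```r
date1 <- as.Date("2014-09-28")
date2 <- as.Date("2015-10-20")

gap <- date2 - date1
class(gap)                       # "difftime"
units(gap)                       # "days"

# Ask for the difference in weeks directly
gap_weeks <- difftime(date2, date1, units = "weeks")

# Or convert the existing difference to hours when extracting the number
gap_hours <- as.numeric(gap, units = "hours")
gap_hours                        # 9288, that is, 387 * 24
```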
Time is similar, but there is no function called as.Time(). To create a date/time from a text representation, we can use either as.POSIXct() or as.POSIXlt(). These two functions create different implementations of a date/time object under the POSIX standard. In the following example, we use as.POSIXlt() to create a date/time object:
my_time <- as.POSIXlt("2016-02-10 10:25:31")
my_time
## [1] "2016-02-10 10:25:31 CST"
This type of object also defines + and - for simple time calculations. Unlike the date object, it works in units of seconds rather than days:
my_time + 10
## [1] "2016-02-10 10:25:41 CST"
my_time + 12345
## [1] "2016-02-10 13:51:16 CST"
my_time - 1234567
## [1] "2016-01-27 03:29:24 CST"
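The difference between the two implementations is easier to see side by side. In the sketch below, a POSIXlt object exposes the parts of the time as named fields, while a POSIXct object is a single number of seconds. The variable names and tz = "UTC" are assumptions made to keep the example independent of the local time zone.

```r
time_lt <- as.POSIXlt("2016-02-10 10:25:31", tz = "UTC")
time_ct <- as.POSIXct("2016-02-10 10:25:31", tz = "UTC")

# POSIXlt keeps the components of the time as named fields
time_lt$hour           # 10
time_lt$min            # 25
time_lt$year + 1900    # 2016 (years are stored relative to 1900)
time_lt$mon + 1        # 2 (months are stored as 0 to 11)

# POSIXct is a single number: seconds since 1970-01-01 00:00:00 UTC
seconds <- as.numeric(time_ct)

# Both describe the same instant and support the same arithmetic
time_lt == time_ct                    # TRUE
shifted <- time_ct + 12345
format(shifted, "%H:%M:%S")           # "13:51:16"
```

As a rule of thumb, POSIXct tends to be the more compact choice for columns of a data frame, while POSIXlt is convenient when individual components are needed.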
Given a string representation of a date or time in data, we have to convert it to a date or date/time object before we can do calculations. Often, however, raw data is not in a format that as.Date() or as.POSIXlt() can recognize directly. In this case, we need to use a set of special letters as placeholders to represent certain parts of a date or time, just like we did with sprintf().
For example, for the input 2015.07.25, as.Date() will produce an error if no format string is supplied:
as.Date("2015.07.25")
## Error in charToDate(x): character string is not in a standard unambiguous format
We can use a format string as a template to tell as.Date() how to parse the string to a date:
as.Date("2015.07.25", format = "%Y.%m.%d")
## [1] "2015-07-25"
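A format string only helps if it matches the input. The following sketch uses made-up input strings to show what happens when the template does not match, how %y differs from %Y, and how the tryFormats argument of as.Date() (assumed to be available, as it is in recent versions of R) can try several candidate templates in turn.

```r
# A template that matches the input parses as expected
matched <- as.Date("2015.07.25", format = "%Y.%m.%d")

# A template that does not match yields NA rather than an error
mismatch <- as.Date("2015.07.25", format = "%Y-%m-%d")
is.na(mismatch)                  # TRUE

# %y stands for a two-digit year, %Y for a four-digit year
short_year <- as.Date("25/07/15", format = "%d/%m/%y")
format(short_year)               # "2015-07-25"

# tryFormats lists candidate templates that are tried in order
flexible <- as.Date("2015.07.25",
                    tryFormats = c("%Y-%m-%d", "%Y.%m.%d"))
format(flexible)                 # "2015-07-25"
```

Because a mismatch silently produces NA when format is given, it is worth checking the result with is.na() after parsing raw data.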
Similarly, for a non-standard date/time string, we also need to specify a template string to tell as.POSIXlt() how to handle it:
as.POSIXlt("7/25/2015 09:30:25", format = "%m/%d/%Y %H:%M:%S")
## [1] "2015-07-25 09:30:25 CST"
An alternative (and more direct) function to convert a string to a date/time is strptime():
strptime("7/25/2015 09:30:25", "%m/%d/%Y %H:%M:%S")
## [1] "2015-07-25 09:30:25 CST"
In fact, as.POSIXlt() is only a wrapper around strptime() for character input, but strptime() always requires that you supply the format string, while as.POSIXlt() works for standard formats without a supplied template.
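The relationship between the two functions can be checked with a small experiment. In this sketch, the input string and the variable names are chosen for illustration, and tz = "UTC" is an assumption that keeps the comparison independent of the local time zone.

```r
text <- "2015-07-25 09:30:25"

# as.POSIXlt() recognizes the standard format without a template
via_as_posixlt <- as.POSIXlt(text, tz = "UTC")

# strptime() needs the template spelled out
via_strptime <- strptime(text, "%Y-%m-%d %H:%M:%S", tz = "UTC")

class(via_strptime)                  # "POSIXlt" "POSIXt"
via_as_posixlt == via_strptime       # TRUE
```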
Just like numeric vectors, dates and date/times are vectors too. You can supply a character vector to as.Date() and get a vector of dates:
as.Date(c("2015-05-01", "2016-02-12"))
## [1] "2015-05-01" "2016-02-12"
The math is also vectorized. In the following code, we will add some consecutive integers to the date, and we get consecutive dates as expected:
as.Date("2015-01-01") + 0:2
## [1] "2015-01-01" "2015-01-02" "2015-01-03"
The same feature also applies to date/time objects:
strptime("7/25/2015 09:30:25", "%m/%d/%Y %H:%M:%S") + 1:3
## [1] "2015-07-25 09:30:26 CST" "2015-07-25 09:30:27 CST"
## [3] "2015-07-25 09:30:28 CST"
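Since dates behave like numeric vectors, other vectorized tools work on them as well. The following sketch (with names chosen for illustration) uses seq() to generate dates one month apart, diff() to compute the gaps between them, and a comparison operator to filter them.

```r
# Three dates, one month apart
monthly <- seq(as.Date("2015-01-01"), by = "month", length.out = 3)
format(monthly)                  # "2015-01-01" "2015-02-01" "2015-03-01"

# Gaps between consecutive elements, in days
gaps <- diff(monthly)
as.numeric(gaps)                 # 31 28

# Comparison is vectorized too
later <- monthly > as.Date("2015-01-15")
later                            # FALSE TRUE TRUE
```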
Sometimes, the data uses integer representations of date and time, which makes parsing trickier. For example, to parse 20150610, we run the following code:
as.Date("20150610", format = "%Y%m%d")
## [1] "2015-06-10"
To parse 20150610093215, we can specify the template to describe such a format:
strptime("20150610093215", "%Y%m%d%H%M%S")
## [1] "2015-06-10 09:32:15 CST"
A trickier example is to parse the date/time in the following data frame:
datetimes <- data.frame(
  date = c(20150601, 20150603),
  time = c(92325, 150621))
If we use paste0() on the columns of datetimes and directly call strptime() with the template used in the previous example, we will get a missing value, indicating that the first element is not consistent with the format:
dt_text <- paste0(datetimes$date, datetimes$time)
dt_text
## [1] "2015060192325" "20150603150621"
strptime(dt_text, "%Y%m%d%H%M%S")
## [1] NA "2015-06-03 15:06:21 CST"
The problem lies in 92325, which should be 092325. We need to use sprintf() to make sure a leading zero is present when necessary:
dt_text2 <- paste0(datetimes$date, sprintf("%06d", datetimes$time))
dt_text2
## [1] "20150601092325" "20150603150621"
strptime(dt_text2, "%Y%m%d%H%M%S")
## [1] "2015-06-01 09:23:25 CST" "2015-06-03 15:06:21 CST"
Finally, the conversion works as expected.
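If this kind of data appears often, the padding and parsing steps can be wrapped in a small function. The helper below, parse_int_datetime(), is a hypothetical name introduced here for illustration and is not part of base R. It assumes that dates are coded as YYYYMMDD and times as HHMMSS, and it uses tz = "UTC" by default as an assumption.

```r
# Hypothetical helper: combine integer-coded dates (YYYYMMDD) and
# times (HHMMSS) into date/time objects
parse_int_datetime <- function(date, time, tz = "UTC") {
  # Pad the date to 8 digits and the time to 6 digits before joining them
  text <- sprintf("%08d%06d", as.integer(date), as.integer(time))
  strptime(text, "%Y%m%d%H%M%S", tz = tz)
}

datetimes <- data.frame(
  date = c(20150601, 20150603),
  time = c(92325, 150621))

parsed <- parse_int_datetime(datetimes$date, datetimes$time)
format(parsed, "%Y-%m-%d %H:%M:%S")
# "2015-06-01 09:23:25" "2015-06-03 15:06:21"
```

Padding inside the helper also covers times such as 0 (midnight), which would otherwise lose all of their leading zeros.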
In the previous section, you learned how to convert strings to date and date/time objects. In this section, you will learn the opposite: converting date and date/time objects back to strings according to a specific template.
Once a date object is created, every time we print it, it is always represented in the standard format:
my_date
## [1] "2016-02-10"
We can convert the date to a string in a standard representation with as.character():
date_text <- as.character(my_date)
date_text
## [1] "2016-02-10"
From the output, date_text looks the same as my_date, but it is now merely plain text and no longer supports date calculations:
date_text + 1
## Error in date_text + 1: non-numeric argument to binary operator
Sometimes, we need to format the date in a non-standard way:
as.character(my_date, format = "%Y.%m.%d")
## [1] "2016.02.10"
In fact, as.character() calls format() directly behind the scenes. We will get exactly the same result using format(), and this is recommended in most cases:
format(my_date, "%Y.%m.%d")
## [1] "2016.02.10"
The same also applies to a date/time object. We can further customize the template to include literal text in addition to the placeholders:
my_time
## [1] "2016-02-10 10:25:31 CST"
format(my_time, "date: %Y-%m-%d, time: %H:%M:%S")
## [1] "date: 2016-02-10, time: 10:25:31"
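A few more placeholders are sketched below, along with a round trip from a date to text and back. The examples stick to numeric placeholders, since names of months and weekdays depend on the locale. The variable names and tz = "UTC" are assumptions made for illustration.

```r
my_date <- as.Date("2016-02-10")

format(my_date, "%d/%m/%y")      # "10/02/16"
format(my_date, "%j")            # "041", the day of the year
format(my_date, "%u")            # "3", the weekday number (Monday is 1)

# Round trip: the same template formats the date and parses it back
text <- format(my_date, "%Y.%m.%d")
restored <- as.Date(text, format = "%Y.%m.%d")
restored == my_date              # TRUE

# Date/time objects accept the time placeholders as well
my_time <- as.POSIXct("2016-02-10 10:25:31", tz = "UTC")
format(my_time, "%H:%M")         # "10:25"
```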
There are a number of packages that make dealing with dates and times much easier. I recommend the lubridate package (https://cran.r-project.org/web/packages/lubridate) because it provides almost all the functions you need to work with date and time objects.
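As a brief taste, the sketch below repeats some of the earlier parsing examples with lubridate. It assumes the package is installed and is skipped otherwise. The functions ymd(), mdy_hms(), year(), and month() come from lubridate, while the variable names are chosen for illustration.

```r
# Run only if lubridate is available on this machine
if (requireNamespace("lubridate", quietly = TRUE)) {
  # The function name spells out the order of the components,
  # so no template string is needed
  parsed_date <- lubridate::ymd("2015.07.25")
  parsed_time <- lubridate::mdy_hms("7/25/2015 09:30:25")

  print(parsed_date)
  print(parsed_time)

  # Accessor functions extract individual components
  print(lubridate::year(parsed_date))    # 2015
  print(lubridate::month(parsed_date))   # 7
}
```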
In the previous sections, you learned a number of basic functions to deal with strings and date/time objects. These functions are useful but much less flexible than regular expressions. You will learn this very powerful technique in the next section.