Formatting date/time

In data analysis, it is common to encounter date and time data. Perhaps the simplest functions related to dates are Sys.Date(), which returns the current date, and Sys.time(), which returns the current date and time.

As the book is being rendered, the date is printed as follows:

Sys.Date()
## [1] "2016-02-26"

And the time is:

Sys.time()
## [1] "2016-02-26 22:12:25 CST"

From the output, the date and time look like character vectors, but actually they are not:

current_date <- Sys.Date()
as.numeric(current_date)
## [1] 16857
current_time <- Sys.time()
as.numeric(current_time)
## [1] 1456495945

They are, in essence, numeric values relative to an origin, and they have special methods for date/time calculations. For a date, the numeric value is the number of days since 1970-01-01. For a time, it is the number of seconds since 1970-01-01 00:00:00 UTC.
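As a quick check of this interpretation, the two numbers printed above can be turned back into a date and a date/time. This is only a minimal sketch: the explicit origin and the tz = "UTC" arguments are choices made here so that the result does not depend on the local time zone, which is why the time differs from the CST output shown earlier.

```r
# Rebuild a Date from a day count relative to the origin
d <- as.Date(16857, origin = "1970-01-01")
format(d)
# "2016-02-26"

# Rebuild a date/time from a second count relative to the same origin
tm <- as.POSIXct(1456495945, origin = "1970-01-01", tz = "UTC")

# Display it in UTC (an assumption made here; the book's output uses CST)
format(tm, "%Y-%m-%d %H:%M:%S", tz = "UTC")
# "2016-02-26 14:12:25"
```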

Parsing text as date/time

We can create a date relative to a customized origin:

as.Date(1000, "1970-01-01")
## [1] "1972-09-27"

More often, however, we create dates and times from a standard text representation:

my_date <- as.Date("2016-02-10") 
my_date
## [1] "2016-02-10"

But if we can represent a date as a string such as 2016-02-10, why do we need to create a Date object as we just did? The reason is that a Date object has more features: we can do date math with it. Given a Date object, we can add or subtract a number of days to get a new date:

my_date + 3
## [1] "2016-02-13"
my_date + 80
## [1] "2016-04-30"
my_date - 65
## [1] "2015-12-07"

We can directly subtract a date from another to get the difference in number of days between two dates:

date1 <- as.Date("2014-09-28") 
date2 <- as.Date("2015-10-20") 
date2 - date1
## Time difference of 387 days

The output of date2 - date1 looks like a message, but it is actually a difftime object that wraps a numeric value. We can extract that value using as.numeric():

as.numeric(date2 - date1)
## [1] 387
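To make the nature of this result more concrete, the following sketch inspects the difference and recomputes it in another unit. The variable names gap and gap_weeks are introduced here for illustration only.

```r
date1 <- as.Date("2014-09-28")
date2 <- as.Date("2015-10-20")

gap <- date2 - date1
class(gap)   # "difftime"
units(gap)   # "days"

# difftime() lets us choose the unit explicitly instead of relying on the default
gap_weeks <- difftime(date2, date1, units = "weeks")

# Multiplying the number of weeks by 7 recovers the number of days
as.numeric(gap_weeks) * 7
```

Calling as.numeric() on a difftime drops the unit, so it is worth checking units() first when the unit matters.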

Time is similar, but there is no function called as.Time(). To create a date/time from a text representation, we can use either as.POSIXct() or as.POSIXlt(). These two functions produce different implementations of a date/time object under the POSIX standard. In the following example, we use as.POSIXlt() to create a date/time object:

my_time <- as.POSIXlt("2016-02-10 10:25:31") 
my_time
## [1] "2016-02-10 10:25:31 CST"
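The difference between the two implementations is easier to see side by side. In the sketch below, the names txt, ct, and lt are illustrative, and tz = "UTC" is an assumption made here so that the numbers are reproducible on any machine.

```r
txt <- "2016-02-10 10:25:31"

ct <- as.POSIXct(txt, tz = "UTC")   # compact: one number of seconds since the origin
lt <- as.POSIXlt(txt, tz = "UTC")   # list-like: separate components

as.numeric(ct)
# 1455099931

lt$hour          # 10
lt$min           # 25
lt$year + 1900   # years are stored relative to 1900, so this gives 2016
lt$mon + 1       # months are stored as 0-11, so this gives 2
```

As a rough guideline, POSIXct tends to suit storage in data frames, while POSIXlt is convenient when individual components are needed.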

This type of object also defines + and - for simple time calculations. Unlike a date object, it works in units of seconds rather than days:

my_time + 10
## [1] "2016-02-10 10:25:41 CST"
my_time + 12345
## [1] "2016-02-10 13:51:16 CST"
my_time - 1234567
## [1] "2016-01-27 03:29:24 CST"

Given a string representation of a date or time in data, we have to convert it to a date or date/time object before we can do calculations with it. Often, however, raw data is not in a format that as.Date() or as.POSIXlt() can recognize directly. In this case, we need to use a set of special letters as placeholders to represent certain parts of a date or time, just like we did with sprintf().

For example, for the input 2015.07.25, as.Date() will produce an error if no format string is supplied:

as.Date("2015.07.25")
## Error in charToDate(x): character string is not in a standard unambiguous format

We can use a format string as a template to tell as.Date() how to parse the string to a date:

as.Date("2015.07.25", format = "%Y.%m.%d")
## [1] "2015-07-25"
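The same idea extends to other layouts by rearranging the placeholders. The inputs below are made-up examples, not taken from any dataset, and d1 to d3 are illustrative names. The last call shows that a template that does not match the input yields NA rather than an error.

```r
# Day/month/four-digit year
d1 <- as.Date("25/07/2015", format = "%d/%m/%Y")

# Month-day-two-digit year (%y maps 15 to 2015)
d2 <- as.Date("07-25-15", format = "%m-%d-%y")

# Template with day and year swapped: parsing fails and gives NA
d3 <- as.Date("2015.07.25", format = "%d.%m.%Y")

format(d1)   # "2015-07-25"
format(d2)   # "2015-07-25"
is.na(d3)    # TRUE
```

Because a mismatched template fails silently, checking for NA after parsing a whole column is a sensible habit.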

Similarly, for a non-standard date/time string, we also need to specify a template string to tell as.POSIXlt() how to handle it:

as.POSIXlt("7/25/2015 09:30:25", format = "%m/%d/%Y %H:%M:%S")
## [1] "2015-07-25 09:30:25 CST"

An alternative (and more direct) function to convert a string to a date/time is strptime():

strptime("7/25/2015 09:30:25", "%m/%d/%Y %H:%M:%S")
## [1] "2015-07-25 09:30:25 CST"

In fact, as.POSIXlt() is only a wrapper of strptime() for character input, but strptime() always requires that you supply the format string, while as.POSIXlt() works for standard formats without a supplied template.
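A small sketch of this relationship, using a string in the standard format: as.POSIXlt() recognizes it without a template, while strptime() is given the template explicitly. The names txt, a, and b are illustrative, and tz = "UTC" is an assumption made here for reproducibility.

```r
txt <- "2015-07-25 09:30:25"

a <- as.POSIXlt(txt, tz = "UTC")                      # no template needed
b <- strptime(txt, "%Y-%m-%d %H:%M:%S", tz = "UTC")   # template required

class(b)                 # "POSIXlt" "POSIXt"
format(a) == format(b)   # TRUE
```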

Just like numeric vectors, dates and date/times are vectors too. You can supply a character vector to as.Date() and get a vector of dates:

as.Date(c("2015-05-01", "2016-02-12"))
## [1] "2015-05-01" "2016-02-12"

The math is also vectorized. In the following code, we will add some consecutive integers to the date, and we get consecutive dates as expected:

as.Date("2015-01-01") + 0:2
## [1] "2015-01-01" "2015-01-02" "2015-01-03"

The same feature also applies to date/time objects:

strptime("7/25/2015 09:30:25", "%m/%d/%Y %H:%M:%S") + 1:3
## [1] "2015-07-25 09:30:26 CST" "2015-07-25 09:30:27 CST"
## [3] "2015-07-25 09:30:28 CST"
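Adding integers only produces steps of whole days or seconds. For other regular steps, one option is seq(), which has a method for dates. Comparisons are vectorized as well. This is a minimal sketch; monthly, dates, and is_recent are names chosen here for illustration.

```r
# Three consecutive month starts
monthly <- seq(as.Date("2015-01-01"), by = "month", length.out = 3)
format(monthly)
# "2015-01-01" "2015-02-01" "2015-03-01"

# Element-wise comparison against a single date
dates <- as.Date(c("2015-05-01", "2016-02-12"))
is_recent <- dates > as.Date("2016-01-01")
is_recent
# FALSE TRUE
```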

Sometimes, the data uses integer representations of date and time, which makes parsing trickier. For example, to parse 20150610, we will run the following code:

as.Date("20150610", format = "%Y%m%d")
## [1] "2015-06-10"

To parse 20150610093215, we can specify the template to describe such a format:

strptime("20150610093215", "%Y%m%d%H%M%S")
## [1] "2015-06-10 09:32:15 CST"

A trickier example is to parse the date/time in the following data frame:

datetimes <- data.frame(
  date = c(20150601, 20150603),
  time = c(92325, 150621))

If we use paste0() on the columns of datetimes and directly call strptime() with the template used in the previous example, we will get a missing value, which indicates that the first element is not consistent with the format:

dt_text <- paste0(datetimes$date, datetimes$time)
dt_text
## [1] "2015060192325" "20150603150621"
strptime(dt_text, "%Y%m%d%H%M%S")
## [1] NA "2015-06-03 15:06:21 CST"

The problem lies in 92325, which should be 092325. We need to use sprintf() to make sure a leading zero is present when necessary:

dt_text2 <- paste0(datetimes$date, sprintf("%06d", datetimes$time))
dt_text2
## [1] "20150601092325" "20150603150621"
strptime(dt_text2, "%Y%m%d%H%M%S")
## [1] "2015-06-01 09:23:25 CST" "2015-06-03 15:06:21 CST"

Finally, the conversion works as expected.
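Padding with sprintf() is not the only way to handle such data. As an alternative sketch, not the approach used in this chapter, the components can be extracted with integer arithmetic and passed to ISOdatetime(), which avoids the text round trip altogether. The name parsed and tz = "UTC" are assumptions made here for illustration.

```r
datetimes <- data.frame(
  date = c(20150601, 20150603),
  time = c(92325, 150621))

# %/% is integer division and %% is the remainder
parsed <- with(datetimes, ISOdatetime(
  year  = date %/% 10000,
  month = (date %/% 100) %% 100,
  day   = date %% 100,
  hour  = time %/% 10000,
  min   = (time %/% 100) %% 100,
  sec   = time %% 100,
  tz    = "UTC"))

format(parsed, "%Y-%m-%d %H:%M:%S")
# "2015-06-01 09:23:25" "2015-06-03 15:06:21"
```

The arithmetic version makes the structure of the integers explicit, at the cost of being more verbose than a single format template.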

Formatting date/time to strings

In the previous section, you learned how to convert strings to date and date/time objects. In this section, you will learn the opposite: converting date and date/time objects back to strings according to a specific template.

Once a date object is created, every time we print it, it is always represented in the standard format:

my_date
## [1] "2016-02-10"

We can convert the date to a string in a standard representation with as.character():

date_text <- as.character(my_date) 
date_text
## [1] "2016-02-10"

From the output, date_text looks the same as my_date, but it is now merely plain text and no longer supports date calculations:

date_text + 1
## Error in date_text + 1: non-numeric argument to binary operator

Sometimes, we need to format the date in a non-standard way:

as.character(my_date, format = "%Y.%m.%d")
## [1] "2016.02.10"

In fact, as.character() calls format() behind the scenes. We get exactly the same result by calling format() directly, which is recommended in most cases:

format(my_date, "%Y.%m.%d")
## [1] "2016.02.10"

The same applies to a date/time object. We can further customize the template to include literal text in addition to the placeholders:

my_time
## [1] "2016-02-10 10:25:31 CST"
format(my_time, "date: %Y-%m-%d, time: %H:%M:%S")
## [1] "date: 2016-02-10, time: 10:25:31"

Note

There are many more format placeholders than we have mentioned. Read the documentation by typing ?strptime for detailed information.
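As a small taste of those additional placeholders, the following sketch uses a few that do not depend on the locale. The values d and tm are illustrative, and tz = "UTC" is an assumption made here. Placeholders such as %A (weekday name) and %B (month name) also exist, but their output varies with the locale, so no output is shown for them.

```r
d <- as.Date("2016-02-10")
format(d, "%d/%m/%y")   # two-digit year: "10/02/16"
format(d, "%j")         # day of the year: "041"

tm <- as.POSIXct("2016-02-10 22:25:31", tz = "UTC")
format(tm, "%H:%M")            # "22:25"
format(tm, "%Y%m%d%H%M%S")     # compact form: "20160210222531"

# Locale-dependent: weekday and month names
format(d, "%A, %d %B %Y")
```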

There are a number of packages to make dealing with date and time much easier. I recommend the lubridate package (https://cran.r-project.org/web/packages/lubridate) because it provides almost all the functions you need to work with date and time objects.
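To give a rough idea of what lubridate looks like in use, the sketch below repeats two of the earlier parsing examples. It assumes the package is installed and is skipped otherwise. ymd(), mdy_hms(), and days() are lubridate functions named after the order of the components in the input.

```r
if (requireNamespace("lubridate", quietly = TRUE)) {
  # Year-month-day, with the separators detected automatically
  print(lubridate::ymd("20150610"))

  # Month/day/year followed by hours:minutes:seconds (UTC by default)
  print(lubridate::mdy_hms("7/25/2015 09:30:25"))

  # Date arithmetic with an explicit period
  print(lubridate::ymd("2015-01-31") + lubridate::days(1))
}
```

The parser functions remove the need to write a format template for common layouts, which is the main convenience over as.Date() and strptime().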

In the previous sections, you learned a number of basic functions to deal with strings and date/time objects. These functions are useful but much less flexible than regular expressions. You will learn this very powerful technique in the next section.
