Chapter 2: Datetime Variables in SAS

2.1 Datetime Variables

2.2 Output Formats

2.3 Importing Datetime Variables

2.4 Handling Datetime Variables

2.5 Time Series Data Sets

2.1 Datetime Variables

When working with time series data, it is important to completely control the actual points in time for the observations. Our calendar is defined in a very messy, unsystematic way, but graphs and output from analyses should present dates and times in a layout that is easily understood by everyone. This chapter gives a brief presentation of the many facilities (formats, functions, and so on) that SAS provides in order to help you handle date-time observations in the correct way.

In a SAS data set, a date is coded as the number of days since January 1, 1960. The number zero (0) represents January 1, 1960, the number 1 represents January 2, and 365 represents the date December 31,1960 (because the year 1960 was a leap year). New Year’s Eve, 2007, is represented by the number 17531. Dates before 1960 are expressed by negative numbers; for example, -1 represents December 31,1959. Variables that include both a date and the precise time of day are expressed as the number of seconds since the exact turn of the year 1959/1960. Even fractions of seconds are allowed.

Such numbers are meaningless at first glance, but thanks to the rich variety of formats provided by SAS, they can be expressed as normal dates in many ways and in many languages. These facilities allow for importing, handling, and exporting (for example, printing) SAS datetime variables. In this book, only a few datetime formats are applied. For more information, see Morgan (2006).

2.2 Output Formats

Through output formats, SAS results are presented in a readable way. A number representing an amount in dollars is printed including a dollar sign ($), and percentages are printed using a percent sign (%). The numbers of digits and other items in presentations are also handled by a rich variety of formats. The cryptic definition of dates and times as the number of days or seconds since New Year’s Eve 1959 are similarly translated to meaningful dates and times using special output formats. Such output formats change only how numbers are printed; the underlying number is unaffected.

When dates are printed in a monthly format, the actual date is suppressed. This means that all days in January 2010 are written as JAN10. But suppose some day of the month has to be included in the SAS data set variable, perhaps as the 15th of each month. Easy ways to construct date variables are presented in Sections 2.3 and 2.4.

Monthly data could be formatted by the monyy7. format. Note that all formats in SAS end with a period. This gives an American English format to the month, with a three-letter abbreviation of the name of the month and a four-digit number for the year, such as JAN2010, which is a useful compressed form for applications in graphical presentations. If the number 7 is changed to 5, the format monyy5. renders only the last two digits of the year, such as JAN10 for January 2010. Usually, this provides a clear notation, but for series that consist of hundreds of years, it is, of course, ambiguous.

Often the format is coded in the very construction of the data set, which means that the user does not even have to think about the formatting of date variables. But formats could also be applied in procedure steps such as in the simple application shown in Program 2.1. This program merely prints the observations in the data set of overnight stays by US citizens at hotels in Denmark. This data set is used in later chapters of this book.

Program 2.1 Inclusion of a format in a procedure call

PROC PRINT data=sasts.hotel;
var date;
format date monyy5.;
run;

In this book, the formats are coded in the data sets. A specification of a format is not necessary in procedure calls, so you don’t have to worry about formats. Again, see Morgan (2006) for more details.

A useful format that includes the date is mmddyy10., which presents the date in an American English way (12/31/2007). If the number of places in the format (the number 10) is reduced, the text becomes shorter. The shortest version is mmddyy6., which removes the slashes and the first two digits for the year, so that it is printed as 123107.These formats are illustrated by Program 2.2.

Program 2.2 Inclusion of a detailed date format in a procedure call

PROC PRINT data=sasts.arctic_ice_daily;
var date;
format date mmddyy6.;
run;

SAS also includes formats for presenting dates in styles other than American English. A sequence of formats with the prefix EUR presents dates in the European style. European SAS installations automatically invoke options like dflang=German or dflang=Danish, which write dates in a German or Danish style. Users who often switch between European languages could use this facility to apply the same format with the prefix EUR and change the DFLANG option to the relevant language. The formats corresponding to monyy5. and monyy7. are eurdfmy5. and eurdfmy7..These formats are illustrated by Program 2.3.

Program 2.3 Inclusion of European date formats in procedure calls

options dflang=German;
PROC PRINT data=sasts.hotel;
var date;
format date eurdfmy7.;
run;
options dflang=Danish;
PROC PRINT data=sasts.hotel;
var date;
format date eurdfmy5.;
run;

The output from this program writes March 2010 as Mrz2010 for German and mar10 for Danish (in Danish the name of a month is not capitalized).

Another possibility is to apply specific formats that are made for the language in question. These formats typically have a prefix like DEU (for German) and DAN (for Danish). This, however, means that the relevant language should be specified every time a format for dates in local language is applied. Program 2.4 renders exactly the same output as Program 2.3.

Program 2.4 Inclusion of country-specific date formats

PROC PRINT data=sasts.hotel;
var date;
format date deudfmy7.;
run;
PROC PRINT data=sasts.hotel;
var date;
format date dandfmy5.;
run;

Date values in local languages could also be obtained by using the formats with prefix EUR. For example, the format eurdfdd10., which writes the day before the month. New Year’s Eve 2007 becomes 31.12.2007, which is standard for most European languages.

Another possibility is to use a series of formats with prefix NL. These formats are determined by the value of the LOCALE option. One example, shown in Program 2.5, is the format nldate12., which in a French installation renders 31/12/07. Just as for the EUR prefix, the idea is to make it easy to change among languages.

Program 2.5 Application of automatically generated country-specific date formats

options locale=French;
PROC PRINT data=sasts.arctic_ice_daily;
var date;
format date nldate12.;
run;

The time part of a datetime variable is by definition the number of seconds since New Year’s 1959/1960. The time part alone could be written by the format tod12., which abbreviates Time Of Day. Similarly, information for both the date and the time of day in hours, minutes, and seconds could be written by the format datetime20. The numbers could be varied according to the degree of abbreviation needed in the design of the output. Program 2.6 shows the time with no date and the combination of a date and the time for the same datetime variable in the original data set. Also, the number of seconds, which is the value actually applied in the code, is printed as the variable d3 with no assigned format; see Output 2.1. If the date is needed for a variable that is specified as a datetime variable with no reference to the time, it is not enough to apply a specific format. Instead, it should be extracted by the function datepart, which converts the number of seconds since 1960 to the number of days since 1960.

Program 2.6 An application of various datetime formats

data speed1;
set sasts.speed(obs=10);
d1=dt;d2=dt;d3=dt;
run;
PROC PRINT data=speed1 noobs;
var d1-d3;
format d1 tod12.;
format d2 datetime20.;
run;

Output 2.1 A print of a datetime variable using different formats

Output 2.1 A print of a datetime variable using different formats

Formats also allow for specifications of the time, including decimal fractions of seconds. An example is the data set of movements of a baby for which the movement of the left arm is recorded 60 times a second. A short portion of the data set is printed by Program 2.7, and the relevant portion of the output is shown in Output 2.2. The date is irrelevant in this example. By default, the date is set to zero, which in SAS terminology means that the datetime20.3 format will write the date as January 1. 1960. The fractional part of the seconds is written using three decimal places, as specified by the format. The data set is huge, so only five observations are printed by applying the firstobs=12345 obs=12349 options in the line of the procedure call.

Program 2.7 Printing using both a time format and a datetime format

PROC PRINT data=sasts.baby (firstobs=12345 obs=12349);
var time;
format time tod12.3;
run;
PROC PRINT data=sasts.baby (firstobs=12345 obs=12349);
var time;
format time datetime20.3;
run;

Output 2.2 Print output from Program 2.7

Output 2.2 Print output from Program 2.7

2.3 Importing Datetime Variables

When data is imported from software products other than SAS by using the various import engines in SAS, most datetime variables are recognized and converted to proper SAS datetime variables with convenient formats. Often the user needs to do nothing because the conversion is done automatically. These possibilities are subject to constant changes from version to version of SAS due to changes in the other software products. This means that any book on the subject very quickly becomes obsolete; you should refer to the most current SAS Help for more information. The reading of date variables is technically done by the informat ANYDTDTE, which tries to convert the actual text and numbers into dates in a meaningful way. The number 12252001 could in this way easily be recognized as Christmas Day 2001 and converted to a date variable in the way that SAS expresses dates (the number of days since January 1, 1960).

Some data sources present dates in styles that are not immediately recognized by SAS. One possibility is to consider other settings for the DATESTYLE option. This option affects the order in which the day and month part is written. In American English installations, the default is DATESTYLE=MDY, and DATESTYLE=DMY is used in Europe. If a user with a European installation of SAS attempts to read American date variables, the ANYDTDTE format does not work without specific use of the option DATESTYLE=MDY.

Abbreviation of years written with four digits to only two digits is ambiguous. The option yearcutoff tells how two digits are read as years. The value option yearcutoff=1932 specifies that two-digit numbers 32 or greater indicate years in the twentieth century (for example, 41 indicates 1941), but two-digit numbers up to and including 31 indicate years after year 2000 (18 indicates 2018). The default is yearcutoff=1920.

In Program 2.8, the ANYDTDTE informat anydtdte20. is applied for three lines of input that obviously represent date values. The last line specifies only the month December with no specification of the date, so the date is set to December 1; see Output 2.3.

Program 2.8 Input of a date variable using an automatic input format

data d3;
input date1 anydtdte20.;
format date1 date10.;
datalines;
12022005
122005
dec05
;
run;
PROC PRINT;
run;

Output 2.3 Result of the automatic input of a date variable

Output 2.3 Result of the automatic input of a date variable

If the same code as in Program 2.8 is applied in a European installation that sets the option datestyle=dmy, the result is different. The first line results in February 12, instead of December 2.

Output 2.4 Result of the automatic input of a date variable in a European installation

Output 2.4 Result of the automatic input of a date variable in a European installation

The name ANYDTDTE is short for ANY DateTime DaTE because it extracts the date part of the import even if time information is also given. Similarly, the ANYDTDTM (ANYDateTime DateTiMe) informat extracts the datetime part, and the ANYDTTME (ANY DateTime TiME) informat extracts only the time part and neglects the date.

An example of the flexible ANYDTDTM informat (using the option datestyle=mdy as for American installations) including the time of the day, is shown in Program 2.9, which results in Output 2.5.

Program 2.9 Input of a datetime variable using a flexible input format

data s1;
input dt ANYDTDTM25.;
format dt datetime16.10;
datalines;
06-06-02:15:25:51
06/06/02/15/25/54
11-06-02,15:25:57
06.11.02,15.25.59
06062002.15.25.59
;
run;
PROC PRINT;
run;

Output 2.5 Output from Program 2.8

Output 2.5 Output from Program 2.8

When date variables are imported by some of the automatic channels provided by SAS, the variables are defined with a standard format convenient for American English. It is easy to change this datetime format to a local language style or to a more compressed form, such as in situations where the values are needed as tick marks on a graphical presentation and the physical space does not allow for long texts.

In some situations, you have to create the datetime variable in the beginning and specify the format in a sequence of DATA steps, as described in the next section.

2.4 Handling Datetime Variables

Many functions are provided by SAS to manipulate datetime values. If the variable date represents dates, then the day, the month, the quarter, and the year are extracted by the functions day, month, qtr, and year. Similarly hours, minutes, and seconds could be extracted by SAS functions with obvious names. The date of a datetime variable is extracted by the function datepart. An example is the code in Program 2.10, which constructs a plot, shown in Figure 2.1, of quarterly numbers for the value of e-commerce in the US. It does this in a way that makes the seasonality obvious and also shows the increasing importance of e-commerce. This example is continued in later chapters of this book.

Program 2.10 Plotting quarterly data by year

data a;
set sasts.E_commerce;
yy=year(date);
q=qtr(date);
run;
PROC SGPLOT data=a;
series x=q y=E_commerce/markers group=yy;
xaxis values=(1 to 4 by 1);
where yy<2010 and yy>2005;
run;
quit;

Figure 2.1 A quarterly plot of US E-Commerce for four years

Figure 2.1 A quarterly plot of US E-Commerce for four years

Another example is the extraction of the day, the month, and the year from a reading of the counter position of a copy machine. (See Program 2.11.) This scenario is an example in Chapters 3 and 4. Part of the printed output is shown in Output 2.6.

Program 2.11 Extraction of the day, the month, and the year from a date variable using SAS functions

data a;
set sasts.copy_machine;
day_part=day(date);
month_part=month(date);
year_part=year(date);
run;
proc print;
run;

Output 2.6 Part of the printed output from Program 2.11

Output 2.6 Part of the printed output from Program 2.11

Datetime variables could be constructed by special SAS functions such as the function mdy, which produces a date value from information about the day, the month, and the year. This kind of function is very useful because many data sources present the day, the month, and the year in nonstandard ways and you have to construct the datetime variables by hand. An example is shown in Program 2.12, where the variable date1 specifies a single date as the number 12202005. From this number, the day, the month, and the year are extracted by mathematical functions and the date variable can be constructed by the mdy function. The result is then printed in Output 2.7 in both European and American styles. In this example, it easy to see that the informat anydtdte20. could also do the job when the option datestyle=dmy is applied.

Program 2.12 Manual coding of a date variable

data d1;
input date1;
datalines;
12202005
;
run;
data d2;
set d1;
format date ddmmyy10.;
format date_letters WORDDATE18.;
year=date1-floor(date1/10000)*10000;
day=(date1-year)/10000-floor((date1-year)/1000000)*100;
mon=(date1-day*10000-year)/1000000;
date=mdy(mon,day,year);
date_letters=date;
PROC PRINT;
run;

Output 2.7 Writing the same date in many ways

Output 2.7 Writing the same date in many ways

2.5 Time Series Data Sets

Time series data is usually presented as a data set with a unique variable to represent a point in time, such as a date. Moreover, the data set is ordered according to this time index. If this is not the case, it is easy to obtain the correct order using PROC SORT. Most of the SAS procedures used in this book in some way assume that this is the case even if some procedures are able to transform the data before the prescribed analysis, such as by logarithmic transformations. But because other procedures take for granted that the data set is well structured, this section presents some hints on how to change a data set into the correct form.

Many time series are given as stock values. An example used in Section 3.2 is the observed number from a counter on a copy machine where the interesting number−the number of copies actually made by the machine−has to be derived as the difference between two consecutive observations. Another example is a series of a stock index where the return earned by an investment is again defined as the difference between two index values.

Such differences could be found by the dif function, which takes the difference between the actual observation and the observation before it. This result is meaningful only when the data set is ordered according to time, as for the time series of UK unemployment in Program 2.13. The code in Program 2.13 shows an example of the lag function, which returns the value for the previous observation. These SAS functions could be extended by including numbers. For example, for the monthly data, the function lag12 returns the observation that is 12 months (one year) earlier and dif12 takes the difference between the present value and the value one year before.

Program 2.13 Transforming a time series using differences and lags

data changes;
set sasts.UK_unemployment;
monthly_change=dif(unemployment);
yearly_change=dif12(unemployment);
previous_month=lag(unemployment);
previous_year=lag12(unemployment);
run;

For a date variable, these functions return the number of days because the underlying number of a date is the number of days since 1960. If you need the number of months between two dates, you can use specific functions. The function intck returns the number of weeks, months, and so on, between two points in time. For example, if the interval is specified as ‘month’, the function returns the number of shifts of months, as in Program 2.14. The results are shown in Output 2.8.

Program 2.14 Calculation of the time span between two dates

data a;
set sasts.copy_machine;
days=dif(date);
week=intck( ‘week’, lag(date),date);
month=intck( ‘month’, lag(date),date);
run;
PROC PRINT;
run;

Output 2.8 Part of the printed output from Program 2.14

Output 2.8 Part of the printed output from Program 2.14

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset