Chapter 3: Reading Raw Data from External Files
3.2 Reading Data Values Separated by Blanks
3.3 Specifying Missing Values with List Input
3.4 Reading Data Values Separated by Commas (CSV Files)
3.5 Using an alternative Method to Specify an External File
3.6 Reading Data Values Separated by Delimiters Other Than Blanks or Commas
3.7 Placing Data Lines Directly in Your Program (the DATALINES Statement)
3.8 Specifying INFILE Options with the DATALINES Statement
3.9 Reading Raw Data from Fixed Columns—Method 1: Column Input
3.10 Reading Raw Data from Fixed Columns—Method 2: Formatted Input
3.11 Using a FORMAT Statement in a DATA Step versus in a Procedure
3.12 Using Informats with List Input
3.13 Supplying an INFORMAT Statement with List Input
3.14 Using List Input with Embedded Delimiters
One way to provide SAS with data is to have SAS read the data from a text file and create a SAS data set. Some SAS users already have data in SAS data sets. If this is your case, you can skip this chapter!
SAS has different ways of reading data from text files and, depending on how the data values are arranged, you can choose an input method that is most convenient. You have already seen one method, called list input, that was used in the introductory program in Chapter 2. This chapter discusses list input as well as two other methods that are appropriate for data arranged in fixed columns.
Some of the more advanced aspects of reading raw data are covered in Chapter 21.
One of the easiest methods of reading data is called list input. By default, SAS assumes that data values are separated by one or more blanks.
Task: you have a raw data file called Mydata.txt stored in your C:ookslearning folder. It is shown here:
M 50 68 155
F 23 60 101
M 65 72 220
F 35 65 133
M 15 71 166
These values represent gender, age, height (in inches), and weight (in pounds). Notice that this file meets the criteria for list input—each data value is separated from the next by one or more blanks. Program 3.1 reads data from this file and creates a SAS data set.
Program 3.1: Demonstrating List Input with Blanks as Delimiters
data Demographics;
infile 'C:ookslearningMydata.txt';
input Gender $ Age Height Weight;
run;
title "Listing of data set Demographics";
proc print data=Demographics;
run;
The INFILE statement tells SAS where to find the data. The INPUT statement contains the variable names you want to associate with each data value. The order of these names matches the order of the values in the file. The dollar sign ($) following Gender tells SAS that Gender is a character variable.
To see that this program works properly, we added a PROC PRINT step to list the observations in the SAS data set (details on PROC PRINT can be found in Chapter 14).
Here is the output:
Figure 3.1: Output from Program 3.1
Each column represents a variable in the data set and each row represents the data on a single person (an observation). The first column, labeled Obs (short for observation), is generated by PROC PRINT. The values in this column go from 1 to the number of observations in the data set. The order of rows in this list reflects the order that the observations were read from the input data and created in the DATA step. If you change the order of the observations or add new observations to the data set, the numbers in the Obs column may change.
The order of the variables (columns) reflects the order that the variables were encountered in the DATA step.
What would happen if you didn't have a value for Age for the second subject in your file? Your data file would look like this (with a missing value in line 2):
M 50 68 155
F 60 101
M 65 72 220
F 35 65 133
M 15 71 166
It should be obvious that this will cause errors. SAS reads the value 60 for the Age and 101 for the Height. Because there are no more values on the second line of data, SAS goes to the next line and attempts to read the M as a Height value (and causes a data error message in the log). Clearly, you need a way to tell SAS that there is a missing value for Age in the second line. One way to do this is to use a period to represent the missing value, like this in Mydata.txt:
M 50 68 155
F . 60 101
M 65 72 220
F 35 65 133
M 15 71 166
You must separate the period from the values around it by one or more spaces because a space is the default delimiter character. SAS now assigns a missing value for Age for the second subject. By the way, a missing value is not the same as a 0. This is important because if you asked SAS to compute the mean (average) Age for all the subjects, it would average only the non-missing values.
You can use a period to represent a missing character or numeric value when you use list input.
A common way to store data on Windows and UNIX platforms is in comma-separated values (CSV) files. These files use commas instead of blanks as data delimiters. They may or may not enclose character values in quotes. The file Mydata.csv contains the same values as the file Mydata.txt. It is shown here.
File C:ookslearningMydata.csv
"M",50,68,155
"F",23,60,101
"M",65,72,220
"F",35,65,133
"M",15,71,166
Program 3.2 (below) reads this file and creates a new data set called Demographics.
Program 3.2: Reading Data From a Comma-Separated Values (Csv) File
data Demographics;
infile 'C:ookslearningMydata.csv' dsd;
input Gender $ Age Height Weight;
run;
Notice the INFILE statement in this example. The DSD (delimiter-sensitive data) following the file name is an INFILE option. It performs several functions. First, it changes the default delimiter from a blank to a comma. Next, if there are two delimiters in a row, it assumes there is a missing value between. Finally, if character values are placed in quotes (single or double quotes), the quotes are stripped from the value. That’s a lot of mileage for just three letters!
The INPUT statement is identical to Program 3.1 as is the resulting SAS data set.
The INFILE statement in Program 3.2 used the actual file name (placed in quotes) to specify your raw data file. An alternative method is to use a separate FILENAME statement to identify the file and to use this reference (called a fileref) in your INFILE statement instead of the actual file name. Program 3.3 is identical to Program 3.2 except for the way it references the external file.
Program 3.3: Using a Filename Statement to Identify an External File
filename Preston 'C:ookslearningMydata.csv';
data Demographics;
infile Preston dsd;
input Gender $ Age Height Weight;
run;
The name following the FILENAME statement (Preston, in this example) is an alias for the actual file name. For certain operating environments, the fileref can be created outside of SAS (for example, in a DD statement in JCL on a mainframe). Notice also that the fileref (Preston) in the INFILE statement is not placed in quotes. This is how SAS knows that Preston is not the name of a file but rather a reference to it.
Remember that the default data delimiter for list input is a blank. Using the INFILE option DSD changes the default to a comma. What if you have a file with other delimiters, such as tabs or colons? No problem! You only need to add the DLM= option to the INFILE statement. For example, the following lines of data use colons as delimiters.
Example of a file using colon delimiters:
M:50:68:155
F:23:60:101
M:65:72:220
F:35:65:133
M:15:71:166
To read this file, you could use this INFILE statement:
infile 'file-description' dlm=':';
You can spell out the name of the DELIMITER= option instead of using the abbreviation DLM= if you like, for example:
infile 'file-description' delimiter=':';
You can use the DSD and DLM= options together. This combination of options performs all the actions requested by the DSD option (see Section 3.4) but overrides the default DSD delimiter (comma) with a delimiter of your choice.
infile 'file-description' dsd dlm=':';
Tabs present a particularly interesting problem. What character do you place between the quotes on the DLM= option? You cannot click the TAB key. Instead, you need to represent the tab by its hexadecimal equivalent. For ASCII files (the coding method used on Windows platforms and UNIX operating systems—it stands for American Standard Code for Information Interchange), you would use the following:
infile 'file-description' dlm='09'x;
For EBCDIC files (used on most mainframe computers—it stands for Extended Binary-Coded Decimal Interchange Code), you would use the following statement:
infile 'file-description' dlm='05'x;
Note: These two values are called hexadecimal constants. If you know (or look up) the hexadecimal value of any character, you can represent it in a SAS statement by placing the hexadecimal value in single or double quotes and following the value immediately (no space) by an upper- or lowercase x.
Suppose you want to write a short test program in SAS. Instead of having to place your data in an external file, you can place your lines of data directly in your SAS program by using a DATALINES statement. For example, if you want to read data from the text file Mydata.txt (blank delimited data with values for Gender, Age, Height, and Weight), but you don’t want to go to the trouble of writing the external file, you could use Program 3.4.
Program 3.4: Demonstrating the DATALINES Statement
data demographic;
input Gender $ Age Height Weight;
datalines;
M 50 68 155
F 23 60 101
M 65 72 220
F 35 65 133
M 15 71 166
;
As you can see from this example, the INFILE statement was removed and a DATALINES statement was added. Following DATALINES are your lines of data. Finally, a semicolon is used to end the DATA step. (Note: You may either use a single semicolon or a RUN statement to end the DATA step.) The lines of data must be the last element in the DATA step—any other statements must come before the lines of data.
While you would probably not use DATALINES in a real application, it is extremely useful when you want to write short test programs.
As a historical note, the DATALINES statement used to be called the CARDS statement. If you don’t know what a computer card is, ask an old person. By the way, you can still use the word CARDS in place of DATALINES if you want.
What if you use DATALINES and want to use one or more of the INFILE options, such as DLM= or DSD? You can use many of the INFILE options with DATALINES by using a reserved file reference called DATALINES. For example, if you wanted to run Program 3.2 without an external data file, you could use Program 3.5.
Program 3.5: Using INFILE Options with DATALINES
data Demographics;
infile datalines dsd;
input Gender $ Age Height Weight;
datalines;
"M",50,68,155
"F",23,60,101
"M",65,72,220
"F",35,65,133
"M",15,71,166
;
Many raw data files store specific information in fixed columns. This has several advantages over data values separated by delimiters. First, you don’t have to worry about missing values. If you do not have a value, you can leave the appropriate columns blank. Next, when you write your INPUT statement, you can choose which variables to read and in what order to read them.
The simplest method for reading data in fixed columns is called column input. This method of input can read character data and standard numeric values. By standard numeric values, we mean positive or negative numbers as well as numbers in exponential form (for example, 3.4E3 means 3.4 times 10 to the 3rd power). This form of input cannot handle values with commas or dollar signs. You can only read dates as character values with this form of input as well. Now for an example.
You have a raw data file called Bank.txt in a folder called C:ookslearning on your Windows computer. A data description for this file follows.
Variable |
Description |
Starting Column |
Ending Column |
Data Type |
Subj |
Subject Number |
1 |
3 |
Character |
DOB |
Date of Birth |
4 |
13 |
Character |
Gender |
Gender |
14 |
14 |
Character |
Balance |
Bank Account Balance |
15 |
21 |
Numeric |
File C:ookslearningBank.txt
1 2
1234567890123456789012345 ß Columns (not part of the file)
-------------------------
00110/21/1955M 1145
00211/18/2001F 18722
00305/07/1944M 123.45
00407/25/1945F -12345
Program 3.6 is a SAS program that reads data values from this file.
Program 3.6: Demonstrating Column Input
data Financial;
infile 'C:ookslearningBank.txt';
input Subj $ 1-3
DOB $ 4-13
Gender $ 14
Balance 15-21;
run;
title "Listing of Financial";
proc print data=Financial;
run;
You specify a variable name, a dollar sign if the variable is a character value, the starting column, and the ending column (if the value takes more than one column). In this program, the number of columns you specify for each character variable determines the number of bytes SAS uses to store these values; for numeric variables, SAS will always use 8 bytes to store these values, regardless of how many columns you specify in your INPUT statement. (There are advanced techniques to change the storage length for numeric variables—and these techniques should be used only when you need to save storage space and you understand the possible problems that can result.)
Notice that this program uses a separate line for each variable. This is not necessary, but it makes the program more readable. You could have written the program like this:
data Financial;
infile 'C:ookslearningBank.txt';
input Subj $ 1-3 DOB $ 4-13 Gender $ 14 Balance 15-21;
run;
It just doesn’t look as nice and is harder to read. This is a good time to recommend that you get into good habits in writing your SAS programs. It is amazing how much easier it is to read and understand a program where some care is taken in its appearance.
The resulting listing is:
Figure 3.2: Output from Program 3.6
It is important to remember that the date of birth (DOB) is a character value in this data set. To create a more useful, numerical SAS date, you need to use formatted input, the next type of input to be described.
Formatted input also reads data from fixed columns. It can read both character and standard numeric data as well as nonstandard numerical values, such as numbers with dollar signs and commas, and dates in a variety of formats. Formatted input is the most common and powerful of all the input methods. Anytime you have nonstandard data in fixed columns, you should consider using formatted input to read the file.
Let’s start with the same raw data file (Bank.txt) that was used in Program 3.6. First examine the program, and then read the explanation.
Program 3.7: Demonstrating Formatted Input
data Financial;
infile 'C:ookslearningBank.txt';
input @1 Subj $3.
@4 DOB mmddyy10.
@14 Gender $1.
@15 Balance 7.;
run;
title "Listing of Financial";
proc print data=Financial;
run;
The @ (at) signs in the INPUT statement are called column pointers—and they do just that. For example, @4 says to SAS, go to column 4. Following the variable names are SAS informats. Informats are built-in instructions that tell SAS how to read a data value. The choice of which informat to use is dictated by the data.
Two of the most basic informats are w.d and $w. The w.d informat reads standard numeric values. The w tells SAS how many columns to read. The optional d tells SAS that there is an implied decimal point in the value. For example, if you have the number 123 and you read it with a 3.0 informat, SAS stores the value 123.0. If you read the same number with a 3.1 informat, SAS stores the value 12.3. If the number you are reading already has a decimal point in it (this counts as one of the columns to be read), SAS ignores the d portion of the informat. So, if you read the value 1.23 with a 4.1 informat, SAS stores a value of 1.23.
The $w. informat tells SAS to read w columns of character data. In this program, Subj is read as character data and takes up three columns; values of Gender take up a single column.
Now it’s time to read the date. The mmddyy10. informat tells SAS that the date you are reading is in the mm/dd/yyyy form. SAS reads the date and converts the value into a SAS date. SAS stores dates as numeric values equal to the number of days from January 1, 1960.
If you read the value 01/01/1960 with the mmddyy10. informat, SAS stores a value of 0.
The date 01/02/1960 read with the same informat would result in a value of 1, and so forth. SAS knows all about leap years and correctly converts any date from 1582 to way into the future (1582 is the year Pope Gregory started the Gregorian calendar—dates before this are not defined in SAS).
So, getting back to our example, since date values are in the mm/dd/yyyy form and start in column 4, you use @4 to move the column pointer to column 4 and the mmddyy10. informat to tell SAS to read the next 10 columns as a date in this form. SAS then computes the number of days from January 1, 1960, corresponding to each of the date values. Let’s see the results:
Figure 3.3: Output from Program 3.7
Well, the dates (variable DOB) look rather strange. What you are seeing are the actual values SAS is storing for each DOB (the number of days from January 1, 1960).
You need a way to display these dates in a more traditional form, such as the way the dates were displayed in the raw data file (10/21/1955, in the first observation) or in some other form (such as 10Oct1955). While you are at it, why not add dollar signs and commas to the Balance figures?
You can accomplish both of these tasks by associating a format with each of these two variables. There are many built-in formats in SAS that allow you to display dates and financial values in easily readable ways. You associate these formats with the appropriate variables in a FORMAT statement. Program 3.8 shows how to add a FORMAT statement to PROC PRINT.
Program 3.8: Demonstrating a FORMAT Statement
title "Listing of Financial";
proc print data=Financial;
format DOB mmddyy10.
Balance dollar11.2;
run;
Here you are using the mmddyy10. format to print the DOB values and the dollar11.2 format to print the Balance values. Notice the period in each of the formats. All SAS formats need to end either in a period or in a period followed by a number. This is how SAS distinguishes between the names of variables or data sets and the names of formats. The 11.2 following the dollar format says to allow up to 11 columns to print the Balance values (including the dollar sign, the decimal point, and possibly a comma or a minus sign). The 2 following the period says to include two decimal places after the decimal point. Here is the revised output:
Figure 3.4: Output from Program 3.8
It is important to remember that the formats only affect the way these values appear in printed output—the internal values are not changed.
To be sure that you understand what formats do, let’s repeat Program 3.8 and use another format for date of birth (DOB).
Program 3.9: Rerunning Program 3.8 with a Different Format
title "Listing of Financial";
proc print data=Financial;
format DOB date9.
Balance dollar11.2;
run;
This produces the resulting output:
Figure 3.5: Output from Program 3.9
The date9. format prints dates as a two-digit day of the month, a three-character month abbreviation, and a four-digit year. This format helps avoid confusion between the month-day-year and day-month-year formats used in the United States and Europe, respectively.
Notice also that the dollar11.2 format makes the Balance figures much easier to read. This is a good place to mention that the commaw.d format is useful for displaying large numbers with commas, where you don’t need or want dollar signs.
Program 3.8 and Program 3.9 demonstrated using a FORMAT statement in a procedure. Placing a FORMAT statement here associates the formats and variables only for that procedure. It is usually more useful to place your FORMAT statement in the DATA step. When you do this, there is a permanent association of the formats and variables in the data set. You can override any permanent format by placing a FORMAT statement in a particular procedure where you would like a different format. You will usually want to place all of your date formats in a DATA step because no one wants to see unformatted SAS dates. You can also remove a format from a variable by issuing a FORMAT statement for one or more variables and not specify a format. For example, if a variable called Age was formatted in a DATA step and you wanted to see unformatted values in a listing, you could write the following FORMAT statement:
format Age;
Suppose you have a blank- or comma-delimited file containing dates and character values longer than 8 bytes (or other values that require an informat). One way to provide informats with list input is to follow each variable name in your INPUT statement with a colon, followed by the appropriate informat. To see how this works, suppose you want to read the CSV file List.csv:
"001","Christopher Mullens",11/12/1955,"$45,200"
"002","Michelle Kwo",9/12/1955,"$78,123"
"003","Roger W. McDonald",1/1/1960,"$107,200"
Variables in this file represent a subject number (Subj), Name, date of birth (DOB), and yearly salary (Salary). You need to supply informats for Name (length is greater than 8 bytes), DOB (you need a date informat here), and Salary (this is a nonstandard numeric value—with a dollar sign and commas). Program 3.10 shows one way to supply the appropriate informats for these variables.
Program 3.10: Using Informats with List Input
data List_Example;
infile 'C:ookslearningList.csv' dsd;
input Subj : $3.
Name : $20.
DOB : mmddyy10.
Salary : dollar8.;
format DOB date9. Salary dollar8.;
run;
You see here that there is a colon preceding each informat. This colon (called an informat modifier) tells SAS to use the informat supplied but to stop reading the value for this variable when a delimiter is encountered. Do not forget the colons because without them SAS may read past a delimiter to satisfy the width specified in the informat.
This program would also work if the informat for Subj were omitted and the variable name was followed by a dollar sign (to signify that Subj is a character variable). However, the Subj variable would then be stored in 8 bytes (the default length for character variables with list input). By providing the $3. informat, you tell SAS to use 3 bytes to store this variable.
Another way to supply informats when using list input is to use an INFORMAT statement before the INPUT statement. Following the keyword INFORMAT, you list each variable and the informat you want to use to read each variable. You may also use a single informat for several variables if you follow a list of variables by a single informat.
To see how this works, see Program 3.11, that uses an INFORMAT statement.
Program 3.11: Supplying an INFORMAT Statement with List Input
data List_Example;
informat Subj $3.
Name $20.
DOB mmddyy10.
Salary dollar8.;
infile 'C:ookslearningList.csv' dsd;
input Subj
Name
DOB
Salary;
format DOB date9. Salary dollar8.;
run;
This program uses an INFORMAT statement to associate an informat to each of the variables. When choosing informats for your variables, be sure to make the length long enough to accommodate the longest data value you will encounter. Notice that the INPUT statement does not require anything other than the variable names because each variable already has an assigned informat. A listing from PROC PRINT confirms that all is well:
Figure 3.6: Output from Program 3.11
What if the data in the previous CSV file was placed in a text file where blanks were used as delimiters instead of commas and there were no quotes around each character value? Here's what the file List.txt would look like:
001 Christopher Mullens 11/12/1955 $45,200
002 Michelle Kwo 9/12/1955 $78,123
003 Roger W. McDonald 1/1/1960 $107,200
Houston, we have a problem! If you try to read this file with list input, the blank(s) in the Name field will trigger the end of the variable. SAS, in its infinite wisdom, came up with a novel solution—the ampersand (&) informat modifier. The ampersand, like the colon, says to use the supplied informat, but the delimiter is now two or more blanks instead of just one. So, if you use an ampersand modifier to read the List.txt file here, you need to use the ampersand modifier following Name. You also need to have two or more spaces between the end of the name and the date of birth. Here is the modified List.txt file:
001 Christopher Mullens 11/12/1955 $45,200
002 Michelle Kwo 9/12/1955 $78,123
003 Roger W. McDonald 1/1/1960 $107,200
And here is the program using the ampersand modifier:
Program 3.12: Demonstrating the Ampersand Modifier for List Input
data list_example;
infile 'C:ookslearninglist.txt';
input Subj : $3.
Name & $20.
DOB : mmddyy10.
Salary : dollar8.;
format DOB date9. Salary dollar8.;
run;
The INPUT statement is one of the most powerful and versatile SAS statements. Please refer to Chapter 25 to learn even more about the ability of SAS to read raw data.
Solutions to odd-numbered problems are located at the back of this book. Solutions to all problems are available to professors. If you are a professor, visit the book’s companion website at support.sas.com/cody for information about how to obtain the solutions to all problems.
1. You have a text file called Scores.txt containing information on gender (M or F) and four test scores (English, history, math, and science). Each data value is separated from the others by one or more blanks. Here is a listing of the data file Scores.txt:
M 80 82 85 88
F 94 92 88 96
M 96 88 89 92
F 95 . 92 92
a. Write a DATA step to read in these values. Choose your own variable names. Be sure that the value for Gender is stored in 1 byte and that the four test scores are numeric.
b. Include an assignment statement computing the average of the four test scores.
c. Write the appropriate PROC PRINT statements to list the contents of this data set.
2. You are given a CSV (comma-separated values) file called Political.csv containing state, political party, and age. A listing of the file Political.txt is shown here:
"NJ",Ind,55
"CO",Dem,45
"NY",Rep,23
"FL",Dem,66
"NJ",Rep,34
a. Write a SAS program to create a temporary SAS data set called Vote. Use the variable names State, Party, and Age. Age should be stored as a numeric variable; State and Party should be stored as character variables.
b. Include a procedure to list the observations in this data set.
c. Include a procedure to compute frequencies for Party.
3. You are given a text file where dollar signs were used as delimiters. To indicate missing values, two dollars signs were entered. Values in this file represent last name, employee number, and annual salary.
Here is a listing of the file Company.txt:
Roberts$M234$45000
Chien$M74777$$
Walters$$75000
Rogers$F7272$78131
Using this data file as input, create a temporary SAS data set called Company with the variables LastName (character), EmpNo (character), and Salary (numeric).
4. Repeat Problem 2 using a FILENAME statement to create a fileref instead of using the file name on the INFILE statements.
5. You want to create a program that uses a DATALINES statement to read in values for X and Y. In the DATA step, you want to create a new variable, Z, equal to 100 + 50X + 2X2 – 25Y + Y2. Use the following (X,Y) data pairs: (1,2), (3,6), (5,9), and (9,11).
6. You have a text file called Bankdata.txt with data values arranged as follows:
Variable |
Description |
Starting Column |
Ending Column |
Data Type |
Name |
Name |
1 |
15 |
Char |
Acct |
Account number |
16 |
20 |
Char |
Balance |
Acct balance |
21 |
26 |
Num |
Rate |
Interest rate |
27 |
30 |
Num |
Create a temporary SAS data set called Bank using this data file. Use column input to specify the location of each value. Include in this data set a variable called Interest computed by multiplying Balance by Rate. List the contents of this data set using PROC PRINT.
Here is a listing of the text file:
Philip Jones V1234 4322.32
Nathan Philips V1399 15202.45
Shu Lu W8892 451233.45
Betty Boop V7677 50002.78
7. You have a text file called Geocaching.txt with data values arranged as follows:
Variable |
Description |
Starting Column |
Ending Column |
Data Type |
Name |
Cache name |
1 |
20 |
Char |
LongDeg |
Longitude degrees |
21 |
22 |
Num |
LongMin |
Longitude minutes |
23 |
28 |
Num |
LatDeg |
Latitude degrees |
29 |
30 |
Num |
LatMin |
Latitude minutes |
31 |
36 |
Num |
Here is a listing of the file:
Higgensville Hike 4030.2937446.539
Really Roaring 4027.4047442.147
Cushetunk Climb 4037.0247448.014
Uplands Trek 4030.9907452.794
Create a temporary SAS data set called Cache using this data file. Use column input to read the data values.
To learn about geocaching (treasure hunting with a hand-held GPS), go to www.geocaching.com. The author and his wife use the geocaching name “Jan and the Man.” Check it out.
8. Repeat Problem 6 using formatted input to read the data values instead of column input.
9. Repeat Problem 7 using formatted input to read the data values instead of column input.
10. You are given a text file called Stockprices.txt containing information on the purchase and sale of stocks. The data layout is as follows:
Variable |
Description |
Starting Column |
Length |
Type |
Stock |
Stock symbol |
1 |
4 |
Char |
PurDate |
Purchase date |
5 |
10 |
mm/dd/yyyy |
PurPrice |
Purchase price |
15 |
6 |
Dollar signs and commas |
Number |
Number of shares |
21 |
4 |
Num |
SellDate |
Selling date |
25 |
10 |
mm/dd/yyyy |
SellPrice |
Selling price |
35 |
6 |
Dollar signs and commas |
A listing of the data file is:
IBM 5/21/2006 $80.0 10007/20/2006 $88.5
CSCO04/05/2005 $17.5 20009/21/2005 $23.6
MOT 03/01/2004 $14.7 50010/10/2006 $19.9
XMSR04/15/2006 $28.4 20004/15/2007 $12.7
BBY 02/15/2005 $45.2 10009/09/2006 $56.8
Create a SAS data set (call it Stocks) by reading the data from this file. Use formatted input.
Compute several new variables as follows:
Variable |
Description |
Computation |
TotalPur |
Total purchase price |
Number times PurPrice |
TotalSell |
Total selling price |
Number times SellPrice |
Profit |
Profit |
TotalSell minus TotalPur |
Print out the contents of this data set using PROC PRINT.
11. You have a CSV file called employee.csv. This file contains the following information:
Variable |
Description |
Desired Informat |
ID |
Employee ID |
$3. |
Name |
Employee name |
$20. |
Depart |
Department |
$8. |
DateHire |
Hire date |
MMDDYY10. |
Salary |
Yearly salary |
DOLLAR8. |
Use list input to read data from this file. You will need an informat to read most of these values correctly (i.e., DateHire needs a date informat). You can do this in either of two ways. First is to include an INFORMAT statement to associate each variable with the appropriate informat. The other is to use the colon modifier and supply the informats directly in the INPUT statement. Create a temporary SAS data set (Employ) from this data file. Use PROC PRINT to list the observations in your data set and the appropriate procedure to compute frequencies for the variable Depart.
A listing of the raw data file Employee.csv is:
123,"Harold Wilson",Acct,01/15/1989,$78123.
128,"Julia Child",Food,08/29/1988,$89123
007,"James Bond",Security,02/01/2000,$82100
828,"Roger Doger",Acct,08/15/1999,$39100
900,"Earl Davenport",Food,09/09/1989,$45399
906,"James Swindler",Acct,12/21/1978,$78200