Chapter 3: Reading Raw Data from External Files

3.1  Introduction

3.2  Reading Data Values Separated by Blanks

3.3  Specifying Missing Values with List Input

3.4  Reading Data Values Separated by Commas (CSV Files)

3.5  Using an alternative Method to Specify an External File

3.6  Reading Data Values Separated by Delimiters Other Than Blanks or Commas

3.7  Placing Data Lines Directly in Your Program (the DATALINES Statement)

3.8  Specifying INFILE Options with the DATALINES Statement

3.9  Reading Raw Data from Fixed Columns—Method 1: Column Input

3.10  Reading Raw Data from Fixed Columns—Method 2: Formatted Input

3.11  Using a FORMAT Statement in a DATA Step versus in a Procedure

3.12  Using Informats with List Input

3.13  Supplying an INFORMAT Statement with List Input

3.14  Using List Input with Embedded Delimiters

3.15  Problems

 

3.1  Introduction

One way to provide SAS with data is to have SAS read the data from a text file and create a SAS data set. Some SAS users already have data in SAS data sets. If this is your case, you can skip this chapter!  

SAS has different ways of reading data from text files and, depending on how the data values are arranged, you can choose an input method that is most convenient. You have already seen one method, called list input, that was used in the introductory program in Chapter 2. This chapter discusses list input as well as two other methods that are appropriate for data arranged in fixed columns.

Some of the more advanced aspects of reading raw data are covered in Chapter 21.

 

3.2  Reading Data Values Separated by Blanks

One of the easiest methods of reading data is called list input. By default, SAS assumes that data values are separated by one or more blanks.

Task: you have a raw data file called Mydata.txt stored in your C:ookslearning folder. It is shown here:

M 50 68 155

F 23 60 101

M 65 72 220

F 35 65 133

M 15 71 166

These values represent gender, age, height (in inches), and weight (in pounds). Notice that this file meets the criteria for list input—each data value is separated from the next by one or more blanks. Program 3.1 reads data from this file and creates a SAS data set.

Program 3.1:  Demonstrating List Input with Blanks as Delimiters

  data Demographics;

    infile 'C:ookslearningMydata.txt';

    input Gender $ Age Height Weight;

  run;

  title "Listing of data set Demographics";

  proc print data=Demographics;

  run;

The INFILE statement tells SAS where to find the data. The INPUT statement contains the variable names you want to associate with each data value. The order of these names matches the order of the values in the file. The dollar sign ($) following Gender tells SAS that Gender is a character variable.

To see that this program works properly, we added a PROC PRINT step to list the observations in the SAS data set (details on PROC PRINT can be found in Chapter 14).

Here is the output:

Figure 3.1: Output from Program 3.1

Figure 3.1: Output from Program 3.1

Each column represents a variable in the data set and each row represents the data on a single person (an observation). The first column, labeled Obs (short for observation), is generated by PROC PRINT. The values in this column go from 1 to the number of observations in the data set. The order of rows in this list reflects the order that the observations were read from the input data and created in the DATA step. If you change the order of the observations or add new observations to the data set, the numbers in the Obs column may change.

The order of the variables (columns) reflects the order that the variables were encountered in the DATA step.

3.3  Specifying Missing Values with List Input

What would happen if you didn't have a value for Age for the second subject in your file? Your data file would look like this (with a missing value in line 2):

 M 50 68 155

 F    60 101

 M 65 72 220

 F 35 65 133

 M 15 71 166

It should be obvious that this will cause errors. SAS reads the value 60 for the Age and 101 for the Height. Because there are no more values on the second line of data, SAS goes to the next line and attempts to read the M as a Height value (and causes a data error message in the log). Clearly, you need a way to tell SAS that there is a missing value for Age in the second line. One way to do this is to use a period to represent the missing value, like this in Mydata.txt:

 M 50 68 155

 F .  60 101

 M 65 72 220

 F 35 65 133

 M 15 71 166

You must separate the period from the values around it by one or more spaces because a space is the default delimiter character. SAS now assigns a missing value for Age for the second subject. By the way, a missing value is not the same as a 0. This is important because if you asked SAS to compute the mean (average) Age for all the subjects, it would average only the non-missing values.

You can use a period to represent a missing character or numeric value when you use list input.

3.4  Reading Data Values Separated by Commas (CSV Files)

A common way to store data on Windows and UNIX platforms is in comma-separated values (CSV) files. These files use commas instead of blanks as data delimiters. They may or may not enclose character values in quotes. The file Mydata.csv contains the same values as the file Mydata.txt. It is shown here.

 

File C:ookslearningMydata.csv

"M",50,68,155

"F",23,60,101

"M",65,72,220

"F",35,65,133

"M",15,71,166

Program 3.2 (below) reads this file and creates a new data set called Demographics.

Program 3.2:  Reading Data From a Comma-Separated Values (Csv) File

  data Demographics;

     infile 'C:ookslearningMydata.csv' dsd;

     input Gender $ Age Height Weight;

  run;

Notice the INFILE statement in this example. The DSD (delimiter-sensitive data) following the file name is an INFILE option. It performs several functions. First, it changes the default delimiter from a blank to a comma. Next, if there are two delimiters in a row, it assumes there is a missing value between. Finally, if character values are placed in quotes (single or double quotes), the quotes are stripped from the value. That’s a lot of mileage for just three letters!

The INPUT statement is identical to Program 3.1 as is the resulting SAS data set.

3.5  Using an Alternative Method to Specify an External File

The INFILE statement in Program 3.2 used the actual file name (placed in quotes) to specify your raw data file. An alternative method is to use a separate FILENAME statement to identify the file and to use this reference (called a fileref) in your INFILE statement instead of the actual file name. Program 3.3 is identical to Program 3.2 except for the way it references the external file.

Program 3.3:  Using a Filename Statement to Identify an External File

  filename Preston 'C:ookslearningMydata.csv';

  data Demographics;

     infile Preston dsd;

     input Gender $ Age Height Weight;

  run;

The name following the FILENAME statement (Preston, in this example) is an alias for the actual file name. For certain operating environments, the fileref can be created outside of SAS (for example, in a DD statement in JCL on a mainframe). Notice also that the fileref (Preston) in the INFILE statement is not placed in quotes. This is how SAS knows that Preston is not the name of a file but rather a reference to it.

 

3.6  Reading Data Values Separated by Delimiters Other Than Blanks or Commas

Remember that the default data delimiter for list input is a blank. Using the INFILE option DSD changes the default to a comma. What if you have a file with other delimiters, such as tabs or colons? No problem! You only need to add the DLM= option to the INFILE statement. For example, the following lines of data use colons as delimiters.

Example of a file using colon delimiters:

M:50:68:155

F:23:60:101

M:65:72:220

F:35:65:133

M:15:71:166

To read this file, you could use this INFILE statement:

infile 'file-description' dlm=':';

You can spell out the name of the DELIMITER= option instead of using the abbreviation DLM= if you like, for example:

infile 'file-description' delimiter=':';

You can use the DSD and DLM= options together. This combination of options performs all the actions requested by the DSD option (see Section 3.4) but overrides the default DSD delimiter (comma) with a delimiter of your choice.

infile 'file-description' dsd dlm=':';

Tabs present a particularly interesting problem. What character do you place between the quotes on the DLM= option? You cannot click the TAB key. Instead, you need to represent the tab by its hexadecimal equivalent. For ASCII files (the coding method used on Windows platforms and UNIX operating systems—it stands for American Standard Code for Information Interchange), you would use the following:

infile 'file-description' dlm='09'x;

For EBCDIC files (used on most mainframe computers—it stands for Extended Binary-Coded Decimal Interchange Code), you would use the following statement:

infile 'file-description' dlm='05'x;

Note: These two values are called hexadecimal constants. If you know (or look up) the hexadecimal value of any character, you can represent it in a SAS statement by placing the hexadecimal value in single or double quotes and following the value immediately (no space) by an upper- or lowercase x.

3.7  Placing Data Lines Directly in Your Program (the DATALINES Statement)

Suppose you want to write a short test program in SAS. Instead of having to place your data in an external file, you can place your lines of data directly in your SAS program by using a DATALINES statement. For example, if you want to read data from the text file Mydata.txt (blank delimited data with values for Gender, Age, Height, and Weight), but you don’t want to go to the trouble of writing the external file, you could use Program 3.4.

Program 3.4:  Demonstrating the DATALINES Statement

  data demographic;

     input Gender $ Age Height Weight;

  datalines;

  M 50 68 155

  F 23 60 101

  M 65 72 220

  F 35 65 133

  M 15 71 166

  ;

As you can see from this example, the INFILE statement was removed and a DATALINES statement was added. Following DATALINES are your lines of data. Finally, a semicolon is used to end the DATA step. (Note: You may either use a single semicolon or a RUN statement to end the DATA step.) The lines of data must be the last element in the DATA step—any other statements must come before the lines of data.

While you would probably not use DATALINES in a real application, it is extremely useful when you want to write short test programs.

As a historical note, the DATALINES statement used to be called the CARDS statement. If you don’t know what a computer card is, ask an old person. By the way, you can still use the word CARDS in place of DATALINES if you want.

3.8  Specifying INFILE Options with the DATALINES Statement

What if you use DATALINES and want to use one or more of the INFILE options, such as DLM= or DSD? You can use many of the INFILE options with DATALINES by using a reserved file reference called DATALINES. For example, if you wanted to run Program 3.2 without an external data file, you could use Program 3.5.

Program 3.5:  Using INFILE Options with DATALINES

  data Demographics;

     infile datalines dsd;

     input Gender $ Age Height Weight;

  datalines;

  "M",50,68,155

  "F",23,60,101

  "M",65,72,220

  "F",35,65,133

  "M",15,71,166

  ;

3.9  Reading Raw Data from Fixed Columns—Method 1: Column Input

Many raw data files store specific information in fixed columns. This has several advantages over data values separated by delimiters. First, you don’t have to worry about missing values. If you do not have a value, you can leave the appropriate columns blank. Next, when you write your INPUT statement, you can choose which variables to read and in what order to read them.

The simplest method for reading data in fixed columns is called column input. This method of input can read character data and standard numeric values. By standard numeric values, we mean positive or negative numbers as well as numbers in exponential form (for example, 3.4E3 means 3.4 times 10 to the 3rd power). This form of input cannot handle values with commas or dollar signs. You can only read dates as character values with this form of input as well. Now for an example.

You have a raw data file called Bank.txt in a folder called C:ookslearning on your Windows computer. A data description for this file follows.

 

Variable

Description

Starting

Column

Ending

Column

Data Type

Subj

Subject Number

1

3

Character

DOB

Date of Birth

4

13

Character

Gender

Gender

14

14

Character

Balance

Bank Account Balance

15

21

Numeric

 

File C:ookslearningBank.txt

          1         2

 1234567890123456789012345 ß Columns (not part of the file)

 -------------------------

 00110/21/1955M   1145

 00211/18/2001F  18722

 00305/07/1944M 123.45

 00407/25/1945F -12345

Program 3.6 is a SAS program that reads data values from this file.

Program 3.6:  Demonstrating Column Input

  data Financial;

     infile 'C:ookslearningBank.txt';

     input Subj     $   1-3

           DOB      $  4-13

           Gender   $    14

           Balance    15-21;

  run;

  title "Listing of Financial";

  proc print data=Financial;

  run;

You specify a variable name, a dollar sign if the variable is a character value, the starting column, and the ending column (if the value takes more than one column). In this program, the number of columns you specify for each character variable determines the number of bytes SAS uses to store these values; for numeric variables, SAS will always use 8 bytes to store these values, regardless of how many columns you specify in your INPUT statement. (There are advanced techniques to change the storage length for numeric variables—and these techniques should be used only when you need to save storage space and you understand the possible problems that can result.)

Notice that this program uses a separate line for each variable. This is not necessary, but it makes the program more readable. You could have written the program like this:

  data Financial;

     infile 'C:ookslearningBank.txt';

     input Subj $ 1-3 DOB $ 4-13 Gender $ 14 Balance 15-21;

  run;

It just doesn’t look as nice and is harder to read. This is a good time to recommend that you get into good habits in writing your SAS programs. It is amazing how much easier it is to read and understand a program where some care is taken in its appearance.

The resulting listing is:

Figure 3.2: Output from Program 3.6

Figure 3.2: Output from Program 3.6

It is important to remember that the date of birth (DOB) is a character value in this data set. To create a more useful, numerical SAS date, you need to use formatted input, the next type of input to be described.

3.10  Reading Raw Data from Fixed Columns—Method 2: Formatted Input

Formatted input also reads data from fixed columns. It can read both character and standard numeric data as well as nonstandard numerical values, such as numbers with dollar signs and commas, and dates in a variety of formats. Formatted input is the most common and powerful of all the input methods. Anytime you have nonstandard data in fixed columns, you should consider using formatted input to read the file.

Let’s start with the same raw data file (Bank.txt) that was used in Program 3.6. First examine the program, and then read the explanation.

 

Program 3.7:  Demonstrating Formatted Input

  data Financial;

     infile 'C:ookslearningBank.txt';

     input @1  Subj         $3.

           @4  DOB    mmddyy10.

           @14 Gender       $1.

           @15 Balance       7.;

  run;

  title "Listing of Financial";

  proc print data=Financial;

  run;

The @ (at) signs in the INPUT statement are called column pointers—and they do just that. For example, @4 says to SAS, go to column 4. Following the variable names are SAS informats. Informats are built-in instructions that tell SAS how to read a data value. The choice of which informat to use is dictated by the data.

Two of the most basic informats are w.d and $w. The w.d informat reads standard numeric values. The w tells SAS how many columns to read. The optional d tells SAS that there is an implied decimal point in the value. For example, if you have the number 123 and you read it with a 3.0 informat, SAS stores the value 123.0. If you read the same number with a 3.1 informat, SAS stores the value 12.3. If the number you are reading already has a decimal point in it (this counts as one of the columns to be read), SAS ignores the d portion of the informat. So, if you read the value 1.23 with a 4.1 informat, SAS stores a value of 1.23.

The $w. informat tells SAS to read w columns of character data. In this program, Subj is read as character data and takes up three columns; values of Gender take up a single column.

Now it’s time to read the date. The mmddyy10. informat tells SAS that the date you are reading is in the mm/dd/yyyy form. SAS reads the date and converts the value into a SAS date. SAS stores dates as numeric values equal to the number of days from January 1, 1960.

If you read the value 01/01/1960 with the mmddyy10. informat, SAS stores a value of 0.

The date 01/02/1960 read with the same informat would result in a value of 1, and so forth. SAS knows all about leap years and correctly converts any date from 1582 to way into the future (1582 is the year Pope Gregory started the Gregorian calendar—dates before this are not defined in SAS).

So, getting back to our example, since date values are in the mm/dd/yyyy form and start in column 4, you use @4 to move the column pointer to column 4 and the mmddyy10. informat to tell SAS to read the next 10 columns as a date in this form. SAS then computes the number of days from January 1, 1960, corresponding to each of the date values. Let’s see the results:

Figure 3.3: Output from Program 3.7

Figure 3.3: Output from Program 3.7

Well, the dates (variable DOB) look rather strange. What you are seeing are the actual values SAS is storing for each DOB (the number of days from January 1, 1960).

You need a way to display these dates in a more traditional form, such as the way the dates were displayed in the raw data file (10/21/1955, in the first observation) or in some other form (such as 10Oct1955). While you are at it, why not add dollar signs and commas to the Balance figures?

You can accomplish both of these tasks by associating a format with each of these two variables. There are many built-in formats in SAS that allow you to display dates and financial values in easily readable ways. You associate these formats with the appropriate variables in a FORMAT statement. Program 3.8 shows how to add a FORMAT statement to PROC PRINT.

Program 3.8:  Demonstrating a FORMAT Statement

  title "Listing of Financial";

  proc print data=Financial;

     format DOB     mmddyy10.

            Balance dollar11.2;

  run;

Here you are using the mmddyy10. format to print the DOB values and the dollar11.2 format to print the Balance values. Notice the period in each of the formats. All SAS formats need to end either in a period or in a period followed by a number. This is how SAS distinguishes between the names of variables or data sets and the names of formats. The 11.2 following the dollar format says to allow up to 11 columns to print the Balance values (including the dollar sign, the decimal point, and possibly a comma or a minus sign). The 2 following the period says to include two decimal places after the decimal point. Here is the revised output:

Figure 3.4: Output from Program 3.8

Figure 3.4: Output from Program 3.8

It is important to remember that the formats only affect the way these values appear in printed output—the internal values are not changed.

To be sure that you understand what formats do, let’s repeat Program 3.8 and use another format for date of birth (DOB).

Program 3.9:  Rerunning Program 3.8 with a Different Format

  title "Listing of Financial";

  proc print data=Financial;

     format DOB     date9.

            Balance dollar11.2;

  run;

This produces the resulting output:

Figure 3.5: Output from Program 3.9

Figure 3.5: Output from Program 3.9

The date9. format prints dates as a two-digit day of the month, a three-character month abbreviation, and a four-digit year. This format helps avoid confusion between the month-day-year and day-month-year formats used in the United States and Europe, respectively.

Notice also that the dollar11.2 format makes the Balance figures much easier to read. This is a good place to mention that the commaw.d format is useful for displaying large numbers with commas, where you don’t need or want dollar signs.

3.11  Using a FORMAT Statement in a DATA Step versus in a Procedure

Program 3.8 and Program 3.9 demonstrated using a FORMAT statement in a procedure. Placing a FORMAT statement here associates the formats and variables only for that procedure. It is usually more useful to place your FORMAT statement in the DATA step. When you do this, there is a permanent association of the formats and variables in the data set. You can override any permanent format by placing a FORMAT statement in a particular procedure where you would like a different format. You will usually want to place all of your date formats in a DATA step because no one wants to see unformatted SAS dates. You can also remove a format from a variable by issuing a FORMAT statement for one or more variables and not specify a format. For example, if a variable called Age was formatted in a DATA step and you wanted to see unformatted values in a listing, you could write the following FORMAT statement:

format Age;

3.12  Using Informats with List Input

Suppose you have a blank- or comma-delimited file containing dates and character values longer than 8 bytes (or other values that require an informat). One way to provide informats with list input is to follow each variable name in your INPUT statement with a colon, followed by the appropriate informat. To see how this works, suppose you want to read the CSV file List.csv:

"001","Christopher Mullens",11/12/1955,"$45,200"

"002","Michelle Kwo",9/12/1955,"$78,123"

"003","Roger W. McDonald",1/1/1960,"$107,200"

Variables in this file represent a subject number (Subj), Name, date of birth (DOB), and yearly salary (Salary). You need to supply informats for Name (length is greater than 8 bytes), DOB (you need a date informat here), and Salary (this is a nonstandard numeric value—with a dollar sign and commas). Program 3.10 shows one way to supply the appropriate informats for these variables.

Program 3.10:  Using Informats with List Input

  data List_Example;

     infile 'C:ookslearningList.csv' dsd;

     input Subj   :       $3.

           Name   :      $20.

           DOB    : mmddyy10.

           Salary :  dollar8.;

     format DOB date9. Salary dollar8.;

  run;

You see here that there is a colon preceding each informat. This colon (called an informat modifier) tells SAS to use the informat supplied but to stop reading the value for this variable when a delimiter is encountered. Do not forget the colons because without them SAS may read past a delimiter to satisfy the width specified in the informat.

This program would also work if the informat for Subj were omitted and the variable name was followed by a dollar sign (to signify that Subj is a character variable). However, the Subj variable would then be stored in 8 bytes (the default length for character variables with list input). By providing the $3. informat, you tell SAS to use 3 bytes to store this variable.

3.13  Supplying an INFORMAT Statement with List Input

Another way to supply informats when using list input is to use an INFORMAT statement before the INPUT statement. Following the keyword INFORMAT, you list each variable and the informat you want to use to read each variable. You may also use a single informat for several variables if you follow a list of variables by a single informat.

To see how this works, see Program 3.11, that uses an INFORMAT statement.

Program 3.11:  Supplying an INFORMAT Statement with List Input

  data List_Example;

     informat Subj        $3.

              Name       $20.

              DOB   mmddyy10.

              Salary dollar8.;

     infile 'C:ookslearningList.csv' dsd;

     input Subj

           Name

           DOB

           Salary;

     format DOB date9. Salary dollar8.;

  run;

This program uses an INFORMAT statement to associate an informat to each of the variables. When choosing informats for your variables, be sure to make the length long enough to accommodate the longest data value you will encounter. Notice that the INPUT statement does not require anything other than the variable names because each variable already has an assigned informat. A listing from PROC PRINT confirms that all is well:

Figure 3.6: Output from Program 3.11

Figure 3.6: Output from Program 3.11

3.14  Using List Input with Embedded Delimiters

What if the data in the previous CSV file was placed in a text file where blanks were used as delimiters instead of commas and there were no quotes around each character value? Here's what the file List.txt would look like:

001 Christopher Mullens 11/12/1955 $45,200

002 Michelle Kwo 9/12/1955 $78,123

003 Roger W. McDonald 1/1/1960 $107,200

Houston, we have a problem! If you try to read this file with list input, the blank(s) in the Name field will trigger the end of the variable. SAS, in its infinite wisdom, came up with a novel solution—the ampersand (&) informat modifier. The ampersand, like the colon, says to use the supplied informat, but the delimiter is now two or more blanks instead of just one. So, if you use an ampersand modifier to read the List.txt file here, you need to use the ampersand modifier following Name. You also need to have two or more spaces between the end of the name and the date of birth. Here is the modified List.txt file:

001 Christopher Mullens   11/12/1955 $45,200

002 Michelle Kwo          9/12/1955  $78,123

003 Roger W. McDonald     1/1/1960   $107,200

And here is the program using the ampersand modifier:

Program 3.12:  Demonstrating the Ampersand Modifier for List Input

  data list_example;

     infile 'C:ookslearninglist.txt';

     input Subj   :       $3.

           Name   &      $20.

           DOB    : mmddyy10.

           Salary :  dollar8.;

     format DOB date9. Salary dollar8.;

  run;

The INPUT statement is one of the most powerful and versatile SAS statements. Please refer to Chapter 25 to learn even more about the ability of SAS to read raw data.

3.15  Problems

Solutions to odd-numbered problems are located at the back of this book. Solutions to all problems are available to professors. If you are a professor, visit the book’s companion website at support.sas.com/cody for information about how to obtain the solutions to all problems.

1.       You have a text file called Scores.txt containing information on gender (M or F) and four test scores (English, history, math, and science). Each data value is separated from the others by one or more blanks. Here is a listing of the data file Scores.txt:

 M   80   82 85 88

 F   94   92 88 96

 M   96   88 89 92

 F   95    . 92 92

a.      Write a DATA step to read in these values. Choose your own variable names. Be sure that the value for Gender is stored in 1 byte and that the four test scores are numeric.

b.      Include an assignment statement computing the average of the four test scores.

c.      Write the appropriate PROC PRINT statements to list the contents of this data set.

 

2.       You are given a CSV (comma-separated values) file called Political.csv containing state, political party, and age. A listing of the file Political.txt is shown here:

"NJ",Ind,55

"CO",Dem,45

"NY",Rep,23

"FL",Dem,66

"NJ",Rep,34

a.      Write a SAS program to create a temporary SAS data set called Vote. Use the variable names State, Party, and Age. Age should be stored as a numeric variable; State and Party should be stored as character variables.

b.       Include a procedure to list the observations in this data set.

c.       Include a procedure to compute frequencies for Party.

3.       You are given a text file where dollar signs were used as delimiters. To indicate missing values, two dollars signs were entered. Values in this file represent last name, employee number, and annual salary.

Here is a listing of the file Company.txt:

Roberts$M234$45000

Chien$M74777$$

Walters$$75000

Rogers$F7272$78131

Using this data file as input, create a temporary SAS data set called Company with the variables LastName (character), EmpNo (character), and Salary (numeric).

4.       Repeat Problem 2 using a FILENAME statement to create a fileref instead of using the file name on the INFILE statements.

5.       You want to create a program that uses a DATALINES statement to read in values for X and Y. In the DATA step, you want to create a new variable, Z, equal to 100 + 50X + 2X2 – 25Y + Y2. Use the following (X,Y) data pairs: (1,2), (3,6), (5,9), and (9,11).

6.       You have a text file called Bankdata.txt with data values arranged as follows:

Variable

Description

Starting Column

Ending Column

Data Type

Name

Name

1

15

Char

Acct

Account number

16

20

Char

Balance

Acct balance

21

26

Num

Rate

Interest rate

27

30

Num

 

Create a temporary SAS data set called Bank using this data file. Use column input to specify the location of each value. Include in this data set a variable called Interest computed by multiplying Balance by Rate. List the contents of this data set using PROC PRINT.

Here is a listing of the text file:

Philip Jones   V1234   4322.32

Nathan Philips V1399  15202.45

Shu Lu         W8892 451233.45

Betty Boop     V7677  50002.78

7.       You have a text file called Geocaching.txt with data values arranged as follows:

Variable

Description

Starting Column

Ending Column

Data Type

Name

Cache name

1

20

Char

LongDeg

Longitude degrees

21

22

Num

LongMin

Longitude minutes

23

28

Num

LatDeg

Latitude degrees

29

30

Num

LatMin

Latitude minutes

31

36

Num

 

Here is a listing of the file:

Higgensville Hike   4030.2937446.539

Really Roaring      4027.4047442.147

Cushetunk Climb     4037.0247448.014

Uplands Trek        4030.9907452.794

Create a temporary SAS data set called Cache using this data file. Use column input to read the data values.

To learn about geocaching (treasure hunting with a hand-held GPS), go to www.geocaching.com. The author and his wife use the geocaching name “Jan and the Man.” Check it out.

8.       Repeat Problem 6 using formatted input to read the data values instead of column input.

9.       Repeat Problem 7 using formatted input to read the data values instead of column input.

10.   You are given a text file called Stockprices.txt containing information on the purchase and sale of stocks. The data layout is as follows:

Variable

Description

Starting Column

Length

Type

Stock

Stock symbol

1

4

Char

PurDate

Purchase date

5

10

mm/dd/yyyy

PurPrice

Purchase price

15

6

Dollar signs and commas

Number

Number of shares

21

4

Num

SellDate

Selling date

25

10

mm/dd/yyyy

SellPrice

Selling price

35

6

Dollar signs and commas

 

A listing of the data file is:

IBM  5/21/2006 $80.0 10007/20/2006 $88.5

CSCO04/05/2005 $17.5 20009/21/2005 $23.6

MOT 03/01/2004 $14.7 50010/10/2006 $19.9

XMSR04/15/2006 $28.4 20004/15/2007 $12.7

BBY 02/15/2005 $45.2 10009/09/2006 $56.8

Create a SAS data set (call it Stocks) by reading the data from this file. Use formatted input.

Compute several new variables as follows:

Variable

Description

Computation

TotalPur

Total purchase price

Number times PurPrice

TotalSell

Total selling price

Number times SellPrice

Profit

Profit

TotalSell minus TotalPur

 

Print out the contents of this data set using PROC PRINT.

11.   You have a CSV file called employee.csv. This file contains the following information:

Variable

Description

Desired Informat

ID

Employee ID

$3.

Name

Employee name

$20.

Depart

Department

$8.

DateHire

Hire date

MMDDYY10.

Salary

Yearly salary

DOLLAR8.

 

Use list input to read data from this file. You will need an informat to read most of these values correctly (i.e., DateHire needs a date informat). You can do this in either of two ways. First is to include an INFORMAT statement to associate each variable with the appropriate informat. The other is to use the colon modifier and supply the informats directly in the INPUT statement. Create a temporary SAS data set (Employ) from this data file. Use PROC PRINT to list the observations in your data set and the appropriate procedure to compute frequencies for the variable Depart.

A listing of the raw data file Employee.csv is:

123,"Harold Wilson",Acct,01/15/1989,$78123.

128,"Julia Child",Food,08/29/1988,$89123

007,"James Bond",Security,02/01/2000,$82100

828,"Roger Doger",Acct,08/15/1999,$39100

900,"Earl Davenport",Food,09/09/1989,$45399

906,"James Swindler",Acct,12/21/1978,$78200

 

 

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset