Using Filters

In UNIX, filters are designed to accept input, process the data, and write the results to standard output. Think of a filter as a mechanical filter placed in a stream of data: it lets through only the data you select. grep is classified as a filter. Earlier in this chapter, I described how to use grep to search for a string in a text file and to display the lines containing that string. Now I’ll describe how to use grep to filter data going to standard output.

The ls –l /usr/bin command displays the following output:

-r-xr-xr-x   1 root     bin        21096 Nov  8  2001 acctcom 
-r-xr-xr-x  40 root     bin         5420 Nov  8  2001 adb 
-r-xr-xr-x   1 root     bin        10256 Nov  8  2001 addbib 
-r-s--x--x   1 root     sys       350748 Nov  5  2001 admintool 
-r-xr-xr-x  17 root     bin          134 Nov  8  2001 alias 
-r-xr-xr-x   1 root     bin        15028 Nov  8  2001 aliasadm 
-r-xr-xr-x   1 root     bin          406 Nov  8  2001 amt 
-rwxr-xr-x   1 root     bin        18932 Nov  5  2001 apm 
-r-xr-xr-x   1 root     bin        18779 Nov  8  2001 appcert 

*Output has been truncated.

We can filter this output using grep as follows:

ls -l /usr/bin | grep '^d' <cr> 

The output now looks like this:

drwxr-xr-x   2 root     bin         1024 Feb 27 09:17 sparcv7 
drwxr-xr-x   2 root     bin         1024 Feb 27 09:17 sparcv9 

Using grep as a filter, I was able to display only the lines that contain a d at the beginning. It’s a nice way to list only the directories.

Although most commands can be used standalone or on the right side of a pipe, the following commands make great filters (a short pipeline combining several of them appears after the list):

  • sort Sorts lines of output.

  • awk A pattern-scanning and text-processing language.

  • tr Substitutes or translates characters.

  • cut Cuts out selected fields of each line of a file.

  • paste Joins lines of text from two separate files.

  • diff Compares contents of files or directories.

  • uniq Reports or filters out repeated lines in a file.

  • wc Displays a count of lines, words, and characters. The wc filter was described in Chapter 1.

  • tee Sends the standard output (stdout) to two locations at once, typically the screen and a file. The tee command is described in Chapter 3, “Solaris Shells and Variables.”
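
As a quick illustration of how these filters combine, the following pipeline is only a sketch (any directory listing will do): tr squeezes the padding spaces in the ls -l output, cut pulls out the owner field, and sort with uniq -c counts how many files each owner has:

ls -l | tr -s ' ' | cut -d' ' -f3 | sort | uniq -c <cr> 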

sort

Use sort to sort lines of input. Input can come from a file or from the standard output generated by a command. sort takes lines from one or more input files, sorts them, and writes the results to standard output. By default, sort orders lines according to the ASCII (American Standard Code for Information Interchange) sequence. Each line of input is treated as a series of fields, usually separated by spaces or tabs, although the separator could be any character, such as a colon, comma, or period.

When sorting data, first note the field separator and then determine how you want the input sorted. The syntax for the sort command is as follows:

sort <-options>  filename
						

Some of the more common options to the sort command are as follows:

  • -d Sorts in dictionary order; only letters, digits, and blanks (spaces and tabs) are significant in comparisons.

  • -M Compares the specified fields as months.

  • -n Performs a numeric sort.

  • -r Sorts in reverse ASCII order.

  • -o filename Saves output to a file.

There are many more options to the sort command, so refer to the man pages for more information.
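
To see the difference the -n option makes, consider a small hypothetical file named numbers that contains the values 100, 20, and 3, one per line. The default ASCII sort compares the lines character by character, whereas -n compares their numeric values:

sort numbers <cr> 
100 
20 
3 

sort -n numbers <cr> 
3 
20 
100 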

In the following example, the output from the ls –l command normally displays nine columns of information, as follows:

-rw-------   1 root     other     306452 Jun 17 20:42 dtdbcache_:0 
-rw-r--r--   1 root     other        240 Jun 20 08:08 file1 
-rw-r--r--   1 root     other         33 Jun 20 12:23 patterns 
-rw-r--r--   1 root     other          0 Jun 17 20:43 sdtvolcheck447 
-rw-r--r--   1 root     other          4 Jun 17 20:42 speckeysd.lock 
-rw-r--r--   1 root     other        118 Jun 20 06:26 userlist 

I want to sort the output from the ls –l command by file size (column 5), with the smallest file displayed first. I’ll use sort to filter the ls –l output, as follows:

ls -l | sort +4 <cr> 

I specify the sort to be performed on field number 4 (the first field is 0). The results are as follows:

-rw-r--r--   1 root     other          0 Jun 17 20:43 sdtvolcheck447 
-rw-r--r--   1 root     other          4 Jun 17 20:42 speckeysd.lock 
-rw-r--r--   1 root     other         33 Jun 20 12:23 patterns 
-rw-r--r--   1 root     other        118 Jun 20 06:26 userlist 
-rw-r--r--   1 root     other        240 Jun 20 08:08 file1 
-rw-------   1 root     other     306452 Jun 17 20:42 dtdbcache_:0 

To display the largest file first, do a reverse sort as follows:

ls -l | sort -r +4 <cr> 

To sort the files by date, use this command:

ls -l | sort -M <cr> 

The results are as follows:

-rw-------   1 root     other     306452 Jun 17 20:42 dtdbcache_:0 
-rw-r--r--   1 root     other          0 Jun 17 20:43 sdtvolcheck447 
-rw-r--r--   1 root     other          4 Jun 17 20:42 speckeysd.lock 
-rw-r--r--   1 root     other         33 Jun 20 12:23 patterns 
-rw-r--r--   1 root     other        118 Jun 20 06:26 userlist 
-rw-r--r--   1 root     other        240 Jun 20 08:08 file1 

To save the sorted output, redirect the standard output to a file using the -o option followed by a filename, like this:

ls -l | sort -Mo sortlist <cr> 

awk

I can only touch the surface of the capabilities of awk in this chapter. awk is a pattern-scanning, text-processing language. Its name is derived from the three individuals who developed the utility: Aho, Weinberger, and Kernighan.

In this section, I’ll describe how to use awk as a filter. We’ll use awk to search for lines of data using specific selection criteria. We’ll then use awk to perform an action on specific fields in those lines of text.

The general syntax of the awk command is as follows:

awk 'pattern {action}' filename
						

The following list explains each of the components for the above syntax:

  • pattern The specified pattern that we want to locate in the file. We do not need to specify a pattern; we could simply perform an action on every line in the file.

  • action Specifies the set of instructions you will perform on the file contents.

  • filename The name of the file(s) you want to examine. If awk is used as a filter, no filename is specified, and it gets its standard input from the standard output of the previous command.

awk does not modify the contents of the original file. The results are simply sent to standard output (the screen), and it’s up to you to redirect the output to a filename.
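
For example, to keep awk's results you would redirect them to a file yourself (the filenames here are only placeholders, not files used elsewhere in this chapter):

awk '{print $1}' inputfile > results <cr> 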

awk will search each line of input for a specific pattern and will perform an action on those lines containing the search pattern. If no search pattern is specified, awk performs the action on every line.

It’s important to understand how files are arranged in order to use awk properly. A file is composed of individual lines referred to as “records.” Each record is generally made up of fields. A field is any group of characters, numbers, or special characters that are separated from each other by a delimiter. Fields are counted from left to right; the leftmost field is field number 1. Field 0 refers to the entire line and all of the fields on that line.

Note

This differs from the way fields are referenced with the sort filter, where the first field is referred to as 0.


In the following example, we have seven fields:

Bill Calkins 123 Main Street Boston MA 

Each grouping of characters is a field. Each field is separated by a space, which is the default field delimiter in awk. Bill is in field number 1, and MA is in field number 7.
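
The delimiter does not have to be a space. awk's -F option names a different one; for instance, this brief sketch uses the colon-delimited /etc/passwd file to print just the login names:

awk -F: '{print $1}' /etc/passwd <cr> 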

Now let’s try a few simple awk statements on a file named namelist that has the following contents:

Bill Calkins 123 Main Street Boston MA 
Elvis Presley 234 First Street Holland MI 
Carlos Santana 345 Chicago Drive Chicago IL 
John Williams 456 Apple Avenue  Muskegon MI 
Robert Plant 567 Pine Street Lansing MI 
Jimmy Page 678 Walton Avenue Detroit MI 

Example 1:

I’ll use awk to reformat the file so that the last name (column 2) is displayed first:

awk '{print $2,$1,$3,$4,$5,$6,$7}' namelist 

The result is as follows:

Calkins Bill 123 Main Street Boston MA 
Presley Elvis 234 First Street Holland MI 
Santana Carlos 345 Chicago Drive Chicago IL 
Williams John 456 Apple Avenue Muskegon MI 
Plant Robert 567 Pine Street Lansing MI 
Page Jimmy 678 Walton Avenue Detroit MI 

The action part of the command is enclosed in braces and specifies that awk is to print fields in the order indicated. Each field is referenced by $<n>, where n is the number of the field you want. The field reference $0 represents the entire line. A comma separates each field.
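
Because $0 refers to the whole line, the following trivial sketch simply reproduces each record unchanged, much like cat would:

awk '{print $0}' namelist <cr> 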

Example 2:

Another form of the awk command enables me to add text to each line, as follows:

awk '{print "Name: "$2,$1," Address: "$3,$4,$5," City/State: ",$6,$7}' namelist 

The results are as follows:

Name: Calkins Bill  Address: 123 Main Street  City/State: Boston MA 
Name: Presley Elvis  Address: 234 First Street  City/State: Holland MI 
Name: Santana Carlos  Address: 345 Chicago Drive  City/State: Chicago IL 
Name: Williams John  Address: 456 Apple Avenue  City/State:  Muskegon MI 
Name: Plant Robert  Address: 567 Pine Street  City/State:  Lansing MI 
Name: Page Jimmy  Address: 678 Walton Avenue  City/State:  Detroit MI 

By inserting text strings, enclosed in double quotes, I can add in my own fields as demonstrated in the previous example.

Example 3:

In this example, I’m going to use awk to search for a particular pattern within a text file. Only the records that match the pattern will be printed.

awk '/MI/ {print $1}' namelist 

The following output is displayed:

Elvis 
John 
Robert 
Jimmy 

In the example, I searched for any line that had MI and then printed only the first field of that line.

Example 4:

I like to use awk as a filter to the ls –l command to only display the information I want to see. Here I’ll list the files in the /export/home/bcalkins directory, but I’ll only display columns 5 and 9:

ls -l | awk '{print "Filename: " $9, "Size: " $5}' 

The following output is displayed on the screen:

Filename: dir1 Size: 512 
Filename: dir2 Size: 512 
Filename: dir3 Size: 512 
Filename: documents Size: 22 
Filename: file2 Size: 0 
Filename: file3 Size: 0 
Filename: file4 Size: 0 
Filename: file5 Size: 0 

It’s difficult to cover the entire range of awk’s capabilities in this section. You might want to get a copy of Arnold D. Robbins’s Effective Awk Programming: A User’s Guide published by O’Reilly for a text dedicated solely to awk.

tr

Whereas awk and grep work on entire lines of data, the tr command is a filter used to translate or replace individual characters. The tr command operates on every line of data it receives. Use the tr filter to change lowercase characters to uppercase or to replace any occurrence of a single character with a new character. Like other filters, tr does not change the data in the source file. It’s up to the user to redirect the output to a text file.

The syntax for the tr filter is as follows:

tr a  b 

a will be replaced with b everywhere it occurs.

For example, to replace every occurrence of uppercase B with a lowercase b in the file named namelist, use the following command:

cat namelist | tr B b <cr> 

The results are as follows:

bill Calkins 123 Main Street boston MA 
Elvis Presley 234 First Street Holland MI 
Carlos Santana 345 Chicago Drive Chicago IL 
John Williams 456 Apple Avenue  Muskegon MI 
Robert Plant 567 Pine Street Lansing MI 
Jimmy Page 678 Walton Avenue Detroit MI 

tr works only as a filter: it takes no filename argument and reads only from standard input. Therefore, I need to pipe input to it from cat or some other command, or redirect a file into it.
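
Input redirection produces the same result without the extra cat process; this minimal alternative uses the same namelist file:

tr B b < namelist <cr> 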

Instead of specifying a specific character to replace, I can specify a class of characters to replace using a character class specification, as follows:

cat namelist| tr '[:upper:]' '[:lower:]' <cr> 

All uppercase characters are replaced with lowercase characters:

bill calkins 123 main street boston ma 
elvis presley 234 first street holland mi 
carlos santana 345 chicago drive chicago il 
john williams 456 apple avenue  muskegon mi 
robert plant 567 pine street lansing mi 
jimmy page 678 walton avenue detroit mi 

The character class specification can be any of the following keywords:

alnum  blank  digit  lower  punct  upper 
alpha  cntrl  graph  print  space  xdigit 

I can use the –s option with tr to replace multiple occurrences of a character with one single character, as follows:

ls -l|tr -s '[:space:]' <cr> 

Normally, the output from ls –l is padded with spaces. In the example, tr strips out multiple occurrences of a space and replaces them with one single space, as follows:

-rw-r--r-- 1 root other 240 Jun 20 08:08 file1 
-rw-r--r-- 1 root other 248 Jun 20 15:07 namelist 
-r--r--r-- 1 root other 670 Jun 20 13:58 passwd 
-rw-r--r-- 1 root other 33 Jun 20 12:23 patterns 
-rw-r--r-- 1 root other 0 Jun 17 20:43 sdtvolcheck447 
-rw-r--r-- 1 root other 4 Jun 17 20:42 speckeysd.lock 
-rw-r--r-- 1 root other 118 Jun 20 06:26 userlist 

If I wanted to squeeze multiple occurrences of - down to a single -, I would use the following command:

ls -l | tr -s '-' <cr> 

Multiple occurrences of the - are squeezed into one, as follows:

-rw-r-r-  1 root    other    240 Jun 20 08:08 file1 
-rw-r-r-  1 root    other    248 Jun 20 15:07 namelist 
-r-r-r-   1 root    other    670 Jun 20 13:58 passwd 
-rw-r-r-  1 root    other     33 Jun 20 12:23 patterns 
-rw-r-r-  1 root    other      0 Jun 17 20:43 sdtvolcheck447 
-rw-r-r-  1 root    other      4 Jun 17 20:42 speckeysd.lock 
-rw-r-r-  1 root    other    118 Jun 20 06:26 userlist 

diff

The differential file comparator, diff, is used to compare two text files and report the differences between them. Use it when you have two similar files and want to find out exactly how they differ, perhaps to check how a newer version of a text file differs from an older version of the same file.

I have a file named namelista, which contains the following:

Bill Calkins 123 Main Street Boston MA 
Elvis Presley 234 First Street Holland MI 
Carlos Santana 345 Chicago Drive Chicago IL 
John Williams 456 Apple Avenue  Muskegon MI 
Robert Plant 567 Pine Street Lansing MI 
Jimmy Page 678 Walton Avenue Detroit MI 

I also have another version of the same file named namelistb, which looks like this:

Bill Calkins 911 WTC Way New York NY 
Elvis Presley 234 First Street Holland MI 
Carlos Santana 345 Chicago Drive Chicago IL 
John Williams 456 Apple Avenue  Muskegon MI 
Robert Plant 567 Pine Street Lansing MI 
Jimmy Page 678 Walton Avenue Detroit MI 
Jeff Beck 000 Fruitvale Rd Whitehall MI 

You can use the diff command to quickly find the difference between the two files, as follows:

diff namelista namelistb <cr> 

The system compares both files and lists the changes necessary to convert namelista into namelistb:

1c1 
< Bill Calkins 123 Main Street Boston MA 
---
> Bill Calkins 911 WTC Way New York NY 
6a7 
> Jeff Beck 000 Fruitvale Rd Whitehall MI 

Note

No output will be displayed if the files are identical.


The lines with 1c1 and 6a7 resemble ed commands. The lines from the first file are prefixed with <, and the lines from the second file are prefixed with >. The output tells us that there are two changes to the file.

1c1 tells us that line 1 has changed but is still line number 1.

6a7 tells us that line 7 was appended after line 6.

The following will help you interpret the results that will be generated by diff (a brief hypothetical example of the d case follows the list):

  • a Lines have been added, or appended, to the first file.

  • d Lines have been deleted from the first file (they are not present in the second file).

  • c Lines have been changed between the first and the second file.
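
The sample files in this section never produce a d line, but here is a hypothetical illustration: if namelistb were missing the Robert Plant line that namelista contains, diff would report something like this:

5d4 
< Robert Plant 567 Pine Street Lansing MI 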

If you use the –e option with the diff command, diff produces a script of ed commands describing the changes. This script can then be used to edit the first file and make it match the second, as follows:

diff -e namelista namelistb > edscript 

The contents of the edscript look like this:

6a 
Jeff Beck 000 Fruitvale Rd Whitehall MI 
. 
1c 
Bill Calkins 911 WTC Way New York NY 
. 

Now the ed command can use the contents of edscript to modify namelista and make it the same as namelistb (append a w command to the end of the script so that ed writes the changes back to the file):

ed - namelista < edscript <cr> 

Use diff not only to compare the contents of text files but also to compare the contents of directories, as follows:

diff -r jradmin sradmin  <cr> 

The system displays the following information:

Common subdirectories: jradmin/dir1 and sradmin/dir1 
Only in jradmin: dir10 
Only in sradmin: dir2 
Only in sradmin: dir3 
Only in jradmin: file20 
Only in jradmin: file30 
Only in jradmin: file40 
Only in sradmin/dir1: subdir1 

uniq

Use the uniq command to filter out repeated lines that are adjacent to each other in a file or from standard output. See Chapter 23, “Name Services,” where I use the uniq filter to strip out duplicate entries from the /etc/passwd and /etc/group files.

Let’s go back to the two files I showed you in the previous section, namelista and namelistb. I’ll combine these two files to create one large file named biglist, as follows:

cat namelista namelistb > biglist  <cr> 

We now have a file named biglist that contains the following:

Bill Calkins 123 Main Street Boston MA 
Elvis Presley 234 First Street Holland MI 
Carlos Santana 345 Chicago Drive Chicago IL 
John Williams 456 Apple Avenue  Muskegon MI 
Robert Plant 567 Pine Street Lansing MI 
Jimmy Page 678 Walton Avenue Detroit MI 
Bill Calkins 911 WTC Way New York NY 
Elvis Presley 234 First Street Holland MI 
Carlos Santana 345 Chicago Drive Chicago IL 
John Williams 456 Apple Avenue  Muskegon MI 
Robert Plant 567 Pine Street Lansing MI 
Jimmy Page 678 Walton Avenue Detroit MI 
Jeff Beck 000 Fruitvale Rd Whitehall MI 

The file contains duplicate entries. The first step is to get the duplicate entries together so that they are adjacent to each other. If the lines are not adjacent, they will not be removed. I’ll use the sort filter to do this, as follows:

sort biglist  <cr> 

The sorted output is displayed on the screen, as follows:

Bill Calkins 123 Main Street Boston MA 
Bill Calkins 911 WTC Way New York NY 
Carlos Santana 345 Chicago Drive Chicago IL 
Carlos Santana 345 Chicago Drive Chicago IL 
Elvis Presley 234 First Street Holland MI 
Elvis Presley 234 First Street Holland MI 
Jeff Beck 000 Fruitvale Rd Whitehall MI 
Jimmy Page 678 Walton Avenue Detroit MI 
Jimmy Page 678 Walton Avenue Detroit MI 
John Williams 456 Apple Avenue  Muskegon MI 
John Williams 456 Apple Avenue  Muskegon MI 
Robert Plant 567 Pine Street Lansing MI 
Robert Plant 567 Pine Street Lansing MI 

Now I could save this output to a file, but a more efficient method would be to pipe the output to the uniq command, as follows:

sort biglist | uniq -c <cr> 

The file is sorted and then uniq removes duplicate entries. I also included the –c option, which displays a count of the number of times the line was repeated. The standard output is printed to the screen, as follows:

1 Bill Calkins 123 Main Street Boston MA 
1 Bill Calkins 911 WTC Way New York NY 
2 Carlos Santana 345 Chicago Drive Chicago IL 
2 Elvis Presley 234 First Street Holland MI 
1 Jeff Beck 000 Fruitvale Rd Whitehall MI 
2 Jimmy Page 678 Walton Avenue Detroit MI 
2 John Williams 456 Apple Avenue     Muskegon MI 
2 Robert Plant 567 Pine Street Lansing MI 

To save the list, I could redirect the standard output to a file.
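
For example, the following saves the counted list to a file (the filename sortedcount is just an illustration):

sort biglist | uniq -c > sortedcount <cr> 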

Note

The sort and uniq commands are used together so often that a –u option has been added to sort. The following command also would sort the file and suppress repeated lines:

sort -u biglist <cr> 


The opposite task also can be performed: the –d option with uniq displays only the lines that are repeated, printing one copy of each duplicated line. Here’s what would happen:

sort biglist | uniq -d <cr> 

The following lines would be displayed on the screen:

Carlos Santana 345 Chicago Drive Chicago IL 
Elvis Presley 234 First Street Holland MI 
Jimmy Page 678 Walton Avenue Detroit MI 
John Williams 456 Apple Avenue  Muskegon MI 
Robert Plant 567 Pine Street Lansing MI 

cut

Use the cut command to cut out selected columns or fields from each line of a file or standard output. The syntax for the cut command is as follows:

cut [-options] filename
						

Where options are:

  • -c Specifies the character positions on the line you want to cut

  • -f Specifies a list of fields you want to cut from each line

  • -d Specifies the character to use as the field delimiter with -f (the default is the tab character)

Again, an explanation of each option is better accomplished with an example.

Example 1:

This example uses the –f option to specify the fields to cut from the file or standard output, and the –d option to name the colon as the field delimiter. I’m going to cut the login names (field 1) and the comment field (field 5) from the /etc/passwd file, as follows:

cut -d: -f1,5 /etc/passwd <cr> 

The following results are displayed on the screen:

root:Super-User 
daemon: 
bin: 
sys: 
adm:Admin 
lp:Line Printer Admin 
uucp:uucp Admin 
nuucp:uucp Admin 
smmsp:SendMail Message Submission Program 
listen:Network Admin 
nobody:Nobody 
noaccess:No Access User 
nobody4:SunOS 4.x Nobody 
jradmin:Junior Admin Account 
ftp:Anonymous FTP 

Example 2:

In the next example, I’ll use the –c option to specify the column positions that I want displayed on the screen, as follows:

cut -c1-15 /etc/passwd <cr> 

Only columns 1 through 15 of the /etc/passwd file are displayed, as follows:

root:x:0:1:Supe 
daemon:x:1:1::/ 
bin:x:2:2::/usr 
sys:x:3:3::/: 
adm:x:4:4:Admin 
lp:x:71:8:Line 
uucp:x:5:5:uucp 
nuucp:x:9:9:uuc 
smmsp:x:25:25:S 
listen:x:37:4:N 
nobody:x:60001: 
noaccess:x:6000 
nobody4:x:65534 
jradmin:x:100:1 
ftp:x:1003:1:An 

paste

Use the paste command to join lines from two separate files into one single line. Suppose we have two files.

file1 contains:

This is line1 from file1 
This is line2 from file1 

file2 contains:

This is line1 from file2 
This is line2 from file2 

The paste command will join the corresponding lines of text from each file into one line, as follows:

paste file1 file2 <cr> 

The system responds with this:

This is line1 from file1        This is line1 from file2 
This is line2 from file1        This is line2 from file2 

The paste command simply replaces the newline character at the end of each line of the first file with a tab.
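
If you want something other than a tab between the joined lines, the -d option to paste accepts a different delimiter. This brief sketch joins the same two files with a colon:

paste -d: file1 file2 <cr> 

This is line1 from file1:This is line1 from file2 
This is line2 from file1:This is line2 from file2 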

unix2dos/dos2unix

The unix2dos and dos2unix commands are useful when you need to transfer text files between a UNIX system and an Intel-based Windows/DOS system. The unix2dos command is used to convert file formats from a Solaris-formatted text file to a DOS-formatted text file. The command syntax is as follows:

unix2dos [-options] originalfile convertedfile
						

The following are the options to the unix2dos command:

  • -ascii Adds carriage returns and converts the end-of-file characters in the Solaris text file to conform to DOS requirements.

  • -iso Converts ISO standard characters to the corresponding character in the DOS extended character set. This is the default.

  • -7 Converts 8-bit Solaris characters to 7-bit DOS characters.

Example 1:

To convert a Solaris text file named solarisfile from Solaris format to DOS format, use the following command:

unix2dos solarisfile dosfile <cr> 

No message is displayed. If I look at the converted file, dosfile, I see that the command added a ^M at the end of each line to make the file DOS compatible, as follows:

Bill Calkins 123 Main Street Boston MA^M 
Elvis Presley 234 First Street Holland MI^M 
Carlos Santana 345 Chicago Drive Chicago IL^M 
John Williams 456 Apple Avenue  Muskegon MI^M 
Robert Plant 567 Pine Street Lansing MI^M 
Jimmy Page 678 Walton Avenue Detroit MI^M 

Alternatively, I can use the dos2unix command to convert a DOS file to a Solaris file. The same syntax and options that were used for unix2dos apply to dos2unix.
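
For example, this sketch converts the DOS file created earlier back to a Solaris-format file (the filenames are simply the ones used above):

dos2unix dosfile solarisfile <cr> 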
