Awk is a full-featured pattern scanning and text processing language with a syntax reminiscent of C. While it has an extensive set of operators and capabilities, we will cover only a few of them here: the ones most useful in shell scripts.
Awk breaks each line of input passed to it into fields. By default, a field is a string of consecutive characters delimited by whitespace, though there are options for changing this. Awk parses and operates on each separate field. This makes it ideal for handling structured text files -- especially tables -- data organized into consistent chunks, such as rows and columns.
Let's see how it works. At the command line, enter the following command:
Print out the whole file
$ awk '{ print }' /etc/fstab
or
$ awk '{ print $0 }' /etc/fstab
You should see the contents of your /etc/fstab file as output, same as cat /etc/fstab.
When we ran awk, it evaluated the print command for each line of /etc/fstab, in order.
Now for an explanation of the { print } code block. In awk, curly braces group blocks of code together, as in C.
Inside our block of code we have a single print command. When print appears by itself, awk prints the full contents of the current line. The $0 variable represents the entire current line, so print and print $0 do exactly the same thing.
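One way to check that print and print $0 really behave the same is to run both on the same input and compare the results. The sketch below uses a made-up two-line input instead of /etc/fstab, and the variable names are only illustrative.

```shell
# Hypothetical two-line input; the variable names are made up for this sketch.
input='first line
second line'

plain=$(printf '%s\n' "$input" | awk '{ print }')
with_zero=$(printf '%s\n' "$input" | awk '{ print $0 }')

# Both programs should echo the input unchanged.
[ "$plain" = "$with_zero" ] && echo "identical output"
```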
Deal with multiple fields
Awk works like cut but is more powerful: cut accepts only a single character as its delimiter, while awk splits on runs of whitespace by default and can be told to use other separators.
As mentioned above, $0 represents the entire current line of input. $1 represents the first column, $2 the second column, and so on.
$ awk '{ print $1, $2 }' /etc/fstab
This prints the first and second columns of /etc/fstab.
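To see the default whitespace splitting at work, here is a small sketch on made-up fstab-like lines with uneven spacing (several spaces in one place, a tab in another). The sample rows are assumptions for illustration, not taken from a real /etc/fstab.

```shell
# Hypothetical fstab-like rows: fields are separated by runs of spaces or a tab.
first_two=$(printf 'LABEL=/   /      ext3\n/dev/sda2\t/home  ext3\n' |
    awk '{ print $1, $2 }')

# Each run of blanks counts as one delimiter, so $1 and $2 come out clean.
printf '%s\n' "$first_two"
```

However many spaces or tabs sit between two columns, awk treats them as a single separator, which is why it copes well with hand-aligned tables.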
Print out your own string
$ awk '{ print "#" $0 }' /etc/fstab
It prints every line in /etc/fstab with "#" added to the beginning of the line.
Specify separator for the input file
The following script will print out a list of all user accounts on your system:
$ awk -F":" '{ print $1 }' /etc/passwd
In the example above, we use the -F option to specify ":" as the field separator. When awk processes the print $1 command, it prints the first field of each line in the input file.
Here's another example:
$ awk -F":" '{ print $1, $3 }' /etc/passwd
root 0
bin 1
daemon 2
adm 3
In the example above, awk prints the username and uid of each user on your system. You may have noticed the "," between $1 and $3: it tells awk to separate the two fields in the output. The default output separator is a single space.
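The effect of -F and of the comma can be tried safely on a couple of made-up records. The passwd-style lines and variable names below are assumptions for this sketch; nothing is read from the real /etc/passwd.

```shell
# Hypothetical passwd-style records; not read from a real system.
passwd_sample='root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin'

# With -F: and a comma, the two fields come out separated by a space.
with_comma=$(printf '%s\n' "$passwd_sample" | awk -F: '{ print $1, $3 }')

# Without the comma, awk concatenates the two values.
without_comma=$(printf '%s\n' "$passwd_sample" | awk -F: '{ print $1 $3 }')

# Without -F there is no whitespace to split on, so $1 is the whole line.
unsplit=$(printf '%s\n' "$passwd_sample" | awk '{ print $1 }')

printf '%s\n' "$with_comma" "$without_comma" "$unsplit"
```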
Specify separator for the output
The output field separator is stored in the OFS variable; its default value is a single space.
To use a different separator, for example a tab, assign a new value to OFS:
$ awk -F":" -v OFS="\t" '{ print "user:" $1, "uid:" $3 }' /etc/passwd
user:root       uid:0
user:bin        uid:1
user:daemon     uid:2
In the example above, awk prints the first and third fields of each line of /etc/passwd. The input is split on ":" and the output fields are separated by a tab. The -v option is the portable way to set a variable from the command line; GNU awk also accepts the long form --assign.
Note that there is no OFS between "user:" and $1, or between "uid:" and $3. Why? The "," between expressions tells awk to insert the output separator; without it, the values are simply concatenated.
Search pattern
An awk search pattern is typically a regular expression placed before the code block; the block then runs only for the lines that match. For example:
Search for and print the lines that contain the string ext
$ awk '/ext/ { print }' /etc/fstab
LABEL=/1 / ext3 defaults 1 1
LABEL=/tmp /tmp ext3 defaults 1 2
LABEL=/home /home ext3 defaults 1 2
LABEL=/usr /usr ext3 defaults 1 2
Print the lines in /etc/fstab that are not commented out
$ awk '$0 !~ "^#" { print }' /etc/fstab
LABEL=/1 / ext3 defaults 1 1
LABEL=/tmp /tmp ext3 defaults 1 2
LABEL=/home /home ext3 defaults 1 2
LABEL=/usr /usr ext3 defaults 1 2
LABEL=/opt /opt ext3 defaults 1 2
...
Print the file systems whose mount options field ($4) is exactly defaults, skipping commented-out lines
$ awk '$4 == "defaults" && $1 !~ "^#" { print }' /etc/fstab
LABEL=/1 / ext3 defaults 1 1
LABEL=/tmp /tmp ext3 defaults 1 2
LABEL=/home /home ext3 defaults 1 2
LABEL=/usr /usr ext3 defaults 1 2
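The three kinds of pattern used in this section can be compared side by side on a tiny made-up table. The sample lines and variable names are assumptions for this sketch. It also relies on a standard awk rule not used in the examples so far: when a pattern has no code block, the default action is to print the matching line.

```shell
# Hypothetical fstab-style sample: one comment and two entries.
fstab_sample='# comment line
LABEL=/ / ext3 defaults 1 1
/dev/sdb1 /data xfs noauto 0 0'

# Regular expression pattern: lines containing "ext".
ext_lines=$(printf '%s\n' "$fstab_sample" | awk '/ext/')

# Negated match: lines that do not start with "#".
active=$(printf '%s\n' "$fstab_sample" | awk '$0 !~ /^#/')

# Combined conditions: fourth field is "defaults" and the line is not a comment.
defaults=$(printf '%s\n' "$fstab_sample" | awk '$4 == "defaults" && $1 !~ /^#/')

printf '%s\n' "$ext_lines" "---" "$active" "---" "$defaults"
```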
The BEGIN and END blocks
Normally, awk executes each block of your script's code once for each input line. However, there are many situations where you need to run initialization code before awk begins processing the text from the input file. For such situations, awk allows you to define a BEGIN block. The BEGIN block is evaluated before awk starts processing the input file. It's an excellent place to initialize the FS (field separator) variable, print a heading, or initialize other global variables that you'll reference later in the program.
Awk also provides another special block, called the END block. Awk executes this block after all lines in the input file have been processed. Typically, the END block is used to perform final calculations or print summaries that should appear at the end of the output stream.
$ awk 'BEGIN { FS=":"; OFS="\t\t"; print "username\tuid" } { print $1, $3 }' /etc/passwd
username        uid
root            0
bin             1
daemon          2
adm             3
Another example with finer control over the output: using printf
$ awk 'BEGIN { FS=":"; OFS="\t\t"; print "username\tuid" } { printf "%8s\t%d\n", $1, $3 } END { print "Total " NR " records seen so far" }' /etc/passwd
username        uid
    root        0
     bin        1
...
      nx        990
  Salina        1003
Total 36 records seen so far
Note: in the example above, OFS is ignored by printf, which writes exactly what its format string specifies. Note also that NR counts records (lines), not fields.
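The difference between print and printf with respect to OFS is easy to see on a single made-up record. This is a sketch only: the record, the "|" separator, and the variable names are chosen for illustration, and OFS is set with the portable -v option.

```shell
# print inserts OFS wherever a comma separates two expressions.
with_print=$(printf 'root:x:0\n' |
    awk -F: -v OFS='|' '{ print $1, $3 }')

# printf ignores OFS; only the format string decides the layout.
with_printf=$(printf 'root:x:0\n' |
    awk -F: -v OFS='|' '{ printf "%-6s %d\n", $1, $3 }')

printf '%s\n' "$with_print" "$with_printf"
```

Here %-6s left-aligns the name in a six-character column, so the second line shows padding spaces where the first shows the "|" separator.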
Below are the common built-in variables awk uses:
NF   The number of fields in the current input record.
NR   The total number of input records seen so far.
FS   The input field separator, a space by default (which makes awk split on runs of whitespace).
OFS  The output field separator, a space by default.
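A short sketch of NR and NF on made-up input: each line reports its record number and field count, and the END block reports the final value of NR. The input text and the variable name are illustrative assumptions.

```shell
# Two hypothetical records: the first has three fields, the second has two.
counts=$(printf 'a b c\nd e\n' |
    awk '{ print NR, NF } END { print "records:", NR }')

printf '%s\n' "$counts"
```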
Conditional statements
Awk also offers very nice C-like if statements.
{ if ( $5 ~ /root/ ) { print $3 } }
In this example, the block is executed for every input line, and the if statement decides whether anything is printed.
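The same test can also be written as a pattern in front of the block. The sketch below runs both forms on made-up passwd-style records, assuming ":" as the field separator so that $5 is the comment field and $3 is the uid; the original snippet does not say which input it was meant for.

```shell
# Hypothetical passwd-style records; -F: is an assumption of this sketch.
passwd_sample='root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin'

# The if statement from the text, evaluated for every line.
with_if=$(printf '%s\n' "$passwd_sample" |
    awk -F: '{ if ( $5 ~ /root/ ) { print $3 } }')

# The equivalent pattern form: the block only runs when the pattern matches.
with_pattern=$(printf '%s\n' "$passwd_sample" |
    awk -F: '$5 ~ /root/ { print $3 }')

printf '%s\n' "$with_if" "$with_pattern"
```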
Here's a more complicated example of an awk if statement. As you can see, even with complex, nested conditionals, if statements look identical to their C counterparts:
{
    if ( $1 == "foo" ) {
        if ( $2 == "foo" ) {
            print "uno"
        } else {
            print "one"
        }
    } else if ( $1 == "bar" ) {
        print "two"
    } else {
        print "three"
    }
}
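To watch each branch fire, the nested conditional can be wrapped in a small shell function and fed one line per case. The function name classify and the four input lines are made up for this sketch.

```shell
# classify is a hypothetical helper that wraps the nested if from the text.
classify() {
    awk '{
        if ( $1 == "foo" ) {
            if ( $2 == "foo" ) { print "uno" } else { print "one" }
        } else if ( $1 == "bar" ) {
            print "two"
        } else {
            print "three"
        }
    }'
}

# One input line for each of the four branches.
result=$(printf 'foo foo\nfoo x\nbar y\nbaz z\n' | classify)
printf '%s\n' "$result"
```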
Numeric variables
So far, we've either printed strings, the entire line, or specific fields. However, awk also allows us to perform both integer and floating point math. Using mathematical expressions, it's very easy to write a script that counts the number of blank lines in a file. Here's one that does just that:
BEGIN { x=0 }
/^$/  { x=x+1 }
END   { print "I found " x " blank lines. :)" }
In the BEGIN block, we initialize our integer variable x to zero. Then, each time awk encounters a blank line, awk will execute the x=x+1 statement, incrementing x. After all the lines have been processed, the END block will execute, and awk will print out a final summary, specifying the number of blank lines it found.
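Here is a sketch of the counter run on made-up input. It is a variation, not the script above: it uses x++ and drops the BEGIN block, relying on the fact that an uninitialized awk variable acts as zero in arithmetic. The (x + 0) in the END block forces a numeric result so that a file with no blank lines reports 0. The function name count_blank is hypothetical.

```shell
# count_blank is a hypothetical helper; it reads text on standard input.
count_blank() {
    awk '/^$/ { x++ } END { print "I found " (x + 0) " blank lines." }'
}

# Three of the six lines below are empty.
some_blank=$(printf 'one\n\ntwo\n\n\nthree\n' | count_blank)

# No empty lines at all: x is never assigned, and x + 0 evaluates to 0.
none_blank=$(printf 'a\nb\n' | count_blank)

printf '%s\n' "$some_blank" "$none_blank"
```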