Wednesday 31 July 2019

Linux - awk useful examples

Awk is a pattern scanning and text processing language with a syntax reminiscent of C. It has an extensive set of operators and capabilities, but we will cover only a few of them here: the ones most useful in shell scripts.

Awk breaks each line of input passed to it into fields. By default, a field is a string of consecutive characters delimited by whitespace, though there are options for changing this. Awk parses and operates on each separate field. This makes it ideal for handling structured text files -- especially tables -- data organized into consistent chunks, such as rows and columns.

Let's see how it works. At the command line, enter the following command:

Print out the whole file

$ awk '{ print }' /etc/fstab
or, equivalently:
$ awk '{ print $0 }' /etc/fstab
You should see the contents of your /etc/fstab file as output, the same as cat /etc/fstab.
When we executed awk, it evaluated the print command for each line in /etc/fstab, in order.
Here is an explanation of the { print } code block. In awk, curly braces are used to group blocks of code together, similar to C.
Inside our block of code, we have a single print command. In awk, when a print command appears by itself, the full contents of the current line are printed. The $0 variable represents the entire current line, so print and print $0 do exactly the same thing.

Deal with multiple fields

Awk works like cut, but it is more powerful: cut can only use a single character as its separator. By default, awk uses whitespace as the separator.
As we mentioned above, $0 represents the entire current line of the input, $1 represents the first column of the input, $2 the second column, and so on.
$ awk '{print $1,$2}' /etc/fstab
It will print out the first and the second column of the file /etc/fstab.
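To try field splitting without touching a system file, here is a minimal sketch that feeds awk two made-up lines (the sample text and the variable name fields are only for illustration). It shows that runs of spaces and tabs both count as a single separator by default:

```shell
# Two made-up input lines: the first uses spaces, the second uses tabs.
fields=$(printf 'alpha  beta gamma\none\ttwo\tthree\n' | awk '{ print $1, $3 }')
echo "$fields"
```

Each output line should contain the first and third field of the corresponding input line, separated by a single space.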

Print out your own string

$ awk '{ print "#" $0 }' /etc/fstab
It prints every line in /etc/fstab and adds "#" to the beginning of each line.

Specify separator for the input file

The following script will print out a list of all user accounts on your system:
$ awk -F":" '{ print $1 }' /etc/passwd

In the above case, we use the -F option to specify ":" as the field separator. When awk processes the print $1 command, it prints the first field of each line in the input file.
Here's another example:
$ awk -F":" '{print $1,$3}' /etc/passwd
root 0
bin 1
daemon 2
adm 3
In the above example, awk prints the username and uid of each user on your system. You may also have noticed the ',' between $1 and $3: it tells awk to separate the two fields in the output. The default output separator is a single space.

Specify separator for the output

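Awk's output field separator is stored in the OFS variable; its default value is a single space.

If you want a different separator, for example a tab, assign it on the command line (--assign is the GNU awk long form of the portable -v option):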
$ awk -F":" --assign OFS="\t" '{print "user:"$1,"uid:"$3}' /etc/passwd
user:root    uid:0
user:bin    uid:1
user:daemon    uid:2
In the above example, awk reads /etc/passwd with ":" as the input field separator, prints the first and third columns, and separates the output fields with a tab. Note that there is no OFS between "user:" and $1, or between "uid:" and $3.
Why? A "," between expressions tells awk to insert the output separator; without it, the expressions are simply concatenated.
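The difference between a comma and plain concatenation is easy to see on a single record. This is only a sketch: the passwd-style line and the "-" separator are made up for illustration, and the portable -v option is used to set OFS.

```shell
line='root:x:0:0:root:/root:/bin/bash'   # made-up passwd-style record

# With a comma, awk inserts OFS between the two fields.
with_comma=$(echo "$line" | awk -F: -v OFS='-' '{ print $1, $3 }')
# Without a comma, the two fields are concatenated directly.
without_comma=$(echo "$line" | awk -F: -v OFS='-' '{ print $1 $3 }')

echo "$with_comma"      # root-0
echo "$without_comma"   # root0
```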

Search pattern

An awk search pattern is a regular expression. For example:

Search and print lines with ext string

# awk '/ext/  {print }' /etc/fstab
LABEL=/1                /                       ext3    defaults        1 1
LABEL=/tmp              /tmp                    ext3    defaults        1 2
LABEL=/home             /home                   ext3    defaults        1 2
LABEL=/usr              /usr                    ext3    defaults        1 2

Print the lines in /etc/fstab that are not commented out

# awk '$0 !~ "^#" {print}' /etc/fstab
LABEL=/1                /                       ext3    defaults        1 1
LABEL=/tmp              /tmp                    ext3    defaults        1 2
LABEL=/home             /home                   ext3    defaults        1 2
LABEL=/usr              /usr                    ext3    defaults        1 2
LABEL=/opt              /opt                    ext3    defaults        1 2
...

Print the file systems that are mounted with the default options

# awk '$4 == "defaults" && $1 !~ "^#"  {print}' /etc/fstab
LABEL=/1                /                       ext3    defaults        1 1
LABEL=/tmp              /tmp                    ext3    defaults        1 2
LABEL=/home             /home                   ext3    defaults        1 2
LABEL=/usr              /usr                    ext3    defaults        1 2
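
The same kinds of patterns can be tried on inline data. The sketch below assumes a small, made-up fstab-like sample (the device names are hypothetical) and prints a single field instead of the whole line so the results are easy to read:

```shell
# Made-up fstab-like sample; not taken from a real system.
sample='# a comment line
/dev/sda1 / ext3 defaults 1 1
/dev/sda2 swap swap sw 0 0
/dev/sda3 /home ext3 noatime 1 2'

# Regular expression pattern: mount points of lines containing "ext".
ext_lines=$(printf '%s\n' "$sample" | awk '/ext/ { print $2 }')

# Comparison combined with a negated match: devices of non-comment
# lines whose fourth field is exactly "defaults".
default_lines=$(printf '%s\n' "$sample" | awk '$4 == "defaults" && $1 !~ /^#/ { print $1 }')

echo "$ext_lines"
echo "$default_lines"
```

With this sample, the first command should print / and /home, and the second should print only /dev/sda1.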

The BEGIN and END blocks

Normally, awk executes each block of your script's code once for each input line. However, there are many programming situations where you may need to execute initialization code before awk begins processing the text from the input file. For such situations, awk allows you to define a BEGIN block. The BEGIN block is evaluated before awk starts processing the input file. It's an excellent place to initialize the FS (field separator) variable, print a heading, or initialize other global variables that you'll reference later in the program.

Awk also provides another special block, called the END block. Awk executes this block after all lines in the input file have been processed. Typically, the END block is used to perform final calculations or print summaries that should appear at the end of the output stream.
# awk 'BEGIN{FS=":";OFS="\t\t"; print "username\tuid"}  {print $1,$3}' /etc/passwd
username    uid
root        0
bin        1
daemon        2
adm        3
Another example with finer control over the output, using printf:
# awk 'BEGIN{FS=":";OFS="\t\t"; print "username\tuid"} {printf "%8s\t%d\n", $1,$3} END{print "Total " NR " records have been seen so far"}' /etc/passwd
username    uid
    root    0
     bin    1
...
      nx    990
  Salina    1003
Total 36 records have been seen so far
Note: in the example above, OFS is ignored, because printf writes exactly what its format string specifies.
Below are the common built-in variables awk uses:
       NF          The number of fields in the current input record.
       NR          The total number of input records seen so far.
       FS          The input field separator, a space by default.
       OFS         The output field separator, a space by default.
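
To watch NF and NR change as awk reads its input, here is a small sketch on two made-up lines (the sample text and the variable name report are only for illustration):

```shell
# Two made-up records: the first has three fields, the second has two.
report=$(printf 'a b c\nd e\n' | awk '
  { print "line " NR " has " NF " fields" }
  END { print NR " records in total" }')
echo "$report"
```

NF is recomputed for every record, while NR keeps counting, so in the END block it holds the total number of records read.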

Conditional statements

Awk also offers very nice C-like if statements.
{ if ( $5 ~ /root/ ) { print $3 } }
This block has no pattern, so it is executed for every input line; it prints the third field whenever the fifth field matches the regular expression root.

Here's a more complicated example of an awk if statement. As you can see, even with complex, nested conditionals, if statements look identical to their C counterparts:
{
  if ( $1 == "foo" ) {
    if ( $2 == "foo" ) {
      print "uno"
    } else {
      print "one"
    }
  } else if ( $1 == "bar" ) {
    print "two"
  } else {
    print "three"
  }
}
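
To see which branch fires for which input, the nested conditional can be run against a few made-up lines. This is just a sketch: the four sample lines are chosen so that each branch is taken once.

```shell
# Four made-up input lines, one for each branch of the conditional.
labels=$(printf 'foo foo\nfoo bar\nbar x\nbaz y\n' | awk '{
  if ($1 == "foo") {
    if ($2 == "foo") { print "uno" } else { print "one" }
  } else if ($1 == "bar") {
    print "two"
  } else {
    print "three"
  }
}')
echo "$labels"
```

The output should be uno, one, two and three, one per line, in that order.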

Numeric variables

So far, we've either printed strings, the entire line, or specific fields. However, awk also allows us to perform both integer and floating point math. Using mathematical expressions, it's very easy to write a script that counts the number of blank lines in a file. Here's one that does just that:
BEGIN { x=0 }
/^$/  { x=x+1 }
END   { print "I found " x " blank lines. :)" }

In the BEGIN block, we initialize our integer variable x to zero. Then, each time awk encounters a blank line, awk will execute the x=x+1 statement, incrementing x. After all the lines have been processed, the END block will execute, and awk will print out a final summary, specifying the number of blank lines it found.
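
Here is a sketch of the counter running on made-up input with three blank lines, followed by a small floating point calculation in the same style. The sample numbers and the names blank, avg and sum are only for illustration.

```shell
# Made-up input with three empty lines among six.
blank=$(printf 'a\n\nb\n\n\nc\n' | awk '
  BEGIN { x = 0 }
  /^$/  { x = x + 1 }
  END   { print "I found " x " blank lines." }')
echo "$blank"

# Floating point math: average of the first field over all records.
avg=$(printf '1.5\n2.5\n4\n' | awk '
  { sum = sum + $1 }
  END { printf "%.2f\n", sum / NR }')
echo "$avg"
```

The first command should report 3 blank lines; the second divides the sum 8 by the 3 records and prints 2.67.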
