Thursday, 8 August 2019

Linux - AWK: PRINT HEADER LINE AND PATTERN MATCH

In this tutorial, we look at how we can use AWK to print the header lines from a file or a command output along with the pattern being searched.
While filtering output from certain commands or lengthy reports, it may be important to display the first line of the file or the header line to make sense of the rest of the output which is being displayed.
Consider the below output.
[sahil@linuxnix ~]$ df -hTP
Filesystem                   Type     Size  Used Avail Use% Mounted on
/dev/mapper/vg_pbox6-lv_root ext4      18G  4.9G   12G  30% /
tmpfs                        tmpfs    491M   80K  491M   1% /dev/shm
/dev/sda1                    ext4     477M   35M  418M   8% /boot
/dev/sr0                     iso9660  3.7G  3.7G     0 100% /media/CentOS_6.8_Final
/dev/sdb                     ext4     488M  396K  462M   1% /u01
/dev/sdc                     ext4     488M  396K  462M   1% /u02

We would like to print only the ext4 file systems, together with the header line so that the values in each field make sense.
We could use grep to meet this requirement, as in the command below:
[sahil@linuxnix ~]$ df -hTP | grep -E "Filesystem|ext4"
Filesystem                   Type     Size  Used Avail Use% Mounted on
/dev/mapper/vg_pbox6-lv_root ext4      18G  4.9G   12G  30% /
/dev/sda1                    ext4     477M   35M  418M   8% /boot
/dev/sdb                     ext4     488M  396K  462M   1% /u01
/dev/sdc                     ext4     488M  396K  462M   1% /u02
With the -E option, the pipe (|) symbol acts as an alternation operator, which tells grep to print lines containing either the word Filesystem or the word ext4.
But for this to work we always need to know some text from the header line (first line) in advance, which is not convenient.
So now let's see how we can obtain the required output using awk, which can select the first line by its position instead of its content.
[sahil@linuxnix ~]$ df -hTP | awk 'NR==1 {print}; /ext4/ {print}'
Filesystem                   Type     Size  Used Avail Use% Mounted on
/dev/mapper/vg_pbox6-lv_root ext4      18G  4.8G   12G  30% /
/dev/sda1                    ext4     477M   35M  418M   8% /boot
/dev/sdb                     ext4     488M  396K  462M   1% /u01
/dev/sdc                     ext4     488M  396K  462M   1% /u02

Let me explain the above awk command.
  • NR is awk's built-in record (line) number variable, so the pattern NR==1 is true only for the first line of input. Note that it is written NR, not $NR: a leading $ would refer to the field whose number is NR. You should consider reading about other AWK variables.
  • The {print} action after the pattern tells awk to print that first line.
  • Next, we have a semicolon (;). It separates one pattern-action rule from the next, much like it separates commands on the shell command line. awk tests every input line against each rule in turn.
  • The second rule, /ext4/ {print}, prints every line that contains the string ext4.
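As a side note, the two rules above fire independently, so a header line that itself contained the search pattern would be printed twice. The sketch below shows one possible way around that by joining the two conditions with ||, and a second variant that compares only the Type column. The function names sample_df, header_and_match and header_and_type are made up for this illustration, and the sample data is simply the df listing from earlier in this post.

```shell
# Sample data standing in for `df -hTP` output (copied from the listing in this post).
sample_df() {
  cat <<'EOF'
Filesystem                   Type     Size  Used Avail Use% Mounted on
/dev/mapper/vg_pbox6-lv_root ext4      18G  4.9G   12G  30% /
tmpfs                        tmpfs    491M   80K  491M   1% /dev/shm
/dev/sda1                    ext4     477M   35M  418M   8% /boot
/dev/sr0                     iso9660  3.7G  3.7G     0 100% /media/CentOS_6.8_Final
/dev/sdb                     ext4     488M  396K  462M   1% /u01
/dev/sdc                     ext4     488M  396K  462M   1% /u02
EOF
}

# Hypothetical helper: header plus any line containing ext4.
# A pattern with no action prints the line, and || means each line prints at most once.
header_and_match() {
  awk 'NR==1 || /ext4/'
}

# Hypothetical helper: header plus lines whose second field (Type) equals the argument.
header_and_type() {
  awk -v fstype="$1" 'NR==1 || $2 == fstype'
}

sample_df | header_and_match
sample_df | header_and_type ext4
```

Both calls print the header followed by the four ext4 lines. Comparing $2 instead of searching the whole line avoids accidental matches, for example a mount point that happens to contain the text ext4.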
For our next example, let’s consider the below CSV file
[sahil@linuxnix ~]$ cat agent.csv
Name, Address, Phone Number
Roger,121B Baker's Street, +44-123-5678
Daniel,125A Baker's Street, +44-173-5628
Sean,122B Baker's Street, +44-423-9678
Charles,127D Baker's Street, +44-573-2678
Pierce,129B Baker's Street, +44-825-3678
From the above file, we’d like to print details for the Names Roger and Sean along with the header line. Here’s a long but interesting AWK one-liner which does the job perfectly.
[sahil@linuxnix ~]$ awk -F, 'BEGIN{IGNORECASE=1} ; {OFS=","} ; NR==1 {print $1, $NF};  /[Rr]oger|sean/ {print $1, $NF}' agent.csv
Name, Phone Number
Roger, +44-123-5678
Sean, +44-423-9678
Let me break down the above command step by step.
  • BEGIN{IGNORECASE=1} tells awk to perform case-insensitive pattern matching. IGNORECASE is a GNU awk (gawk) extension; other awk implementations treat it as an ordinary variable with no special effect.
  • With {OFS=","} we set the Output Field Separator (OFS) variable to a comma (,) so that the resulting output may be redirected to a CSV file if required. Because this block has no pattern, it runs for every line; the assignment could equally be placed in the BEGIN block.
  • The next rule prints the first and last fields (columns) of the first row. NR==1 matches the first row as in our previous example. NF holds the number of fields in the current row, which is 3 here, so $NF refers to the last field, $3.
  • Next, we perform the pattern search for the names. The pipe (|) symbol denotes alternation, so a line matches if it contains either of the two patterns.
  • Here I’ve used a character class [Rr] to match Roger or roger. IGNORECASE=1 already takes care of case sensitivity, so the character class is redundant; I included it just for the sake of demonstration.
  • In the final print statement, as with the previous one, we print the first and last fields from the matching rows.
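Since IGNORECASE only works in gawk, here is a rough sketch of how the same result might be obtained with any POSIX awk. It lowercases the Name field with tolower() before comparing, sets OFS once in BEGIN, and matches against the first field only instead of the whole line. The function names agents_csv and print_agents are invented for this example; the data is the agent.csv content shown above.

```shell
# Sample data standing in for agent.csv (copied from the listing in this post).
agents_csv() {
  cat <<'EOF'
Name, Address, Phone Number
Roger,121B Baker's Street, +44-123-5678
Daniel,125A Baker's Street, +44-173-5628
Sean,122B Baker's Street, +44-423-9678
Charles,127D Baker's Street, +44-573-2678
Pierce,129B Baker's Street, +44-825-3678
EOF
}

# Hypothetical helper: print Name and Phone Number for the header, Roger and Sean.
print_agents() {
  awk -F, '
    # Set the output separator once, before any input is read.
    BEGIN { OFS = "," }
    # Header row: print first and last field, then skip the remaining rules.
    NR == 1 { print $1, $NF; next }
    # Case-insensitive comparison on the Name field only.
    tolower($1) ~ /^(roger|sean)$/ { print $1, $NF }
  '
}

agents_csv | print_agents
```

This prints the same three lines as the one-liner above: the header, Roger and Sean. Anchoring the pattern to $1 also means a row is not selected just because the name appears somewhere in the address.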
