Wednesday, 31 July 2019

Rearranging Fields with awk

Awk is a useful programming language. Although you can do a lot with awk, it was purposely designed for text manipulation, such as field extraction and rearrangement. In this article, I cover just the basics of awk so that you can understand one-line awk programs.

Awk patterns and actions

awk's basic paradigm differs from that of many programming languages. In many ways it resembles sed:
awk 'program' [ file ...]
The basic structure of an awk program is:
pattern {action}
pattern {action}
...
The pattern can be almost any expression; the action runs only for records that match it. In text manipulation, the action is most often a print statement:
awk '{print something}' ...
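To see a pattern and an action working together, here is a minimal sketch. The sample_input function and its three lines are made up for illustration; any text file would do in their place.

```shell
# Hypothetical input: three "name status" lines, made up for illustration.
sample_input() {
    printf '%s\n' 'alpha ok' 'beta error' 'gamma ok'
}

# The pattern /error/ selects records containing "error";
# the action prints the whole selected record ($0).
sample_input | awk '/error/ { print $0 }'
# prints: beta error
```

Records that do not match the pattern are skipped silently, which is why only one of the three lines appears.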

Fields

awk has fields and records as a central part of its design. awk reads input records and automatically splits each record into fields. It sets the built-in variable NF to the number of fields in each record.
awk '{print NF}' 
In the example above, awk prints the number of fields in each record.
Field values are designated with the $ character. Usually $ is followed by a numeric constant, but it can also be followed by an expression. Here are some examples:
awk '{print $1}'        Print the first field
awk '{print $2,$5}'     Print the second and fifth fields
awk '{print $1,$NF}'    Print the first and last fields
awk 'NF > 0 {print $0}' Print nonempty lines
awk 'NF > 0'            Same (the default action is to print the line)
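To make the field examples concrete, the sketch below runs two of them on a small input and also rearranges fields, which is the point of this article. The records function and its contents are made up for illustration.

```shell
# Hypothetical whitespace-separated records; the middle one is an empty line.
records() {
    printf '%s\n' 'one two three' '' 'a b c d e'
}

# Number of fields in each record: prints 3, 0 and 5 on separate lines.
records | awk '{ print NF }'

# Rearrange: last field first, then the first field, skipping empty lines.
# Prints "three one" and then "e a".
records | awk 'NF > 0 { print $NF, $1 }'
```

Because $NF is an expression, the same program picks the last field whether a record has three fields or five.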

Setting the field separators

For simple programs, you can change the input field separator with the -F option:
awk -F: '{print $1, $5}' /etc/passwd
The output field separator is set with the OFS variable:
$ awk -F: -v 'OFS=,' '{print $1,$5}' /etc/passwd
root,root
bin,bin
daemon,daemon
adm,adm
...
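If you would rather not depend on the contents of your own /etc/passwd, the same idea can be tried on a single made-up line. The passwd_line function and the jdoe entry are hypothetical.

```shell
# Hypothetical passwd-style line, made up for illustration
# (it is not read from /etc/passwd).
passwd_line() {
    printf '%s\n' 'jdoe:x:1001:1001:Jane Doe:/home/jdoe:/bin/sh'
}

# -F: splits the input on colons; OFS is written wherever
# the print statement has a comma between values.
passwd_line | awk -F: -v 'OFS=,' '{ print $1, $5 }'
# prints: jdoe,Jane Doe
```

Note that the space inside "Jane Doe" survives, because with -F: only colons separate fields.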

Printing lines

awk can print more than fields: variables and literal strings can be printed as well. For example:
$ awk -F: '{print "user:"$1,"description:"$5}' /etc/passwd
...
user:dbus description:System message bus
user:rpc description:Rpcbind Daemon
user:usbmuxd description:usbmuxd user
user:avahi-autoipd description:Avahi IPv4LL Stack
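The difference between placing values side by side and separating them with a comma is easy to miss, so here is a small sketch of it. The entry function and its jdoe line are made up for illustration.

```shell
# Hypothetical passwd-style line, made up for illustration.
entry() {
    printf '%s\n' 'jdoe:x:1001:1001:Jane Doe:/home/jdoe:/bin/sh'
}

# Values written side by side are concatenated with nothing between them;
# a comma inserts OFS, which is a single space by default.
entry | awk -F: '{ print "user:" $1, "description:" $5 }'
# prints: user:jdoe description:Jane Doe
```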

Startup and cleanup actions

There are two special patterns, BEGIN and END, which awk uses for startup and cleanup actions. They are common in larger awk programs.
The basic structure looks like this:
BEGIN { start up code }
pattern1 {action1}
pattern2 {action2}
END { cleanup code }
For example:
awk 'BEGIN { FS=":"; COLUMN=2 } { sum += $COLUMN } END { print sum, sum / NR }' file(s)
In the example above, the input field separator is ":" and column 2 is the one processed. The program prints the sum and the average of that column over all the input files.
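Here is a minimal sketch of the sum-and-average program run on known data, so the result can be checked by hand. The scores function and its three lines are made up for illustration.

```shell
# Hypothetical colon-separated data; the second column holds the numbers.
scores() {
    printf '%s\n' 'a:10:x' 'b:20:y' 'c:30:z'
}

# BEGIN runs once before any input is read, END runs once after the last record.
# The whole program is passed to awk as a single quoted argument.
scores | awk 'BEGIN { FS = ":"; COLUMN = 2 }
              { sum += $COLUMN }
              END { print sum, sum / NR }'
# prints: 60 20
```

With empty input NR is 0 and the division fails, so a guard such as if (NR > 0) in the END action may be worth adding.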
