Thursday 1 August 2019

Linux Basic Text Processing With AWK

Once you’ve been working with Linux for any length of time, you come to realize that many of the tasks you perform involve interpreting data from text files. Tools like grep can narrow a search of a text file down to just the lines that you care about, but sometimes you want only individual columns of data from the file, or you want to reorder its columns.

So which tool is best?

AWK Command Tool
This is where the tool AWK comes in handy. AWK itself is a programming language aimed at text processing, and provides a number of functions to help manipulate text files. While learning all the facets of AWK could take a lot of time, for most users the AWK command provides far more power than they’ll ever use beyond isolating and manipulating columns of data.

As a long-standing tool in the Linux and Unix world, AWK comes installed as standard with most distributions, so for the most part you shouldn’t need to perform any installation. To check, run the awk command with no arguments at the command line and you should get a usage message with details of the flags that AWK accepts.

awk

The basic format of a command for AWK is:

awk 'pattern {action}' input-file

By default AWK directs its output to the screen, so if you want to store it in a file you’ll need to redirect its output. As with most Linux/Unix tools, you can also pipe the output of another command into it, and a common use case is to pipe the output of the grep command to an AWK command. The pattern selects the input lines on which the action takes place. If no pattern is supplied, the action is applied to every line of the input.
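
As a minimal sketch of how the pattern and action fit together, the example below pipes three made-up lines into awk. The sample text and the file name matches.txt are assumptions chosen purely for illustration.

```shell
# Sample input is invented for this illustration.
# With a pattern: only lines containing "error" are printed.
printf 'ok start\nerror disk full\nok stop\n' | awk '/error/ {print}'
# prints: error disk full

# Without a pattern: the action runs on every line. The output is
# redirected into a hypothetical file named matches.txt.
printf 'ok start\nerror disk full\nok stop\n' | awk '{print}' > matches.txt
```

The first command shows the pattern acting as a filter; the second shows that leaving it out passes every line through to the action.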

Some Basic AWK Commands
Let’s look at one of the most basic AWK commands you can do:

awk '{print "Hello World"}'

As no input file was specified, AWK will present you with a cursor on a blank line. Type anything and hit Enter, and AWK will respond with the text Hello World. It will do this for every line you enter until you cancel with the Control-C key combination. Next, let’s try something simpler:

awk '{print}'

Again, as no input file was specified, AWK will offer a cursor for you to provide input. However, this time the command will use the print action to echo each line that AWK reads back to the screen. AWK will helpfully break a line into a number of blocks, using any white space as the separator between those blocks. To print a specific block of text, use the number of the block prefixed with a dollar symbol: $1 stands for the first block of text, $2 for the second, and so on. $0 is a special case that provides the whole input line, much the same as print alone does.
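
A quick way to try this without typing at the prompt is to pipe a line in with echo. This is only a sketch, and the input text is made up for the illustration.

```shell
# Input line is invented: three whitespace-separated blocks.
# $2 selects the second block.
echo 'one two three' | awk '{print $2}'
# prints: two

# $0 is the whole line, so this behaves like a bare print.
echo 'one two three' | awk '{print $0}'
# prints: one two three
```

Several blocks can be printed by the same action: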

awk '{print $1 $4 $3}'

The above command will print the first, fourth and third blocks of text from its input, in that order. Note that there’s nothing between these blocks, so they will run together into one continuous string unless you specify some text to appear between them, such as:

awk '{print $1 " " $4 " " $3}'
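
To see the difference, here is a small sketch that pipes a made-up four-word line into both commands. The input text is an assumption, chosen only to make the blocks easy to spot.

```shell
# Input line is invented: four whitespace-separated blocks.
# Without separators, the chosen blocks run together.
echo 'alpha beta gamma delta' | awk '{print $1 $4 $3}'
# prints: alphadeltagamma

# With a space between each block.
echo 'alpha beta gamma delta' | awk '{print $1 " " $4 " " $3}'
# prints: alpha delta gamma
```

Separating the items with commas, as in print $1, $4, $3, gives the same spaced result, because awk then inserts its output field separator, which is a single space by default.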

As you can see, the blocks don’t have to appear in the order in which they appeared on the original line; they can be mixed up as you require. This can make life simpler when working through log files to collect specific snippets of information, such as the IP addresses making repeated connections to your server in a brute-force attack. AWK doesn’t have to separate blocks using white space, either. The -F flag can be used to specify a different separator, for example:

awk -F, '{print $2 "," $5 "," $3}' old.csv > new.csv

This command tells AWK to use a comma as the separator, to take the second, fifth and third columns from a comma-separated values (CSV) file, and to write them, in that order, to a new CSV file.
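
The sketch below walks through that command on a tiny file. The contents of old.csv here are invented for illustration, and the approach assumes that no field contains a quoted comma, because -F, splits on every comma it finds.

```shell
# Create a small sample old.csv (made-up data for this example).
printf 'id,name,city,age,email\n1,Ann,Leeds,34,ann@example.com\n' > old.csv

# Keep columns 2, 5 and 3, in that order, and write them to new.csv.
awk -F, '{print $2 "," $5 "," $3}' old.csv > new.csv

cat new.csv
# prints:
# name,email,city
# Ann,ann@example.com,Leeds
```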

To say that this lightly brushes the surface of what AWK is capable of is a bit of an understatement, but these few use cases probably make up the majority of the work AWK does on a daily basis. This basic understanding can help make your time spent working at the command line somewhat easier.
