Wednesday 31 July 2019

AWK one-liner collection

AWK is a pattern matching and string processing language named after the surnames of the original authors: Alfred Aho, Peter Weinberger and Brian Kernighan.
Print selected fields
Split the lines of the file file.txt into ":" (colon) separated fields and print the second field ($2) of each line:
 awk  -F":" '{print $2}' file.txt

Same as above, but print output only if the second field ($2) exists and is not empty:
 awk  -F":" '{if ($2)print $2}' file.txt

Print selected fields from each line separated by a dash:
 awk -F: '{ print $1 "-" $4 "-" $6 }' file.txt

Print the last field in each line:
 awk -F: '{ print $NF }' file.txt

Print every line and delete the second field:
 awk '{ $2 = ""; print }' file.txt

Good to know:
The command-line option -F sets the field separator. The default is whitespace (spaces and tabs).
$0 the entire line without the newline at the end
$1 to $9, $10 to ..., the fields
NF number of fields
NR current line number (counting across all files when multiple files are given)
FNR line number (just for that file)
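A quick illustration of these variables (an added example; file1.txt and file2.txt stand for any two text files):
 awk -F":" '{ print FILENAME, NR, FNR, NF, $1 }' file1.txt file2.txt
For each line this prints the file name, the global line number, the per-file line number, the number of fields, and the first field.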
Print matching lines
Print field number two ($2) only on lines matching "some regexp" (field separator is ":"):
 awk  -F":" '/some regexp/{print $2}' file.txt

Print lines matching "regexp a" and lines matching "regexp b" but the later ones are printed without newline (note the printf):
 awk  '/regexp a/{print};/regexp b/{printf $0}' file.txt

Print field number two ($2) only on lines not matching "some regexp" (field separator is ":"):
 awk  -F":" '!/some regexp/{print $2}' file.txt
or
 awk  -F":" '/some regexp/{next;}{print $2}' file.txt

Print field number two ($2) only on lines matching "some regexp" otherwise print field number three ($3) (fiel separator is ":"):
 awk  -F":" '/some regexp/{print $2;next}{print $3}' file.txt
The "next" command causes awk to continue with the next line and execute "{print $3}" only for non matching lines. This is like
/regexp/{...if..regexp..matches...;next}{...else...}
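The same decision can be written with an explicit if/else inside a single action block, which some find easier to read:
 awk -F":" '{ if ($0 ~ /some regexp/) print $2; else print $3 }' file.txt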

Print lines where field number two matches regexp (apply regexp only to field 2, not the whole line):
 awk '$2 ~ /regexp/{print;}' file.txt
Here is an example parsing the output of the Linux "ps aux" command, which has the process state in the eighth column. To print all processes that are in running or runnable state, you would look for the letter "R" in that eighth column. You also want to print line 1 of the ps printout, since it contains the column headers:
 ps aux | awk '$8 ~ /R/{print;}NR==1{print}'

Print the next two (i=2) lines after the line matching regexp:
 awk '/regexp/{i=2;next;}{if(i){i--; print;}}' file.txt

Print the line and the next two (i=2) lines after the line matching regexp:
 awk '/regexp/{i=2+1;}{if(i){i--; print;}}' file.txt

Print the lines from a file starting at the line matching "start" until the line matching "stop":
 awk '/start/,/stop/' file.txt
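A common variant (not part of the original list) prints the range without the /start/ and /stop/ lines themselves, using a flag variable:
 awk '/start/{f=1;next} /stop/{f=0} f' file.txt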

Print fields 1 and 2 from all lines not matching regexp:
 awk '!/regexp/{print $1 " " $2 }' file.txt

Print fields 1 and 2 from lines matching regexp1 and not matching regexp2:
 awk '/regexp1/&&!/regexp2/{print $1 " " $2 }' file.txt


Regexp syntax:
c matches the non-metacharacter c.
\c matches the literal character c.
. matches any character including newline.
^ matches the beginning of a string (example: ^1 , only lines starting with a one)
$ matches the end of a string (example: end$ , only lines ending in "end")
[abc...] character list, matches any of the characters abc....
[0-9a-zA-Z] range of characters 0-9 and a-z,A-Z
[^abc...] negated character list, matches any character except abc....
r1|r2 alternation: matches either r1 or r2.
r1r2 concatenation: matches r1, and then r2.
r+ matches one or more r's.
r* matches zero or more r's.
r? matches zero or one r's.
(r) grouping: matches r.

In languages like Perl you can use the grouping feature to extract a substring from the matching string. Normal AWK cannot use a grouping to capture a string. However, gawk has the match function, which can be used for that. The string matched by the first pair of brackets will be in arr[1].
Print the content of the part of the matching regexp that is enclosed by the round brackets:
  gawk 'match($0, /length:([0-9]+) cm/,arr){ print arr[1]}' file.txt

If file.txt looks as shown below then the above command would print 12:
width:3 cm
length:12 cm
height:14 cm
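match() can fill several capture groups at once. As an added sketch against the same file, this prints the name and the number from each line:
  gawk 'match($0, /([a-z]+):([0-9]+) cm/,arr){ print arr[1], arr[2]}' file.txt
For the file above this prints "width 3", "length 12" and "height 14", one per line.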

Insert a string after the matching line
This inserts a new line after the matching line:
awk '/regexp/{print $0; print "text inserted after matching line";next}{print}' file.txt
$0 is the line where the search pattern "regexp" matches, without the newline at the end. The awk print command prints the string and appends a newline.
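To insert the text before the matching line instead, print the inserted string first and fall through to the normal print:
awk '/regexp/{print "text inserted before matching line"}{print}' file.txt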

This appends a string to the matching line:
awk '/regexp/{print $0 "text appended at end of the matching line";next}{print}' file.txt
If matching "do A" else "do B" (if .. then .. else in awk)
awk '/regexp/{A-here;next}{B-here}' file.txt
Example:
awk '/regexp/{gsub(/string/,"replacement");print $1;next}{print;}' file.txt
The example would print non-matching lines unchanged (action B is just "print;"), while on lines that match /regexp/ it would replace /string/ by "replacement" and print the first field ($1).
If matching "A do..." OR if matching "B do.." (if .. then, if .. then, ...., in awk)
awk '/regexpA/{A-do-here;}/regexpB/{B-do-here}' file.txt
Example:
awk '/house/{print $1;}/cat/{print;}' file.txt

Replacement for some common unix commands (useful in a non-unix environment)
Count lines (wc -l):
 awk 'END{print NR}'

Search for matching lines (egrep regexp):
 awk '/regexp/'

Print non matching lines (egrep -v regexp):
 awk '!/regexp/'

Print matching lines with numbers (egrep -n regexp):
 awk '/regexp/{print FNR,$0}'

Print matching lines and ignore case (egrep -i regexp; IGNORECASE works in gawk only):
 awk 'BEGIN {IGNORECASE=1};/regexp/'

Number lines (cat -n):
 awk '{print FNR "\t" $0}'

Remove duplicate consecutive lines (uniq):
 awk '$0 != a{print}; {a=$0}'

Print first 5 lines of file (head -5):
 awk 'NR < 6'
Number non-empty lines
This prints all lines and adds a line number to non-empty lines:
 awk '/^..*$/{ print FNR ":" $0 ;next}{print}' file.txt
Remove empty lines
This prints all lines except empty ones and lines containing only spaces and tabs:
 awk '/^[ \t]*$/{next}{print}' file.txt
Number lines longer than 80 characters and show them
This is useful to find all the lines longer than 80 characters (or any other length):
 awk 'length($0)>80{print FNR,$0}'  file.txt
Substitute foo for bar on lines matching regexp
 awk '/regexp/{gsub(/foo/, "bar")};{print}' file.txt
Delete trailing white space (spaces, tabs)
 awk '{sub(/[ \t]*$/, "");print}' file.txt
Delete leading white space
 awk '{sub(/^[ \t]+/, ""); print}' file.txt
Add some characters at the beginning of matching lines
Add ++++ at the beginning of lines matching regexp.
 awk '/regexp/{sub(/^/, "++++"); print;next;}{print}' file.txt
Color gcc warnings in red
 gcc -Wall main.c |& awk '/: warning:/{print "\x1B[01;31m" $0 "\x1B[m";next;}{print}'
The "\x1B" means the ascii character with hex number 1B (ESC).
Print only lines of less than 80 characters
 awk 'length < 80' file.txt
Renaming files with AWK
You can use awk to generate shell commands, e.g. mv commands, to rename files according to a given recipe. I suggest always printing the commands before piping them to sh for execution. A small typo can have very significant side effects, so double-check what would happen by printing the commands first.

Rename all .MP3 files to be lower case:
ls *.MP3 | awk '{ printf("mv \"%s\" \"%s\"\n", $0, tolower($0)) }'
The above will just print what would happen. To actually execute it you run:
ls *.MP3 | awk '{ printf("mv \"%s\" \"%s\"\n", $0, tolower($0)) }' | sh

Substitute a regexp pattern with a given replacement string. We can, e.g., replace " " (spaces in the file names) by "-":
ls | awk '{ printf("mv \"%s\" \"%s\"\n", $0, gensub(/ +/,"-","g")) }'
The above will just print what would happen. To actually execute it you run:
ls | awk '{ printf("mv \"%s\" \"%s\"\n", $0, gensub(/ +/,"-","g")) }' | sh
The gensub function reads the string from $0 (= the current line) and returns the modified string. The third argument, the "g", means to find and replace everywhere (globally) on the current line.
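Unlike sub and gsub, gensub can also reference capture groups with \1, \2 ... in the replacement string. An added example (not part of the renaming recipes above) that reorders a date:
echo "31/07/2019" | gawk '{ print gensub(/([0-9]+)\/([0-9]+)\/([0-9]+)/, "\\3-\\2-\\1", "g") }'
This prints 2019-07-31.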
AWK as a command-line calculator
This prints 5.1:
awk 'BEGIN{print 3.1+4/2}'

This prints 1.41421:
awk 'BEGIN{print sqrt(2)}'

This prints 1.41421:
awk 'BEGIN{print 2^(1/2)}'

This prints 3.141592653589793 (pi to 15 digits after the decimal point):
awk 'BEGIN{printf "%.15f\n",4*atan2(1,1)}'

Print decimal number as hex (this prints 0x20):
awk 'BEGIN{printf "0x%x\n", 32}'

Convert a hex string to decimal (this prints 32):
awk 'BEGIN{print strtonum("0x20")}'

Math operators in GNU awk:
+ - * /
^ or ** Exponentiation
% Modulo
exp(), log() Exponential function and natural logarithm
atan2(y, x), sin(), cos() all work in radians
sqrt() same as **(1/2) Square root
strtonum() Convert hex (start with 0x) and octal (start with 0) to decimal
If you want to use this frequently then you could put this into your .bashrc file:
# add the awc function to .bashrc
# use awc like this: awc "3.4+2+8+99.2" (do not forget the quotes)
awc(){ awk "BEGIN{ print $* }" ;}
On the shell you can then type awc "3.4+2+8+99.2" and it will print 112.6.
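If you also want control over the number of digits, a small variant using printf (awcf is a made-up name here):
awcf(){ awk "BEGIN{ printf \"%.4f\n\", $* }" ;}
Typing awcf "2/3" then prints 0.6667.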
AWK minimal web server
You can't write a web server as a reasonable one-liner in AWK. You can do that with netcat, but there are cases where you have neither a real web server nor netcat, just a very basic shell environment, and that does usually include gawk (note: you need gawk, the GNU version of awk). Here is a web server that allows you to serve files at port 8080 (or any port, just change the number):
#!/usr/bin/gawk -f
BEGIN {
        if (ARGC < 2) { print "Usage: wwwawk file.html"; exit 0 }
        Concnt = 1;
        while (1) {
                # HTTP uses CRLF line endings
                RS = ORS = "\r\n";
                # gawk special file: listen on TCP port 8080
                HttpService = "/inet/tcp/8080/0/0";
                getline Dat < ARGV[1];
                Datlen = length(Dat) + length(ORS);
                # read and discard the request headers up to the empty line
                while (HttpService |& getline) {
                        if (ERRNO) { print "Connection error: " ERRNO; exit 1 }
                        print "client: " $0;
                        if (length($0) < 1) break;
                }
                # send the response headers and the file content
                print "HTTP/1.1 200 OK"             |& HttpService;
                print "Content-Type: text/html"     |& HttpService;
                print "Server: wwwawk/1.0"          |& HttpService;
                print "Connection: close"           |& HttpService;
                print "Content-Length: " Datlen ORS |& HttpService;
                print Dat                           |& HttpService;
                close(HttpService);
                print "OK: served file " ARGV[1] ", count " Concnt;
                Concnt++;
        }
}
Copy this code and save it into a file called wwwawk, then make it executable with "chmod 755 wwwawk". Now take some file (e.g. somefile.html) and you can serve it via that little web server:
chmod 755 wwwawk
./wwwawk somefile.html

from another shell:
curl http://localhost:8080
 or
lynx http://localhost:8080
 or
firefox -new-tab http://localhost:8080
This is, e.g., a great way to serve kickstart files for automated Linux installations. Note that this awk web server requires gawk. Most Linux distributions use gawk by default, except for the Raspberry Pi, which uses mawk, and mawk does not support network connections.

Grep your bash history commands and execute one

While working in the Bash shell it is common to want to repeat a command that you have recently executed. Bash keeps a history of executed commands in the history file .bash_history, which you can access by simply typing history.

> history
1  ls
2  cd ~
3  ls .*
4  cat .bash_history
5  history
This will output a list of commands prefixed with an identification number. If you only want to see the last N entries in the history, type history N.

> history 4
3  ls .*
4  cat .bash_history
5  history
6  history 4
To execute a command from your history, you can use the history expansion ! followed by the identification number.

> !4
cat .bash_history
Note that the !4 expands to cat .bash_history which is echoed to the terminal before being executed. You can also use !! as a shortcut for executing the last command. This avoids having to type the identification number, which is often more than one character, depending on the length of your history.

A more convenient method of executing a command is to use the ! expansion followed by a matching string. For example:

> !cat
cat .bash_history
executes the last command that begins with cat. Note that the matching string cannot contain any spaces.
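A related trick: appending the :p modifier to a history expansion prints the expanded command without executing it, so you can check what would run first:

> !cat:p
cat .bash_history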

You can get a lot of mileage out of these expansions, but you may run into a couple of problems. First, your history will grow. Reviewing all those entries for the one you want can be tedious, especially given that there will be many duplicate commands. Second, the identification numbers will get longer and less convenient to type.

To solve the first problem, you can pipe the output of history to grep so that you review only those commands that match a pattern. For example:

history | grep mplayer
will show all the previous incantations of mplayer. A convenient alias that you can add to your .bashrc file (located in your home directory) is:

alias gh='history | grep '
which will shorten the previous command to:

gh mplayer
This is quite useful, but you will note that there are still duplicate entries and the numbers are not necessarily consecutive. To address these problems, I have created a shell function that will return a list of the top ten commands matching a specified pattern and make it very easy to execute one of them. For example:

> ghf brew
1 brew install rcm
2 brew install karabiner
3 brew install z
4 brew install wget mplayer
5 brew install wget --with-iri
6 brew install wget
7 brew install pv
8 brew install phantomjs
9 brew install mplayer
10 brew install imagemagick
and then I can use the !! shell expansion to choose one of the 10 commands to execute:

> !! 5
brew install wget --with-iri
Note the space between the !! and the identification number.

Here is the full text of the ghf function, which can be added to your .bashrc file so that ghf is available in your shell. I hope you find it useful!

# ghf - [G]rep [H]istory [F]or top ten commands and execute one
# usage:
#  Most frequent command in recent history
#   ghf
#  Most frequent instances of {command} in all history
#   ghf {command}
#  Execute {command-number} after a call to ghf
#   !! {command-number}
function latest-history { history | tail -n 50 ; }
function grepped-history { history | grep "$1" ; }
function chop-first-column { awk '{for (i=2; i<NF; i++) printf "%s ", $i; print $NF}' ; }
function add-line-numbers { awk '{print NR " " $0}' ; }
function top-ten { sort | uniq -c | sort -r | head -n 10 ; }
function unique-history { chop-first-column | top-ten | chop-first-column | add-line-numbers ; }
function ghf {
  if [ $# -eq 0 ]; then latest-history | unique-history; fi
  if [ $# -eq 1 ]; then grepped-history "$1" | unique-history; fi
  if [ $# -eq 2 ]; then
    `grepped-history "$1" | unique-history | grep ^$2 | chop-first-column`;
  fi
}

Grep, Awk, and Sed in bash

I have used grep, awk, and sed to manipulate and rewrite log files. Here are some commands that I found useful to document for the future. If you have never used these bash tools, this might be useful, especially if you are trying to work with files that are really big. I'm going to consider a sample log file and explain a couple of different things you can do with these tools. I am using these tools on OSX, and these commands or one-line scripts will work with any Linux flavor.

So, let’s consider the following log file:

www.three.com 10.15.101.11 1353280801 TEST 345
www.one.com 10.14.101.11 1353280801 TEST 343
www.three.com 1.10.11.71 1353280801 TEST 323
www.one.com 10.15.11.61 1353280801 TEST 365
www.two.com 10.10.11.51 1353290801 TEST 55
www.two.com 10.20.13.11 1353290801 REST 435
www.one.com 10.20.14.41 1353290801 REST 65
www.two.com 10.10.11.14 1353290801 REST 345
www.three.com 10.10.11.31 1354280801 REST 34
www.one.com 10.10.13.144 1354280801 JSON 65
www.two.com 10.50.11.141 1354280801 JSON 665
www.three.com 120.10.11.11 1354280801 JSON 555
www.two.com 10.144.11.11 1383280801 RAW 33
www.one.com 10.103.141.141 1383280801 RAW 315

Now, here are some things you can do to this log file:

How many files are in a directory: ls | wc -l

Print the file: cat sample.log

Print lines that match a particular word: grep "RAW" sample.log

Print those lines to a file called test.log: grep "RAW" sample.log > test.log

Print particular columns and sort: cat sample.log | awk '{ print $1,$2}' | sort -k 1

Find and Replace using SED and Regex: cat sample.log | sed 's/TEST/JSON/g'

Split a log file into multiple files using a column as name with AWK: awk '{ print >>($4".log"); close($4".log") }' sample.log

Use substr (removes last character) in AWK to manipulate a string per line: cat sample.log | awk '{ print $1,$2,$3,substr($4,1,length($4)-1),$5}'

Print first line of file with SED: sed q test.log

Print last line of file with SED: sed '$!d' sample.log

Perform a regular expression on the last character of the entire file using SED: cat sample.log | sed '$ s/5$//'

Add some text to beginning and end of a file with AWK: cat sample.log | awk 'BEGIN{print "START" } { print } END{print "END"}'

Count and print how many unique fields are in all rows using AWK: cat sample.log | awk '{ if (a[$1]++ == 0) print $1 }' | wc -l

Make everything lowercase with AWK: cat sample.log | awk '{print tolower($0)}'

Multiple SED regular expressions: sed '1s/^/START/;$ s/5$/END/' sample.log

Regex with SED on multiple files: for file in *; do sed '1s/^/START/' $file > $file'.json'; done
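Going beyond the list above (an added sketch, not one of the original commands), awk's associative arrays make per-host aggregation easy; this sums the last column for each site in sample.log:

awk '{ sum[$1] += $NF } END { for (host in sum) print host, sum[host] }' sample.log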

USING GREP TO SEARCH FOR TEXT IN LINUX

Grep is a Unix tool used for finding text within files. This tool is very simple to use, much misunderstood and very powerful. It is an essential command to master when using Unix and Linux.

Grep is the Linux equivalent of Windows Find in Files. Grep can use regular expressions to search files or output for text, it can also use plain text searches.

To search the files in the current directory for a particular string, it is as easy as typing in

grep "findme" *
This will list the files and the matching lines. You can add the -r flag to search recursively through subdirectories.

grep -r "findme"
You can also search the output of other commands, for example a list of the currently installed packages. Listing the currently installed packages is a simple command; however, the output is very large, and it can be difficult to locate, say, all the installed Apache packages.

sudo apt --installed list
To make things easier, we can pipe this output to the grep command which will then search and show only the packages with Apache in the name.

sudo apt --installed list | grep apache
apache2/now 2.4.7-1ubuntu4.13 amd64 [installed,upgradable to: 2.4.7-1ubuntu4.15]
apache2-bin/now 2.4.7-1ubuntu4.13 amd64 [installed,upgradable to: 2.4.7-1ubuntu4.15]
apache2-data/now 2.4.7-1ubuntu4.13 all [installed,upgradable to: 2.4.7-1ubuntu4.15]
apache2-mpm-prefork/now 2.4.7-1ubuntu4.13 amd64 [installed,upgradable to: 2.4.7-1ubuntu4.15]
apache2-utils/now 2.4.7-1ubuntu4.13 amd64 [installed,upgradable to: 2.4.7-1ubuntu4.15]
libapache2-mod-php5/trusty-updates,trusty-security,now 5.5.9+dfsg-1ubuntu4.21 amd64 [installed]
libapache2-mod-svn/trusty-updates,trusty-security,now 1.8.8-1ubuntu3.2 amd64 [installed]
libapache2-svn/trusty-updates,trusty-security,now 1.8.8-1ubuntu3.2 all [installed]
Using Regular Expressions with Grep
Using regular expressions with grep allows us to search for text beginning or ending with a string. These commands can work on files or the piped output of a command. In these examples, I'm just working on a file for ease of demonstration.

You have to use the -E flag for extended regular expressions, which enables the full regex syntax. This command will show words starting with fig.

grep -E ^fig /usr/share/dict/words
This will show words ending with ion

grep -E ion$ /usr/share/dict/words
This will show the lines where toon appears as a whole word, i.e. preceded and followed by a word boundary (spaces, punctuation, line start or end, etc.)

grep -wE 'toon' /usr/share/dict/words
This shows matches which start with po, contain any two characters, and end in ute.

grep -E '^po..ute$' /usr/share/dict/words
And this shows all matches which contain a run of five consecutive characters from the bracketed set [aeiou]

grep -E '[aeiou]{5}' /usr/share/dict/words

One line programs in awk

Awk can do very useful things with as little as one line of code; few other programming languages can do so much with so little. In this article, I show some examples of these one-liners.

Unix/Linux word count utility

awk '{ C += length($0) +1; W += NF } END {print NR, W, C}'

To print original data values and their logarithms for one-column data files

awk '{print $1, log($1) }' file(s)

To print a random sample of about 5 percent of the lines from a text file

awk 'rand() < 0.05' file(s)

Report the sum of the nth column in tables with whitespace-separated columns

awk -v COLUMN=n '{ sum += $COLUMN } END { print sum }' file(s)

Report the average of column n

awk -v COLUMN=n '{ sum += $COLUMN } END { print sum / NR }' file(s)

To print a running sum of the amount in the last field (the number of columns may vary)

awk '{ sum += $NF ; print $0, sum}' file(s)
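In the same spirit (an added sketch, not from the original list), tracking a maximum instead of a running sum:

awk 'NR == 1 || $NF > max { max = $NF } END { print max }' file(s)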

Some simple ways to search for text in files

egrep 'pattern|pattern' file(s)
awk '/pattern|pattern/' file(s)
awk '/pattern|pattern/ {print FILENAME ":" FNR ":" $0 }' file(s)

Search a range of lines

Search lines 100-150 for the text:
 awk '(100 <= FNR) && (FNR <= 150) && /pattern/ {print FILENAME ":" FNR ":" $0 }' file(s)
An alternative way in shell:
 sed -n -e 100,150p -s file(s) | egrep 'pattern'

To swap the second and third columns in a four column table, assuming tab separators, use any of them below

awk -F'\t' -v OFS='\t' '{print $1,$3,$2,$4}' old >new
awk 'BEGIN { FS = OFS ="\t" } {print $1,$3,$2,$4}' old >new

To convert column separators from tab to ampersand

sed -e 's/tab/\&/g' file(s)       (tab here stands for a literal tab character)
awk 'BEGIN { FS = "\t"; OFS = "&" } { $1 = $1; print }' file(s)

To eliminate duplicate lines from a sorted stream

sort file(s) | uniq
sort file(s) | awk 'Last != $0 { print } { Last = $0 }'

To convert carriage return/newline line terminators to newline terminators, use one of them below

sed -e 's/\r$//' file(s)
sed -e 's/^M$//' file(s)
mawk 'BEGIN { RS = "\r\n" } { print }' file(s)
Note:
The first sed example needs a modern version that recognizes escape sequences.
In the second example, ^M represents a literal Ctrl-M (carriage return) character.
For the third example, we need either gawk or mawk because nawk and POSIX awk do not support more than a single character in RS.

To convert single spaced text lines to double spaced lines, use any of these

sed -e 's/$/\n/' file(s)
awk 'BEGIN { ORS ="\n\n" } { print }' file(s)
awk 'BEGIN { ORS = "\n\n" } 1' file(s)
awk '{print $0 "\n" }' file(s)
awk '{print; print ""}' file(s)

Conversion of double spaced lines to single spacing is equally easy

gawk 'BEGIN { RS="\n *\n" } { print }' file(s)

To strip angle bracketed markup tags from HTML documents, treat the tags as record separators, like this:

mawk 'BEGIN { ORS = " "; RS = "<[^<>]*>" } { print }' *.html
By setting ORS to a space, HTML markup gets converted to a space, and all input line breaks are preserved.

To extract all of the titles from a collection of XML documents

mawk -v ORS=' ' -v RS='[ \n]' '/<title *>/, /<\/title *>/' *.xml | sed -e 's@</title *> *@&\n@g'
The example above extracts the titles from the XML documents and prints them one title per line, with the surrounding markup. It works correctly even when the titles span multiple lines, and it handles the uncommon, but legal, case of spaces between the tag word and the closing angle bracket.

Rearranging Fields with awk

Awk is a useful programming language. Although you can do a lot with awk, it was purposely designed to be useful in text manipulation, such as field extraction and rearrangement. In this article, I show just the basics of awk so that you can understand the one-line programs in awk above.

Awk patterns and actions

awk's basic paradigm is different from many programming languages. It is similar in many ways to sed:
awk 'program' [ file ...]
The basic structure of an awk program is:
pattern {action}
pattern {action}
...
The pattern can be almost any expression; the most common action in text manipulation is print.
awk '{print something}' ...

Fields

awk has fields and records as a central part of its design. awk reads input records and automatically splits each record into fields. It sets the built-in variable NF to the number of fields in each record.
awk '{print NF}' 
In the above example, awk prints the total number of fields for each record.
Field values are designated with the $ character. Usually $ is followed by a numeric constant, but it can also be followed by an expression. Here are some examples:
awk '{print $1}'        Print first field 
awk '{print $2,$5}'     print second field and fifth fields 
awk '{print $1,$NF}'    print first and last fields
awk 'NF > 0 {print $0}' print nonempty lines 
awk 'NF > 0'

Setting the field separators

For simple programs, you can change the input field separator with the -F option
awk -F: '{print $1, $5}' /etc/passwd
As for the output field separator, it can be specified with the variable OFS:
$awk -F: -v 'OFS=,' '{print $1,$5}' /etc/passwd
root,root
bin,bin
daemon,daemon
adm,adm
...

Printing lines

awk printing is not just limited to fields, but also variables, or strings. For example:
$awk -F: '{print "user:"$1,"description:"$5}' /etc/passwd
...
user:dbus description:System message bus
user:rpc description:Rpcbind Daemon
user:usbmuxd description:usbmuxd user
user:avahi-autoipd description:Avahi IPv4LL Stack

Startup and cleanup actions

There are two special patterns, BEGIN and END, awk uses them to do startup and cleanup actions. It is common to use them in larger awk programs.
The basic structure is like this
BEGIN { start up code }
pattern1 {action1}
pattern2 {action2}
END { cleanup code }
For example:
awk 'BEGIN { FS=":"; COLUMN=2 } { sum += $COLUMN } END { print sum, sum / NR }' file(s)
In the example above, the input field separator is ":", column 2 is processed, and the sum and average of that column are printed for the input files.

Check out files from Git with their original timestamps

For people like me who are used to CVS for version control, one of the things that bothers me after switching to Git is the files' original timestamps. In case you want to check out the files with their original create/commit timestamps, here is my trick.
After checkout, run the script below; it will get the original create/commit timestamp from the Git db.

#!/bin/bash -e

PATH=/usr/bin:/bin
unalias -a

get_file_rev() {
    git rev-list -n 1 HEAD "$1"
}

update_file_timestamp() {
  file_time=$(git show --pretty=format:%ai --abbrev-commit "$(get_file_rev "$1")" | head -n 1)
  touch -d "$file_time" "$1"
}

OLD_IFS=$IFS
IFS=$'\n'

for file in `git ls-files`
do
  if [ -f "$file" ] ; then
    update_file_timestamp "$file"
  fi
done

IFS=$OLD_IFS

git update-index --refresh
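A minimal way to run it, assuming you saved the script as e.g. restore-timestamps.sh (the name is arbitrary):

chmod +x restore-timestamps.sh
cd /path/to/your/checkout
/path/to/restore-timestamps.sh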

Convert upper case to lower case on Linux, in different ways

There are quite a lot of ways to do this task on Linux, and most of them are easy to use. Here are different useful examples for different cases.

#1 Text file editors: vi, vim, ex, etc.

They are all in the same editor family.
Open the file, then use the following command to convert all lower case to upper case (see the note below for the opposite direction):
:1,$s/[a-z]/\u&/g
The above command can be explained as follows:
Command         Explanation
1,$             Address range: all lines, i.e. apply the substitution to every line
s               Substitute command
/[a-z]/         Find all lowercase letters (the target)
\u&             Substitute with uppercase: replace the last pattern matched (&) with its UPPERCASE version (\u). Note: use \l (small L) for a lowercase replacement.
g               Global replacement (all matches on each line)

#2 tr command  -- translate or delete characters

$echo "HOW ARE YOU TODAY" |tr '[:upper:]' '[:lower:]'
how are you today
It also works with a variable:
$echo $VAR_NAME | tr '[:upper:]' '[:lower:]'

Convert whole file

To convert file test1 content to lower case
$cat test1 ; tr '[:upper:]' '[:lower:]' <test1 >test2
SINGLE WORDS OR PHRASES THAT DESCRIBE YOURSELF
$cat test2
single words or phrases that describe yourself
This style works the same way
tr '[A-Z]' '[a-z]'
All of the character classes are here:
http://ss64.com/bash/tr.html

#3. Using Bash

Convert lower case to upper case
$ string1="i love itmyshare"
$ echo ${string1^^}
I LOVE ITMYSHARE
Convert upper case to lower case
$ string1="I LOVE FIBREVILLAGE"
$ echo ${string1,,}
i love fibrevillage
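The same parameter-expansion family can also change just the first character: ${string1^} uppercases it, ${string1,} lowercases it (bash 4+):
$ string1="i love itmyshare"
$ echo ${string1^}
I love itmyshare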

Typeset

$typeset -u string1="i love fibrefillage"
$echo $string1
I LOVE FIBREVILLAGE
In the same way, option '-l' does the opposite:
$typeset -l string1="I LOVE FIBREVILLAGE"
$echo $string1
i love fibrevillage

#4. dd

$ cat test1
SINGLE WORDS OR PHRASES THAT DESCRIBE YOURSELF

$ dd if=test1 of=test2 conv=lcase
$cat test2
single words or phrases that describe yourself
From lower case to upper case, use
dd if=test2 of=test1 conv=ucase

#5. awk

Upper case to lower case
$ awk '{ print tolower($0) }' test1 >test2
From lower case to upper case, use
$ awk '{ print toupper($0) }' test1 >test2

#6. Sed

$ sed -e 's/\(.*\)/\L\1/' test1 >test2
$ cat test2
single words or phrases that describe yourself
The backreference \1 refers to the entire line (captured by \(.*\)), and \L converts it to lower case.
From lower case to upper case
$ sed -e 's/\(.*\)/\U\1/' test2
SINGLE WORDS OR PHRASES THAT DESCRIBE YOURSELF

#7. perl

To convert lower case to upper case
$ perl -pe '$_= uc($_)' test1
SINGLE WORDS OR PHRASES THAT DESCRIBE YOURSELF
To convert upper case to lower case
$ perl -pe '$_= lc($_)' test2
single words or phrases that describe yourself

A simple piece of code to convert a file to lower or upper case

        echo "Menu "
        echo "1. Lower to Upper"
        echo "2. Upper to lower "
        echo "3. Quit"
        echo "Enter ur Choice \c"
        read Choice
        case"$Choice"in
           1) echo "Enter File: \c"
              read f1
              if [ -f $f1 ]
          then
               echo "Converting Lower Case to Upper Case "
                 tr '[a-z]''[A-Z]' <$f1
              else
                     echo "$f1 does not exist"
              fi
              ;;
          2) echo "Enter the File :\c"
             read f1
             if [ -f $f1 ]
             then
               echo "Converting Upper case to Lower Case to "
             tr '[A-Z]''[a-z]' <$f1
             else
                  echo "$f1 file does not exist "
             fi
             ;;
         3|*)
             echo "Exit......."
             exit;;
        esac

Linux - awk useful examples

Awk is a pattern scanning and processing language, a full-featured text processing language with a syntax reminiscent of C. While it possesses an extensive set of operators and capabilities, we will cover only a few of these here, the ones most useful in shell scripts.

Awk breaks each line of input passed to it into fields. By default, a field is a string of consecutive characters delimited by whitespace, though there are options for changing this. Awk parses and operates on each separate field. This makes it ideal for handling structured text files -- especially tables -- data organized into consistent chunks, such as rows and columns.

Let's see how it works. At the command line, enter the following command:

Print out the whole file

$ awk '{ print }' /etc/fstab or awk '{ print $0 }' /etc/fstab
You should see the contents of your /etc/fstab file as output, same as cat /etc/fstab.
When we executed awk, it evaluated the print command for each line in /etc/fstab, in order.
A note on the { print } code block: in awk, curly braces are used to group blocks of code together, similar to C.
Inside our block of code, we have a single print command. In awk, when a print command appears by itself, the full contents of the current line are printed. The $0 variable represents the entire current line, so print and print $0 do exactly the same thing.

Deal with multiple fields

It works like cut, but is more powerful; cut can only use a single character as the separator. By default, awk uses whitespace as the separator.
As mentioned above, $0 represents the entire current line of input, $1 represents the first column of the input, $2 the second column, etc.
$awk '{print $1,$2}' /etc/fstab
It will print out the first and second columns of the file /etc/fstab.

Print out your own string

$ awk '{ print "#" $0 }' /etc/fstab
It prints every line in /etc/fstab and adds "#" to the beginning of every line.

Specify separator for the input file

The following script will print out a list of all user accounts on your system:
$ awk -F":" '{ print $1 }' /etc/passwd

In the above case, we use the -F option to specify ":" as the field separator. When awk processes the print $1 command, it prints out the first field that appears on each line in the input file.
Here's another example:
$ awk -F":" '{print $1,$3}' /etc/passwd
root 0
bin 1
daemon 2
adm 3
In the above example, awk prints out the username and uid of each user on your system. You may also have noticed the ',' between the $1 and $3 fields; this tells awk to separate the two fields in the output. The default output separator is a single space.

Specify separator for the output

Awk's default output field separator, OFS, is a single space.

If you want to assign a different separator, for example a tab:
$ awk -F":" --assign OFS="\t" '{print "user:"$1,"uid:"$3}' /etc/passwd
user:root    uid:0
user:bin    uid:1
user:daemon    uid:2
In the above example, awk prints out the first and third columns of the /etc/passwd file; the input fields are separated by ":" and the output fields by a tab. Note: there is no OFS between "user:" and $1, or between "uid:" and $3.
Why?
The "," is needed between fields to tell awk to use the output separator; otherwise, the fields are simply concatenated.

Search pattern

An awk search pattern is a regular expression. For example:

Search for and print lines containing the string ext

# awk '/ext/  {print }' /etc/fstab
LABEL=/1                /                       ext3    defaults        1 1
LABEL=/tmp              /tmp                    ext3    defaults        1 2
LABEL=/home             /home                   ext3    defaults        1 2
LABEL=/usr              /usr                    ext3    defaults        1 2

Print lines that are not commented out in the file /etc/fstab

# awk '$0 !~ "^#" {print}' /etc/fstab
LABEL=/1                /                       ext3    defaults        1 1
LABEL=/tmp              /tmp                    ext3    defaults        1 2
LABEL=/home             /home                   ext3    defaults        1 2
LABEL=/usr              /usr                    ext3    defaults        1 2
LABEL=/opt              /opt                    ext3    defaults        1 2
...

Print file systems that are mounted with the default options.

# awk '$4 == "defaults" && $1 !~ "^#"  {print}' /etc/fstab
LABEL=/1                /                       ext3    defaults        1 1
LABEL=/tmp              /tmp                    ext3    defaults        1 2
LABEL=/home             /home                   ext3    defaults        1 2
LABEL=/usr              /usr                    ext3    defaults        1 2

The BEGIN and END blocks

Normally, awk executes each block of your script's code once for each input line. However, there are many programming situations where you may need to execute initialization code before awk begins processing the text from the input file. For such situations, awk allows you to define a BEGIN block. The BEGIN block is evaluated before awk starts processing the input file; it's an excellent place to initialize the FS (field separator) variable, print a heading, or initialize other global variables that you'll reference later in the program.

Awk also provides another special block, called the END block. Awk executes this block after all lines in the input file have been processed. Typically, the END block is used to perform final calculations or print summaries that should appear at the end of the output stream.
# awk 'BEGIN{FS=":";OFS="\t\t"; print "username\tuid"}  {print $1,$3}' /etc/passwd
username    uid
root        0
bin        1
daemon        2
adm        3
Another example of fine print control, using printf:
awk 'BEGIN{FS=":";OFS="\t\t"; print "username\tuid"} {printf "%8s\t%d\n", $1,$3} END{print "Total " NR " fields have seen so far"}' /etc/passwd
username    uid
    root    0
     bin    1
...
      nx    990
  Salina    1003
Total 36 fields have seen so far
Note: in the example above, OFS is ignored because printf formats its output explicitly and does not use OFS.
Below are the common variables awk uses:
       NF          The number of fields in the current input record.
       NR          The total number of input records seen so far.
       FS          The input field separator, a space by default.
       OFS         The output field separator, a space by default.

Conditional statements

Awk also offers very nice C-like if statements.
{ if ( $5 ~ /root/ ) { print $3 } }
In this example, the block is executed for every input line; the if statement prints the third field only when the fifth field matches /root/.

Here's a more complicated example of an awk if statement. As you can see, even with complex, nested conditionals, if statements look identical to their C counterparts:
{
  if ( $1 == "foo" ) 
    { if ( $2 == "foo" ) 
      { print "uno" } 
    else
      { print "one" }
    }
  else if ($1 == "bar" ) 
    { print "two" } 
  else 
    { print "three" } 
}

Numeric variables

So far, we've either printed strings, the entire line, or specific fields. However, awk also allows us to perform both integer and floating point math. Using mathematical expressions, it's very easy to write a script that counts the number of blank lines in a file. Here's one that does just that:
BEGIN { x=0 }
/^$/  { x=x+1 }
END   { print "I found " x " blank lines. :)" }

In the BEGIN block, we initialize our integer variable x to zero. Then, each time awk encounters a blank line, awk will execute the x=x+1 statement, incrementing x. After all the lines have been processed, the END block will execute, and awk will print out a final summary, specifying the number of blank lines it found.
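Saved to a file (say blank.awk; any name works), the script runs with awk -f blank.awk file.txt. As a sketch, the same logic also fits on one line (x+0 forces the output to be 0 rather than empty when no blank line was seen):

awk '/^$/ { x++ } END { print "I found " x+0 " blank lines. :)" }' file.txt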

Linux - cut command useful examples

cut is a command from GNU coreutils which allows you to print selected parts of lines from each FILE to standard output.
Here is the output of the lscpu -p option; it will be used as the testfile in the examples below:
# Socket,CPU,Core,Address,Online,Configured,Polarization
0,0,0,,Y,,
0,1,1,,Y,,
0,2,2,,Y,,
0,3,3,,Y,,
0,4,4,,Y,,
0,5,5,,Y,,
1,6,6,,Y,,
1,7,7,,Y,,
1,8,8,,Y,,
1,9,9,,Y,,
1,10,10,,Y,,
1,11,11,,Y,,

Example 1: print one specified field (the second field in the example below)

# cut -d',' -f2 testfile
CPU
0
1
2
3
...
Options -d and -f
       -d, --delimiter=DELIM
              use DELIM instead of TAB for field delimiter
       -f, --fields=LIST
              select only these fields;  also print any line that contains no delimiter character, unless the -s option is specified

Example 2: print two fields

# cut -d',' -f2,5 testfile
CPU,Online
0,Y
1,Y
2,Y
...

Example 3: print a range of fields

# cut -d',' -f2-5 testfile
CPU,Core,Address,Online
0,0,,Y
1,1,,Y
...

Example 4: print a range of fields, from a specified field to the end

# cut -d',' -f2- testfile
CPU,Core,Address,Online,Configured,Polarization
0,0,,Y,,
1,1,,Y,,
...

Example 5: print a range of fields, from the first to a specified one

# cut -d',' -f-2 testfile
# Socket,CPU
0,0
0,1
...

The same rules apply to -b and -c

-b and -c
       -b, --bytes=LIST
              select only these bytes
       -c, --characters=LIST
              select only these characters
# cut -c1 testfile
#
0
...

# cut -c1,5 testfile
#c
00
...

# cut -c1-5 testfile
# Soc
0,0,0
...

# cut -c-5 testfile
# Soc
0,0,0
...
# cut -c5- testfile
cket,CPU,Core,Address,Online,Configured,Polarization
0,,Y,,
1,,Y,,
...

More options

       -n     with -b: don't split multi byte characters
       --complement
              complement the set of selected bytes, characters or fields
       -s, --only-delimited
              do not print lines not containing delimiters
       --output-delimiter=STRING
              use STRING as the output delimiter; the default is to use the input delimiter
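
As a closing sketch combining the options above (not part of the original list), select two fields and rewrite the delimiter:

# cut -d',' -f2,5 --output-delimiter=':' testfile
CPU:Online
0:Y
1:Y
...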