One line programs in awk ~ Tech Blog

Wednesday, 31 July 2019

One line programs in awk

Awk can do very useful things with as little as one line of code; few other programming languages can do so much with so little. In this article, I show some examples of these one-liners.

To emulate the Unix/Linux word count utility, wc

awk '{ C += length($0) +1; W += NF } END {print NR, W, C}'
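As a quick check, the sketch below runs the one-liner and wc on the same input and stores both results. The sample text is made up for illustration, and the comparison assumes plain ASCII input, because length() counts characters while wc -c counts bytes.

```shell
# Made-up sample input: two lines, five words.
sample=$(mktemp)
printf 'one two three\nfour five\n' > "$sample"

# Lines, words, and characters as computed by the awk one-liner.
awk_counts=$(awk '{ C += length($0) + 1; W += NF } END { print NR, W, C }' "$sample")

# The same three numbers from wc, with its column padding squeezed out.
wc_counts=$(wc -lwc < "$sample" | awk '{ print $1, $2, $3 }')

echo "awk: $awk_counts"
echo "wc:  $wc_counts"
rm -f "$sample"
```

The "+ 1" in the awk program accounts for the newline that awk strips from each record before length() sees it.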

To print original data values and their logarithms for one-column data files

awk '{print $1, log($1) }' file(s)

To print a random sample of about 5 percent of the lines from a text file

awk 'rand() < 0.05' file(s)
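Awk implementations differ in how rand() is seeded. In gawk, for example, the sequence repeats on every run unless srand() is called. The sketch below adds a call to srand() in a BEGIN block, which seeds from the time of day, and counts how many of 2000 generated lines survive the filter. The generated data and the variable name sample_count are only for illustration.

```shell
# Generate 2000 numbered lines as stand-in data, keep roughly 5 percent,
# and count the survivors inside awk so the result is a bare number.
sample_count=$(awk 'BEGIN { for (i = 1; i <= 2000; i++) print "line", i }' |
    awk 'BEGIN { srand() } rand() < 0.05' |
    awk 'END { print NR }')

echo "kept $sample_count of 2000 lines"
```

The count should land near 100 but will vary from run to run, which is why the heading says "about 5 percent".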

To report the sum of the nth column in tables with whitespace-separated columns

awk -v COLUMN=n '{ sum += $COLUMN } END { print sum }' file(s)

To report the average of column n

awk -v COLUMN=n '{ sum += $COLUMN } END { print sum / NR }' file(s)
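Here is a small worked example of the sum and the average together, with COLUMN set to 2. The three-row table is invented for the demonstration, and the NR > 0 guard is my addition so that empty input does not cause a division by zero.

```shell
# Invented three-row table; column 2 holds the numbers 10, 20, 30.
col_stats=$(printf 'a 10 x\nb 20 y\nc 30 z\n' |
    awk -v COLUMN=2 '{ sum += $COLUMN } END { if (NR > 0) print sum, sum / NR }')

echo "$col_stats"   # sum, then average
```

Note that NR counts every input line, so blank or header lines would pull the average down.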

To print a running total of the amounts in the last field (the number of columns may vary)

awk '{ sum += $NF ; print $0, sum}' file(s)
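To see why $NF is handy here, the sketch below feeds the one-liner rows with different numbers of columns. The expense rows are made up; the point is that the amount is always picked up from the last field.

```shell
# Made-up rows with two, four, and two columns; the amount is always last.
running=$(printf 'rent 500\nfood and drink 120\nbus 30\n' |
    awk '{ sum += $NF; print $0, sum }')

printf '%s\n' "$running"
```

Each output line is the original line followed by the total so far, so the last line carries the grand total.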

Some simple ways to search for text in files

egrep 'pattern|pattern' file(s)
awk '/pattern|pattern/' file(s)
awk '/pattern|pattern/ {print FILENAME ":" FNR ":" $0 }' file(s)

Search range of lines

To search only lines 100 through 150 for the text

awk '(100 <= FNR) && (FNR <= 150) && /pattern/ { print FILENAME ":" FNR ":" $0 }' file(s)

An alternative way in shell

sed -n -e 100,150p -s file(s) | egrep 'pattern'
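A sketch of the awk range search on generated data follows. The file contents and the word "needle" are invented: every 50th line contains it, so only lines 100 and 150 should be reported out of the four matching lines in the file.

```shell
# Build a 200-line test file in which every 50th line contains "needle".
range_file=$(mktemp)
awk 'BEGIN {
    for (i = 1; i <= 200; i++) {
        word = (i % 50 == 0) ? "needle" : "hay"
        print word, i
    }
}' > "$range_file"

# Report matches only when the per-file line number is between 100 and 150.
hits=$(awk '(100 <= FNR) && (FNR <= 150) && /needle/ { print FILENAME ":" FNR ":" $0 }' "$range_file")

printf '%s\n' "$hits"
rm -f "$range_file"
```

Using FNR rather than NR means the line range restarts for each file when several files are given.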

To swap the second and third columns in a four-column table, assuming tab separators, use either of the following

awk -F'\t' -v OFS='\t' '{print $1,$3,$2,$4}' old >new
awk 'BEGIN { FS = OFS ="\t" } {print $1,$3,$2,$4}' old >new

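To convert column separators from tab to ampersand (the sed version needs a sed that recognizes \t, such as GNU sed; otherwise type a literal tab in its place)

sed -e 's/\t/\&/g' file(s)
awk 'BEGIN { FS = "\t"; OFS = "&" } { $1 = $1; print }' file(s)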
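The assignment $1 = $1 in the awk version looks like a no-op, but it matters: awk only rebuilds the record with OFS when a field is modified. The sketch below runs the program with and without the assignment on a made-up three-field line.

```shell
# Without touching a field, $0 is printed unchanged and the tabs survive.
without=$(printf 'a\tb\tc\n' | awk 'BEGIN { FS = "\t"; OFS = "&" } { print }')

# Assigning to any field forces awk to rebuild $0 using OFS.
with=$(printf 'a\tb\tc\n' | awk 'BEGIN { FS = "\t"; OFS = "&" } { $1 = $1; print }')

printf 'without: %s\n' "$without"
printf 'with:    %s\n' "$with"
```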

To eliminate duplicate lines from a sorted stream

sort file(s) | uniq
sort file(s) | awk 'Last != $0 { print } { Last = $0 }'
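One difference from uniq is worth knowing. Last starts out empty, so if the sorted stream begins with empty lines, the awk version drops them all. The sketch below compares it with a variant that always prints the first line; the NR == 1 guard and the sample input are my additions, not part of the original one-liner.

```shell
# Sample input with one empty line and one duplicate; sort puts the empty line first.
# Each pipeline ends by counting the lines that were printed.
plain_count=$(printf 'b\n\na\nb\n' | sort |
    awk 'Last != $0 { print } { Last = $0 }' | awk 'END { print NR }')

guarded_count=$(printf 'b\n\na\nb\n' | sort |
    awk 'NR == 1 || Last != $0 { print } { Last = $0 }' | awk 'END { print NR }')

echo "plain: $plain_count lines, guarded: $guarded_count lines"
```

The guarded variant keeps one empty line, as uniq would, while the plain one silently loses it.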

To convert carriage return/newline line terminators to newline terminators, use one of the following

sed -e 's/\r$//' file(s)
sed -e 's/^M$//' file(s)
mawk 'BEGIN { RS = "\r\n" } { print }' file(s)

Note:

The first sed example needs a modern version that recognizes escape sequences.

In the second example, ^M represents a literal Ctrl-M (carriage return) character.

For the third example, we need either gawk or mawk because nawk and POSIX awk do not support more than a single character in RS.
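If only a POSIX awk is available, a different technique gives the same result: leave RS alone and strip the trailing carriage return from each record with sub(). This is a substitute I am suggesting, not one of the three commands above, and the two-line input is made up.

```shell
# Strip a trailing carriage return from every record, then print it.
unix_text=$(printf 'first\r\nsecond\r\n' | awk '{ sub(/\r$/, ""); print }')

printf '%s\n' "$unix_text"
```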

To convert single spaced text lines to double spaced lines, use any of these

sed -e 's/$/\n/' file(s)
awk 'BEGIN { ORS ="\n\n" } { print }' file(s)
awk 'BEGIN { ORS = "\n\n" } 1' file(s)
awk '{print $0 "\n" }' file(s)
awk '{print; print ""}' file(s)

Conversion of double spaced lines to single spacing is equally easy

gawk 'BEGIN { RS="\n *\n" } { print }' file(s)
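For awks that do not accept a regular expression in RS, a round trip can be sketched with POSIX paragraph mode instead. Setting RS to the empty string makes awk treat blank lines as record separators. This is a swapped-in technique, and it is slightly stricter than the regular expression version: a separator line containing only spaces is not treated as blank. The three-line input is made up.

```shell
# Double-space three lines, then collapse them again using paragraph mode.
roundtrip=$(printf 'one\ntwo\nthree\n' |
    awk '{ print; print "" }' |
    awk 'BEGIN { RS = "" } { print }')

printf '%s\n' "$roundtrip"
```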

To strip angle bracketed markup tags from HTML documents, treat the tags as record separators, like this:

mawk 'BEGIN { ORS = " "; RS = "<[^<>]*>" } { print }' *.html

By setting ORS to a space, HTML markup gets converted to a space, and all input line breaks are preserved.
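Where neither gawk nor mawk is at hand, a rough approximation is to replace each tag with a space using gsub(). This is a different technique from the record-separator trick, and it only handles tags that open and close on the same line. The HTML fragment is invented for the example.

```shell
# Replace every single-line tag with a space; tags split across lines are missed.
stripped=$(printf '<p>Hello <b>world</b></p>\n' |
    awk '{ gsub(/<[^<>]*>/, " "); print }')

printf '[%s]\n' "$stripped"
```

The brackets in the final printf are there only to make the leading and trailing spaces visible.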

To extract all of the titles from a collection of XML documents

mawk -v ORS=' ' -v RS='[ \n]' '/<title *>/, /<\/title *>/' *.xml | sed -e 's@</title *> *@&\n@g'

The example above extracts the titles from XML documents and prints them one per line, with the surrounding markup. It works correctly even when a title spans multiple lines, and it handles the uncommon but legal case of spaces between the tag word and the closing angle bracket.

About Me

I V RAMANA
