awk regular expressions
gsub
Global substitution: replace every match of regexp in target (default $0) with replacement, and return the number of substitutions made.
gsub(regexp, replacement [, target])
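A minimal sketch of gsub() in action. The input string is one of the rpm names used later in this post, and the variable names out and n are only illustrative.

```shell
# Replace every dash in the record with an underscore.
# With no target, gsub() edits $0 in place and returns the replacement count.
out=$(printf '%s\n' 'glx-utils-8.1.0-4.fc20.x86_64' |
  awk '{ n = gsub(/-/, "_"); print n, $0 }')
printf '%s\n' "$out"
```

The count comes first in the output, followed by the modified line.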
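gensub(regexp, replacement, how [, target])
A general substitution function, available in gawk only, that offers more than the standard sub() and gsub(). The replacement text can refer to parenthesized components of the regexp as "\\1" to "\\9", how selects which match to replace ("g" for all, or a number for that occurrence), and the result is returned instead of being written back to target. The example below replaces the first % on each line of df output.
localhost ~]$ df | awk '{ print gensub(/%/, " Percent", 1) }'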
Filesystem 1K-blocks Used Available Use Percent Mounted on
/dev/mapper/fedora-root 51475068 10831316 38005928 23 Percent /
devtmpfs 1956180 0 1956180 0 Percent /dev
/dev/sda9 487652 123767 334189 28 Percent /boot
/dev/mapper/fedora-home 58642620 51118476 4522188 92 Percent /home
/dev/sda2 98304 66006 32298 68 Percent /boot/efi
index(in, find)
Return the 1-based position of the first occurrence of find in in, or 0 if find does not occur.
localhost ~]$ awk 'BEGIN { print index("SomeLongString", "tr") }'
10
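As a small illustration of the return value, the sketch below calls index() once with a substring that is present and once with one that is not. The search strings "Long" and "xyz" are made up for this example.

```shell
# index() counts from 1; a result of 0 means "not found".
out=$(awk 'BEGIN {
  print index("SomeLongString", "Long")   # substring present
  print index("SomeLongString", "xyz")    # substring absent
}')
printf '%s\n' "$out"
```

Because 0 is false in awk, index() can be used directly as a condition, for example index($0, "Long") { print }.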
length([string])
Return the number of characters in string; with no argument, return the length of $0. The example below prints the length of each line in testfile.
localhost ~]$ awk ' { print length($0) }' testfile
31
29
29
29
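One possible use of length() is to find the longest line of the input. This is only a sketch: the three input lines are invented, and max and line are illustrative variable names.

```shell
# Remember the longest line seen so far and report it at the end.
out=$(printf '%s\n' 'ab' 'abcdef' 'abc' |
  awk 'length($0) > max { max = length($0); line = $0 }
       END { print max, line }')
printf '%s\n' "$out"
```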
match(string, regexp [, array])
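Return the position where regexp first matches in string, or 0 if there is no match; RSTART and RLENGTH are set as a side effect (the optional array argument is a gawk extension). The example below prints every line of testfile that contains at least one lowercase letter.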
localhost ~]$ awk ' match($0, /[a-z]/) { print $0 }' testfile
column1 column2 column3 column4
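match() also records where the match was found. The sketch below, which assumes one of the rpm names from this post as input, uses RSTART and RLENGTH together with substr() to pull out the first dotted version number.

```shell
# RSTART is the 1-based start of the match, RLENGTH its length.
out=$(printf '%s\n' 'libgcc-4.8.3-7.fc20.i686' |
  awk 'match($0, /[0-9]+(\.[0-9]+)+/) {
         print RSTART, RLENGTH, substr($0, RSTART, RLENGTH)
       }')
printf '%s\n' "$out"
```

The regexp here is just one way to describe a version number and would need adjusting for other naming schemes.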
split(string, array [, fieldsep [, seps ] ])
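Split string into array elements at each fieldsep and return the number of elements; in gawk the separators can also be stored in seps. The examples below split a list of rpm names at dashes.
Contents of the file rpms: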
libhbalinux-1.0.16-2.fc20.x86_64
gucharmap-3.10.1-1.fc20.x86_64
libplist-1.11-2.fc20.x86_64
libgcc-4.8.3-7.fc20.i686
glx-utils-8.1.0-4.fc20.x86_64
vlgothic-fonts-20140801-1.fc20.noarch
Split at dashes, store the pieces in ary and the separators in seps (the fourth argument requires gawk), then print the first three pieces.
localhost ~]$ cat rpms | awk '{split($0, ary, "-", seps) ; print ary[1],ary[2],ary[3]}'
libhbalinux 1.0.16 2.fc20.x86_64
gucharmap 3.10.1 1.fc20.x86_64
libplist 1.11 2.fc20.x86_64
libgcc 4.8.3 7.fc20.i686
glx utils 8.1.0
vlgothic fonts 20140801
Print elements from both arrays: the first three pieces from ary, followed by the first two separators from seps.
localhost ~]$ cat rpms | awk '{split($0, ary, "-", seps) ; print ary[1],ary[2],ary[3],seps[1],seps[2]}'
libhbalinux 1.0.16 2.fc20.x86_64 - -
gucharmap 3.10.1 1.fc20.x86_64 - -
libplist 1.11 2.fc20.x86_64 - -
libgcc 4.8.3 7.fc20.i686 - -
glx utils 8.1.0 - -
vlgothic fonts 20140801 - -
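The return value of split() is handy when the number of pieces varies, as it does with these rpm names. A small sketch using the portable three-argument form (no seps, so it does not need gawk); n is an illustrative variable name.

```shell
# split() returns the element count, so ary[n] is always the last piece.
out=$(printf '%s\n' 'glx-utils-8.1.0-4.fc20.x86_64' |
  awk '{ n = split($0, ary, "-"); print n, ary[n] }')
printf '%s\n' "$out"
```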
sub(regexp, replacement [, target])
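Replace the first match of regexp in target (default $0) with replacement, and return 1 if a substitution was made or 0 otherwise. The example below replaces the first dash that is followed by a digit, together with that digit, with -->. Because sub() is used as the pattern, only lines where a substitution happened are printed.
localhost ~]$ cat rpms | awk 'sub(/-[0-9]/, " --> ")'
libhbalinux --> .0.16-2.fc20.x86_64
gucharmap --> .10.1-1.fc20.x86_64
libplist --> .11-2.fc20.x86_64
libgcc --> .8.3-7.fc20.i686
glx-utils --> .1.0-4.fc20.x86_64
vlgothic-fonts --> 0140801-1.fc20.noarch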
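To make the return value of sub() visible, the sketch below prints it in front of each line. The second input line, README, is made up so that one line has nothing to replace.

```shell
# sub() returns 1 when it replaced something and 0 when nothing matched.
out=$(printf '%s\n' 'libgcc-4.8.3-7.fc20.i686' 'README' |
  awk '{ n = sub(/-[0-9]/, " --> "); print n ": " $0 }')
printf '%s\n' "$out"
```

A line without a match is left unchanged and reported with 0.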
substr(string, start [, length ])
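Return the substring of string that starts at position start (counting from 1) and is length characters long; if length is omitted, return the rest of the string.
Let's use this file, which has two fields: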
localhost ~]$ cat nums
123456789 abcdef
Print two characters of the first field, starting at position 3.
localhost ~]$ awk '{print substr($1,3,2) }' nums
34
Print two characters of the second field, starting at position 3.
localhost ~]$ awk '{print substr($2,3,2) }' nums
cd
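When the length argument is left out, substr() returns everything from start to the end of the string. A short sketch using the same two-field line as the nums file; the start positions 7 and 4 are arbitrary choices for illustration.

```shell
# No third argument: take the rest of each field from the given position.
out=$(printf '%s\n' '123456789 abcdef' |
  awk '{ print substr($1, 7), substr($2, 4) }')
printf '%s\n' "$out"
```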
tolower(string)
Return a copy of string with all uppercase letters converted to lowercase; other characters are left unchanged.
tolower("MiXeD cAsE 123") returns "mixed case 123".
The example below converts an entire file to lowercase:
localhost ~]$ cat letters
This is Just Some Random Text Here ..
localhost ~]$ awk '{ print tolower($0)}' letters
this is just some random text here ..
toupper(string)
Return a copy of string with all lowercase letters converted to uppercase; other characters are left unchanged.
localhost ~]$ awk '{ print toupper($0)}' letters
THIS IS JUST SOME RANDOM TEXT HERE ..
The function can also be applied to a single field, for example to print only the first field in upper case:
localhost ~]$ awk '{ print toupper($1)}' letters
THIS
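toupper() combines naturally with substr(), for instance to capitalize only the first character of a line. This is a sketch with an invented input line, not something taken from the letters file.

```shell
# Upper-case the first character and append the rest of the line unchanged.
out=$(printf '%s\n' 'this is just some random text' |
  awk '{ print toupper(substr($0, 1, 1)) substr($0, 2) }')
printf '%s\n' "$out"
```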