Thursday, 8 August 2019

Linux - AWK SCRIPTING: LEARN AWK BUILT-IN VARIABLES WITH EXAMPLES

AWK INBUILT VARIABLES: FS, OFS, RS, ORS, NR, NF, FNR, FILENAME

AWK comes with a good number of built-in variables that are handy when working with data files. In this post we look at each of them with one or two examples to get familiar with them. Even simple AWK programs are hard to write without them. They control how input is split into fields and records and how output is formatted. They also expose information such as the current line number and the name of the current input file, which you can use inside the script. AWK concepts already covered in earlier posts:
AWK scripting: What is an AWK and how to use it?
AWK scripting: 14 AWK print statement examples
AWK scripting: 8 AWK printf statements examples
AWK scripting: 10 BEGIN and END block examples
AWK Scripting: How to define awk variables

AWK BUILT-IN VARIABLES:

  • NR: Number of input records read so far (normally the current line number, counted across all input files).
  • NF: Number of fields in the current record.
  • FILENAME: Name of the current input file.
  • FNR: Record number within the current input file (restarts at 1 for each new file).
  • FS: Input “field separator”.
  • RS: Input “record separator”, also called the row separator (a newline by default).
  • OFS: “Output field separator” (a single space by default).
  • ORS: “Output record separator” (a newline by default).
    The sample data file for this post is db.txt:
    cat db.txt

    John,29,MS,IBM,M,Married
    Barbi,45,MD,JHH,F,Single
    Mitch,33,BS,BofA,M,Single
    Tim,39,Phd,DELL,M,Married
    Lisa,22,BS,SmartDrive,F,Married
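To try the commands on your own machine, you can recreate the sample file with a here-document. The sketch below also creates abc.txt, the space-separated file used in Example11. Its contents are reconstructed from the outputs of Example9 and Example11. The script writes both files to the current directory, so it is best run in a scratch directory.

```shell
# Create db.txt (comma-separated) in the current directory
cat > db.txt <<'EOF'
John,29,MS,IBM,M,Married
Barbi,45,MD,JHH,F,Single
Mitch,33,BS,BofA,M,Single
Tim,39,Phd,DELL,M,Married
Lisa,22,BS,SmartDrive,F,Married
EOF

# Create abc.txt (space-separated), reconstructed from the outputs of Example9 and Example11
cat > abc.txt <<'EOF'
Jones 2143 78 84 77
Gondrol 2321 56 58 45
RinRao 2122234 38 37
Edwin 253734 87 97 95
Dayan 24155 30 47
EOF
```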
    To keep things simple, we can divide the built-in variables above into groups based on what they do.
    Group1: FS (input field separator) and OFS (output field separator)
    Group2: RS (row separator) and ORS (output record separator)
    Group3: NR, NF and FNR
    Group4: FILENAME variable

    GROUP1: FS(INPUT FIELD SEPARATOR), OFS

    Let us start with FS and OFS built-in variables.
    FS AWK variable: This variable stores the input field separator. Its default value is a single space, which AWK treats specially. Fields are separated by any run of spaces and tabs, and leading and trailing blanks are ignored. If your file uses a different separator, AWK will not split on it by default. An example is the ':' in the Linux password file (/etc/passwd). To handle such files, we set this built-in variable to tell AWK what the input field separator is.
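The sketch below contrasts default field splitting with FS=":" on a line in /etc/passwd format. The sample line is made up for illustration rather than read from a real /etc/passwd, so the commands run anywhere.

```shell
# A made-up line in /etc/passwd format (user:password:UID:GID:comment:home:shell)
line='alice:x:1000:1000:Alice:/home/alice:/bin/bash'

# Default FS: the line contains no blanks, so the whole line is a single field
printf '%s\n' "$line" | awk '{print NF}'
# 1

# With FS=":" each colon-separated value becomes its own field
printf '%s\n' "$line" | awk 'BEGIN{FS=":"}{print $1, $7}'
# alice /bin/bash

# Default splitting collapses runs of spaces/tabs and ignores leading blanks
printf '  a \t b  c\n' | awk '{print NF, $1}'
# 3 a
```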
    Let's see what happens if we don't set the field separator for db.txt.
    Example1: Print the first column of db.txt.
    awk '{print $1}' db.txt

    Output:

    John,29,MS,IBM,M,Married
    Barbi,45,MD,JHH,F,Single
    Mitch,33,BS,BofA,M,Single
    Tim,39,Phd,DELL,M,Married
    Lisa,22,BS,SmartDrive,F,Married
    The entire file is printed. AWK does not recognize the "," separator in db.txt: no line contains a blank, so each whole line ends up in $1. We have to tell AWK what the field separator is.
    Example2: Print only the first column of db.txt, whose field separator is ','.
    awk 'BEGIN{FS=","}{print $1}' db.txt
    Output:
    John
    Barbi
    Mitch
    Tim
    Lisa
    Example3: We can also use the AWK option -F to set the input field separator, as in this example that prints the 4th column.
    awk -F',' '{print $4}' db.txt
    Output:
    IBM
    JHH
    BofA
    DELL
    SmartDrive
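-F',' on the command line and BEGIN{FS=","} do the same job. A field separator longer than one character is treated as an extended regular expression in POSIX awk. This helps when the separator is applied inconsistently. A minimal sketch with made-up input lines:

```shell
# -F',' and BEGIN{FS=","} are equivalent ways to set the input field separator
printf 'John,29,MS,IBM\n' | awk -F',' '{print $4}'
printf 'John,29,MS,IBM\n' | awk 'BEGIN{FS=","}{print $4}'
# IBM (both commands)

# A multi-character FS is a regular expression: ", *" is a comma plus any number of spaces
printf 'John, 29,MS,   IBM\n' | awk -F', *' '{print $4}'
# IBM

# A bracket expression accepts either ',' or ';' as the separator
printf 'John;29,MS;IBM\n' | awk -F'[,;]' '{print $2, $4}'
# 29 IBM
```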
    OFS AWK variable: This variable sets the output field separator. AWK places it between the comma-separated items of a print statement. The default OFS is a single space.
    Example4: Display only the 1st and 4th columns, with $ as the separator between them in the output.
    awk 'BEGIN{FS=",";OFS=" $ "}{print $1,$4}' db.txt
    Output:
    John $ IBM
    Barbi $ JHH
    Mitch $ BofA
    Tim $ DELL
    Lisa $ SmartDrive
    Note: I added a space before and after $ in the OFS variable to make the output easier to read. You can remove the spaces if required.
    As an exercise, try printing only the first and fourth columns without setting OFS and compare the output.
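For the exercise above, the sketch below shows when OFS is actually used. It is inserted only between comma-separated print items, or when AWK rebuilds $0 after a field is assigned. The two sample rows are taken from db.txt.

```shell
rows='John,29,MS,IBM,M,Married
Barbi,45,MD,JHH,F,Single'

# A comma between print items inserts OFS (default: a single space)
printf '%s\n' "$rows" | awk -F',' '{print $1, $4}'
# John IBM
# Barbi JHH

# Without the comma the items are concatenated with no separator at all
printf '%s\n' "$rows" | awk -F',' '{print $1 $4}'
# JohnIBM
# BarbiJHH

# OFS is not applied to an unmodified $0 ...
printf '%s\n' "$rows" | awk 'BEGIN{FS=","; OFS=" | "}{print}'
# John,29,MS,IBM,M,Married
# Barbi,45,MD,JHH,F,Single

# ... but assigning any field (even $1=$1) makes AWK rebuild $0 using OFS
printf '%s\n' "$rows" | awk 'BEGIN{FS=","; OFS=" | "}{$1=$1; print}'
# John | 29 | MS | IBM | M | Married
# Barbi | 45 | MD | JHH | F | Single
```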

    GROUP2: RS(ROW SEPARATOR) AND ORS(OUTPUT RECORD SEPARATOR)


    RS AWK variable: The row (record) separator defines what separates one record from the next. By default AWK uses a newline, so each line is one record. We can change this with the RS built-in variable.
    Example5: Convert a sentence into one word per line using the RS variable.
    echo "This is how it works" | awk 'BEGIN{RS=" "}{print $0}'
    Output:
    This
    is
    how
    it
    works
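A few more RS sketches with made-up input. In common implementations such as gawk, echo's trailing newline stays attached to the last record ("works") when RS=" ". Printing that record therefore produces an extra blank line. Sending the sentence with printf '%s' avoids this. RS="" switches to paragraph mode, where blank lines separate records.

```shell
# Records separated by ";" instead of newlines (no trailing newline in the input)
printf 'a 1;b 2;c 3' | awk 'BEGIN{RS=";"}{print NR ": " $1}'
# 1: a
# 2: b
# 3: c

# printf '%s' sends the sentence without a trailing newline,
# so the last record is just "works" and no blank line follows it
printf '%s' 'This is how it works' | awk 'BEGIN{RS=" "}{print $0}'

# RS="" is paragraph mode: blank lines separate records,
# and a newline also acts as a field separator
printf 'John\nIBM\n\nLisa\nSmartDrive\n' | awk 'BEGIN{RS=""}{print NR, $1, $2}'
# 1 John IBM
# 2 Lisa SmartDrive
```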
    ORS (Output Record Separator): This variable defines the separator AWK writes after each record it prints. By default ORS is a newline.
    Example6: Print all the company names from the 4th column on a single line.
    awk -F',' 'BEGIN{ORS=" "}{print $4}' db.txt
    Output:
    IBM JHH BofA DELL SmartDrive
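ORS is written after every print, including the last one. The output of Example6 therefore ends with a space and no newline, and the shell prompt appears on the same line. The sketch below uses three rows from db.txt. It ends the line with printf, which does not use ORS, and shows ORS used for double spacing.

```shell
rows='John,29,MS,IBM,M,Married
Barbi,45,MD,JHH,F,Single
Mitch,33,BS,BofA,M,Single'

# printf ignores ORS, so END{printf "\n"} terminates the joined line
printf '%s\n' "$rows" | awk -F',' 'BEGIN{ORS=" "}{print $4} END{printf "\n"}'
# IBM JHH BofA   (note the trailing space before the newline)

# ORS="\n\n" double-spaces the output
printf '%s\n' "$rows" | awk -F',' 'BEGIN{ORS="\n\n"}{print $1}'
```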

    GROUP3: NF, NR AND FNR

    NF AWK variable: This variable holds the number of fields in the current row. The last field of a row can therefore be referenced as $NF.
    Example7: Print the number of fields in each row of abc.txt, a space-separated sample file whose contents appear in the output of Example9.
    awk '{print NF}' abc.txt
    Output:
    5
    5
    4
    5
    4
    Example8: Print the last field in each row of abc.txt.
    awk '{print $NF}' abc.txt
    Output:
    77
    45
    37
    95
    47

    Note: In the two examples above, NF on its own gives the number of fields in a row, while $NF gives the value of the last field. $NF comes in handy when you are not sure how many columns a row has.
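On comma-separated data such as db.txt, NF is only meaningful once FS is set. The default FS sees each line as a single field. The sketch below uses two rows from db.txt plus a made-up malformed row. It also shows $(NF-1) and NF used in a pattern.

```shell
rows='John,29,MS,IBM,M,Married
Barbi,45,MD,JHH,F,Single'

# Default FS: a comma-separated line has no blanks, so NF is 1
printf '%s\n' "$rows" | awk '{print NF}'
# 1
# 1

# With -F',' each row has 6 fields; $NF is the last one and $(NF-1) the one before it
printf '%s\n' "$rows" | awk -F',' '{print NF, $NF, $(NF-1)}'
# 6 Married M
# 6 Single F

# NF in a pattern: print only rows that do not have exactly 6 fields
printf '%s\nBroken,row\n' "$rows" | awk -F',' 'NF != 6'
# Broken,row
```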
    NR AWK variable: This variable holds the number of the current record. For ordinary input this is the current line number, counted across all input files. It comes in handy when you want to print line numbers.
    Example9: Print the line number before each line of abc.txt.
    awk '{print NR, $0}' abc.txt
    Output:
    1 Jones 2143 78 84 77
    2 Gondrol 2321 56 58 45
    3 RinRao 2122234 38 37
    4 Edwin 253734 87 97 95
    5 Dayan 24155 30 47
    This works much like the -n option of cat, which numbers the lines of a file.
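NR is also useful in patterns, for example to pick a line by number or to skip a header row. A short sketch on three rows from db.txt:

```shell
rows='John,29,MS,IBM,M,Married
Barbi,45,MD,JHH,F,Single
Mitch,33,BS,BofA,M,Single'

# A pattern with no action prints the matching line: here only line 2
printf '%s\n' "$rows" | awk 'NR==2'
# Barbi,45,MD,JHH,F,Single

# Skip the first line (handy when a file has a header row) and number the rest
printf '%s\n' "$rows" | awk -F',' 'NR>1{print NR, $1}'
# 2 Barbi
# 3 Mitch
```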
    FNR AWK variable: This variable holds the record number within the current input file. Unlike NR, it restarts at 1 whenever AWK moves on to the next file. In an END block, FNR therefore equals the number of lines in the last file read. With a single input file, this matches the count from wc -l, as long as the file's last line ends with a newline.
    Example10: Print the total number of lines in a given file.
    awk 'END{print FNR}' db.txt
    Output:
    5
    The output shows that db.txt contains 5 lines.
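NR and FNR only differ when AWK reads more than one file. The sketch below creates two small made-up files, one.txt and two.txt, in a scratch directory to show the difference. It also covers a case where wc -l and AWK disagree.

```shell
cd "$(mktemp -d)"            # scratch directory so no existing files are touched
printf 'a\nb\n' > one.txt
printf 'x\ny\nz\n' > two.txt

# NR keeps counting across all input files; FNR restarts at 1 for each file
awk '{print FILENAME, NR, FNR}' one.txt two.txt
# one.txt 1 1
# one.txt 2 2
# two.txt 3 1
# two.txt 4 2
# two.txt 5 3

# In END, NR is the grand total and FNR is the line count of the last file only
awk 'END{print NR, FNR}' one.txt two.txt
# 5 3

# wc -l counts newline characters, so a final line without one is not counted,
# while AWK still reads it as a record
printf 'a\nb' | wc -l                 # 1
printf 'a\nb' | awk 'END{print NR}'   # 2
```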

    GROUP4: FILENAME VARIABLE

    FILENAME AWK variable: This variable contains the name of the file AWK is currently processing.
    Example11: Print the file name, line number and contents of each line of abc.txt.
    awk '{print FILENAME, NR, $0}' abc.txt
    Output:
    abc.txt 1 Jones 2143 78 84 77
    abc.txt 2 Gondrol 2321 56 58 45
    abc.txt 3 RinRao 2122234 38 37
    abc.txt 4 Edwin 253734 87 97 95
    abc.txt 5 Dayan 24155 30 47
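FILENAME is most useful with several input files. The sketch below combines it with FNR==1 to print a header at the start of each file. first.csv and second.csv are made-up files created in a scratch directory.

```shell
cd "$(mktemp -d)"            # scratch directory so no existing files are touched
printf 'John,IBM\nBarbi,JHH\n' > first.csv
printf 'Tim,DELL\n' > second.csv

# FNR==1 is true on the first line of each file, so a header is printed per file
awk -F',' 'FNR==1{print "==> " FILENAME " <=="} {print "  " $1}' first.csv second.csv
# ==> first.csv <==
#   John
#   Barbi
# ==> second.csv <==
#   Tim
```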

    In our next post we will see how to use arrays in AWK scripting.
