Wednesday 31 July 2019

COUNT UNIQUE ELEMENTS IN TEXT FILE WITH AWK

Counting the distinct/unique elements of a text file is a common task. Below is an example of doing this in AWK, using sample_data_1.txt.
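A sketch of the command, pieced together from the step-by-step breakdown that follows. The contents of sample_data_1.txt are not given, so the printf line creates a small hypothetical stand-in (tab-separated, with a header row). Calling length on an array works in gawk and most current AWK implementations, but very old versions may not support it.

```shell
# Hypothetical stand-in for sample_data_1.txt: tab-separated, header on line 1.
printf 'id\tname\n1\talice\n2\tbob\n3\talice\n4\tcarol\n' > sample_data_1.txt

# Count the distinct values in column 2, skipping the header row.
cat sample_data_1.txt | awk 'BEGIN{FS="\t"} NR>1{names[$2]=1} END{print length(names)}'
# prints 3 (alice, bob, carol)
```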
Here is what each part of the command does:
  • cat sample_data_1.txt – reading the file and piping the data to AWK
  • BEGIN{FS="\t"} – specifying the field separator of the file (a tab)
  • NR>1 – Only executing the following code block if the record number is greater than 1 (removing the header)
  • names[$2]=1 – This script counts the distinct elements of column number 2, so here we store the values of this column as keys in an array. AWK arrays are associative arrays (holding keys and values), so a repeated value simply overwrites its existing key instead of adding a new one. Each value is set to 1 as a placeholder.
  • END{print length(names)} – printing the length of the names array
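To see why the array length equals the distinct count, it can help to print the keys themselves. A minimal sketch, assuming the same tab-separated layout. The inline input is hypothetical, and because AWK does not guarantee the order in which for-in visits keys, the output is piped through sort.

```shell
# Print each distinct value of column 2 once; duplicates collapse into one key.
printf 'id\tname\n1\talice\n2\tbob\n3\talice\n' |
  awk 'BEGIN{FS="\t"} NR>1{names[$2]=1} END{for (k in names) print k}' |
  sort
# prints:
# alice
# bob
```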
Using an array in AWK is typically much faster than the common alternative of piping the column through sort and uniq, since it avoids sorting the data.
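For comparison, a sketch of a sort and uniq pipeline of the kind the sentence above refers to. The exact command from the original post is not shown, so this is one plausible form, run against the same hypothetical sample data.

```shell
# Hypothetical stand-in for sample_data_1.txt: tab-separated, header on line 1.
printf 'id\tname\n1\talice\n2\tbob\n3\talice\n4\tcarol\n' > sample_data_1.txt

# Skip the header, take column 2 (cut splits on tabs by default),
# sort so duplicates are adjacent, collapse them, then count the lines.
tail -n +2 sample_data_1.txt | cut -f2 | sort | uniq | wc -l
```

Both approaches report the same count; the difference is that this pipeline has to sort every row before duplicates can be dropped.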
