Friday, 2 August 2019

Linux - grep vs sed vs awk

How many times have we used grep to narrow our searches on a Linux FS (file system)? Well, it is a fair question, since almost everyone (meaning the average Linux user) knows grep and its basic feature set. To recap: g/re/p stands for ‘globally search for a regular expression and print’; a name that is a manifesto, I would say.
The Linux ecosystem has two other very useful and powerful tools for pattern searching: sed, which stands for stream editor, and awk, which is instead named after the initials of its creators, Aho, Weinberger and Kernighan.
Given the three, what is the main difference between them, and what is each one best used for? Straight to the point: these are very good questions, and they are answered below.
  • grep. A fast and powerful pattern search tool that can easily be combined with other filters to refine the results and customize the display, although its main aim is searching for matches. Its main use is narrowing search results down to the lines that match the given pattern.
  • sed. A fast stream editor, able to search for a pattern and apply the given transformations and/or commands; it is still easy to combine into sophisticated filters, but it serves a different aim: modifying the text in the stream. Its main use is editing a stream in memory according to the given pattern.
  • awk. A loosely typed programming language for stream processing, where the basic unit is the string (each input line, split into fields), which can be i. matched, ii. substituted and iii. otherwise manipulated; most of the time there is no real need to combine awk with other filters, since its reporting capabilities are very powerful (the built-in printf function formats the output text as in C). Its main use is performing fine-grained (variables can be defined and modified incrementally) and programmatic (flow control statements) manipulations of the input stream.
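To make these three roles concrete, here is a minimal sketch that feeds the same three phone numbers (taken from the ‘phones’ file used in the sed example below) to each tool: grep only selects lines, sed rewrites every line, and awk keeps a counter across lines and formats a report with printf. The function names are hypothetical, chosen for illustration only.

```shell
# Hypothetical helper: print a few sample US phone numbers, one per line.
sample_phones() {
  printf '%s\n' '(555)555-1212' '(555)555-1213' '(666)555-1215'
}

# grep: narrow the stream to the matching lines (area code 555 only).
only_555() {
  sample_phones | grep -F '(555)'
}

# sed: edit every line of the stream (drop the parentheses around the area code).
strip_parens() {
  sample_phones | sed -e 's/[()]//g'
}

# awk: keep a counter across lines, then print a formatted summary with printf.
count_555() {
  sample_phones | awk '/^\(555\)/ { n++ } END { printf("555 numbers: %d\n", n) }'
}

only_555
strip_parens
count_555
```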

According to the above definitions, the three tools serve different purposes, can still be used in combination and, as said, all work by matching patterns; however, there is still no clear-cut difference between sed and awk, so let’s try to clarify with examples.
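The overlap between sed and awk is easy to see on a small sample. In this sketch (hypothetical function names, three numbers from the ‘phones’ file) both tools perform the same substitution and produce identical output; the last function shows the kind of task where awk pulls ahead: keeping a per-area-code count in an associative array, something sed has no natural way to do since it has no arithmetic.

```shell
# Hypothetical sample input: three numbers from the 'phones' file.
phones_sample() {
  printf '%s\n' '(555)555-1212' '(555)555-1213' '(666)555-1215'
}

# The same edit (mask the "555-" exchange as "XXX-") done by sed...
mask_with_sed() {
  phones_sample | sed -e 's/)555-/)XXX-/'
}

# ...and by awk: sub() replaces the first match in the whole record ($0).
mask_with_awk() {
  phones_sample | awk '{ sub(/\)555-/, ")XXX-"); print }'
}

# Where awk goes further: an associative array counts numbers per area code.
# The order of "for (a in count)" is unspecified, hence the final sort.
count_by_area() {
  phones_sample | awk '{ area = substr($0, 2, 3); count[area]++ }
    END { for (a in count) printf("%s %d\n", a, count[a]) }' | sort
}

mask_with_sed
mask_with_awk
count_by_area
```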
grep
Input Data – ‘ls -l’ output
total 68
-rw-rw-r--. 1 pmaresca pmaresca 49 Mar 21 20:34 blanks
-rw-rw-r--. 1 pmaresca pmaresca 36257 Mar 22 20:05 commands
-rw-rw-r--. 1 pmaresca pmaresca 79 Mar 20 23:18 json
-rw-rw-r--. 1 pmaresca pmaresca 37 Mar 21 20:44 keyvalue
-rw-rw-r--. 1 pmaresca pmaresca 873 Mar 21 22:51 menu_json
-rw-rw-r--. 1 pmaresca pmaresca 85 Mar 22 18:41 phones
-rw-rw-r--. 1 pmaresca pmaresca 16 Mar 21 19:01 sum
-rw-rw-r--. 1 pmaresca pmaresca 67 Mar 22 18:31 telephones
-rw-rw-r--. 1 pmaresca pmaresca 199 Mar 22 14:21 test
Processing – take the ‘ls -l’ output and grep for the extended regular expression ‘b.+s’
 ls -l | grep -E 'b.+s' 
Output Data
-rw-rw-r--. 1 pmaresca pmaresca 49 Mar 21 20:34 blanks
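Keep in mind that grep applies ‘b.+s’ to the whole line, not only to the file name: any line containing a ‘b’ followed, anywhere later, by an ‘s’ matches, including via the owner and group columns. The following sketch uses two lines from the listing above plus a made-up one (the file ‘notes’ and the user ‘bob’ are hypothetical) to compare the original pattern with a variant, ‘b[^ ]+s[^ ]*$’, that only looks inside the last blank-free word of the line, i.e. the file name (assuming file names contain no spaces).

```shell
# Sample lines: two from the listing above, plus a hypothetical file 'notes'
# owned by a hypothetical user 'bob'.
listing_sample() {
  printf '%s\n' \
    '-rw-rw-r--. 1 pmaresca pmaresca 49 Mar 21 20:34 blanks' \
    '-rw-rw-r--. 1 pmaresca pmaresca 85 Mar 22 18:41 phones' \
    '-rw-rw-r--. 1 bob bob 10 Mar 22 18:00 notes'
}

# Original pattern: a 'b' and a later 's' anywhere on the line ('bob' ... 'notes' matches too).
grep_anywhere() {
  listing_sample | grep -E 'b.+s'
}

# Variant: 'b', at least one non-blank, then 's', all within the last blank-free word.
grep_name_only() {
  listing_sample | grep -E 'b[^ ]+s[^ ]*$'
}

grep_anywhere
grep_name_only
```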
sed
Input Data – ‘phones’
(555)555-1212
(555)555-1213
(555)555-1214
(666)555-1215
(666)555-1216
(777)555-1217
Processing – take some US phone numbers as input and split each of them into i. Area, ii. Second and iii. Third part

 sed -e 's/\(^.*)\)\(.*-\)\(.*$\)/Area: \1 Second: \2 Third: \3/g' phones 

Output Data
Area: (555) Second: 555- Third: 1212
Area: (555) Second: 555- Third: 1213
Area: (555) Second: 555- Third: 1214
Area: (666) Second: 555- Third: 1215
Area: (666) Second: 555- Third: 1216
Area: (777) Second: 555- Third: 1217 
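The same split can also be written with extended regular expressions (‘sed -E’, available in both GNU and BSD sed), which needs fewer backslashes and spells out each part: three digits in parentheses, three digits plus the dash, four digits. This is a sketch assuming every line looks like the ones in ‘phones’; a line that does not match is printed unchanged. The function name split_phone is hypothetical.

```shell
# Hypothetical wrapper: split a US number into area code, exchange and line number.
split_phone() {
  # In ERE, \( and \) are literal parentheses, while bare ( ) create capture groups.
  sed -E 's/^(\([0-9]{3}\))([0-9]{3}-)([0-9]{4})$/Area: \1 Second: \2 Third: \3/'
}

printf '%s\n' '(555)555-1212' '(777)555-1217' | split_phone
```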
awk
Input Data – ‘menu_json’
{"menu": {
   "header": "SVG Viewer",
   "items": [
     {"id": "Open"},
     {"id": "OpenNew", "label": "Open New"},
     null,
     {"id": "ZoomIn", "label": "Zoom In"},
     {"id": "ZoomOut", "label": "Zoom Out"},
     {"id": "OriginalView", "label": "Original View"},
     null,
     {"id": "Quality"}, 
     {"id": "Pause"},
     {"id": "Mute"},
     null,
     {"id": "Find", "label": "Find..."},
     {"id": "FindAgain", "label": "Find Again"},
     {"id": "Copy"},
     {"id": "CopyAgain", "label": "Copy Again"},
     {"id": "CopySVG", "label": "Copy SVG"},
     {"id": "ViewSVG", "label": "View SVG"},
     {"id": "ViewSource", "label": "View Source"},
     {"id": "SaveAs", "label": "Save As"},
     null,
     {"id": "Help"},
     {"id": "About", "label": "About Adobe CVG Viewer..."}
  ]
}}
Processing – take the menu data as input, extract the ID (the first value) of each item, and build a set of shell exports
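 awk 'BEGIN { sum = 0 }
 /id/ { sum += 1                                 # one more item with an "id" key
        gsub(/[\",}]/, "")                       # strip double quotes, commas and closing braces
        sub(/{id:/, "export VAR_"sum"=")         # replace the key with the export prefix
        printf("%s %s%s%s%s\n", $1, $2, "\"", $3, "\"") }   # print: export VAR_n="<id>"
 END { print "Total", sum }' menu_json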
Output Data
export VAR_1="Open"
export VAR_2="OpenNew"
export VAR_3="ZoomIn"
export VAR_4="ZoomOut"
export VAR_5="OriginalView"
export VAR_6="Quality"
export VAR_7="Pause"
export VAR_8="Mute"
export VAR_9="Find"
export VAR_10="FindAgain"
export VAR_11="Copy"
export VAR_12="CopyAgain"
export VAR_13="CopySVG"
export VAR_14="ViewSVG"
export VAR_15="ViewSource"
export VAR_16="SaveAs"
export VAR_17="Help"
export VAR_18="About"
Total 18
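The awk program above strips characters and then relies on the blank-separated fields that remain. An alternative sketch, assuming each menu item sits on its own line as in ‘menu_json’, splits on the double quote instead (-F'"'), so that in a line such as {"id": "Open"}, the key lands in field 2 and its value in field 4. It is still line-oriented text processing, not a JSON parser, so a different layout of the file would break it. The function name menu_exports is hypothetical.

```shell
# Hypothetical helper: read menu-like JSON on stdin and emit shell exports.
# Splitting on '"' puts the key in $2 and its value in $4 for lines like
#   {"id": "Open"},
menu_exports() {
  awk -F'"' '
    $2 == "id" { n++; printf("export VAR_%d=\"%s\"\n", n, $4) }
    END        { print "Total", n + 0 }
  '
}

printf '%s\n' '{"menu": {' '  "items": [' '    {"id": "Open"},' \
  '    {"id": "OpenNew", "label": "Open New"},' '    null,' '  ]' '}}' | menu_exports
```

If the input is trusted, the generated lines could then be loaded into the current shell, for example with eval "$(menu_exports < menu_json | grep '^export')".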
