Linux: Text Processing: grep, cat, awk, uniq ~ Tech Blog

Thursday 1 August 2019

Linux: Text Processing: grep, cat, awk, uniq

This page is a basic tutorial on the Linux shell's text processing tools. They work line by line, which makes them well suited to logs and other line-oriented text.

Get Lines: grep
grep prints the lines of its input that match a pattern. It is the most important of these commands, so it is worth mastering.

Show Matching Lines
# show lines containing xyz in myFile
grep 'xyz' myFile
# show lines containing xyz in all files whose names end in html, in the current dir only (subdirs are not searched)
grep 'xyz' *html
Grep for All Files in a Dir
# show matching lines in files whose names end in html, in ~/web and all its subdirs
grep -r 'xyz' --include='*html' ~/web
Here's what the options mean:

-r → search the directory recursively, including all subdirectories.
--include='*html' → search only files whose names match the glob pattern (* is a wildcard that matches zero or more characters).
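To see the two options working together, here is a minimal sketch that builds a throwaway directory and searches it. The directory layout and file names are made up for the demo.

```shell
# Build a throwaway directory tree (all names here are invented for the demo).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/sub"
printf 'abc xyz\n' > "$demo_dir/index.html"
printf 'xyz again\n' > "$demo_dir/sub/page.html"
printf 'xyz in a text file\n' > "$demo_dir/notes.txt"

# -r descends into sub/, --include keeps only file names matching the glob.
# -l prints just the file names; sort makes the order stable.
matched_files=$(grep -r -l 'xyz' --include='*html' "$demo_dir" | sort)
printf '%s\n' "$matched_files"

rm -rf "$demo_dir"
```

Both html files are listed, including the one in the subdirectory. notes.txt also contains xyz but is skipped because its name does not match the glob.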
Grep Without Regex
Use the option -F. (F means “Fixed string”)

# search ruby source files for lines that contain .* literally
grep -F '.*' *rb
This is useful when you want to search for a complicated string in source code, such as *@$.*#+-/\|`.
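The difference is easy to check on a two-line sample. This is a small sketch; the sample text is made up.

```shell
sample=$(mktemp)
printf 'x = a.*b\nx = aXYZb\n' > "$sample"

# As a regex, a.*b means "a, then anything, then b": both lines match.
regex_count=$(grep -c 'a.*b' "$sample")

# With -F the same text is matched literally: only the first line matches.
fixed_count=$(grep -c -F 'a.*b' "$sample")

echo "regex: $regex_count, fixed: $fixed_count"   # regex: 2, fixed: 1
rm -f "$sample"
```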

If your string is really complicated, you can put it in a file and pass that file with the option --file=my_pattern_filename. Each line of the file is treated as a separate pattern. Example:

# search js source code in dir and all subdirs. The pattern is stored in a file named myPattern.txt
grep -r --file=myPattern.txt --include='*js' .
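Combining --file with -F avoids both shell quoting and regex escaping. A minimal sketch, assuming a made-up code.js; the quoted heredoc delimiter ('EOF') keeps the shell from interpreting $, backslash, or backtick.

```shell
work=$(mktemp -d)

# The pattern file holds one pattern per line.
cat > "$work/myPattern.txt" <<'EOF'
*@$.*#+-/\|`
EOF

# A made-up source file: only the first line contains the string.
cat > "$work/code.js" <<'EOF'
var a = "*@$.*#+-/\|`";
var b = "plain";
EOF

# -F treats the stored line as a fixed string; --file reads it from the file.
hits=$(grep -c -F --file="$work/myPattern.txt" "$work/code.js")
echo "$hits"   # 1

rm -rf "$work"
```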
Most Useful Grep Options

Options for Pattern String
-F → use fixed string. (no regex)
-P → use Perl-compatible regex syntax. This is a GNU grep extension. (Perl and Python regex syntaxes are largely compatible.)
-i → ignore case.
-v → print lines NOT containing the pattern.
Examples:

# print lines that do NOT contain “html HTTP”, for all files ending in “log”
grep -v 'html HTTP' *log
# print lines containing “png HTTP” or “jpg HTTP”
grep -P 'png HTTP|jpg HTTP' *log
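Here is a small sketch of -i and -v on a made-up four-line log. It uses -E (POSIX extended regex) for the “or” pattern instead of -P, on the assumption that your grep may not be built with Perl regex support; for a simple alternation like this the two behave the same.

```shell
log=$(mktemp)
cat > "$log" <<'EOF'
GET /a.png HTTP/1.1
GET /b.jpg HTTP/1.1
GET /c.html HTTP/1.1
get /d.PNG http/1.1
EOF

# -E enables extended regex, where | means "or".
either=$(grep -c -E 'png HTTP|jpg HTTP' "$log")
# -i makes the same search case-insensitive, so the last line also matches.
either_nocase=$(grep -c -i -E 'png HTTP|jpg HTTP' "$log")
# -v inverts the match: count lines that do NOT contain "html HTTP".
not_html=$(grep -c -v 'html HTTP' "$log")

echo "$either $either_nocase $not_html"   # 2 3 3
rm -f "$log"
```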
Options for File Selection
*.html → search all files ending in “.html” in the current dir. (Files in subdirs are ignored.)
grep -r --include='*html' pattern dirname → search for pattern in dirname, including subdirs, but only in files ending in “html”.
Output Options
-H → print the file name with each matching line.
-h → do NOT print the file name.
-l → print only the names of files that contain a match; do NOT print the matched lines.
-L → print only the names of files that contain NO match.
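The four output options side by side, as a sketch on two made-up files (a.txt contains the pattern, b.txt does not):

```shell
work=$(mktemp -d)
printf 'has xyz here\n' > "$work/a.txt"
printf 'nothing relevant\n' > "$work/b.txt"

# Run each grep from inside the directory so the printed names stay short.
with_match=$(cd "$work" && grep -l 'xyz' a.txt b.txt)      # a.txt
without_match=$(cd "$work" && grep -L 'xyz' a.txt b.txt)   # b.txt
with_name=$(cd "$work" && grep -H 'xyz' a.txt)             # a.txt:has xyz here
no_name=$(cd "$work" && grep -h 'xyz' a.txt b.txt)         # has xyz here

printf '%s\n' "$with_match" "$without_match" "$with_name" "$no_name"
rm -rf "$work"
```

By default grep prints file names only when it is given more than one file, so -H and -h are mainly for overriding that default.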
More Grep Examples
# print lines containing “html HTTP” in a log file, show only the 12th and 7th columns, keep only certain lines, then sort, then condense repetitions with a count, then sort by that count.

grep 'html HTTP' apache.log | awk '{print $12 , $7}' | grep -i -P "livejournal|blogspot" | sort | uniq -c | sort -n
# print all lines containing “http://” in all html files of a dir, except lines mentioning certain sites. Output to xx.txt

grep -r --include='*html' -F 'http://' ~/web | grep -v -P 'google\.com|twitter\.com|reddit\.com|wikipedia\.org' > xx.txt
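The sort | uniq -c | sort -n idiom in the first example is worth seeing on small data. This sketch uses a made-up 3-column log (client, path, referrer) rather than a real Apache log, so the awk column numbers differ from the ones above.

```shell
log=$(mktemp)
# Made-up 3-column log: client, requested path, referrer.
cat > "$log" <<'EOF'
10.0.0.1 /a.html blogspot.com
10.0.0.2 /a.html blogspot.com
10.0.0.3 /b.html livejournal.com
10.0.0.4 /a.html blogspot.com
10.0.0.5 /c.png blogspot.com
EOF

# 1. keep html requests          2. print referrer and path
# 3. sort so duplicates are adjacent
# 4. collapse duplicates with a count
# 5. order by that count, smallest first
report=$(grep 'html' "$log" | awk '{print $3, $2}' | sort | uniq -c | sort -n)
printf '%s\n' "$report"

# The last line is the most frequent pair; awk tidies the leading spaces.
most_common=$(printf '%s\n' "$report" | tail -n 1 | awk '{print $1, $2, $3}')
echo "$most_common"   # 3 blogspot.com /a.html

rm -f "$log"
```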
Text Columns: awk, sort, uniq, Sum a Column …
Show Only the nth Column in a Text File
# print the 7th column. (columns are separated by whitespace, that is, runs of spaces or tabs, by default.)
cat myFile | awk '{print $7}'
For a delimiter other than whitespace, for example tab, use the -F option. Quote the value so the shell does not strip the backslash. Example:

# print the 12th and 7th columns, Tab is the separator
cat myFile | awk -F'\t' '{print $12 , $7}'
An alternative is the cut utility, but it does not accept a regex as the delimiter. So, if your columns are separated by a varying number of spaces, “cut” cannot handle it.
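A short sketch of both points, on made-up data: a tab separator keeps a field with an inner space intact, and cut trips over repeated spaces where awk does not.

```shell
tsv=$(mktemp)
# printf expands \t into real tab characters.
printf 'alice\t30\tParis\nbob\t25\tNew York\n' > "$tsv"

# With tab as the separator, "New York" stays one field.
cities=$(awk -F'\t' '{print $3}' "$tsv")
# cut splits on tabs by default, so it handles this file as well.
cities_cut=$(cut -f 3 "$tsv")
# Default awk splitting (any whitespace) breaks "New York" apart.
split_on_space=$(awk '{print $3}' "$tsv")

# Columns padded with several spaces: cut counts every single space
# as a delimiter, so field 2 is empty. awk treats the run as one gap.
padded='a   b'
cut_field=$(printf '%s\n' "$padded" | cut -d ' ' -f 2)
awk_field=$(printf '%s\n' "$padded" | awk '{print $2}')

printf '%s\n' "$cities"
echo "cut: [$cut_field] awk: [$awk_field]"   # cut: [] awk: [b]
rm -f "$tsv"
```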

Remove Duplicate Lines
sort -u myFile
or

sort myFile | uniq
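uniq only collapses duplicate lines that are adjacent, which is why the input is sorted first. To prepend each line with a count of its repetitions, use sort myFile | uniq -c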
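A sketch on a made-up six-line list, showing why the sort matters and how the counts come out:

```shell
list=$(mktemp)
printf 'pear\napple\npear\nfig\npear\napple\n' > "$list"

# No two equal lines are adjacent here, so uniq alone removes nothing.
unsorted_uniq=$(uniq "$list" | wc -l | tr -d ' ')   # 6

# After sorting, duplicates sit next to each other and collapse.
unique_lines=$(sort -u "$list")                     # apple, fig, pear

# Count repetitions, then put the most frequent line first.
counted=$(sort "$list" | uniq -c | sort -rn)
top=$(printf '%s\n' "$counted" | head -n 1 | awk '{print $2 " x" $1}')

echo "$top"   # pear x3
rm -f "$list"
```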

Sum Up the 2nd Column
awk '{sum += $2} END {print sum}' file_name → sum the 2nd column in a file.
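For example, on a made-up three-line file. The second command is a small variation: adding 0 in the END block makes awk print 0 instead of an empty line when the input has no lines.

```shell
data=$(mktemp)
printf 'a 10\nb 2.5\nc 7\n' > "$data"

# sum starts out empty and is treated as 0 on the first +=.
total=$(awk '{sum += $2} END {print sum}' "$data")
echo "$total"   # 19.5

# With no input lines, sum is never set; "sum + 0" forces a number.
empty_total=$(awk '{sum += $2} END {print sum + 0}' /dev/null)
echo "$empty_total"   # 0

rm -f "$data"
```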

Show Only the First or Last Few Lines of a Huge File
head file_name → show the first 10 lines of a file (the default).

head -n 100 file_name → show the first 100 lines of a file.

tail file_name → show the last 10 lines of a file (the default).

tail -n 100 file_name → show the last 100 lines of a file.
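A quick check with a generated 1000-line file, assuming seq is available (it is on typical Linux systems). The last command chains head and tail to pull lines out of the middle.

```shell
big=$(mktemp)
seq 1 1000 > "$big"

# With no -n option, head prints 10 lines.
default_count=$(head "$big" | wc -l | tr -d ' ')        # 10

# paste -s joins the lines with spaces, just for compact display.
first_three=$(head -n 3 "$big" | paste -s -d ' ' -)     # 1 2 3
last_three=$(tail -n 3 "$big" | paste -s -d ' ' -)      # 998 999 1000

# Lines 101 to 103: take the first 103, then the last 3 of those.
middle=$(head -n 103 "$big" | tail -n 3 | paste -s -d ' ' -)

echo "$first_three | $middle | $last_three"
rm -f "$big"
```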

About Me

I V RAMANA
