Monday, 6 August 2018

grep Command in Linux

grep is a program for searching a given string pattern in files. It searches the files for the pattern and prints the lines that contain strings matching the pattern. For example,

$ # grep pattern filenames ...
$ grep 'hbox' find.c
GtkWidget *window, *scrolled_win, *hbox, *vbox, *find;
hbox = gtk_hbox_new (FALSE, 5);
gtk_box_pack_start (GTK_BOX (hbox), w -> entry, TRUE, TRUE, 0);
gtk_box_pack_start (GTK_BOX (hbox), find, FALSE, TRUE, 0);
gtk_box_pack_start (GTK_BOX (vbox), hbox, FALSE, TRUE, 0);

2.0 grep Command Syntax

grep [OPTIONS] pattern [file ...]
grep [OPTIONS] [-e pattern | -f file ] [file ...]
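One practical reason for the second form is a pattern that begins with a hyphen, which grep would otherwise read as an option. The sketch below is only an illustration; the scratch file sample.c and its contents are made up for it.

```shell
#!/bin/sh
# Hypothetical scratch file; its name and contents are assumptions.
tmp=$(mktemp -d)
printf 'w->entry = NULL;\nhbox = NULL;\n' > "$tmp/sample.c"

# First synopsis form: the pattern is the first non-option argument.
first_form=$(grep 'hbox' "$tmp/sample.c")

# Second form: -e marks the next argument as a pattern, so a pattern
# that starts with a hyphen is not mistaken for an option.
hyphen_pattern=$(grep -e '->' "$tmp/sample.c")

printf '%s\n%s\n' "$first_form" "$hyphen_pattern"
rm -rf "$tmp"
```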

3.0 Regular Expressions

The pattern mentioned above is a regular expression. A regular expression is a pattern in which certain metacharacters have special meanings. It matches a set of strings, so if the input contains a string from that set, we get a match.

If we look closely at the grep command name, we see g re p. In ed, one of the earliest text editors available on UNIX systems, g is the global command, which applies to all lines in the file, re is a regular expression between two slashes, and p is the command that prints all lines matching the regular expression. This works in the vi editor as well: try the command :g/regular-expression/p in vi on a Linux system, substituting a search string for regular-expression. It should print all lines containing the search string.

Let's look at some regular expressions, listed below in decreasing order of precedence. r, r1 and r2 are regular expressions.
grep - Regular Expressions

Regular Expression   Description
c                    Any character, c, except for special characters, matches itself.
\c                   For any special character, c, the meaning is turned off and c is matched.
^                    Anchors to the beginning of the line.
$                    Anchors to the end of the line.
.                    Any single character.
[...]                Any one of the characters inside brackets. Ranges like a-e are OK.
[[:lower:]]          Any one of the lowercase letters (for C locale and ASCII character coding, a-z).
[[:upper:]]          Any one of the uppercase letters (for C locale and ASCII character coding, A-Z).
[[:alpha:]]          Any one of the alphabetic characters (from the union of [[:lower:]] and [[:upper:]]).
[[:digit:]]          Any one of the digits, 0-9.
[[:alnum:]]          Any one of the alphanumeric characters (from the union of [[:alpha:]] and [[:digit:]]).
[[:punct:]]          Any one of the punctuation characters (for C locale and ASCII character coding, from ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~).
[[:graph:]]          Any one of the graphical characters (from the union of [[:alnum:]] and [[:punct:]]).
[[:space:]]          Any one of the space characters (for C locale and ASCII character coding, from tab, newline, vertical tab, form feed, carriage return, and space).
[[:print:]]          Any one of the printable characters ([[:graph:]] and space).
[[:blank:]]          Any one of the blank characters (space and tab).
[[:cntrl:]]          Any one of the control characters (for ASCII, octal 000 through octal 037, and octal 177 (DEL)).
[[:xdigit:]]         Any one of the hexadecimal digits (0-9, a-f and A-F).
[^...]               Any character not in ...
r*                   r is matched 0 or more times.
r+                   r is matched 1 or more times (grep -E only).
r?                   r is matched 0 or 1 time (grep -E only).
r{n}                 r is matched exactly n times (grep -E only).
r{n,}                r is matched at least n times (grep -E only).
r{,m}                r is matched at most m times (grep -E only).
r{n,m}               r is matched at least n times but not more than m times (grep -E only).
r1r2                 r1 followed by r2 is matched.
r1|r2                Either r1 or r2 is matched (grep -E only).
(r)                  r is matched. Can be nested (grep -E only).
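The -E option selects extended regular expressions. Without it, grep uses basic regular expressions, in which GNU grep accepts the backslash-escaped forms \+, \?, \{n,m\}, \| and \(r\) for the operators marked above.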
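A few of these expressions can be tried on a small input. The following is a minimal sketch; the file words.txt and its five lines are made up for the illustration.

```shell
#!/bin/sh
# Hypothetical input file; name and contents are assumptions.
tmp=$(mktemp -d)
printf 'gtk_init\nGTK_BOX\nbox42\n  indented\nabc\n' > "$tmp/words.txt"

# ^ anchors to the start of the line: lines starting with an uppercase letter.
starts_upper=$(grep '^[[:upper:]]' "$tmp/words.txt")        # GTK_BOX

# $ anchors to the end of the line: lines ending in a digit.
ends_digit=$(grep '[[:digit:]]$' "$tmp/words.txt")          # box42

# [^...] negates the set: lines whose first character is not alphanumeric.
non_alnum_start=$(grep '^[^[:alnum:]]' "$tmp/words.txt")    # "  indented"

# r{n} and r1|r2 in their unescaped form need -E.
two_digits=$(grep -E '[[:digit:]]{2}' "$tmp/words.txt")     # box42
either=$(grep -E '^(gtk|abc)' "$tmp/words.txt")             # gtk_init and abc

printf '%s\n' "$starts_upper" "$ends_digit" "$non_alnum_start" "$two_digits" "$either"
rm -rf "$tmp"
```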

4.0 Examples

4.1 Case insensitive search

With the -i option, grep ignores case and does a case-insensitive search.
$ grep -i 'GTK' find.c
#include <gtk/gtk.h>
GtkWidget *entry, *textview;
gtk_init (&argc, &argv);
window = gtk_window_new (GTK_WINDOW_TOPLEVEL);
gtk_window_set_title (GTK_WINDOW (window), "Searching Buffers");
...
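The effect of -i is easy to see by counting matches with and without it. This is a small sketch on a made-up file (sample.c is an assumption), using the -c option described in section 4.5.

```shell
#!/bin/sh
# Hypothetical input file; name and contents are assumptions.
tmp=$(mktemp -d)
printf '#include <gtk/gtk.h>\nGtkWidget *entry;\nwindow = GTK_WINDOW (w);\nreturn 0;\n' > "$tmp/sample.c"

# Case-sensitive: only the line containing the exact string GTK matches.
exact=$(grep -c 'GTK' "$tmp/sample.c")       # 1

# Case-insensitive: gtk, Gtk and GTK all match.
any_case=$(grep -ci 'GTK' "$tmp/sample.c")   # 3

echo "exact=$exact any_case=$any_case"
rm -rf "$tmp"
```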

4.2 Search directories recursively

If an input file is a directory and the -r option is used, grep searches for the pattern recursively in the files in that directory. That is, if an entry in that directory is itself a directory, grep searches the files in it, and so on. For example,
$ grep -r 'gtk' *
buttons.c:#include <gtk/gtk.h>
buttons.c: gtk_init (&argc, &argv);
buttons.c: window = gtk_window_new (GTK_WINDOW_TOPLEVEL);
...
new/wind.c:#include <gtk/gtk.h>
new/wind.c: gtk_init (&argc, &argv);
new/wind.c: window = gtk_window_new (GTK_WINDOW_TOPLEVEL);
new/wind.c: gtk_window_set_title (GTK_WINDOW (window), "Searching Buffers");
new/wind.c: gtk_container_set_border_width (GTK_CONTAINER (window), 10);
...
grep also has a -R option. The difference between -r and -R is the handling of symbolic links. With -r, grep follows a symbolic link only if it has been passed explicitly on the command line. With -R, grep always follows symbolic links. For example,
$ ln -s ~/src/tables.c tablelink
$ # not passing tablelink explicitly on the command line
$ grep -r 'gtk' .
./new/wind.c:#include <gtk/gtk.h>
./new/wind.c: gtk_init (&argc, &argv);
...
./buttons.c:#include <gtk/gtk.h>
./buttons.c: gtk_init (&argc, &argv);
./buttons.c: window = gtk_window_new (GTK_WINDOW_TOPLEVEL);
...
$ # passing tablelink on the command line (the shell expands * to include it)
$ grep -r 'gtk' *
buttons.c:#include <gtk/gtk.h>
buttons.c: gtk_init (&argc, &argv);
...
new/wind.c:#include <gtk/gtk.h>
new/wind.c: gtk_init (&argc, &argv);
new/wind.c: window = gtk_window_new (GTK_WINDOW_TOPLEVEL);
...
tablelink:#include <gtk/gtk.h>
tablelink: gtk_init (&argc, &argv);
tablelink: window = gtk_window_new (GTK_WINDOW_TOPLEVEL);
...
$ # trying grep -R without passing tablelink explicitly
$ grep -R 'gtk' .
./tablelink:#include <gtk/gtk.h>
./tablelink: gtk_init (&argc, &argv);
./tablelink: window = gtk_window_new (GTK_WINDOW_TOPLEVEL);
...
./new/wind.c:#include <gtk/gtk.h>
./new/wind.c: gtk_init (&argc, &argv);
./new/wind.c: window = gtk_window_new (GTK_WINDOW_TOPLEVEL);
...
./buttons.c:#include <gtk/gtk.h>
./buttons.c: gtk_init (&argc, &argv);
./buttons.c: window = gtk_window_new (GTK_WINDOW_TOPLEVEL);
...
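The symbolic-link behaviour can be reproduced in a scratch directory. The sketch below assumes a recent GNU grep, since other grep implementations may treat -r and -R differently; the directory layout and file names are made up, and -l (section 4.8) is used only to keep the output short.

```shell
#!/bin/sh
# Hypothetical layout: proj/ is searched, outside/ is reachable only via a link.
tmp=$(mktemp -d)
mkdir -p "$tmp/proj/new" "$tmp/outside"
printf 'gtk_init ();\n' > "$tmp/proj/buttons.c"
printf 'gtk_init ();\n' > "$tmp/proj/new/wind.c"
printf 'gtk_init ();\n' > "$tmp/outside/tables.c"
ln -s "$tmp/outside/tables.c" "$tmp/proj/tablelink"

# -r: the link is met during traversal, not on the command line, so it is skipped.
r_files=$(cd "$tmp/proj" && grep -rl 'gtk' . | sort)

# -R: links are always followed, so tablelink is searched as well.
R_files=$(cd "$tmp/proj" && grep -Rl 'gtk' . | sort)

printf -- '-r:\n%s\n-R:\n%s\n' "$r_files" "$R_files"
rm -rf "$tmp"
```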

4.3 Print line numbers along with lines matching the pattern

$ grep -n 'vbox' find.c
13: GtkWidget *window, *scrolled_win, *hbox, *vbox, *find;
42: vbox = gtk_vbox_new (FALSE, 5);
43: gtk_box_pack_start (GTK_BOX (vbox), scrolled_win, TRUE, TRUE, 0);
44: gtk_box_pack_start (GTK_BOX (vbox), hbox, FALSE, TRUE, 0);
46: gtk_container_add (GTK_CONTAINER (window), vbox);
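As a self-contained illustration of -n, the sketch below builds a five-line file (its name and contents are assumptions) and shows that each match is prefixed with its line number and a colon.

```shell
#!/bin/sh
# Hypothetical input file; name and contents are assumptions.
tmp=$(mktemp -d)
printf 'int main (void)\n{\n  vbox = make ();\n  pack (vbox);\n}\n' > "$tmp/sample.c"

# Lines 3 and 4 contain vbox, so the output is prefixed with 3: and 4:.
numbered=$(grep -n 'vbox' "$tmp/sample.c")

printf '%s\n' "$numbered"
rm -rf "$tmp"
```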

4.4 Print non-matching lines

Suppose we wish to print lines that do not contain the given pattern. The -v option inverts the selection and prints the lines that do not contain the pattern.
$ # find lines not containing the string 'gtk' in file find.c
$ grep -v 'gtk' find.c
typedef struct {
GtkWidget *entry, *textview;
} Widgets;
static void destroy (GtkWidget*, gpointer);
static gboolean delete_event (GtkWidget*, GdkEvent *, gpointer);
static void search (GtkButton *, Widgets *);
...
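Note that the inverted search is still case-sensitive, which is why lines with GtkWidget survive in the output above. A minimal sketch on a made-up three-line file shows the difference when -i is added.

```shell
#!/bin/sh
# Hypothetical input file; name and contents are assumptions.
tmp=$(mktemp -d)
printf 'gtk_init ();\nGtkWidget *w;\nreturn 0;\n' > "$tmp/sample.c"

# Drops only the line containing lowercase gtk.
kept=$(grep -v 'gtk' "$tmp/sample.c")

# With -i, the GtkWidget line is dropped too.
kept_any_case=$(grep -vi 'gtk' "$tmp/sample.c")

printf '%s\n---\n%s\n' "$kept" "$kept_any_case"
rm -rf "$tmp"
```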

4.5 Print count of matches

The -c option suppresses the normal output and, instead, prints the count of matching lines for each input file. Combined with the -v option, it prints the count of non-matching lines.
$ grep -c 'gtk' *
buttons.c:15
city-palace-jaipur.jpg:0
grep.txt:0
helloworld:0
myhello:0
grep: new: Is a directory
new:0
tablelink:19
x:0
y:0
$ # print the count of non-matching lines
$ grep -cv 'gtk' *
buttons.c:32
city-palace-jaipur.jpg:359
grep.txt:646
helloworld:0
myhello:0
grep: new: Is a directory
new:0
tablelink:39
x:10
y:32
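Because -c counts lines rather than occurrences, a line containing the pattern twice is counted once, and the counts from -c and -cv add up to the number of lines in the file. The sketch below checks this on a made-up five-line file.

```shell
#!/bin/sh
# Hypothetical input file; name and contents are assumptions.
tmp=$(mktemp -d)
printf '#include <gtk/gtk.h>\ngtk_init ();\nint x;\nreturn 0;\nint y;\n' > "$tmp/sample.c"

# The first line contains gtk twice but counts as one matching line.
matching=$(grep -c 'gtk' "$tmp/sample.c")        # 2
non_matching=$(grep -cv 'gtk' "$tmp/sample.c")   # 3

# The empty pattern matches every line, so this is the total line count.
total=$(grep -c '' "$tmp/sample.c")              # 5

echo "matching=$matching non_matching=$non_matching total=$total"
rm -rf "$tmp"
```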

4.6 Full word matches

With the -w option, the pattern must match a whole word. A word consists of letters, digits and underscores. For a match, the matched text must either start at the beginning of the line or be preceded by a non-word character. Also, it must either end at the end of the line or be followed by a non-word character. For example,
$ grep -w 'gtk' *
buttons.c:#include <gtk/gtk.h>
buttons.c: // gtk example
buttons.c: gtk_button_set_image (GTK_BUTTON (button), gtk_image_new_from_stock ("gtk-apply", GTK_ICON_SIZE_MENU));
buttons.c: gtk_button_set_image (GTK_BUTTON (widget), gtk_image_new_from_stock ("gtk-discard", GTK_ICON_SIZE_MENU));
grep: new: Is a directory
tablelink:#include <gtk/gtk.h>
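The word rule explains why gtk_init alone does not match (the underscore is a word character) while "gtk-apply" does (the hyphen is not). A minimal sketch on a made-up file:

```shell
#!/bin/sh
# Hypothetical input file; name and contents are assumptions.
tmp=$(mktemp -d)
printf '#include <gtk/gtk.h>\ngtk_init ();\n// gtk example\nname = "gtk-apply";\n' > "$tmp/sample.c"

# Plain search: all four lines contain the string gtk.
all=$(grep -c 'gtk' "$tmp/sample.c")        # 4

# Word search: gtk_init is rejected because _ is a word character.
words=$(grep -cw 'gtk' "$tmp/sample.c")     # 3

# Lines that contain gtk but never as a whole word.
only_partial=$(grep 'gtk' "$tmp/sample.c" | grep -vw 'gtk')

echo "all=$all words=$words only_partial=$only_partial"
rm -rf "$tmp"
```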

4.7 Match full line

The -x option selects only those matches that span the whole line. For example,
$ grep -x ' // gtk example' *
buttons.c: // gtk example
grep: new: Is a directory
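In effect, -x behaves as if the pattern were anchored with ^ and $. The following sketch, on a made-up three-line file, compares the two; note that leading whitespace counts as part of the line.

```shell
#!/bin/sh
# Hypothetical input file; name and contents are assumptions.
tmp=$(mktemp -d)
printf 'gtk\ngtk example\n  gtk\n' > "$tmp/lines.txt"

partial=$(grep -c 'gtk' "$tmp/lines.txt")      # 3: every line contains gtk
whole=$(grep -cx 'gtk' "$tmp/lines.txt")       # 1: only the line that is exactly gtk
anchored=$(grep -c '^gtk$' "$tmp/lines.txt")   # 1: same result with explicit anchors

echo "partial=$partial whole=$whole anchored=$anchored"
rm -rf "$tmp"
```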

4.8 Print file names having matches

Using the -l (ell) option, we can get the names of files containing matching strings. For example,
$ grep -lr 'gtk' *
buttons.c
new/wind.c
tablelink
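A small scratch tree shows that -l prints each file name once, however many lines match, and omits files without a match. The file names and contents below are assumptions made for the sketch.

```shell
#!/bin/sh
# Hypothetical scratch tree; names and contents are assumptions.
tmp=$(mktemp -d)
mkdir -p "$tmp/sub"
printf 'gtk_init ();\ngtk_main ();\n' > "$tmp/a.c"
printf 'return 0;\n' > "$tmp/b.c"
printf 'gtk_main ();\n' > "$tmp/sub/c.c"

# a.c has two matching lines but is listed once; b.c is not listed at all.
names=$(cd "$tmp" && grep -lr 'gtk' . | sort)

printf '%s\n' "$names"
rm -rf "$tmp"
```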

4.9 Files containing blank lines

Suppose we want to know which files contain blank lines. The grep command would be,
$ grep -Elx '[[:space:]]*' *
city-palace-jaipur.jpg
grep.txt
grep: new: Is a directory
tablelink
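The pattern [[:space:]]* with -x matches lines that are empty as well as lines that hold only spaces or tabs. The sketch below tries it on three made-up files.

```shell
#!/bin/sh
# Hypothetical files; names and contents are assumptions.
tmp=$(mktemp -d)
printf 'a\n\nb\n' > "$tmp/with_empty.txt"       # has a truly empty line
printf 'a\n \t \nb\n' > "$tmp/with_spaces.txt"  # has a line of spaces and a tab
printf 'a\nb\n' > "$tmp/dense.txt"              # has no blank line

# -x forces the whole line to consist of zero or more space characters.
found=$(cd "$tmp" && grep -Elx '[[:space:]]*' *.txt | sort)

printf '%s\n' "$found"
rm -rf "$tmp"
```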

4.10 Find certain files but skip some extensions

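Suppose we wish to find files with names containing the string hello but want to skip the .o and .php files. We can do that by piping the output of find through grep -v. The pattern is anchored on the dot so that only names ending in .o or .php are dropped, not every name that ends in the letter o.
$ find . -name "*hello*" | grep -Ev '\.(o|php)$'
./hello.c
./helloworld.cpp
./helloworld.c
./helloworld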
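When filtering by extension, it helps to include the escaped dot in the pattern. The sketch below feeds a made-up list of names to grep in place of find output and compares a dot-anchored pattern with a looser one that also drops a file named hello.

```shell
#!/bin/sh
# Hypothetical file names standing in for the output of find.
names=$(printf './hello\n./hello.o\n./hello.c\n./hello.php\n./helloworld\n')

# Dot-anchored: only names ending in .o or .php are removed.
kept=$(printf '%s\n' "$names" | grep -Ev '\.(o|php)$')

# Without the dot, any name ending in the letter o is removed, including ./hello.
loose=$(printf '%s\n' "$names" | grep -Ev '(o$)|(php$)')

printf 'anchored:\n%s\nloose:\n%s\n' "$kept" "$loose"
```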

4.11 Find all directories in the present working directory

$ cd /usr/include
$ ls -l | grep '^d'
drwxr-xr-x 2 root root 4096 Dec 6 21:51 arpa
drwxr-xr-x 2 root root 4096 Dec 6 21:51 asm-generic
drwxr-xr-x 2 root root 4096 Sep 30 17:11 avahi-client
drwxr-xr-x 2 root root 4096 Sep 30 17:11 avahi-common
drwxr-xr-x 3 root root 4096 Sep 29 00:52 c++
...
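This works because ls -l starts each entry with a file-type character, which is d for a directory. A minimal sketch in a scratch directory (the entry names are made up) counts the directories this way.

```shell
#!/bin/sh
# Hypothetical scratch directory with two subdirectories and one regular file.
tmp=$(mktemp -d)
mkdir "$tmp/arpa" "$tmp/asm-generic"
: > "$tmp/stdio.h"

# Only the lines for the two directories begin with d.
dir_count=$(ls -l "$tmp" | grep -c '^d')

echo "directories: $dir_count"
rm -rf "$tmp"
```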

4.12 Using multiple search patterns

With the -e option, we can specify multiple search patterns. For example,
$ grep -e 'signal' -e 'container' *
buttons.c: gtk_container_set_border_width (GTK_CONTAINER (window), 25);
buttons.c: g_signal_connect (G_OBJECT (window), "destroy", G_CALLBACK (destroy),
buttons.c: /* Connect the button to the clicked signal. The callback function receives
buttons.c: g_signal_connect (G_OBJECT (button), "clicked",
buttons.c: gtk_container_add (GTK_CONTAINER (window), button);
...
Alternatively, we can put the patterns in a file, one pattern per line, and use grep with the -f option.
$ cat ../pat
signal
container
init
$ grep -f ../pat *
buttons.c: gtk_init (&argc, &argv);
buttons.c: gtk_container_set_border_width (GTK_CONTAINER (window), 25);
buttons.c: g_signal_connect (G_OBJECT (window), "destroy", G_CALLBACK (destroy),
...
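With several patterns, a line is selected if it matches any one of them, so repeated -e options behave like an alternation under -E. The sketch below uses a made-up source file and pattern file to compare the three ways of supplying patterns.

```shell
#!/bin/sh
# Hypothetical input and pattern files; names and contents are assumptions.
tmp=$(mktemp -d)
printf 'g_signal_connect ();\ngtk_container_add ();\ngtk_init ();\nreturn 0;\n' > "$tmp/sample.c"
printf 'signal\ncontainer\ninit\n' > "$tmp/pat"

# Two patterns given with -e: a line matches if either pattern matches.
by_e=$(grep -e 'signal' -e 'container' "$tmp/sample.c")

# The same selection written as one extended regular expression.
by_alt=$(grep -E 'signal|container' "$tmp/sample.c")

# Three patterns read from a file, one per line; count the matching lines.
by_f_count=$(grep -c -f "$tmp/pat" "$tmp/sample.c")   # 3

printf '%s\n' "$by_e"
echo "lines matched via -f: $by_f_count"
rm -rf "$tmp"
```

An empty line in the pattern file acts as an empty pattern and matches every input line, so it is worth checking that the file has no blank lines.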
