1.0 grep
grep is a program for searching files for a given pattern. It reads the files and prints the lines that contain strings matching the pattern. For example,
$ # grep pattern filenames ...
$ grep 'hbox' find.c
GtkWidget *window, *scrolled_win, *hbox, *vbox, *find;
hbox = gtk_hbox_new (FALSE, 5);
gtk_box_pack_start (GTK_BOX (hbox), w -> entry, TRUE, TRUE, 0);
gtk_box_pack_start (GTK_BOX (hbox), find, FALSE, TRUE, 0);
gtk_box_pack_start (GTK_BOX (vbox), hbox, FALSE, TRUE, 0);
2.0 grep Command Syntax
grep [OPTIONS] pattern [file ...]
grep [OPTIONS] [-e pattern | -f file ] [file ...]
3.0 Regular Expressions
The pattern mentioned above is a regular expression. A regular expression is a pattern in which certain metacharacters have special meanings, and it matches a set of strings. If the input contains a string from that set, we get a match.
If we look closely at the grep command name, we see g re p. In ed, one of the earliest text editors available on UNIX systems, g is the global command, which applies to all lines in the file; re is a regular expression between two slashes; and p is the command that prints the lines matching the regular expression. This works in the vi editor as well: try the command :g/regular-expression/p in vi on a Linux system, substituting a search string for regular-expression. It should print all lines containing the search string.
Let's look at some of the regular expressions, listed below in decreasing order of precedence. r, r1 and r2 are regular expressions.
grep - Regular Expressions
Regular Expression Description
c Any character, c, except for special characters, matches itself.
\c For any special character, c, the meaning is turned off and c is matched.
^ Anchors to the beginning of the line.
$ Anchors to the end of the line.
. Any single character.
[...] Any one of the characters inside brackets. Ranges like a-e are OK.
[[:lower:]] Any one of the lowercase letters (for C locale and ASCII character coding, a-z).
[[:upper:]] Any one of the uppercase letters (for C locale and ASCII character coding, A-Z).
[[:alpha:]] Any one of the alphabetic characters (from the union of [[:lower:]] and [[:upper:]]).
[[:digit:]] Any one of the digits, 0-9
[[:alnum:]] Any one of the alphanumeric characters (from the union of [[:alpha:]] and [[:digit:]]).
[[:punct:]] Any one of the punctuation characters (for C locale and ASCII character coding, from ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~)
[[:graph:]] Any one of the graphical characters (from the union of [[:alnum:]] and [[:punct:]])
[[:space:]] Any one of the space characters (for C locale and ASCII character encoding, from tab, newline, vertical tab, form feed, carriage return, and space).
[[:print:]] Any one of the printable characters ([[:graph:]] and space).
[[:blank:]] One of the blank characters (space and tab).
[[:cntrl:]] Any one of the control characters (for ASCII, octal 000 through octal 037, and octal 177 (DEL)).
[[:xdigit:]] Any one of the hexadecimal digits (0-9, a-f and A-F)
[^...] Any character not in ...
r* r is matched 0 or more times.
r+ r is matched 1 or more times (grep -E only)
r? r is matched zero or 1 time (grep -E only)
r{n} r is matched exactly n times (grep -E only)
r{n,} r is matched at least n times (grep -E only)
r{,m} r is matched at most m times (grep -E only)
r{n,m} r is matched at least n times but not more than m times (grep -E only)
r1r2 r1 followed by r2 are matched.
r1|r2 Either r1 or r2 is matched. (grep -E only)
(r) r is matched. Can be nested. (grep -E only)
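The -E option selects extended regular expressions. Without -E, GNU grep accepts the operators marked "grep -E only" only when they are written with a backslash, as in r\+, r\{n,m\}, r1\|r2 and \(r\).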
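The following is a minimal sketch of a few entries from the table: the ^ anchor, a bracket expression, and an interval. The sample file contents are made up for illustration and are not taken from find.c.

```shell
# Build a small sample file (hypothetical contents, used only for this sketch).
tmp=$(mktemp)
printf '%s\n' 'hbox = 5;' 'vbox = 10;' 'int hbox2;' 'x = 100;' 'done' > "$tmp"

# ^ anchors the match to the beginning of the line.
anchored=$(grep '^hbox' "$tmp")

# A bracket expression matches any one character from the set; -c counts the lines.
either=$(grep -c '^[hv]box' "$tmp")

# An interval with -E: lines containing two or more digits in a row.
digits=$(grep -E '[[:digit:]]{2,}' "$tmp")

# The same interval as a basic regular expression, with backslashed braces.
digits_bre=$(grep '[[:digit:]]\{2,\}' "$tmp")

printf '%s\n' "$anchored" "$either" "$digits"
rm -f "$tmp"
```

Only 'hbox = 5;' starts with hbox, two lines start with hbox or vbox, and the two lines with 10 and 100 satisfy the interval in both the extended and the basic form.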
4.0 Examples
4.1 Case insensitive search
With the -i option, grep ignores case when matching, so the search is case insensitive.
$ grep -i 'GTK' find.c
#include <gtk/gtk.h>
GtkWidget *entry, *textview;
gtk_init (&argc, &argv);
window = gtk_window_new (GTK_WINDOW_TOPLEVEL);
gtk_window_set_title (GTK_WINDOW (window), "Searching Buffers");
...
4.2 Search directories recursively
If an input file is a directory and the -r option is used, grep searches for the pattern recursively in the files in that directory. That is, if an entry in that directory is itself a directory, grep searches the files in it as well, and so on. For example,
$ grep -r 'gtk' *
buttons.c:#include <gtk/gtk.h>
buttons.c: gtk_init (&argc, &argv);
buttons.c: window = gtk_window_new (GTK_WINDOW_TOPLEVEL);
...
new/wind.c:#include <gtk/gtk.h>
new/wind.c: gtk_init (&argc, &argv);
new/wind.c: window = gtk_window_new (GTK_WINDOW_TOPLEVEL);
new/wind.c: gtk_window_set_title (GTK_WINDOW (window), "Searching Buffers");
new/wind.c: gtk_container_set_border_width (GTK_CONTAINER (window), 10);
...
grep also has a -R option. The difference between -r and -R is the handling of symbolic links. With -r, grep follows a symbolic link only if it is named explicitly on the command line. With -R, grep follows all symbolic links. For example,
$ ln -s ~/src/tables.c tablelink
$ # not passing tablelink explicitly on the command line
$ grep -r 'gtk' .
./new/wind.c:#include <gtk/gtk.h>
./new/wind.c: gtk_init (&argc, &argv);
...
./buttons.c:#include <gtk/gtk.h>
./buttons.c: gtk_init (&argc, &argv);
./buttons.c: window = gtk_window_new (GTK_WINDOW_TOPLEVEL);
...
$ # passing tablelink on the command line
$ grep -r 'gtk' *
buttons.c:#include <gtk/gtk.h>
buttons.c: gtk_init (&argc, &argv);
...
new/wind.c:#include <gtk/gtk.h>
new/wind.c: gtk_init (&argc, &argv);
new/wind.c: window = gtk_window_new (GTK_WINDOW_TOPLEVEL);
...
tablelink:#include <gtk/gtk.h>
tablelink: gtk_init (&argc, &argv);
tablelink: window = gtk_window_new (GTK_WINDOW_TOPLEVEL);
...
$ # trying grep -R without passing tablelink explicitly
$ grep -R 'gtk' .
./tablelink:#include <gtk/gtk.h>
./tablelink: gtk_init (&argc, &argv);
./tablelink: window = gtk_window_new (GTK_WINDOW_TOPLEVEL);
...
./new/wind.c:#include <gtk/gtk.h>
./new/wind.c: gtk_init (&argc, &argv);
./new/wind.c: window = gtk_window_new (GTK_WINDOW_TOPLEVEL);
...
./buttons.c:#include <gtk/gtk.h>
./buttons.c: gtk_init (&argc, &argv);
./buttons.c: window = gtk_window_new (GTK_WINDOW_TOPLEVEL);
...
4.3 Print line numbers along with lines matching the pattern
$ grep -n 'vbox' find.c
13: GtkWidget *window, *scrolled_win, *hbox, *vbox, *find;
42: vbox = gtk_vbox_new (FALSE, 5);
43: gtk_box_pack_start (GTK_BOX (vbox), scrolled_win, TRUE, TRUE, 0);
44: gtk_box_pack_start (GTK_BOX (vbox), hbox, FALSE, TRUE, 0);
46: gtk_container_add (GTK_CONTAINER (window), vbox);
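When more than one file is searched, grep -n places the line number after the file name. A small sketch, assuming two made-up files a.c and b.c in a temporary directory:

```shell
# Two hypothetical source files, created only for this sketch.
dir=$(mktemp -d)
printf '%s\n' 'int a;' 'vbox = 1;' > "$dir/a.c"
printf '%s\n' 'vbox = 2;' > "$dir/b.c"

# One file: each match is prefixed with its line number.
single=$(cd "$dir" && grep -n 'vbox' a.c)

# Several files: each match is prefixed with file name, then line number.
multi=$(cd "$dir" && grep -n 'vbox' a.c b.c)

printf '%s\n' "$single" "$multi"
rm -rf "$dir"
```

The first command prints 2:vbox = 1; and the second prints a.c:2:vbox = 1; followed by b.c:1:vbox = 2;.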
4.4 Print non-matching lines
Suppose we wish to print lines that do not contain the given pattern. The -v option inverts the selection and prints the lines that do not contain the pattern.
$ # find lines not containing the string 'gtk' in file find.c
$ grep -v 'gtk' find.c
typedef struct {
GtkWidget *entry, *textview;
} Widgets;
static void destroy (GtkWidget*, gpointer);
static gboolean delete_event (GtkWidget*, GdkEvent *, gpointer);
static void search (GtkButton *, Widgets *);
...
...
...
4.5 Print count of matches
The -c option suppresses the normal output and, instead, prints the count of matching lines for each input file. If you use the -v option along with it, you get the count of non-matching lines.
$ grep -c 'gtk' *
buttons.c:15
city-palace-jaipur.jpg:0
grep.txt:0
helloworld:0
myhello:0
grep: new: Is a directory
new:0
tablelink:19
x:0
y:0
$ # print the count of non-matching lines
$ grep -cv 'gtk' *
buttons.c:32
city-palace-jaipur.jpg:359
grep.txt:646
helloworld:0
myhello:0
grep: new: Is a directory
new:0
tablelink:39
x:10
y:32
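Note that -c counts matching lines, not individual matches. The sketch below contrasts the two on a made-up three-line file. It relies on the -o option, which GNU and BSD grep provide but POSIX does not require.

```shell
# Hypothetical sample: the first line contains 'gtk' twice.
tmp=$(mktemp)
printf '%s\n' '#include <gtk/gtk.h>' 'gtk_init (&argc, &argv);' 'return 0;' > "$tmp"

# -c counts lines that contain at least one match.
lines=$(grep -c 'gtk' "$tmp")

# -o prints every match on its own line, so wc -l counts individual matches.
matches=$(grep -o 'gtk' "$tmp" | wc -l | tr -d ' ')

echo "matching lines: $lines, total matches: $matches"
rm -f "$tmp"
```

Here two lines match, but the string gtk occurs three times.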
4.6 Full word matches
The -w option restricts matches to whole words. A word consists of letters, digits and underscores. For a match, the matched text must either start at the beginning of the line or be preceded by a non-word character. It must also either end at the end of the line or be followed by a non-word character. For example,
$ grep -w 'gtk' *
buttons.c:#include <gtk/gtk.h>
buttons.c: // gtk example
buttons.c: gtk_button_set_image (GTK_BUTTON (button), gtk_image_new_from_stock ("gtk-apply", GTK_ICON_SIZE_MENU));
buttons.c: gtk_button_set_image (GTK_BUTTON (widget), gtk_image_new_from_stock ("gtk-discard", GTK_ICON_SIZE_MENU));
grep: new: Is a directory
tablelink:#include <gtk/gtk.h>
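The word rule explains why gtk_init is not reported above while "gtk-apply" is: the underscore is a word character and the hyphen is not. A minimal sketch on made-up input:

```shell
# Hypothetical sample lines, one per case.
tmp=$(mktemp)
printf '%s\n' 'gtk_init ();' '"gtk-apply"' '// gtk example' 'mygtk' > "$tmp"

# Without -w, every line containing the string is counted.
plain=$(grep -c 'gtk' "$tmp")

# With -w, gtk must be delimited by non-word characters or line boundaries.
words=$(grep -w 'gtk' "$tmp")

echo "$plain"
echo "$words"
rm -f "$tmp"
```

All four lines contain gtk, but only "gtk-apply" and // gtk example contain it as a whole word.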
4.7 Match full line
With the -x option, a line is selected only if the pattern matches the whole line. For example,
$ grep -x ' // gtk example' *
buttons.c: // gtk example
grep: new: Is a directory
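In effect, -x behaves like anchoring the pattern with ^ and $. A small sketch on made-up input:

```shell
# Hypothetical sample: only the first line consists of exactly 'gtk'.
tmp=$(mktemp)
printf '%s\n' 'gtk' 'gtk example' ' gtk' > "$tmp"

# -x: the pattern must match the entire line.
whole=$(grep -x 'gtk' "$tmp")

# The same selection with explicit anchors.
anchored=$(grep '^gtk$' "$tmp")

echo "$whole"
rm -f "$tmp"
```

Both commands select only the first line. Leading or trailing spaces count as part of the line, so ' gtk' is not selected.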
4.8 Print file names having matches
Using the -l (ell) option, we can get the names of files containing matching strings. For example,
$ grep -lr 'gtk' *
buttons.c
new/wind.c
tablelink
4.9 Files containing blank lines
Suppose we want to know which files contain blank lines. Combining -l and -x with a pattern that matches only whitespace gives the answer:
$ grep -Elx '[[:space:]]*' *
city-palace-jaipur.jpg
grep.txt
grep: new: Is a directory
tablelink
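To see how this command treats empty lines and lines holding only whitespace, here is a sketch with three made-up files in a temporary directory:

```shell
# Hypothetical files: one with an empty line, one with a whitespace-only
# line (a space and a tab), and one with no blank line at all.
dir=$(mktemp -d)
printf 'one\n\ntwo\n' > "$dir/with_blank.txt"
printf 'one\n \t\ntwo\n' > "$dir/with_spaces.txt"
printf 'one\ntwo\n' > "$dir/no_blank.txt"

# -x requires the whole line to match; [[:space:]]* also matches an empty line.
found=$(cd "$dir" && grep -Elx '[[:space:]]*' *)

echo "$found"
rm -rf "$dir"
```

Only with_blank.txt and with_spaces.txt are listed.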
4.10 Find certain files but skip some extensions
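Suppose we wish to find files with names containing the string hello but want to skip the .o and .php files. We can filter the output of find with grep -v. The pattern is anchored on a literal dot so that a name that merely ends in the letter o, such as ./hello, is not dropped.
$ find . -name "*hello*" | grep -Ev '\.(o|php)$'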
./hello.c
./helloworld.cpp
./helloworld.c
./helloworld
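When filtering on extensions, it matters whether the pattern includes the dot. The sketch below runs two filters over a fixed, made-up list of names instead of real find output, so the difference is easy to see.

```shell
# Hypothetical list of names standing in for the output of find.
names=$(printf '%s\n' ./hello ./hello.c ./hello.o ./hello.php ./helloworld)

# Unanchored: o$ also drops ./hello, which merely ends in the letter o.
loose=$(printf '%s\n' "$names" | grep -Ev '(o$)|(php$)')

# Anchored on a literal dot: only the .o and .php extensions are dropped.
strict=$(printf '%s\n' "$names" | grep -Ev '\.(o|php)$')

echo "loose:";  echo "$loose"
echo "strict:"; echo "$strict"
```

The loose filter keeps ./hello.c and ./helloworld, while the strict filter also keeps ./hello.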
4.11 Find all directories in the present working directory
$ cd /usr/include
$ ls -l | grep '^d'
drwxr-xr-x 2 root root 4096 Dec 6 21:51 arpa
drwxr-xr-x 2 root root 4096 Dec 6 21:51 asm-generic
drwxr-xr-x 2 root root 4096 Sep 30 17:11 avahi-client
drwxr-xr-x 2 root root 4096 Sep 30 17:11 avahi-common
drwxr-xr-x 3 root root 4096 Sep 29 00:52 c++
...
4.12 Using multiple search patterns
With the -e option, we can specify multiple search patterns. For example,
$ grep -e 'signal' -e 'container' *
buttons.c: gtk_container_set_border_width (GTK_CONTAINER (window), 25);
buttons.c: g_signal_connect (G_OBJECT (window), "destroy", G_CALLBACK (destroy),
buttons.c: /* Connect the button to the clicked signal. The callback function receives
buttons.c: g_signal_connect (G_OBJECT (button), "clicked",
buttons.c: gtk_container_add (GTK_CONTAINER (window), button);
...
Alternatively, we can put the patterns in a file, one pattern per line, and pass that file to grep with the -f option.
$ cat ../pat
signal
container
init
$ grep -f ../pat *
buttons.c: gtk_init (&argc, &argv);
buttons.c: gtk_container_set_border_width (GTK_CONTAINER (window), 25);
buttons.c: g_signal_connect (G_OBJECT (window), "destroy", G_CALLBACK (destroy),
...
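The two forms select the same lines. A minimal sketch, assuming a made-up sample.c and a pattern file pat created in a temporary directory:

```shell
# Hypothetical input file and pattern file, created only for this sketch.
dir=$(mktemp -d)
printf '%s\n' 'gtk_init (&argc, &argv);' 'g_signal_connect (button);' 'return 0;' > "$dir/sample.c"
printf '%s\n' 'signal' 'init' > "$dir/pat"

# Patterns read from a file, one per line.
from_file=$(grep -f "$dir/pat" "$dir/sample.c")

# The same patterns given with repeated -e options.
from_opts=$(grep -e 'signal' -e 'init' "$dir/sample.c")

echo "$from_file"
rm -rf "$dir"
```

Lines are printed in the order they appear in the input, not in the order of the patterns. Take care not to leave an empty line in the pattern file: an empty pattern matches every line.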