Monday, 6 August 2018

Sed command in Linux

sed is a stream editor. It accepts a list of commands and a list of input file names, applies the commands one by one to each line of input, and writes the resulting lines to standard output. The commands are separated by newlines (or, for most commands, by semicolons).

2.0 sed Command syntax

sed [OPTIONS] 'list-of-sed-commands' filenames ...
sed is a filter: by default, it copies each line of input to its output. Usually, one or two commands transform the input in some way. Since it reads standard input and writes to standard output, sed is often used in command pipelines. sed commands are essentially the commands of the ed text editor.
By default, a sed command is executed for every line of input. However, an address can be given before a command, and the command is then executed only for the lines it selects. An address is a line number or a /regular expression/; two addresses separated by a comma, m,n, select a range of lines. If only one line number is given, the command is executed for that line only. The address $ matches the last line of input.
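To see addressing at work, here is a minimal sketch. It assumes seq is available to generate a made-up five-line input, and it uses a substitution (section 3.1) that prefixes the selected lines with "> ", so the unselected lines pass through unchanged.

```shell
# Sample input: the numbers 1..5, one per line (generated with seq for illustration).
one=$(seq 5 | sed '3s/^/> /')      # single line number: only line 3 is changed
last=$(seq 5 | sed '$s/^/> /')     # $ addresses the last line
range=$(seq 5 | sed '2,4s/^/> /')  # range m,n: lines 2 through 4 inclusive

# Show the three results, separated by dashes.
printf '%s\n---\n' "$one" "$last" "$range"
```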

3.0 Examples

3.1 Substitution

The most common use of sed is substituting strings. It is of the form,
[m,n]s/old-string/new-string/x
if x is not given, only the first occurrence of old-string in a line is changed to new-string
if x = g, all occurrences of old-string in a line are changed to new-string
if x = p, the modified line is printed (in addition to the default printing of the line)
if x = w file, the modified line is written to file
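The flags can be compared side by side with a short sketch. The input line is made up, and the w flag writes to a temporary file created with mktemp purely for illustration.

```shell
line='one two one two one'

none=$(echo "$line" | sed 's/one/1/')    # no flag: first occurrence only
all=$(echo "$line" | sed 's/one/1/g')    # g: every occurrence on the line
# p: the modified line is printed again, so without -n it appears twice
twice=$(echo "$line" | sed 's/one/1/p')

# w file: the modified line is also written to the named file
tmp=$(mktemp)
echo "$line" | sed "s/one/1/w $tmp" > /dev/null
written=$(cat "$tmp")
rm -f "$tmp"

printf '%s\n' "$none" "$all" "$twice" "$written"
```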
For example, if printf is misspelled as print in a C program, we can correct it with the command,
$ sed 's/print *(/printf (/g' hello.c
The above command reads as: for each line of input, match print *( and change every occurrence to printf (. The * after the space takes care of zero or more spaces between print and the left parenthesis. This, of course, only takes care of the case where print and the left parenthesis are on the same line. Note also that the corrected program is written to standard output; hello.c itself is not modified.
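A quick way to check the printf fix is to feed it a few made-up lines instead of a real hello.c. The third line shows that an already-correct printf ( is left alone, because the f after print prevents the pattern print *( from matching.

```shell
# Three hypothetical C lines: no space, two spaces, and an already-correct call.
fixed=$(printf '%s\n' 'print("a");' 'print  ("b");' 'printf ("c");' |
  sed 's/print *(/printf (/g')

printf '%s\n' "$fixed"
```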

3.2 Append, Insert and Change lines

a \ Append the text between the backslash and the end of the line as a new line after the selected line.
i \ Insert the text between the backslash and the end of the line as a new line before the selected line.
c \ Change (replace) the selected lines with the text between the backslash and the end of the line.
For example,
$ # add Copyright notice at the top of file
$ sed '1i \# Copyright (c) Tom, Dick and Harry
> ' hello.c
$ # add newline after each line (double space)
$ sed 'a \
> ' hello.c
$ # if a line contains main, change it
$ sed '/main/c\int main (int argc, char **argv)
> ' hello.c
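Here is a small sketch of i and c on a made-up three-line file. It uses the form in which the text starts on the line after the backslash, which both GNU sed and POSIX sed accept; the temporary file stands in for hello.c.

```shell
# A hypothetical stand-in for hello.c.
tmp=$(mktemp)
printf '%s\n' 'int main()' '{' '}' > "$tmp"

# i\ : insert a line before line 1
with_header=$(sed '1i\
# Copyright (c) Tom, Dick and Harry' "$tmp")

# c\ : replace every line matching /main/
changed=$(sed '/main/c\
int main (int argc, char **argv)' "$tmp")

rm -f "$tmp"
printf '%s\n---\n' "$with_header" "$changed"
```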

3.3 Delete lines

The d command deletes lines. Without an address, it applies to every line, so nothing is printed; with an address or a range, only the selected lines are deleted. For example,
$ # if a line contains main, delete it
$ sed '/main/d' hello.c
$ # delete lines numbered 1 through 4
$ sed '1,4d' hello.c
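The effect of d with a pattern, a range, and no address at all can be checked on numbered lines. This sketch assumes seq to produce the sample input.

```shell
no_three=$(seq 6 | sed '/3/d')   # pattern address: drop lines containing 3
tail_only=$(seq 6 | sed '1,4d')  # range: drop lines 1 through 4
all_gone=$(seq 6 | sed 'd')      # no address: every line is deleted

printf '%s\n---\n' "$no_three" "$tail_only" "$all_gone"
```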

3.4 Quit

The q command makes sed quit. For example, the following script prints the first 10 lines of the file hello.c and quits.
$ sed '10q' hello.c
This is just like the head command, which also prints the first 10 lines by default.
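The comparison with head can be verified directly. This sketch uses seq 25 as a made-up input and checks that 10q and head -n 10 produce the same lines.

```shell
a=$(seq 25 | sed '10q')      # print lines until line 10, then quit
b=$(seq 25 | head -n 10)     # the head equivalent

if [ "$a" = "$b" ]; then
  echo "sed 10q and head -n 10 agree"
fi
```

Because q stops reading at that point, sed does not scan the rest of the input, which matters for large files.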

3.5 Read File

The r file command reads file and appends its contents after the selected line. For example,
$ # read justaline and append its contents after each line of hello.c
$ sed 'r justaline' hello.c
$ # read justaline and append its contents after line number 10 of hello.c
$ sed '10r justaline' hello.c
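The following sketch shows r with and without an address. The files live in a temporary directory, and the one-line content of justaline is made up for illustration.

```shell
dir=$(mktemp -d)
echo '/* a line */' > "$dir/justaline"   # hypothetical file contents
seq 3 > "$dir/in"                         # stands in for hello.c

every=$(sed "r $dir/justaline" "$dir/in")    # after every line
second=$(sed "2r $dir/justaline" "$dir/in")  # after line 2 only

rm -rf "$dir"
printf '%s\n---\n' "$every" "$second"
```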

3.6 Write line to file

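The w file command writes the selected lines to file. Note that sed still prints every line to standard output as well; use -n (section 3.7) to suppress that. For example,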
$ sed 'w hello.bak' hello.c    # makes a copy of hello.c in hello.bak, writing each line as sed goes through hello.c
$ # write first 10 lines to new
$ sed '1,10w new' hello.c
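A minimal sketch of both w examples, run in a temporary directory with seq 15 standing in for hello.c. It adds -n so that only the files are written and nothing is printed.

```shell
dir=$(mktemp -d)
seq 15 > "$dir/in"

sed -n "1,10w $dir/new" "$dir/in"   # write lines 1..10 to new
first10=$(cat "$dir/new")

sed -n "w $dir/copy" "$dir/in"      # write every line: a copy of the input
copy_ok=no
if [ "$(cat "$dir/in")" = "$(cat "$dir/copy")" ]; then
  copy_ok=yes
fi

rm -rf "$dir"
printf 'first line of new: %s, copy identical: %s\n' "$(printf '%s\n' "$first10" | head -n 1)" "$copy_ok"
```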

3.7 sed -n

In some cases the default printing is not wanted, and the -n option suppresses it. When the -n option is used, lines can be printed with the p command. For example, in the following command, sed prints a line only if it contains the string int.
$ sed -n '/int/p' hello.c
This is essentially what grep int hello.c does.
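The equivalence with grep can be checked on a made-up C fragment. Notice that printf also matches, since int is a substring of printf.

```shell
input=$(printf '%s\n' '#include <stdio.h>' 'int main(void)' '{' \
  '    printf("hi");' '    return 0;' '}')

s=$(printf '%s\n' "$input" | sed -n '/int/p')   # sed: print only matching lines
g=$(printf '%s\n' "$input" | grep 'int')        # grep does the same

printf '%s\n' "$s"
```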

3.8 !cmd

The syntax address!cmd means: execute cmd only on lines that are not selected by the address. So, in the example of section 3.7, if we want a list of the lines that do not match a pattern, we can run,
$ # print lines in hello.c not containing the string int
$ sed -n '/int/!p' hello.c
This is like the grep -v command. We can also get the same result by letting sed print every line and deleting those that contain the pattern.
$ # print lines in hello.c not containing the string int
$ sed '/int/d' hello.c
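The three ways of listing non-matching lines (sed -n with !p, grep -v, and sed with d) can be compared on the same made-up fragment.

```shell
input=$(printf '%s\n' '#include <stdio.h>' 'int main(void)' '{' \
  '    printf("hi");' '    return 0;' '}')

a=$(printf '%s\n' "$input" | sed -n '/int/!p')  # print unselected lines
b=$(printf '%s\n' "$input" | grep -v 'int')     # grep's inverted match
c=$(printf '%s\n' "$input" | sed '/int/d')      # delete selected lines

printf '%s\n' "$a"
```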

3.9 Replace characters

The y command has the form y/string1/string2/. It replaces each input character that appears in string1 with the character at the same position in string2. The two strings must have the same length. For example,
$ # replace lowercase characters by uppercase
$ sed 'y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/' hello.c
$ # replace tab by space
$ sed 'y/\t/ /' hello.c
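Because y maps all characters at once, it can swap characters, which two successive s commands cannot do. This sketch uses made-up inputs; the bit-flip example is only an illustration of that point.

```shell
upper=$(echo 'hello, world' | sed 'y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/')

# y maps 0->1 and 1->0 simultaneously
flipped=$(echo '1100 1010' | sed 'y/01/10/')

# Two substitutions run one after the other: the second undoes the first,
# so every digit ends up as 0
naive=$(echo '1100 1010' | sed 's/0/1/g; s/1/0/g')

printf '%s\n' "$upper" "$flipped" "$naive"
```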

3.10 Read commands from file

Instead of passing commands on the command line, we can put them in a file and invoke sed with the option -f file-name. For example, we can create two files, upper-to-lower and lower-to-upper, with these contents:
$ cat upper-to-lower
# convert uppercase to lowercase
y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/
$ cat lower-to-upper
# convert lowercase to uppercase
y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
$ # convert file, lowercase to uppercase
$ sed -f lower-to-upper hello.c
$ # convert lowercase to uppercase and back to lowercase
$ sed -f lower-to-upper hello.c | sed -f upper-to-lower
$ sed -f lower-to-upper -f upper-to-lower hello.c    # same effect as the previous command
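We can pass multiple scripts with the -f option, and each subsequent script is appended to the previous one. So, in the last command, lower-to-upper first converts lowercase to uppercase and upper-to-lower then converts the uppercase back to lowercase. The result is entirely lowercase rather than the original text, because the original case is lost in the first step.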
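Here is a self-contained sketch of the two script files and the combined -f invocation. The files are created in a temporary directory, and the input line is made up.

```shell
dir=$(mktemp -d)
cat > "$dir/lower-to-upper" <<'EOF'
# convert lowercase to uppercase
y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
EOF
cat > "$dir/upper-to-lower" <<'EOF'
# convert uppercase to lowercase
y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/
EOF

up=$(echo 'Hello World' | sed -f "$dir/lower-to-upper")
# Two -f scripts concatenated into one sed program
both=$(echo 'Hello World' | sed -f "$dir/lower-to-upper" -f "$dir/upper-to-lower")
# The same thing with a pipeline of two sed processes
piped=$(echo 'Hello World' | sed -f "$dir/lower-to-upper" | sed -f "$dir/upper-to-lower")

rm -rf "$dir"
printf '%s\n' "$up" "$both" "$piped"
```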

3.11 Multiple scripts on command line

We can pass multiple scripts with the -e option. For example,
$ # find /etc/passwd entries for users alice, bob and carol
$ sed -n -e '/alice/p' -e '/bob/p' -e '/carol/p' /etc/passwd
alice:x:1000:1000:Alice B,,,:/home/alice:/bin/bash
bob:x:1001:1001:Bob C,,,:/home/bob:/bin/bash
carol:x:1002:1002:Carol D,,,:/home/carol:/bin/bash
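To try the -e example without touching the real /etc/passwd, this sketch feeds a few made-up passwd-style lines through the same command.

```shell
passwd=$(printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'alice:x:1000:1000:Alice B,,,:/home/alice:/bin/bash' \
  'dave:x:1003:1003:Dave E,,,:/home/dave:/bin/sh' \
  'bob:x:1001:1001:Bob C,,,:/home/bob:/bin/bash')

# Each -e adds one more command to the script; lines come out in input order.
matched=$(printf '%s\n' "$passwd" | sed -n -e '/alice/p' -e '/bob/p' -e '/carol/p')

printf '%s\n' "$matched"
```

Since every -e script is a separate p command, a line that matches two of the patterns is printed twice. Anchoring the patterns, for example /^alice:/p, also avoids matching a user name that happens to appear elsewhere on a line.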

3.12 Extended Regular Expressions

sed -r enables extended regular expressions, just like the grep -E command (GNU sed also accepts -E as a synonym for -r). Suppose we have a file named expenditure containing a list of expenditures for three years:
Expenditure                 2012      2013      2014
Advertising                 200015    233912    189928
Bank charges                23029     26667     34990
Boarding and Lodging        237899    453326    356625
Communication               34556     28928     34222
Conveyance                  23444     27889     43882
Insurance                   43344     41992     45667
Repairs and maintenance     26609     19887     29008
Servers                     34556     35662     32118
Stationery                  38828     28779     39887
Sub-contracting             29988     30992     21334
Travel                      891112    788277    655489
END-OF-FILE
This is a table in which each row gives one expenditure for three years. Suppose we want to convert it into a Comma Separated Values (CSV) file by replacing the spaces with commas. The last line of the file has no spaces and is not wanted in the CSV file; since -n suppresses the default printing and the p flag prints only lines in which a substitution was made, that line is dropped automatically. We can run the sed command,
$ sed -nr 's/[[:blank:]]+/,/gp' expenditure
Expenditure,2012,2013,2014
Advertising,200015,233912,189928
Bank,charges,23029,26667,34990
Boarding,and,Lodging,237899,453326,356625
Communication,34556,28928,34222
Conveyance,23444,27889,43882
Insurance,43344,41992,45667
Repairs,and,maintenance,26609,19887,29008
Servers,34556,35662,32118
Stationery,38828,28779,39887
Sub-contracting,29988,30992,21334
Travel,891112,788277,655489
This is mostly OK but has a small problem: some of the expenditure descriptions in the first column contain spaces, and these are also replaced by commas. So we ask sed to replace only runs of two or more blanks with a comma.
$ sed -nr 's/[[:blank:]][[:blank:]]+/,/gp' expenditure
Expenditure,2012,2013,2014
Advertising,200015,233912,189928
Bank charges,23029,26667,34990
Boarding and Lodging,237899,453326,356625
Communication,34556,28928,34222
Conveyance,23444,27889,43882
Insurance,43344,41992,45667
Repairs and maintenance,26609,19887,29008
Servers,34556,35662,32118
Stationery,38828,28779,39887
Sub-contracting,29988,30992,21334
Travel,891112,788277,655489
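The conversion can be reproduced on a cut-down copy of the table. This sketch writes three made-up rows plus END-OF-FILE to a temporary file and uses -E, which GNU sed treats the same as -r. The interval form [[:blank:]]{2,} is an equivalent way of writing "two or more blanks" in an extended regular expression.

```shell
dir=$(mktemp -d)
printf '%s\n' \
  'Expenditure              2012    2013    2014' \
  'Bank charges             23029   26667   34990' \
  'Travel                   891112  788277  655489' \
  'END-OF-FILE' > "$dir/expenditure"

# Two or more blanks become one comma; single spaces inside names survive.
csv=$(sed -nE 's/[[:blank:]][[:blank:]]+/,/gp' "$dir/expenditure")
# The same pattern written with an ERE interval
csv2=$(sed -nE 's/[[:blank:]]{2,}/,/gp' "$dir/expenditure")

rm -rf "$dir"
printf '%s\n' "$csv"
```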
