Monday, 6 August 2018

tr Command in Linux

The tr command is a filter that reads standard input, translates or deletes characters, and writes the result to standard output. Its syntax is,

tr [OPTION]... SET1 [SET2]
tr transliterates characters from SET1 into corresponding characters of SET2 in input and writes resulting text on the standard output. For example, to convert lowercase to uppercase and vice-versa,
$ cat names
Alan Bloggs
Erika Mustermann
James Bond
Jane Doe
Jimmy Fernandes
Joe Bloggs
John Doe
John Roe
Max Mustermann
Richard Roe
Tommy Atkins
$ # Convert lowercase to uppercase
$ tr 'a-z' 'A-Z' < names
ALAN BLOGGS
ERIKA MUSTERMANN
JAMES BOND
JANE DOE
JIMMY FERNANDES
JOE BLOGGS
JOHN DOE
JOHN ROE
MAX MUSTERMANN
RICHARD ROE
TOMMY ATKINS
$ # Convert uppercase to lowercase
$ tr 'A-Z' 'a-z' < names
alan bloggs
erika mustermann
james bond
jane doe
jimmy fernandes
joe bloggs
john doe
john roe
max mustermann
richard roe
tommy atkins
Ideally, SET1 and SET2 should be of the same size. If SET2 is smaller than SET1, the last character of SET2 is repeated as many times as necessary to make both the same size. If SET2 is larger than SET1, the excess characters of SET2 are ignored.
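The size rules above can be checked with a short sketch. It assumes GNU tr from coreutils, because other implementations may handle a short SET2 differently. The variable names are only for illustration.

```shell
# Assumes GNU tr (coreutils); other implementations may behave differently.

# SET2 shorter than SET1: 'y' is repeated, so c, d and e all map to 'y'.
short_set2=$(printf 'abcde\n' | tr 'abcde' 'xy')
echo "$short_set2"    # xyyyy with GNU tr

# SET2 longer than SET1: the extra 'z' is never used.
long_set2=$(printf 'abc\n' | tr 'ab' 'xyz')
echo "$long_set2"     # xyc
```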

2.0 Specifying sets

Sets are strings of characters. Each character in a set specifies itself. However, when there is a backslash (\), it indicates a sequence defining a special character. Also, there are representations that indicate character sequences.
tr - Interpreted sequences

Sequence          Description
\NNN              Character with octal value NNN.
\\                Backslash.
\a                Bell.
\b                Backspace.
\f                Form feed.
\n                Newline.
\r                Carriage return.
\t                Horizontal tab.
\v                Vertical tab.
CHAR1-CHAR2       Sequence of characters from CHAR1 to CHAR2, in ascending order.
[CHAR*]           Copies of CHAR in SET2 so that the size of SET2 becomes equal to that of SET1.
[CHAR*REPEAT]     REPEAT copies of CHAR. REPEAT is considered octal if it starts with 0.
[:alnum:]         Alphanumeric: letters and digits.
[:alpha:]         Alphabetic: letters only.
[:blank:]         Horizontal white space characters.
[:cntrl:]         Control characters.
[:digit:]         Digits, 0-9.
[:graph:]         Printable characters, excluding white space characters.
[:lower:]         All the lowercase characters.
[:print:]         All the printable characters, including space.
[:punct:]         All the punctuation characters.
[:space:]         White space characters, horizontal and vertical.
[:upper:]         All the uppercase characters.
[:xdigit:]        All hexadecimal digits, 0-9, a-f and A-F.
Using the above definitions, we can re-write the tr commands for changing case,
$ # change uppercase to lowercase
$ tr '[:upper:]' '[:lower:]' < names
alan bloggs
erika mustermann
james bond
jane doe
jimmy fernandes
joe bloggs
john doe
john roe
max mustermann
richard roe
tommy atkins
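A few of the other sequences from the table can be tried the same way. This is a small sketch assuming GNU tr. The sample strings and variable names are made up for illustration.

```shell
# Replace each colon with a horizontal tab using the \t sequence.
tabbed=$(printf 'a:b:c\n' | tr ':' '\t')

# Mask every digit: [#*] pads SET2 with '#' to the length of SET1.
masked=$(printf 'room 101\n' | tr '[:digit:]' '[#*]')

# [x*2]z expands to "xxz", so a and b become 'x' and c becomes 'z'.
repeated=$(printf 'abc\n' | tr 'a-c' '[x*2]z')

printf '%s\n' "$tabbed" "$masked" "$repeated"
```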

3.0 Delete characters

The -d option is for deleting characters specified in SET1 from the input. For example, the text files in Windows have CR-LF at the end of each line. In Linux, the text files just have an LF at the end of each line. Converting a Windows text file to Linux involves deleting CR from each line. We can do this using the tr command,
$ file 404.php
404.php: PHP script, ASCII text, with CRLF line terminators
$ tr -d '\r' < 404.php > 404-new.php
$ file 404-new.php
404-new.php: PHP script, ASCII text
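To try the CR removal without touching real files, something like the following sketch could be used. It builds a throwaway two-line CRLF file in a temporary directory and compares byte counts before and after. The file names dos.txt and unix.txt are hypothetical.

```shell
# Work in a temporary directory so no real files are touched.
workdir=$(mktemp -d)

# Create a two-line file with Windows-style CRLF line endings.
printf 'first\r\nsecond\r\n' > "$workdir/dos.txt"

# Delete every carriage return; the line feeds stay in place.
tr -d '\r' < "$workdir/dos.txt" > "$workdir/unix.txt"

# Count bytes before and after: each of the two lines loses one CR byte.
dos_bytes=$(wc -c < "$workdir/dos.txt" | tr -d ' ')
unix_bytes=$(wc -c < "$workdir/unix.txt" | tr -d ' ')
echo "$dos_bytes -> $unix_bytes"

rm -r "$workdir"
```

Note that the output is redirected to a different file. Redirecting to the input file itself would truncate it before tr reads it.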

4.0 Squeeze repeated characters

With the -s option, tr replaces each run of a repeated character listed in SET1 with a single occurrence of that character. For example, if the input has multiple spaces and blank lines, we can replace multiple spaces with a single space and delete blank lines with the tr command.
$ cat names
Alan Bloggs
Erika Mustermann
James Bond
Jane Doe
Jimmy Fernandes
Joe Bloggs
John Doe
John Roe
Max Mustermann
Richard Roe
Tommy Atkins
$ tr -s '[:space:]' < names
Alan Bloggs
Erika Mustermann
James Bond
Jane Doe
Jimmy Fernandes
Joe Bloggs
John Doe
John Roe
Max Mustermann
Richard Roe
Tommy Atkins
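The effect of -s is easier to see on input where the repeated characters are written out explicitly. The sketch below uses made-up sample strings. It also shows that only characters listed in SET1 are squeezed.

```shell
# Input with runs of spaces and two blank lines between the names.
squeezed=$(printf 'Alan   Bloggs\n\n\nJane  Doe\n' | tr -s '[:space:]')
printf '%s\n' "$squeezed"

# Squeezing only the space character leaves the blank line untouched.
spaces_only=$(printf 'a  b\n\nc\n' | tr -s ' ')
printf '%s\n' "$spaces_only"
```

A blank line is just two newlines in a row, which is why squeezing newlines removes it.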

5.0 Complement of SET1

With the -c option, we can ask tr to use the complement of SET1, that is, every character not in SET1. For example, if we wish to delete all the unprintable characters in a file, leaving only printable characters (letters, digits, punctuation and space), we can execute the command,
$ sed 's/$/ /' names | tr -cd '[:print:]'
Alan Bloggs Erika Mustermann James Bond Jane Doe Jimmy Fernandes Joe Bloggs John Doe John Roe Max Mustermann Richard Roe Tommy Atkins $
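We first add a space at the end of each line of the file using sed, so that the names stay separated once the line breaks are gone. Then we pipe the output to the tr command. tr deletes all the unprintable characters, including the newlines, which is why the output appears on a single line followed directly by the prompt.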
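The same idea can be sketched on a small made-up string containing a control character (\001). The second command is a variation, not taken from the example above: adding \n to SET1 keeps the line breaks.

```shell
# \001 is a control character; -c complements SET1 and -d deletes the result,
# so everything that is not printable (including the newline) is removed.
printable_only=$(printf 'abc\001def\n' | tr -cd '[:print:]')

# Adding \n to SET1 keeps line breaks while still dropping \001.
keep_newlines=$(printf 'abc\001\ndef\n' | tr -cd '[:print:]\n')

printf '%s\n' "$printable_only" "$keep_newlines"
```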
