The tr command is a filter that reads standard input, translates, squeezes, or deletes characters, and writes the result to standard output. Its syntax is,
tr [OPTION]... SET1 [SET2]
tr replaces each input character found in SET1 with the character at the same position in SET2 and writes the resulting text to standard output. For example, to convert lowercase to uppercase and vice versa,
$ cat names
Alan Bloggs
Erika Mustermann
James Bond
Jane Doe
Jimmy Fernandes
Joe Bloggs
John Doe
John Roe
Max Mustermann
Richard Roe
Tommy Atkins
$ # Convert lowercase to uppercase
$ tr 'a-z' 'A-Z' < names
ALAN BLOGGS
ERIKA MUSTERMANN
JAMES BOND
JANE DOE
JIMMY FERNANDES
JOE BLOGGS
JOHN DOE
JOHN ROE
MAX MUSTERMANN
RICHARD ROE
TOMMY ATKINS
$ # Convert uppercase to lowercase
$ tr 'A-Z' 'a-z' < names
alan bloggs
erika mustermann
james bond
jane doe
jimmy fernandes
joe bloggs
john doe
john roe
max mustermann
richard roe
tommy atkins
Ideally, SET1 and SET2 should be of the same size. If SET2 is smaller than SET1, GNU tr repeats the last character of SET2 as many times as necessary to make both the same size. POSIX leaves the result undefined in this case, so other implementations may behave differently. If SET2 is larger than SET1, the excess characters of SET2 are ignored.
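The length rules can be checked with a couple of short, made-up strings. This is a minimal sketch that assumes GNU tr; on other implementations the first command may print something different.

```shell
# SET2 shorter than SET1: GNU tr pads SET2 with its last character,
# so 'xy' behaves like 'xyyyy'.
printf 'abcde\n' | tr 'abcde' 'xy'    # prints: xyyyy

# SET2 longer than SET1: the extra 'z' is never used.
printf 'abc\n' | tr 'ab' 'xyz'        # prints: xyc
```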
2.0 Specifying sets
Sets are strings of characters. Most characters in a set simply represent themselves. A backslash (\) introduces an escape sequence for a special character, and a few other notations stand for ranges, repetitions, or classes of characters.
Sequence | Description |
---|---|
\NNN | Character with octal value NNN. |
\\ | Backslash. |
\a | Bell. |
\b | Backspace. |
\f | Form feed. |
\n | Newline. |
\r | Carriage return. |
\t | Horizontal tab. |
\v | Vertical tab. |
CHAR1-CHAR2 | All characters from CHAR1 to CHAR2, in ascending order. |
[CHAR*] | In SET2, copies of CHAR until the size of SET2 becomes equal to that of SET1. |
[CHAR*REPEAT] | REPEAT copies of CHAR. REPEAT is considered octal if it starts with 0. |
[:alnum:] | Alphanumeric; letters and digits. |
[:alpha:] | Alphabetic: letters only. |
[:blank:] | Horizontal white space characters. |
[:cntrl:] | Control characters. |
[:digit:] | Digits, 0-9. |
[:graph:] | Printable characters, excluding white space characters. |
[:lower:] | All the lowercase characters. |
[:print:] | All the printable characters, including space. |
[:punct:] | All the punctuation characters. |
[:space:] | White space characters, horizontal and vertical. |
[:upper:] | All the uppercase characters. |
[:xdigit:] | All hexadecimal digits, 0-9, a-f and A-F. |
Using the above definitions, we can re-write the tr commands for changing case,
$ # change uppercase to lowercase
$ tr '[:upper:]' '[:lower:]' < names
alan bloggs
erika mustermann
james bond
jane doe
jimmy fernandes
joe bloggs
john doe
john roe
max mustermann
richard roe
tommy atkins
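A few of the other notations from the table can be tried in the same way. The input strings below are invented for illustration, and the sketch assumes GNU tr.

```shell
# \n escape: put each space-separated word on its own line.
printf 'one two three\n' | tr ' ' '\n'

# \NNN octal escape: \040 is the space character.
printf 'a b c\n' | tr '\040' '_'              # prints: a_b_c

# CHAR1-CHAR2 range: map the digits 0-4 to the letters a-e.
printf '0123456\n' | tr '0-4' 'a-e'           # prints: abcde56

# [CHAR*] in SET2: map every digit to '#'.
printf 'pin 4071\n' | tr '[:digit:]' '[#*]'   # prints: pin ####
```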
3.0 Delete characters
The -d option is for deleting characters specified in SET1 from the input. For example, the text files in Windows have CR-LF at the end of each line. In Linux, the text files just have an LF at the end of each line. Converting a Windows text file to Linux involves deleting CR from each line. We can do this using the tr command,
$ file 404.php
404.php: PHP script, ASCII text, with CRLF line terminators
$ tr -d '\r' < 404.php > 404-new.php
$ file 404-new.php
404-new.php: PHP script, ASCII text
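The same idea can be tried without a CRLF file at hand. The sketch below feeds tr some made-up input from printf; od -c is used only to make the line endings visible.

```shell
# Delete every digit from the input.
printf 'a1b2c3\n' | tr -d '[:digit:]'     # prints: abc

# Show the bytes before and after deleting carriage returns:
# the \r before each \n disappears in the second dump.
printf 'one\r\ntwo\r\n' | od -c
printf 'one\r\ntwo\r\n' | tr -d '\r' | od -c
```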
4.0 Squeeze repeated characters
With the -s option, tr replaces each run of a repeated character listed in SET1 with a single occurrence of that character. For example, if the input has runs of multiple spaces and blank lines, we can replace each run of spaces with a single space and delete the blank lines with the tr command.
$ cat names
Alan Bloggs
Erika Mustermann
James Bond
Jane Doe
Jimmy Fernandes
Joe Bloggs
John Doe
John Roe
Max Mustermann
Richard Roe
Tommy Atkins
$ tr -s '[:space:]' < names
Alan Bloggs
Erika Mustermann
James Bond
Jane Doe
Jimmy Fernandes
Joe Bloggs
John Doe
John Roe
Max Mustermann
Richard Roe
Tommy Atkins
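Squeezing is easier to see on a short input where the repeated characters are explicit. The strings here are made up for illustration.

```shell
# Squeeze runs of spaces and runs of newlines; blank lines vanish.
printf 'Jane    Doe\n\n\nJohn  Roe\n' | tr -s ' \n'
# prints:
# Jane Doe
# John Roe

# With two sets, tr translates first and then squeezes the
# characters of SET2: each run of commas becomes one space.
printf 'a,,b,,,c\n' | tr -s ',' ' '       # prints: a b c
```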
5.0 Complement of SET1
With the -c option, we can ask tr to use the complement of SET1, that is, every character not in SET1. For example, if we wish to delete all the unprintable characters in a file, leaving only printable characters (including space), we can execute the command,
$ sed 's/$/ /' names | tr -cd '[:print:]'
Alan Bloggs Erika Mustermann James Bond Jane Doe Jimmy Fernandes Joe Bloggs John Doe John Roe Max Mustermann Richard Roe Tommy Atkins $
We first add a space at the end of each line of the file using sed. Then we pipe the output to the tr command. tr deletes all the unprintable characters, including the newlines. Because the final newline is deleted too, the shell prompt ($) appears right after the output.
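The -c option combines with -d and -s in other useful ways. The following is a small sketch on invented input, not part of the names example.

```shell
# -cd: delete everything that is NOT a digit or a newline.
printf 'Phone: +1 (555) 010-9999\n' | tr -cd '[:digit:]\n'   # prints: 15550109999

# -cs: turn every run of non-alphanumeric characters into a single
# newline, which leaves one word per line.
printf 'Hello, world! Hi.\n' | tr -cs '[:alnum:]' '\n'
# prints:
# Hello
# world
# Hi
```

Keeping \n in the first set preserves the line structure, unlike the [:print:] example, where the newlines are deleted as well.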