for screening, filtering, and extracting rows and columns of text tables
Examplesprint column 3 and 4 of file 'table.txt'awk '{ print $3, $4 }' table.txt file input/output: read from file 'table.txt' and write to file 'newTable.txt' awk < table.txt '{ print $3, $4 }' > newTable.txt to use in a pipe cat table.txt | awk '{ print $3, $4 }' > newTable.txt print only columns 3 and 4 when column 3 contains "gene123" awk '$3=="gene123"{ print $3, $4}' table.txt print complete line when column 3 contains "gene123"awk '$3=="gene123"' table.txt print complete line when column 3 contains NOT "NA" (alternatively use grep, see below) awk '$3!="NA"' table.txt print lines in which the ratio between columns 4 and 5 is smaller than 0.4 awk '$4/$5<0.4' table.txt print columns separated by tab change output field separator (OFS) to tabular "\t", instead of using space by default additionally, fix also input field separator (-F) to tabular "\t", in case of empty fields awk -F"\t" '$3=="gene123"{OFS = "\t"; print $3, $4}' table.txt switch column 3 and 4 awk '{tmp=$3; $3=$4; $4=tmp}; {print }' switch only in lines where field 3 is larger than field 4 awk '$3>$4{tmp = $3; $3=$4; $4=tmp;}; {print }'
print columns in any specific order
awk '{ print $2, $1, $4, $3 }' For extracting columns from text files, see also: cut combine multiple conditions AND operator: && awk '$3=="gene01" && $4>95 { print $3, $4}' table.txt
OR operator: ||
awk '$3=="gene01" ||
Count number of columns in each line
# file separator 'FS' can be replaced by any column separator
FS="\t" or FS="|" or FS=":" ...awk '{ FS = "\t" } ; { print NF}' table.txt
Sum and mean average of numbers in column 2
awk 'BEGIN{C=0}; {C=C+$2}; END{print C "\t" C/NR}' table.csv 2345 56
For getting word frequencies, see also: uniq -c
sed '/^$/d' textfile.txt > new.txt # remove all lines containing 'NA' grep -vw NA textfile.txt > new.txt # remove all lines that contain multiple strings from the unwanted word list file grep -Fwf unwanted_word_list.txt -v textfile.txt > new.txt -w search for full word, not substring -F read pattern as plain text, not interpreting special characters as regular expressions -f read pattern from file -v keep non-matching lines (remove lines that contain the searched word) |
0 comments:
Post a Comment