Bash and awk to convert delimited data (csv, tsv, etc) to HTML tables
A shell wrapper script that uses awk to convert a delimited file (where delimiter can be any character) to HTML tables.
Example 1 -- simple comma delimited file
Simple comma separated file "test.csv" containing:
abc,efg,hij
klm,nop,qrs
Running the script with just input file name as the argument:
$ csv2htm.sh test.csv
Would produce:
abc | efg | hij |
klm | nop | qrs |
Example 2 -- comma delimited with column labels
Again a comma separated file "test.csv", but with first and last rows containing column labels:
H1,H2,H3
abc,efg,hij
klm,nop,qrs
H1,H2,H3
Running the script with optional "--head" and "--foot" arguments, will surround fields from first and last lines of input file in "thead", "tfoot", and "th" HTML tags:
$ csv2htm.sh --head --foot test.csv
Result:
H1 | H2 | H3 |
---|---|---|
abc | efg | hij |
klm | nop | qrs |
H1 | H2 | H3 |
Example 3 -- tab delimited with column labels
Input file can be delimited by characters other than comma -- tab, pipe, colon, whatever. Even multiple characters, such as double tabs, as in this case:
col1 col2 col3
abc efg hij
klm nop qrs
First line contains column labels (col1, col2, col3), so in addition to specifying double tab as the delimiter, we'll add the "--head" argument also:
$ csv2htm.sh -d '\t\t' --head test.tsv
Would produce:
col1 | col2 | col3 |
---|---|---|
abc | efg | hij |
klm | nop | qrs |
And here's the script:
#!/bin/bash
usage()
{
cat < output
Script to produce HTML tables from delimited input. Delimiter can be specified
as an optional argument. If omitted, script defaults to comma.
Options:
-d Specify delimiter to look for, instead of comma.
--head Treat first line as header, enclosing in and tags.
--foot Treat last line as footer, enclosing in and tags.
Examples:
1. $(basename $0) input.csv
Above will parse file 'input.csv' with comma as the field separator and
output HTML tables to STDOUT.
2. $(basename $0) -d '|' < input.psv > output.htm
Above will parse file "input.psv", looking for the pipe character as the
delimiter, then output results to "output.htm".
3. $(basename $0) -d '\t' --head --foot < input.tsv > output.htm
Above will parse file "input.tsv", looking for tab as the delimiter, then
process first and last lines as header/footer (that contain data labels), then
write output to "output.htm".
EOF
}
while true; do
case "$1" in
-d)
shift
d="$1"
;;
--foot)
foot="-v ftr=1"
;;
--help)
usage
exit 0
;;
--head)
head="-v hdr=1"
;;
-*)
echo "ERROR: unknown option '$1'"
echo "see '--help' for usage"
exit 1
;;
*)
f=$1
break
;;
esac
shift
done
if [ -z "$d" ]; then
d=","
fi
if [ -z "$f" ]; then
echo "ERROR: input file is required"
echo "see '--help' for usage"
exit 1
fi
if ! [ -f "$f" ]; then
echo "ERROR: input file '$f' is not readable"
exit 1
else
data=$(sed '/^$/d' $f)
last=$(wc -l <<< "$data")
fi
awk -F "$d" -v last=$last $head $foot '
BEGIN {
print " "
}
{
gsub(//, "\\>")
if(NR == 1 && hdr) {
printf " \n"
gsub(/&/, "\\>") }
if(NR == last && ftr) {
printf " \n"
}
print " "
for(f = 1; f <= NF; f++) {
if((NR == 1 && hdr) || (NR == last && ftr)) {
printf " \n", $f
}
else printf " \n", $f
}
print " "
if(NR == 1 && hdr) {
printf " \n"
}
if(NR == last && ftr) {
printf " \n"
}
}
END {
print "
%s %s
"
}
' <<< "$data"
0 comments:
Post a Comment