Monday, 23 July 2018

Bash and awk to convert delimited data (csv, tsv, etc) to HTML tables

A shell wrapper script that uses awk to convert a delimited file to an HTML table. The delimiter can be any character, or a sequence of characters.

Example 1 -- simple comma delimited file

Simple comma separated file "test.csv" containing:
abc,efg,hij
klm,nop,qrs
Running the script with just the input file name as the argument:
$ csv2htm.sh test.csv
Would produce:
  
abc  efg  hij
klm  nop  qrs
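
To make the markup behind that table concrete, here is a minimal sketch of the core idea: awk splits each line on the delimiter, wraps every field in "td" tags and every line in "tr" tags. The function name csv_to_rows is hypothetical and used only for illustration; it handles the comma case only and does no HTML escaping.

```shell
# Hypothetical helper (not part of csv2htm.sh): comma-separated stdin -> HTML table rows.
csv_to_rows() {
  awk -F ',' '
    BEGIN { print "<table>" }
    {
      print "  <tr>"
      # one <td> per field on the current line
      for (i = 1; i <= NF; i++) printf "    <td>%s</td>\n", $i
      print "  </tr>"
    }
    END { print "</table>" }
  '
}

printf 'abc,efg,hij\nklm,nop,qrs\n' | csv_to_rows
```

Each input line becomes one "tr" element holding three "td" cells.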

Example 2 -- comma delimited with column labels

Again a comma separated file "test.csv", but with the first and last rows containing column labels:
H1,H2,H3
abc,efg,hij
klm,nop,qrs
H1,H2,H3
Running the script with the optional "--head" and "--foot" arguments surrounds the fields from the first and last lines of the input file with "thead", "tfoot", and "th" HTML tags:
$ csv2htm.sh --head --foot test.csv
Result:
  
H1   H2   H3
abc  efg  hij
klm  nop  qrs
H1   H2   H3
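
awk cannot tell that it is on the last line until the input ends, so the footer needs the line count up front. The script gets it by stripping blank lines with sed, counting what is left with wc, and passing the count to awk as a variable. A rough sketch of just that mechanism follows; the function name label_rows and the head/body/foot labels are made up for illustration.

```shell
# Hypothetical helper: label each non-blank stdin line as head, body or foot.
label_rows() {
  data=$(sed '/^$/d')                               # read stdin, drop blank lines
  last=$(( $(printf '%s\n' "$data" | wc -l) ))      # arithmetic strips any padding from wc
  printf '%s\n' "$data" | awk -v last="$last" '
    NR == 1    { print "head " $0; next }
    NR == last { print "foot " $0; next }
               { print "body " $0 }
  '
}

printf 'H1,H2,H3\nabc,efg,hij\nklm,nop,qrs\nH1,H2,H3\n' | label_rows
```

Dropping blank lines before counting matters: a trailing empty line would otherwise be counted as the last record and the real footer row would be treated as ordinary data.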

Example 3 -- tab delimited with column labels

Input file can be delimited by characters other than comma -- tab, pipe, colon, whatever. Even multiple characters, such as double tabs, as in this case:
col1  col2  col3
abc  efg  hij
klm  nop  qrs
First line contains column labels (col1, col2, col3), so in addition to specifying double tab as the delimiter, we'll add the "--head" argument also:
$ csv2htm.sh -d '\t\t' --head test.tsv
Would produce:
  
col1  col2  col3
abc   efg   hij
klm   nop   qrs
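
Multi-character delimiters work because awk treats a field separator longer than one character as a regular expression, and it expands escape sequences such as \t in the value given to -F. A quick way to check how awk will split a line before running the full conversion is to print the field count. The helper name count_fields below is hypothetical.

```shell
# Hypothetical helper: print the number of fields awk sees per line.
# $1 is the delimiter exactly as it would be passed to "csv2htm.sh -d".
count_fields() {
  awk -F "$1" '{ print NF }'
}

printf 'col1\t\tcol2\t\tcol3\n' | count_fields '\t\t'   # prints 3
```

A single tab inside a field is left alone with this delimiter, since only two consecutive tabs match.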
And here's the script:
#!/bin/bash

usage()
{
cat <<EOF
Usage: $(basename $0) [-d DELIM] [--head] [--foot] input > output

Script to produce HTML tables from delimited input. Delimiter can be specified
as an optional argument. If omitted, script defaults to comma.

Options:

  -d       Specify delimiter to look for, instead of comma.

  --head   Treat first line as header, enclosing in <thead> and <th> tags.

  --foot   Treat last line as footer, enclosing in <tfoot> and <th> tags.

Examples:

  1. $(basename $0) input.csv

  Above will parse file 'input.csv' with comma as the field separator and
  output HTML tables to STDOUT.

  2. $(basename $0) -d '|' input.psv > output.htm

  Above will parse file "input.psv", looking for the pipe character as the
  delimiter, then output results to "output.htm".

  3. $(basename $0) -d '\t' --head --foot input.tsv > output.htm

  Above will parse file "input.tsv", looking for tab as the delimiter, then
  process first and last lines as header/footer (that contain data labels), then
  write output to "output.htm".

EOF
}

while true; do
  case "$1" in
    -d)
      shift
      d="$1"
      ;;
    --foot)
      foot="-v ftr=1"
      ;;
    --help)
      usage
      exit 0
      ;;
    --head)
      head="-v hdr=1"
      ;;
    -*)
      echo "ERROR: unknown option '$1'"
      echo "see '--help' for usage"
      exit 1
      ;;
    *)
      f=$1
      break
      ;;
  esac
  shift
done

if [ -z "$d" ]; then
  d=","
fi

if [ -z "$f" ]; then
  echo "ERROR: input file is required"
  echo "see '--help' for usage"
  exit 1
fi

if ! [ -f "$f" ]; then
  echo "ERROR: input file '$f' is not readable"
  exit 1
else
  data=$(sed '/^$/d' "$f")
  last=$(wc -l <<< "$data")
fi

awk -F "$d" -v last=$last $head $foot '
  BEGIN {
    print "  <table>"
  }
  {
    # escape HTML special characters; "&" must be handled first
    gsub(/&/, "\\&amp;")
    gsub(/</, "\\&lt;")
    gsub(/>/, "\\&gt;")
    if(NR == 1 && hdr) {
      printf "    <thead>\n"
    }
    if(NR == last && ftr) {
      printf "    <tfoot>\n"
    }
    print "      <tr>"
    for(f = 1; f <= NF; f++)  {
      if((NR == 1 && hdr) || (NR == last && ftr)) {
        printf "        <th>%s</th>\n", $f
      }
      else printf "        <td>%s</td>\n", $f
    }
    print "      </tr>"
    if(NR == 1 && hdr) {
      printf "    </thead>\n"
    }
    if(NR == last && ftr) {
      printf "    </tfoot>\n"
    }
  }
  END {
    print "  </table>"
  }
' <<< "$data"

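One caveat with escaping the whole record: calling gsub without a target rewrites $0, and awk then splits the modified line into fields again. With a delimiter such as ';' the semicolons inside the inserted entities would turn into extra fields. A possible alternative, sketched here under the assumption that per-field escaping is acceptable, is to escape each field just before printing it. The names escape_fields and esc are hypothetical and not part of the script above.

```shell
# Hypothetical helper: split stdin on delimiter $1, print one HTML-escaped field per line.
escape_fields() {
  awk -F "$1" '
    function esc(s) {
      gsub(/&/, "\\&amp;", s)   # ampersand first, so later entities are not re-escaped
      gsub(/</, "\\&lt;", s)
      gsub(/>/, "\\&gt;", s)
      return s
    }
    { for (i = 1; i <= NF; i++) print esc($i) }
  '
}

printf 'a&b;<c>\n' | escape_fields ';'
```

Because esc works on a copy of the field, $0 is never modified and the field boundaries stay where the delimiter put them.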