How to Transform TSV to CSV, or Just Remove Tabs
Unfortunately, statistics and machine learning seem to degenerate into a giant mess of getting data from multiple sources, munging it together, transforming it, and formatting the output, all before you can get to the work proper. A common problem is taking tab-separated value (TSV) files, perhaps produced as the output of a MySQL or Postgres query, and turning them into comma-separated value (CSV) files.
Here’s one method, using sed and pretty standard regexp syntax:
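A minimal sketch of that command, assuming GNU sed (which understands \t); the function name tsv_to_csv and the sample rows are hypothetical, made up for illustration:

```shell
# Hypothetical helper: read TSV on stdin, write CSV on stdout.
# Assumes GNU sed, where \t in the pattern matches a tab.
tsv_to_csv() {
  sed "s/\t/,/g"
}

# Made-up sample rows standing in for query output.
printf 'id\tname\n1\talice\n' | tsv_to_csv
```

To convert a file, redirect it through the function, e.g. tsv_to_csv < input.tsv > output.csv (both file names are placeholders). This is a plain character swap, so a field that already contains a comma will not be quoted.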
The key bit above is this: "s/\t/,/g". That says turn every tab (\t) into a comma (,). If you'd rather just remove tabs from the file altogether, you could run sed with "s/\t//g" instead.
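The tab-stripping variant might look like this, again assuming GNU sed; strip_tabs is a hypothetical name and the input is made up:

```shell
# Hypothetical helper: delete every tab from stdin (assumes GNU sed).
strip_tabs() {
  sed "s/\t//g"
}

# Three made-up fields separated by tabs.
printf 'a\tb\tc\n' | strip_tabs
```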
So, yet another thing I learned today: the version of sed that ships with Mac OS X, even through 10.5.7, doesn’t support escape sequences like \t. If the above isn’t working for you, and instead is just replacing every t character in the file with a comma, then try the same substitution with a literal tab character typed in place of each \t.
Note that to type those tabs, you’ll have to hit ctrl-v (^V) and then the Tab key. If the output isn’t ",a,", then you have to type literal tabs in your sed command. The \t works under reasonable versions of Linux; you’ll have to use literal tabs under OS X. Bleh.
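If typing literal tabs with ctrl-v is awkward, one alternative is to let printf generate the tab and splice it into the sed expression. This is a swapped-in technique, not the literal-tab method described above, and the names TAB and tsv_to_csv_portable are hypothetical. The check string (tab, the letter a, tab) is an assumption based on the ",a," output mentioned above:

```shell
# Let printf produce the tab so nothing has to be typed with ctrl-v.
TAB=$(printf '\t')

# Hypothetical helper: the pattern now holds a real tab character,
# so it does not depend on sed understanding \t.
tsv_to_csv_portable() {
  sed "s/${TAB}/,/g"
}

# Assumed check string: tab, the letter a, tab. Prints ,a,
printf '\ta\t\n' | tsv_to_csv_portable
```

Since the shell expands ${TAB} before sed ever sees the expression, this should behave the same whether or not your sed knows about \t.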