Shell script to export data from MySQL tables, modify, then import into another set of tables
For a research project I'm working on, I have a data set with about 12.5 million records, continuously being added to. I'm using a combination of horizontal and vertical database partitioning to split the data among multiple tables to achieve reasonably quick query performance that should hold as the data set grows. I also need to track additional information for each item, which I'm doing in separate tables. For this, needed to take data from two columns of each row from each of the source tables, modify and then import in the destination tables.
Specific steps were:
- Export (dump) columns filename and downloaded from each source table (filings_%)
- Edit data from the downloaded column, replacing no with nothing and replacing yes with done
- Import (load) modified data into filename and status columns into corresponding destination table (downloads_%)
The following script processed 12.5 million rows across 80 tables in under 2 min:
#!/bin/bash
# Script to export all tables matching a pattern from a MySQL DB using
# the SELECT INTO OUTFILE method, then modify contents of the export
# file, then load processed data into another set of tables. Requires
# superuser privileges to modify and remove tempoary export files in
# the MySQL data dir.
log=$(echo $(basename $0) | sed 's/.sh/.log/')
D="sec" # database
T="filings_%" # source table pattern
myopts="--skip-column-names -Be" # add username/password if not using ~/.my.cnf
err_exit()
{
echo -e 1>&2
exit 1
}
mysql_data_dir=$(mysql $myopts "show variables like 'datadir'" | awk '{sub(/\/$/,"");print$NF}')
if ! (mysqlshow $D 1>/dev/null); then
echo ERROR: unable to access database
exit 1
fi
if ! (sudo touch $mysql_data_dir/$D/test 2>/dev/null); then
echo ERROR: this script will need sudo access to
echo move exported files out of mysql data dir.
echo Come back when you get sudo, son.
exit 1
else
sudo rm $mysql_data_dir/$D/test
fi
ExportTable()
{
mysql $D $myopts "
SELECT filename,downloaded
INTO OUTFILE '${srcT}.psv'
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n'
FROM $srcT
"
}
ImportTable()
{
mysql $D $myopts "
DELETE FROM $dstT;
ALTER TABLE $dstT AUTO_INCREMENT = 1;
LOAD DATA INFILE '${srcT}.psv'
INTO TABLE ${dstT}
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n'
(filename,status);
"
}
ProcessExport()
{
sudo sed -i '
s/no//
s/yes/done/
' $mysql_data_dir/$D/${srcT}.psv
}
for srcT in $(mysql $D $myopts "SHOW TABLES LIKE '$T'"); do
echo -e "\n$(date +%F\ %T.%N) INFO: exporting from $srcT" | tee -a $log
ExportTable || err_exit
echo "$(date +%F\ %T.%N) INFO: processing export file" | tee -a $log
ProcessExport || err_exit
dstT=$(echo $srcT | sed 's/filings/downloads/')
echo "$(date +%F\ %T.%N) INFO: loading into destination table" | tee -a $log
ImportTable || err_exit
echo "$(date +%F\ %T.%N) INFO: removing export file" | tee -a $log
sudo rm $mysql_data_dir/$D/${srcT}.psv || err_exit
done
0 comments:
Post a Comment