Export CSV files via MySQL
As part of an AI project, I had to export data to train a tool developed for a customer. Instead of writing a script, I decided to use MySQL's built-in functions, which turned out to be a challenging experience but a valuable learning opportunity.
Since I knew the basic syntax of how to run an export, I gave this command a try:
SELECT *
FROM sales_order
WHERE status = 'exported'
INTO OUTFILE '/tmp/export.csv';
And instantly, MySQL returned the following error: "ERROR 1290 (HY000): The MySQL server is running with the --secure-file-priv option so it cannot execute this statement"
How to find out how secure_file_priv
is configured? Run this command in the MySQL client:
SHOW VARIABLES LIKE "secure_file_priv";
The output revealed that /var/lib/mysql-files/
was the configured directory. Great. Since I run the database in a Docker container, how can I access the exported files? Let's mount the directory to some host volume, e.g. like this in a docker-compose.yaml
file:
version: "3.7"
services:
db:
image: percona:5.7
ports:
- "3306:3306"
env_file: env/develop.env
volumes:
- ./mysql-files:/var/lib/mysql-files/
After restarting the db
container and re-running the command, I found an export.csv
file in the local ./mysql-files
. It seems like we're making good progress.
I examined the exported file and realized a few problems that needed to be fixed:
- The CSV file header is missing. It is not clear which column is which.
- Technically, the file is not a CSV file. The fields are separated by tabs. Ideally, we can change that to something closer to a CSV file.
- Null values are represented as "\N" instead of null. Look for a way to change that.
Let's tackle each of the findings one by one.
- How to add a CSV file header row with proper column names? Since MYSQL does not seem to support that natively, I followed some StackOverflow guidance:
SELECT 'field 1', 'field 2', 'field 3'
UNION ALL
SELECT field1, field2, field3
FROM sales_order
WHERE status = 'exported'
INTO OUTFILE '/tmp/export.csv';
Using the UNION ALL
operation, we prefix the data with the static strings from the first SQL statement. This is not an ideal solution, especially if you have many columns to select, but it worked quite well.
- Can we configure the structure of the OUTFILE? Yes, we can.
SELECT 'field 1', 'field 2', 'field 3'
UNION ALL
SELECT field1, field2, field3
FROM sales_order
WHERE status = 'exported'
INTO OUTFILE '/tmp/export.csv'
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n';
This is not the best solution as it will "just" enclose the values with quotes and separate fields with a comma. The export logic isn't clever enough to escape quotes within the field values. That was not an issue in my case, but it could hit you hard.
- How can "\N" values be converted to "null"?
The easiest "fix" I could come up with was using sed to manipulate the exported file:
sed -i 's/\\N/null/g' export.csv
In the end, I might have been faster writing a quick-and-dirty script to export the data, but at least I learned a lot. Again.