Managing Data in MySQL

MySQL Admin Cookbook

March 2010

99 great recipes for mastering MySQL configuration and administration

Exporting data to a simple CSV file

While databases are a great tool for storing and managing your data, you sometimes need to extract some of it for use in another tool (a spreadsheet application being the most prominent example). In this recipe, we will show you how to use the respective MySQL commands to export data from a given table into a file that can easily be imported by other programs.

Getting ready

To step through this recipe, you will need a running MySQL database server and a working installation of a SQL client (like MySQL Query Browser or the mysql command line tool). You will also need to identify a suitable export target, which has to meet the following requirements:

  • The MySQL server process must have write access to the target file
  • The target file must not exist

The export target file is located on the machine that runs your MySQL server, not on the client side!

If you do not have file access to the MySQL server, you could instead use export functions of MySQL clients like MySQL Query Browser.

In addition, a user with FILE privilege is needed (we will use an account named sample_install for the following steps; see also Chapter 8 Creating an installation user).

Finally, we need some data to export. Throughout this recipe, we will assume that the data to export is stored in a table named table1 inside the database sample. As the export target, we will use the file C:/target.csv (MySQL accepts slashes instead of backslashes in Windows path expressions). This is a file on the machine that runs the MySQL server instance, so in this example MySQL is assumed to be running on a Windows machine. To access the results from the client, you need access to the file (for example, via a file share, or by executing the MySQL client on the same machine as the server).

How to do it...

  • Connect to the database using the sample_install account.
  • Issue the following SQL command:
  • mysql> SELECT * FROM sample.table1 INTO OUTFILE 'C:/target.csv'
    FIELDS ENCLOSED BY '"' TERMINATED BY ';' ESCAPED BY '"' LINES
    TERMINATED BY '\r\n';

Please note that when using a backslash instead of a slash in the target file's path, you have to use C:\\target.csv (double backslash for escaping) instead.

If you do not give a path, but only a file name, the target file will be placed in the data directory of the currently selected schema of your MySQL server.

How it works...

In the previous SQL statement, a file C:/target.csv was created, which contains the content of the table sample.table1. The file contains a separate line for each row of the table, and each line is terminated by a sequence of a carriage return and a line feed character. This line ending was defined by the LINES TERMINATED BY '\r\n' portion of the command.

Each line contains the values of each column of the row. The values are separated by semicolons, as stated in the TERMINATED BY ';' clause. Every value is enclosed by a double quotation mark ("), which results from the FIELDS ENCLOSED BY '"' option.

When writing the data to the target file, no character conversion takes place; the data is exported using the binary character set. This should be kept in mind, especially when exporting tables that use different character sets for some of their values.

You might wonder why we chose the semicolon instead of a comma as the field separator. This is simply because it greatly improves Microsoft Excel compatibility (you can simply open the resulting files without having to import them as external data). You can, however, open these files in other spreadsheet programs (like OpenOffice.org Calc) as well. If you think the use of semicolons contradicts the notion of a CSV file, think of it as a Character-Separated Values file.

The use of double quotes to enclose single values prevents problems when field values contain semicolons (or generally the field separator character). These are not recognized as field separators if they are enclosed in double quotes.
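As an illustration outside MySQL, Python's csv module can be configured to match the recipe's format options (semicolon separator, double-quote enclosure); the round trip below is a minimal sketch, not part of the recipe, showing that a separator inside a quoted value is not mistaken for a field boundary:

```python
import csv
import io

# Writer configured to mirror the recipe's export options:
# TERMINATED BY ';', ENCLOSED BY '"', LINES TERMINATED BY '\r\n'.
buf = io.StringIO()
writer = csv.writer(buf, delimiter=';', quotechar='"',
                    quoting=csv.QUOTE_ALL, lineterminator='\r\n')
writer.writerow([1, 'plain text'])
writer.writerow([2, 'text; with a semicolon'])  # separator inside the value

# Read the same data back with matching options.
rows = list(csv.reader(io.StringIO(buf.getvalue()),
                       delimiter=';', quotechar='"'))
print(rows)  # the embedded semicolon survives the round trip
```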

There's more...

While the previous SELECT … INTO OUTFILE statement will work well in most cases, there are some circumstances in which you still might encounter problems. The following topics will show you how to handle some of those.

Handling errors if the target file already exists

If you try to execute the SELECT … INTO OUTFILE statement twice, an error File 'C:/target.csv' already exists occurs. This is due to a security feature in MySQL that makes sure you cannot overwrite existing files using the SELECT … INTO OUTFILE statement. This makes perfect sense if you think about the consequences: if it were not the case, you could overwrite the MySQL data files using a simple SELECT, because the MySQL server needs write access to its data directories. As a result, you have to choose a different target file for each export (or remove old files in advance).

Unfortunately, it is not possible to use a non-constant file name (like a variable) in the SELECT … INTO OUTFILE export statement. If you wish to use different file names, for example with a time stamp as part of the file name, you have to construct the statement as a string in a variable before executing it:

 

mysql> SET @selInOutfileCmd := concat("SELECT * FROM sample.table1 INTO
OUTFILE 'C:/target-", DATE_FORMAT(now(),'%Y-%m-%d_%H%i%s'), ".csv' FIELDS
ENCLOSED BY '\"' TERMINATED BY ';' ESCAPED BY '\"' LINES TERMINATED BY
'\r\n';");
mysql> PREPARE statement FROM @selInOutfileCmd;
mysql> EXECUTE statement;

The first SET statement constructs a string, which contains a SELECT statement. While it is not allowed to use variables for statements directly, you can construct a string that contains a statement and use variables for this. With the next two lines, you prepare a statement from the string and execute it.
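The same string construction can be sketched client-side. The snippet below (an illustrative assumption, not part of the recipe) builds the statement text with a timestamped target file name; in MySQL you would then PREPARE and EXECUTE it, while from a client program you would simply send it as a query:

```python
from datetime import datetime

# Build the export statement with a timestamped file name, mirroring
# the SET @selInOutfileCmd / DATE_FORMAT(NOW(), ...) approach.
stamp = datetime.now().strftime('%Y-%m-%d_%H%M%S')
stmt = (
    "SELECT * FROM sample.table1 INTO OUTFILE 'C:/target-" + stamp + ".csv' "
    "FIELDS ENCLOSED BY '\"' TERMINATED BY ';' ESCAPED BY '\"' "
    "LINES TERMINATED BY '\\r\\n'"
)
print(stmt)
```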

Handling NULL values

Without further handling, NULL values in the data you export using the previous statement will show up as "N in the resulting file. This combination is not recognized by Microsoft Excel, for example, which breaks the file for typical usage. To prevent this, you need to replace NULL entries with appropriate values. Assuming that the table sample.table1 consists of a numeric column a and a character column b, you should use the following statement:

mysql> SELECT IFNULL(a, 0), IFNULL(b, "NULL") FROM sample.table1 INTO
OUTFILE 'C:/target.csv' FIELDS ENCLOSED BY '"' TERMINATED BY ';' ESCAPED
BY '"' LINES TERMINATED BY '\r\n';

The downside of this approach is that you have to list all fields in which a NULL value might occur.
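Per column, IFNULL simply substitutes a placeholder wherever a value is missing. The Python sketch below (the replacement values are illustrative assumptions, matching the IFNULL(a, 0) and IFNULL(b, "NULL") choices above) shows the equivalent client-side transformation:

```python
# Substitute per-column defaults for missing values, like
# IFNULL(a, 0) and IFNULL(b, 'NULL') in the SQL statement.
rows = [(1, None), (None, 'some text')]
defaults = (0, 'NULL')  # replacement for column a, column b

cleaned = [
    tuple(d if v is None else v for v, d in zip(row, defaults))
    for row in rows
]
print(cleaned)  # [(1, 'NULL'), (0, 'some text')]
```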

Handling line breaks

If you try to export values that contain the same character combination used for line termination in the SELECT … INTO OUTFILE statement, MySQL will try to escape it with the character defined by the ESCAPED BY clause. However, this does not always work as intended. You will typically define \r\n as the line separator. With this configuration, values that contain a simple line break \n will not cause problems, as they are exported without any conversion and can be imported into Microsoft Excel flawlessly. If your values happen to contain a combination of carriage return and line feed, however, the \r\n sequence will be prefixed with an escape character ("\r\n), but the target file still cannot be imported correctly. Therefore, you need to convert the full line breaks to simple line breaks:

mysql> SELECT a, REPLACE(b, '\r\n', '\n') FROM sample.table1 INTO OUTFILE
'C:/target.csv' FIELDS ENCLOSED BY '"' TERMINATED BY ';' ESCAPED BY '"'
LINES TERMINATED BY '\r\n';

With this statement, you will export only line breaks \n, which are typically accepted for import by other programs.
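The REPLACE(b, '\r\n', '\n') call has a direct client-side equivalent; this minimal sketch shows the normalization applied to a single value:

```python
# Normalize Windows-style line breaks inside a value to simple line
# feeds, like REPLACE(b, '\r\n', '\n') in the export statement.
value = 'first line\r\nsecond line'
normalized = value.replace('\r\n', '\n')
print(normalized)
```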

Including headers

For better understanding, you might want to include headers in your target file. You can do so by using a UNION construct:

mysql> (SELECT 'Column a', 'Column b') UNION ALL (SELECT * FROM sample.
table1 INTO OUTFILE 'C:/target.csv' FIELDS ENCLOSED BY '"' TERMINATED BY
';' ESCAPED BY '"' LINES TERMINATED BY '\r\n');

The resulting file will contain an additional first line with the given headers from the first SELECT clause.

Exporting data to a custom file format

You sometimes have the task of exporting data in a special format in order to fulfill the requirements of the recipient of the data. In this recipe, we will show you one way to export data in a format that is beyond the possibilities of the SELECT … INTO OUTFILE format options.

We will create an export file in a hypothetical format whose first four lines contain the name of the file, a short description of the file's content, a time stamp, and the number of data rows contained in the file. The data portion starts with a header line naming all columns, followed by the actual data rows. Every data row starts with a prefix consisting of the hash character (#), the line number, a colon, and a space. This prefix is followed by the data items separated by pipe (|) characters. Each line ends with a dollar sign ($) (and the line break, of course).

This format is used as an example, but the steps involved can be adapted to more complex file formats if necessary.

Getting ready

As in the previous recipe, we will need an account with appropriate permissions (FILE), a SQL client, and a file name for the target file. Again, we will assume an account named sample_install, and we will export data from the table sample.table2 (which consists of three columns c1, c2, and c3) to a file C:/target.txt in the format mentioned previously. We also propose creating a file customExport.sql in an editor to store the SQL commands.

 

How to do it...

  • Create a new script named customExport.sql and add the following statements to it:
  • SET @filename := 'Filename: C:/target.txt';
    SET @description := 'Description: This is a test export from
    sample.table2 with columns c1, c2, and c3';
    SELECT NOW() INTO @timestamp;
    SELECT COUNT(*) FROM sample.table2 INTO @rowcount;
    SET @rows := CONCAT('Row count: ', @rowcount);
    SET @header := '#Row Nr: Column c1 | Column c2 | Column c3 $';
    SET @counter := 0;
    SELECT @filename
    UNION SELECT @description
    UNION SELECT @timestamp
    UNION SELECT @rows
    UNION SELECT @header
    UNION SELECT CONCAT('#',
    @counter := @counter + 1,
    ': ',
    CONCAT_WS(' | ', c1, c2, c3),
    ' $')
    FROM sample.table2
    INTO OUTFILE 'C:/target.txt';
  • Connect to the database using the sample_install account.
  • Execute the SQL statements from customExport.sql (as an alternative to copying the statements into your SQL client, you could also execute the statements in the file using mysql's source command).
  • The target file will look as follows:
  • Filename: C:/target.txt
    Description: This is a test export from sample.table2 with columns
    c1, c2, and c3
    2009-06-14 13:25:05
    Row count: 3
    #Row Nr: Column c1 | Column c2 | Column c3 $
    #1: 209 | Some text in my test data | Some more text $
    #2: 308 | Next test text for testing | Text to test $
    #3: 406 | The quick brown fox jumps | Really? $

How it works...

Although this solution takes several commands, they are divided into a preparation part (the first seven commands, which define the user variables) and the actual export command. Of course, you could minimize the number of statements by omitting the user variables, but the resulting statement would be somewhat bulky.

The preparation part simply defines some user variables for later reference. The final SELECT command consists of a UNION construct, which basically concatenates the rows that are required for the file header. The actual data from the table is prepared by the following SELECT clause:

SELECT CONCAT('#', @counter := @counter + 1, ': ', CONCAT_WS(' | ',
c1, c2, c3), ' $') FROM sample.table2

The CONCAT statement concatenates its parameters; so let us have a look at the statement's parts. The clause '#', @counter := @counter + 1, ': ' forms the required line number portion of the rows. The variable @counter gets incremented for every row, which produces the intended line number. The following CONCAT_WS will also concatenate the given values, but the first parameter is used as a separator character (_WS stands for with separator). For every row, this will result in a string with the values of columns (c1, c2, and c3) separated by the pipe character. With a closing dollar sign as the final parameter, the first CONCAT is closed.
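For illustration, the following Python sketch (not part of the recipe; the sample rows are taken from the example output above) reproduces what the CONCAT and CONCAT_WS expression emits for each data row:

```python
# Reproduce CONCAT('#', counter, ': ', CONCAT_WS(' | ', c1, c2, c3), ' $')
# for sample rows of sample.table2.
rows = [
    (209, 'Some text in my test data', 'Some more text'),
    (308, 'Next test text for testing', 'Text to test'),
]

lines = []
for counter, row in enumerate(rows, start=1):
    joined = ' | '.join(str(v) for v in row)   # CONCAT_WS(' | ', c1, c2, c3)
    lines.append('#%d: %s $' % (counter, joined))
print(lines[0])  # '#1: 209 | Some text in my test data | Some more text $'
```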

While this approach allows for the creation of rather complex file formats, it is not suitable for every situation. For advanced requirements, we encourage the use of other programming techniques beyond the SQL commands (for example, reading and processing the data using a scripting language). This holds especially true when the target file has to be in XML format.

For advanced formatting capabilities, consider exporting your data in XML format (using mysqldump --xml) and processing the resulting file using an XSLT processor!

There's more...

Please note that using parentheses in the UNION to clarify the separation of the different SELECT statements might lead to unexpected problems: the INTO OUTFILE clause has to be attached to the last SELECT statement of the UNION construct. A statement like (SELECT …) UNION (SELECT …) INTO OUTFILE … will not work, while (SELECT …) UNION (SELECT … INTO OUTFILE …) does. While this might not seem too intuitive, it is a well-documented behavior and not a bug.

Importing data from a simple CSV file

A common task when working with databases is to import data from different sources. Unfortunately, this data will typically not be provided as a convenient set of well-formed SQL statements that you can simply run against your database. Therefore, here you will have to deal with data in a different format.

As a common denominator, character-separated values (CSV) files are still a prevalent way of exchanging data. In this recipe, we will show you how to import data stored in CSV files. As a typical example, we will use the file format Microsoft Excel produces when storing files using the *.CSV file type.

This recipe is the counterpart of the Exporting data to a simple CSV file recipe in this chapter.

Getting ready

To step through this recipe, we will definitely need a file to import (here: C:/source.csv) and a table to import the data into (here: sample.table1). The source file and target table have to have a matching format concerning the number of columns and the type of data stored in them. Furthermore, an account with INSERT and FILE privileges is required; we will assume an account sample_install in this recipe.

The source file has to be located on the machine that runs your MySQL server, not on the client side!

How to do it...

  • Connect to the database using the sample_install account.
  • Issue the following SQL command:
  • mysql> LOAD DATA INFILE 'C:/source.csv' INTO TABLE sample.table1
    FIELDS ENCLOSED BY '"' TERMINATED BY ';' ESCAPED BY '"' LINES
    TERMINATED BY '\r\n';
    Query OK, 20 rows affected (0.06 sec)
    Records: 20 Deleted: 0 Skipped: 0 Warnings: 0

How it works...

The LOAD DATA INFILE command works analogously to the SELECT … INTO OUTFILE command discussed in the previous recipes, but as a means of importing data rather than exporting it. The format options available for both commands are identical, so you can typically import data exported by a SELECT … INTO OUTFILE statement using a LOAD DATA INFILE command with the same format options.

As most files consist of lines terminated by a sequence of a carriage return and a line feed character, we use the LINES TERMINATED BY '\r\n' option. The choice of the semicolon character as the separator between the fields of each line (TERMINATED BY ';') is mainly due to the fact that Excel uses this format. If you happen to receive CSV files that use a comma instead, for example, you have to adjust this accordingly.

The term FIELDS ENCLOSED BY '"' tells the import to look for double quotes at the start of every field imported. If there is one, the field is considered to end at the next double quote. To be able to have double quotes inside a field value, we define an escape character (ESCAPED BY '"'). With this configuration, a sequence of two double quotes is not treated as the end of the field, but as a double-quote character that is part of the value.
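This doubled-quote convention is the same one Python's csv module applies by default (doublequote=True), so a minimal sketch of the reading side looks like this:

```python
import csv
import io

# A doubled quote inside an enclosed field is read back as a literal
# quote, not as the end of the field - the ESCAPED BY '"' convention.
raw = '"1";"He said ""hello"" to me"\r\n'
row = next(csv.reader(io.StringIO(raw), delimiter=';', quotechar='"'))
print(row)
```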

There's more...

The data is read from the file using the default character set of the database. If the file uses a different character encoding, you can specify this by adding a CHARACTER SET clause after the table definition (LOAD DATA INFILE … INTO TABLE sample.table1 CHARACTER SET utf8;). Please note that the character sets ucs2, utf16, and utf32 are not supported (as of MySQL version 5.1.35).

 

Importing data from custom file formats

In the previous recipe Importing data from a simple CSV file, we discussed a way of importing data from a nicely formatted file. Unfortunately, you sometimes have to deal with far less convenient data sources. In this recipe, we will present some more advanced topics of importing data from files with a less strict structure.

Obviously, it is not possible to present a universal recipe for every file format imaginable, so we will use an example that covers some of the common problems one has to tackle when importing data from custom files. For this, we will refer to the same hypothetical format as in Exporting data to a custom file format, which defines four initial lines (containing the name of the file, a description, a time stamp, and the number of rows), a header line with the names of the columns, and subsequently the rows with the actual data to import. Each data row starts with a hash character (#), the line number, a colon, and a space. The data values that follow the row number are separated by a pipe (|) character, and the row closes with a dollar sign ($).

Getting ready

Again, the account used in the recipe needs the FILE privilege (besides the INSERT permission for the table the data should be imported into). With a SQL client, a file with the appropriate format, and a table as the import target, we are ready to go. As in previous recipes, we use sample_install as the account name, C:/source.txt as the source file, and sample.table2 (consisting of three columns c1, c2, and c3) as the target table. We assume the source file to have the following content:

Filename: C:/source.txt
Description: This is a file for test import to sample.table2, columns
c1, c2, and c3
2009-06-14 13:25:05
Row count: 3
#Row Nr: Column c1 | Column c2 | Column c3 $
#1: 209 | Some text in my test data | Some more text $
#2: 308 | Next test text for testing | Text to test $
#3: 406 | "A water | pipe" | Really? $

How to do it...

  1. Connect your favorite client (for example, the mysql command-line client) to your MySQL server using the sample_install account.
  2. Execute the following SQL command:
    mysql> LOAD DATA INFILE "C:/source.txt"
    -> INTO TABLE sample.table2
    -> FIELDS TERMINATED BY ' | '
    -> OPTIONALLY ENCLOSED BY '"'
    -> LINES STARTING BY ':'
    -> TERMINATED BY '\r\n'
    -> IGNORE 5 LINES
    -> SET c3=TRIM(TRAILING ' $' FROM c3);
    Query OK, 3 rows affected (0.05 sec)
    Records: 3 Deleted: 0 Skipped: 0 Warnings: 0

How it works...

Let us dissect the above statement by having a look at the source file. First of all, we want to import data from the file C:/source.txt into the table sample.table2, which is expressed by the first two lines (LOAD DATA INFILE … INTO TABLE …).

At the top of the file, we have five lines (the initial four lines with information about the file plus the header) that should not be imported into the target table. To achieve this, the IGNORE 5 LINES option is added.

The remaining lines are prefixed with a hash character, the row number, and a colon. This part of every line has to be ignored, which is what the LINES STARTING BY ':' option does: it tells MySQL to ignore the first colon of the line and any character before it. By doing this, the row number prefix is skipped.

After the prefix, the lines contain the actual values, separated by pipe characters. The FIELDS TERMINATED BY ' | ' option tells MySQL how to identify a field separator. With the additional setting OPTIONALLY ENCLOSED BY '"', the value itself might contain this field separator sequence—if the whole value is enclosed by double quotes (this is the case in the last row of the sample file).

At this point, there is only one problem left: the lines end with a dollar sign, which is not part of the last value. An intuitive approach would be to include this character in the line termination sequence, which means to use $\r\n as a line ending (instead of \r\n). Unfortunately, this definition of a line end does not work as expected for our example, as it would break the interpretation of the first five lines, which are not terminated the same way. As a result, the first six lines would be considered as one single line by the import because only the sixth line actually ends with a character sequence of $\r\n. To be able to explicitly exclude the header lines from the import, we have to rely on the "traditional" line ending defined by the [LINES] TERMINATED BY '\r\n' option.

Hence, the options defining the field separators and the beginning and termination of a line do not allow us to get rid of the closing dollar sign, so it is considered part of the last value, which is assigned to column c3. To finally get rid of this postfix, the SET clause of the LOAD DATA INFILE command comes in handy: it allows you to define exactly which values are assigned to the columns of the target table. The closing option SET c3=TRIM(TRAILING ' $' FROM c3) strips the unwanted postfix from the last field.

Putting it all together, the import works as intended.
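The parsing rules expressed by the LOAD DATA INFILE options can also be sketched client-side. The following Python snippet is a simplified illustration (not part of the recipe, and slightly looser than MySQL's exact semantics for LINES STARTING BY): skip five header lines, drop everything up to the colon, split fields on ' | ', and trim the trailing ' $':

```python
# Mimic the import options against an in-memory copy of the source file.
sample = (
    "Filename: C:/source.txt\r\n"
    "Description: a test import\r\n"
    "2009-06-14 13:25:05\r\n"
    "Row count: 2\r\n"
    "#Row Nr: Column c1 | Column c2 | Column c3 $\r\n"
    "#1: 209 | Some text in my test data | Some more text $\r\n"
    "#2: 308 | Next test text for testing | Text to test $\r\n"
)

records = []
for line in sample.splitlines()[5:]:            # IGNORE 5 LINES
    body = line.split(':', 1)[1].lstrip()       # LINES STARTING BY ':'
    fields = body.split(' | ')                  # FIELDS TERMINATED BY ' | '
    fields[-1] = fields[-1].removesuffix(' $')  # SET c3 = TRIM(TRAILING ' $' ...)
    records.append(fields)
print(records)
```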

There's more...

As with exporting data, it is recommended to consider using an external programming language to import more complex data structures into MySQL. While it is possible to import rather sophisticated file formats using MySQL commands as well, it is often far more efficient to have a full-blown programming language at hand to solve the task of parsing input files. This is most notably the case when it comes to XML files.

For importing data from XML files, consider using an XSLT processor to produce corresponding SQL commands!

Summary

In this article, we discussed the following:

  • Exporting data to a simple CSV file
  • Exporting data to a custom file format
  • Importing data from a simple CSV file
  • Importing data from custom file formats
