Managing Data in MySQL

by Daniel Schneller and Udo Schwedt | March 2010 | MySQL

In this article by Daniel Schneller and Udo Schwedt, authors of MySQL Admin Cookbook, we will discuss the following:

  • Exporting data to a simple CSV file
  • Exporting data to a custom file format
  • Importing data from a simple CSV file
  • Importing data from custom file formats

Exporting data to a simple CSV file

While databases are a great tool to store and manage your data, you sometimes need to extract some of the data from your database to use it in another tool (a spreadsheet application being the most prominent example). In this recipe, we will show you how to utilize the respective MySQL commands for exporting data from a given table into a file that can easily be imported by other programs.

Getting ready

To step through this recipe, you will need a running MySQL database server and a working installation of a SQL client (like MySQL Query Browser or the mysql command line tool). You will also need to identify a suitable export target, which has to meet the following requirements:

  • The MySQL server process must have write access to the target file
  • The target file must not exist

The export target file is located on the machine that runs your MySQL server, not on the client side!

If you do not have file access to the MySQL server, you could instead use export functions of MySQL clients like MySQL Query Browser.

In addition, a user with the FILE privilege is needed (we will use an account named sample_install for the following steps; see also Chapter 8, Creating an installation user).

Finally, we need some data to export. Throughout this recipe, we will assume that the data to export is stored in a table named table1 inside the database sample. As the export target, we will use the file C:/target.csv (MySQL accepts slashes instead of backslashes in Windows path expressions). This is a file on the machine that runs the MySQL server instance, so in this example MySQL is assumed to be running on a Windows machine. To access the results from the client, you need access to the file (for example, using a file share or executing the MySQL client on the same machine as the server).

How to do it...

  • Connect to the database using the sample_install account.
  • Issue the following SQL command:
  • mysql> SELECT * FROM sample.table1 INTO OUTFILE 'C:/target.csv'
    FIELDS ENCLOSED BY '"' TERMINATED BY ';' ESCAPED BY '"' LINES
    TERMINATED BY '\r\n';

Please note that when using a backslash instead of a slash in the target file's path, you have to use C:\\target.csv (double backslash for escaping) instead.

If you do not give a path, but only a file name, the target file will be placed in the data directory of the currently selected schema of your MySQL server.

How it works...

In the previous SQL statement, a file C:/target.csv was created, which contains the content of the table sample.table1. The file contains a separate line for each row of the table, and each line is terminated by a sequence of a carriage return and a line feed character. This line ending was defined by the LINES TERMINATED BY '\r\n' portion of the command.

Each line contains the values of each column of the row. The values are separated by semicolons, as stated in the TERMINATED BY ';' clause. Every value is enclosed by a double quotation mark ("), which results from the FIELDS ENCLOSED BY '"' option.

When writing the data to the target file, no character conversion takes place; the data is exported using the binary character set. This should be kept in mind, especially when exporting tables that use different character sets for some of their values.

You might wonder why we chose the semicolon instead of a comma as the field separator. This is simply because it greatly improves Microsoft Excel compatibility (you can open the resulting files directly), without the need to use Excel's external data import. You can, however, open these files in other spreadsheet programs (like OpenOffice.org Calc) as well. If you think the usage of semicolons contradicts the notion of a CSV file, think of it as a Character Separated File.

The use of double quotes to enclose single values prevents problems when field values contain semicolons (or generally the field separator character). These are not recognized as field separators if they are enclosed in double quotes.
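As a quick client-side sanity check (a sketch, not part of the recipe; the sample content below is made up), Python's csv module can read a file produced with these options, because ENCLOSED BY '"' combined with ESCAPED BY '"' yields the doubled-quote convention most CSV parsers expect:

```python
import csv
import io

# Hypothetical content of an export created with
# FIELDS ENCLOSED BY '"' TERMINATED BY ';' ESCAPED BY '"'
# LINES TERMINATED BY '\r\n'. The second row contains the field
# separator and an embedded (doubled) quote inside a value.
exported = '"1";"plain text"\r\n"2";"semi;colon and ""quote"""\r\n'

reader = csv.reader(
    io.StringIO(exported),
    delimiter=";",     # matches TERMINATED BY ';'
    quotechar='"',     # matches ENCLOSED BY '"'
    doublequote=True,  # matches ESCAPED BY '"' (quotes appear doubled)
)
rows = list(reader)
print(rows)  # [['1', 'plain text'], ['2', 'semi;colon and "quote"']]
```

Because every value is quoted, embedded semicolons and quotes survive the round trip intact.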

There's more...

While the previous SELECT … INTO OUTFILE statement will work well in most cases, there are some circumstances in which you still might encounter problems. The following topics will show you how to handle some of those.

Handling errors if the target file already exists

If you try to execute the SELECT … INTO OUTFILE statement twice, an error File 'C:/target.csv' already exists occurs. This is due to a security feature in MySQL that makes sure you cannot overwrite existing files using the SELECT … INTO OUTFILE statement. This makes perfect sense if you think about the consequences: if this were not the case, you could overwrite the MySQL data files with a simple SELECT, because the MySQL server needs write access to its data directories. As a result, you have to choose a different target file for each export (or remove old files in advance).

Unfortunately, it is not possible to use a non-constant file name (like a variable) in the SELECT … INTO OUTFILE export statement. If you wish to use different file names, for example, with a time stamp as part of the file name, you have to construct the statement inside a variable value before executing it:

 

mysql> SET @selInOutfileCmd := concat("SELECT * FROM sample.table1 INTO
OUTFILE 'C:/target-", DATE_FORMAT(now(),'%Y-%m-%d_%H%i%s'), ".csv' FIELDS
ENCLOSED BY '\"' TERMINATED BY ';' ESCAPED BY '\"' LINES TERMINATED BY
'\r\n';");
mysql> PREPARE statement FROM @selInOutfileCmd;
mysql> EXECUTE statement;

The first SET statement constructs a string, which contains a SELECT statement. While it is not allowed to use variables for statements directly, you can construct a string that contains a statement and use variables for this. With the next two lines, you prepare a statement from the string and execute it.

Handling NULL values

Without further handling, NULL values in the data you export using the previous statement would show up as "N in the resulting file. This combination is not recognized, for example, by Microsoft Excel, which breaks the file (for typical usage). To prevent this, you need to replace NULL entries with appropriate values. Assuming that the table sample.table1 consists of a numeric column a and a character column b, you should use the following statement:

mysql> SELECT IFNULL(a, 0), IFNULL(b, "NULL") FROM sample.table1 INTO
OUTFILE 'C:/target.csv' FIELDS ENCLOSED BY '"' TERMINATED BY ';' ESCAPED
BY '"' LINES TERMINATED BY '\r\n';

The downside of this approach is that you have to list all fields in which a NULL value might occur.

Handling line breaks

If you try to export values that contain the same character combination used for line termination in the SELECT … INTO OUTFILE statement, MySQL will try to escape the character combination with the characters defined by the ESCAPED BY clause. However, this will not always work the way it is intended. You will typically define \r\n as the line separator. With this configuration, values that contain a simple line break \n will not cause problems, as they are exported without any conversion and can be imported into Microsoft Excel flawlessly. If your values happen to contain a combination of carriage return and line feed, however, the \r\n characters will be prepended with an escape character ("\r\n), but the target file still cannot be imported correctly. Therefore, you need to convert the full line breaks to simple line breaks:

mysql> SELECT a, REPLACE(b, '\r\n', '\n') FROM sample.table1 INTO OUTFILE
'C:/target.csv' FIELDS ENCLOSED BY '"' TERMINATED BY ';' ESCAPED BY '"'
LINES TERMINATED BY '\r\n';

With this statement, you will export only line breaks \n, which are typically accepted for import by other programs.
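The effect of that REPLACE call can be mimicked outside the database; a minimal Python sketch of the same normalization (the sample value is made up):

```python
# Client-side equivalent of MySQL's REPLACE(b, '\r\n', '\n'):
# collapse Windows-style line breaks inside a value into plain \n,
# so they no longer collide with the LINES TERMINATED BY '\r\n' sequence.
value = "first line\r\nsecond line"
normalized = value.replace("\r\n", "\n")
print(normalized)
```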

Including headers

For better readability, you might want to include headers in your target file. You can do so by using a UNION construct:

mysql> (SELECT 'Column a', 'Column b') UNION ALL (SELECT * FROM
sample.table1 INTO OUTFILE 'C:/target.csv' FIELDS ENCLOSED BY '"'
TERMINATED BY ';' ESCAPED BY '"' LINES TERMINATED BY '\r\n');

The resulting file will contain an additional first line with the given headers from the first SELECT clause.


Exporting data to a custom file format

You are sometimes tasked with exporting data in a special format in order to fulfill the requirements of the recipient of the data. In this recipe, we will show you one way to export data in a format that is beyond the possibilities of the SELECT … INTO OUTFILE format options.

In the following recipe, we will show you how to create an export file in a hypothetical format. This includes the name of the file, a time stamp, a short description of the file's content, and the number of data rows contained in the file in the first four lines. The data portion starts with a header line with names for all columns followed by the actual data rows. Every data row should start with a prefix consisting of the hash character (#), the line number, a colon, and a space. This prefix is followed by the data items separated by pipe (|) characters. Each line should end with a dollar sign ($) (and the line break, of course).

This format is used as an example, but the steps involved can be adapted to more complex file formats if necessary.

Getting ready

As in the previous recipe, we will need an account with appropriate permissions (FILE), a SQL client, and a file name for the target file. Again, we will assume an account named sample_install, and we will export data from table sample.table2 (which consists of three columns c1, c2, and c3) to a file C:/target.txt in the format mentioned previously. We also propose creating a file customExport.sql in an editor to store the SQL commands.

 

How to do it...

  • Create a new script named customExport.sql and add the following statements to it:
  • SET @filename := 'Filename: C:/target.txt';
    SET @description := 'Description: This is a test export from
    sample.table2 with columns c1, c2, and c3';
    SELECT NOW() INTO @timestamp;
    SELECT COUNT(*) FROM sample.table2 INTO @rowcount;
    SET @rows := CONCAT('Row count: ', @rowcount);
    SET @header := '#Row Nr: Column c1 | Column c2 | Column c3 $';
    SET @counter := 0;
    SELECT @filename
    UNION SELECT @description
    UNION SELECT @timestamp
    UNION SELECT @rows
    UNION SELECT @header
    UNION SELECT CONCAT('#',
    @counter := @counter + 1,
    ': ',
    CONCAT_WS(' | ', c1, c2, c3),
    ' $')
    FROM sample.table2
    INTO OUTFILE 'C:/target.txt';
  • Connect to the database using the sample_install account.
  • Execute the SQL statements from customExport.sql (as an alternative to copying the statements to your SQL client, you could also execute the statements in the file using mysql's source command).
  • The target file will look as follows:
  • Filename: C:/target.txt
    Description: This is a test export from sample.table2 with columns
    c1, c2, and c3
    2009-06-14 13:25:05
    Row count: 3
    #Row Nr: Column c1 | Column c2 | Column c3 $
    #1: 209 | Some text in my test data | Some more text $
    #2: 308 | Next test text for testing | Text to test $
    #3: 406 | The quick brown fox jumps | Really? $

How it works...

Although this solution takes some commands, they are divided into a preparation part (the first seven commands, which define the user variables) and the actual export command. Of course, you could minimize the number of statements by omitting the user variables, but the resulting statement would be somewhat bulky.

The preparation part simply defines some user variables for later reference. The final SELECT command consists of a UNION construct, which basically concatenates the rows that are required for the file header. The actual data from the table is prepared by the following SELECT clause:

SELECT CONCAT('#', @counter := @counter + 1, ': ', CONCAT_WS(' | ',
c1, c2, c3), ' $') FROM sample.table2

The CONCAT statement concatenates its parameters; so let us have a look at the statement's parts. The clause '#', @counter := @counter + 1, ': ' forms the required line number portion of the rows. The variable @counter gets incremented for every row, which produces the intended line number. The following CONCAT_WS will also concatenate the given values, but the first parameter is used as a separator character (_WS stands for with separator). For every row, this will result in a string with the values of columns (c1, c2, and c3) separated by the pipe character. With a closing dollar sign as the final parameter, the first CONCAT is closed.
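The per-row logic is easy to mirror outside SQL. The following Python sketch (with made-up sample rows, not the actual table data) reproduces what CONCAT and CONCAT_WS build for each line:

```python
# Hypothetical sample rows standing in for sample.table2 (c1, c2, c3).
table2 = [
    (209, "Some text", "More text"),
    (308, "Next text", "Test text"),
]

lines = []
for counter, row in enumerate(table2, start=1):  # @counter := @counter + 1
    body = " | ".join(str(v) for v in row)       # CONCAT_WS(' | ', c1, c2, c3)
    lines.append(f"#{counter}: {body} $")        # CONCAT('#', ..., ': ', ..., ' $')
print("\n".join(lines))
```

Each output line carries the hash-and-number prefix, the pipe-separated values, and the closing dollar sign, just like the rows in the target file above.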

While this approach allows for the creation of rather complex file formats, it is not suitable for every situation. For advanced requirements, we encourage the use of other programming techniques beyond the SQL commands (for example, reading and processing the data using a scripting language). This holds especially true when the target file has to be in XML format.

For advanced formatting capabilities, consider exporting your data in XML format (using mysqldump --xml) and processing the resulting file using an XSLT processor!

There's more...

Please note that using parentheses around the individual SELECT statements of the UNION to clarify their separation might lead to unexpected problems: the INTO OUTFILE clause has to be attached to the last SELECT statement of the UNION construct. A statement like (SELECT …) UNION (SELECT …) INTO OUTFILE … will not work, while (SELECT …) UNION (SELECT … INTO OUTFILE …) does. While this might not seem too intuitive, it is a well-documented behavior and not a bug.
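To illustrate the placement rule, here is a sketch using the header example from the previous recipe (same schema and file names as above):

```sql
-- Will NOT work: INTO OUTFILE placed after the parenthesized UNION
(SELECT 'Column a', 'Column b')
UNION ALL
(SELECT * FROM sample.table1)
INTO OUTFILE 'C:/target.csv';

-- Works: INTO OUTFILE attached to the last SELECT, inside its parentheses
(SELECT 'Column a', 'Column b')
UNION ALL
(SELECT * FROM sample.table1 INTO OUTFILE 'C:/target.csv');
```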

Importing data from a simple CSV file

A common task when working with databases is to import data from different sources. Unfortunately, this data will typically not be provided as a convenient set of well-formed SQL statements that you can simply run against your database. Therefore, here you will have to deal with data in a different format.

As a common denominator, character-separated values (CSV) are still a prevalent way of exchanging data. In this recipe, we will show you how to import data stored in CSV files. As a typical example, we will use the file format Microsoft Excel produces when storing files using the *.CSV file type.

This recipe is the counterpart of the Exporting data to a simple CSV file recipe in this chapter.

Getting ready

To step through this recipe, we will definitely need a file to import (here: C:/source.csv) and a table to import the data into (here: sample.table1). The source file and target table have to have a matching format concerning the number of columns and the type of data stored in them. Furthermore, an account with INSERT and FILE privileges is required; we will assume an account sample_install in this recipe.

The source file has to be located on the machine that runs your MySQL server, not on the client side!

How to do it...

  • Connect to the database using the sample_install account.
  • Issue the following SQL command:
  • mysql> LOAD DATA INFILE 'C:/source.csv' INTO TABLE sample.table1
    FIELDS ENCLOSED BY '"' TERMINATED BY ';' ESCAPED BY '"' LINES
    TERMINATED BY '\r\n';
    Query OK, 20 rows affected (0.06 sec)
    Records: 20 Deleted: 0 Skipped: 0 Warnings: 0

How it works...

The LOAD DATA INFILE command works analogously to the SELECT … INTO OUTFILE command discussed in the previous recipes, but as a means of importing data rather than exporting it. The format options available for both commands are identical, so you can typically import data exported by a SELECT … INTO OUTFILE statement using a LOAD DATA INFILE command with the same format options.

As most files consist of lines terminated by a sequence of a carriage return and a line feed character, we use the LINES TERMINATED BY '\r\n' option. The choice of the semicolon character as the separator for the different fields of every line (TERMINATED BY ';') is mainly due to the fact that Excel uses this format. If you happen to receive CSV files that, for example, use a comma instead, you have to adjust this accordingly.

The term FIELDS ENCLOSED BY '"' tells the import to look for a double quote at the start of every field imported. If there is one, the field is considered to end at the next double quote. To be able to have double quotes inside a field value, we define an escape character (ESCAPED BY '"'). With this configuration, a sequence of two double quotes is not treated as the end of the field, but as a double-quote character that is part of the value.

There's more...

The data is read from the file using the default character set of the database. If the file uses a different character encoding, you can specify it by adding a CHARACTER SET clause after the table name (LOAD DATA INFILE … INTO TABLE sample.table1 CHARACTER SET utf8;). Please note that the character sets ucs2, utf16, and utf32 are not supported (as of MySQL version 5.1.35).

 


Importing data from custom file formats

In the previous recipe Importing data from a simple CSV file, we discussed a way of importing data from a nicely formatted file. Unfortunately, you sometimes have to deal with far less convenient data sources. In this recipe, we will present some more advanced topics of importing data from files with a less strict structure.

Obviously, it is not possible to present a universal recipe for every file format imaginable, so we will use an example that covers some of the common problems one has to tackle when importing data from custom files. For this, we will refer to the same hypothetical format as in Exporting data to a custom file format, which defines four initial lines (containing the name of the file, a time stamp, a description, and the number of rows), a header line with the names of the columns, and subsequently the rows with the actual data to import. Each data row starts with a hash character (#), the line number, a colon, and a space. The data values that follow the row number are separated by a pipe (|) character, and the row closes with a dollar sign ($).

Getting ready

Again, the account used in the recipe needs the FILE privilege (besides the INSERT permission for the table the data should be imported into). With a SQL client, a file with the appropriate format, and a table as the import target, we are ready to go. As in previous recipes, we use sample_install as the account name, C:/source.txt as the source file, and sample.table2 (consisting of three columns c1, c2, and c3) as the target table. We assume the source file to have the following content:

Filename: C:/source.txt
Description: This is a file for test import to sample.table2, columns
c1, c2, and c3
2009-06-14 13:25:05
Row count: 3
#Row Nr: Column c1 | Column c2 | Column c3 $
#1: 209 | Some text in my test data | Some more text $
#2: 308 | Next test text for testing | Text to test $
#3: 406 | "A water | pipe" | Really? $

How to do it...

  1. Connect your favorite client (for example, the mysql command-line client) to your MySQL server using the sample_install account.
  2. Execute the following SQL command:
    mysql> LOAD DATA INFILE "C:/source.txt"
    -> INTO TABLE sample.table2
    -> FIELDS TERMINATED BY ' | '
    -> OPTIONALLY ENCLOSED BY '"'
    -> LINES STARTING BY ':'
    -> TERMINATED BY '\r\n'
    -> IGNORE 5 LINES
    -> SET c3=TRIM(TRAILING ' $' FROM c3);
    Query OK, 3 rows affected (0.05 sec)
    Records: 3 Deleted: 0 Skipped: 0 Warnings: 0

How it works...

Let us dissect the above statement by having a look at the source file: first of all, we want to import data from the file C:/source.txt into the table sample.table2, which is represented by the first two lines (LOAD DATA INFILE … INTO TABLE …).

At the top of the file, we have five lines (the initial four lines with information about the file plus the header) that should not be imported into the target table. To achieve this, the IGNORE 5 LINES option is added.

The remaining lines are prefixed with a hash character, the row number, and a colon. This part of every line has to be ignored, which is what the LINES STARTING BY ':' option does: it tells MySQL to ignore the first colon of the line and any character before it. By doing this, the row number prefix is skipped.

After the prefix, the lines contain the actual values, separated by pipe characters. The FIELDS TERMINATED BY ' | ' option tells MySQL how to identify a field separator. With the additional setting OPTIONALLY ENCLOSED BY '"', the value itself might contain this field separator sequence—if the whole value is enclosed by double quotes (this is the case in the last row of the sample file).

At this point, there is only one problem left: the lines end with a dollar sign, which is not part of the last value. An intuitive approach would be to include this character in the line termination sequence, which means to use $\r\n as a line ending (instead of \r\n). Unfortunately, this definition of a line end does not work as expected for our example, as it would break the interpretation of the first five lines, which are not terminated the same way. As a result, the first six lines would be considered as one single line by the import because only the sixth line actually ends with a character sequence of $\r\n. To be able to explicitly exclude the header lines from the import, we have to rely on the "traditional" line ending defined by the [LINES] TERMINATED BY '\r\n' option.

Hence, the options for defining the field separators and the beginning and termination of a line do not allow us to get rid of the closing dollar sign. Thus, it is considered part of the last value, which is assigned to column c3. To finally get rid of this postfix, the SET clause of the LOAD DATA INFILE command comes in handy, as it allows us to define the values that are assigned to the columns in the target table. The closing option SET c3=TRIM(TRAILING ' $' FROM c3); strips the unwanted postfix from the last field.

If we put it all together, the import works as intended.
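As a client-side cross-check, the combined effect of these options can be sketched in Python (a simplified toy parser mirroring the recipe's options, not a substitute for LOAD DATA INFILE):

```python
def split_fields(s, sep=" | ", quote='"'):
    """Split on sep (FIELDS TERMINATED BY ' | '), but not inside
    quoted values (OPTIONALLY ENCLOSED BY '"')."""
    fields, buf, in_quotes, i = [], "", False, 0
    while i < len(s):
        if s[i] == quote:
            in_quotes = not in_quotes          # toggle quoted state
            i += 1
        elif not in_quotes and s.startswith(sep, i):
            fields.append(buf)                 # field separator found
            buf = ""
            i += len(sep)
        else:
            buf += s[i]
            i += 1
    fields.append(buf)
    return fields

def parse_custom_line(line):
    # LINES STARTING BY ':' -- skip everything up to the first colon
    line = line.split(":", 1)[1].strip()
    # SET c3 = TRIM(TRAILING ' $' FROM c3) -- drop the closing dollar sign
    if line.endswith("$"):
        line = line[:-1].rstrip()
    return split_fields(line)

print(parse_custom_line('#3: 406 | "A water | pipe" | Really? $'))
# ['406', 'A water | pipe', 'Really?']
```

Note that the quoted value "A water | pipe" keeps its embedded pipe, just as OPTIONALLY ENCLOSED BY '"' preserves it during the actual import.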

There's more...

As with exporting data, it is recommended to consider using an external programming language to import more complex data structures into MySQL. While it is possible to import rather sophisticated file formats using MySQL commands as well, it is often far more efficient to have a full-blown programming language at hand to solve the task of parsing input files. This is most notably the case when it comes to XML files.

For importing data from XML files, consider using an XSLT processor to produce corresponding SQL commands!

Summary

In this article, we discussed the following:

  • Exporting data to a simple CSV file
  • Exporting data to a custom file format
  • Importing data from a simple CSV file
  • Importing data from custom file formats

About the Authors


Daniel Schneller

Daniel Schneller works as a software developer, database administrator, and general IT professional for an independent software vendor in the retail sector. After successfully graduating from the University of Cooperative Education in Heidenheim, Germany with a degree in Business Computer Science, he started his career as a professional software developer, focused on the Microsoft technology stack. In 2002 he started focusing on enterprise-level Java development and has since gained extensive knowledge and experience implementing large scale systems based on Java EE and relational databases, especially MySQL since version 4.0.

Currently he is mostly involved with the ongoing development of framework-level functionality, including customization and extension of an ORM-based persistence layer. He is involved in different open source projects such as FindBugs, Eclipse, and Checkstyle and infrequently blogs about Java, MySQL, Windows, Linux and other insanities at http://www.danielschneller.com.

Udo Schwedt

Udo Schwedt is head of the Java architecture team and deputy head of the Java development department at the IT service provider for Germany's market leader in the Do-It-Yourself sector. After finishing his studies at RWTH Aachen, Germany with a degree in Computer Science, he started his career as a professional Java developer in a leading software company in the banking sector. In 2003, he joined his current company as a framework developer, subsequently taking the role of a lead architect.

Both authors are involved in a large scale international software project, which encompasses development of a Java-based merchandise management software solution. This decentralized multi-platform environment is based on more than 500 decentralized MySQL server instances with more than 5,500 replication slaves. Daniel and Udo are responsible for configuration and management of this infrastructure.

