Advanced Aspects of Inserting and Deleting Data in MySQL

In this article by Daniel Schneller and Udo Schwedt, authors of MySQL Admin Cookbook, we will cover:

  • Inserting new data and updating data if it already exists
  • Inserting data based on existing database content
  • Deleting all data from large tables
  • Deleting all but a fragment of a large table's data
  • Deleting all data incrementally from large tables

 


Inserting new data and updating data if it already exists

Manipulating data in a database is part of everyday work and the basic SQL means of INSERT, UPDATE, and DELETE make this a pretty straightforward, almost trivial task—but is this always true?

When considering data manipulation, most of the time we think of a situation where we know the content of the database. With this information, it is usually pretty easy to find a way of changing the data the way you intend to. But what if you have to change data in circumstances where you do not know the actual database content beforehand?

You might answer: "Well, then look at your data before changing it!" Unfortunately, you do not always have this option. Think of distributed installations of any software that includes a database. If you have to design an update option for this software (and the respective databases), you might easily come to a situation where you simply do not know about the actual database content.

One example of a problem arising in these cases is the question of whether to insert or to update data: "Does the data in question already (partially) exist?" Let us assume a database table config that stores configuration settings. It holds key-value pairs, with name being the name (and thus the key) of the setting and value its value. This table exists in different database installations, one for every branch office of your company. Your task is to create an update package to set a uniform upper limit of 25% for the price discount that is allowed in your sales software. If no such limit has been defined yet, there is no respective entry in the config table, and you have to insert a new record. If, however, the limit has been set before (for example by the local manager), the entry already exists, in which case you have to update it to hold the new value.

While the update of a potentially existing entry does not pose a problem, an INSERT statement that violates uniqueness constraints will simply cause an error. This is, however, typically not acceptable in an automated update procedure. The following recipe will show you how to solve this problem with only one SQL command.

Getting ready

Besides a running MySQL server, a SQL client, and an account with appropriate user rights (INSERT, UPDATE), we need a table to update. In the earlier example, we assumed a table named sample.config with two character columns name and value. The name column is defined as the primary key:

CREATE TABLE sample.config (
  name VARCHAR(64) PRIMARY KEY,
  value VARCHAR(64));

How to do it...

  1. Connect to your database using your SQL client.
  2. Execute the following command:

    mysql> INSERT INTO sample.config VALUES ("maxPriceDiscount",
    "25%") ON DUPLICATE KEY UPDATE value='25%';

    Query OK, 1 row affected (0.05 sec)

How it works...

This command is easily explained because it simply does what it says: it inserts a new row in the table using the given values, as long as this does not cause a duplicate entry in either the primary key or another unique index. If a duplicate record exists, the existing row is updated according to the clauses defined after ON DUPLICATE KEY UPDATE.

While it is sometimes tedious to enter some of the data and columns two times (once for the INSERT and a second time for the UPDATE), this statement allows for a lot of flexibility when it comes to the manipulation of potentially existing data.
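One way to reduce that repetition is MySQL's VALUES() function, which inside the ON DUPLICATE KEY UPDATE clause refers to the value that would have been inserted into the respective column, so the data only has to be spelled out once:

INSERT INTO sample.config (name, value)
VALUES ('maxPriceDiscount', '25%')
ON DUPLICATE KEY UPDATE value = VALUES(value);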

Please note that when executing the above statement, the result differs slightly with respect to the number of affected rows, depending on the actual data present in the database: When the record does not exist yet, it is inserted, which results in one affected row. But if the record is updated rather than inserted, it reports two affected rows instead, even if only one row gets updated.
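You can observe this by executing two statements in sequence against an empty config table; the following transcript is illustrative (timings will vary). The first statement inserts the record, the second finds the existing key and updates it:

mysql> INSERT INTO sample.config VALUES ("maxPriceDiscount",
    -> "25%") ON DUPLICATE KEY UPDATE value='25%';
Query OK, 1 row affected (0.05 sec)

mysql> INSERT INTO sample.config VALUES ("maxPriceDiscount",
    -> "20%") ON DUPLICATE KEY UPDATE value='20%';
Query OK, 2 rows affected (0.04 sec)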

There's more...

The INSERT INTO … ON DUPLICATE KEY UPDATE construct does not work when there is no UNIQUE or PRIMARY KEY defined on the target table. If you have to provide the same semantics without appropriate key definitions in place, it is recommended to use the techniques discussed in the next recipe.

Inserting data based on existing database content

In the previous recipe Inserting new data and updating data if it already exists, we discussed a method to either insert or update records depending on whether the records already exist in the database. A similar problem arises when you need to insert data into your database, but the data to insert depends on the data already present in your database.

As an example, consider a situation in which you need to insert a record with a certain message into a table logMsgs, but the message itself should be different depending on the current system language that is stored in a configuration table (config).

It is fairly easy to achieve a similar behavior for an UPDATE statement because it supports a WHERE clause that can be used to perform an update only if a certain precondition is met:

UPDATE logMsgs SET message = CONCAT('Last update: ', NOW())
  WHERE EXISTS (SELECT value FROM config
    WHERE name = 'lang' AND value = 'en');
UPDATE logMsgs SET message = CONCAT('Letztes Update: ', NOW())
  WHERE EXISTS (SELECT value FROM config
    WHERE name = 'lang' AND value = 'de');
UPDATE logMsgs SET message = CONCAT('Dernière actualisation: ', NOW())
  WHERE EXISTS (SELECT value FROM config
    WHERE name = 'lang' AND value = 'fr');

Unfortunately, this approach is not applicable to INSERT commands, as these do not support a WHERE clause. Despite this missing option, the following recipe describes a method to make INSERT statements execute only if an appropriate precondition in the database is met.

Getting ready

As before, we assume a database, a SQL client (mysql), and a MySQL user with sufficient privileges (INSERT and SELECT in this case). Additionally, we need a table to insert data into (here: logMsgs) and a configuration table config (please refer to the previous recipe for details).

How to do it...

  1. Connect to your database using your SQL client.
  2. Execute the following SQL commands:

    mysql> INSERT INTO sample.logMsgs(message)
    -> SELECT CONCAT('Last update: ', NOW())
    -> FROM sample.config WHERE name='lang' AND value='en';
    Query OK, 0 rows affected (0.00 sec)
    Records: 0 Duplicates: 0 Warnings: 0

How it works...

Our goal is to have an INSERT statement take into account the present language stored in the database. The trick to do so is to use a SELECT statement as input for the INSERT. The SELECT command provides a WHERE clause, so you can use a condition that only matches for the respective language. One restriction of this solution is that you can only insert one record at a time, so the size of scripts might grow considerably if you have to insert lots of data and/or have to cover many alternatives.
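The statements for the other languages of the earlier example follow the same pattern; only the condition and the message text change:

INSERT INTO sample.logMsgs(message)
  SELECT CONCAT('Letztes Update: ', NOW())
  FROM sample.config WHERE name='lang' AND value='de';
INSERT INTO sample.logMsgs(message)
  SELECT CONCAT('Dernière actualisation: ', NOW())
  FROM sample.config WHERE name='lang' AND value='fr';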

There's more...

If you have more than just a few values to insert, it is more convenient to have the data in one place rather than distributed over several individual INSERT statements. In this case, it might make sense to consolidate the data by putting it inside a temporary table; the final INSERT statement uses this temporary table to select the appropriate data rows for insertion into the target table. The downside of this approach is that the user needs the CREATE TEMPORARY TABLES privilege, but it typically compensates with much cleaner scripts.
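A minimal sketch of such a script, assuming an illustrative temporary table tmp_msgs (name, columns, and sizes are ours, not fixed by the recipe) that holds one message per language:

CREATE TEMPORARY TABLE sample.tmp_msgs (
  lang VARCHAR(8),
  message VARCHAR(128));
INSERT INTO sample.tmp_msgs VALUES
  ('en', CONCAT('Last update: ', NOW())),
  ('de', CONCAT('Letztes Update: ', NOW())),
  ('fr', CONCAT('Dernière actualisation: ', NOW()));
INSERT INTO sample.logMsgs(message)
  SELECT t.message
  FROM sample.tmp_msgs t, sample.config c
  WHERE c.name = 'lang' AND c.value = t.lang;
DROP TEMPORARY TABLE sample.tmp_msgs;
SELECT * FROM sample.logMsgs;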

After creating the temporary table with the first statement, we fill it with one message per language using the following INSERT statement. The next statement inserts the appropriate row into the target table sample.logMsgs by selecting the message from the temporary table that matches the language entry from the config table. The temporary table is then removed again. The final SELECT statement is solely for checking the result of the operation.

Deleting all data from large tables

Almost everyone who works with databases experiences constant growth of the data stored in their database, typically well beyond the initial estimates. Because of that, you often end up with rather large data sets. Another common observation is that in most databases, some tables have a special tendency to grow especially big.

If a table's size crosses a certain threshold (which is hard to define, as it depends heavily on the access patterns and the data structures), it gets harder and harder to maintain, and performance degradation might occur. From a certain point on, it is even difficult to get rid of data in the table again, as the sheer number of records makes deletion a pretty expensive task. This particularly holds true for storage engines with Multi-Version Concurrency Control (MVCC): if you order the database to delete data from the table, it must not be deleted right away because you might still roll back the deletion. So even while the deletion is in progress, a concurrent query on the table still has to be able to see all the records (depending on the transaction isolation level). To achieve this, the storage engine will only mark the records as deleted; the actual deletion takes place after the operation is committed and all other transactions that access this table are closed as well.

If you have to deal with large data sets, the most difficult task is to operate on the production system while other processes concurrently work on the data. In these circumstances, you have to keep the duration of your maintenance operations as low as possible in order to minimize the impact on the running system. As the deletion of data from a large table (typically starting at several millions of rows) might take quite some time, the following recipe shows a way of minimizing the duration of this operation in order to reduce side effects (like locking effects or performance degradation).

Getting ready

Besides a user account with appropriate privileges (DELETE), you need a sufficiently large table to delete data from.

For this recipe, we will use the employees database, which is an example database available from MySQL: http://dev.mysql.com/doc/employee/en/employee.html
This database provides several tables with realistic data, some of them pretty large; the largest holds more than 2.8 million records.

We assume that the Employees database was installed with an InnoDB storage engine enabled. To delete all rows of the largest table employees.salaries in a quick way, please read on.

How to do it...


  1. Connect to your database.
  2. Enter the following SQL command:

    mysql> TRUNCATE TABLE employees.salaries;
    Query OK, 0 rows affected (0.16 sec)

How it works...

The TRUNCATE TABLE command is a rather fast way of deleting all data from a table. For tables that are not referenced by Foreign key constraints (more on that later), the command basically drops the table temporarily and recreates it with the same structure as before. This operation has an essentially constant time characteristic: the amount of data stored inside the table does not have any effect on the time needed for the TRUNCATE command.

Before MySQL 5.0.3, the TRUNCATE TABLE statement for InnoDB tables was always equivalent to a DELETE statement, regardless of whether Foreign key constraints exist or not. To take advantage of the speed improvements, you have to use MySQL 5.0.3 or later.

In comparison to a classical DELETE FROM employees.salaries; operation, the reduction in time needed is striking.
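The following transcript is purely illustrative (actual timings depend heavily on hardware and configuration; the row count is that of the sample table), but it conveys the order of magnitude:

mysql> DELETE FROM employees.salaries;
Query OK, 2844047 rows affected (1 min 43.61 sec)

mysql> TRUNCATE TABLE employees.salaries;
Query OK, 0 rows affected (0.16 sec)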

The TRUNCATE TABLE command takes only a fraction of the time needed for the DELETE. However, there are some caveats.

First of all, the TRUNCATE command will only have the speed advantage on InnoDB tables if the table is not referenced by any Foreign key constraints. But if the table is referenced by Foreign keys, the TRUNCATE TABLE command is equivalent to executing a DELETE statement with no WHERE clause, also eliminating all speed differences:

mysql> CREATE TABLE employees.salaries_referencer (
-> emp_no INT,
-> from_date DATE,
-> CONSTRAINT salaries_fk
-> FOREIGN KEY (emp_no, from_date)
-> REFERENCES salaries (emp_no, from_date)
-> ON DELETE RESTRICT);
Query OK, 0 rows affected (0.08 sec)

mysql> TRUNCATE TABLE employees.salaries;
Query OK, 0 rows affected (1 min 33.44 sec)

Furthermore, the TRUNCATE statement requires the DROP privilege (before MySQL 5.1.16, it only required the DELETE privilege), which rules out this command for some users.

And finally, TRUNCATE is not a transaction-safe command. If you execute a TRUNCATE statement, you will not be able to roll back this operation any more, and any open operation from the current transaction gets automatically committed as well. This characteristic disqualifies the statement for situations in which the possibility of performing a rollback is mandatory; you will have to stick with the (much slower) DELETE in these cases.

There's more...

As we have seen, TRUNCATE TABLE only has performance advantages if there is no Foreign key reference to the table that is to be deleted. Here we will discuss how to use the speed improvements even in case of existing references.

Temporarily disabling Foreign key constraints

To make use of the increased speed of TRUNCATE TABLE although the target table is referenced via Foreign keys, you could temporarily remove the Foreign key constraints, use the TRUNCATE TABLE command, and reestablish the references afterwards. Using the above example of a table salaries_referencer that references salaries, you could use the following sequence:

mysql> ALTER TABLE employees.salaries_referencer
-> DROP FOREIGN KEY salaries_fk;
Query OK, 0 rows affected (0.19 sec)
Records: 0 Duplicates: 0 Warnings: 0
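The sequence then continues with the actual truncation and the re-creation of the constraint defined earlier; the result lines are omitted here:

mysql> TRUNCATE TABLE employees.salaries;

mysql> ALTER TABLE employees.salaries_referencer
    -> ADD CONSTRAINT salaries_fk
    -> FOREIGN KEY (emp_no, from_date)
    -> REFERENCES salaries (emp_no, from_date)
    -> ON DELETE RESTRICT;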

With this sequence, you temporarily disable the Foreign key constraints to have TRUNCATE TABLE use the faster deletion method. Beware, however, that this method might also lead to problems when the truncation produces "loose ends": if the referencing table salaries_referencer holds records that referenced the now empty target table salaries, the re-creation of the Foreign key constraint will fail.

Also, keep in mind that this situation might also occur because of concurrent processes, which are able (for the duration of the disabled constraints) to insert data that violates the intended referential integrity.

As an alternative, you might be tempted to temporarily disable the Foreign key checks by setting foreign_key_checks to zero. While this works regarding the TRUNCATE TABLE performance, it is strongly discouraged to use this option because the Foreign key integrity is not revalidated when the Foreign key checks are enabled again. So you risk inconsistent data with respect to the referential integrity.
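For completeness, this discouraged variant would look like the following; note that no integrity check takes place when the checks are switched back on:

SET foreign_key_checks = 0;
TRUNCATE TABLE employees.salaries;
-- Danger: referential integrity is NOT re-validated at this point!
SET foreign_key_checks = 1;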

Deleting all but a fragment of a large table's data

In the previous recipe Deleting all data from large tables, we discussed a method of quickly removing all data from large tables while avoiding performance hits. But experience shows that you often must not delete all data, but have to retain some records and delete the rest. The TRUNCATE TABLE command does not allow any additional clauses to define which records to delete and which not; it always deletes all entries.

The intuitive solution to this would be to use a normal DELETE command with a WHERE clause that only matches the records to delete. For large tables, this might prove quite an expensive operation (in terms of duration). In this recipe, we will show you how to quickly remove most of the data from large tables while preserving some of the records.

Getting ready

We again need a MySQL server up and running and a SQL client (like mysql). For this recipe, we also need a user account with SELECT, INSERT, DELETE, DROP, and CREATE privileges for the target database (we will use the sample_install user throughout this section). We will furthermore use the Employees sample database in an InnoDB context. This database was introduced in the previous recipe and is available for free on the MySQL website. We will use the largest table salaries (with more than 2.8 million records) as the table to delete from. In our example, we will delete all records having a from_date before the threshold of '2002-01-01 00:00:00.0'.

How to do it...

  1. Connect to the database using a SQL client and the sample_install account.
  2. Execute the following commands:

    mysql> use employees;
    Database changed
    mysql> CREATE TABLE salaries_part
    -> SELECT * FROM salaries
    -> WHERE from_date >= "2002-01-01 00:00:00.0";
    Query OK, 140930 rows affected (11.47 sec)
    Records: 140930 Duplicates: 0 Warnings: 0

    mysql> TRUNCATE TABLE salaries;
    Query OK, 0 rows affected (0.05 sec)

    mysql> INSERT INTO salaries SELECT * from salaries_part;
    Query OK, 140930 rows affected (4.63 sec)
    Records: 140930 Duplicates: 0 Warnings: 0

    mysql> DROP TABLE salaries_part;
    Query OK, 0 rows affected (0.06 sec)

How it works...

For speeding up the deletion of most of the data from a large table, we utilize the speed advantage of the TRUNCATE TABLE statement over a DELETE command. In detail, the steps are as follows:

The initial USE statement is for convenience only, so we do not have to give the employees prefix for every table.

With the next statement (CREATE TABLE … SELECT * FROM …), we simply copy the data that should not be removed to a newly created table salaries_part.

Be careful to avoid errors when inverting conditions: to delete all entries before time X, you have to copy all records with values later than or equal to X! If you copied only records later than X, all records exactly at time X would get deleted as well.

This table temporarily holds the data while we delete all data from the large salaries table using TRUNCATE in the next step. Afterwards, we simply copy the partial data from the salaries_part table back into the original (now emptied) salaries table. With the final step, we scrap the salaries_part table again, as it is not needed any more.

You could also create salaries_part as a TEMPORARY table. In this case, you could also skip the final DROP statement. This method is discouraged because it might lead to data loss in case of an error. Think of an error that occurs right after all data was deleted from the original salaries table, but before the data from the temporary table is restored. If the connection is closed, the data from the temporary table is lost. A non-temporary table does not entail this risk.

There's more...

We will not try to conceal that this approach has some caveats as well. First of all, the user who performs this operation needs some additional privileges like CREATE and DROP, which renders it unusable for many users with only basic permissions.

You should also keep in mind that the use of either CREATE TABLE or TRUNCATE causes an automatic commit of any transaction currently active, which basically means that this approach does not provide any transaction safety.

If concurrent database access is possible during the process of deletion, an additional problem comes up. In the period of time between the TRUNCATE and completion of the INSERT INTO … SELECT FROM … statements, the salaries table is empty for any other transaction. You have to make sure that this will not cause any problem. You should use the DELETE approach otherwise, as this will not produce intermediate states in which the database table is completely empty.

And finally, the performance benefit of this approach for InnoDB greatly depends on the speed of the TRUNCATE TABLE statement. However, if there are tables that reference the target table with a Foreign key, the TRUNCATE will be equivalent to a DELETE statement, thus destroying all performance improvements. A solution to this problem is to temporarily remove the Foreign key references. Please refer to the Temporarily disabling Foreign key constraints section of the previous recipe for a description of how to achieve this.

Performance considerations

A comparison between the method presented in this recipe and the use of an ordinary DELETE statement shows that the advantages depend on the amount of data that is not deleted. The more data is copied to the provisional table, the longer the operation takes. The DELETE statement, however, behaves conversely: it gets faster if more data is deleted. From a certain threshold on, the normal deletion will even be faster than the Copy-and-Truncate approach. As a rule of thumb for InnoDB tables, if you delete two thirds of the data or more, you can use the Copy-and-Truncate method; otherwise, a simple DELETE might prove faster. This differs slightly for other storage engines: for MyISAM, the Copy-and-Truncate method typically works faster if more than half of the data is deleted. So when considering a partial deletion of data from large tables, you should take a second to think about which approach fits better for your particular circumstances.

The following two diagrams compare the times needed to partially delete data using either a simple DELETE statement or the Copy-and-Truncate solution, for different numbers of rows that are left after the operation; the table originally contains about 2.8 million rows. The first diagram shows the comparison for the InnoDB storage engine; the second shows that for MyISAM, the Copy-and-Truncate mechanism is faster even for larger numbers of remaining rows.


Deleting all data incrementally from large tables

In the previous recipe Deleting all but a fragment of a large table's data, we discussed a method to quickly remove all but a small remainder of records from a large table. The downside of the approach presented there is the fact that during the process of deletion, the table temporarily appears completely empty to an observer. Unfortunately, this is often not acceptable, especially if your database is used in an environment with many parallel processes concurrently accessing the database, particularly the large table discussed here.

On the other hand, the alternative of simply using a DELETE statement is sometimes not acceptable either. A DELETE statement temporarily creates locks on the entries that are deleted. As a result, for the duration of the deletion, a major part of the table gets locked, thus preventing concurrent access to the table by other processes. This typically leads to timeout situations and errors such as "Lock wait timeout exceeded; try restarting transaction".

The following recipe shows an approach to deleting data from large tables without blocking access to the table's data for too long, so the deletion can happily be performed despite concurrent tasks simultaneously accessing the very same table.

Getting ready

As the recipe uses a stored procedure, we again need a user account with the CREATE ROUTINE privilege as well as the DELETE permission for the target database. Throughout the following steps, we will again assume sample_install as the user. The example table for deletion is salaries from the employees sample database (see previous recipes) once more. In our example, we will delete all entries from the table with an employee number emp_no below 485000.

How to do it...

  1. Connect to the database using the sample_install account.
  2. Enter the following SQL statements:

    mysql> delimiter //
    mysql> CREATE PROCEDURE employees.delete_incrementally()
    -> MODIFIES SQL DATA
    -> BEGIN
    -> REPEAT
    -> DELETE FROM employees.salaries
    -> WHERE emp_no < 485000
    -> LIMIT 20000;
    -> UNTIL ROW_COUNT() = 0 END REPEAT;
    -> END //
    Query OK, 0 rows affected (0.00 sec)

    mysql> delimiter ;

    mysql> CALL employees.delete_incrementally();
    Query OK, 0 rows affected, 137 warnings (3 min 58.09 sec)

How it works...

The above steps simply create a stored procedure named delete_incrementally(), which can be used to delete certain records from the table salaries.

The DELIMITER statements at the start and end of the script are necessary when defining a stored procedure because, without them, the semicolons inside the procedure body would prematurely end the CREATE PROCEDURE statement. The procedure definition itself is pretty straightforward and basically consists of a REPEAT … UNTIL loop that deletes data from the salaries table according to the given condition (WHERE emp_no < 485000). The special part of this DELETE statement is the LIMIT clause, which defines that no more than 20,000 rows should be deleted per statement. The loop executes this statement as long as there are any records left to delete. As soon as the number of records affected by the DELETE statement (which can be retrieved using the ROW_COUNT() function) is zero, the loop ends.

The trick used by this approach is to split one single, long-running delete operation into many short-running ones.

In sum, the incremental deletion in steps of 20,000 rows is actually considerably slower than a single DELETE statement, but it is much more cooperative when it comes to concurrent write access to the same data.

The benefit lies in the fact that every single partial DELETE statement does not run very long, which drastically reduces the period of time in which locks are held on parts of the table. This basically eliminates the locking problems between the deletion and other processes: even while the deletion is still running, parallel modifications to the database proceed without lock wait timeout errors or similar problems.
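Once the cleanup has finished and the procedure is no longer needed, it can simply be dropped again:

mysql> DROP PROCEDURE employees.delete_incrementally;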

There's more...

While typically the incremental deletion is slower than one single delete operation, this can change under heavy load for InnoDB tables: parallel transactions work on a snapshot of the current data at the point in time when the transaction starts. This feature is provided by InnoDB's Multi-Version Concurrency Control (MVCC).

With many transactions and large amounts of data, the difference between the snapshots and the deleted data has to be maintained by MySQL. This housekeeping data is kept until the last transaction that was opened before the deletion completed is closed. The administration of this delta might have a negative impact on the overall MySQL performance. With incremental deletion, this delta typically does not grow as big as with a long-running delete statement, which often reduces the performance hit.

Under heavy load, the incremental deletion approach might actually cause a gain in overall performance.

Summary

In this article, we have covered:

  • Inserting new data and updating data if it already exists
  • Inserting data based on existing database content
  • Deleting all data from large tables
  • Deleting all but a fragment of a large table's data
  • Deleting all data incrementally from large tables

About the Authors


Daniel Schneller

Daniel Schneller works as a software developer, database administrator, and general IT professional for an independent software vendor in the retail sector. After successfully graduating from the University of Cooperative Education in Heidenheim, Germany with a degree in Business Computer Science, he started his career as a professional software developer, focused on the Microsoft technology stack. In 2002 he started focusing on enterprise-level Java development and has since gained extensive knowledge and experience implementing large scale systems based on Java EE and relational databases, especially MySQL since version 4.0.

Currently he is mostly involved with the ongoing development of framework-level functionality, including customization and extension of an ORM-based persistence layer. He is involved in different open source projects such as FindBugs, Eclipse, and Checkstyle and infrequently blogs about Java, MySQL, Windows, Linux and other insanities at http://www.danielschneller.com.

Udo Schwedt

Udo Schwedt is head of the Java architecture team and deputy head of the Java development department at the IT service provider for Germany's market leader in the Do-It-Yourself sector. After finishing his studies at RWTH Aachen, Germany with a degree in Computer Science, he started his career as a professional Java developer in a leading software company in the banking sector. In 2003, he joined his current company as a framework developer, subsequently taking the role of a lead architect.

Both authors are involved in a large scale international software project, which encompasses development of a Java-based merchandise management software solution. This decentralized multi-platform environment is based on more than 500 decentralized MySQL server instances with more than 5,500 replication slaves. Daniel and Udo are responsible for configuration and management of this infrastructure.
