In the simplest terms, InnoDB is a transactional database information processing engine. It is the most commonly used engine currently available for the MySQL database application and was the first transactional engine offered with the application.
Different versions of MySQL ship with varying levels of InnoDB support and corresponding default settings for the InnoDB engine. Knowing which version of MySQL we're using will help us identify the baseline settings for the engine prior to going into development or production use of the database.
We'll cover the following topics:
Everything needed to get up and running with the database as well as associated operating system requirements and supported hardware platforms
How to ensure that InnoDB is available to MySQL and which default settings are enabled
Basics of plugin configuration
Basics of interacting with a new schema that will be used with InnoDB
InnoDB is more than a fast disk-based relational database engine. It offers, at its core, the following features that separate it from other disk-based engines:
MVCC
ACID compliance
Transaction support
Row-level locking
These features are responsible for providing what is known as Referential integrity; a core requirement for enterprise database applications.
Referential integrity can be best thought of as the ability for the database application to store relational data in multiple tables with consistency. If a database lacks consistency between relational data, the data cannot be relied upon for applications. If, for example, an application stores financial transactions where monetary data is processed, referential integrity and consistency of transactional data is a key component. Financial data is not the only case where this is an important feature, as many applications store and process sensitive data that must be consistent
A vital component is Multiversion concurrency control (MVCC), which is a control process used by databases to ensure that multiple concurrent connections can see and access consistent states of data over time. A common scenario relying on MVCC can be thought of as follows: data exists in a table and an application connection accesses that data, then a second connection accesses the same original data set while the first connection is making changes to it; since the first connection has not finalized its changes and committed its information we don't want the second connection to see the nonfinalized data. Thus two versions of the data exist at the same time—multiple versions—to allow the database to control the concurrent state of the data. MVCC also provides for the existence of point-in-time consistent views, where multiple versions of data are kept and are available for access based on their point-in-time existence.
Transaction support at the database level refers to the ability for units of work to be processed in separate units of execution from others. This isolation of data execution allows each database connection to manipulate, read, and write information at the same time without conflicting with each other. Transactions allow connections to operate on data on an all-or-nothing operation, so that if the transaction completes successfully it will be written to disk and recorded for upcoming transactions to then operate on. However, if the sequence of changes to the data in the transaction process do not complete then they can be rolled back, and no changes will be recorded to disk. This allows sequences of execution that contain multiple steps to fully succeed only if all of the changes complete, and to roll back any changed data to its original state if one or more of the sequence of changes in the transaction fail. This feature guarantees that the data remains consistent and referentially safe.
An integral part of InnoDB is its ability to ensure that data is atomic, consistent, isolated, and durable; these features make up components of ACID compliance. Simply put, atomicity requires that if a transaction fails then the changes are rolled back and not committed. Consistency requires that each successfully executed transaction will move the database ahead in time from one state to the next in a consistent manner without errors or data integrity issues. Isolation defines that each transaction will see separate sets of data in time and not conflict with other transactional data access. Finally, the durability clause ensures that any data that has been committed in a successful transaction will be written to disk in its final state, without the risk of data loss from errors or system failure, and will then be available to transactions that come in the future.
Finally, InnoDB differs from other on-disk storage engines in that it offers row-level locking. This primarily differs, in the MySQL world, with the MyISAM storage engine which features table-level locking. Locking refers to an internal operation of the database that prohibits reading or writing of table data by connections if another is currently using that data. This prevents concurrent connections from causing data corruption or forcing data invalidation when data is in use. The primary difference between table- and row-level locking is that when a connection requests data from a table it can either lock the row of data being accessed or the whole table of data being accessed. For performance and concurrency benefits, row-level locking excels.
InnoDB can be used on all platforms on which MySQL can be installed. These include:
There are also custom ports of MySQL from the open source community for running MySQL on various embedded platforms and non-standard operating systems.
Hardware-wise, MySQL and correspondingly InnoDB, will run on a wide variety of hardware, which at the time of this writing includes:
Intel x86 32 bit
AMD/Intel x 86_64
Intel Itanium IA-64
IBM Power architecture
Apple's PPC
PA-RISC 1.0 + 2.0
SPARC 32 + 64 bit
Keep in mind when installing and configuring InnoDB, depending on the architecture in which it is installed, it will have certain options available and enabled that are not available on all platforms. In addition to the underlying hardware, the operating system will also determine whether certain configuration options are available and the range to which some variables can be set. One of the more decisively important differences to be considered while choosing an operating system for your database server is the manner in which the operating system and underlying filesystem handles write caching and write flushes to the disk storage subsystem. These operating system abilities can cause a dramatic difference in the performance of InnoDB, often to the order of 10 times the concurrency ability. These options will be explained in detail in Chapter 3, Advanced Configuration Parameters.
MySQL should, unless you have a specific need for an older version, be downloaded directly from the MySQL website. Some operating systems maintain the latest version of MySQL in their package repositories while some do not, thus offering outdated and potentially insecure or unstable versions of the database application. Since end users do not have control over the operating system package repositories, and thus cannot upgrade the version of MySQL in the repository, it is professionally recommended that any server destined for production use be installed via the General Availability version of MySQL Community or Enterprise from the MySQL corporation website. The latest downloads can be found at http://www.mysql.com/downloads/mysql/.
As of MySQL 3.23, InnoDB has been included with the MySQL GA releases, and has been the primary choice for transactional engine support since. One of the more visible major changes came about in early versions of MySQL 5.1, when the InnoDB engine was shipped slightly different than before, where it was available as a pluggable storage engine. This new architecture, which allowed users to enable or disable storage engines on the fly, as opposed to compiling in support for different engines, made MySQL very flexible to different use cases. As such, using the plugin version of InnoDB allows the user to enable additional settings that were not available in the default included version of the engine.
As of MySQL 5.5, the default version of InnoDB is also the latest version of the plugin which negates the necessity of installing and enabling the formal plugin directly from InnoBase. From the MySQL website, the specific version differences are defined as follows:
As of MySQL 5.1.38, the InnoDB Plugin is included in MySQL 5.1 releases, in addition to the built-in version of InnoDB that has been included in the previous releases. MySQL 5.1.42 through 5.1.45 include InnoDB Plugin 1.0.6, which is considered of Release Candidate (RC) quality. MySQL 5.1.46 and up include InnoDB Plugin 1.0.7 or higher, which is considered of General Availability (GA) quality.
Since MySQL ships with the engine by default, the next step is to install MySQL. This can be achieved in different ways depending on your chosen architecture and platform. The scope of specific instructions for installation is not covered here due to the wide variety of platforms and architectures that MySQL supports.
Knowing which version of InnoDB is available on your server is of prime importance when planning benchmarks, testing, production configuration, and associated performance tuning. As per the documentation, InnoDB has gained many variables to tune the performance and stability over the years. The first step to using InnoDB is to ensure that it is enabled and available on your MySQL server. You can check if InnoDB is enabled by running the following SQL queries from the MySQL command line or via another means of manual query execution such as MySQL Workbench or phpMyAdmin:
mysql> show engines\G Engine: InnoDB Support: DEFAULT Comment: Supports transactions, row-level locking, and foreign keys Transactions: YES XA: YES Savepoints: YES mysql> SHOW VARIABLES LIKE 'have_innodb'; +---------------+-------+ | Variable_name | Value | +---------------+-------+ | have_innodb | YES | +---------------+-------+ 1 row in set (0.00 sec)
Before creating any tables with InnoDB as the engine, it is important to check the setting for transaction isolation level. You can check the current setting of your running database by executing the following query:
[mysql://root@localhost/(none)]> SHOW GLOBAL VARIABLES LIKE 'tx_isolation'; +---------------+-----------------+ | Variable_name | Value | +---------------+-----------------+ | tx_isolation | REPEATABLE-READ | +---------------+-----------------+ 1 row in set (0.00 sec)
There are four options for this global configuration variable, and the setting determines the method in which InnoDB handles transactional data execution and locking. If the variable is set to one of the non-ACID settings when the application is expecting ACID ability from the database then this can result in dead locks, phantom data, inconsistent referential data, and even data loss.
ACID is defined by the following properties:
All transactions fail, or none of the transactions fail. Basically if a transaction fails because of a hardware issue, connection issue, and so on – those partial changes won't commit. It's 100 percent or 0 percent operation.
Data being read by a select
statement is all at the same state. So when you use a transaction you're getting the most current and consistent data available. This is related to multiversion concurrency control (MVCC).
Nothing that's being read is actively being changed by another transaction. Your connection or transaction's read is not going to be changed by another transaction while you're dealing with that data.
Changes to the database persist – basically that means that if a transaction is committed and the DB fails or server crashes your changes will be there – which is why InnoDB uses transaction log files (where data is kept before being written to disk. The engine will read the logs on next startup and commit any remaining transactions in the logs that did not make to disk based tables).
The following options are available for this setting:
READ UNCOMMITTED: Every
select
statement operates without locks so you don't get consistency and might have dirt reads, which are potentially earlier versions of data as defined by MVCC. As a result, this setting does not conform to ACID compliance standards and should never be used if your application issues transactions that require point-in-time consistent data reads.READ COMMITTED: This setting offers consistent reads without table or row locks. Each consistent read, even within the same transaction, sets and reads its own fresh snapshot of point-in-time data. Regardless, this setting does not conform to ACID compliance standards but it offers a compromise of consistency and performance for applications that do not require full ACID settings.
REPEATABLE READ: The InnoDB default isolation level for ACID compliance. All reads within the same transaction will be consistent between each other – this defines the C in ACID. Additionally, all writes will be Durable, all transactions Atomic, all reads Isolated. Unless you have specific reasons to do otherwise, this should be your default setting for the transaction-isolation configuration variable.
SERIALIZABLE: This is the same as REPEATABLE READ but MySQL converts regular
select
statements with the preface of LOCK IN SHARED MODE when auto-commit is enabled. If auto-commit is disabled then eachselect
statement is started in a separate transaction, which will ensure that all reads are consistent. This setting also allows for XA distributed transactions support, which you can read more about in the MySQL manual. The SERIALIZABLE value setting will impact database transaction execution performance, so only enable this if it is absolutely necessary.
Once InnoDB support has been verified, you can start using the engine in your table definitions. A simple test to interact with InnoDB can be run as follows, where we will create a table with the engine specified explicitly along with a primary key, which InnoDB uses to index data.
If you leave off the engine definition at the end of the table create statement then MySQL will create the table with the system default engine, which is defined by the startup variables: default-storage-engine
or, alternately, storage_engine
, both accomplish the same purpose.
If you plan to use InnoDB exclusively for your table engine definitions, it generally makes sense to ensure that tables are created, explicitly and implicitly, by changing one or both of those startup variables to InnoDB. As of MySQL Version 5.5, the default has been changed to InnoDB so depending on your version you may not need to explicitly define the variable.
mysql> create database test; use test; Query OK, 1 row affected (0.00 sec) Database changed mysql> CREATE TABLE `test` ( -> `id` int(8) NOT NULL auto_increment, -> `data` varchar(255), -> `date` datetime, -> PRIMARY KEY (`id`), -> INDEX `date_ix` (`date`) -> ) ENGINE=InnoDB; Query OK, 0 rows affected (0.06 sec)
In the preceding table create statement, we created one table with three columns. These columns are described as follows:
A primary key based
id
field that will automatically increase in value for every inserted rowA variable character based
data
field to store our valuesDatetime
baseddate
field to record the insert date
Besides edge cases, these columns can be considered the bare minimum columns for relational data. The automatically incremented id
field will allow InnoDB to efficiently index our data or rapid lookups, and the date
field will allow us to easily sort and search our data based on the time it was created in the table.
These two columns are specifically important for InnoDB based tables, as other table engines can operate relatively quick and efficient full table scans without a primary key column if running a typical query such as SELECT COUNT(*) FROM test;
. However, InnoDB requires a PRIMARY KEY column or other INDEX to read in order for the query results to remain expedient; thus our id
column provides that performance gain. Instead of writing the inefficient latter query, we can now write SELECT COUNT(id) FROM test;
and InnoDB will access the id
column PRIMARY KEY index which results in a very fast result versus a full table scan. Similarly, we can sort our table data via the date
column and get quickly-returned table data.
If you'd like to experiment with InnoDB features more easily or see how different versions of MySQL work with InnoDB, you can install a sandboxing tool such as MySQL Sandbox.
At this point you should know the hardware and operating system requirements for installing MySQL, as well as the basic process for downloading and enabling support for the InnoDB engine. We've covered the requirements for ACID compliance and shown you how to set up basic indexing, as well as the process for creating a sample InnoDB table. Next we'll get into the basics of tuning InnoDB for better performance.