Mastering MariaDB

By Federico Razzoli

About this book

Publication date: September 2014
Publisher: Packt
Pages: 384
ISBN: 9781783981540

Chapter 1. Understanding the Essentials of MariaDB

This chapter provides a general overview of the MariaDB architecture. Note that this description is not meant to teach MariaDB to new users; some knowledge of the software is necessary to fully understand this book.

The following topics will be discussed in this chapter:

The MariaDB architecture
The workflow of SQL statement processing
Usage and tricks of the command-line client, which is used in all chapters of this book
Storage engines and their characteristics
Logs
Caches
User authentication and permissions
The INFORMATION_SCHEMA and PERFORMANCE_SCHEMA system databases
Compatibility with MySQL and other DBMS
Resources on the Web

The MariaDB architecture

MariaDB is a community-driven fork of MySQL that was started in 2009 by Monty Widenius, the original author of MySQL, after the original project was acquired by Oracle. The first version of MariaDB was based on MySQL 5.1, and improvements to the MySQL code base are regularly merged into MariaDB. Other features are also merged from Percona Server, another MySQL fork that stays very close to the mainstream product.

The most important Percona feature merged into MariaDB is XtraDB, a fork of the InnoDB storage engine. InnoDB is the default storage engine in modern MySQL and MariaDB versions. XtraDB often fixes InnoDB bugs before Oracle releases the official fixes. It also includes performance improvements and other minor features.

The protocol, API, and most SQL statements that work with MySQL also work with MariaDB. Plugins written for MySQL work with MariaDB too. Thanks to these characteristics, most MySQL applications work with MariaDB without any modifications. At the same time, switching to MariaDB makes it possible to use interesting features that are not available in MySQL. If an application's developers do not use these features, the application remains compatible with both MariaDB and MySQL.

While the reader is probably familiar with DBMSs in general, and with MariaDB or MySQL in particular, a quick architecture review might be useful. This introductory chapter lists the main components of the server and the operations it performs. The details are left to the remaining chapters.

The following diagram represents the architecture of MariaDB:

Basically, from a user's point of view, MariaDB receives SQL queries or statements, processes them, and returns a result set. Let's look at this process and the components involved in more detail:

When a client connects to MariaDB, it is authenticated based on its hostname, username, and password. Authentication can optionally be delegated to a plugin.
If the login succeeds, the client can send a SQL query to the server.
The parser analyzes the SQL string.
The server checks whether the client has the permissions required for the requested action.
If the query's results are stored in the query cache, they are immediately returned to the client.
The optimizer tries to find the fastest execution strategy, or query plan. This means that the optimizer decides the order in which the tables will be read, which indexes will be accessed, and whether an internal temporary table will be used. A good strategy can greatly reduce disk access and reduce the complexity of the operations by orders of magnitude. This topic is discussed in Chapter 3, Optimizing Queries.
The storage engines read and write the data and index files, and manage any caches they use to speed up operations. Some important features, such as transactions and foreign keys, are implemented at the storage engine level.
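The workflow above can be condensed into a toy Python model that makes the order of the steps concrete. Everything in it is hypothetical: the ToyServer class, the dictionaries that stand in for accounts and grants, and the single statement form it can parse. The optimizer and the storage engine are collapsed into one plain read, and the model does not invalidate cached results on writes. The query cache is keyed on the exact query text, so a query that differs only in letter case is a cache miss.

```python
class ToyServer:
    """Toy model of the statement workflow (hypothetical, not MariaDB code)."""

    def __init__(self, accounts, grants, tables):
        self.accounts = accounts    # (user, host) -> password
        self.grants = grants        # user -> set of tables the user may read
        self.tables = tables        # table name -> list of rows
        self.query_cache = {}       # exact query text -> cached result
        self.engine_reads = 0       # how often the "storage engine" was used

    def connect(self, user, host, password):
        # Authentication: based on hostname, username, and password.
        if self.accounts.get((user, host)) != password:
            raise PermissionError("Access denied")
        return user

    def query(self, user, sql):
        # Parsing: this toy only understands "SELECT * FROM <table>".
        words = sql.rstrip(";").split()
        if len(words) != 4 or [w.upper() for w in words[:3]] != ["SELECT", "*", "FROM"]:
            raise ValueError("syntax error: unsupported statement")
        table = words[3]
        # Privilege check.
        if table not in self.grants.get(user, set()):
            raise PermissionError(f"SELECT denied on {table}")
        # Query cache: only byte-identical query text is a hit.
        if sql in self.query_cache:
            return self.query_cache[sql]
        # Optimizer and storage engine, collapsed into a plain read.
        self.engine_reads += 1
        result = list(self.tables[table])
        self.query_cache[sql] = result
        return result
```

Running the same query twice reaches the "storage engine" only once. Changing the letter case of the query produces a second read.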

MariaDB and the storage engines maintain a set of logs to keep track of received statements, errors, changes to the data, and so on. Most logs are optional; however, some are needed for certain administrative tasks. For example, the binary log is needed for replication and for some backup strategies. Logs are described later in this book.

MariaDB has several options that affect the server's behavior. Many of them are dynamic, which means that they can be changed at runtime; others are static, which means that the value assigned at server startup cannot change. Most options exist at two levels: the session level, which means that any user can change the value for the current connection, and the global level, which provides the initial session value for each new connection. An option can be specified as a server command-line parameter, in a configuration file, or, if it is dynamic, via a SQL statement.

MariaDB reads a set of configuration files in a given order. The exact locations and read order depend on the operating system. Typically, only one MariaDB instance runs on a machine, so only one configuration file is needed. Usually it is /etc/my.cnf on Linux, and my.ini in the MariaDB installation directory on Windows (for example, C:\MariaDB 10.0\my.ini).
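The session and global levels can be pictured with the following toy model. It is a sketch under stated assumptions, not server code. sort_buffer_size (dynamic, with both scopes) and datadir (static) are real variable names, but the values and the Server and Session classes are illustrative. As in the server, each new connection copies the global values when it is opened, so a later change to a global value affects only connections opened afterwards. Variables that exist at only one level are not modeled.

```python
class Server:
    """Toy model of option scopes (illustrative only, not MariaDB code)."""

    def __init__(self, startup_options, static_options):
        self.globals = dict(startup_options)    # values set at startup
        self.static = set(static_options)       # cannot change at runtime

    def set_global(self, name, value):
        if name in self.static:
            raise ValueError(f"{name} is static and can only be set at startup")
        self.globals[name] = value

    def connect(self):
        return Session(self)


class Session:
    def __init__(self, server):
        self.server = server
        # A new connection starts from a copy of the current global values.
        self.values = dict(server.globals)

    def set_session(self, name, value):
        if name in self.server.static:
            raise ValueError(f"{name} is static and can only be set at startup")
        self.values[name] = value               # affects this connection only

    def get(self, name):
        return self.values[name]
```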

However, this modular configuration system is useful when several MariaDB (and perhaps MySQL) servers are installed on the same machine. Some settings are likely to be valid for all servers, but each server can specify more options or override the generic values. A file can also be placed in a user's home directory, so that it will only be read if MariaDB runs with that identity (the --user startup parameter). The configuration file paths are listed in Chapter 8, Backup and Disaster Recovery. Options passed on the command line override the settings in the configuration files. These techniques are useful when testing the behavior of different server versions or different settings.
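The override order can be sketched with Python's configparser, which accepts the same INI-like layout as my.cnf. This is a simplified model. Only the [mysqld] group is read and the file contents are hypothetical. Real MariaDB programs also treat dashes and underscores in option names as equivalent, and this sketch does not. Sources are applied in read order, so later files override earlier ones, and command-line options override them all.

```python
import configparser


def effective_options(config_texts, command_line_args=()):
    """Merge my.cnf-style texts in read order, then apply --name[=value] args."""
    parser = configparser.ConfigParser(
        allow_no_value=True,    # flags such as skip-name-resolve have no value
        strict=False,           # tolerate repeated options, last one wins
        interpolation=None,     # treat '%' literally
    )
    for text in config_texts:
        parser.read_string(text)            # later sources override earlier keys
    options = dict(parser["mysqld"]) if parser.has_section("mysqld") else {}
    for arg in command_line_args:           # the command line wins over files
        name, _, value = arg.removeprefix("--").partition("=")
        options[name] = value or None
    return options
```

For example, a shared file setting port = 3306 followed by an instance-specific file setting port = 3307 yields 3307, and --max_connections=200 on the command line beats both files.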

This book does not describe all the existing options. The reader should already be familiar with the most important options and server variables. Some of them are, however, explained where they are relevant to the topics discussed in the book. The MariaDB Knowledge Base documents all existing options.

The MariaDB server program is called mysqld. On Linux, it is possible to run the server directly, but it is usually started by a script. The mysqld_safe script starts the server and restarts it if it terminates abnormally, which is much safer in production environments. The mysql.server script is also available for System V-like systems, which use runlevels; many Linux distributions ship this script under a different name. When several installations are present on the same machine, they can be managed using mysqld_multi.
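The restart behavior of mysqld_safe can be modeled as a small supervision loop. This is an illustrative Python sketch, not a port of the real script, which is a shell script with its own way of telling a clean shutdown from a crash. The start_server callable and the max_restarts safety limit are hypothetical.

```python
def supervise(start_server, max_restarts=5):
    """Run the server again each time it stops abnormally (toy model).

    start_server is a hypothetical callable that runs the server until it
    stops and returns an exit status; 0 stands for a clean shutdown.
    Returns the number of restarts that were needed.
    """
    restarts = 0
    while True:
        status = start_server()
        if status == 0:
            return restarts          # clean shutdown: do not restart
        if restarts == max_restarts:
            raise RuntimeError(f"server keeps failing (last status {status})")
        restarts += 1                # abnormal termination: start it again
```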

The command-line client

The code and output examples in this book use the mysql command-line client. Knowing some client commands can greatly increase productivity when this tool is used.

The mysql command-line client considers a SQL statement terminated when it finds a semicolon (;), a \g, or a \G terminator. With ; or \g, the output is printed in tabular form, as follows:

MariaDB [(none)]> SHOW DATABASES;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| test               |
+--------------------+

Tip

Downloading the example code

You can download the example code files for all Packt Publishing books you have purchased from your account at http://www.packtpub.com. If you have purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

The mysql client has a prompt that normally appears at the beginning of a new line, as follows:

MariaDB [(none)]>

In the prompt, (none) means that no default database is selected. This means that every time a table is named in a statement, the name of the database that contains it must be explicitly specified. The USE statement selects a default database, whose name will appear in the prompt. The following example shows how to use it:

MariaDB [(none)]> USE test;
Database changed
MariaDB [test]>

When a statement spans multiple lines, the continuation lines begin with a different prompt, as shown in the following example:

MariaDB [test]> SELECT 1
    -> FROM DUAL;

If we forget to type a statement terminator, the modified prompt helps us notice the problem, shown as follows:

MariaDB [test]> SHOW TABLES
    ->

Here, the mysql client does not know that the statement is finished, because a terminator (such as a semicolon) is missing.

If a quote is open at the end of a line, the quoting character is shown in the prompt of the new line. While one could include a newline character in a string by pressing the Enter key, more often this happens by mistake. As we can see in the following example, the prompt helps us notice the problem:

MariaDB [test]> SELECT 'hello world FROM DUAL;
    '>

The problem here is that the closing quote of the "hello world" string is missing. Note that the second line's prompt starts with a single quote.
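The rules behind the continuation prompts can be summarized in a small, simplified Python function. It sketches the behavior described above and is not the client's actual parser. It ignores comments and delimiter changes, and prompt_after is a hypothetical name. It returns None when the statement is complete, the open quote character followed by > when a string is still open, and -> when only the terminator is missing.

```python
def prompt_after(buffer):
    r"""Return the prompt shown after typing `buffer` (simplified model).

    None means the statement is complete (it ends with ;, \g or \G outside
    quotes). "'>", '">' or "`>" means a quoted string is still open, and
    "->" means the terminator is missing.
    """
    quote = None
    i = 0
    while i < len(buffer):
        ch = buffer[i]
        if quote:
            if ch == "\\":
                i += 1               # skip the escaped character
            elif ch == quote:
                quote = None         # closing quote found
        elif ch in "'\"`":
            quote = ch               # opening quote found
        i += 1
    if quote:
        return quote + ">"
    if buffer.rstrip().endswith((";", "\\g", "\\G")):
        return None
    return "->"
```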

Sometimes the tabular output is very difficult to read, particularly when output rows are wider than the terminal. In this case, the \G terminator is more convenient, because it prints each column on a separate line, as shown in the following example:

MariaDB [(none)]> SHOW VARIABLES LIKE 'char%' \G
*************************** 1. row ***************************
Variable_name: character_set_client
        Value: utf8
*************************** 2. row ***************************
Variable_name: character_set_connection
        Value: utf8
*************************** 3. row ***************************
Variable_name: character_set_database
        Value: latin1
…
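The layout produced by \G is simple enough to reproduce in a few lines of Python. Every row gets a numbered header line, and column names are right-aligned to the longest one. The vertical function is a hypothetical formatter that imitates the client's output and is not taken from the client's source.

```python
def vertical(columns, rows):
    """Render rows one column per line, like the \\G terminator (sketch)."""
    width = max(len(name) for name in columns)     # align on the longest name
    lines = []
    for number, row in enumerate(rows, start=1):
        lines.append(f"{'*' * 27} {number}. row {'*' * 27}")
        for name, value in zip(columns, row):
            shown = "NULL" if value is None else value
            lines.append(f"{name.rjust(width)}: {shown}")
    return "\n".join(lines)
```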

On Linux systems, it is possible to use a pager program to read long outputs. Pagers provide the ability to scroll the output using the keyboard, or the mouse wheel, or any other method supported by the selected pager. Examples of good pagers are less, more, and lv (not installed by default on many distributions). To use less, run the following command:

MariaDB [(none)]> \P less
PAGER set to 'less'

The output of the following queries will be displayed with less. To disable the pager, run the following command:

MariaDB [(none)]> \P
Default pager wasn't set, using stdout.

Sometimes an output is long, but the user is only interested in a few rows, or even one row. In this case, it is possible to use the grep command, with a search pattern, as a pager. The following example shows how to run the SHOW ENGINE InnoDB STATUS administrative statement and get only the rows that show the status of the I/O threads (the ones containing the string 'I/O thread'):

MariaDB [performance_schema]> \P grep 'I/O thread'
PAGER set to 'grep 'I/O thread''
MariaDB [performance_schema]> SHOW ENGINE InnoDB STATUS \G
I/O thread 0 state: waiting for completed aio requests (insert buffer thread)
I/O thread 1 state: waiting for completed aio requests (log thread)
I/O thread 2 state: waiting for completed aio requests (read thread)
…

Another interesting option is to set the md5sum program as a pager. As a result, when a query is executed, the MD5 hash of its output is shown instead of the output itself. This is useful to compare the results of two queries, for example, to check that two tables are identical, as follows:

MariaDB [(none)]> \P md5sum
PAGER set to 'md5sum'
MariaDB [(none)]> SELECT * FROM test.t1;
3ec930f74d6ec7d7bdd7aa8544440835  -
MariaDB [(none)]> SELECT * FROM test.t2;
3ec930f74d6ec7d7bdd7aa8544440835  -

In the preceding example, the outputs of the queries are passed to md5sum, and their MD5 hashes appear on the command line. Since the hashes are identical, we can be reasonably sure that the queried tables (t1 and t2) contain the same data.
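The same comparison can be reproduced outside the client with Python's hashlib. This sketch renders each row as a tab-separated line before hashing, so its digests will not match the ones md5sum prints for the client's tabular output. The result_digest helper is hypothetical. The digest depends on row order, so if two tables may return their rows in different orders, adding the same ORDER BY clause to both queries avoids false mismatches. As with md5sum, equal digests are strong evidence, not proof, that the results are identical.

```python
import hashlib


def result_digest(rows):
    """MD5 hex digest of a result set rendered as tab-separated text lines."""
    text = "".join(
        "\t".join("NULL" if v is None else str(v) for v in row) + "\n"
        for row in rows
    )
    return hashlib.md5(text.encode("utf-8")).hexdigest()
```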

The tee (\T) command can be used to log the current client session to a text file. On Windows, where the \P command does not work, this can be used to save long outputs to a file and open it with a text editor. To stop logging, the notee (\t) command can be used.

SQL warnings are not printed by default; only a warning count is shown. This can be a problem, because warnings often indicate that a statement did not work as expected. To see all the warnings, the \W (uppercase) client command can be used. To suppress them and obtain a cleaner output, the \w (lowercase) command is used:

MariaDB [(none)]> \W
Show warnings enabled.
MariaDB [(none)]> SELECT 1/0;
+------+
| 1/0  |
+------+
| NULL |
+------+
1 row in set, 1 warning (0.00 sec)
Warning (Code 1365): Division by 0
MariaDB [(none)]> \w
Show warnings disabled.
MariaDB [(none)]> SELECT 1/0;
+------+
| 1/0  |
+------+
| NULL |
+------+
1 row in set, 1 warning (0.00 sec)

In the preceding example, we first enable the printing of warnings. The following SELECT query generates a warning, which is printed. Then, we disable the printing of warnings. The same query no longer shows the warning, but the warning count is still displayed.

Sometimes, while using the command line, one needs to write a complex query, and a good editor is convenient. On Linux, it is possible to switch to an editor such as vi or Emacs by typing the edit (\e) command. The editor specified in the EDITOR environment variable is used. When the user exits the editor, the statement they wrote appears in the client.

In MariaDB 10.0, it is possible to stop the server from the client, without exiting it or opening a new console to call mysqladmin. The SQL statement that stops the server is SHUTDOWN. Unlike most administrative statements, which require the SUPER privilege, this statement requires the SHUTDOWN privilege. Normally, only the root user has these privileges. The client command to exit the client is \q. The following example shows how to terminate both the server and the client:

MariaDB [(none)]> SHUTDOWN;
Query OK, 0 rows affected (0.00 sec)
MariaDB [(none)]> \q
Bye

The mysql client can also be used to execute a batch file, that is, a text file containing a list of SQL statements. This can be done to restore a logical backup or to create a database required by an application. The results of the execution can be written to a text file. This is done with Unix-like input and output redirection, which works on all systems, including Windows, as follows:

mysql < input_file > output_file

To quickly execute a single statement and see the results, it is not necessary to start an interactive session. The -e option executes the given statement and exits:

federico@this:/usr/local/mysql/bin$ ./mysql -e "SELECT version();"
+--------------------+
| version()          |
+--------------------+
| 10.0.5-MariaDB-log |
+--------------------+

Storage engines

As explained in the previous section, storage engines implement data handling at the physical level. They handle the data files, the data and index caches (if any), and whatever else is necessary to manage and read the data efficiently.

The .frm files are an exception: they are created and used by the server, not by the storage engines. One .frm file exists for each table, and it contains the table definition.

Using the SHOW ENGINES statement or querying the information_schema.ENGINES table, it is possible to see the available storage engines. The following output is obtained with a standard MariaDB 10.0.6 installation:

MariaDB [(none)]> SELECT ENGINE, SUPPORT FROM information_schema.ENGINES \G
*************************** 1. row ***************************
 ENGINE: FEDERATED
SUPPORT: YES
*************************** 2. row ***************************
 ENGINE: MRG_MyISAM
SUPPORT: YES
*************************** 3. row ***************************
 ENGINE: CSV
SUPPORT: YES
*************************** 4. row ***************************
 ENGINE: BLACKHOLE
SUPPORT: YES
*************************** 5. row ***************************
 ENGINE: MEMORY
SUPPORT: YES
*************************** 6. row ***************************
 ENGINE: MyISAM
SUPPORT: YES
*************************** 7. row ***************************
 ENGINE: ARCHIVE
SUPPORT: YES
*************************** 8. row ***************************
 ENGINE: InnoDB
SUPPORT: DEFAULT
*************************** 9. row ***************************
 ENGINE: PERFORMANCE_SCHEMA
SUPPORT: YES
*************************** 10. row ***************************
 ENGINE: Aria
SUPPORT: YES

The SUPPORT column indicates whether each engine is available; the value DEFAULT marks the default storage engine.

When a table is created, a storage engine should be specified. If it is not, the default storage engine is used. The default storage engine is specified in the storage_engine system variable, as shown in the following example:

MariaDB [(none)]> SELECT @@global.storage_engine;
+-------------------------+
| @@global.storage_engine |
+-------------------------+
| InnoDB                  |
+-------------------------+

The TABLES table in the information_schema database has a column called ENGINE, which can be read to check which storage engine is used for a particular table, shown as follows:

MariaDB [(none)]> SELECT ENGINE FROM information_schema.TABLES WHERE TABLE_SCHEMA='test' AND TABLE_NAME='t1';
+--------+
| ENGINE |
+--------+
| InnoDB |
+--------+

A brief description of the available storage engines follows.

XtraDB and InnoDB

InnoDB became the default engine in MariaDB 5.5 and MySQL 5.5. Percona maintains an InnoDB fork called XtraDB: it is InnoDB with bug fixes applied by Percona, plus some unique features (mainly for performance and monitoring). By default, MariaDB uses XtraDB. For compatibility with InnoDB and MySQL, statements and server variables still mention InnoDB, but the XtraDB fork is actually used. However, this behavior can be changed by compiling the server with InnoDB instead of XtraDB. This can be worth doing because any new code can introduce new bugs or unexpected performance problems, and XtraDB is no exception.

Note

In this book, the default engine will generally be called InnoDB. In some cases, the XtraDB name will be used, to indicate that we are talking about a feature that is not supported by the mainstream InnoDB.

InnoDB is a high-performance, general-purpose storage engine that supports transactions with savepoints, XA transactions, and foreign keys. Savepoints are intermediate states that can be saved in the middle of a transaction and restored later if necessary. XA is a special type of transaction designed for operations that involve multiple resources, not necessarily SQL databases. In most cases, InnoDB performs better than other engines. For this reason, this book focuses on XtraDB, which is used for the examples where another engine is not explicitly specified.

InnoDB transactions are implemented via a complex locking system and undo logs. Each lock involves a single row or a range of rows; rows are identified using index records. Undo logs are used to roll back transactions when necessary, and can be stored in the system tablespace or elsewhere.
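The role of undo logs in rolling back, including rolling back to a savepoint, can be illustrated with a toy model in Python. It records the old version of a row before each change and undoes changes newest-first. ToyTransaction is hypothetical and does not model locking, and InnoDB's actual undo records are far more complex.

```python
class ToyTransaction:
    """Undo-log model of a transaction with savepoints (not InnoDB's format)."""

    def __init__(self, table):
        self.table = table       # primary key -> row value
        self.undo = []           # (key, old_value, existed_before) per change
        self.savepoints = {}     # savepoint name -> undo log length

    def write(self, key, value):
        # Save the old version first, then modify the row in place.
        self.undo.append((key, self.table.get(key), key in self.table))
        self.table[key] = value

    def savepoint(self, name):
        self.savepoints[name] = len(self.undo)

    def rollback(self, to_savepoint=None):
        target = 0 if to_savepoint is None else self.savepoints[to_savepoint]
        # Apply undo records newest-first until the target position.
        while len(self.undo) > target:
            key, old_value, existed = self.undo.pop()
            if existed:
                self.table[key] = old_value
            else:
                del self.table[key]
        if to_savepoint is None:
            self.savepoints.clear()          # a full rollback ends the transaction
        else:
            # Savepoints set after the target are discarded.
            self.savepoints = {n: p for n, p in self.savepoints.items() if p <= target}
```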

TokuDB

This storage engine is developed by Tokutek and has been included in MariaDB since Version 5.5, though it must be installed and enabled separately. It supports transactions with savepoints and XA transactions, but not foreign keys or full-text indexes. It is very different from InnoDB. Its main peculiarity is the use of a new data structure for indexes: fractal trees. They are very similar to the commonly used B-trees, but each node has a buffer. This buffer contains the changes that need to be applied to the nodes deeper in the tree. Only when the buffer is full are the changes applied all together. If the changes need to be written to disk, this is an important optimization, because writing fewer and bigger blocks is usually much faster. Also, fragmentation is not a problem with fractal trees.
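The buffering idea behind fractal trees can be illustrated with a single buffered node in Python. This is a deliberately reduced sketch. A real fractal tree has buffers at every internal node and flushes cascade downwards. Here, one node buffers changes for its children (plain dictionaries) and counts one write per child touched by a flush. BufferedNode and its parameters are hypothetical.

```python
import bisect


class BufferedNode:
    """One internal node of a toy fractal tree: buffers changes for children."""

    def __init__(self, pivots, capacity):
        self.pivots = pivots                          # sorted split keys
        self.children = [dict() for _ in range(len(pivots) + 1)]
        self.buffer = []                              # pending (key, value) changes
        self.capacity = capacity
        self.child_writes = 0                         # batched writes to children

    def _child_index(self, key):
        return bisect.bisect_right(self.pivots, key)

    def insert(self, key, value):
        self.buffer.append((key, value))
        if len(self.buffer) >= self.capacity:         # only flush when full
            self.flush()

    def flush(self):
        touched = set()
        for key, value in self.buffer:                # apply in arrival order
            index = self._child_index(key)
            self.children[index][key] = value
            touched.add(index)
        self.child_writes += len(touched)             # one write per touched child
        self.buffer.clear()

    def get(self, key):
        for k, v in reversed(self.buffer):            # newest pending change wins
            if k == key:
                return v
        return self.children[self._child_index(key)].get(key)
```

With a capacity of 4, four inserts spread over two children cost two child writes instead of four.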

Another important feature of TokuDB is data compression. Of course, the compression ratio depends on the dataset, but it is generally much higher than the one provided by other storage engines. This happens because write operations are grouped together. Data compression is always used in TokuDB and cannot be disabled.
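Python's zlib module can show why grouping writes helps compression. It is used here purely for illustration, and this sketch makes no claim about the algorithms TokuDB uses. Compressing many similar rows as one block lets the compressor exploit the repetition between rows, which is impossible when each row is compressed on its own. The row contents are hypothetical.

```python
import zlib


def compressed_sizes(rows):
    """Return (sum of per-row compressed sizes, size of one compressed batch)."""
    one_by_one = sum(len(zlib.compress(row)) for row in rows)
    batched = len(zlib.compress(b"".join(rows)))
    return one_by_one, batched
```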

Fractal trees and compression make TokuDB suitable for datasets that are too big to be stored entirely in memory. For such workloads, TokuDB can be faster than InnoDB. For most other purposes, TokuDB performs worse than InnoDB and has fewer features.

MyISAM and Aria

MyISAM was historically the default storage engine for MySQL and MariaDB, before Version 5.5. It is a relatively simple engine, optimized for read-heavy workloads where there are just a few writes or no writes at all. In practice, MyISAM is good for data warehousing and more generally for data reporting where data can be appended to tables, but not modified or deleted.

MyISAM writes two files for each table: a data file and an index file. The index file can always be rebuilt if it gets damaged for some reason. Copying data files (and the .frm files), even across different machines, is sufficient to back up and restore MyISAM tables.

Three data formats are available: FIXED, DYNAMIC, and COMPRESSED. The FIXED data format assigns a fixed length to columns, while DYNAMIC saves space when possible. The FIXED data format is faster, more reliable, and less prone to fragmentation. The COMPRESSED data format is used to create small read-only tables.
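The trade-off between FIXED and DYNAMIC can be illustrated with two hypothetical record encodings in Python. They are not MyISAM's on-disk format. A fixed-width record wastes space on short values, but the position of row n is simply n times the record size. An updated row also always fits in its old slot, which is one reason the format resists fragmentation. A length-prefixed record saves space, but its position cannot be computed.

```python
import struct


def fixed_record(value, width):
    """FIXED-style record: always exactly `width` bytes (space-padded)."""
    data = value.encode("utf-8")
    if len(data) > width:
        raise ValueError("value longer than the column width")
    return data.ljust(width, b" ")


def dynamic_record(value):
    """DYNAMIC-style record: 2-byte length prefix followed by the data."""
    data = value.encode("utf-8")
    return struct.pack("<H", len(data)) + data


def read_fixed(blob, n, width):
    # Row n starts at a computable offset: no scanning needed.
    return blob[n * width:(n + 1) * width].rstrip(b" ").decode("utf-8")
```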

Aria is designed to be MyISAM's successor. It uses logs that allow data recovery after a crash. Data changes are atomic in Aria: they are either applied entirely or not at all. Aria uses a different data format called PAGE, which is generally faster and less prone to fragmentation, but the FIXED or DYNAMIC formats can be used for compatibility with MyISAM (in which case the table is not crash-safe).

Aria can be better than MyISAM in concurrent environments, and the MariaDB Knowledge Base suggests using Aria for new applications. Yet users should be aware that bulk writes are slower in Aria, particularly when duplicate indexed values exist.

Neither MyISAM nor Aria supports transactions or foreign keys but, as explained previously, each statement on an Aria table can be considered a transaction. Both engines support full-text indexes.

The MRG_MyISAM storage engine, also called MERGE, can be used to build a table on top of multiple identical MyISAM tables, to work around the operating system's file size limit.

Other engines

The storage engines described up to this point are general-purpose, even if some of them are only suitable for particular workloads. Other storage engines use non-standard input or output methods, or process queries in a non-standard way, and are therefore used for very specific purposes, as described in the following list:

The OQGRAPH storage engine is developed by OpenQuery. It is meant to handle tree and graph data structures. Trees can be handled in several ways in SQL databases but, whichever method is used, there are some drawbacks because the relational theory does not suit tree structures. OQGRAPH solves this problem by translating SQL queries into tree-specific requests. OQGRAPH was introduced in MariaDB 5.2, temporarily disabled in 5.5, and then reintroduced in MariaDB 10.
The BLACKHOLE storage engine is inherited from MySQL. BLACKHOLE tables are always empty. Modifications have no effect on them and queries always return an empty result set.
The SPIDER storage engine is developed by Kentoku Shiba. It reads and writes data into other instances of MariaDB. XA transactions are supported. SPIDER has been designed for data sharding and will be discussed in more detail in Chapter 11, Data Sharding.
The CONNECT storage engine is a MariaDB-specific storage engine that allows reading and writing data from and to external sources. The data sources can be MariaDB or MySQL connections, ODBC connections, files, and directories. Files can use several formats, including but not limited to CSV, HTML tables, and binary data. An API exists to develop additional formats. Data can also be compressed with the gzip format. A CONNECT table can also be used to transform data contained in other tables, for example, to merge tables or reorganize data into a pivot table. This storage engine will probably make some older storage engines obsolete: CSV, which accesses CSV files; FEDERATED, an engine inherited from MySQL, which can access tables from other MariaDB or MySQL instances; and FEDERATEDX, added in MariaDB because FEDERATED was no longer maintained.
The ARCHIVE storage engine handles compressed tables. It has several limitations, such as the inability to modify or delete data after an insertion, and is quite slow. Nowadays, compressed InnoDB, MyISAM, or TokuDB tables are always preferable.
The CassandraSE storage engine connects to the Apache Cassandra NoSQL server to read and write data. It converts MariaDB's data types and logic into Cassandra and vice versa. It is a MariaDB-specific storage engine because it uses MariaDB's dynamic columns to emulate Cassandra's column families.
The SphinxSE storage engine allows MariaDB to access a table stored in the Sphinx search server. Sphinx is mainly known for its fast full-text searches.
The mroonga storage engine is specifically designed for full-text searches, including searches on Japanese, Chinese, or Korean text. It also includes fast geometric indexes for geolocation.
The SEQUENCE storage engine cannot be used to physically create a table. When it is enabled, queries can refer to virtual tables whose names follow a certain pattern, and based on the name, the SEQUENCE storage engine returns an integer series. For example, the seq_1_to_10 virtual table returns a result set with the numbers from 1 to 10, and the seq_1_to_10_step_2 virtual table returns a similar series with an increment of 2.
The performance_schema storage engine is only used internally for the tables in the performance_schema database. A database administrator (DBA) only needs to be aware of it because the SHOW ENGINE performance_schema STATUS statement shows how much memory the performance_schema consumes.
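Several of the engines above can be tried directly from the mysql client. The following is a minimal sketch, assuming a MariaDB 10.0 server where the SEQUENCE engine is enabled and the CONNECT and OQGRAPH plugins ship as the ha_connect and ha_oqgraph libraries (plugin availability depends on how the server was packaged). The test database, all table names, and the /tmp/people.csv file are hypothetical.

```sql
CREATE DATABASE IF NOT EXISTS test;
USE test;

-- SEQUENCE: virtual tables generated from their names; the column is called seq
SELECT seq FROM seq_1_to_10_step_2;                          -- 1, 3, 5, 7, 9
SELECT CURDATE() + INTERVAL seq DAY AS day_date FROM seq_0_to_6;

-- BLACKHOLE: writes are accepted, but nothing is stored
CREATE TABLE bh (id INT) ENGINE=BLACKHOLE;
INSERT INTO bh VALUES (1), (2), (3);
SELECT COUNT(*) FROM bh;                                     -- 0

-- CONNECT: expose a CSV file (hypothetical path) as a table
INSTALL SONAME 'ha_connect';
CREATE TABLE people (
  name VARCHAR(50) NOT NULL,
  city VARCHAR(50) NOT NULL
) ENGINE=CONNECT TABLE_TYPE=CSV FILE_NAME='/tmp/people.csv' HEADER=1 SEP_CHAR=',';
SELECT city, COUNT(*) FROM people GROUP BY city;

-- OQGRAPH: a regular table stores the edges...
INSTALL SONAME 'ha_oqgraph';
CREATE TABLE oq_backing (
  origid INT UNSIGNED NOT NULL,
  destid INT UNSIGNED NOT NULL,
  PRIMARY KEY (origid, destid),
  KEY (destid)
);
INSERT INTO oq_backing VALUES (1,2), (2,3), (3,4), (4,5), (2,6), (5,6);

-- ...and an OQGRAPH table answers graph queries on top of it
CREATE TABLE oq_graph (
  latch  VARCHAR(32) NULL,
  origid BIGINT UNSIGNED NULL,
  destid BIGINT UNSIGNED NULL,
  weight DOUBLE NULL,
  seq    BIGINT UNSIGNED NULL,
  linkid BIGINT UNSIGNED NULL,
  KEY (latch, origid, destid) USING HASH,
  KEY (latch, destid, origid) USING HASH
) ENGINE=OQGRAPH data_table='oq_backing' origid='origid' destid='destid';

-- Shortest path from node 1 to node 6, as a list of nodes
SELECT GROUP_CONCAT(linkid ORDER BY seq) AS path
FROM oq_graph
WHERE latch = 'breadth_first' AND origid = 1 AND destid = 6;
```

The OQGRAPH table definition follows the fixed column layout that OQGRAPH v3 expects; only the backing table's name and column names are user-defined.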

Logs

A MariaDB server maintains the following logs:

Error log: This log contains the errors that occurred during the server execution. This includes both server problems (such as errors that prevent a plugin from starting) and SQL errors.
SQL_ERROR_LOG: This is a plugin, introduced in MariaDB 5.5, that logs the errors generated by SQL statements into a file. It is more specific than the error log, because it only logs SQL errors. Using this plugin is the easiest way to see the errors that occur in a stored routine or trigger.
General query log: All the SQL statements received by the server are logged into this file.
Slow query log: This log can be configured to store the queries that take more than a given amount of time, or that do not use any index. It is useful for finding out why an application or database is slow.
Binary log (binlog): Depending on the chosen format, this log contains the changed data in binary form (row-based format), or the SQL statements that caused the changes (statement-based format). It is necessary for implementing incremental backups, replication, or a database cluster.
Relay log: This log only exists on replication slaves, and it contains the data received from the master. Each entry in a slave's relay log matches an entry in the master's binary log.
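Some of these logs can be switched on and inspected at runtime. The following sketch assumes a MariaDB 10.0 server where the SQL_ERROR_LOG plugin ships as the sql_errlog library; the thresholds are arbitrary example values.

```sql
-- Load the SQL_ERROR_LOG plugin and list its settings (file name, rotation, ...)
INSTALL SONAME 'sql_errlog';
SHOW GLOBAL VARIABLES LIKE 'sql_error_log%';

-- Enable the slow query log without restarting the server
SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 2;                  -- seconds; applies to new connections
SET GLOBAL log_queries_not_using_indexes = ON;
SHOW GLOBAL VARIABLES LIKE 'slow_query_log_file';

-- Which binary log format is in use: STATEMENT, ROW or MIXED
SHOW GLOBAL VARIABLES LIKE 'binlog_format';
```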

InnoDB also has two logs, named the undo log and the redo log. The undo log keeps track of the changes performed by the active transactions, so that they can be rolled back if necessary. The redo log records the requested data changes and is used to recover tables after a crash.

Aria has a log (the Aria log) that contains the data not applied to the data files, and is used at startup to recover tables that were not closed properly. Changes to MyISAM tables are stored in the MyISAM log.

Each log consists of a set of files, stored in the installation directory, in the data directory, or in a different location chosen by the user. However, some logs (the general query log and the slow query log) can be written to system tables in the mysql database. Writing is slower in this case, but these logs can then be queried with SQL statements. Such tables use the CSV storage engine, so the logs can also be imported into external programs in a well-known format.
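As a minimal sketch of the table-based logging described above, the general query log can be redirected to the mysql.general_log table and read back with a normal query. Enabling the general log on a busy server is expensive, so the example switches it off again at the end.

```sql
-- Write the general and slow query logs to tables ('FILE', 'TABLE' or 'FILE,TABLE')
SET GLOBAL log_output = 'TABLE';
SET GLOBAL general_log = ON;

-- The log table uses the CSV engine, as SHOW CREATE TABLE reveals
SHOW CREATE TABLE mysql.general_log;

-- The ten most recent statements
SELECT event_time, user_host, argument
FROM mysql.general_log
ORDER BY event_time DESC
LIMIT 10;

SET GLOBAL general_log = OFF;
```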

Since logs are written very often, they have a buffer to improve performance: each write has an overhead, which can be reduced by writing data in larger chunks. Of course, logs are more reliable if they are written more often. Some variables control the use of the buffer, and the DBA can adjust them according to the need for reliability and speed.
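For the binary log and the InnoDB redo log, the reliability/speed trade-off is mostly controlled by two variables. The sketch below only shows the two extremes; intermediate values exist, and the right choice depends on how much data loss is acceptable.

```sql
-- Current settings
SHOW GLOBAL VARIABLES
WHERE Variable_name IN ('sync_binlog', 'innodb_flush_log_at_trx_commit',
                        'binlog_cache_size', 'innodb_log_buffer_size');

-- Safest: flush the redo log at every commit, sync the binary log after every write
SET GLOBAL innodb_flush_log_at_trx_commit = 1;
SET GLOBAL sync_binlog = 1;

-- Faster: the redo log is flushed about once per second and binary log syncing
-- is left to the OS, so an OS crash or power failure can lose recent transactions
SET GLOBAL innodb_flush_log_at_trx_commit = 2;
SET GLOBAL sync_binlog = 0;
```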

Logs also need to be rotated periodically: new entries are written into a new file, and the oldest file is usually removed. The rotation can be automatic (for the binary log), or requested by the user via the FLUSH LOGS statement or the mysqladmin flush-logs command.

For each log that the user may need to read, there is a utility that shows its contents; for example, mysqlbinlog reads the binary log and the relay log. On Red Hat Linux, log rotation can be done via the mysql-log-rotate script.
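A minimal sketch of manual rotation from SQL. The seven-day retention is an arbitrary example; in a replication environment, binary logs must not be purged while a slave still needs them.

```sql
-- Close the current log files and open new ones
FLUSH LOGS;
-- Rotate only the binary log
FLUSH BINARY LOGS;
-- List the binary log files and their sizes
SHOW BINARY LOGS;
-- Size at which the binary log is rotated automatically
SHOW GLOBAL VARIABLES LIKE 'max_binlog_size';
-- Delete the binary logs older than one week
PURGE BINARY LOGS BEFORE NOW() - INTERVAL 7 DAY;
```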

MariaDB caches

MariaDB has several caches that can be adjusted using system variables and startup options to adapt them to the specific workload. Usually, only a few caches need tuning, but changing just a few options can greatly change the overall performance. Other caches solve more specific problems.

The InnoDB buffer pool is usually the most important cache. It contains the data and indexes of the InnoDB tables. On a dedicated server, the buffer pool should usually be at least 70 percent of the available RAM. Of course, this percentage is purely indicative: the optimal value depends on a wide variety of factors. The buffer pool has two sublists: the new sublist and the old sublist. It is possible to set the size of the old sublist, as well as the minimum time a page must spend in the old sublist before it can be moved to the new sublist. These settings determine how easily recently read pages enter the new sublist, rather than remaining in the old sublist until they are evicted. To improve concurrency, the buffer pool can be split into multiple instances. Different instances never contain the same data.
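The sublist settings mentioned above correspond to the innodb_old_blocks_pct and innodb_old_blocks_time variables, which can be changed at runtime. This is a sketch, assuming MariaDB 10.0, where the buffer pool size and the number of instances can only be set at startup; the 1000 ms value is an example, not a recommendation.

```sql
-- Size, number of instances and related settings
SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool%';
-- Old sublist percentage and minimum age
SHOW GLOBAL VARIABLES LIKE 'innodb_old_blocks%';

-- A page must stay in the old sublist for at least 1 second before it can be
-- moved to the new sublist; this protects hot pages from one-off full scans
SET GLOBAL innodb_old_blocks_time = 1000;   -- milliseconds
-- Reserve 37 percent of the buffer pool for the old sublist
SET GLOBAL innodb_old_blocks_pct = 37;

-- Logical read requests vs. reads that had to go to disk
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';
```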

The change buffer, an area of the buffer pool, caches changes to secondary index pages that are not currently in the buffer pool; these changes are merged into the index pages later. For write-intensive workloads, the share of the buffer pool that the change buffer may occupy can be increased; for read-heavy workloads, it can be decreased, or change buffering can be disabled. It can also be configured to buffer only some types of changes, which is useful for some workloads.
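Both knobs described above are dynamic variables. The sketch assumes MariaDB 10.0 (innodb_change_buffer_max_size does not exist in older versions); the value 40 is only an example.

```sql
SHOW GLOBAL VARIABLES LIKE 'innodb_change_buffer%';

-- Write-heavy workload: let the change buffer use up to 40% of the buffer pool
SET GLOBAL innodb_change_buffer_max_size = 40;

-- Buffer only inserts (other values: none, deletes, changes, purges, all)
SET GLOBAL innodb_change_buffering = 'inserts';

-- Read-heavy workload: disable change buffering
SET GLOBAL innodb_change_buffering = 'none';
```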

MyISAM uses a cache called the key cache. It does not store data, only indexes. Multiple key cache instances can be created and configured individually.
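Additional key caches are created simply by assigning a size to a named cache. In this sketch, hot_cache and the MyISAM table test.t1 are hypothetical names.

```sql
-- Create a second key cache named hot_cache, with 128 MB
SET GLOBAL hot_cache.key_buffer_size = 128 * 1024 * 1024;

-- Assign the indexes of a MyISAM table to it, then preload them
CACHE INDEX test.t1 IN hot_cache;
LOAD INDEX INTO CACHE test.t1;

-- Size of the default key cache
SHOW GLOBAL VARIABLES LIKE 'key_buffer_size';
```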

Aria uses a cache called page cache that is similar to MyISAM's key cache. The Aria page cache is faster for data of a fixed length. Currently, Aria does not support multiple instances of this cache.

If MyISAM or Aria is mainly used, the key cache or the page cache should ideally be as large as your frequently accessed indexed data.

The table open cache stores the handles of the physical table files. MyISAM and Aria use two files for each table (because indexes and data are stored separately). This cache reduces the file access overhead.

The host cache contains the association between the IP addresses and the hostnames of the clients that connect to the server. It also counts connection errors for each host, and is used to block the hosts that exceed the max_connect_errors limit.

InnoDB data structures

In MariaDB, by default, InnoDB is mapped to XtraDB, a compatible InnoDB fork maintained by Percona.

InnoDB tables are contained in tablespaces. A tablespace is a file that contains data and indexes for one or more tables. In old MariaDB and MySQL versions, all the tables were created in the system tablespace by default. If the innodb_file_per_table system variable is set to 1, which is the default since version 10.0, each table is stored in a separate tablespace. This variable is dynamic, so it is possible to store some tables in separate files and others in the system tablespace.
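Because innodb_file_per_table is dynamic, its value at table creation time decides where each table goes. This sketch uses hypothetical table names in the test database and the XtraDB/InnoDB INNODB_SYS_TABLES table, where SPACE = 0 denotes the system tablespace.

```sql
SHOW GLOBAL VARIABLES LIKE 'innodb_file_per_table';

-- Create one table in the system tablespace, then restore the default
SET GLOBAL innodb_file_per_table = OFF;
CREATE TABLE test.in_system (id INT PRIMARY KEY) ENGINE=InnoDB;
SET GLOBAL innodb_file_per_table = ON;
CREATE TABLE test.in_own_file (id INT PRIMARY KEY) ENGINE=InnoDB;

-- SPACE = 0 means the table lives in the system tablespace
SELECT NAME, SPACE
FROM information_schema.INNODB_SYS_TABLES
WHERE NAME IN ('test/in_system', 'test/in_own_file');
```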

The system tablespace, by default, also contains InnoDB's data dictionary, the undo logs, the change buffer, and the doublewrite buffer. The data dictionary is a collection of metadata about all InnoDB tables, columns, and indexes. The system tablespace is stored in the data directory, in the ibdata files (by default, a single file called ibdata1).

A portion of a tablespace is called a segment. Regular tablespaces have one segment for data and one segment for each index. The system tablespace has several segments.

A page is a small data unit stored in a tablespace or in the buffer pool. Pages have a fixed size that can be configured (16 KB by default). A page contains one or more rows and usually some empty space. The ratio of used space is called the fill factor.

A page that has been modified in the buffer pool but not yet written to disk is called a dirty page.

In some cases, for example during a consistent read, InnoDB sequentially reads several pages together, for a total size of 1 MB (64 pages with the default 16 KB page size). Such groups of pages are called extents.

InnoDB indexes are important not only for reads, but also for locks: InnoDB row locks are set on index records.

An InnoDB index can be a clustered index or a secondary index. The primary key is the clustered index. If a table does not have a primary key, the first UNIQUE index whose columns are all NOT NULL is used as the clustered index. If no such index exists, a hidden clustered index is automatically created. Each secondary index record points to a clustered index record by storing its primary key values, so we can say that all secondary indexes contain the clustered index columns.
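A practical consequence of secondary indexes containing the primary key is that some queries can be answered from the secondary index alone. This is a sketch with a hypothetical test.customer table; the exact EXPLAIN output depends on the data and the optimizer's statistics.

```sql
CREATE TABLE test.customer (
  id    INT UNSIGNED NOT NULL AUTO_INCREMENT,
  email VARCHAR(100) NOT NULL,
  name  VARCHAR(100) NOT NULL,
  PRIMARY KEY (id),        -- clustered index: rows are stored in id order
  KEY idx_email (email)    -- secondary index: each entry stores email and id
) ENGINE=InnoDB;

-- idx_email alone can answer this query, because its entries contain id;
-- with data in the table, EXPLAIN normally reports "Using index"
EXPLAIN SELECT id FROM test.customer WHERE email = 'a@example.com';

-- Reading name needs a second lookup in the clustered index
EXPLAIN SELECT name FROM test.customer WHERE email = 'a@example.com';
```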

Authentication and security

MariaDB authentication is based on a username, a password, and the client's hostname (or its IP address). The username and the hostname form the account, for example:

user_01@localhost

Each account can be authenticated by a different plugin. This is helpful when using external login systems, for example, operating system users. MariaDB or the plugin checks the credentials provided by the client and accepts or rejects the connection.

Permissions can be assigned to individual accounts or to accounts that match a pattern. Patterns use the same wildcards as the LIKE operator (% and _). Several permissions exist; each of them allows executing a single statement type, or a limited set of statements. Permissions can be applied to the whole server, to a database and the objects it contains (tables and stored routines), to individual objects, or even to individual columns in a table or view. This allows great granularity and flexibility when deciding who can perform which actions on which objects.

MariaDB 10.0 also supports roles. Permissions can be granted to roles instead of accounts, and roles are then granted to accounts. A user who has been granted a role can enable it, and then perform all the actions that the role permits. Roles simplify permission management in systems with many users, where a good security policy is required.

Additional options are available. For example, the DBA can require a user to always connect using SSL encryption. The DBA can also limit the resources used by a user, such as the number of queries per hour or the number of simultaneous connections.
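As an example of an external login system, the unix_socket plugin lets an operating system user log in as the MariaDB user with the same name, without a password. This sketch assumes the plugin ships as the auth_socket library (as in MariaDB 10.0); the alice account and the password are hypothetical.

```sql
INSTALL SONAME 'auth_socket';

-- The OS user alice can connect locally as alice, with no password
CREATE USER 'alice'@'localhost' IDENTIFIED VIA unix_socket;

-- A regular account, authenticated by the native password plugin
CREATE USER 'user_01'@'localhost' IDENTIFIED BY 'change_me';

-- Which plugin authenticates each account (empty means the default)
SELECT User, Host, plugin FROM mysql.user;
```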
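The following sketch shows a host pattern and permissions at several levels. The shop database, its tables, the report account, and the monthly_totals procedure (which must already exist) are all hypothetical.

```sql
-- Any client in the 192.168.0.x network; % and _ work as in LIKE
CREATE USER 'report'@'192.168.0.%' IDENTIFIED BY 'change_me';

-- Database level: read every table in shop
GRANT SELECT ON shop.* TO 'report'@'192.168.0.%';

-- Column level: only the status column of orders can be modified
GRANT UPDATE (status) ON shop.orders TO 'report'@'192.168.0.%';

-- Stored routine level
GRANT EXECUTE ON PROCEDURE shop.monthly_totals TO 'report'@'192.168.0.%';

SHOW GRANTS FOR 'report'@'192.168.0.%';
```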
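A minimal sketch of the role workflow in MariaDB 10.0; the reporting role and the shop database are hypothetical. A granted role has no effect until the user enables it with SET ROLE.

```sql
CREATE ROLE reporting;
GRANT SELECT ON shop.* TO reporting;
GRANT reporting TO 'user_01'@'localhost';

-- In user_01's session: enable the role, check it, then disable it
SET ROLE reporting;
SELECT CURRENT_ROLE();
SET ROLE NONE;
```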
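In MariaDB 10.0, these options are set through GRANT. In the following sketch, the limits are arbitrary example values; setting MAX_USER_CONNECTIONS to 1 would forbid simultaneous connections entirely.

```sql
GRANT USAGE ON *.* TO 'user_01'@'localhost'
  REQUIRE SSL
  WITH MAX_USER_CONNECTIONS 5
       MAX_QUERIES_PER_HOUR 1000
       MAX_CONNECTIONS_PER_HOUR 100;
```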

The information_schema database

The information_schema database (often called I_S for brevity) is a virtual database that contains informative tables. These tables can be divided into several groups:

Metadata tables: Tables such as SCHEMATA, TABLES, and COLUMNS contain information about the structure of databases, tables, columns, and so on.
Status and variables tables: The GLOBAL_VARIABLES and SESSION_VARIABLES tables list the values of the server's system variables. The GLOBAL_STATUS and SESSION_STATUS tables provide information about the operations performed by the server.
Privilege tables: The tables whose names end with _PRIVILEGES list which accounts hold which permissions, and on which objects.
The PROFILING table: When profiling is enabled, this table can be used to monitor the queries executed during the current session, and see which low-level operations are performed by the server.
The PROCESSLIST table: This table shows the active sessions and their status.
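A few typical queries against these tables are sketched below. The shop database is hypothetical, and the QUERY_ID value must be replaced with the one that SHOW PROFILES reports for the statement of interest.

```sql
-- Metadata: tables of a database, with their engines and estimated row counts
SELECT TABLE_NAME, ENGINE, TABLE_ROWS
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'shop';

-- Sessions that are currently doing something
SELECT ID, USER, HOST, DB, COMMAND, TIME, STATE
FROM information_schema.PROCESSLIST
WHERE COMMAND <> 'Sleep';

-- Profiling the current session
SET profiling = 1;
SELECT COUNT(*) FROM information_schema.COLUMNS;
SHOW PROFILES;                  -- lists Query_ID values
SELECT STATE, DURATION
FROM information_schema.PROFILING
WHERE QUERY_ID = 1              -- use the Query_ID shown by SHOW PROFILES
ORDER BY SEQ;
SET profiling = 0;
```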

Several tables provide information about InnoDB. Some of them are XtraDB-specific. Their names begin with INNODB_, or with XTRADB_ for the tables that only exist in XtraDB. The main groups are as follows:

InnoDB locks tables: The INNODB_LOCKS, INNODB_LOCK_WAITS, and INNODB_TRX tables contain information about active locks, waits, and transactions that acquired a lock or are waiting for a lock, respectively.
InnoDB buffer pool tables: Tables whose names start with INNODB_BUFFER_ provide information about the buffer pool contents and page usage.
The INNODB_METRICS table: This table provides information about some low-level operations performed by InnoDB.
InnoDB compression tables: Tables whose names start with INNODB_CMP provide information about the performance of compressed pages.
InnoDB full-text tables: Tables whose names start with INNODB_FT_ provide information about full-text indexes in InnoDB tables.
InnoDB data dictionary tables: Tables whose names start with INNODB_SYS_ provide metadata about InnoDB tables, columns, and foreign keys. They are similar to the more generic tables that contain metadata, but these tables are specific to InnoDB. They also contain statistics and information about files.
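The lock tables become most useful when joined together. This sketch lists each waiting transaction alongside the transaction that blocks it, including the thread IDs that SHOW PROCESSLIST (or KILL) would use.

```sql
SELECT r.trx_id              AS waiting_trx,
       r.trx_mysql_thread_id AS waiting_thread,
       r.trx_query           AS waiting_query,
       b.trx_id              AS blocking_trx,
       b.trx_mysql_thread_id AS blocking_thread,
       b.trx_query           AS blocking_query
FROM information_schema.INNODB_LOCK_WAITS w
JOIN information_schema.INNODB_TRX b ON b.trx_id = w.blocking_trx_id
JOIN information_schema.INNODB_TRX r ON r.trx_id = w.requesting_trx_id;
```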

Generally, the information that can be read from information_schema can also be obtained with the SHOW statements and vice versa. Querying information_schema is a more flexible and standard way to retrieve such information, but is also more verbose.
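To illustrate the difference, the first two statements below return the same value, while the third shows the kind of aggregation that only a regular SELECT on information_schema can express.

```sql
-- The same information, two ways
SHOW GLOBAL VARIABLES LIKE 'max_connections';

SELECT VARIABLE_NAME, VARIABLE_VALUE
FROM information_schema.GLOBAL_VARIABLES
WHERE VARIABLE_NAME = 'max_connections';

-- Joins and aggregates are possible with information_schema
SELECT ENGINE, COUNT(*) AS table_count,
       SUM(DATA_LENGTH + INDEX_LENGTH) AS total_bytes
FROM information_schema.TABLES
GROUP BY ENGINE;
```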

Information on the InnoDB activities can also be obtained in a human-readable form via the SHOW ENGINE InnoDB STATUS and SHOW ENGINE InnoDB MUTEX statements.

To answer queries against information_schema, the server may need to open and read many database files, which can be a slow operation. For this reason, the queries that are often executed on a production server should be written so that only the necessary files are read. This can usually be done with a good WHERE clause, for example one that specifies the database and table names.
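In this sketch, the shop database and orders table are hypothetical. Constant conditions on TABLE_SCHEMA and TABLE_NAME let the server avoid scanning every database; EXPLAIN on an information_schema query reports in its Extra column how much work is needed, such as how many databases are scanned.

```sql
-- Reads the files of a single table
SELECT TABLE_NAME, ENGINE, CREATE_TIME
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'shop' AND TABLE_NAME = 'orders';

-- Compare the Extra column of these two plans
EXPLAIN SELECT TABLE_NAME FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'shop';

EXPLAIN SELECT TABLE_NAME FROM information_schema.TABLES
WHERE TABLE_NAME LIKE 'ord%';
```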

The performance_schema database

The most relevant parts of the MariaDB code contain instrumentation that allows detailed performance monitoring. The results of such monitoring are written into a special database called performance_schema. Since the monitoring activity noticeably slows down the server, it can be disabled in the configuration file by setting the performance_schema variable to 0.

The performance_schema database is based on the following concepts:

Actors: An actor is a thread that is monitored. It can be a user connection or a background MariaDB thread.
Consumers: Consumers are tables that are populated with performance data.
Instruments: These are the instrumented points in the MariaDB code. Their names are intuitive for those who know the server's internals. For example, wait/io/file/sql/binlog measures waits for file I/O on the binary log.
Objects: These are the tables whose activities must be monitored.

To determine what the server must monitor, the performance_schema setup tables can be modified: setup_actors, setup_consumers, setup_instruments, and setup_objects. When a low-level operation takes place and performance_schema is enabled, new information is written into performance_schema if the involved actor, consumer, instrument, and object are all monitored. The setup_timers table determines the timer used for each type of event, and thus its granularity (microseconds, nanoseconds, and so on).

The performance_schema database contains many tables. However, the names of the most important ones follow a pattern, based on a prefix and a suffix. The prefix indicates what type of information the table provides. The most important prefixes are:
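The setup tables are modified with ordinary UPDATE statements. This sketch assumes the server was started with performance_schema enabled; it turns on timing for file I/O waits and then reads the aggregated waits on the binary log.

```sql
-- Enable and time the file I/O instruments
UPDATE performance_schema.setup_instruments
SET ENABLED = 'YES', TIMED = 'YES'
WHERE NAME LIKE 'wait/io/file/%';

-- Enable the consumers that store wait events
UPDATE performance_schema.setup_consumers
SET ENABLED = 'YES'
WHERE NAME LIKE 'events_waits%';

-- Timer used for each event type
SELECT * FROM performance_schema.setup_timers;

-- Aggregated I/O waits on the binary log
SELECT EVENT_NAME, COUNT_STAR, SUM_TIMER_WAIT
FROM performance_schema.events_waits_summary_global_by_event_name
WHERE EVENT_NAME LIKE 'wait/io/file/sql/binlog%';
```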

events_statements_: This means that the table refers to SQL statements.
events_stages_: This means that the table refers to the stages of a SQL statement execution (such as parsing and table opening).
*_instances: This means that the table lists the instances of an instrumented object type. For example, mutex_instances lists mutexes, and file_instances lists files.
events_waits_: This means that the table refers to wait events, such as threads waiting for a lock to be released or for an I/O operation to complete.

The suffix indicates how the information is aggregated, shown as follows:

_current: This means that only the current server activities are in the table
_history: This means that some limited historical information is stored
_history_long: This means that more historical information is present

Other suffixes exist, but are self-explanatory.

For example, the events_waits_current table shows, for each thread, the wait event it is currently performing or the last one it performed. The events_statements_history table shows information about the recently executed statements.
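As a sketch, the slowest recent statements can be found in events_statements_history. The consumer may be disabled by default, so the example enables it first; performance_schema timer values are expressed in picoseconds, hence the division by 10^12 to obtain seconds.

```sql
UPDATE performance_schema.setup_consumers
SET ENABLED = 'YES'
WHERE NAME = 'events_statements_history';

SELECT THREAD_ID, SQL_TEXT, TIMER_WAIT / 1000000000000 AS seconds
FROM performance_schema.events_statements_history
ORDER BY TIMER_WAIT DESC
LIMIT 5;
```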

Compatibility with MySQL and other DBMS

Each MariaDB tree uses a MySQL tree as its codebase. For example, MariaDB 5.5 is based on MySQL 5.5. When the MySQL tree is updated, MariaDB imports the bug fixes and new features. MariaDB should be fully compatible with the corresponding MySQL tree: all the SQL statements, API calls, and configuration settings that work with MySQL should produce the same results on MariaDB. If an undocumented incompatibility is found, it is treated as a bug. Of course, MariaDB develops new features on top of the MySQL codebase, so applications that use MariaDB-specific features will not work on MySQL.

In a replication environment, it is safe to replicate from a MySQL master to a MariaDB slave of a compatible version. The opposite is only safe if the queries do not use MariaDB-specific features.

MariaDB also imports several features from the Percona Server, which also uses MySQL as a codebase. This means that programs that use features specific to the Percona Server can work with the corresponding tree of MariaDB.

The following table shows the correspondence between MySQL and MariaDB trees:

MariaDB tree	MySQL tree
5.1	5.1
5.2	5.1
5.3	5.1
5.5	5.5
10.0	5.5, partly 5.6

Each MariaDB tree up to 5.5 is compatible with the MySQL tree having the same number or, if such a tree does not exist, with the latest version having a lower number. MariaDB 10.0 breaks this pattern because it only implements part of the MySQL 5.6 features; thus, it is not fully compatible. The complete list of incompatibilities is available in the MariaDB Knowledge Base. Most users should not be affected, unless they want to use MySQL 5.6 and MariaDB 10.0 in the same replication environment. Also, MySQL 5.6 allows InnoDB tables to be accessed through the memcached protocol, but this feature is not currently available in MariaDB.

MariaDB and MySQL use a syntax called executable comments to improve compatibility with other DBMSs. Executable comments can be used to execute part of a SQL statement on MySQL and MariaDB but not on other DBMSs, only on MariaDB but not on MySQL, or only on recent versions of MariaDB but not on older ones.

The most generic executable comment allows part of a query to be executed on both MariaDB and MySQL, as follows:

SELECT 1 /*! , VERSION() */;

By adding M, the comment will only be executed on MariaDB:

SELECT 1 /*M! , 'You are using MariaDB!' */;

It is possible to specify a minimum version number. This number must consist of five or six digits: the first one or two digits are the major version, the following two digits are the minor version, and the final two digits are the patch number. For example:

SELECT 1
  /*!50510 , 'MySQL 5.5.10 or newer' */
  /*M!100006 , 'MariaDB 10.0.6 or newer' */;
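The version number is simply major × 10000 + minor × 100 + patch, so 100006 means 10.0.6 and 50510 means 5.5.10. The following query decodes it with integer arithmetic, which can help when reading unfamiliar executable comments.

```sql
SELECT 100006 DIV 10000         AS major,   -- 10
       (100006 DIV 100) MOD 100 AS minor,   -- 0
       100006 MOD 100           AS patch;   -- 6
```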

Note that MariaDB 10.0 always executes the executable comments written for MySQL 5.6, which is mostly compatible, but ignores those written for MySQL 5.7.

Also, the SQL_MODE system variable has some flags that make the general syntax more compatible with older versions of MySQL, or with other database systems.
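For example, the || operator means logical OR by default, while in ANSI mode (which includes the PIPES_AS_CONCAT flag) it concatenates strings, as in standard SQL. The following sketch only changes the current session.

```sql
SET SESSION sql_mode = '';
SELECT 'a' || 'b';           -- logical OR: returns 0, with warnings

SET SESSION sql_mode = 'ANSI';
SELECT 'a' || 'b';           -- concatenation: returns 'ab'

-- The individual flags enabled by ANSI
SELECT @@SESSION.sql_mode;
```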

MariaDB resources

The primary source of documentation is the MariaDB Knowledge Base (KB), which also contains information about related open source tools. It is also a good place to ask technical questions about MariaDB, which the community will answer. The MariaDB KB can be found at:

https://mariadb.com/kb/en/

The MariaDB Foundation has a blog that allows users to stay updated with new releases and other important news. It can be found at:

https://mariadb.org/

Planet MariaDB aggregates blog posts related to MariaDB at:

http://planetmariadb.org/

MariaDB's project planning tool, JIRA, can be used to report bugs, browse information on known bugs and their fixes, and see when new releases are scheduled and which bug fixes and features they will contain. It can be found at:

https://mariadb.atlassian.net/browse/MDEV

Summary

In this chapter, we reviewed the general MariaDB architecture. Many features of the mysql client were explained. They increase productivity of the DBA and reduce the need for a GUI.

We discussed the storage engines that are included in MariaDB's binaries or sources. An overview of InnoDB, TokuDB, MyISAM, and Aria was presented. InnoDB will be used in this book and in most practical cases. For this reason, special attention has been dedicated to this engine and its data structures. However, uncommon use cases could benefit from other engines, for one reason or another. Also, a brief description of all the minor storage engines was presented. These engines can be used to solve specific problems. For example, Spider will be discussed in a later chapter, while the chapter about replication will show how BLACKHOLE can be used to avoid replicating some data.

The logs used by the server were briefly introduced. Some of them are essential to use some MariaDB features, such as physical backups, replication, or recovering after a DELETE statement that erased too much data. The details about how to use, configure, and maintain the logs will be explained in the later chapters, when the logs will be used for practical purposes.

The most important concepts about caches and security were reviewed. More details will be discussed in the chapters entirely dedicated to these topics.

The INFORMATION_SCHEMA and PERFORMANCE_SCHEMA databases contain a lot of useful information. This book does not explain all the tables in detail, because exhaustive information about them can be found in the MariaDB KB. However, in the later chapters, some of these tables will be explained and used to demonstrate how they can help for practical purposes.

Since MariaDB is a fork of MySQL, the compatibility between different versions of MariaDB and MySQL was discussed. This topic is important while gradually replacing MySQL servers with MariaDB, while using a replication environment where MariaDB and MySQL coexist, or while developing an application that must work with both DBMSs (particularly if the developers want to take advantage of MariaDB's unique features).

Finally, this chapter presented a list of the most useful MariaDB resources. All advanced users should check them regularly to keep themselves updated with new releases and never stop learning!

In the next chapter, we will learn how to use some logs to find the errors that occurred in MariaDB, and how to debug SQL statements. This information is important for troubleshooting, and we will use it to deal with the more complex topics that will be discussed in the following chapters.

About the Author

Federico Razzoli

Federico Razzoli is a software developer, database consultant, and free software supporter. He has been working on websites and database applications since 2000 and used MySQL and other relational databases extensively during this period. He is now a MariaDB Ambassador.