MySQL Cluster Management : Part 2

There’s more than one way to achieve high availability for MySQL and this Cookbook covers a range of techniques and tools in over 60 practical recipes. The only book of its kind, you’ll be learning the natural, engaging way.

Replication between clusters with a backup channel

The previous recipe showed how to connect a MySQL Cluster to another MySQL server or another MySQL Cluster using a single replication channel. Obviously, this means that this replication channel has a single point of failure (if either of the two replication agents {machines} fail, the channel goes down).

If you are designing your disaster recovery plan to rely on MySQL Cluster replication, then you are likely to want more reliability than that. One simple thing that we can do is run multiple replication channels between two clusters. With this setup, in the event of a replication channel failing, a single command can be executed on one of the backup channel slaves to continue the channel.

It is not currently possible to automate this process (at least, not without scripting it yourself). The idea is that with a second channel ready and good monitoring of the primary channel, you can quickly bring up the replication channel in the case of failure, which means significantly less time spent with the replication channel down.

How to do it…

Setting up this process is not vastly different, however, it is vital to ensure that both channels are not running at any one time, or the data at the slave site will become a mess and the replication will stop. To guarantee this, the first step is to add the following to the mysqld section of /etc/my.cnf on all slave MySQL Servers (of which there are likely to be two):


Once added, restart mysqld. This my.cnf parameter prevents the MySQL Server from automatically starting the slave process. You should start one of the channels (normally, whichever channel you decide will be your master) normally, while following the steps in the previous recipe.

To configure the second slave, follow the instructions in the previous recipe, but stop just prior to the CHANGE MASTER TO step on the second (backup) slave.

If you configure two replication channels simultaneously (that is, forget to stop the existing replication channel when testing the backup), you will end up with a broken setup. Do not proceed to run CHANGE MASTER TO on the backup slave unless the primary channel is not operating.

As soon as the primary communication channel fails, you should execute the following command on any one of the SQL nodes in your slave (destination) cluster and record the result:

[slave] mysql> SELECT MAX(epoch) FROM mysql.ndb_apply_status;
| MAX(epoch) |
| 5952824672272 |
1 row in set (0.00 sec)

The previous highlighted number is the ID of the most recent global checkpoint, which is run every couple of seconds on all storage nodes in the master cluster and as a result, all the REDO logs are synced to disk. Checking this number on a SQL node in the slave cluster tells you what the last global checkpoint that made it to the slave cluster was.

You can run a similar command SELECT MAX(epoch) FROM mysql.ndb_binlog_index on any SQL node in the master (source) cluster to find out what the most recent global checkpoint on the master cluster is. Clearly, if your replication channel goes down, then these two numbers will diverge quickly.

Use this number (5952824672272 in our example) to find the correct logfile and position that you should connect to. You can do this by executing the following command on any SQL node in the master (source) cluster that you plan to make the new master, ensuring that you substitute the output of the previous command with the correct number as an epoch field as follows:

mysql> SELECT
-> File,
-> Position
-> FROM mysql.ndb_binlog_index
-> WHERE epoch > 5952824672272
-> ORDER BY epoch ASC LIMIT 1;
| File | Position |
| ./node2-bin.000003 | 200998 |
1 row in set (0.00 sec)

If this returns NULL, firstly, ensure that there is some activity in your cluster since the failure (if you are using batched updates, then there should be 32 KB of updates or more) and secondly, ensure that there is no active replication channel between the nodes (that is, ensure the primary channel has really failed).

Using the filename and position mentioned previously, run the following command on the backup slave:

It is critical that you run these commands on the correct node. The previous command, from which you get the filename and position, must be run on the new master (this is in the "source" cluster). The following command, which tells the new slave which master to connect to and its relevant position and filename, must be executed on the new slave (this is the "destination" cluster). While it is technically possible to connect the old slave to a new master or vice versa, this configuration is not recommended by MySQL and should not be used.

If all is okay, then the highlighted rows in the preceding output will show that the slave thread is running and waiting for the master to send an event.


MASTER_LOG_FILE='node2-bin.000003', MASTER_LOG_POS=200998;
Query OK, 0 rows affected (0.01 sec)
Query OK, 0 rows affected (0.00 sec)
mysql> show slave status\G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_User: slave
Master_Port: 3306
Relay_Master_Log_File: node2-bin.000003
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Seconds_Behind_Master: 233

After a while, the Seconds_Behind_Master value should return to 0 (if the primary replication channel has been down for some time or if the master cluster has a very high write rate, then this may take some time)

There's more…

It is possible to increase the performance of MySQL Cluster replication by enabling batched updates. This can be accomplished by starting slave mysqld processes with the slave-allow-batching option (or add the slave-allow-batching option line to the [mysqld] section in my.cnf). This has the effect of applying updates in 32 KB batches rather than as soon as they are received, which generally results in lower CPU usage and higher throughput (particularly when the mean update size is low).

See also

To know more about Replication Compatibility Between MySQL Versions visit:

User-defined partitioning

MySQL Cluster vertically partitions data, based on the primary key, unless you configure it otherwise. The main aim of user-defined partitioning is to increase performance by grouping data likely to be involved in common queries onto a single node, thus reducing network traffic between nodes while satisfying queries. In this recipe, we will show how to define our own partitioning functions.

If the NoOfReplicas in the global cluster configuration file is equal to the number of storage nodes, then each storage node contains a complete copy of the cluster data and there is no partitioning involved. Partitioning is only involved when there are more storage nodes than replicas.

Getting ready

Look at the City table in the world dataset; there are two integer fields (ID and Population). MySQL Cluster will choose ID as the default partitioning scheme as follows:

mysql> desc City;
| Field | Type | Null | Key | Default | Extra |
| ID | int(11) | NO | PRI | NULL | auto_increment |
| Name | char(35) | NO | | | |
| CountryCode | char(3) | NO | | | |
| District | char(20) | NO | | | |
| Population | int(11) | NO | | 0 | |
5 rows in set (0.00 sec)

Therefore, a query that searches for a specific ID will use only one partition. In the following example, partition p3 is used:

mysql> explain partitions select * from City where ID=1;
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | City | p3 | const | PRIMARY | PRIMARY | 4 | const | 1 | |
1 row in set (0.00 sec)

However, searching for a Population involves searching all partitions as follows:

mysql> explain partitions select * from City where Population=42;
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | City | p0,p1,p2,p3 | ALL | NULL | NULL | NULL | NULL | 4079 | Using where with pushed condition |
1 row in set (0.01 sec)

The first thing to do when considering user-defined partitioning is to decide if you can improve on the default partitioning scheme. In this case, if your application makes a lot of queries against this table specifying the City ID, it is unlikely that you can improve performance with user-defined partitioning. However, in case it makes a lot of queries by the Population and ID fields, it is likely that you can improve performance by switching the partitioning function from a hash of the primary key to a hash of the primary key and the Population field.

How to do it...

In this example, we are going to add the field Population to the partitioning function used by MySQL Cluster.

We will add this field to the primary key rather than solely using this field. This is because the City table has an auto-increment field on the ID field, and in MySQL Cluster, an auto-increment field must be part of the primary key.

Firstly, modify the primary key in the table to add the field that we will use to partition the table by:

Query OK, 4079 rows affected (2.61 sec)
Records: 4079 Duplicates: 0 Warnings: 0

Now, tell MySQL Cluster to use the Population field as a partitioning function as follows:

mysql> ALTER TABLE City partition by key (Population);
Query OK, 4079 rows affected (2.84 sec)
Records: 4079 Duplicates: 0 Warnings: 0

Now, verify that queries executed against this table only use one partition as follows:

mysql> explain partitions select * from City where Population=42;
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | City | p3 | ALL | NULL | NULL | NULL | NULL | 4079 | Using where with pushed condition |
1 row in set (0.01 sec)

Now, notice that queries against the old partitioning function, ID, use all partitions as follows:

mysql> explain partitions select * from City where ID=1;
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | City | p0,p1,p2,p3 | ref | PRIMARY | PRIMARY | 4 | const | 10 | |
1 row in set (0.00 sec)

Congratulations! You have now set up user-defined partitioning. Now, benchmark your application to see if you have gained an increase in performance.

There's more...

User-defined partitioning can be particularly useful where you have multiple tables and a join. For example, if you had a table of Areas within Cities consisting of an ID field (primary key, auto increment, and default partitioning field) and then a City ID, you would likely find an enormous number of queries that select all of the locations within a certain city and also select the relevant city row. It would therefore make sense to keep:

  • all of the rows with the same City value inside the Areas table together on one node
  • each of these groups of City values inside the Areas table on the same node as the relevant City row in the City table

This can be achieved by configuring both tables to use the City field as a partitioning function, as described earlier in the Population field.

Disk-based tables

It is possible to configure the data nodes in a MySQL Cluster to store most of their data on disk rather than in RAM. This can be useful where the amount of data to be stored is impossible to store in RAM (for example, due to financial constraints). However, disk-based tables clearly have significantly reduced performance as compared to memory tables.

Disk-based tables still store columns with indexes in RAM. Only columns without indexes are stored on disk. This can result in a large RAM requirement even for disk-based tables.

Getting ready

To configure disk-based tables, data nodes should have spare space on a high performance block device.

To configure disk-based tables, we must configure each data node with a set of two files as follows:

  • TABLESPACES—disk-based tables store their data in TABLESPACES, which are made up of one or more data files
  • Logfile groups—disk-based tables store their ndb data in a logfile group made up of one or more undo logfiles

Disk-based tables do not support variable length fields—these fields are stored as fixed-width fields (for example, VARCHAR(100) is stored as CHAR(100). This means that a disk-based NDB table that uses lots of variable-width fields will take up significantly more space than it would as compared to either an NDB in-memory table or a non-clustered storage engine format.

How to do it...

Firstly, check that you have sufficient storage on your storage nodes using a command such as df as follows:

[root@node1 ~]# df -h | grep mysql-cluster
2.0G 165M 1.8G 9% /var/lib/mysql-cluster
2.0G 68M 1.9G 4% /var/lib/mysql-cluster/BACKUPS

In this example, there is 1.8G space available in the Data Directory. For this example, using a small amount of test data, this is sufficient

Create a log file and undo file:

Query OK, 0 rows affected (4.99 sec)

These files are created, by default, in the subfolder ndb_nodeid_fs in DataDir on each storage node. However, it is possible to pass an absolute path to force the undo file (previous one) and data file (next step) to be created on another filesystem or use symbolic links. You can also specify an UNDO log size. See the There's more… section for an example.

Now, create a TABLESPACE using the CREATE TABLESPACE SQL command (you can execute this on any SQL node in the cluster):

mysql> CREATE TABLESPACE world_ts ADD DATAFILE 'world_data.dat' USE LOGFILE GROUP world_log INITIAL_SIZE=500M 

Query OK, 0 rows affected (8.80 sec)

Now, you can create disk-based tables as follows:

mysql> CREATE TABLE `City` (
-> `ID` int(11) NOT NULL auto_increment,
-> `Name` char(35) NOT NULL default '',
-> `CountryCode` char(3) NOT NULL default '',
-> `District` char(20) NOT NULL default '',
-> `Population` int(11) NOT NULL default '0',
-> )
Query OK, 0 rows affected (2.06 sec)

Note that in this example, the ID field will still be stored in memory (due to the primary key).

How it works...

Disk-based tables are stored in fixed-width fields with 4-byte aligned. You can view the files (both the tablespace and logfile group): If you want to view the logfiles, then the following query shows the active logfiles and their parameters:

| world_log | 25 | CLUSTER_NODE=2;UNDO_BUFFER_SIZE=8388608 |
| world_log | 25 | CLUSTER_NODE=3;UNDO_BUFFER_SIZE=8388608 |
| world_log | 25 | UNDO_BUFFER_SIZE=8388608 |
3 rows in set (0.00 sec)

If you want to view the data files, then execute the following query that shows you each data file, its size, and its free capacity:

mysql> SELECT
-> (TOTAL_EXTENTS * EXTENT_SIZE)/(1024*1024) AS 'Total MB',
-> (FREE_EXTENTS * EXTENT_SIZE)/(1024*1024) AS 'Free MB',
| FILE_NAME | Total MB | Free MB | EXTRA |
| world_undo.dat | 200.0000 | NULL | CLUSTER_NODE=2;UNDO_BUFFER_SIZE=8388608 |
| world_undo.dat | 200.0000 | NULL | CLUSTER_NODE=3;UNDO_BUFFER_SIZE=8388608 |
| NULL | NULL | 199.8711 | UNDO_BUFFER_SIZE=8388608 |
3 rows in set (0.00 sec)

This shows that 199.87 MB is unused in this data file, and the file exists on two storage nodes. Note that all data on disk is stored in fixed-width columns, 4-byte aligned. This can result in significantly larger data files than you may expect. You can estimate the disk storage required using the methods in the Calculating DataMemory and IndexMemory recipe later in this article.

There's more...

The CREATE LOGFILE GROUP command can have a custom UNDO buffer size passed to it. A larger UNDO_BUFFER_SIZE will result in higher performance, but the parameter is limited by the amount of system memory available (that is free).

To use this command, add the UNDO_BUFFER_SIZE parameter to the command:


Query OK, 0 rows affected (4.99 sec)

An existing data file may be removed by executing an ALTER TABLESPACE DROP DATAFILE command as follows:

Query OK, 0 rows affected (0.47 sec)

To delete a tablespace, use the DROP TABLESPACE statement:

Query OK, 0 rows affected (0.51 sec)

In the event that the tablespace is still used, you will get a slightly cryptic error. Before dropping a tablespace, you must remove any data files associated with it.

ERROR 1529 (HY000): Failed to drop TABLESPACE
| Level | Code | Message |
| Error | 1296 | Got error 768 'Cant drop filegroup, filegroup is used' from NDB |
| Error | 1529 | Failed to drop TABLESPACE |
2 rows in set (0.00 sec)

The performance of a MySQL Cluster that uses disk data storage can be improved significantly by placing the tablespace and logfile group on separate block devices. One way to do this is to pass absolute paths to the commands that create these files, while another is symbolic links in the data directory.

Using symbolic links create the following two symbolic links on each storage node, assuming that you have disk2 and disk3 mounted in /mnt/, substituting &ltNODEID> for the correct value as follows:

[root@node1 mysql-cluster]# ln -s /mnt/disk1 /var/lib/mysql-cluster/ndb_<NODEID>_fs/logs
[root@node1 mysql-cluster]# ln -s /mnt/disk2 /var/lib/mysql-cluster/ndb_<NODEID>_fs/data

Now, create the logfile group and tablespace inside these directories as follows:

Query OK, 0 rows affected (4.99 sec)
mysql> CREATE TABLESPACE world_ts ADD DATAFILE 'data/world_data.dat' USE LOGFILE GROUP world_log INITIAL_SIZE=500M

Query OK, 0 rows affected (8.80 sec)

You should note that performance is significantly improved as data files I/O operations will be on a different block device to the logs. If given the choice of different specification block devices, it is generally wiser to give the highest performance to the device hosting the UNDO log.

Calculating DataMemory and IndexMemory

Before a migration to a MySQL Cluster, it is likely that you will want to be sure that the resources available are sufficient to handle the proposed cluster. Generally, MySQL Clusters are more memory intensive than anything else, and this recipe explains how you can estimate your memory usage in advance.

The script that is used in this recipe,, is provided by MySQL Cluster in a cluster binary. In the See also section, an alternative and more accurate tool is mentioned. is excellent for estimates, but it is worth remembering that it is only an estimate based on, sometimes inaccurate, assumptions.

Getting ready

This recipe demonstrates how to estimate, from a table scheme or an existing non-clustered table, the memory-usage of that table in the NDB (MySQL Cluster) storage engine. We will use a script,, provided in the MySQL-Cluster-gpl-tools package.

To use this script, you will require the following:

  • A working installation of Perl.
  • The Perl DBI module (this can be installed with yum install perl-DBI, if the EPEL yum repository is installed, see Appendix A, Base Installation).
  • The Perl DBD::MySQL module. This does exist in the EPEL repository, but will not install if you have installed the cluster specific mysql RPM. See There's more... for instructions on how to install this on a clean install of RHEL5 with the storage node RPMs installed.
  • The perl-Class-MethodMaker package (yum install perl-Class-MethodMaker)
  • The tables that you wish to examine that are imported into a MySQL server to which you have access (this can be done using any storage engine).
  • A running MySQL server. The server instance does not require to provide support for MySQL Cluster as we are running this script on MyISAM and InnoDB tables before they have been converted.

How to do it...

In this example, we will run against the world database and go through the global output and the output for the City table.

Firstly, run the script with a username and password as follows:

[root@node1 ~]# world --user=root --password=secret --format=text

The script then confirms that it is running for the world database on the local host and includes information for MySQL Cluster 4.1, 5, and 5.1.

MySQL Cluster differs enormously between versions in the amount of DataMemory and IndexMemory used (in general, getting significantly more efficient with each release). In this recipe, we will only look at the output for version 5.1. It is the closest to MySQL Cluster version 7, which is the current version. report for database: 'world' (3 tables)
Connected to: DBI:mysql:host=localhost
Including information for versions: 4.1, 5.0, 5.1

There is now some output for some other tables (if you imported the whole world dataset), which is skipped as it is identical to the output for the City table.

The first part of the output of the City table shows the DataMemory required for each column (showing the number of bytes per row), ending with a summary of the memory requirement for both fixed-and variable-width columns (there are no variable-width columns in this table):

DataMemory for Columns (* means varsized DataMemory):
Column Name Type Varsized Key 4.1 5.0 5.1
ID int(11) PRI
4 4 4
District char(20) 20 20 20
Name char(35) 36 36 36
CountryCode char(3) 4 4 4
Population int(11) 4 4 4
-- -- --
Fixed Size Columns DM/Row
68 68 68
Varsize Columns DM/Row 0 0 0

So, this table has approximately 68 bytes DataMemory requirement per row. The next part of the output shows how much DataMemory is required for indexes. In this case, there is none because the only index is a primary key (which is stored in IndexMemory) as follows:

DataMemory for Indexes:
Index Name Type 4.1 5.0 5.1
-- -- --
Total Index DM/Row 0 0 0

The next part of the output shows the IndexMemory requirement per index as follows:

IndexMemory for Indexes:
Index Name 4.1 5.0 5.1
PRIMARY 29 16 16
-- -- --
Indexes IM/Row 29 16 16

Therefore, we can see that we require 16 bytes of IndexMemory per row.

The per-table output of concludes with a summary of total memory usage, and we can see the overall IndexMemory and DataMemory requirement for this table under MySQL Cluster 5.1:

Summary (for THIS table):
4.1 5.0 5.1
Fixed Overhead DM/Row 12 12 16
NULL Bytes/Row 0 0 0
DataMemory/Row 80 80 84 (Includes overhead, bitmap and indexes)
Varsize Overhead DM/Row 0 0 8
Varsize NULL Bytes/Row 0 0 0
Avg Varside DM/Row 0 0 0
No. Rows 4079 4079 4079
Rows/32kb DM Page 408 408 388
Fixedsize DataMemory (KB) 320 320 352
Rows/32kb Varsize DM Page 0 0 0
Varsize DataMemory (KB) 0 0 0
Rows/8kb IM Page 282 512 512
IndexMemory (KB) 120 64 64

The final part of the output aggregates all of the tables examined by the scripts and produces configuration parameter recommendations:

Parameter Minimum Requirements
* indicates greater than default
Parameter Default 4.1 5.0 5.1
DataMemory (KB) 81920 480 480 512
NoOfOrderedIndexes 128 3 3 3
NoOfTables 128 3 3 3
IndexMemory (KB) 18432 192 88 88
NoOfUniqueHashIndexes 64 0 0 0
NoOfAttributes 1000 24 24 24
NoOfTriggers 768 15 15 15

Remember that:

  • These parameters are only estimates
  • It is a very bad idea to run a cluster close to its limits on any of these parameters
  • This output does not include any temporary tables that may be created
  • However, at the same time, this output is useful to get a low end estimate of usage

There's more...

In this section, we explain in greater detail how to install the DBD::mysql Perl module and a couple of other options that can be passed to The easiest way to install DBD::mysql is from MCPAN with these commands:

  • Firstly, install a compiler as follows:
  • [root@node1 ~]# yum install gcc
  • Now, download the MySQL Cluster devel package as follows:
  • [root@node1 ~]# wget
  • Install the RPM as follows:
  • [root@node1 ~]# rpm -ivh MySQL-Cluster-gpl-devel-7.0.6-0.rhel5.
  • Create a database and add a user for the DBD::mysql module to use to test as follows:
  • mysql> create database test;
    Query OK, 1 row affected (0.21 sec)
    mysql> grant all privileges on test.* to 'root'@'localhost'
    identified by 's3kr1t';
    Query OK, 0 rows affected (0.00 sec)
  • Now, install the DBD::mysql Perl module from CPAN as follows
  • [root@node1 ~]# perl -MCPAN -e 'install DBD::mysql'

If this is the first time you have run this command, then you will have to first answer some questions (defaults are fine) and select your location to choose a mirror.

The following additional options can be passed to



--database=<db name>

ALL may be specified to examine all databases


Designate a specific host and port (defaults to localhost on port 3306)


Create either text or HTML output


Comma-separated list of table names to skip


Comma-separated list of database names to skip

See also

sizer— sizer is more accurate than because sizer calculates:

  • Correct record overheads
  • Cost for unique indexes
  • Averages storage costs for VAR* columns (user specified by either estimation (loadfactor) or actual data)
  • Cost for BLOB / TEXT

sizer is marginally more complicated to use and involves a couple of steps, but can sometimes be useful if accuracy is vital.


In this article series, we have learned:

  • Replication between MySQL Clusters with a backup channel
  • User-defined partitioning
  • Disk-based tables
  • Calculating DataMemory and IndexMemory


If you have read this article you may be interested to view :

Books to Consider

comments powered by Disqus

An Introduction to 3D Printing

Explore the future of manufacturing and design  - read our guide to 3d printing for free