Optimizing your MySQL Servers' performance using Indexes

Exclusive offer: get 50% off this eBook here
MySQL Admin Cookbook

MySQL Admin Cookbook — Save 50%

99 great recipes for mastering MySQL configuration and administration

$29.99    $15.00
by Daniel Schneller Udo Schwedt | June 2010 | MySQL Web Development

In this article series by Daniel Schneller and Udo Schwedt, authors of MySQL Admin Cookbook, we will cover:

  • Adding indexes to tables
  • Adding a fulltext index
  • Creating a normalized text search column
  • Removing indexes from tables
  • Estimating InnoDB index space requirements
  • Using prefix primary keys
  • Choosing InnoDB primary key columns
  • Speeding up searches for (sub)domains
  • Finding duplicate indexes

Introduction

One of the most important features of relational database management systems—MySQL being no exception—is the use of indexes to allow rapid and efficient access to the enormous amounts of data they keep safe for us. In this article, we will provide some useful recipes for you to get the most out of your databases.

Infinite storage, infinite expectations

We have got accustomed to nearly infinite storage space at our disposal—storing everything from music to movies to high resolution medical imagery, detailed geographical information,or just plain old business data. While we take it for granted that we hardly ever run out of space, we also expect to be able to locate and retrieve every bit of information we save in an instant. There are examples everywhere in our lives—business and personal:

  • Your pocket music player's library can easily contain tens of thousands of songs and yet can be browsed effortlessly by artist name or album title, or show you last week's top 10 rock songs.
  • Search engines provide thousands of results in milliseconds for any arbitrary search term or combination.
  • A line of business application can render your sales numbers charted and displayed on a map, grouped by sales district in real-time.

These are a few simple examples, yet for each of them huge amounts of data must be combed to quickly provide just the right subset to satisfy each request. Even with the immense speed of modern hardware, this is not a trivial task to do and requires some clever techniques.

Speed by redundancy

Indexes are based on the principle that searching in sorted data sets is way faster than searching in unsorted collections of records. So when MySQL is told to create an index on one or more columns, it copies these columns' contents and stores them in a sorted manner. The remaining columns are replaced by a reference to the original table with the unsorted data.

This combines two benefits—providing fast retrieval while maintaining reasonably efficient storage requirements. So, without wasting too much space this approach enables you to create several of those indexes (or indices, both are correct) at a relatively low cost.

However, there is a drawback to this as well: while reading data, indexes allow for immense speeds, especially in large databases; however, they do slow down writing operations. In the course of INSERTs, UPDATEs, and DELETEs, all indexes need to be updated in addition to the data table itself. This can place significant additional load on the server, slowing down all operations.

For this reason, keeping the number of indexes as low as possible is paramount, especially for the largest tables where they are most important. In this article, you'll find some recipes that will help you to decide how to define indexes and show you some pitfalls to avoid.

Storage engine differences

We will not go into much detail here about the differences between the MyISAM and the InnoDB storage engines offered by MySQL. However, regarding indexes there are some important differences to know between MySQL's two most important storage engines. They influence some decisions you will have to make.

MyISAM

In the figure below you can see a simplified schema of how indexes work with the MyISAM storage engine. Their most important property can be summed up as "all indexes are created equal". This means that there is no technical difference between the primary and secondary keys.

The diagram shows a single (theoretical) data table called books. It has three columns named isbn, title, and author. This is a very simple schema, but it is sufficient for explanation purposes. The exact definition can be found in the Adding indexes to tables recipe in this article. For now, it is not important.

MyISAM tables store information in the order it is inserted. In the example, there are three records representing a single book each. The ISBN number is declared as the primary key for this table. As you can see, the records are not ordered in the table itself—the ISBN numbers are out of what would be their lexical order. Let's assume they have been inserted by someone in this order.

Now, have a look at the first index—the PRIMARY KEY. The index is sorted by the isbn column. Associated with each index entry is a row pointer that leads to the actual data record in the books table. When looking up a specific ISBN number in the primary key index, the database server follows the row pointer to retrieve the remaining data fields. The same holds true for the other two indexes IDX_TITLE and IDX_AUTHOR, which are sorted by the respective fields and also contain a row pointer each.

Looking up a book's details by any one of the three possible search criteria is a two-part operation: first, find the index record, and then follow the row pointer to get the rest of the data.

With this technique you can insert data very quickly because the actual data records are simply appended to the table. Only the relatively small index records need to be kept in order, meaning much less data has to be shuffled around on the disk.

There are drawbacks to this approach as well. Even in cases where you only ever want to look up data by a single search column, there will be two accesses to the storage subsystem—one for the index, another for the data.

InnoDB

However, InnoDB is different. Its index system is a little more complicated, but it has some advantages:

Primary (clustered) indexes

Whereas in MyISAM all indexes are structured identically, InnoDB makes a distinction between the primary key and additional secondary ones.

The primary index in InnoDB is a clustered index. This means that one or more columns of each record make up a unique key that identifies this exact record. In contrast to other indexes, a clustered index's main property is that it itself is part of the data instead of being stored in a different location. Both data and index are clustered together.

An index is only serving its purpose if it is stored in a sorted fashion. As a result, whenever you insert data or modify the key column(s), it needs to be put in the correct location according to the sort order. For a clustered index, the whole record with all its data has to be relocated.

That is why bulk data insertion into InnoDB tables is best performed in correct primary key order to minimize the amount of disk I/O needed to keep the records in index order. Moreover, the clustered index should be defined so that it is hardly ever changed for existing rows, as that too would mean relocating full records to different sectors on the disk.

Of course, there are significant advantages to this approach. One of the most important aspects of a clustered key is that it actually is a part of the data. This means that when accessing data through a primary key lookup, there is no need for a two-part operation as with MyISAM indexes. The location of the index is at the same time the location of the data itself—there is no need for following a row pointer to get the rest of the column data, saving an expensive disk access.

Secondary indexes

Consider if you were to search for a book by title to find out the ISBN number. An index on the name column is required to prevent the database from scanning through the whole (ISBN-sorted) table. In contrast to MyISAM, the InnoDB storage engine creates secondary indexes differently.

Instead of record pointers, it uses a copy of the whole primary key for each record to establish the connection to the actual data contents.

In the previous figure, have a look at the IDX_TITLE index. Instead of a simple pointer to the corresponding record in the data table, you can see the ISBN number duplicated as well. This is because the isbn column is the primary key of the books table. The same goes for the other indexes in the figure—they all contain the book ISBN number as well. You do not need to (and should not) specify this yourself when creating and indexing on InnoDB tables, it all happens automatically under the covers.

Lookups by secondary index are similar to MyISAM index lookups. In the first step, the index record that matches your search term is located. Then secondly, the remaining data is retrieved from the data table by means of another access—this time by primary key.

As you might have figured, the second access is optional, depending on what information you request in your query. Consider a query looking for the ISBN numbers of all known issues of Moby Dick:

SELECT isbn FROM books WHERE title LIKE 'Moby Dick%';

Issued against a presumably large library database, it will most certainly result in an index lookup on the IDX_TITLE key. Once the index records are found, there is no need for another lookup to the actual data pages on disk because the ISBN number is already present in the index. Even though you cannot see the column in the index definition, MySQL will skip the second seek saving valuable I/O operations.

But there is a drawback to this as well. MyISAM's row pointers are comparatively small. The primary key of an InnoDB table can be much bigger—the longer the key, the more the data that is stored redundantly.

In the end, it can often be quite difficult to decide on the optimal balance between increased space requirements and maintenance costs on index updates. But do not worry; we are going to provide help on that in this article as well.

General requirements for the recipes in this article

All the recipes in this article revolve around changing the database schema. In order to add indexes or remove them, you will need access to a user account that has an effective INDEX privilege or the ALTER privilege on the tables you are going to modify.

While the INDEX privilege allows for use of the CREATE INDEX command, ALTER is required for the ALTER TABLE ADD INDEX syntax. The MySQL manual states that the former is mapped to the latter automatically. However, an important difference exists: CREATE INDEX can only be used to add a single index at a time, while ALTER TABLE ADD INDEX can be used to add more than one index to a table in a single go.

This is especially relevant for InnoDB tables because up to MySQL version 5.1 every change to the definition of a table internally performs a copy of the whole table. While for small databases this might not be of any concern, it quickly becomes infeasible for large tables due to the high load copying may put on the server. With more recent versions this might have changed, but make sure to consult your version's manual.

In the recipes throughout this article, we will consistently use the ALTER TABLE ADD INDEX syntax to modify tables, assuming you have the appropriate privileges. If you do not, you will have to rewrite the statements to use the CREATE INDEX syntax.

Adding indexes to tables

Over time requirements for a software product usually change and affect the underlying database as well. Often the need for new types of queries arises, which makes it necessary to add one or more new indexes to perform these new queries fast enough.

In this recipe, we will add two new indexes to an existing table called books in the library schema. One will cover the author column, the other the title column. The schema and table can be created like this:

mysql> CREATE DATABASE library;
mysql> USE library;
mysql> CREATE TABLE books (
isbn char(13) NOT NULL,
author varchar(64) default NULL,
title varchar(64) NOT NULL,
PRIMARY KEY (isbn)
) ENGINE=InnoDB;

Getting ready

Connect to the database server with your administrative account.

MySQL Admin Cookbook 99 great recipes for mastering MySQL configuration and administration
Published: March 2010
eBook Price: $29.99
Book Price: $49.99
See more
Select your format and quantity:

How to do it...

  1. Change the default database to library
    USE library;
  2. Create both indexes in one go using the following command:
    ALTER TABLE books ADD INDEX IDX_author(author), ADD INDEX IDX_ title(title);

How it works...

The ALTER table statement shown above is almost self-explanatory. The books table is altered to be indexed with individual indexes on the author and the title columns. Each index is given an easily recognizable name: IDX_author and IDX_title for the author and title columns respectively.

Index names are helpful when you later decide to remove an index from a table. Instead of listing all the columns again, you can just refer to the index name.

Index names
It is very common to name indexes with some sort of prefix like IDX_ and then append the column name(s) the index spans.This is not strictly necessary and you might want to establish a different naming scheme. Whatever you choose, make sure you follow your scheme and assign names consistent with it for all your indexes.

There's more...

There are some more details worth knowing about when creating indexes on any given table.

Using MySQL Query Browser to generate the SQL statements

Setting up indexes can be done either through a command line as shown earlier or using an arguably more comfortable graphical tool like MySQL Query Browser.Especially when dealing with more complex table setups, the graphical presentation can provide additional clarity. Before applying any changes to your database, the product will display and allow you to copy or save the full SQL statement(s) that are equivalent to the changes you made in the graphical editor.

This is very convenient because this way you can be sure not to make any mistakes concerning statement syntax, table, or column names. We usually make changes using MySQL Query Browser on a development or testing machine just to grab the SQL statements and put them into SQL update script files for later execution, for example, as a part of our software update routine. The following figure shows what the changes made in this exampleook like. Note that the generated statements contain all table and column names in backticks.This is generally not required as long as those identifiers do not collide with MySQL keywords—something you should avoid anyway. Also, the statements will be fully qualified, which means the database name is put before the table name. This is also not strictly required if you set the default database to the right schema beforehand.

Prefix indexes

In the example above, we created an index with default settings. This will create an index that is usually "just right". You may, however, have special requirements or possess knowledge about the table data that cannot be derived from the schema definition alone, making a custom index a better choice than the default one.

The detail most often changed in an index definition is the length of the index fields. MySQL provides support for so-called prefix indexes. As the database does not know about the nature of the contents that are going to be stored in any particular column apart from the data type, it has no choice but to take the safe route and consider the full length of the column in its sorted index copy.

For long columns in large tables, it can be a waste of space to copy the complete column values to the index, which in turn can have negative impact on performance just because there's more data involved.

You can aid the database to work more efficiently with your domain knowledge. In the books example table the title can be up to 64 characters long. However, it is very unlikely that there will be a lot of books whose titles start alike and only differ in the last few characters. So, having the index cover the maximum length is probably not necessary for quick lookups. By changing the index creation statement to include a prefix length (say 20 characters) for the column to be indexed, you can tell MySQL to only copy the first 20 characters of the title to the index:

ALTER TABLE books ADD INDEX IDX_title(title(20));

As a result, the index will use less space—in terms of both disk usage and memory when used for queries. As long as the book title differs within the first 20 characters, this index will be more efficient than one covering the full column.

Even when there is a certain number of titles that are identical within this 20 character prefix, the index will still be useful. This is because as long as MySQL can rule out all but a few records, having to look at the actual table data for the final decision as to which rows fulfill the query conditions is still significantly faster than having to scan the whole table with all books.

Unfortunately, there is no easy-to-use formula to determine the ideal prefix length because it heavily depends on the actual data content. This is why by default the whole column is indexed.

Prefix primary keys

Most documentation on indexing in some way or another covers the topic of prefix indexes for text type columns, using only a portion at the beginning of column contents instead of the whole values for the actual index.

However, often this topic is presented in a way that might suggest this only works for secondary keys; but that is not true. You can also use a prefix primary key, as long as the most important requirement of a primary key is not violated: the uniqueness of each key value must be guaranteed.

Adding a fulltext index

Indexes are an important means of making sure a database performs well and responds quickly when queried. However, they can only live up to their full potential when applied to well-structured data. Unfortunately, not all information we would like to query can be made to fit into regular relational database tables and columns.

A prime example of this is free text. It does not follow any well-defined structure and does not lend itself to the principle by which regular indexes work. For example, a fulltext index allows querying for search terms no matter where in the indexed column they occur and not only at the beginning of the column as would be the case with normal indexes.

Fulltext indexes require you to use a special syntax to express your queries. Querying with the LIKE keyword will not be accelerated by a fulltext index. In this recipe you will learn how to create a fulltext index on an existing database column. For the purpose of this example, we assume a fictional forum database with a posts table that in turn has a content column storing the actual text of a forum entry.

InnoDB tables do not support fulltext indexing. This feature is only available for tables using the MyISAM storage engine.

Getting ready

Connect to the database using your administrative account.

How to do it...

  1. Change the default database to forum:
    USE forum;
  2. Create the fulltext index using the following command:
    ALTER TABLE posts ADD FULLTEXT INDEX IDX_content(content);

How it works...

While regular indexes create ordered copies of the relevant columns to enable quick lookups, fulltext indexes are a more complicated matter.


Dropping and recreating fulltext indexes for bulk data imports

When (first) inserting bulk data into a table, it is faster to first drop an existing fulltext index and then later recreate it. This will speed up the data insertion significantly because keeping the fulltext index up to date during data insert is an expensive operation.

There's more...

Here are some details that are important to know when including fulltext indexing in your applications.

Please be aware that changes to any of the parameters that follow require arebuild of any fulltext index that was created before the change!See the MySQL online manual at http://dev.mysql.com/doc/refman/5.1/en/fulltext-fine-tuning.html for more details.

Case sensitivity

Fulltext index queries are usually run in a case-insensitive way. If you need case-sensitive fulltext search, you will have to change the collation of the affected underlying columns to a binary variant.

Word length

When a fulltext index is created, only words within a configurable range of lengths are considered. This helps to prevent acronyms and abbreviations being included in the index. You can configure the acceptable length range using the ft_min_word_len and ft_max_word_len variables. The default value for the minimum length is 4 characters.

Stopwords

In every language, there are many words that are usually not wanted in fulltext search matching. These so called stopwords might be "is, a, be, there, because, done" among others. They appear so frequently in most texts that searching for them is hardly useful. To conserve resources, these stopwords are ignored when building a fulltext index. MySQL uses a default stopword list that defines what is to be ignored, which contains a list of about 550 English stopwords. You can change this list of stopwords with the ft_stopword_file variable. It takes a filename with a plain text file containing the stopwords you would like to use. Disabling stopwords can be achieved by setting this variable to an empty string.

Ignoring frequent words

Frequent words will be ignored: if a search term is present in more than half of the rows searched, it will be considered a stopword and effectively ignored. This is useful especially in large tables; otherwise you would get half of the table as query hits, which can hardly be considered useful.

When experimenting with fulltext search, make sure you have a reasonably large dataset to play with. Otherwise you will easily hit the 50 percent mark described above and not get any query results. This can be confusing and will make you think you did something wrong, while in fact everything is perfectly in order.

Query modes

Apart from the default human query mode you can use a boolean query mode, which enables special search-engine-like operators to be used—for example, the plus and minus signs to include or exclude words in the search.

This would allow you to use query terms such as '+apple -macintosh' to find all records containing the word apple, but not the word macintosh.

For all the possible operators, have a look at the MySQL online manual at http://dev.mysql.com/doc/refman/5.1/en/fulltext-boolean.html

Sphinx

MySQL's built-in fulltext search is only available for MyISAM tables. In particular, InnoDB is not supported. If you cannot or do not want to use MyISAM, have a look at Sphinx—an open source, free fulltext search engine that was designed to integrate nicely with MySQL. See http://sphinxsearch.com/ for more information.

Creating a normalized text search column

Usually, regular and fulltext indexing as supported by MySQL are sufficient for most use cases. There are, however, situations where they are not perfectly usable:

  • InnoDB tables cannot use fulltext indexes. At the time of writing there were no signs of this changing in the foreseeable future.
  • There are different ways to spell the search terms

Especially in non-English speaking countries, a problem often arises that does not surface as often in American or British environments. Words in the English language consist of the letters from A to Z without diacritics. From a software development perspective this is a welcome simplification because it allows for simpler implementations.

One problem you are often faced with German, for example, is different ways to spell the same word, making it complicated to formulate suitable search terms.

Consider the German words "Dübel" (dowel) and "Mörtel" (mortar). In a merchandise management database you might find several variants of similar products, but each could be spelled in different ways:

MySQL Admin Cookbook 99 great recipes for mastering MySQL configuration and administration
Published: March 2010
eBook Price: $29.99
Book Price: $49.99
See more
Select your format and quantity:



productID

name

stock

12352323

DÜBEL GROß 22

76

23982942

"Flacher-Einser" Mörtel

23

29885897

DÜBEL GROSS 4

44

83767742

Duebel Groß 68

31

As an end user of the corresponding application searching for those becomes cumbersome because to find exactly what you are looking for you might have to attempt several searches.

In this recipe, we will present an idea that needs some support on the application level but will allow you to use simple regular indexes to quickly search and find relevant records in situations like the above.

To implement the ideas presented in this recipe, modifications to the software accessing the database as well as the table definition will be necessary. We advise that this is a process that usually entails a higher complexity and increased testing efforts than simply adding an index.

Getting ready

To implement the ideas presented here, you will need to connect to the database server with your administrative account for the schema modifications. Because apart from the database modifications application program code changes will be necessary as well, you should contact an application developer.

In the example, we are going to assume a table definition as follows:

CREATE TABLE products (
productID int(11) NOT NULL,
name char(30) default NULL,
stock int(11) default NULL,
PRIMARY KEY (productID)
) ENGINE=InnoDB

How to do it...

  1. Connect to the database server using your administrative account and make test the default schema:
  2. use test;

  3. Add a new column norm_name to the products table:
    mysql ALTER TABLE products ADD COLUMN norm_name CHAR(90) AFTER name;

    The column needs to be at least as long as your original column. Depending on the character mapping rules you are going to implement, the projected values might takeup more space.

  4. Define an index on the new column. Make sure it is not set to UNIQUE:
    mysql ALTER TABLE products ADD INDEX IDX_norm_name (norm_name);
  5. Optionally, consider dropping an existing index on the original column. Also, consider modifying other indexes currently containing the original column to include the new one instead.
  6. Implement the replacement algorithm depending on your language. For German language substitutions, the following substitutions could be used. This is just an excerpt from the Transformers.java class you can find on the book's website.

    private static String[] replacements = {
    "ä", "ae", "null", "0", ":", "",
    "ö", "oe", "eins", "1", ":", "",
    "ü", "ue", "zwei", "2", ".", "",
    "ß", "ss", /* ... */ "-", "",
    " ", "", "neun", "9", ",", "",
    // ... further replacements...
    };
  7. Modify your application code to use the new mapping function and issue queries against the new norm_name column where previously the original name column was used. Depending on how you decide to expose the search features to your end users, you might want to make searching the new or the old column an option.
  8. Modify your application code to update the new column parallel to the original one. Depending on the application scenario, you might decide to only update the normalized search column periodically instead.
  9. Before handing out a new version of your software containing the new code, make sure the normalized search column gets populated with the correct values.
  10. Optionally, declare the new column NOT NULL, after it has been initially filled.

How it works...

By implementing the mapping algorithm, we make the application think about the different ways to spell things, not the end user. Instead of creating all possible variants, which could become a large set of permutations depending on the length and content of the original input, we project the search terms to a normalized form for both the original data and later for queries issued against it. As both use the same mapping functions, only a single—index supported—query against MySQL is needed. The application of course usually never reveals these internals. The person in front of the computer will just be pleased to find the desired records easily.

The mapping rules from input to search terms depend on the language and applicationspecific needs. For German words, they are rather short—only the umlaut characters need to be transformed to a normalized form. Other languages might require more complex rules.

In the example code earlier, we also transform the input to lowercase and remove several special characters like dashes and colons, and also the whitespace. For the sample data set of products we used, this is the result of the transformation:



productID

Name

name_nrm

stock

12352323

DÜBEL GROß 22

duebelgross22

76

23982942

"Flacher-Einser" Mörtel

flacher1ermoertel

23

29885897

DÜBEL GROSS 4

duebelgross4

44

83767742

Duebel Groß 68

duebelgross68

31

Now instead of querying the original data column, we ask the database to search for the transformed representation of the search terms in the additional norm_name (normalized) column. For this it can use regular indexes and provide results quickly and efficiently.

Note that the Transformer.java code available from the book's website is nowhere near production quality but only serves for demonstration purposes. It does not, for example, contain any error checking or exception handling and the mapping algorithm is very simple, too.

There is more...

If you do not care about international specialties but still want to improve user experience by allowing for less strict searches, you might want to have a look at the SOUNDEX() function. It is designed to work for English language words only and allows you to query for results that sound like the search terms.

However, note that the results of using it may not be what you expect—opinions on the Internet range from extreme enthusiasm to complete disappointment. You can find its documentation at http://dev.mysql.com/doc/refman/5.1/en/string-functions.html#function_soundex.html>http://dev.mysql.com/doc/refman/5.1/en/string-functions.html#function_soundex

Removing indexes from tables

Once-useful indexes may become obsolete as requirements change with the evolving database. In this article, we will show you how to get rid of the IDX_author index created in the Adding indexes to tables recipe.

Getting ready

Connect to the database server with your administrative account.

How to do it...

  1. Change the default database to library:
    USE library;
  2. Drop the IDX_author index using the following command:
    ALTER TABLE books DROP INDEX IDX_author;

How it works...

Using the ALTER TABLE statement, we tell the database that we want to remove (DROP) the index named IDX_author from the books table.

There's more...

As with the creation of new indexes, you can drop multiple indexes at once using the ALTER TABLE statement by simply adding more DROP INDEX clauses, separated by comma. If you were to delete both indexes defined in Adding indexes to tables, you could use this statement

ALTER TABLE books DROP INDEX IDX_author, DROP INDEX IDX_title;

Estimating InnoDB index space requirements

While indexes might very well be the single most important key in database performancetuning, they come at the price of redundancy.

There are two main disadvantages tightly connected to redundant data storage:

  • The danger of inconsistencies between the redundant copies of data that should be at all times identical.
  • Increased storage and memory consumption because the same data is physically duplicated.

Fortunately, the former is a non-issue with indexes. As the database server takes care of keeping data and indexes consistent without human intervention, you cannot get into a situation where two columns that should contain equal data at all times are out of sync due to programming errors or the like. This is usually a problem when violating normalization rules.

In contrast to that, there is no way to prevent the latter disadvantage. We need to store multiple copies if we want different sort orders for quick lookups. What we can do, however, is to attempt to minimize the negative effect by trying to limit the amount of duplicated information as far as possible.

The employees database is an open source test database available for free. It contains examples for many MySQL features including large tables, foreign key constraints, views, and more. It can be found along with some documentation at http://dev.mysql.com/doc/employee/en/employee.html

In the example below, we assume the existence of the employees test database with an employees table defined as follows:

CREATE TABLE employees (
emp_no int(11) NOT NULL,
birth_date date NOT NULL,
first_name varchar(14) NOT NULL,
last_name varchar(16) NOT NULL,
gender enum('M','F') NOT NULL,
hire_date date NOT NULL,
PRIMARY KEY (emp_no)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

We will add an index each to the last_name and the first_name columns and try to predict the necessary space.

Please note that the results will never be exact. The storage requirements— especially of text-value table columns (VARCHAR, TEXT, CHAR, and so on)—can be difficult to determine because there are multiple factors that influence the calculation. Apart from differences between storage engines, an important aspect is the character set used. For details refer to the online manual for your server version:http://dev.mysql.com/doc/refman/5.1/en/storage-requirements.html.
Moreover, it is not possible to find out the exact size even for existing indexes because MySQL's SHOW TABLE STATUS command only gives approximate results for InnoDB tables.

Getting ready...

Connect to the database server with your administrative account.

How to do it...

  1. Find out the size of one primary key entry. To do so, look at the primary key definition in the table structure. Add the sizes of all primary key columns as documented in the MySQL Online Manual. In the example, the INT column takes 4 bytes. Write this number down.
  2. Determine the size of each column to be included in the new indexes and add them up per index. In the example, both first_name and last_name are VARCHAR columns—this means their lengths are not fixed as with the INT type. For simplicity, we will assume completely filled columns, meaning 14 bytes for first_name and 16 bytes for the last_name column.
  3. For each index, add the lengths of all relevant columns and the size of the primary key. In our example, this gives the following results



    Index

    Column size

    Primary Key

    Size

    Index Record

    Size

    IDX_FIRST_NAME

    14

    4

    18

    IDX_LAST_NAME

    16

    4

    20

    The rightmost column contains the pure data size of a single index record including the implicit primary key.

  4. Multiply the size per index record with the number of rows in the table:



    Index

    Rows

    Index record

    size

    Est. index size

    IDX_FIRST_NAME

    300024

    18

    5400432

    IDX_LAST_NAME

    300024

    20

    6000480

    The rightmost column contains the estimated size of the index, based on the current number of records, and the overhead taken by InnoDB to internally organize and store the data.

How it works

In the previous steps, we simply added up the sizes of all columns that will form a secondary index entry. This includes all columns of the secondary index itself and also those of the primary key because, as explained in the article introduction, InnoDB implicitly adds those to every index.

Internally, the server of course needs a little more than just the raw column contents—all sorts of management overhead (such as column widths, information on which columns can be null, as well as some constant overhead) add to the required space. Calculating these in detail is complicated and error-prone because they rely on many parameters and implementation details can change between MySQL versions. This is not required, however, because our aim is a ballpark number. As table contents often change quickly, exact numbers would not be valid for long.

You can see this in our example—the values are too low. In reality, you will need to experiment with these values. You are usually going to be on the safe side when you multiply your results with a factor of 1.5 to 2.5.

You will find that depending on the lengths of the columns indexed and those that make up the primary key, the accuracy of the estimates can vary.

There's more...

Predicting space requirements is not an exact science. The following items are intended to give some more hints on what you might want to think about.

Considering actual data lengths in your estimate

When adding an index to an existing column, you can try to use the average length of the column values:SELECT AVG(LENGTH(first_name)) AS avg_first, AVG(LENGTH(last_name)) AS avg_last FROM employees;

For the sample data the results are:



avg_first

 

avg_last

6.22

7.15

 

Round this up to the next integer (7/8). Note that especially for short columns like this, the estimates can be much less reliable because relative to internal database overhead data size is less significant. This is why in the recipe earlier we went with declared maximum length of the VARCHAR columns instead.

Minding character sets

For columns storing text information—such as CHAR and VARCHAR, VARBINARY, and TEXT—the storage requirements depend on the character set used for the text inside. For most English-speaking countries, this is something like the Latin-1 character set, which uses a single byte per character of text. However, in international environments, this encoding is hardly sufficient. To accommodate German text, for example, you need some special characters—not to mention Chinese, Japanese, or other non-Latin languages.

MySQL supports different character sets on a per column basis. However, often you will define a default character set for a database including all its tables and their columns.

When estimating index (and data) sizes for Unicode-aware columns (MySQL supports UTF-8 and UCS2 character sets for this purpose), you need to take into account that those may require more than a single byte per character. The very popular UTF-8 encoding uses between 1 and 4 (even though 4 are only used in very special cases) bytes per character. UCS2 has a constant size of 2 bytes per character. For details on how UTF-8 works, see http://en.wikipedia.org/wiki/UTF-8.

>> Continue Reading: Indexing in MySQL Admin here.

About the Author :


Daniel Schneller

Daniel Schneller works as a software developer, database administrator, and general IT professional for an independent software vendor in the retail sector. After successfully graduating from the University of Cooperative Education in Heidenheim, Germany with a degree in Business Computer Science, he started his career as a professional software developer, focused on the Microsoft technology stack. In 2002 he started focusing on enterprise-level Java development and has since gained extensive knowledge and experience implementing large scale systems based on Java EE and relational databases, especially MySQL since version 4.0.

Currently he is mostly involved with the ongoing development of framework-level functionality, including customization and extension of an ORM-based persistence layer. He is involved in different open source projects such as FindBugs, Eclipse, and Checkstyle and infrequently blogs about Java, MySQL, Windows, Linux and other insanities at http://www.danielschneller.com.

Udo Schwedt

Udo Schwedt is head of the Java architecture team and deputy head of the Java development department at the IT service provider for Germany's market leader in the Do-It-Yourself sector. After finishing his studies at RWTH Aachen, Germany with a degree in Computer Science, he started his career as a professional Java developer in a leading software company in the banking sector. In 2003, he joined his current company as a framework developer, subsequently taking the role of a lead architect.

Both authors are involved in a large scale international software project, which encompasses development of a Java-based merchandise management software solution. This decentralized multi-platform environment is based on more than 500 decentralized MySQL server instances with more than 5,500 replication slaves. Daniel and Udo are responsible for configuration and management of this infrastructure.

Books From Packt


Amazon SimpleDB Developer Guide
Amazon SimpleDB Developer Guide

NHibernate 2.x Beginner's Guide
NHibernate 2.x Beginner's Guide

Drupal 7 First look
Drupal 7 First look

Mastering phpMyAdmin 3.1 for Effective MySQL Management
Mastering phpMyAdmin 3.1 for Effective MySQL Management

jQuery 1.3 with PHP
jQuery 1.3 with PHP

Pentaho 3.2 Data Integration: Beginner's Guide
Pentaho 3.2 Data Integration: Beginner's Guide

High Availability MySQL Cookbook
High Availability MySQL Cookbook

Moodle 1.9 Theme Design: Beginner's Guide
Moodle 1.9 Theme Design: Beginner's Guide

 


No votes yet

Post new comment

CAPTCHA
This question is for testing whether you are a human visitor and to prevent automated spam submissions.
d
w
r
n
k
t
Enter the code without spaces and pay attention to upper/lower case.
Code Download and Errata
Packt Anytime, Anywhere
Register Books
Print Upgrades
eBook Downloads
Video Support
Contact Us
Awards Voting Nominations Previous Winners
Judges Open Source CMS Hall Of Fame CMS Most Promising Open Source Project Open Source E-Commerce Applications Open Source JavaScript Library Open Source Graphics Software
Resources
Open Source CMS Hall Of Fame CMS Most Promising Open Source Project Open Source E-Commerce Applications Open Source JavaScript Library Open Source Graphics Software