Maximizing Query Performance through Column Indexing in MySQL ~ Tech Blog

Monday, 16 July 2018

Maximizing Query Performance through Column Indexing in MySQL

Indexes are a feature that you can enable on your MySQL tables to increase performance, but they do have some downsides. When you create a new index, MySQL builds a separate block of information that needs to be updated every time there are changes made to the table. This means that if you are constantly updating, inserting and removing entries in your table this could have a negative impact on performance. Read on as we review some of the best practices for achieving the right balance between query and update speed through column indexing in MySQL.

How Column Indexing Works

Indexes are a type of lookup table that is used to improve the performance of SELECT queries. Without an index, MySQL has to start with the first record and then read the whole table to find the relevant rows. Conversely, if the table has an index for the columns in question, MySQL can quickly find the data without having to look at all the rows in a table.
Indexes can consist of one or more fields up to a total of 16 columns. For certain data types, you can even index a prefix of the column. One way to think about multiple-column indexes is like a sorted array containing values that are created by concatenating the values of the indexed columns. For this reason, indexes on multiple fields are also known as “concatenated indexes”.
Here’s an example showing how the database would use indexing in performing a query. Given the following SELECT statement:

SELECT * FROM tbl_name WHERE col1=val1 AND col2=val2;

The simplest case scenario is that a multiple-column index exists on col1 and col2; in that case, the appropriate rows can be fetched directly.
Another possibility is that separate single-column indexes exist on col1 and col2; here, the query optimizer tries to find the most restrictive index by deciding which index will find fewer rows and using that index to fetch them.
When a table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to find rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3). Hence, it can pay off to create your multiple-column indexes starting with the most restrictive to the least restrictive fields.
By "pay off", I mean this: if a table has 1,000 rows, this is at least 100 times faster than reading sequentially!

Creating Single and Multiple Column Indexes

You can specify a column to index by using the INDEX statement at the time that you create a new MySQL table. Alternatively, you can also add a single index to an existing table by using the CREATE INDEX clause.
In the following example, we are defining a new table called “blog”, which will contain two fields: blogID and post_text. To add an index to the blogID field, we add the INDEX (ID field) clause:

CREATE TABLE blog (
blogID  INT, 
post_text TEXT,
INDEX (blogID)
)

Had we not defined the index at creation time, we could always add one later, using the following syntax:

CREATE INDEX id_index ON blog(blogID);

Notice that the “Create Index” clause requires an index name as well. The “ALTER TABLE” command also supports the adding of an index:

ALTER TABLE blog ADD INDEX(blogID);

We can create multiple-column indexes, or concatenated indexes, the same way:

CREATE TABLE blog (
blogID  INT, 
post_text TEXT,
auth_id INT,
INDEX (blogID, auth_id)
)

To add the indexes on a pre-existing table, use the following syntax:

CREATE INDEX id_index ON blog(blogID, auth_id);

ALTER TABLE blog ADD INDEX(blogID, auth_id);

Indexing Partial Fields (Partial Index)

Indexes can be created that use only the leading part of column values, using “col_name(length)” syntax to specify an index prefix length. Prefixes can be specified for CHAR, VARCHAR, BINARY, BLOB, TEXT, and VARBINARY columns:

CREATE INDEX index_name ON table_name (field_name(prefix_length));

The statement shown here creates an index of the post table using the first 10 characters of the post_text column:

CREATE INDEX post_index ON post (post_text(10));

Potential advantages to using partial indexes is that if the values in the indexed field tend to differ in the first 10 characters, this index should not be much slower than an index created from the entire post_text column. At the same time, using column prefixes for indexes can make the index file much smaller, which could save a lot of disk space and may also speed up table inserts.
Partial indexes are associated with the Like statement, as in:

SELECT id, first_name, last_name, user_name
FROM   t1
WHERE last_name like 'Rob G%';

Note that changing the expression to somethng like “%Rob G%” will lose the benefit of the index because it searches the middle of the field.
You can change an existing index on a table using the ALTER TABLE syntax, which we saw above:

ALTER TABLE t1
DROP INDEX last_name,
ADD INDEX (last_name(10));

Some Indexing Tips

Create an index on the field that has the biggest variety of values first. This will result in the “most bang for your buck”, so to speak.
Keep indexes small. It's better to have an index on just zip code or postal code, rather than postal code & country. The smaller the index, the better the response time.
For high frequency functions (thousands of times per day) it can be wise to have a very large index, so the system does not even need the table for the read function.
For small tables an index may be disadvantageous. The same can also be said for any function where the system would be better off by scanning the whole table.

Other Index Types

MySQL supports a few other index types, depending on the database engine used. One example is a FULLTEXT index, which is used for keyword searches in stored documents. Only the MyISAM storage engine supports FULLTEXT indexes and only for CHAR, VARCHAR, and TEXT columns. Indexing always takes place over the entire column and column prefix indexing is not supported. You can also create special indexes on spatial data types, called R-Trees. Again, only MyISAM supports R-tree indexes on spatial types. Other storage engines use B-trees for indexing spatial types. The MEMORY storage engine uses HASH indexes by default, but also supports B-Tree indexes.
Having seen how indexes can negatively impact table inserts and updates, we now know that one has to take great care in deciding which fields to index. Moreover, choosing between single and mutiple-field indexes can be a daunting task. In the latter, field ordering also plays a critical role. Hopefully, this article provided you with some helpful guidelines in deciding on which fields to index and when to do so.

Monday, 16 July 2018