Tuesday, 17 July 2018

MySQL Indexing Explained

MySQL Indexing Explained

Following on from my previous post about optimizing MySQL servers, I am going to discuss the basics of using indexes. Here at Viaduct, we've seen that the most common reasons for performance problems in web applications are missing or incorrect database indexes, so we hope this simple introduction can help people get much more out of their databases.

What are Indexes?

Every time your web application runs a database query containing a WHERE statement, the database server's job is to look through all the rows in your table to find those that match your request. As the table grows, an increasing number of rows need to be inspected each time.
Indexes solve this problem in exactly the same was as the index in a reference book, by taking data from a column in your table and storing it alphabetically in a separate location called an index. The same process can be applied to all data types, for example numeric data will be stored in numeric order and dates in date order.
By doing this, the database does not need to look through every row in your table, instead it can find the data you are searching for alphabetically, then skip immediately to look at the row(s) where the data is located.
Icons

Creating a Simple Index

Many web frameworks will assist with the creation of indexes, and it is worth researching how to create indexes using your framework, such as migrations in Ruby on Rails, but in this article I will provide the raw MySQL commands.
In this example, we will be working with a table of students at an extraordinarily large school. This is quite a large table as we have 100,000 students in our school. The first few rows of the table look like this:
IDFirst NameLast NameClass
1JamesBond6A
2ChrisSmith6A
3JaneEyre6B
4DaveSmith6B
The table was created using the following query:
CREATE TABLE `students` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `first_name` varchar(255) DEFAULT NULL,
  `last_name` varchar(255) DEFAULT NULL,
  `class` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB
Our web application performs 3 queries against this table:
  • Look up a student by ID
  • Search for students by last name
  • List all students in a class
Each of those queries searches for data from a single column, so we should ensure that each of these columns has its own index.
The first query SELECT * FROM students WHERE id = 1 is a special case because it is looking up a row using its PRIMARY KEY. Because we already know exactly which row we want, no further optimization is required. It is best to always look rows up in this way when possible, and almost all tables should have a unique column defined as the primary key in this way.
The second query SELECT * FROM students WHERE last_name = 'Smith' will be searching a column which does not currently have an index and this is exactly the type of query that will search the whole table unless we do something about it. Lets go right ahead and create an index on that column:
CREATE INDEX by_last_name ON students (`last_name`);
This creates an index named by_last_name on the table, which will contain an indexes copy the last_name column, allowing us to look these up much more quickly.
Exactly the same principle can be applied to the class column, allowing us to efficiently look up the students in a particular class. We should create a simple index on it as follows:
CREATE INDEX by_class ON students (`class`);

Querying Multiple Columns

What happens when our queries become a little more complicated and we are searching the database for multiple columns? In this case, we are searching for a student named Smith, and we know what class they are in.
SELECT * FROM students WHERE class = '6A' AND last_name = 'Smith'
MySQL will normally only use one index to query the table, so we will not benefit from using both of our existing indexes to perform this query. However we also do not need to many any more indexes at this point!
The database server will look at the table and determine that we have an index on classand that each class only contains about 20 students. It will use the by_class index we have already created to locate all 20 students in the class, then check the last_name of each row individually. Searching 20 rows is no trouble for the server, compared with searching all 100,000, and we've avoided wasting memory by creating any more indexes.

Indexing Multiple Columns

In our last example, we has small class sizes, but what if we actually had some classes that contained over 100 students? In this case, it would be a good idea to improve our by_class index to also include the last_name, using more memory but allowing us to efficiently search on both columns at the same time.
DROP INDEX by_class ON students;
CREATE INDEX by_class_and_last_name ON students (class, last_name);
Why did we drop the by_class index? Because our new index will allow us to search by class and by last name, but it will also allow us to search only by class, rendering the by_class index redundant.
So why did we keep by_last_name? Unfortunately, indexes can only be used starting at the beginning. You don't have to use the whole index, but the query has to use its components in order starting with the first.
Icons

Joins

It is very likely that our application will have some joins. In this example, our students will have some grades in a grades table as follows:
IDstudent_idTimestampGrade
112014-01-20 15:00:00A+
212014-02-20 15:00:00A-
When viewing a student's record, our web application will fetch all grades related to that student with the following query:
SELECT * FROM grades WHERE student_id = 1
We also have a feature to display all grades from a particular class:
SELECT * from students WHERE class = '6A' JOIN grades on grades.student_id = students.id
Both of these queries use the "foreign key" called student_id. It is almost always a good idea to index foreign keys, and this is done in the same way as text fields.
CREATE INDEX by_student_id ON grades (student_id);

Choosing Columns to Index

Rather than looking at your queries, it is quite possible to index your database simply by looking through your tables and indexing columns that will obviously be searched on. Here are some examples of columns that should almost always be indexed:
  • Columns that join with other tables (they usually end _id).
  • Usernames and/or email addresses, these are looked up every time a user logs in.
  • Any field used in a URL. Many applications have a uuid field for this purpose. Remember that id should already be a primary key.

Summary

If you find your SELECT queries are taking a long time to run, you are most likely missing an index on your table. Look at which columns are searching on, and create an index on each one using the following syntax:
CREATE INDEX by_student_id ON grades (student_id);
It is always wise to start by indexing columns that join with other tables (they usually end _id). After that, look for columns you know you will commonly search on. Focus on your largest tables.

0 comments:

Post a Comment