Wednesday 18 April 2018

Top 10 MySQL Best Practices

Many groups and individuals are involved in the field of data management, from MySQL administrators, architects, developers, as well as infrastructure support people. Each of these plays a part in the security, maintenance, and performance of a MySQL installation. Therefore, when speaking of best practices, one has to consider which of these functions that specific practice pertains to. In this top 10 list, I will try to include a bit from each discipline. I have considered my own experiences as well as consulted with numerous other sources to compile this final list. Whether or not you agree with each and every item and their ordering, the important thing is to at least consider each of the points raised here today. So, without further ado, here is my personal top 10 list.

1. Index Search Fields

You should always index columns that you plan on searching on. Creating an index on a field in a table creates another data structure which holds the field value, and pointer to the record it relates to. This index structure is then sorted, allowing binary searches to be performed on it. An index can be defined for a single column or multiple columns of a given table.
This rule also applies to fields where partial string searches will be performed on the start of the field. For instance the phrase "last_name LIKE 'rob%'" will use the index, whereas "WHERE last_name LIKE '%ert%'" will not.
This does not imply that the more indexes you have, the better. While that is true to a point, keep in mind that every index takes up disk space. In a MyISAM database, the index file can quickly reach the size limits of the underlying file system if many fields within the same table are indexed.

2. Avoid Using "SELECT *" in your Queries

As a general rule, the more data is read from the tables, the slower a query becomes. Considering that some production tables may contains dozens of columns, some of which comprises of huge data types, it would be foolhardy to select all of them. A database server which is separate from the web server will only aggravate this issue, due to the data having to be transferred across the network.
To reiterate, it is a good habit to always specify which columns you need when writing your SELECT statements.

3. Set a password for the "root" user and then rename the user.

Here's a security tip. Much like with UNIX, the first thing you should do with a clean MySQL install is set a password for the root user:

$ mysqladmin -u root password NEWPASSWORD

Even better, once you've set the password, change the name of the "root" user to something else. A hacker on a MySQL server will likely target the root, both for its superuser status and because it is a known user. By changing the name of the root user, you make it all the more difficult for hackers to succeed using a brute-force attack. Use the following series of commands to rename the "root" user:

$ mysql -u root -p
mysql> use mysql;
mysql> update user set password=PASSWORD("NEWPASSWORD") where 
User='<userid>';
mysql> flush privileges;
mysql> quit

4. Tune your Queries with EXPLAIN

The EXPLAIN keyword is undoubtedly the most instructive analytical tool in the MySQL arsenal. Using it can give you valuable insight on the steps that MySQL is taking to execute your query. This can help you spot the bottlenecks and other problems with your query or table structures.
In my Optimizing MySQL Query Retrieval Speed through Table Joins article, I included how to use EXPLAIN to ascertain the efficiency of the table joins in the following query statement:

explain
select a.au_lname,
a.au_fname, 
ta.royaltyper, 
t.title, 
t.royalty
from authors a,
titleauthor ta,
titles t
where a.au_id = ta.au_id
and ta.title_id = t.title_id

EXPLAIN produced the following results:

id  select_type table   type    possible_keys   key         key_len  ref                             rows    Extra
1   SIMPLE      ta      ALL     (NULL)          (NULL)      (NULL)   (NULL)                          800 
1   SIMPLE      t       ref     NewIndex        NewIndex     23      crosstab_article.ta.title_id    100     Using where
1   SIMPLE      a       ALL     (NULL)          (NULL)      (NULL)   (NULL)                          1000    Using where; 
                                                                                                             Using join buffer

The results of an EXPLAIN query will show you which indexes are being utilized, how the table is being scanned and sorted, and other useful information. At the very least, you can be sure that the lower the numbers appear in the rows column, the faster the query should run.

5. Index and Use Same Column Types for Joins

If your query contains many joins, you need to make sure that the columns that make up a join are indexed on both tables. This will allow MySQL to better optimize the join operation.
Likewise, the columns that are joined must share the same type. For instance, if you join a DECIMAL column to one of the type INT, MySQL will be unable to use at least one of the indexes. String type columns must also use the same character encoding.

6. LIMIT 1 When Getting a Unique Row

There are some queries that are meant to only return one row, such as those which fetch a unique record, or that verify whether or not there are any records that satisfy the WHERE clause. In such cases, adding LIMIT 1 to your query can increase performance. This reduces execution time because the database engine will stop scanning for records after it finds the first matching record, instead of going through the whole table or index.
A second popular usage is in subqueries. In the following SELECT statement, we want to retrieve the first s2 field value sorted by the s1 column. We can then match it against the outer query values:

SELECT * FROM t1 WHERE s1 = (SELECT s2 FROM t2 ORDER BY s1 LIMIT 1);

7. Hide MySQL from the Internet

Most experienced database administrators and security personnel know to never host the database under the Web server's root. For Web-enabled applications, MySQL should be hidden behind a firewall and communication should only be enabled between application servers and your Web servers. Another option is to use MySQL's skip-networking option. When it is enabled, MySQL only listens for local socket connections and ignores all TCP ports.

8. Use the Smallest Data Types Possible

This would seem to be common sense to me, because of my programming background. When I was attending college, there was still some of the "memory is scarce" philosophy carried over from the days of 256 MB hard drives. Nowadays, no one seems to care one iota about memory or hard drive space. "Memory is cheap!" is the new adage. While it is true in dollar terms, it still takes longer to read in large data types than smaller ones, as the former require more disk sectors to be read into memory.
The moral of the story is to ignore the temptation to immediately jump the largest data type when designing your tables. Consider using an int rather than a bigint. You should also avoid large char(255)text fields when a varchar or smaller char() will suffice. Using the right data type will fit more records in memory or index key block, meaning fewer reads, ergo faster performance.

9. Create Views to Simplify Commonly-used Table Joins

As discussed in my Writing Reusable Queries in MySQL article, views help to both simplify complex schema and to implement security. One way that views contribute to security is to hide auditing fields from developers. They can also be used to filter out unindexed columns, leaving only fields that are fastest to search on. The only caveat to using this technique is that you must be fairly sure that you won't need to access one of the hidden table columns in the future; not an easy thing to do!

10. Take Advantage of Query Caching

The query cache stores the text of a SELECT statement together with the corresponding result set. If an identical statement is received later, the server retrieves the results from the query cache rather than parsing and executing the statement again. The query cache is shared among sessions, so a result set generated by one client can be sent in response to the same query issued by another client. Most MySQL servers have query caching enabled by default. It's one of the most effective methods of improving performance.
It's a beautiful thing, but query caching isn't without limitations. Take the following statement:

SELECT emp_id, 
    bonus_id 
FROM  bonuses 
WHERE YEAR(award_date) = Year(CURDATE());

The problem here is that queries, which contain certain non-deterministic functions - that is those which MySQL cannot calculate in advance - like NOW() and RAND() are not cached. Fortunately, there is an easy fix to prevent this from happening. That is to store the function results to a variable:

SET @year = Year(CURDATE());
SELECT emp_id, 
    bonus_id 
FROM  bonuses 
WHERE YEAR(award_date) = @year;

And that's my personal top 10 best practices in MySQL. One thing that I noticed while conducting my research for this article is that there is a wide range of opinions as to which practices should be worthy of a spot in the top 10. In my estimation, there are many factors that may affect an item's weight for you, including your particular role, experience, line of business, software versions, and hardware configuration, to name but a few. I would welcome you to add your own contributions as comments. Every tip helps!

MySQL INSERT Statement Variations

The Data Manipulation Language (DML) SELECT...INTO command is unsupported in MySQL. However, MySQL does provide the INSERT...SELECT statement. Rob Gravelle discusses this, and other variations of the INSERT statement.

In the MySQL Data Manipulation and Query Statements article, we looked at two variations of the INSERT INTO statement. If you recall, we utilized the INSERT statement to populate tables, rather than the Data Manipulation Language (DML) SELECT...INTO command, which is unsupported in MySQL. However, MySQL does provide the INSERT...SELECT statement. This, and other variations of the INSERT statement will be the topic of today’s article.

Inserting Multiple Rows

The INSERT INTO syntax that we used came in two forms – one with column names and one in which they were omitted:
Form 1:

INSERT INTO table_name
VALUES (value1, value2, value3,...)

Form 2:

INSERT INTO table_name (column1, column2, column3,...)
VALUES (value1, value2, value3,...)

Here are some examples of both forms:

INSERT INTO "employees" ("id", "shop_id", "gender", "name", "salary") 
VALUES (1,1,'m','Jon Simpson',4500);
INSERT INTO "employees" VALUES (1,1,'m','Jon Simpson',4500);

The above “VALUES“ syntax can also be used to insert multiple rows by including lists of column values, each enclosed within parentheses and separated by commas. Here’s the syntax for that along with an example:

INSERT INTO table_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9);
 
INSERT INTO 'employees' 
       ('id', 'shop_id', 'gender', 'name',                 'salary') 
VALUES (1,    1,         'm',      'Jon Simpson',          4500),
       (2,    1,         'f',      'Barbara Breitenmoser', 4700),
       (3,    2,         'f',      'Kirsten Ruegg',        5600),
       (4,    3,         'm',      'Ralph Teller',         5100),
       (5,    3,         'm',      'Peter Jonson',         5200);

Populating Specific Columns

Sometimes, you just want to populate the minimum number of mandatory fields to create a record. The VALUES syntax is not ideally suited for this purpose because the number and ordering of supplied values must exactly match the number of table columns. A better way is to use the SET statement. It allows you to specify the column names you want to populate for one row. Here is what the INSERT INTO...SET syntax looks like:

INSERT INTO table_name 
SET column_name1 ={expr | DEFAULT}[,
    column_name2 ={expr | DEFAULT},
    ...];

If a table column is not specified in an INSERT, a couple of things can happen, depending on whether or not strict mode is enabled. In strict mode, an error occurs if any column doesn't have a default value. Otherwise, MySQL uses the implicit default value for any column that does not have an explicitly defined default. The value of course is dependent on the data type. Strings default to an empty string (“”) and integers default to zero (0). Enums default to the first item.
Should both the column list and the VALUES list be empty, INSERT creates a row with each column set according to the above rules.

mysql>INSERT INTO employee () VALUES();
 
mysql>SELECT ROW_COUNT();
+-------------+
| ROW_COUNT() |
+-------------+
|           6 |
+-------------+
1 row in set (0.00 sec)

Here is the employee table with the new row. Notice that the numeric shop_id was defaulted to zero. All the other fields were set to NULL.

id	shop_id	gender	name	salary
1	1	m	Jon Simpson	4500
2	1	f	Barbara Breitenmoser	(NULL)
3	2	f	Kirsten Ruegg	5600
4	3	m	Ralph Teller	5100
5	3	m	Peter Jonson	5200
6	0	(NULL)	(NULL)	(NULL)

The following table definition is similar to the one used in the last article, but with a couple of different column attributes.

The shop_id field has been changed from NULL to NOT NULL.
The gender field has also been changed from NULL to NOT NULL and includes a default value.

CREATE TABLE "employees" (
  "id" int(11) NOT NULL AUTO_INCREMENT,
  "shop_id" INT(11) NOT NULL,
  "gender" enum('m','f') DEFAULT 'm' NOT NULL,
  "name" varchar(32) DEFAULT NULL,
  "salary" int(11) DEFAULT NULL,
  PRIMARY KEY ("id")
);

Use the INSERT INTO...VALUES example above for multiple rows to populate the table.
Here is an example that sets the mandatory shop_id column, using the INSERT INTO...SET statement.

CREATE TABLE "employees" (
  "id" int(11) NOT NULL AUTO_INCREMENT,
  "shop_id" INT(11) NOT NULL,
  "gender" enum('m','f') DEFAULT 'm' NOT NULL,
  "name" varchar(32) DEFAULT NULL,
  "salary" int(11) DEFAULT NULL,
  PRIMARY KEY ("id")
);

We don’t need to include the row id column because it’s an auto_increment field. The gender will default to its default value of ‘m’, but the remaining two fields will remain NULL as no defaults have been assigned for those.

id	shop_id	gender	name	salary
1	1	M	Jon Simpson	4500
2	1	F	Barbara Breitenmoser	(NULL)
3	2	F	Kirsten Ruegg	5600
4	3	M	Ralph Teller	5100
5	3	M	Peter Jonson	5200
6	2	M	(NULL)	(NULL)

The following statement would fail in strict mode because the shop_id column was omitted and does not have a default value assigned.

mysql>INSERT INTO 'employees' SET name='Jon Simpson';
 
mysql>SQL Error: Field 'shop_id' doesn't have a default value.

Setting Column Values Using a SELECT Statement

There are many instances where it makes more sense to fetch column data from another source, such as another table. The INSERT...SELECT statement serves that purpose by combining the INSERT INTO...VALUES and SELECT statements. Here’s some SQL that populates the employee table with data from another table.

INSERT INTO 'employees' ('shop_id', 'gender', 'name', 'salary') 
SELECT 3,
       LEFT(gender, 1),
       CONCAT_WS(' ', first_name, last_name),
       salary
FROM   transferred_ employees
WHERE  transfer_date > '2008-01-01';

As is often the case with data transfers, some data massaging must be done to make it conform to the structure of the target table. In this case I used the LEFT() function to retrieve the first letter of the source gender field and CONCAT() to build the name column from the first and last name fields. In fact, all of the INSERT statement variations that we’ve looked at here today accept expressions as well as data literals. The expression can be the result of a function or a literal expression such as “3 * 12 – (6 + 1)”.
If you use an INSERT...VALUES statement with multiple value lists or INSERT ... SELECT, the statement returns an information string in this format:

Records: 3 Duplicates: 0 Warnings: 0

Handling Duplicates

MySQL provides a way to deal with duplicate keys using the ON DUPLICATE KEY UPDATE statement. Whenever a row is inserted that would cause a duplicate value in a UNIQUE index or PRIMARY KEY, an UPDATE of the existing row is performed.

INSERT INTO table (a,b,c) VALUES (1,2,3)
ON DUPLICATE KEY UPDATE c=c+1;

Here’s an example to illustrate.

INSERT INTO mytable (primary_id, count) VALUES(5, 1) 
  ON DUPLICATE KEY UPDATE count = count + 1;

In the above example, if the value 5 doesn't exist in the primary_id column, it will be inserted. However, if a value of 5 already exists, an update will be made to the existing column. Note that, in the latter case, affected rows will return 2, which may not be what you would expect.
You can use the VALUES(col_name) function in the UPDATE clause to refer to column values from the INSERT portion of the INSERT...ON DUPLICATE KEY UPDATE statement. In other words, VALUES(col_name) in the ON DUPLICATE KEY UPDATE clause refers to the value of col_name that would be inserted, had no duplicate-key conflict occurred. This function is especially useful in multiple-row inserts.

INSERT INTO table (a,b,c) VALUES (1,2,3),(4,5,6)
  ON DUPLICATE KEY UPDATE c=VALUES(a)+VALUES(b);

If a table contains an auto_increment column and INSERT...ON DUPLICATE KEY UPDATE inserts or updates a row, the LAST_INSERT_ID() function returns the auto_increment value. If the statement updates a row instead, LAST_INSERT_ID() is not meaningful. However, you can work around this by passing an expression to the LAST_INSERT_ID() function. Suppose that id is the AUTO_INCREMENT column. To make LAST_INSERT_ID() meaningful when an update is performed, use the following code.

INSERT INTO table (email) VALUES (email_address) 
  ON DUPLICATE KEY UPDATE id=LAST_INSERT_ID(id);

There are other uses for the INSERT statement that you may be interested in, such as importing data from an XML document. The Working with XML Data in MySQL article contains some helpful information on using the INSERT statement for that purpose.

Tuesday 17 April 2018

Four Ways MySQL Executes GROUP BY

MySQL GROUP BY could potentially be responsible for 90% or more of the query execution time.

The main complexity when MySQL executes GROUP BY is computing aggregate functions in a GROUP BY statement. How this works is shown in the documentation for UDF Aggregate Functions. As we see, the requirement is that UDF functions get all values that constitute the single group one after another. That way, it can compute the aggregate function value for the single group before moving to another group.

The problem, of course, is that in most cases the source data values aren’t grouped. Values coming from a variety of groups follow one another during processing. As such, we need a special step to handle MySQL GROUP BY.

Let’s look at the same table we looked at before:

Shell

1

2

3

4

5

6

7

8

9

10

11

mysql> show create table tbl G

*************************** 1. row ***************************

      Table: tbl

Create Table: CREATE TABLE `tbl` (

 `id` int(11) NOT NULL AUTO_INCREMENT,

 `k` int(11) NOT NULL DEFAULT '0',

 `g` int(10) unsigned NOT NULL,

 PRIMARY KEY (`id`),

 KEY `k` (`k`)

) ENGINE=InnoDB AUTO_INCREMENT=2340933 DEFAULT CHARSET=latin1

1 row in set (0.00 sec)

And the same GROUP BY statements executed in different ways:

1: Index Ordered GROUP BY in MySQL

Shell

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

mysql> select k, count(*) c from tbl group by k order by k limit 5;

+---+---+

| k | c |

+---+---+

| 2 | 3 |

| 4 | 1 |

| 5 | 2 |

| 8 | 1 |

| 9 | 1 |

+---+---+

5 rows in set (0.00 sec)

mysql> explain select k, count(*) c from tbl group by k order by k limit 5 G

*************************** 1. row ***************************

          id: 1

 select_type: SIMPLE

       table: tbl

  partitions: NULL

        type: index

possible_keys: k

         key: k

     key_len: 4

         ref: NULL

        rows: 5

    filtered: 100.00

       Extra: Using index

1 row in set, 1 warning (0.00 sec)

In this case, we have an index on the column we use for GROUP BY. This way, we can just scan data group by group and perform GROUP BY on the fly (inexpensively).

It works especially well when we use LIMIT to restrict the number of groups we retrieve or when a “covering index” is in use, as a sequential index-only scan is a very fast operation.

If you have a small number of groups though, and no covering index, index order scans can cause a lot of IO. So this might not be the most optimal plan.

2: External Sort GROUP BY in MySQL

Shell

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

mysql> explain select SQL_BIG_RESULT g, count(*) c from tbl group by g limit 5 G

*************************** 1. row ***************************

          id: 1

 select_type: SIMPLE

       table: tbl

  partitions: NULL

        type: ALL

possible_keys: NULL

         key: NULL

     key_len: NULL

         ref: NULL

        rows: 998490

    filtered: 100.00

       Extra: Using filesort

1 row in set, 1 warning (0.00 sec)

mysql> select SQL_BIG_RESULT g, count(*) c from tbl group by g limit 5;

+---+---+

| g | c |

+---+---+

| 0 | 1 |

| 1 | 2 |

| 4 | 1 |

| 5 | 1 |

| 6 | 2 |

+---+---+

5 rows in set (0.88 sec)

If we do not have an index that allows us to scan the data in group order, we can instead get data sorted through an external sort (also referred to as “filesort” in MySQL).

You may notice I’m using an SQL_BIG_RESULT hint here to get this plan. Without it, MySQL won’t choose this plan in this case.

In general, MySQL prefers to use this plan only if we have a large number of groups, because in this case sorting is more efficient than having a temporary table (which we will talk about next).

3: Temporary Table GROUP BY in MySQL

Shell

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

mysql> explain select  g, sum(g) s from tbl group by g limit 5 G

*************************** 1. row ***************************

          id: 1

 select_type: SIMPLE

       table: tbl

  partitions: NULL

        type: ALL

possible_keys: NULL

         key: NULL

     key_len: NULL

         ref: NULL

        rows: 998490

    filtered: 100.00

       Extra: Using temporary

1 row in set, 1 warning (0.00 sec)

mysql> select  g, sum(g) s from tbl group by g order by null limit 5;

+---+------+

| g | s    |

+---+------+

| 0 |    0 |

| 1 |    2 |

| 4 |    4 |

| 5 |    5 |

| 6 |   12 |

+---+------+

5 rows in set (7.75 sec)

In this case, MySQL also does a full table scan. But instead of running additional sort passes, it creates a temporary table instead. This temporary table contains one row per group, and with each incoming row the value for the corresponding group is updated. Lots of updates! While this might be reasonable in-memory, it becomes very expensive if the resulting table is so large that updates are going to cause a lot of disk IO. In this case, external sort plans are usually better.

Note that while MySQL selects this plan by default for this use case, if we do not supply any hints it is almost 10x slower than the plan we get using the SQL_BIG_RESULT hint.

You may notice I added “ORDER BY NULL” to this query. This is to show you “clean” the temporary table only plan. Without it, we get this plan:

Shell

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

mysql> explain select  g, sum(g) s from tbl group by g limit 5 G

*************************** 1. row ***************************

          id: 1

 select_type: SIMPLE

       table: tbl

  partitions: NULL

        type: ALL

possible_keys: NULL

         key: NULL

     key_len: NULL

         ref: NULL

        rows: 998490

    filtered: 100.00

       Extra: Using temporary; Using filesort

1 row in set, 1 warning (0.00 sec)

In it, we get the “worst of both worlds” with Using Temporary Table and filesort.

MySQL 5.7 always returns GROUP BY results sorted in group order, even if this the query doesn’t require it (which can then require an expensive additional sort pass). ORDER BY NULL signals the application doesn’t need this.

You should note that in some cases – such as JOIN queries with aggregate functions accessing columns from different tables – using temporary tables for GROUP BY might be the only option.

If you want to force MySQL to use a plan that does temporary tables for GROUP BY, you can use the SQL_SMALL_RESULT hint.

4: Index Skip-Scan-Based GROUP BY in MySQL

The previous three GROUP BY execution methods apply to all aggregate functions. Some of them, however, have a fourth method.

Shell

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

mysql> explain select k,max(id) from tbl group by k G

*************************** 1. row ***************************

          id: 1

 select_type: SIMPLE

       table: tbl

  partitions: NULL

        type: range

possible_keys: k

         key: k

     key_len: 4

         ref: NULL

        rows: 2

    filtered: 100.00

       Extra: Using index for group-by

1 row in set, 1 warning (0.00 sec)

mysql> select k,max(id) from tbl group by k;

+---+---------+

| k | max(id) |

+---+---------+

| 0 | 2340920 |

| 1 | 2340916 |

| 2 | 2340932 |

| 3 | 2340928 |

| 4 | 2340924 |

+---+---------+

5 rows in set (0.00 sec)

This method applies only to very special aggregate functions: MIN() and MAX(). These do not really need to go through all the rows in the group to compute the value at all.

They can just jump to the minimum or maximum group value in the group directly (if there is such an index).

How can you find MAX(ID) value for each group if the index is only built on (K) column? This is an InnoDB table. Remember InnoDB tables effectively append the PRIMARY KEY to all indexes. (K) becomes (K,ID), allowing us to use Skip-Scan optimization for this query.

This optimization is only enabled if there is a large number of rows per group. Otherwise, MySQL prefers more conventional means to execute this query (like Index Ordered GROUP BY detailed in approach #1).

While we’re on MIN()/MAX() aggregate functions, other optimizations apply to them as well. For example, if you have an aggregate function with no GROUP BY (effectively having one group for all tables), MySQL fetches those values from indexes during a statistics analyzes phase and avoids reading tables during the execution stage altogether:

Shell

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

mysql> explain select max(k) from tbl G

*************************** 1. row ***************************

          id: 1

 select_type: SIMPLE

       table: NULL

  partitions: NULL

        type: NULL

possible_keys: NULL

         key: NULL

     key_len: NULL

         ref: NULL

        rows: NULL

    filtered: NULL

       Extra: Select tables optimized away

1 row in set, 1 warning (0.00 sec)

Filtering and Group By

We have looked at four ways MySQL executes GROUP BY. For simplicity, I used GROUP BY on the whole table, with no filtering applied. The same concepts apply when you have a WHERE clause:

Shell

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

mysql> explain select  g, sum(g) s from tbl where k>4 group by g order by NULL limit 5 G

*************************** 1. row ***************************

          id: 1

 select_type: SIMPLE

       table: tbl

  partitions: NULL

        type: range

possible_keys: k

         key: k

     key_len: 4

         ref: NULL

        rows: 1

    filtered: 100.00

       Extra: Using index condition; Using temporary

1 row in set, 1 warning (0.00 sec)

For this case, we use the range on the K column for data filtering/lookup and do a GROUP BY when there is a temporary table.

In some cases, the methods do not conflict. In others, however, we have to choose either to use one index for GROUP BY or another index for filtering:

Shell

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

31

32

33

34

35

mysql> alter table tbl add key(g);

Query OK, 0 rows affected (4.17 sec)

Records: 0  Duplicates: 0  Warnings: 0

mysql> explain select  g, sum(g) s from tbl where k>1 group by g limit 5 G

*************************** 1. row ***************************

          id: 1

 select_type: SIMPLE

       table: tbl

  partitions: NULL

        type: index

possible_keys: k,g

         key: g

     key_len: 4

         ref: NULL

        rows: 16

    filtered: 50.00

       Extra: Using where

1 row in set, 1 warning (0.00 sec)

mysql> explain select  g, sum(g) s from tbl where k>4 group by g limit 5 G

*************************** 1. row ***************************

          id: 1

 select_type: SIMPLE

       table: tbl

  partitions: NULL

        type: range

possible_keys: k,g

         key: k

     key_len: 4

         ref: NULL

        rows: 1

    filtered: 100.00

       Extra: Using index condition; Using temporary; Using filesort

1 row in set, 1 warning (0.00 sec)

Depending on specific constants used in this query, we can see that we either use an index ordered scan for GROUP BY (and “give up” benefiting from the index to resolve the WHERE clause), or use an index to resolve the WHERE clause (but use a temporary table to resolve GROUP BY).

In my experience, this is where MySQL GROUP BY does not always make the right choice. You might need to use FORCE INDEX to execute queries the way you want them to.

Summary

I hope this article provides a good overview of how MySQL executes GROUP BY. In my next blog post, we will look into techniques you can use to optimize GROUP BY queries.

Wednesday 18 April 2018

1. Index Search Fields

2. Avoid Using "SELECT *" in your Queries

3. Set a password for the "root" user and then rename the user.

4. Tune your Queries with EXPLAIN

5. Index and Use Same Column Types for Joins

6. LIMIT 1 When Getting a Unique Row

7. Hide MySQL from the Internet

8. Use the Smallest Data Types Possible

9. Create Views to Simplify Commonly-used Table Joins

10. Take Advantage of Query Caching

Inserting Multiple Rows

Populating Specific Columns

Setting Column Values Using a SELECT Statement

Handling Duplicates

Tuesday 17 April 2018

Total Pageviews

Achievement

Live Traffic

Subscribe To

Followers

About Me

I V RAMANA

Recent Comments

Categories

Popular Posts

Hot Topics

Video

News

Comments

Recent

Bottom Ad [Post Page]

Recent Posts

Mysql - How to reset the administrator password in ISPConfig 3

Socialize

Blog Archive

Search This Blog

Post Top Ad

Archive

Post Bottom Ad

Author Details

About Me

Tags

Full width home advertisement

Pages

Post Page Advertisement [Top]

Climb the mountains