Mixed ASC/DESC sorting in MySQL ~ Tech Blog

Tuesday, 17 July 2018

Mixed ASC/DESC sorting in MySQL

I have a table of blog posts, each with a foreign key back to it's author. There are less than 15,000entries in this table.

This query scans over 19,000 rows (per EXPLAIN), requires a filesort (that might be regular MySQLbehavior), and takes over 400ms to return 5 rows. possibly because of the complicated WHERE used to check if the item is actually published.

view source print ?

01.SELECT  blog_post.id,

02.blog_post.title,

03.blog_post.author_id,

04.blog_post.has_been_fact_checked,

05.blog_post.published_date,

06.blog_post.ordering,

07.auth_user.username,

08.auth_user.email

09.FROM    blog_post

10.INNER JOIN

11.auth_user

12.ON      auth_user.id = blog_post.author_id

13.WHERE   blog_post.is_approved = 1

14.AND blog_post.has_been_fact_checked = 1

15.AND blog_post.published_date <=  '2010-10-25 22:40:05'

16.ORDER BY

17.blog_post.published_date DESC,

18.blog_post.ordering ASC,

19.blog_post.id DESC

20.LIMIT 5

How can I wrangle this query under control?

This query is quite simple: a filtering condition with two equalities and a range and an order by. The range in the filter fits the ORDER BY perfectly, so the whole query could be executed using an index scan without any filesorts.

The only problem is that we have the mixed directions in ORDER BY, and MySQL does not support ASC / DESCclauses for the indexes.

With a simple table redesign, the problem could easily be solved: we would just need to reverse the order in the ordering column, say, by creating its copy and storing negative values in it. This way, we could just create a composite index (with all columns ascending) and rewrite the query to use the reverse column instead. That's what many engines do, MediaWiki (which Wikipedia runs on) being one of the most well-known examples.

This solution is nice, I hear people saying, but requires a database redesign which as we all know is never as simple as some development pundits on their blogs seem to think.

OK, this is a good point. Let's see what we can do with the current design, and, as always, create a sample table to do it:

Table creation details

There are 1,000 authors and 500,000 blog posts in the respective tables. The blog posts intentionally have duplicates on published_date.

The original query:

SELECT  blog_post.id,

blog_post.title,

blog_post.author_id,

blog_post.has_been_fact_checked,

blog_post.published_date,

blog_post.ordering,

auth_user.username,

auth_user.email

FROM    blog_post

INNER JOIN

auth_user

ON      auth_user.id = blog_post.author_id

WHERE   blog_post.is_approved = 1

AND blog_post.has_been_fact_checked = 1

AND blog_post.published_date <= '2010-10-25 22:40:05'

ORDER BY

blog_post.published_date DESC,

blog_post.ordering ASC,

blog_post.id DESC

LIMIT 5

id	title	author_id	has_been_fact_checked	published_date	ordering	username	email
156410	Post 156410	984	1	2010-10-25 22:00:00	1	author984	author984@example.com
451417	Post 451417	140	1	2010-10-25 22:00:00	8	author140	author140@example.com
262749	Post 262749	719	1	2010-10-25 22:00:00	19	author719	author719@example.com
415157	Post 415157	430	1	2010-10-25 22:00:00	22	author430	author430@example.com
307578	Post 307578	611	1	2010-10-25 22:00:00	72	author611	author611@example.com
5 rows fetched in 0.0004s (0.4558s)

id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	SIMPLE	blog_post	ref	ix_blogpost_approved_checked_published_ordering_id	ix_blogpost_approved_checked_published_ordering_id	2	const,const	189693	100.00	Using where; Using filesort
1	SIMPLE	auth_user	eq_ref	PRIMARY	PRIMARY	4	20101102_desc.blog_post.author_id	1	100.00

select `20101102_desc`.`blog_post`.`id` AS `id`,`20101102_desc`.`blog_post`.`title` AS `title`,`20101102_desc`.`blog_post`.`author_id` AS `author_id`,`20101102_desc`.`blog_post`.`has_been_fact_checked` AS `has_been_fact_checked`,`20101102_desc`.`blog_post`.`published_date` AS `published_date`,`20101102_desc`.`blog_post`.`ordering` AS `ordering`,`20101102_desc`.`auth_user`.`username` AS `username`,`20101102_desc`.`auth_user`.`email` AS `email` from `20101102_desc`.`blog_post` join `20101102_desc`.`auth_user` where ((`20101102_desc`.`auth_user`.`id` = `20101102_desc`.`blog_post`.`author_id`) and (`20101102_desc`.`blog_post`.`has_been_fact_checked` = 1) and (`20101102_desc`.`blog_post`.`is_approved` = 1) and (`20101102_desc`.`blog_post`.`published_date` <= '2010-10-25 22:40:05')) order by `20101102_desc`.`blog_post`.`published_date` desc,`20101102_desc`.`blog_post`.`ordering`,`20101102_desc`.`blog_post`.`id` desc limit 5

As the asker said, this query takes 455 ms and the filesort is used.

Now, as we cannot use a single index on mixed ASC / DESC ordering conditions, we should squeeze the most from our existing index.

The longest common prefix of the index and the ORDER BY expression (which implicitly includes all columns from the WHERE clause) is (is_approved, has_been_fact_checked, published_date).

What does it mean? It means that the chunks of data sharing the distinct values of these three columns will always go in the same order both in the index and in the query, though the order may and will differ within each chunk.

See the queries below, first one with the initial ORDER BY sorting:

SELECT  published_date, id, ordering

FROM    blog_post

WHERE   is_approved = 1

AND has_been_fact_checked = 1

ORDER BY

published_date DESC,

ordering ASC,

id DESC

LIMIT 16

published_date	id	ordering
2010-11-02 00:00:00	111429	97

2010-11-01 23:00:00	48638	6
2010-11-01 23:00:00	245245	7
2010-11-01 23:00:00	153099	19
2010-11-01 23:00:00	334624	33
2010-11-01 23:00:00	200844	45
2010-11-01 23:00:00	163114	52
2010-11-01 23:00:00	312847	54
2010-11-01 23:00:00	205531	72
2010-11-01 23:00:00	265795	94

2010-11-01 22:00:00	409475	24
2010-11-01 22:00:00	213483	45
2010-11-01 22:00:00	72571	57

2010-11-01 21:00:00	102956	32
2010-11-01 21:00:00	422579	48
2010-11-01 21:00:00	280297	98

, the second one with the index sorting:

SELECT  published_date, id, ordering

FROM    blog_post

WHERE   is_approved = 1

AND has_been_fact_checked = 1

ORDER BY

published_date DESC,

ordering DESC,

id DESC

LIMIT 16

published_date	id	ordering
2010-11-02 00:00:00	111429	97

2010-11-01 23:00:00	265795	94
2010-11-01 23:00:00	205531	72
2010-11-01 23:00:00	312847	54
2010-11-01 23:00:00	163114	52
2010-11-01 23:00:00	200844	45
2010-11-01 23:00:00	334624	33
2010-11-01 23:00:00	153099	19
2010-11-01 23:00:00	245245	7
2010-11-01 23:00:00	48638	6

2010-11-01 22:00:00	72571	57
2010-11-01 22:00:00	213483	45
2010-11-01 22:00:00	409475	24

2010-11-01 21:00:00	280297	98
2010-11-01 21:00:00	422579	48
2010-11-01 21:00:00	102956	32

As you can see, the chunks of data occupy the same places within both recordsets. Post dated 2010-11-02 00:00:00 is at row 1. Those dated 2010-11-01 23:00:00 are at rows 2 — 10 etc. The orders differ only within the chunks, but not between chunks.

This means that if we select the same chunks as the original query does and reorder them, we get the same order.

And the chunks we can select quite efficiently using the index, since their order, as we shown above, is the same for both queries.

The original query had LIMIT 5, this means we have to select at most 5 chunks. To do this, we just need to count off 5 distinct dates satisfying the condition.

This can be done using a technique I described in one of my earlier posts:

20 latest unique records

Selecting 5 dates

Here's a query to select 5 distinct dates less than the given one:

SELECT  bp.published_date

FROM    blog_post bp

WHERE   bp.is_approved = 1

AND bp.has_been_fact_checked = 1

AND bp.published_date < '2010-10-25 22:40:05'

AND bp.id =

(

SELECT  id

FROM    blog_post bpi

WHERE   bpi.is_approved = 1

AND bpi.has_been_fact_checked = 1

AND bpi.published_date = bp.published_date

ORDER BY

bpi.is_approved DESC, bpi.has_been_fact_checked DESC, bpi.published_date DESC

LIMIT 1

)

ORDER BY

bp.is_approved DESC, bp.has_been_fact_checked DESC, bp.published_date DESC

LIMIT 5

published_date
2010-10-25 22:00:00
2010-10-25 21:00:00
2010-10-25 20:00:00
2010-10-25 19:00:00
2010-10-25 18:00:00
5 rows fetched in 0.0001s (0.0018s)

id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	PRIMARY	bp	range	ix_blogpost_approved_checked_published_ordering_id	ix_blogpost_approved_checked_published_ordering_id	11		189693	100.00	Using where; Using index
2	DEPENDENT SUBQUERY	bpi	ref	ix_blogpost_approved_checked_published_ordering_id	ix_blogpost_approved_checked_published_ordering_id	11	const,const,20101102_desc.bp.published_date	1	100.00	Using where; Using index

Field or reference '20101102_desc.bp.published_date' of SELECT #2 was resolved in SELECT #1
select `20101102_desc`.`bp`.`published_date` AS `published_date` from `20101102_desc`.`blog_post` `bp` where ((`20101102_desc`.`bp`.`has_been_fact_checked` = 1) and (`20101102_desc`.`bp`.`is_approved` = 1) and (`20101102_desc`.`bp`.`published_date` < '2010-10-25 22:40:05') and (`20101102_desc`.`bp`.`id` = (select `20101102_desc`.`bpi`.`id` from `20101102_desc`.`blog_post` `bpi` where ((`20101102_desc`.`bpi`.`has_been_fact_checked` = 1) and (`20101102_desc`.`bpi`.`is_approved` = 1) and (`20101102_desc`.`bpi`.`published_date` = `20101102_desc`.`bp`.`published_date`)) order by `20101102_desc`.`bpi`.`is_approved` desc,`20101102_desc`.`bpi`.`has_been_fact_checked` desc,`20101102_desc`.`bpi`.`published_date` desc limit 1))) order by `20101102_desc`.`bp`.`is_approved` desc,`20101102_desc`.`bp`.`has_been_fact_checked` desc,`20101102_desc`.`bp`.`published_date` desc limit 5

This uses the index access and is very efficient.

Selecting chunks of data

Now we need to select all records within the chunks defined by the dates. This can be done with a simple BETWEEN condition. The upper bound of the BETWEEN will be the date that we used as the parameter to the query, and the lower one will be the last date of those selected on the previous stage.

Since it should be a scalar, we will just replace LIMIT 5 (which returns 5 records) with LIMIT 4, 1 (which returns the last of these 5 records) and put it into a scalar subquery:

SELECT  *

FROM    blog_post bpo

WHERE   is_approved = 1

AND has_been_fact_checked = 1

AND published_date BETWEEN

(

SELECT  bp.published_date

FROM    blog_post bp

WHERE   bp.is_approved = 1

AND bp.has_been_fact_checked = 1

AND bp.published_date < '2010-10-25 22:40:05'

AND bp.id =

(

SELECT  id

FROM    blog_post bpi

WHERE   bpi.is_approved = 1

AND bpi.has_been_fact_checked = 1

AND bpi.published_date = bp.published_date

ORDER BY

bpi.is_approved DESC, bpi.has_been_fact_checked DESC, bpi.published_date DESC

LIMIT 1

)

ORDER BY

bp.is_approved DESC, bp.has_been_fact_checked DESC, bp.published_date DESC

LIMIT 4, 1

)

AND '2010-10-25 22:40:05'

id	title	is_approved	has_been_fact_checked	published_date	ordering	author_id
418694	Post 418694	1	1	2010-10-25 18:00:00	16	579
354679	Post 354679	1	1	2010-10-25 18:00:00	44	733
480570	Post 480570	1	1	2010-10-25 18:00:00	70	62
402339	Post 402339	1	1	2010-10-25 18:00:00	85	498
282148	Post 282148	1	1	2010-10-25 18:00:00	87	152
358065	Post 358065	1	1	2010-10-25 18:00:00	88	301

328191	Post 328191	1	1	2010-10-25 19:00:00	33	627
95345	Post 95345	1	1	2010-10-25 19:00:00	39	497
73886	Post 73886	1	1	2010-10-25 19:00:00	77	1

29391	Post 29391	1	1	2010-10-25 20:00:00	13	259
632	Post 632	1	1	2010-10-25 20:00:00	15	952

93117	Post 93117	1	1	2010-10-25 21:00:00	65	773
278821	Post 278821	1	1	2010-10-25 21:00:00	72	49

156410	Post 156410	1	1	2010-10-25 22:00:00	1	984
451417	Post 451417	1	1	2010-10-25 22:00:00	8	140
262749	Post 262749	1	1	2010-10-25 22:00:00	19	719
415157	Post 415157	1	1	2010-10-25 22:00:00	22	430
225161	Post 225161	1	1	2010-10-25 22:00:00	72	398
307578	Post 307578	1	1	2010-10-25 22:00:00	72	611
426932	Post 426932	1	1	2010-10-25 22:00:00	92	64
20 rows fetched in 0.0012s (0.0041s)

id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	PRIMARY	bpo	range	ix_blogpost_approved_checked_published_ordering_id	ix_blogpost_approved_checked_published_ordering_id	11		20	75.00	Using where
2	SUBQUERY	bp	range	ix_blogpost_approved_checked_published_ordering_id	ix_blogpost_approved_checked_published_ordering_id	11		189693	100.00	Using where; Using index
3	DEPENDENT SUBQUERY	bpi	ref	ix_blogpost_approved_checked_published_ordering_id	ix_blogpost_approved_checked_published_ordering_id	11	20101102_desc.bp.published_date	1	100.00	Using where; Using index

Field or reference '20101102_desc.bp.published_date' of SELECT #3 was resolved in SELECT #2
select `20101102_desc`.`bpo`.`id` AS `id`,`20101102_desc`.`bpo`.`title` AS `title`,`20101102_desc`.`bpo`.`is_approved` AS `is_approved`,`20101102_desc`.`bpo`.`has_been_fact_checked` AS `has_been_fact_checked`,`20101102_desc`.`bpo`.`published_date` AS `published_date`,`20101102_desc`.`bpo`.`ordering` AS `ordering`,`20101102_desc`.`bpo`.`author_id` AS `author_id` from `20101102_desc`.`blog_post` `bpo` where ((`20101102_desc`.`bpo`.`has_been_fact_checked` = 1) and (`20101102_desc`.`bpo`.`is_approved` = 1) and (`20101102_desc`.`bpo`.`published_date` between (select `20101102_desc`.`bp`.`published_date` from `20101102_desc`.`blog_post` `bp` where ((`20101102_desc`.`bp`.`has_been_fact_checked` = 1) and (`20101102_desc`.`bp`.`is_approved` = 1) and (`20101102_desc`.`bp`.`published_date` < '2010-10-25 22:40:05') and (`20101102_desc`.`bp`.`id` = (select `20101102_desc`.`bpi`.`id` from `20101102_desc`.`blog_post` `bpi` where ((`20101102_desc`.`bpi`.`has_been_fact_checked` = 1) and (`20101102_desc`.`bpi`.`is_approved` = 1) and (`20101102_desc`.`bpi`.`published_date` = `20101102_desc`.`bp`.`published_date`)) order by `20101102_desc`.`bpi`.`is_approved` desc,`20101102_desc`.`bpi`.`has_been_fact_checked` desc,`20101102_desc`.`bpi`.`published_date` desc limit 1))) order by `20101102_desc`.`bp`.`is_approved` desc,`20101102_desc`.`bp`.`has_been_fact_checked` desc,`20101102_desc`.`bp`.`published_date` desc limit 4,1) and '2010-10-25 22:40:05'))

This query returns (almost instantly) 20 records in 5 chunks of different size (split for readability). Note that the chunks are arranged in order reverse to the order the dates were returned on the previous step: we omitted the ORDER BY clause and, by default, the index is traversed in forward (ASC) direction.

Ordering and joining

Now, all that's left to do is to add the correct ORDER BY condition to rearrange the data within the chunks, join the authors table and apply the LIMIT:

SELECT  bpo.id,

bpo.title,

bpo.author_id,

bpo.has_been_fact_checked,

bpo.published_date,

bpo.ordering,

au.username,

au.email

FROM    blog_post bpo

JOIN    auth_user au

ON      au.id = bpo.author_id

WHERE   bpo.is_approved = 1

AND bpo.has_been_fact_checked = 1

AND bpo.published_date BETWEEN

(

SELECT  bp.published_date

FROM    blog_post bp

WHERE   bp.is_approved = 1

AND bp.has_been_fact_checked = 1

AND bp.published_date < '2010-10-25 22:40:05'

AND bp.id =

(

SELECT  id

FROM    blog_post bpi

WHERE   bpi.is_approved = 1

AND bpi.has_been_fact_checked = 1

AND bpi.published_date = bp.published_date

ORDER BY

bpi.is_approved DESC, bpi.has_been_fact_checked DESC, bpi.published_date DESC

LIMIT 1

)

ORDER BY

bp.is_approved DESC, bp.has_been_fact_checked DESC, bp.published_date DESC

LIMIT 4, 1

)

AND '2010-10-25 22:40:05'

ORDER BY

bpo.published_date DESC,

bpo.ordering ASC,

bpo.id DESC

LIMIT 5

id	title	author_id	has_been_fact_checked	published_date	ordering	username	email
156410	Post 156410	984	1	2010-10-25 22:00:00	1	author984	author984@example.com
451417	Post 451417	140	1	2010-10-25 22:00:00	8	author140	author140@example.com
262749	Post 262749	719	1	2010-10-25 22:00:00	19	author719	author719@example.com
415157	Post 415157	430	1	2010-10-25 22:00:00	22	author430	author430@example.com
307578	Post 307578	611	1	2010-10-25 22:00:00	72	author611	author611@example.com
5 rows fetched in 0.0004s (0.0044s)

id	select_type	table	type	possible_keys	key	key_len	ref	rows	filtered	Extra
1	PRIMARY	bpo	range	ix_blogpost_approved_checked_published_ordering_id	ix_blogpost_approved_checked_published_ordering_id	11		20	100.00	Using where; Using filesort
1	PRIMARY	au	eq_ref	PRIMARY	PRIMARY	4	20101102_desc.bpo.author_id	1	100.00
2	SUBQUERY	bp	range	ix_blogpost_approved_checked_published_ordering_id	ix_blogpost_approved_checked_published_ordering_id	11		189693	100.00	Using where; Using index
3	DEPENDENT SUBQUERY	bpi	ref	ix_blogpost_approved_checked_published_ordering_id	ix_blogpost_approved_checked_published_ordering_id	11	20101102_desc.bp.published_date	1	100.00	Using where; Using index

Field or reference '20101102_desc.bp.published_date' of SELECT #3 was resolved in SELECT #2
select `20101102_desc`.`bpo`.`id` AS `id`,`20101102_desc`.`bpo`.`title` AS `title`,`20101102_desc`.`bpo`.`author_id` AS `author_id`,`20101102_desc`.`bpo`.`has_been_fact_checked` AS `has_been_fact_checked`,`20101102_desc`.`bpo`.`published_date` AS `published_date`,`20101102_desc`.`bpo`.`ordering` AS `ordering`,`20101102_desc`.`au`.`username` AS `username`,`20101102_desc`.`au`.`email` AS `email` from `20101102_desc`.`blog_post` `bpo` join `20101102_desc`.`auth_user` `au` where ((`20101102_desc`.`au`.`id` = `20101102_desc`.`bpo`.`author_id`) and (`20101102_desc`.`bpo`.`has_been_fact_checked` = 1) and (`20101102_desc`.`bpo`.`is_approved` = 1) and (`20101102_desc`.`bpo`.`published_date` between (select `20101102_desc`.`bp`.`published_date` from `20101102_desc`.`blog_post` `bp` where ((`20101102_desc`.`bp`.`has_been_fact_checked` = 1) and (`20101102_desc`.`bp`.`is_approved` = 1) and (`20101102_desc`.`bp`.`published_date` < '2010-10-25 22:40:05') and (`20101102_desc`.`bp`.`id` = (select `20101102_desc`.`bpi`.`id` from `20101102_desc`.`blog_post` `bpi` where ((`20101102_desc`.`bpi`.`has_been_fact_checked` = 1) and (`20101102_desc`.`bpi`.`is_approved` = 1) and (`20101102_desc`.`bpi`.`published_date` = `20101102_desc`.`bp`.`published_date`)) order by `20101102_desc`.`bpi`.`is_approved` desc,`20101102_desc`.`bpi`.`has_been_fact_checked` desc,`20101102_desc`.`bpi`.`published_date` desc limit 1))) order by `20101102_desc`.`bp`.`is_approved` desc,`20101102_desc`.`bp`.`has_been_fact_checked` desc,`20101102_desc`.`bp`.`published_date` desc limit 4,1) and '2010-10-25 22:40:05')) order by `20101102_desc`.`bpo`.`published_date` desc,`20101102_desc`.`bpo`.`ordering`,`20101102_desc`.`bpo`.`id` desc limit 5

The records have been rearranged (using a filesort) and a LIMIT was applied.

Since the records are only taken from 5 chunks with 20 records in total, the filesort is not a problem at all and the query completes in only 4 ms.

Conclusion

MySQL does not support ASC / DESC clauses in the indices.

However, with an ORDER BY / LIMIT condition that mixes the orders of columns, it is possible to use an index.

To do that, we should limit the recordset to the minimal set that would still retain the order on the leading columns that share one direction. This can be done by selecting distinct set of those leading columns up to the original limit, and filtering on them using the index.

Then we can revert to a filesort to order this (much smaller) set on the remaining columns, which would be much more efficient.

Tech Blog

Tuesday, 17 July 2018