I'm trying to optimize quickly optimize the search functionality of some outdated forum software written in PHP. I've got my work down to a query that looks like this:
SELECT thread.threadid
FROM thread AS thread
INNER JOIN word AS word ON (word.title LIKE 'word1' OR word.title LIKE 'word2')
INNER JOIN postindex AS postindex ON (postindex.wordid = word.wordid)
INNER JOIN post AS postquery ON (postquery.postid = postindex.postid)
WHERE thread.threadid = postquery.threadid
GROUP BY thread.threadid
HAVING COUNT(DISTINCT word.wordid) = 2
LIMIT 25;
word1
and word2
are examples; there could be any number of words. The number at the very end of the query is the total number of words. The idea is that a thread most contain all words in the search query, spread out over any number of posts.
This query often exceeds 60 seconds with only two words, and times out. I'm stumped; I can't figure out how to further optimize this horrid search engine.
As far as I can tell, everything is indexed properly, and I've run
ANALYZE
recently. Most of the database is running on InnoDB. Here's the output of EXPLAIN
:+----+-------------+-----------+--------+----------------------------------------------------------------------------------------+---------+---------+------------------------------+------+-----------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+--------+----------------------------------------------------------------------------------------+---------+---------+------------------------------+------+-----------------------------------------------------------+
| 1 | SIMPLE | word | range | PRIMARY,title | title | 150 | NULL | 2 | Using where; Using index; Using temporary; Using filesort |
| 1 | SIMPLE | postindex | ref | wordid,temp_ix | temp_ix | 4 | database1.word.wordid | 3 | Using index condition |
| 1 | SIMPLE | postquery | eq_ref | PRIMARY,threadid,showthread | PRIMARY | 4 | database1.postindex.postid | 1 | NULL |
| 1 | SIMPLE | thread | eq_ref | PRIMARY,forumid,postuserid,pollid,title,lastpost,dateline,prefixid,tweeted,firstpostid | PRIMARY | 4 | database1.postquery.threadid | 1 | Using index |
+----+-------------+-----------+--------+----------------------------------------------------------------------------------------+---------+---------+------------------------------+------+-----------------------------------------------------------+
Update
LIMIT 25
doesn't seem to be helping much. It shaves off maybe second from a query that normally returns hundreds of results.Clarification
The part that's slowing down MySQL is the
GROUP BY ... HAVING ...
bit. With GROUP BY
, the LIMIT
is pretty much useless for improving performance. Without GROUP BY
, and as long as the LIMIT
remains, the queries are quite speedy.SQL Info
Output of
SHOW CREATE TABLE postindex;
:CREATE TABLE `postindex` (
`wordid` int(10) unsigned NOT NULL DEFAULT '0',
`postid` int(10) unsigned NOT NULL DEFAULT '0',
`intitle` smallint(5) unsigned NOT NULL DEFAULT '0',
`score` smallint(5) unsigned NOT NULL DEFAULT '0',
UNIQUE KEY `wordid` (`wordid`,`postid`),
KEY `temp_ix` (`wordid`),
KEY `postid` (`postid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
I didn't make the table, so I have no idea why there's a duplicate index on wordid; however, I'm not willing to delete it, since this is ancient, fickle software.
You can try several rewrites and compare execution plan and times.
Using 2
EXISTS
subqueries (one for each word to be checked):SELECT t.threadid
FROM thread AS t
WHERE EXISTS
( SELECT 1
FROM post AS p
JOIN postindex AS pi
ON pi.postid = p.postid
JOIN word AS w
ON pi.wordid = w.wordid
WHERE w.title = 'word1'
AND t.threadid = p.threadid
)
AND EXISTS
( SELECT 1
FROM post AS p
JOIN postindex AS pi
ON pi.postid = p.postid
JOIN word AS w
ON pi.wordid = w.wordid
WHERE w.title = 'word2'
AND t.threadid = p.threadid
) ;
Using one
EXISTS
subquery:SELECT t.threadid
FROM thread AS t
WHERE EXISTS
( SELECT 1
FROM post AS p1
JOIN postindex AS pi1
ON pi1.postid = p1.postid
JOIN word AS w1
ON w1.wordid = pi1.wordid
AND w1.title = 'word1'
JOIN post AS p2
ON p2.threadid = p1.threadid
JOIN postindex AS pi2
ON pi2.postid = p2.postid
JOIN word AS w2
ON w2.wordid = pi2.wordid
AND w2.title = 'word2'
WHERE t.threadid = p1.threadid
AND t.threadid = p2.threadid
) ;
A single query with many joins and
GROUP BY
only to remove the duplicate threadid
:SELECT t.threadid
FROM thread AS t
JOIN post AS p1
ON p1.threadid = t.threadid
JOIN postindex AS pi1
ON pi1.postid = p1.postid
JOIN word AS w1
ON w1.wordid = pi1.wordid
AND w1.title = 'word1'
JOIN post AS p2
ON p1.threadid = t.threadid
JOIN postindex AS pi2
ON pi2.postid = p2.postid
JOIN word AS w2
ON w2.wordid = pi2.wordid
AND w2.title = 'word2'
WHERE p1.threadid = p2.threadid -- this line is redundant
GROUP BY t.threadid ;
0 comments:
Post a Comment