Tuesday 4 September 2018

Optimize a MySQL query with multiple joins

One of the queries used by a web app we're running is as follows:

SELECT
       p.id, r.id AS report_id, tr.result_id,
       r.report_date, r.department, r.reportStatus, rs.specimen,
       tr.name, tr.value, tr.flag, tr.unit, tr.reference_range
FROM patients AS p
INNER JOIN
    patients_reports AS pr ON pr.patient_id = p.id
INNER JOIN
    reports AS r ON pr.report_id = r.id
INNER JOIN
    results AS rs ON r.id = rs.report_id
INNER JOIN
    test_results AS tr ON rs.id = tr.result_id
WHERE pr.patient_id = 17548
ORDER BY rs.specimen, tr.name, r.report_date;

The explain plan looks like this:
+----+-------------+-------+--------+---------------+-----------+---------+-------------------+--------+----------------------------------------------+
| id | select_type | table | type   | possible_keys | key       | key_len | ref               | rows   | Extra                                        |
+----+-------------+-------+--------+---------------+-----------+---------+-------------------+--------+----------------------------------------------+
|  1 | SIMPLE      | p     | const  | PRIMARY       | PRIMARY   | 4       | const             |      1 | Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | rs    | ALL    | PRIMARY       | NULL      | NULL    | NULL              | 152817 |                                              |
|  1 | SIMPLE      | r     | eq_ref | PRIMARY       | PRIMARY   | 4       | demo.rs.report_id |      1 |                                              |
|  1 | SIMPLE      | pr    | eq_ref | PRIMARY       | PRIMARY   | 8       | const,demo.r.id   |      1 | Using where; Using index                     |
|  1 | SIMPLE      | tr    | ref    | result_id     | result_id | 5       | demo.rs.id        |      1 | Using where                                  |
+----+-------------+-------+--------+---------------+-----------+---------+-------------------+--------+----------------------------------------------+

The query returns 27371 rows. There are 152730 rows in test_results at the moment. This is just a small amount of demo data.
I've tried to get the query to be more efficient, but I'm having trouble getting it to execute more quickly. I've had a look at various articles on documentation and questions on stackoverflow, but have not been able to fix this.
I tried removing one of the joins as follows:
SELECT
       pr.patient_id, r.id AS report_id, tr.result_id,
       r.report_date, r.department, r.reportStatus, rs.specimen,
       tr.name, tr.value, tr.flag, tr.unit, tr.reference_range
FROM patients_reports AS pr
INNER JOIN
    reports AS r ON pr.report_id = r.id
INNER JOIN
    results AS rs ON r.id = rs.report_id
INNER JOIN
    test_results AS tr ON rs.id = tr.result_id
WHERE pr.patient_id = 17548
ORDER BY rs.specimen, tr.name, r.report_date;

The query plan is then as follows:
+----+-------------+-------+--------+---------------+-----------+---------+-------------------+--------+---------------------------------+
| id | select_type | table | type   | possible_keys | key       | key_len | ref               | rows   | Extra                           |
+----+-------------+-------+--------+---------------+-----------+---------+-------------------+--------+---------------------------------+
|  1 | SIMPLE      | rs    | ALL    | PRIMARY       | NULL      | NULL    | NULL              | 152817 | Using temporary; Using filesort |
|  1 | SIMPLE      | r     | eq_ref | PRIMARY       | PRIMARY   | 4       | demo.rs.report_id |      1 |                                 |
|  1 | SIMPLE      | pr    | eq_ref | PRIMARY       | PRIMARY   | 8       | const,demo.r.id   |      1 | Using where; Using index        |
|  1 | SIMPLE      | tr    | ref    | result_id     | result_id | 5       | demo.rs.id        |      1 | Using where                     |
+----+-------------+-------+--------+---------------+-----------+---------+-------------------+--------+---------------------------------+

So not much different.
I've tried rearranging the query and using STRAIGHT_JOIN amongst other things, but I'm not getting anywhere.
I'd appreciate some suggestions on how to optimize the query. Thanks.
EDIT: Argh! I did not have an index on results.report_id, but it does not seem to have helped:
+----+-------------+-------+--------+-------------------+-----------+---------+-------------------+--------+---------------------------------+
| id | select_type | table | type   | possible_keys     | key       | key_len | ref               | rows   | Extra                           |
+----+-------------+-------+--------+-------------------+-----------+---------+-------------------+--------+---------------------------------+
|  1 | SIMPLE      | rs    | ALL    | PRIMARY,report_id | NULL      | NULL    | NULL              | 152817 | Using temporary; Using filesort |
|  1 | SIMPLE      | r     | eq_ref | PRIMARY           | PRIMARY   | 4       | demo.rs.report_id |      1 |                                 |
|  1 | SIMPLE      | pr    | eq_ref | PRIMARY           | PRIMARY   | 8       | const,demo.r.id   |      1 | Using where; Using index        |
|  1 | SIMPLE      | tr    | ref    | result_id         | result_id | 5       | demo.rs.id        |      1 | Using where                     |
+----+-------------+-------+--------+-------------------+-----------+---------+-------------------+--------+---------------------------------+

EDIT2:
patients_reports looks like this:
+------------+---------+------+-----+---------+-------+
| Field      | Type    | Null | Key | Default | Extra |
+------------+---------+------+-----+---------+-------+
| patient_id | int(11) | NO   | PRI | 0       |       |
| report_id  | int(11) | NO   | PRI | 0       |       |
+------------+---------+------+-----+---------+-------+

EDIT3:
After adding the results.report_id index and trying the STRAIGHT_JOIN again as suggested by @DRapp:
SELECT STRAIGHT_JOIN
       r.id AS report_id, tr.result_id,
       r.report_date, r.department, r.reportStatus, rs.specimen,
       tr.name, tr.value, tr.flag, tr.unit, tr.reference_range
FROM patients_reports AS pr
INNER JOIN
    reports AS r ON pr.report_id = r.id
INNER JOIN
    results AS rs ON r.id = rs.report_id
INNER JOIN
    test_results AS tr ON rs.id = tr.result_id
WHERE pr.patient_id = 17548
ORDER BY rs.specimen, tr.name, r.report_date;

the plan looks like this:
+----+-------------+-------+--------+-------------------+-----------+---------+-------------------+------+----------------------------------------------+
| id | select_type | table | type   | possible_keys     | key       | key_len | ref               | rows | Extra                                        |
+----+-------------+-------+--------+-------------------+-----------+---------+-------------------+------+----------------------------------------------+
|  1 | SIMPLE      | pr    | ref    | PRIMARY           | PRIMARY   | 4       | const             | 3646 | Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | r     | eq_ref | PRIMARY           | PRIMARY   | 4       | demo.pr.report_id |    1 |                                              |
|  1 | SIMPLE      | rs    | ref    | PRIMARY,report_id | report_id | 5       | demo.r.id         |  764 | Using where                                  |
|  1 | SIMPLE      | tr    | ref    | result_id         | result_id | 5       | demo.rs.id        |    1 | Using where                                  |
+----+-------------+-------+--------+-------------------+-----------+---------+-------------------+------+----------------------------------------------+

So I think that looks much better, but I'm not sure exactly how to tell. Also the query still seems to take about the same about of time as before.

Is results.report_id indexed? It's failing to find a key and doing a table scan it looks like. I'm assuming results.id is actually the primary key.
Also, if it report_id was the primary key, and it's INNODB, it should be clustered on that index, so absolutely no clue why that isn't screaming fast if it is configured that way.

0 comments:

Post a Comment