Tuesday 4 September 2018

LEFT JOIN does not work to replace a NOT EXISTS MYSQL

I have a table called Documents and a table called Notes that stores notes for the Documents. I need to get all Documents where there are no notes that have a status of 2 or 3.

SELECT * FROM Documents
   WHERE NOT EXISTS (
      SELECT docID FROM Notes WHERE docId = id AND status IN (2, 3)
   )

This is extremely slow but it works. I tried doing an Inner join but if just one note has a status other than 2 or 3, it still shows the Document. I need it to only show Documents where there is no occurrence of 2 or 3 in any of the notes.
Can anyone help!? Thanks!

An anti-join is a familiar SQL pattern.
With that pattern, we use an outer join operation to return all rows, along with matching rows, including rows that don't have a match. The "trick" is to use a predicate in the WHERE clause to filter out all of the rows that found a match, leaving only rows that didn't have a match.
As an example, to retrieve rows from Documents that don't have any matching row in Notes that meet specified criteria:
SELECT d.*
  FROM Documents d
  LEFT
  JOIN Notes n
    ON n.docId = d.id
   AND n.status IN (2,3)
 WHERE n.docId IS NULL

(I'm guessing that docId and status are references to columns in Notes, and that idis a reference to a column in Documents. The column references in your query aren't qualified, so that leaves us guessing which columns are in which table. Best practice is to qualify all column references in a query that references more than one table. One big benefit is that it makes it possible to decipher the statement without having to look at the table definitions, to figure out which columns are coming from which table.)
That query will return rows from Documents where there isn't any "matching" row in Notes that has a status of 2 or 3.
Because the LEFT JOIN is an outer join, it returns all rows from Documents, along with matching rows from Notes. Any rows from Documents that don't have a matching row will be also be returned, with NULL values for the columns from Notes. The equality predicate in the join (n.docId = d.Id) guarantees us that any "matching" row will have a non-NULL value for docId.
The "trick" is the predicate in the WHERE clause: n.docId IS NULL
Any rows that had a match will be filtered out, so we're left with rows from Documentsthat didn't have a match.

The original query has status NOT IN (2,3). That would essentially ignore rows in Notes that have one of those statuses, and only a row with some other non-NULL value of status would "match". Based on the specification... "no notes with a status of 2 or 3", that seems to imply you'd want status IN (2,3) in the query.

0 comments:

Post a Comment