Saturday 8 September 2018

MySQL: Find records in one table that are not in another

A couple of weeks back I posted how to find records in one table that are not in another with MySQL and received an email with a more efficient way of approaching the same problem and therefore revise my original post with his suggestion.

Example tables

The examples below use three tables as follows:
content: contains the content pages for a website. The primary key is content_id.
tags: the "tags" that a page is tagged with. The primary key is tag_id.
content_to_tags: a table that creates a many-to-many relationship between the above two tables; a page can belong to multiple tags.

Example 1

The first example finds records in content_to_tags that have no associated record in content. This could have happened if the application deleted a record from content but didn't delete the associated records from content_to_tags:
SELECT * 
FROM content_to_tags c2t
WHERE NOT EXISTS (
 SELECT * 
 FROM content c 
 WHERE c.content_id = c2t.content_id
)

Example 2

This is the same as for the first example but comparing tags with content_to_tags. Again, this might have happened if records were deleted from tags but not their associated records from content_to_tags:
SELECT * 
FROM content_to_tags c2t
WHERE NOT EXISTS (
 SELECT * 
 FROM tags t
 WHERE t.tag_id = c2t.tag_id
)

Benchmarking Examples 1 and 2

In my orginal post I used a LEFT JOIN from content_to_tags to content. I benchmarked my original query compared to this query on a small content table with 1000 records, a content_to_tags table with 2000 records and a tags table with 100 records.
Using NOT EXISTS was just over two times faster, and I would assume that as the tables get populated with more records the difference exponential.

A note about ACID compliance and foreign keys

Note that in a properly ACID compliant database with foreign key constraints the first two examples shouldn't actually return any data, because it wouldn't be possible to delete records from tags/content if there are associated records present in the content_to_tags table.
Examples 3 and 4 are looking at something slightly different: finding content that's not tagged at all, and finding tags that are not tagged to any content.

Example 3

The next example looks for records in content where there are no associated records in content_to_tags. This is useful if you want to find any untagged posts:
SELECT * 
FROM content c
WHERE NOT EXISTS (
 SELECT * 
 FROM content_to_tags c2t
 WHERE c.content_id = c2t.content_id
)

Example 4

The final example is the same as the above but to find tags that have no associated records in content_to_tags. This is useful for finding tags that have no content, or in another context (with different table names etc) categories that have no products in them:
SELECT * 
FROM tags t
WHERE NOT EXISTS (
 SELECT * 
 FROM content_to_tags c2t
 WHERE t.tag_id = c2t.tag_id
)

Benchmarking Examples 3 and 4

The difference between my original LEFT JOIN and the NOT EXISTS queries here was minimal for examples 3 and 4 on the example tables although I would imagine as they become bigger the difference would grow. NOT EXISTS was once again faster that the LEFT JOIN syntax.

Related posts:

0 comments:

Post a Comment