Tuesday 28 August 2018

MySQL NOT IN with subquery does not work as expected

I'm creating a application that will generate lists for email marketing campaigns. I have tables for contacts, emails, and campaigns. A campaign has many emails and a contact has many emails. The email is related to a contact and a campaign. Basically a table for a MANY to MANY relationship except I have other fields in the table for the result of the email (clicked, opened, unsubscribed, etc). There are also other tables but this is where I'm having the trouble.

I'm trying to use NOT IN with a subquery to get a list of contacts who have not received an email since a certain date with other conditions. An example query is this:
SELECT *
FROM `contact` `t`
WHERE (unsubscribed='1')
  AND t.id NOT IN
   (SELECT distinct contact_id
    FROM email, campaign
    WHERE email.campaign_id = campaign.id
      AND campaign.date_sent >= '2012-07-12')
ORDER BY rand()
LIMIT 10000

This returns 0 result. However, if I run the first condition:
select id
from contact
where unsubscribed=1

I have 9075 rows. Then, if I separately run the subquery:
SELECT distinct contact_id
FROM email, campaign
WHERE email.campaign_id = campaign.id
  AND campaign.date_sent >= '2012-07-12'

I have 116612 rows. Out of each of those results, I end up with 826 values that are duplicates. From what I can understand, this means that 9075-826=8249 records ARE unsubscribed=1 AND NOT IN the second query. So, my first query should be returning 8249 results but it is returning 0. I must be structuring the query wrong or using the wrong operators but I can not for the life of me figure out how to get this right.
Can anyone help? So many thanks in advance as this has had me stumped for like 3 days! :)

This is because
SELECT 1 FROM DUAL WHERE 1 NOT IN (NULL, 2)

won't return anything, whereas
SELECT 1 FROM DUAL WHERE 1 NOT IN (2)

will.
Please review the behaviour of NOT IN and NULL in MYSQL.
For your concern you should get away with it using NOT EXISTS instead of NOT IN:
SELECT * FROM `contact` `t`
WHERE (unsubscribed='1')
AND NOT EXISTS (
    SELECT * FROM email, campaign
    WHERE
        email.campaign_id = campaign.id
    AND campaign.date_sent >= '2012-07-12'
    AND t.id = contact_id
)
ORDER BY rand()
LIMIT 10000

0 comments:

Post a Comment