Tuesday 4 September 2018

How to optimize the mysql query with a large dataset

I have two tables with the following schema,

CREATE TABLE `open_log` (
  `delivery_id` varchar(30) DEFAULT NULL,
  `email_id` varchar(50) DEFAULT NULL,
  `email_activity` varchar(30) DEFAULT NULL,
  `click_url` text,
  `email_code` varchar(30) DEFAULT NULL,
  `on_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

CREATE TABLE `sent_log` (
  `email_id` varchar(50) DEFAULT NULL,
  `delivery_id` varchar(50) DEFAULT NULL,
  `email_code` varchar(50) DEFAULT NULL,
  `delivery_status` varchar(50) DEFAULT NULL,
  `tries` int(11) DEFAULT NULL,
  `creation_ts` varchar(50) DEFAULT NULL,
  `creation_dt` varchar(50) DEFAULT NULL,
  `on_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB  DEFAULT CHARSET=latin1;

The email_id and delivery_id columns in both tables make up a unique key.
The open_log table have 2.5 million records where as sent_log table has 0.25 million records.
I want to filter out the records from open log table based on the unique key (email_id and delivery_id).
I'm writing the following query.
SELECT * FROM open_log
WHERE CONCAT(email_id,'^',delivery_id)
IN (
SELECT DISTINCT CONCAT(email_id,'^',delivery_id) FROM sent_log
)

The problem is the query is taking too much time to execute. I've waited for an hour for the query completion but didn't succeed.
Kindly, suggest what I can do to make it fast since, I have the big data size in the tables.
Thanks, Faisal Nasir

First, rewrite your query using exists:
SELECT *
FROM open_log ol
WHERE EXISTS (SELECT 1
              FROM send_log sl
              WHERE sl.email_id = ol.email_id and sl.delivery_id = ol.delivery_id
             );

Then, add an index so this query will run faster:
create index idx_sendlog_emailid_deliveryid on send_log(email_id, delivery_id);

Your query is slow for a variety of reasons:
  • The use of string concatenation makes it impossible for MySQL to use an index.
  • The select distinct in the subquery is unnecessary.
  • Exists can be faster than in.

0 comments:

Post a Comment