Mysql: Finding incomplete orders ~ Tech Blog

Monday, 12 November 2018

Mysql: Finding incomplete orders

Imagine we are keeping an online shop and want to find the customers that don't have complete orders.

We'll make the structure of the orders a little bit complex:

Each customer may have a number of baskets
Each basket will have a list of positions in it
Each position has a number of discounts
An order is considered complete, when all entites are present: there is at least one basket, all baskets should have at least one position, and each position should have at least one discount

We will keep the data in four tables as following:

Let's try to find all customers that have at least one incomplete order.

An incomplete order means that either there are no basket, or there are no positions in the basket, or there are no discounts for the position.

In our sample data, customer 1 has no baskets, customer 2 has no positions, and customer 3 has no discounts.

Having no baskets or no positions in a basket naturally also means not having any discounts. That's why we can utilize an outer join to find the missing rows, and filter out the discounts with a NULL id:

SELECT  DISTINCT cmr_id

FROM    t_customer

LEFT OUTER JOIN

t_basket

ON      bsk_customer = cmr_id

LEFT OUTER JOIN

t_position

ON      pos_basket = bsk_id

LEFT OUTER JOIN

t_discount

ON      dsc_position = pos_id

WHERE   dsc_id IS NULL

cmr_id
1
2
3
3 rows fetched in 0,0012s (3,2659s)

This runs for more than 3 seconds.

The query joins all four tables, selects all rows from them and finally filters out the rows that have a NULL dsc_id.

We have more than 100,000 rows in the largest tables, that's why these joins are quite costly.

We can reduce scanning: just find a first matching row for each customer that has a NULL dsc_id and then go to another customer instead of scanning all other baskets, positions and discounts for this one.

This may be achieved by using NOT EXISTS:

SELECT  cmr_id

FROM    t_customer

WHERE   NOT EXISTS

(

SELECT  1

FROM    t_basket, t_position, t_discount

WHERE   bsk_customer = cmr_id

AND pos_basket = bsk_id

AND dsc_position = pos_id

)

cmr_id
1
2
3
3 rows fetched in 0,0012s (0,1618s)

This is much faster.

Now let's change the rules a little. We will consider an order complete only if it has at least two discounts for each position. In our sample data, customer 4 has exactly 1 discount for each position.

This becomes complicated now. We cannot use NOT EXISTS anymore, as it will fail if exactly one discount exists, which is not what we need.

We can of course filter out the positions having less than two discounts using the following query:

SELECT  cmr_id

FROM    t_customer

WHERE   NOT EXISTS

(

SELECT  1

FROM    t_basket

LEFT OUTER JOIN

t_position

ON      pos_basket = bsk_id

LEFT OUTER JOIN

t_discount

ON      dsc_position = pos_id

WHERE   bsk_customer = cmr_id

GROUP BY

bsk_id, pos_id

HAVING  COUNT(*) > 1

)

cmr_id
1
2
3
4
4 rows fetched in 0,0012s (5,6372s)

, which will give us the result we're after. But this query is quite slow.

How can we optimize it?

MySQL uses NESTED LOOPS to perform joins, with t_customer as a leading table. To calculate a COUNT(*), it first need to join all four tables, and then filter out the rows that have COUNT(*) >= 2.

But t_discount actually has all the information we need for filtering. We may first perform the filtering, and then join the other tables with the filtered results. Thus we can get rid of the unneeded joins.

So we'll make our query perform the following steps:

We find all customers that don't have discounts at all. These customers don't have any records in the t_discount, but, fortunately, we already know how to filter them out fast
We find all customers that have positions with exactly one discount. To do this, we select all such positions from t_discount by using COUNT(*), and perform the joins with the other tables. Joins will be performed after fitering.
Finally, we UNION the results of the queries above

Let's see how it works:

SELECT  cmr_id

FROM    t_customer

WHERE   NOT EXISTS

(

SELECT  1

FROM    t_basket, t_position, t_discount

WHERE   bsk_customer = cmr_id

AND pos_basket = bsk_id

AND dsc_position = pos_id

)

UNION

SELECT  DISTINCT bsk_customer

FROM    (

SELECT  dsc_position

FROM    t_discount

GROUP BY

dsc_position

HAVING COUNT(*) < 2

) dd, t_position, t_basket

WHERE   pos_id = dsc_position

AND bsk_id = pos_basket

cmr_id
1
2
3
4
4 rows fetched in 0,0012s (0,7321s)

Much faster.

Tech Blog

Monday, 12 November 2018

Mysql: Finding incomplete orders

0 comments:

Post a Comment

Total Pageviews

Achievement

Live Traffic

Followers

About Me

I V RAMANA

Recent Comments

Categories

Popular Posts

Hot Topics

Video

News

Comments

Recent

Bottom Ad [Post Page]

Recent Posts

Mysql - How to reset the administrator password in ISPConfig 3

Socialize

Blog Archive

Search This Blog

Post Top Ad

Archive

Post Bottom Ad

Author Details

About Me

Tags

Full width home advertisement

Pages

Post Page Advertisement [Top]

Climb the mountains

Monday, 12 November 2018

0 comments:

Post a Comment

Total Pageviews

Achievement

Live Traffic

Subscribe To

Followers

About Me

I V RAMANA

Recent Comments

Categories

Popular Posts

Hot Topics

Video

News

Comments

Recent

Bottom Ad [Post Page]

Recent Posts

Mysql - How to reset the administrator password in ISPConfig 3

Socialize

Blog Archive

Search This Blog

Post Top Ad

Archive

Post Bottom Ad

Author Details

About Me

Tags

Full width home advertisement

Pages

Post Page Advertisement [Top]

Climb the mountains