Official MySQL documentation provides information that
using regular expressions is "powerful way of specifying a pattern for a
complex search". Is it really such a powerful way of filtering and
should be used, or is it a solution that should be avoided? As it
usually happens in real life, there are many opinions and no universal
answer. Unfortunately, it often turns out that the truth lies somewhere
in the middle.
One of my clients asked me yesterday for a little help with a query
badly hitting performance of their production server. They were
complaining about performance of
Original query used in customer's application:
Of course `list` table was properly indexed.
Explain:
Can we deal with it somehow?
I was playing with it a bit trying different alternatives and here is what I found as much more efficient replacement to original query:
Execution plan again:
And finally, optimization benefit (timings for both queries):
Original query by customer:
Of course it does not mean that all the queries using the
REGEXP
powered query and asked for advice on how to make it efficient.Original query used in customer's application:
SELECT id FROM list WHERE user_name REGEXP '^bulba[0-9]+$';
Of course `list` table was properly indexed.
Explain:
mysql> explain SELECT SQL_NO_CACHE id FROM list WHERE user_name REGEXP '^bulba[0-9]+$'\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: list type: index possible_keys: NULL key: id7 key_len: 24 ref: NULL rows: 4175201 Extra: Using where; Using index 1 row in set (0.00 sec)Execution plan pasted above shows that MySQL was using index scan (
type: index
),
which is faster way to get your data than table scan, but still, it's
relatively slow and overall performance strongly depends on amount of
data it has to parse. Customer case is quite a good example of it. Even
assuming that InnoDB buffer pool is big enough to cover all index pages
and almost no IO will be needed, it's still over four million rows to
examine.Can we deal with it somehow?
I was playing with it a bit trying different alternatives and here is what I found as much more efficient replacement to original query:
SELECT id FROM list WHERE user_name LIKE 'bulba%' and user_name REGEXP 'bulba[0-9]';
Execution plan again:
mysql> explain SELECT SQL_NO_CACHE id FROM list WHERE user_name LIKE 'bulba%' AND user_name REGEXP 'bulba[0-9]'\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: list type: range possible_keys: id7 key: id7 key_len: 19 ref: NULL rows: 31 Extra: Using where; Using index 1 row in set (0.00 sec)Explain looks better. Now, instead of index scan MySQL will use 'id7' to find rows matching criteria with minimum overhead. This is because MySQL can't really use index when
REGEXP
is the only filtering used. New one, in fact, is doing primary filtering with LIKE
, REGEXP
just extends it.And finally, optimization benefit (timings for both queries):
Original query by customer:
19 rows in set (4.96 sec)Fixed one:
19 rows in set (0.01 sec)It seems that we were able to achieve the goal.
Of course it does not mean that all the queries using the
REGEXP
are "bad". That is only a reminder that special care should be taken
when writing SQL code with the use of this very nice functionality. You
should aware of queries with REGEXP
as the only filtering role.
0 comments:
Post a Comment