Tuesday, 30 July 2019

MySQL REGEXP Examples

In MySQL, the REGEXP operator is used to determine whether or not a string matches a regular expression. It’s a synonym for REGEXP_LIKE().
If the string matches the regular expression provided, the result is 1, otherwise it’s 0.

Syntax

The syntax goes like this:
expr REGEXP pat
Where expr is the input string and pat is the regular expression for which you’re testing the string against.

Example 1 – Basic Regular Expression

The most basic regular expression we can use is one that has no special characters in it. Here, we just use a string. If any part of the input string matches that string, it returns a match.
SELECT 
  'Corn' REGEXP 'Corn' AS 'Corn',
  'Acorn' REGEXP 'Corn' AS 'Acorn',
  'Corner' REGEXP 'Corn' AS 'Corner',
  'Cheese' REGEXP 'Corn' AS 'Cheese';
Result:
+------+-------+--------+--------+
| Corn | Acorn | Corner | Cheese |
+------+-------+--------+--------+
|    1 |     1 |      1 |      0 |
+------+-------+--------+--------+

Example 2 – Match the Beginning of a String

In this example, the regular expression specifies that the string must begin with Co.
SELECT 
  'Corn' REGEXP '^Co' AS 'Corn',
  'Acorn' REGEXP '^Co' AS 'Acorn',
  'Cheese' REGEXP '^Co' AS 'Cheese';
Result:
+------+-------+--------+
| Corn | Acorn | Cheese |
+------+-------+--------+
|    1 |     0 |      0 |
+------+-------+--------+

Example 3 – Match the End of a String

In this example, the regular expression specifies that the string must end with rn.
SELECT 
  'Corn' REGEXP 'rn$' AS 'Corn',
  'Acorn' REGEXP 'rn$' AS 'Acorn',
  'Cheese' REGEXP 'rn$' AS 'Cheese';
Result:
+------+-------+--------+
| Corn | Acorn | Cheese |
+------+-------+--------+
|    1 |     1 |      0 |
+------+-------+--------+

Example 4 – Match Any Character

The . character matches any character.
SELECT 
  'Corn' REGEXP '.' AS 'Corn',
  'Cheese' REGEXP '.' AS 'Cheese',
  '' REGEXP '.' AS '';
Result:
+------+--------+---+
| Corn | Cheese |   |
+------+--------+---+
|    1 |      1 | 0 |
+------+--------+---+
This character is typically used in conjunction with other characters to specify further criteria. For example:
SELECT 
  'Corn' REGEXP '^C.rn$' AS 'Corn', 
  'Crn' REGEXP '^C.rn$' AS 'Crn';
Result:
+------+-----+
| Corn | Crn |
+------+-----+
|    1 |   0 |
+------+-----+
Here we specify that the string must start with C, that it must be followed by a character (any character), and that it must end with rn.
Note that this character specifies a single instance of the character. If you want to specify multiple instances (for example ee instead of just e), you’ll need to add more . characters.
SELECT 
  'Tweet' REGEXP '^Tw..t$' AS 'Tweet', 
  'Tweat' REGEXP '^Tw..t$' AS 'Tweat', 
  'Tweet' REGEXP '^Tw.t$' AS 'Tweet', 
  'Twit' REGEXP '^Tw..t$' AS 'Twit';
Result:
+-------+-------+-------+------+
| Tweet | Tweat | Tweet | Twit |
+-------+-------+-------+------+
|     1 |     1 |     0 |    0 |
+-------+-------+-------+------+
Another way to do this is to specify the number of occurrences within curly brackets:
SELECT 
  'Tweet' REGEXP '^Tw.{2}t$' AS 'Tweet', 
  'Tweat' REGEXP '^Tw.{2}t$' AS 'Tweat', 
  'Tweet' REGEXP '^Tw.{1}t$' AS 'Tweet', 
  'Twit' REGEXP '^Tw.{2}t$' AS 'Twit';
Result:
+-------+-------+-------+------+
| Tweet | Tweat | Tweet | Twit |
+-------+-------+-------+------+
|     1 |     1 |     0 |    0 |
+-------+-------+-------+------+
However, if you know the character that you’re looking for, you can specify that character (instead of the . character), as demonstrated in the following example.

Example 5 – Match Zero or More Instances of a Specific Character

We can do the following to specify zero or more instances of the e character:
SELECT 
  'Twet' REGEXP '^Twe*t$' AS 'Twet',
  'Tweet' REGEXP '^Twe*t$' AS 'Tweet',
  'Tweeet' REGEXP '^Twe*t$' AS 'Tweeet',
  'Twt' REGEXP '^Twe*t$' AS 'Twt',
  'Twit' REGEXP '^Twe*t$' AS 'Twit',
  'Twiet' REGEXP '^Twe*t$' AS 'Twiet',
  'Tweit' REGEXP '^Twe*t$' AS 'Tweit';
Result:
+------+-------+--------+-----+------+-------+-------+
| Twet | Tweet | Tweeet | Twt | Twit | Twiet | Tweit |
+------+-------+--------+-----+------+-------+-------+
|    1 |     1 |      1 |   1 |    0 |     0 |     0 |
+------+-------+--------+-----+------+-------+-------+
The first four match but the last three don’t.

Example 6 – Match One or More Instances of a Specific Character

We can modify the previous example so that we only get a match if one or more characters is found (the previous example returned a match if zero or more were found). To do this, we simply use + instead of *:
SELECT 
  'Twet' REGEXP '^Twe+t$' AS 'Twet',
  'Tweet' REGEXP '^Twe+t$' AS 'Tweet',
  'Tweeet' REGEXP '^Twe+t$' AS 'Tweeet',
  'Twt' REGEXP '^Twe+t$' AS 'Twt',
  'Twit' REGEXP '^Twe+t$' AS 'Twit',
  'Twiet' REGEXP '^Twe+t$' AS 'Twiet',
  'Tweit' REGEXP '^Twe+t$' AS 'Tweit';
Result:
+------+-------+--------+-----+------+-------+-------+
| Twet | Tweet | Tweeet | Twt | Twit | Twiet | Tweit |
+------+-------+--------+-----+------+-------+-------+
|    1 |     1 |      1 |   0 |    0 |     0 |     0 |
+------+-------+--------+-----+------+-------+-------+
In this case, the fourth word returns a different result to the previous example.

Example 7 – Match Zero or One Instance of a Specific Character

We can modify the previous example so that we only get a match on zero or one of the desired characters. To do this, we use ?:
SELECT 
  'Twet' REGEXP '^Twe?t$' AS 'Twet',
  'Tweet' REGEXP '^Twe?t$' AS 'Tweet',
  'Tweeet' REGEXP '^Twe?t$' AS 'Tweeet',
  'Twt' REGEXP '^Twe?t$' AS 'Twt',
  'Twit' REGEXP '^Twe?t$' AS 'Twit',
  'Twiet' REGEXP '^Twe?t$' AS 'Twiet',
  'Tweit' REGEXP '^Twe?t$' AS 'Tweit';
Result:
+------+-------+--------+-----+------+-------+-------+
| Twet | Tweet | Tweeet | Twt | Twit | Twiet | Tweit |
+------+-------+--------+-----+------+-------+-------+
|    1 |     0 |      0 |   1 |    0 |     0 |     0 |
+------+-------+--------+-----+------+-------+-------+

Example 8 – Alternation

You can use the | character to match one or another sequence of characters:
SELECT 
  'Tweet' REGEXP 'Tw|et' AS 'Tweet',
  'For Let' REGEXP 'Tw|et' AS 'For Let',
  'Banana' REGEXP 'Tw|et' AS 'Banana';
Result:
+-------+---------+--------+
| Tweet | For Let | Banana |
+-------+---------+--------+
|     1 |       1 |      0 |
+-------+---------+--------+
Here’s another example where I search for whole words:
SELECT 
  'Cat' REGEXP 'Cat|Dog' AS 'Cat',
  'Dog' REGEXP 'Cat|Dog' AS 'Dog',
  'Doggone' REGEXP 'Cat|Dog' AS 'Doggone',
  'Banana' REGEXP 'Cat|Dog' AS 'Banana';
Result:
+-----+-----+---------+--------+
| Cat | Dog | Doggone | Banana |
+-----+-----+---------+--------+
|   1 |   1 |       1 |      0 |
+-----+-----+---------+--------+
We still get a match even when our regular expression matches only part of the the string.

Example 9 – Match Zero or More Instances of a Sequence

You can use brackets along with the asterisk ()* to specify zero or more instances of a sequence:
SELECT 
  'Banana' REGEXP '(an)*' AS 'Banana',
  'Land' REGEXP '(an)*' AS 'Land',
  'Cheese' REGEXP '(an)*' AS 'Cheese';
Result:
+--------+------+--------+
| Banana | Land | Cheese |
+--------+------+--------+
|      1 |    1 |      1 |
+--------+------+--------+
Another example:
SELECT 
  'Banana' REGEXP '^B(an)*d$' AS 'Banana',
  'Band' REGEXP '^B(an)*d$' AS 'Band',
  'Bald' REGEXP '^B(an)*d$' AS 'Bald',
  'Bad' REGEXP '^B(an)*d$' AS 'Bad';
Result:
+--------+------+------+-----+
| Banana | Band | Bald | Bad |
+--------+------+------+-----+
|      0 |    1 |    0 |   0 |
+--------+------+------+-----+

Example 10 – Repetition

As seen in a previous example, you can use curly brackets to specify repetition. This notation provides a more general way of writing regular expressions than some of the previous examples:
SELECT 
  'Tweeet' REGEXP 'e{3}' AS 'Tweeet',
  'Tweet' REGEXP 'e{3}' AS 'Tweet';
Result:
+--------+-------+
| Tweeet | Tweet |
+--------+-------+
|      1 |     0 |
+--------+-------+

Example 11 – Range

You can use the hyphen character to specify a range. Here’s an example that specifies a range of numbers:
SELECT 
  'Tweet 123' REGEXP '[0-9]' AS 'Tweet 123',
  'Tweet ABC' REGEXP '[0-9]' AS 'Tweet ABC';
Result:
+--------+-------+
| Tweeet | Tweet |
+--------+-------+
|      1 |     0 |
+--------+-------+
And the following example specifies a range of letters:
SELECT 
  'Tweet 123' REGEXP '[A-Z]' AS 'Tweet 123',
  'ABC' REGEXP '[A-Z]' AS 'ABC',
  '123' REGEXP '[A-Z]' AS '123';
Result:
+--------+-------+
| Tweeet | Tweet |
+--------+-------+
|      1 |     0 |
+--------+-------+
Here’s what happens if we limit the range of numbers:
SELECT 
  '123' REGEXP '[1-3]' AS '123',
  '012' REGEXP '[1-3]' AS '012',
  '045' REGEXP '[1-3]' AS '045';
Result:
+-----+-----+-----+
| 123 | 012 | 045 |
+-----+-----+-----+
|   1 |   1 |   0 |
+-----+-----+-----+

Example 12 – Not Within a Range

We can use the ^ character to modify the previous example so that the specified range of characters are excluded:
SELECT 
  '123' REGEXP '[^1-3]' AS '123',
  '012' REGEXP '[^1-3]' AS '012',
  '045' REGEXP '[^1-3]' AS '045';
Result:
+-----+-----+-----+
| 123 | 012 | 045 |
+-----+-----+-----+
|   0 |   1 |   1 |
+-----+-----+-----+
So we get the opposite result to the previous example.

0 comments:

Post a Comment