Tuesday, 30 July 2019

How the REGEXP_SUBSTR() Function Works in MySQL

In MySQL, the REGEXP_SUBSTR() function returns the substring that matches the given regular expression pattern.
If there’s no match (i.e. the input string doesn’t contain the substring), the result is NULL.

Syntax

The syntax goes like this:
REGEXP_SUBSTR(expr, pat[, pos[, occurrence[, match_type]]])
Where expr is the input string and pat is the regular expression pattern for the substring.
The optional pos argument allows you to specify a position within the string to start the search. If omitted, it starts at position 1.
The optional occurrence argument allows you to specify which occurrence of the match to search for. If omitted, the first occurrence is used (occurrence 1).
The optional match_type argument allows you to refine the regular expression. For example, you can use this argument to specify case-sensitive matching or not.

Example 1 – Basic Usage

Here’s a basic example:
SELECT REGEXP_SUBSTR('Thailand or Cambodia', 'l.nd') Result;
Result:
+--------+
| Result |
+--------+
| land   |
+--------+
In this case there’s a match, and the substring is returned.

Example 2 – Multiple Matches

By default, if there are multiple matches within the string, the first one is returned (although you can specify another occurrence if needed. We’ll get to that soon):
SELECT REGEXP_SUBSTR('Lend for land', 'l.nd') Result;
Result:
+--------+
| Result |
+--------+
| Lend   |
+--------+

Example 3 – No Match

Here’s an example where there’s no match:
SELECT REGEXP_SUBSTR('Lend for land', '^C') Result;
Result:
+--------+
| Result |
+--------+
| NULL   |
+--------+
There’s no match, so the result is NULL.

Example 4 – The pos Argument

Here’s an example of specifying a starting position:
SELECT REGEXP_SUBSTR('Cat Cut Cot', 'C.t', 2) Result;
Result:
+--------+
| Result |
+--------+
| Cut    |
+--------+
We started at position 2, which comes after the start of the first occurrence, so the next occurrence is returned instead.
Here’s another example:
SELECT 
  REGEXP_SUBSTR('Cat Cut Cot', 'C.t', 1) 'Pos 1',
  REGEXP_SUBSTR('Cat Cut Cot', 'C.t', 2) 'Pos 2',
  REGEXP_SUBSTR('Cat Cut Cot', 'C.t', 6) 'Pos 6';
Result:
+-------+-------+-------+
| Pos 1 | Pos 2 | Pos 6 |
+-------+-------+-------+
| Cat   | Cut   | Cot   |
+-------+-------+-------+

Example 5 – The occurrence Argument

Here’s an example of using the occurrence argument. In all cases, we start at position 1:
SELECT 
  REGEXP_SUBSTR('Cat Cut Cot', 'C.t', 1, 1) 'Occurrence 1',
  REGEXP_SUBSTR('Cat Cut Cot', 'C.t', 1, 2) 'Occurrence 2',
  REGEXP_SUBSTR('Cat Cut Cot', 'C.t', 1, 3) 'Occurrence 3';
Result:
+--------------+--------------+--------------+
| Occurrence 1 | Occurrence 2 | Occurrence 3 |
+--------------+--------------+--------------+
| Cat          | Cut          | Cot          |
+--------------+--------------+--------------+
However, if we start at a different position, the result is different:
SELECT 
  REGEXP_SUBSTR('Cat Cut Cot', 'C.t', 2, 1) 'Occurrence 1',
  REGEXP_SUBSTR('Cat Cut Cot', 'C.t', 2, 2) 'Occurrence 2',
  REGEXP_SUBSTR('Cat Cut Cot', 'C.t', 2, 3) 'Occurrence 3';
Result:
+--------------+--------------+--------------+
| Occurrence 1 | Occurrence 2 | Occurrence 3 |
+--------------+--------------+--------------+
| Cut          | Cot          | NULL         |
+--------------+--------------+--------------+
This happened because our starting position came after the first occurrence had started. Therefore, occurrence 2 became occurrence 1, and occurrence 3 became occurrence 2. And because there were no more occurrences, the result of occurrence 3 was NULL (i.e. there was no occurrence 3).

Example 6 – The match_type Argument

You can provide an additional argument to determine the match type. This allows you to specify things like whether or not the match is case-sensitive, whether or not to include line terminators, etc.
Here’s an example of specifying a case-sensitive match and a case-insensitive match:
SELECT 
  REGEXP_SUBSTR('Cat Cut Cot', 'c.t', 1, 1, 'c') 'Case-Sensitive',
  REGEXP_SUBSTR('Cat Cut Cot', 'c.t', 1, 1, 'i') 'Case-Insensitive';
Result:
+----------------+------------------+
| Case-Sensitive | Case-Insensitive |
+----------------+------------------+
| NULL           | Cat              |
+----------------+------------------+
The match_type argument can contain the following characters:
c
Case sensitive matching.
i
Case insensitive matching.
m
Multiple-line mode. Recognize line terminators within the string. The default behavior is to match line terminators only at the start and end of the string expression.
n
The . character matches line terminators. The default is for . matching to stop at the end of a line.
u
Unix-only line endings. Only the newline character is recognized as a line ending by the .^, and $ match operators.

0 comments:

Post a Comment