Tuesday, 30 July 2019

How the REGEX_REPLACE() Function Works in MySQL

In MySQL, the REGEXP_REPLACE() function replaces occurrences of the substring within a string that matches the given regular expression pattern.
The whole string is returned along with the replacements.
If there’s no match (i.e. the input string doesn’t contain the substring), the the whole string is returned unchanged.

Syntax

The syntax goes like this:
REGEXP_REPLACE(expr, pat, repl[, pos[, occurrence[, match_type]]])
Where expr is the input string and pat is the regular expression pattern for the substring. The repl argument is the replacement string.
The optional pos argument allows you to specify a position within the string to start the search. If omitted, it starts at position 1.
The optional occurrence argument allows you to specify which occurrence of the match to search for. If omitted, all occurrences are replaced.
The optional match_type argument allows you to refine the regular expression. For example, you can use this argument to specify case-sensitive matching or not.

Example 1 – Basic Usage

Here’s a basic example:
SET @str = 'It was good';
SELECT 
  @str 'Original String',
  REGEXP_REPLACE(@str, 'good', 'great!') 'Result';
Result:
+-----------------+---------------+
| Original String | Result        |
+-----------------+---------------+
| It was good     | It was great! |
+-----------------+---------------+
In this case there’s a match, and the string is returned with the modification.

Example 2 – Multiple Matches

By default, if there are multiple matches within the string, all of them are replaced:
SET @str = 'Cat Dog Cat Dog Cat';
SELECT 
  @str 'Original String',
  REGEXP_REPLACE(@str, 'Cat', 'Tiger') 'Result';
Result:
+---------------------+---------------------------+
| Original String     | Result                    |
+---------------------+---------------------------+
| Cat Dog Cat Dog Cat | Tiger Dog Tiger Dog Tiger |
+---------------------+---------------------------+
However, you also have the option of specifying which occurrence you’d like to replace (more on this later).

Example 3 – No Match

Here’s an example where there’s no match:
SET @str = 'Cat Dog Cat Dog Cat';
SELECT 
  @str 'Original String',
  REGEXP_REPLACE(@str, 'Cow', 'Tiger') 'Result';
Result:
+---------------------+---------------------+
| Original String     | Result              |
+---------------------+---------------------+
| Cat Dog Cat Dog Cat | Cat Dog Cat Dog Cat |
+---------------------+---------------------+
There’s no match, so the string is returned unchanged.

Example 4 – The pos Argument

Here’s an example of specifying the starting position:
SET @str = 'Cat Dog Cat Dog Cat';
SELECT 
  @str 'Original String',
  REGEXP_REPLACE(@str, 'Cat', 'Tiger', 2) 'Result';
Result:
+---------------------+-------------------------+
| Original String     | Result                  |
+---------------------+-------------------------+
| Cat Dog Cat Dog Cat | Cat Dog Tiger Dog Tiger |
+---------------------+-------------------------+
We started at position 2, which comes after the start of the first occurrence, so the replace operation only affects those occurrences that come after the first one.

Example 5 – The occurrence Argument

As mentioned, by default, all occurrences are replaced. However, you also have the option of specifying a specific occurrence to replace by using the occurrence argument. Here’s an example:
SET @str = 'Cat Dog Cat Dog Cat';
SELECT 
  @str 'Original String',
  REGEXP_REPLACE(@str, 'Cat', 'Tiger', 1, 2) 'Result';
Result:
+---------------------+-----------------------+
| Original String     | Result                |
+---------------------+-----------------------+
| Cat Dog Cat Dog Cat | Cat Dog Tiger Dog Cat |
+---------------------+-----------------------+
In this case we start at position 1. However, if we start at a different position, the result is different:
SET @str = 'Cat Dog Cat Dog Cat';
SELECT 
  @str 'Original String',
  REGEXP_REPLACE(@str, 'Cat', 'Tiger', 2, 2) 'Result';
Result:
+---------------------+-----------------------+
| Original String     | Result                |
+---------------------+-----------------------+
| Cat Dog Cat Dog Cat | Cat Dog Cat Dog Tiger |
+---------------------+-----------------------+
This happened because our starting position came after the first occurrence had started. Therefore, occurrence 2 became occurrence 1, and occurrence 3 became occurrence 2.
The default value for the occurrence argument is 0, which means all occurrences are replaced. In other words, if you omit this argument, all occurrences are replaced (as we’ve seen in the previous examples). Here’s an example of explicitly specifying all occurrences:
SET @str = 'Cat Dog Cat Dog Cat';
SELECT 
  @str 'Original String',
  REGEXP_REPLACE(@str, 'Cat', 'Tiger', 1, 0) 'Result';
Result:
+---------------------+---------------------------+
| Original String     | Result                    |
+---------------------+---------------------------+
| Cat Dog Cat Dog Cat | Tiger Dog Tiger Dog Tiger |
+---------------------+---------------------------+

Example 6 – The match_type Argument

You can provide an additional argument to determine the match type. This allows you to specify things like whether or not the match is case-sensitive, whether or not to include line terminators, etc.
Here’s an example of specifying a case-sensitive match and a case-insensitive match:
SET @str = 'Cat Dog Cat Dog Cat';
SELECT 
  @str 'Original String',
  REGEXP_REPLACE(@str, 'cat', 'Tiger', 1, 0, 'c') 'Case-Sensitive',
  REGEXP_REPLACE(@str, 'cat', 'Tiger', 1, 0, 'i') 'Case-Insensitive';
Result:
+---------------------+---------------------+---------------------------+
| Original String     | Case-Sensitive      | Case-Insensitive          |
+---------------------+---------------------+---------------------------+
| Cat Dog Cat Dog Cat | Cat Dog Cat Dog Cat | Tiger Dog Tiger Dog Tiger |
+---------------------+---------------------+---------------------------+
The match_type argument can contain the following characters:
c
Case sensitive matching.
i
Case insensitive matching.
m
Multiple-line mode. Recognize line terminators within the string. The default behavior is to match line terminators only at the start and end of the string expression.
n
The . character matches line terminators. The default is for . matching to stop at the end of a line.
u
Unix-only line endings. Only the newline character is recognized as a line ending by the ., ^, and $ match operators.

0 comments:

Post a Comment