Tuesday, 30 July 2019

How the WEIGHT_STRING() Function Works in MySQL

In MySQL, the WEIGHT_STRING() function returns the weight string for the input string. The return value is a binary string that represents the comparison and sorting value of the string.
If the input string is a nonbinary value, the return value contains the collation weights for the string. If it’s a binary value, the result is the same as the input string. This is because the weight for each byte in a binary string is the byte value.
This function is a debugging function intended for internal use. It can be used for testing and debugging of collations. Note that its behavior can change between MySQL versions.

Syntax

The syntax goes like this:
WEIGHT_STRING(str [AS {CHAR|BINARY}(N)] [flags])
Where str is the input string. The optional AS clause allows you to cast the input string to a given type and length. The optional flags argument is not currently used in MySQL (as of version 8.0).

Example 1 – Basic Usage

Here’s a basic example of usage using a nonbinary input string:
SELECT HEX(WEIGHT_STRING('Cat'));
Result:
+---------------------------+
| HEX(WEIGHT_STRING('Cat')) |
+---------------------------+
| 1C7A1C471E95              |
+---------------------------+
Note that I use the HEX() function to display the WEIGHT_STRING() result. This is because WEIGHT_STRING() returns a binary result. We can use HEX() to display the result in a printable form.
If I don’t use HEX() in this example I get this:
SELECT WEIGHT_STRING('Cat');
Result:
+----------------------+
| WEIGHT_STRING('Cat') |
+----------------------+
| zG?                |
+----------------------+
So just to be clear, here’s the string, the hexadecimal representation of that string, and the hexadecimal representation of its weight string:
SET @str = 'Cat';
SELECT @str, HEX(@str), HEX(WEIGHT_STRING(@str));
Result:
+------+-----------+--------------------------+
| @str | HEX(@str) | HEX(WEIGHT_STRING(@str)) |
+------+-----------+--------------------------+
| Cat  | 436174    | 1C7A1C471E95             |
+------+-----------+--------------------------+

Example 2 – The AS Clause

Here’s an example using the AS clause to cast the input string to a given type and length.
SET @str = 'Cat';
SELECT 
  HEX(WEIGHT_STRING(@str AS CHAR(3))) 'Char 3',
  HEX(WEIGHT_STRING(@str AS CHAR(8))) 'Char 8',
  HEX(WEIGHT_STRING(@str AS BINARY(3))) 'Binary 3',
  HEX(WEIGHT_STRING(@str AS BINARY(8))) 'Binary 8';
Result:
+--------------+--------------+----------+------------------+
| Char 3       | Char 8       | Binary 3 | Binary 8         |
+--------------+--------------+----------+------------------+
| 1C7A1C471E95 | 1C7A1C471E95 | 436174   | 4361740000000000 |
+--------------+--------------+----------+------------------+

Example 3 – Collation

The following two examples demonstrate how a string can have a different weight string, depending on the collation being used. The collation used in the first example is accent-insensitive and case-insensitive. The collation used in the second example is accent-sensitive and case-sensitive.
SET @upper = _utf8mb4 'CAT' COLLATE utf8mb4_0900_ai_ci;
SET @lower = lcase(@upper);
SELECT 
  @upper 'String', 
  HEX(@upper) 'Hex', 
  HEX(WEIGHT_STRING(@upper)) 'Weight String'
UNION ALL
SELECT 
  @lower, 
  HEX(@lower), 
  HEX(WEIGHT_STRING(@lower));
Result:
+--------+--------+---------------+
| String | Hex    | Weight String |
+--------+--------+---------------+
| CAT    | 434154 | 1C7A1C471E95  |
| cat    | 636174 | 1C7A1C471E95  |
+--------+--------+---------------+
And here’s the same example, except with an accent-sensitive and case-sensitive collation.
SET @upper = _utf8mb4 'CAT' COLLATE utf8mb4_0900_as_cs;
SET @lower = lcase(@upper);
SELECT 
  @upper 'String', 
  HEX(@upper) 'Hex', 
  HEX(WEIGHT_STRING(@upper)) 'Weight String'
UNION ALL
SELECT 
  @lower, 
  HEX(@lower), 
  HEX(WEIGHT_STRING(@lower));
Result:
+--------+--------+----------------------------------------------+
| String | Hex    | Weight String                                |
+--------+--------+----------------------------------------------+
| CAT    | 434154 | 1C7A1C471E9500000020002000200000000800080008 |
| cat    | 636174 | 1C7A1C471E9500000020002000200000000200020002 |
+--------+--------+----------------------------------------------+

0 comments:

Post a Comment