In database systems, Collation specifies how data is sorted and compared in a database. Collation provides the sorting rules, case, and accent sensitivity properties for the data in the database.
For example, when you run a query using the
ORDER BY
clause, collation determines whether or not uppercase letters and lowercase letters are treated the same.
Collation is also used to determine how accents are treated, as well as character width and Japanese kana characters. Collation can also be used to distinguish between various ideographic variation selectors in certain collations (such as the Japanese_Bushu_Kakusu_140 and Japanese_XJIS_140 collations that were introduced in SQL Server 2017).
Different database management systems will provide different collation options. Depending on the DBMS, collation can be specified at the server level, the database level, the table level, and the column level. Collations can also be specified at the expression level (so you can specify which collation to use when you run a query), and at the identifier level.
Example
Below is a comparison between running a case-sensitive query vs a case-insensitive query in SQL Server.
Case-insensitive
USE Music; SELECT ArtistId, ArtistName FROM Artists ORDER BY ArtistName COLLATE Latin1_General_CI_AI;
This query uses the
ORDER BY
clause to order the results, and it uses the COLLATE
clause to specify the collation to be used when ordering those results. In this case, we use the Latin1_General_CI_AI
collation. Note that this collation has the letters CI, which stands for case-insensitive.
The above query returns the following result:
ArtistId ArtistName -------- ---------------------- 1 Aardvark 2 AArdvark 3 aArdvark 4 aardvark
Case-sensitive
However, if we change the collation to case-sensitive by using the
Latin1_General_CS_AI
collation, we get the following result:ArtistId ArtistName -------- ---------------------- 4 aardvark 3 aArdvark 1 Aardvark 2 AArdvark
Collation Options
As mentioned, case-sensitivity is just one of various options when specifying collation. The options associated with a collation are case sensitivity, accent sensitivity, Kana-sensitivity, width sensitivity, variation-selector-sensitivity.
Collations are expressed in different ways, depending on the DBMS that you use. In SQL Server, collation options are specified by appending the options to the collation name. So for example,
Latin1_General_CI_AS
is case-insensitive and accent-sensitive.
Below is a quick overview of the options:
Option | Explanation |
---|---|
Case-sensitive (_CS ) | Lowercase letters sort ahead of their uppercase versions. |
Case-insensitive (_CI ) | Uppercase and lowercase versions of letters are considered identical for sorting purposes. |
Accent-sensitive (_AS ) | Distinguishes between accented and unaccented characters. For example, a is not equal to ấ . |
Accent-insensitive (_AI ) | Accented and unaccented versions of letters are considered identical for sorting purposes. For example, a is equal to ấ . |
Kana-sensitive (_KS ) | Distinguishes between the Japanese kana characters Hiragana and Katakana. If _KS is omitted, then it is kana-insensitive, which means that Hiragana and Katakana characters are considered equal for sorting purposes. |
Width-sensitive (_WS ) | Distinguishes between full-width and half-width characters. If _WS is omitted, then it is width-insensitive, which means that full-width and half-width representations of the same character are considered identical for sorting purposes. |
Variation-selector-sensitive (_VSS ) | Distinguishes between various ideographic variation selectors in Japanese collations Japanese_Bushu_Kakusu_140 and Japanese_XJIS_140. These collations were introduced in SQL Server 2017. If _VSS is not selected, the collation is variation selector insensitive, and the variation selector is not considered in the comparison. |
You should be able to see a list of available collations for your database by running a query. The syntax will depend on your database management system. The list of available collations is usually quite large, due to the fact that each collation can have many different permutations of the above options.
For example, below is a list of the various Latin1 General collations available in SQL Server 2017:
Latin1_General_BIN | Latin1-General, binary sort |
Latin1_General_BIN2 | Latin1-General, binary code point comparison sort |
Latin1_General_CI_AI | Latin1-General, case-insensitive, accent-insensitive, kanatype-insensitive, width-insensitive |
Latin1_General_CI_AI_WS | Latin1-General, case-insensitive, accent-insensitive, kanatype-insensitive, width-sensitive |
Latin1_General_CI_AI_KS | Latin1-General, case-insensitive, accent-insensitive, kanatype-sensitive, width-insensitive |
Latin1_General_CI_AI_KS_WS | Latin1-General, case-insensitive, accent-insensitive, kanatype-sensitive, width-sensitive |
Latin1_General_CI_AS | Latin1-General, case-insensitive, accent-sensitive, kanatype-insensitive, width-insensitive |
Latin1_General_CI_AS_WS | Latin1-General, case-insensitive, accent-sensitive, kanatype-insensitive, width-sensitive |
Latin1_General_CI_AS_KS | Latin1-General, case-insensitive, accent-sensitive, kanatype-sensitive, width-insensitive |
Latin1_General_CI_AS_KS_WS | Latin1-General, case-insensitive, accent-sensitive, kanatype-sensitive, width-sensitive |
Latin1_General_CS_AI | Latin1-General, case-sensitive, accent-insensitive, kanatype-insensitive, width-insensitive |
Latin1_General_CS_AI_WS | Latin1-General, case-sensitive, accent-insensitive, kanatype-insensitive, width-sensitive |
Latin1_General_CS_AI_KS | Latin1-General, case-sensitive, accent-insensitive, kanatype-sensitive, width-insensitive |
Latin1_General_CS_AI_KS_WS | Latin1-General, case-sensitive, accent-insensitive, kanatype-sensitive, width-sensitive |
Latin1_General_CS_AS | Latin1-General, case-sensitive, accent-sensitive, kanatype-insensitive, width-insensitive |
Latin1_General_CS_AS_WS | Latin1-General, case-sensitive, accent-sensitive, kanatype-insensitive, width-sensitive |
Latin1_General_CS_AS_KS | Latin1-General, case-sensitive, accent-sensitive, kanatype-sensitive, width-insensitive |
Latin1_General_CS_AS_KS_WS | Latin1-General, case-sensitive, accent-sensitive, kanatype-sensitive, width-sensitive |
Latin1_General_100_BIN | Latin1-General-100, binary sort |
Latin1_General_100_BIN2 | Latin1-General-100, binary code point comparison sort |
Latin1_General_100_CI_AI | Latin1-General-100, case-insensitive, accent-insensitive, kanatype-insensitive, width-insensitive |
Latin1_General_100_CI_AI_WS | Latin1-General-100, case-insensitive, accent-insensitive, kanatype-insensitive, width-sensitive |
Latin1_General_100_CI_AI_KS | Latin1-General-100, case-insensitive, accent-insensitive, kanatype-sensitive, width-insensitive |
Latin1_General_100_CI_AI_KS_WS | Latin1-General-100, case-insensitive, accent-insensitive, kanatype-sensitive, width-sensitive |
Latin1_General_100_CI_AS | Latin1-General-100, case-insensitive, accent-sensitive, kanatype-insensitive, width-insensitive |
Latin1_General_100_CI_AS_WS | Latin1-General-100, case-insensitive, accent-sensitive, kanatype-insensitive, width-sensitive |
Latin1_General_100_CI_AS_KS | Latin1-General-100, case-insensitive, accent-sensitive, kanatype-sensitive, width-insensitive |
Latin1_General_100_CI_AS_KS_WS | Latin1-General-100, case-insensitive, accent-sensitive, kanatype-sensitive, width-sensitive |
Latin1_General_100_CS_AI | Latin1-General-100, case-sensitive, accent-insensitive, kanatype-insensitive, width-insensitive |
Latin1_General_100_CS_AI_WS | Latin1-General-100, case-sensitive, accent-insensitive, kanatype-insensitive, width-sensitive |
Latin1_General_100_CS_AI_KS | Latin1-General-100, case-sensitive, accent-insensitive, kanatype-sensitive, width-insensitive |
Latin1_General_100_CS_AI_KS_WS | Latin1-General-100, case-sensitive, accent-insensitive, kanatype-sensitive, width-sensitive |
Latin1_General_100_CS_AS | Latin1-General-100, case-sensitive, accent-sensitive, kanatype-insensitive, width-insensitive |
Latin1_General_100_CS_AS_WS | Latin1-General-100, case-sensitive, accent-sensitive, kanatype-insensitive, width-sensitive |
Latin1_General_100_CS_AS_KS | Latin1-General-100, case-sensitive, accent-sensitive, kanatype-sensitive, width-insensitive |
Latin1_General_100_CS_AS_KS_WS | Latin1-General-100, case-sensitive, accent-sensitive, kanatype-sensitive, width-sensitive |
Latin1_General_100_CI_AI_SC | Latin1-General-100, case-insensitive, accent-insensitive, kanatype-insensitive, width-insensitive, supplementary characters |
Latin1_General_100_CI_AI_WS_SC | Latin1-General-100, case-insensitive, accent-insensitive, kanatype-insensitive, width-sensitive, supplementary characters |
Latin1_General_100_CI_AI_KS_SC | Latin1-General-100, case-insensitive, accent-insensitive, kanatype-sensitive, width-insensitive, supplementary characters |
Latin1_General_100_CI_AI_KS_WS_SC | Latin1-General-100, case-insensitive, accent-insensitive, kanatype-sensitive, width-sensitive, supplementary characters |
Latin1_General_100_CI_AS_SC | Latin1-General-100, case-insensitive, accent-sensitive, kanatype-insensitive, width-insensitive, supplementary characters |
Latin1_General_100_CI_AS_WS_SC | Latin1-General-100, case-insensitive, accent-sensitive, kanatype-insensitive, width-sensitive, supplementary characters |
Latin1_General_100_CI_AS_KS_SC | Latin1-General-100, case-insensitive, accent-sensitive, kanatype-sensitive, width-insensitive, supplementary characters |
Latin1_General_100_CI_AS_KS_WS_SC | Latin1-General-100, case-insensitive, accent-sensitive, kanatype-sensitive, width-sensitive, supplementary characters |
Latin1_General_100_CS_AI_SC | Latin1-General-100, case-sensitive, accent-insensitive, kanatype-insensitive, width-insensitive, supplementary characters |
Latin1_General_100_CS_AI_WS_SC | Latin1-General-100, case-sensitive, accent-insensitive, kanatype-insensitive, width-sensitive, supplementary characters |
Latin1_General_100_CS_AI_KS_SC | Latin1-General-100, case-sensitive, accent-insensitive, kanatype-sensitive, width-insensitive, supplementary characters |
Latin1_General_100_CS_AI_KS_WS_SC | Latin1-General-100, case-sensitive, accent-insensitive, kanatype-sensitive, width-sensitive, supplementary characters |
Latin1_General_100_CS_AS_SC | Latin1-General-100, case-sensitive, accent-sensitive, kanatype-insensitive, width-insensitive, supplementary characters |
Latin1_General_100_CS_AS_WS_SC | Latin1-General-100, case-sensitive, accent-sensitive, kanatype-insensitive, width-sensitive, supplementary characters |
Latin1_General_100_CS_AS_KS_SC | Latin1-General-100, case-sensitive, accent-sensitive, kanatype-sensitive, width-insensitive, supplementary characters |
Latin1_General_100_CS_AS_KS_WS_SC | Latin1-General-100, case-sensitive, accent-sensitive, kanatype-sensitive, width-sensitive, supplementary characters |
As you can see, none of the collations in the above list include the
_VSS
option, because that option is only applicable to certain Japanese collations. SQL Server 2017 includes 32 collations with this option (such as Japanese_Bushu_Kakusu_140_CI_AI_KS_WS_VSS
for example).
0 comments:
Post a Comment