Mysql: What are the diffrences between utf8_general_ci and utf8_unicode

Possible Duplicate:
What's the difference between utf8_general_ci and utf8_unicode_ci

I've got two options for unicode that look promising for a mysql database.

utf8_general_ci unicode (multilingual), case-insensitive
utf8_unicode_ci unicode (multilingual), case-insensitive

Can you please explain what is the difference between utf8_general_ci and utf8_unicode_ci?

What are the effects of choosing one over the other when designing a database?

Answers

utf8_general_ci is a very simple — and on Unicode, very broken — collation,

one that gives incorrect results on general Unicode text. What it does is:

converts to Unicode normalization form D for canonical decomposition
removes any combining characters
converts to upper case

This does not work correctly on Unicode, because it does not understand Unicode casing.

Unicode casing alone is much more complicated than an ASCII-minded approach can

handle. For example:

The lowercase of “ẞ” is “ß”, but the uppercase of “ß” is “SS”.
There are two lowercase Greek sigmas, but only one uppercase one;
Letters like “ø” do not decompose to an “o” plus a diacritic, meaning

There are many other subtleties.

utf8_unicode_ci uses the standard Unicode Collation Algorithm,

utf8_general_ci does not support expansions/ligatures, it sorts all these letters as

single characters, and sometimes in a wrong order.

utf8_unicode_ci is generally more accurate for all scripts. For example,

The cost of utf8_unicode_ci is that it is a little bit slower than utf8_general_ci.

But that’s the price you pay for correctness. Either you can have a fast answer that’s wrong,

or a very slightly slower answer that’s right. Your choice. It is very difficult to ever justify

giving wrong answers, so it’s best to assume that utf8_general_ci doesn’t exist and to

always use utf8_unicode_ci. Well, unless you want wrong answers.

Source: http://forums.mysql.com/read.php?103,187048,188748#msg-188748

Tech Blog

Wednesday, 14 November 2018