What effects does using a binary collation have? ~ Tech Blog

While answering this question, I became uncertain about something that I didn't manage to find a sufficient answer to.

What are the practical differences between using the binary utf8_bin and the case insensitive utf8_general_ci collations?

I can see three:

Both have a different sorting order; _bin's sorting order is likely to put any umlauts to the end of the alphabet, because byte values are compared (right?)
Only case sensitive searches in _bin
No A = Ä equality in _bin

Are there any other differences or side-effects to be aware of?

Reference:

9.1.2. Character Sets and Collations in MySQL
9.1.7.6. The _bin and binary Collations in the mySQL manual
9.1.7.7. The BINARY Operator

Similar questions that don't address the issue:

UTF-8: General? Bin? Unicode?

Answers

Binary collation compares your string exactly as strcmp() in C would do, if characters are different (be it just case or diacritics difference). The downside of it that the sort order is not natural.

An example of unnatural sort order (as in "binary" is) : A,B,a,b Natural sort order would be in this case e.g : A,a,B,b (small and capital variations of the sme letter are sorted next to each other)

The practical advantage of binary collation is its speed, as string comparison is very simple/fast. In general case, indexes with binary might not produce expected results for sort, however for exact matches they can be useful.

The other answers explain the differences well.

Binary collation can be useful in some cases :

column contains hexadecimal data like password hashes
you are only interested in exact matches, not sorting
for identifiers with only [a-z0-9_] characters, you can even use it for sorting
for some reason you store numbers in CHAR() or VARCHAR columns (like telephones)
zipcodes
UUIDs
etc

In all those cases you can save a (little) bit of cpu cycles with a binary collation.

Tech Blog

Tuesday, 6 November 2018