Regular expressions come with all sorts of peculiarities, one of which I recently ran into when writing a regex for PHP's preg_match. I was trying to parse strings in the format "Real Name (:username)" or "Real Name [:username]" when I hit a problem I see a lot at Mozilla: my regular expression wasn't catching "special" or "international" letters like à, é, ü, and dozens of others.
My regular expression was using A-Za-z in the real name matching piece of the regex, which I assumed would match special letters, but it did not:
preg_match(
"/([A-Za-z -]+)?\s?\[?\(?:([A-Za-z0-9\-\_]+)\)?\]?/",
"Yep Nopé [:ynope]", $matches);
// 0 => ' [:ynope]', 1 => ' ', 2 => 'ynope' (the real name is lost at the é)
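To see why the ASCII class gives up at é, here is a minimal sketch that is separate from my parsing code. It assumes the file is saved as UTF-8 and uses a made-up test word; the variable names are only for illustration.

```php
<?php
// Assumes this file is saved as UTF-8, so é is the two bytes 0xC3 0xA9.
$name = 'Nopé';

// strlen() counts bytes, not characters: 4 letters, 5 bytes.
$byteCount = strlen($name);

// Without the u modifier PCRE compares byte by byte, so the ASCII-only
// class stops right before the first byte of é.
preg_match('/^[A-Za-z]+/', $name, $asciiMatch);

// With \pL and the u modifier, é counts as a single letter.
preg_match('/^\pL+/u', $name, $unicodeMatch);

var_dump($byteCount, $asciiMatch[0], $unicodeMatch[0]);
```

Neither byte of é falls inside A-Z or a-z, which is why the match for the real name breaks off at that point.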
To match international letters, I needed to update my regular expression in two ways:
Change A-Za-z to \pL (the Unicode property that matches any letter) within the matching piece
Add the u modifier, which makes the pattern and the subject string be treated as UTF-8
The updated regex would be:
preg_match(
"/([\pL -]+)?\s?\[?\(?:([\pL0-9\-\_]+)\)?\]?/u",
"Yep Nopé [:ynope]", $matches);
// 0 => 'Yep Nopé [:ynope]', 1 => 'Yep Nopé ' (note the trailing space), 2 => 'ynope'
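If you want to reuse the updated pattern, one way is to wrap it in a small function. This is only a sketch: the function name parseNameAndUsername and the shape of the returned array are my own assumptions for illustration, and the file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper; the name and return shape are illustrative only.
function parseNameAndUsername(string $input): ?array
{
    // Same pattern as above: \pL matches any Unicode letter, and the
    // trailing u modifier makes PCRE treat pattern and subject as UTF-8.
    $pattern = "/([\pL -]+)?\s?\[?\(?:([\pL0-9\-\_]+)\)?\]?/u";

    // preg_match() returns 0 for no match and false on error
    // (for example, when the subject is not valid UTF-8).
    if (preg_match($pattern, $input, $matches) !== 1) {
        return null;
    }

    return [
        // Group 1 is greedy and can include the space before the bracket.
        'name'     => trim($matches[1]),
        'username' => $matches[2],
    ];
}

$parsed = parseNameAndUsername('Yep Nopé [:ynope]');
print_r($parsed);
```

Trimming group 1 is a design choice here: the character class allows spaces, so the captured name can carry the space that separates it from the username.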
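You can see my simple test bed here. If you're worried that other characters might slip in, or you don't trust \pL, you could list every special letter manually (e.g. [A-Za-zàáâä....]); keep the u modifier in that case so each accented letter is read as one character rather than as separate bytes.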
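A rough sketch of the manual approach might look like the following. The handful of accented letters in the list and the test names are arbitrary picks for the example, not a complete or recommended set, and the file is assumed to be saved as UTF-8.

```php
<?php
// Illustrative whitelist: these accented letters are example picks,
// not a complete list.
$letters = 'A-Za-zàáâäéü';

// The u modifier is still needed; without it each accented letter in
// the class is read as separate bytes rather than one character.
$pattern = '/^[' . $letters . ' -]+$/u';

$accepted = preg_match($pattern, 'Yep Nopé'); // é is in the whitelist
$rejected = preg_match($pattern, 'Zoë');      // ë is not in the whitelist

var_dump($accepted, $rejected);
```

The trade-off is maintenance: any letter you forget to list is rejected, whereas \pL accepts every Unicode letter.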
One of the nice parts of working at a truly global organization like Mozilla is that I'm exposed to many edge cases; in this case, a few special letters!