I also reached out to them on Twitter, but they directed me to this form. I followed up with them on Twitter with what happened in this screenshot, but they are now ignoring me.
Unicode has standard case folding rules that cover all the scripts it supports, and case-insensitive comparisons in all good programming languages use this data.
Note that you can’t simply convert the text to uppercase or lowercase to compare it, as then you’ll run into the Turkish i problem: https://haacked.com/archive/2012/07/05/turkish-i-problem-and-why-you-should-care.aspx/
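A minimal Python sketch of the difference, assuming Python’s built-in `str.casefold()` as the case-folding implementation (it applies Unicode’s full, language-independent folding). The helper names are made up for illustration.

```python
# Compare naive lowercasing with Unicode case folding (Python's str.casefold).

def naive_equal(a: str, b: str) -> bool:
    # Lowercasing misses folding mappings such as German ß -> "ss".
    return a.lower() == b.lower()

def folded_equal(a: str, b: str) -> bool:
    # casefold() applies Unicode full case folding.
    return a.casefold() == b.casefold()

print(naive_equal("Straße", "STRASSE"))   # False: "straße" != "strasse"
print(folded_equal("Straße", "STRASSE"))  # True: both fold to "strasse"

# The Turkish capital dotted I (U+0130) has no single-character lowercase;
# the default (non-Turkic) mapping yields "i" plus U+0307 COMBINING DOT ABOVE.
print(len("İ".lower()))                   # 2
print("İ".casefold() == "i\u0307")        # True
```

Python’s `casefold()` does not apply the optional Turkic-specific mappings, so a genuinely Turkish-aware comparison still needs a locale-aware library such as ICU.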
The issue is that capitalization is language-dependent, which email addresses shouldn’t be; I’d hope the rules for French aren’t different from those for Dutch. For instance, é in Dutch is capitalized as E, but in French it is É. The eszett didn’t even have an official capital form before 2017.
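Python’s built-in case mapping shows the language-independent side of this: `str.upper()` and `str.lower()` apply the same Unicode default rules whatever language the text is in. A small sketch, assuming CPython 3 (the results are Unicode’s defaults, not any particular locale’s convention):

```python
# Python's str.upper()/str.lower() use Unicode's default, locale-independent mappings.
sharp_s = "\u00df"          # ß, LATIN SMALL LETTER SHARP S
capital_sharp_s = "\u1e9e"  # ẞ, LATIN CAPITAL LETTER SHARP S

# Even though a capital form exists, the default uppercase of ß is still "SS".
print(sharp_s.upper())          # SS
# Going the other way, the capital form lowercases to ß.
print(capital_sharp_s.lower())  # ß

# Accents are kept by default; a convention of dropping them on capitals
# would have to come from a locale-aware library, not from these methods.
print("\u00e9".upper() == "\u00c9")  # True: é -> É
```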
In many programming languages, case-insensitive string comparison without specifying a culture is discouraged. IMO it should only be used for fuzzy searches for duplicates, which you will probably do with ToUpper for performance reasons, or perhaps for some UI validation.
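For a fuzzy duplicate search like that, a rough Python sketch might group entries by a case-folded key, with `str.casefold()` standing in for .NET’s `ToUpper`. The function `find_case_duplicates` and the sample names are hypothetical.

```python
from collections import defaultdict

def find_case_duplicates(items: list[str]) -> list[list[str]]:
    """Group strings that differ only by case (hypothetical helper)."""
    groups: defaultdict[str, list[str]] = defaultdict(list)
    for item in items:
        # casefold() is locale-independent: cheap and good enough for a fuzzy pass,
        # but not a substitute for a culture-aware comparison.
        groups[item.casefold()].append(item)
    # Keep only keys that collected more than one spelling.
    return [group for group in groups.values() if len(group) > 1]

names = ["Müller", "MÜLLER", "de Vries", "De Vries", "Jansen"]
print(find_case_duplicates(names))
# [['Müller', 'MÜLLER'], ['de Vries', 'De Vries']]
```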
Sure, but we’re just talking about string comparison rules, and Unicode’s collation rules can treat all three of those as equal when comparing at primary strength, which ignores both case and accents. For example, a search engine whose indexer applies proper case and accent folding should return results for “entrée” if you search for “entree”, “Čech” if you search for “cech”, etc.
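Matching “entree” to “entrée” takes more than case folding, since case folding maps É to é but not é to e. One common approximation, sketched in Python below, decomposes text with NFD, drops the combining marks, and then case-folds. This is a simplification of what a full Unicode Collation Algorithm implementation does at primary strength, and `search_key` is a hypothetical name.

```python
import unicodedata

def search_key(text: str) -> str:
    """Hypothetical accent- and case-insensitive key for indexing and lookup."""
    # NFD splits precomposed letters: "é" -> "e" + U+0301 COMBINING ACUTE ACCENT.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop nonspacing combining marks (general category "Mn").
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return stripped.casefold()

index = {search_key(word): word for word in ["entrée", "Čech"]}

for query in ["entree", "CECH"]:
    print(query, "->", index.get(search_key(query)))
# entree -> entrée
# CECH -> Čech
```

Letters such as ø or ł have no canonical decomposition, so this trick leaves them untouched; a proper collation library handles those cases more thoroughly.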
You can’t just use ToUpper for comparisons because of issues like the ones you mentioned, and the Turkish i problem. You need to do proper case-insensitive comparisons, which is where the Unicode case folding rules come in.
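The Unicode Standard’s default case algorithms define a canonical caseless match as comparing NFD(toCasefold(NFD(X))) for both strings, so that precomposed and decomposed spellings of the same text also compare equal. A Python sketch of that definition, using `str.casefold()` for toCasefold; `canonical_caseless_equal` is a hypothetical name:

```python
import unicodedata

def nfd(text: str) -> str:
    return unicodedata.normalize("NFD", text)

def canonical_caseless_equal(a: str, b: str) -> bool:
    """Canonical caseless match: NFD(casefold(NFD(x))) compared for both strings."""
    return nfd(nfd(a).casefold()) == nfd(nfd(b).casefold())

precomposed = "\u00c9COLE"  # "ÉCOLE" with É as one code point (U+00C9)
decomposed = "e\u0301cole"  # "école" as "e" + U+0301 COMBINING ACUTE ACCENT

print(precomposed.casefold() == decomposed.casefold())    # False: different code points
print(canonical_caseless_equal(precomposed, decomposed))  # True
```

Unlike accent stripping, this keeps é and e distinct; it only removes differences of case and of Unicode encoding form.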
Off-topic: strictly speaking, the eszett was a ligature for ‘sz’, which Hungarian orthography more or less preserved, while in German the separated form is ‘ss’. There’s plenty of this kind of thing in the wild.
Thank you for saying that more clearly.
So good that we all use Unicode now. No CP1251, no ISO single-byte encodings, no Japanese encoding hell.
Yeah, living in 2123 sure is good