I'm locked out of my 6 year old Chipotle account because they now say my email address is invalid when I login. Here is me asking for their help:

sacbuntchris@lemmy.world · 3 years ago

I'm locked out of my 6 year old Chipotle account because they now say my email address is invalid when I login. Here is me asking for their help:

dan@upvote.au · edit-2 3 years ago

‘U’ and ‘u’ are two different symbols. And you have to make such rules for every language a part of your processing logic.

Unicode has standard rules for case folding, which cover all languages supported by Unicode. Case-insensitive comparisons in all good programming languages use this data.

Note that you can’t simply convert the text to uppercase or lowercase to compare it, as then you’ll run into the Turkish i problem: https://haacked.com/archive/2012/07/05/turkish-i-problem-and-why-you-should-care.aspx/
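To make the difference concrete, here is a minimal Python sketch comparing naive upper/lower comparisons with case folding. The helper names are made up for illustration. Python's `str` methods are locale-independent, so this does not reproduce the culture-sensitive .NET behaviour from the linked article; it only shows that upper/lower and case folding can disagree.

```python
def naive_upper_equal(a: str, b: str) -> bool:
    """Compare by uppercasing both sides (the approach warned against above)."""
    return a.upper() == b.upper()


def naive_lower_equal(a: str, b: str) -> bool:
    """Compare by lowercasing both sides."""
    return a.lower() == b.lower()


def casefold_equal(a: str, b: str) -> bool:
    """Compare using Unicode case folding via str.casefold()."""
    return a.casefold() == b.casefold()


# German eszett: lowercasing keeps "ß", so the two spellings do not match,
# while case folding maps "ß" to "ss" and they do.
german = ("Straße", "STRASSE")

# Turkish dotless ı (U+0131): uppercasing turns it into a plain "I",
# so it wrongly compares equal to "i"; default case folding leaves it alone.
turkish = ("ı", "i")

if __name__ == "__main__":
    for label, (a, b) in (("eszett", german), ("dotless i", turkish)):
        print(label,
              "upper:", naive_upper_equal(a, b),
              "lower:", naive_lower_equal(a, b),
              "casefold:", casefold_equal(a, b))
```

For plain ASCII email addresses all three helpers agree, which is probably why the shortcut survives in so much code.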

labsin@sh.itjust.works · edit-2 3 years ago

It’s that capitalization is language dependent, which email addresses shouldn’t be, as I’d hope the rules for French aren’t different from those for Dutch. For instance é in Dutch is capitalized as E, but in French it is É. The eszett didn’t even have an official capital before 2017.

In most programming languages, case-insensitive string compare without specifying the culture became deprecated. It should imo only be used for fuzzy searching doubles, which you probably will do with ToUpper for performance reasons, or maybe some UI validation.

dan@upvote.au · edit-2 3 years ago

For instance é in Dutch is capitalized as E, but in French it is É

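Sure, but we’re just talking about string comparison rules, and a Unicode-aware comparison can treat all three of those as equal: case folding covers the case difference, and accent-insensitive matching (for example, collation at primary strength) covers the diacritic. For example, a search engine that applies proper case and accent folding in its indexer should return results for “entrée” if you search for “entree”, “Čech” if you search for “cech”, etc.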
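A rough sketch of the kind of indexer key that would make those searches match. It assumes the indexer strips combining marks in addition to case folding, since `str.casefold()` on its own leaves accents in place. `search_key` is a made-up name, not any particular search engine's API.

```python
import unicodedata


def search_key(text: str) -> str:
    """Build an accent- and case-insensitive key for matching (illustrative only)."""
    # Decompose characters so that "é" becomes "e" followed by a combining accent.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Apply Unicode case folding rather than lower() or upper().
    return stripped.casefold()


if __name__ == "__main__":
    print(search_key("entrée") == search_key("entree"))
    print(search_key("Čech") == search_key("cech"))
```

Stripping accents like this is a deliberate loss of information, so it suits search keys much better than it suits deciding whether two email addresses are the same account.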

It should imo only be used for fuzzy searching doubles, which you probably will do with ToUpper

You can’t just use ToUpper for comparisons due to issues like you mentioned, and the Turkish i problem. You need to do proper case-insensitive comparisons, which is where the Unicode case folding rules are used.

rottingleaf@lemmy.zip · 3 years ago

offtopic: The eszett strictly speaking was a ligature for ‘sz’, which Hungarian orthography kinda preserved while for German the separated version is ‘ss’, and there’s plenty of such stuff in nature.

In most programming languages, case-insensitive string compare without specifying the culture became deprecated. It should imo only be used for fuzzy searching doubles, which you probably will do with ToUpper for performance reasons, or maybe some UI validation.

Thank you for saying that more clearly.

rottingleaf@lemmy.zip · 3 years ago

So good that we all use Unicode now. No CP1251, no ISO single-byte encodings, no Japanese encoding hell.

lad@programming.dev · 3 years ago

Yeah, living in 2123 sure is good