You'll regret using natural keys

canpolat@programming.dev · 5 months ago

You'll regret using natural keys

lemmyvore@feddit.nl · 5 months ago

These days I follow a hard heuristic: Always use synthetic keys for database tables.

And the way to follow this rule is fairly simple, but it has a few twists.

For internal use, the best and most common key (in a relational database) is an auto-generated incremental sequence. But it it ok to use it externally? – across databases, across types of data storage, across APIs / services etc.

It’s tempting to refer to the sequence number in API calls, after all they are going to that particular database and are only going to be used with it, right? Well not necessarily; the database and the code powering the API are different systems, who says there won’t be other apps accessing the database for example.

The current OpSec school of thought is that sequence keys are an internal database mechanism and sequence numbers should only be used for internal consistency, never used as external references (even for the “local” API).

Sequence keys also don’t offer any way to deal with creating duplicate data entries. If you’ve been around for a while you’ve seen this, the client sends the same “create” request twice for whatever reason (UI lets user multiple-click a button, client assumes timeout when in fact it had gone through etc.) Some programmers attempt to run heuristics on the data and ignore successive create attempts that look “too similar” but it can backfire in many ways.

An UUID is pretty much universally supported nowadays, its designed to be unique across a vast amount of systems, doesn’t give anything away about your internal mechanisms, and if you ask the client to generate the UUID for create requests you can neatly solve the duplicate issue.

Do keep in mind that this doesn’t solve the problem of bijection across many years and many systems and many databases. An entity may still acquire multiple UUID’s, even if they’re each individually perfectly fine.

There can also be circumstances where you have to offer people a natural-looking key for general consumption. You can’t put UUID’s on car plates for example.

atzanteol@sh.itjust.works · 5 months ago

I love UUID keys. Only drawback is they can be a bit difficult to share, but they solve so many other problems. Being able to generate them client-side makes batch inserting with dependencies easier, being globally unique means you can copy data between environments, never needing to “fix” a sequence again… So many up-sides.

CasualTee@beehaw.org · 5 months ago

On the topic of exposing sequence number in APIs, this has been a security issue in the past. Here is one I remember: https://www.reuters.com/article/us-cyber-travel-idUSKBN14G1I6/

From the article:

Two of the three big booking systems - Amadeus and Travelport - assign booking codes sequentially, making brute-force computer guesswork easier. Of the three, Amadeus, through its web portal CheckMyTrip, is especially vulnerable, Nohl said.

The PNRs (flight booking code) have many more security issues, but at least nowadays, their sequential aspect should no longer be exposed.

So that’s one more reason to be careful when exposing DB id in APIs, even if converted to a natural looking key or at least something easier to remember.

Ephera@lemmy.ml · 5 months ago

There can also be circumstances where you have to offer people a natural-looking key for general consumption. You can’t put UUID’s on car plates for example.

Often times, the first section of the UUID is unique enough. With certain UI design choices, one can encourage users to normally work with that, while having the full UUID available in a detail view or from a copy-button.

Another strategy I quite like, is to have the UUID as the definitely-always-unique identifier, and then have a separate name, which either the users can enter or we generate something like random adjective+animal.

But yes, neither of those strategies would work for car plates.

lemmyvore@feddit.nl · 5 months ago

Speaking of car plates, the Wikipedia pages for “Vehicle license plates of [insert country here]” are a rabbit hole.

I was just reading the page for Romania the other day, speaking of uniqueness, and they had this issue apparently where the combinations overall were enough for the whole country but not enough for their capital city, so they had to hack an extra digit into the plates for the capital.

Reddfugee42@lemmy.world · 5 months ago

Seeing as just five alphanumeric symbols gets you over 60 million variations, exactly how the hell did they fuck up that hard

lemmyvore@feddit.nl · 5 months ago

60M total but divided among 40 counties makes 1.5M variations per county and the capital city (which is its own county, like Berlin) went over that.

I looked it up and Bucharest actually has only a 1.7M population so… I think it’s understandable that nobody expected an almost 1:1 person-to-car ratio. Exactly why and how they reached that crazy ratio I have no idea. 😆

Told you it’s a crazy rabbit hole.

Reddfugee42@lemmy.world · 5 months ago

Ah Ok, so the mistake was allotting per county.

Skydancer@pawb.social · 5 months ago

A well argued point. Could have done without the random transphobic comments about “transsexuals” and “perceived gender”.

AwesomeLowlander@lemmy.dbzer0.com · 5 months ago

Agreed with the other commenter, there’s no real reason to think the author’s transphobic due to a random phrase. We as a society really need to stop organising witch hunts.

Skydancer@pawb.social · edit-2 5 months ago

There’s nothing wrong with the example in and of itself, but the word “transsexual” in place of “transgender” is not generally random. It is explicitly chosen by Trans-Exclusionary Radical Feminists (TERFs) as well as by right-wing transphobes as a dog whistle to conflate gender dysphoria with drag queens and cross-dess fetishists so as to delegitimise transpeople and suggest some sort of sexual deviance. Coupled with the equivocation of “perceived” gender, motive doesn’t even have to come into it. The words themselves and the concepts they reinforce are transphobic and harmful.

A witch hunt would have been for me to say that the author is a transphobic asshole whose writings need to be wiped from the internet - which is very far from what I actually posted, which was regret for the way the language they chose distracted from the flow of their argument by reinforcing the social stigmatization of trans people. (Edit: That was a deliberate choice on my part. Not knowing enough about the author to be sure of motives and having no desire to deep dive into their history, I decided that it was only appropriate to point out the hurtful nature of the language and not imply motive.)

AwesomeLowlander@lemmy.dbzer0.com · 5 months ago

It is explicitly chosen by Trans-Exclusionary Radical Feminists (TERFs) as well as by right-wing transphobes

It can also be used by people with no agenda, including most of the non-western world. Language policing is ridiculous. You want to cancel the transphobes? Stop giving them power and reclaim their terms from them.

TehPers@beehaw.org · 5 months ago

Speaking as someone with a MTF close friend and NB spouse, but the term used in the article is the term everyone around me used when I was growing up. That term may be obsolete now, and if so, the author simply needs to be informed. There’s no need to assume they meant harm by it.

If they knowingly used a term that may offend, then that’s of course a separate issue.

MyNameIsRichard@lemmy.ml · 5 months ago

Some people just need to be offended

Reddfugee42@lemmy.world · 5 months ago

We as a society really need to stop organising witch hunts.

Yeah, we’re way too hard on bigots

Womble@lemmy.world · 5 months ago

The author is Danish, are you so certain that this isnt just slightly awkward usage of a second laguage that you are willing to throw transphobic at them as an insult?

Skydancer@pawb.social · 5 months ago

No - it was the language that I said was transphobic, not the author. Given that there were two different word choices (“transsexual” and “perceived gender”) that reinforced each other, it seems more likely than not that they reflected the mindset of the author, but not having looked further for their other writings I was not sure. That’s why I said " transphobic language" and not “transphobic author”.

Redkey@programming.dev · 5 months ago

I think I can appreciate where you’re coming from, but in the context of the article it was legitimately necessary to address the topic somehow; it’s not like it was written apropos of nothing as a commentary on transsexuality. As a CIS person, I also have a “percieved gender” with which I identify.

Would “post-transition gender” be a more sensitive term, or less?

Skydancer@pawb.social · 5 months ago

More, but there’s an even simpler solution. In the context, the author is distinguishing between “sex assigned at birth” and “perceived gender.” The equivocating word " perceived" could simply be dropped with no loss of clarity.

BrianTheeBiscuiteer@lemmy.world · 5 months ago

Got hands on experience with this. Wasn’t my design choice but I inherited an app with a database where one of the keys was tied to a completely separate database. I mean at the time it probably made sense but the most unlikely of scenarios actually happened: that other database, the one I had zero control over, was migrated to a new platform. All of those keys were synthetic so of course they were like, “Meh, why we gotta keep the old keys?” So post-migration my app becomes basically useless and I spent 6 hours writing migration code, some of it on off hours, to fix my data.

So it’s questionable whether a foreign key of a completely different system is a natural key, but at the very least never use a key YOU don’t control.

state_electrician@discuss.tchncs.de · 5 months ago

I don’t think that holds true in all scenarios. You need to use a key that has some guarantees. In many systems you will use data you don’t control, like email addresses, IBANs, ISBNs, passport IDs and many more. You have zero control over those keys, but because each comes with certain guarantees, they might be suitable as a foreign key in your context.

Kissaki@programming.dev · 5 months ago

People regularly change email addresses. Listing that as an example is a particularly bad example in my opinion.

Jens-Uwe Mager@chaos.social · 5 months ago

@Kissaki @state_electrician Well, I use the same self hosted Email address since the late 80‘s. I do not consider that changing it regularly.

smaximov@lemmy.world · 5 months ago

Well, I use the same self hosted Email address since the late 80‘s

Personal anecdotes are rarely pose a valid argument (unless you are designing a database specifically for users who use the same email address since the late 80’s).

Kissaki@programming.dev · 5 months ago

Ignoring secondary email addresses, what was my primary [onlineaccount] E-Mail address has changed four times.

BrianTheeBiscuiteer@lemmy.world · 5 months ago

Even in this scenario it’s feasible for standards to change. ISBN-15 becomes a thing and suddenly you have books that never get an ISBN-13 so your primary key constraints cause an error for trying to insert a null. Granted, you can see a lot of these changes coming but again, they come on a schedule you don’t control.

Eager Eagle@lemmy.world · edit-2 5 months ago

These things can just be unique fields. I think the takeaway here is exactly to not use these unique fields as database keys if you have the option / if it’s up to you.

kevincox@lemmy.ml · 5 months ago

It is true, don’t do it.

Even at huge companies like Google, lots of stuff was keyed on your email address. This was a huge problem so Google employees were not allowed to change their email for the longest time. Eventually they opened it up by request but they made it very clear that you would run into problems. So many systems and services would break. Over time I think most external services are pretty robust now, but lots of internal systems still use emails (or the username part of it) and have issues.

IIUC Google accounts now use a random number as the key. But there are still places where the email is in use, slowly being fixed at massive cost.

Reddfugee42@lemmy.world · 5 months ago

random number as the key

UUID

kevincox@lemmy.ml · 5 months ago

I don’t think that is true. Not much at Google really bought into the UUID hype. At least not for internal interfaces. But really there is no difference between a UUID v4 and a large random number. UUID just specifies a standard formatting.

Reddfugee42@lemmy.world · 5 months ago

Why reinvent the wheel

Thorry84@feddit.nl · 5 months ago

I’d recommend reading this if you are interested in why making assumptions when designing a system is a bad idea (but often necessary): https://github.com/kdeldycke/awesome-falsehood

Especially the one on names: https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/ And the one on time: http://infiniteundo.com/post/25326999628/falsehoods-programmers-believe-about-time

magic_lobster_party@kbin.run · 5 months ago

This is tied to YAGNI. Do you really need to impose this restriction on this database? Don’t add restrictions “just in case it might be useful”. If you don’t have a good reason, you ain’t gonna need it.

Kissaki@programming.dev · 5 months ago

Are you saying “don’t use a synthetic key, you ain’t gonna need it”?

Corbin@programming.dev · 5 months ago

I think they’re saying that e.g. you shouldn’t index a natural key unless you know that you’re going to search/collate by that key as a column. Telling the database that a certain column contains (a component of) the primary key is adding a restriction to that column.

Aux@lemmy.world · 5 months ago

I didn’t even know that natural keys exist. Who is using them and why?