I’m a tech-interested guy. I’ve touched SQL once or twice, but wasn’t able to really make sense of it. That, combined with not having a practical use for it, leaves SQL largely a black box in my mind (though I am somewhat familiar with technical concepts in databases).
With that, I keep seeing [pic related] as proof that Elon Musk doesn’t understand SQL.
Can someone give me a technical explanation of how one would come to that conclusion? I’d love it if you could point me to technical documentation on that.
It’s so basic that documentation is completely unnecessary.
“De-duping” could mean multiple things, depending on what you mean by “duplicate”.
It could mean that the entire row of some table is the same. But that has nothing to do with the kind of fraud he’s talking about. Two people with the same SSN but different names wouldn’t be duplicates by that definition, so “de-duping” wouldn’t remove it.
It can also mean that a certain value shows up more than once (e.g. just the SSN). But that’s something you often want in database systems. A transaction log of SSN contributions would likely have that SSN repeated hundreds of times. It has nothing to do with fraud, it’s just how you record that the same account has multiple contributions.
A database system as large as the SSA has needs to deal with all kinds of variations in data (misspellings, abbreviations, moves, siblings, common names, etc). Something as simplistic as “no dupes anywhere” would break immediately.
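To make the two meanings of “duplicate” concrete, here is a minimal sketch using Python’s built-in `sqlite3` module. The `contributions` table and every value in it are made up for illustration; nothing here reflects the SSA’s actual schema.

```python
import sqlite3

# Hypothetical contributions log: one row per contribution, so the same SSN
# legitimately shows up many times. All names and numbers are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contributions (ssn TEXT, name TEXT, year INTEGER, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO contributions VALUES (?, ?, ?, ?)",
    [
        ("123456789", "Ann Lee", 2022, 5000),
        ("123456789", "Ann Lee", 2023, 5200),  # same SSN, new year: normal
        ("123456789", "Ann Lee", 2023, 5200),  # exact copy of the row above
        ("555121234", "Cy Dee", 2023, 3000),
        ("555121234", "Di Eve", 2023, 2800),   # same SSN, different name
    ],
)

# Meaning 1: rows that are identical in every column.
exact_dupes = conn.execute(
    "SELECT ssn, name, year, amount, COUNT(*) FROM contributions "
    "GROUP BY ssn, name, year, amount HAVING COUNT(*) > 1"
).fetchall()

# Meaning 2: a single column value (the SSN) appearing more than once.
repeated_ssns = conn.execute(
    "SELECT ssn, COUNT(*) FROM contributions "
    "GROUP BY ssn HAVING COUNT(*) > 1 ORDER BY ssn"
).fetchall()

print(exact_dupes)    # [('123456789', 'Ann Lee', 2023, 5200, 2)]
print(repeated_ssns)  # [('123456789', 3), ('555121234', 2)]
```

The “same SSN, different name” case never shows up under the first meaning, and under the second meaning it is mixed in with rows that repeat an SSN for perfectly normal reasons.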
SSN is also not a valid unique key; there have been cases where multiple people were issued the same SSN:
Yeah. And the fix for that has nothing to do with “de-duping” as a database operation either.
The main components would probably be:
- Decide on a new scheme (with more digits)
- Create a mapping from the old scheme to the new scheme. (that’s where existing duplicates would get removed)
- Let people use both during some transition period, after which the old one isn’t valid any more.
- Decide when you’re going to stop issuing old SSNs and only issue new ones to people born after some date.
There’s a lot of complication in each of those steps, but none of them are particularly dependent on “de-duped” databases.
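The mapping step can be sketched in a few lines. This is a minimal illustration under loud assumptions: the 12-digit sequential “new scheme”, the `person-*` keys, and the helpers `build_mapping` and `lookup` are all hypothetical. The key idea is that the mapping is keyed on the person rather than on the old SSN, so two people who share an old number still get distinct new IDs.

```python
from itertools import count

# Hypothetical records: (internal_person_key, old_ssn).
# person-A and person-B were mistakenly issued the same old SSN.
people = [
    ("person-A", "123456789"),
    ("person-B", "123456789"),  # collision with person-A
    ("person-C", "987654321"),
]

def build_mapping(records, start=100_000_000_000):
    """Assign each person a new 12-digit ID (a made-up scheme).

    Returns {person_key: (old_ssn, new_id)}. Keying on the person, not on
    the old SSN, is where colliding old numbers get split apart.
    """
    next_id = count(start)
    return {key: (old, str(next(next_id))) for key, old in records}

def lookup(identifier, mapping):
    """Transition period: accept either an old SSN or a new ID.
    An old SSN may match several people; a new ID matches at most one."""
    return sorted(k for k, (old, new) in mapping.items() if identifier in (old, new))

mapping = build_mapping(people)
for key, (old, new) in mapping.items():
    print(key, old, "->", new)
```

During the transition, `lookup("123456789", mapping)` returns both colliding people, while a new ID resolves to exactly one, which is the point of moving to the new scheme.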
Just read about the format of the US SSN in that Wikipedia article. That wasn’t a smart format to use, lol. It only supports 99*9999 (roughly 1 million) people per area number. No wonder numbers are reused.
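For reference, here is the arithmetic for the published SSN layout (3-digit area, 2-digit group, 4-digit serial), assuming SSA’s rules that group 00, serial 0000, and area numbers 000, 666 and 900-999 are never issued:

```python
# SSN layout: AAA-GG-SSSS (area, group, serial).
# Zero blocks are never issued: group 00 and serial 0000 are invalid.
GROUPS_PER_AREA = 99      # 01..99
SERIALS_PER_GROUP = 9999  # 0001..9999
per_area = GROUPS_PER_AREA * SERIALS_PER_GROUP

# Areas 000, 666 and 900-999 are not issued: that leaves 001-899 minus 666.
VALID_AREAS = 899 - 1
total = VALID_AREAS * per_area

print(per_area)  # 989901
print(total)     # 888931098
```

So each area number has just under a million possible SSNs, and the whole space is a bit under 889 million rather than a full billion.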
In some countries it’s birthday + sequence number encoded with gender + checksum, and that has been working since the 80s.
Before that there was a different number, but it wasn’t future-proof, like the US SSN, so we migrated away in the 80s :')
In my country the only way that someone has the same number is if someone was born on the same day (±1 century), in the same city, and has the same name and family name. It’s extremely difficult to have duplicates that way (exception: immigrants, because the “city code” is the same for a whole foreign country, so it’s not impossible that there are two Ananya Guptas born on the same day in all of India).
Oh yeah, our system wouldn’t fit India, as it’s limited to 500 births a day (the sequence is 3 digits, and whether it’s even or odd describes your gender). Your system seems fine to me and beats the US system hands down, haha.
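A scheme like the one described (birth date + 3-digit sequence with gender parity + checksum) can be sketched as follows. This is a hypothetical illustration, not any specific country’s exact rules: the 1..998 sequence range and the mod-97 check digits are assumptions. It also shows the two limits mentioned above: only a few hundred sequence slots per parity per birth date, and a two-digit year that repeats every century.

```python
from datetime import date

def make_id(birth: date, sequence: int) -> str:
    """Hypothetical ID: YYMMDD + 3-digit sequence + 2 check digits.

    Assumptions for illustration only:
    - sequence runs 1..998 and its parity (odd/even) records gender,
    - check digits = 97 - (first nine digits mod 97), a common mod-97 scheme.
    """
    if not 1 <= sequence <= 998:
        raise ValueError("sequence out of range for this birth date")
    base = f"{birth:%y%m%d}{sequence:03d}"
    check = 97 - int(base) % 97
    return f"{base}{check:02d}"

def is_valid(id_: str) -> bool:
    """Recompute the check digits; any single mistyped digit fails."""
    if len(id_) != 11 or not (id_.isascii() and id_.isdigit()):
        return False
    return 97 - int(id_[:9]) % 97 == int(id_[9:])

# Capacity per birth date under these assumptions:
odd_slots = len(range(1, 999, 2))   # 1, 3, ..., 997
even_slots = len(range(2, 999, 2))  # 2, 4, ..., 998
print(odd_slots, even_slots)  # 499 499
```

Because 97 is prime and larger than 10, changing any one digit changes the remainder, which is why a mod-97 check catches single-digit typos.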
There can be duplicate SSNs due to name changes of an individual, that’s the easiest answer. In general, it’s common to just add a new record in cases where a person’s information changes so you can retain the old record(s) and thus have a history for a person (look up Slowly Changing Dimensions (SCD)). That’s how the SSA is able to figure out if a person changed their gender, they just look up that information using the same SSN and see if the gender in the new application is different from the old data.
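A minimal sketch of a Type 2 Slowly Changing Dimension, assuming a hypothetical `person_history` table (not the SSA’s real schema). Each change closes the current row and inserts a new one, so the SSN legitimately repeats while `(ssn, valid_from)` stays unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE person_history (
        ssn        TEXT    NOT NULL,
        last_name  TEXT    NOT NULL,
        gender     TEXT    NOT NULL,
        valid_from TEXT    NOT NULL,   -- ISO date this version became true
        valid_to   TEXT,               -- NULL while this version is current
        is_current INTEGER NOT NULL,
        PRIMARY KEY (ssn, valid_from)  -- SSN repeats; SSN + date does not
    )
""")

def apply_change(conn, ssn, last_name, gender, as_of):
    """SCD Type 2 update: close the current row, then insert the new version."""
    conn.execute(
        "UPDATE person_history SET valid_to = ?, is_current = 0 "
        "WHERE ssn = ? AND is_current = 1",
        (as_of, ssn),
    )
    conn.execute(
        "INSERT INTO person_history VALUES (?, ?, ?, ?, NULL, 1)",
        (ssn, last_name, gender, as_of),
    )

apply_change(conn, "123456789", "Smith", "F", "2001-05-01")
apply_change(conn, "123456789", "Jones", "F", "2015-09-12")  # name change after marriage

history = conn.execute(
    "SELECT last_name, valid_from, valid_to, is_current FROM person_history "
    "WHERE ssn = ? ORDER BY valid_from",
    ("123456789",),
).fetchall()
current = conn.execute(
    "SELECT last_name FROM person_history WHERE ssn = ? AND is_current = 1",
    ("123456789",),
).fetchone()
```

Comparing a new application against the row where `is_current = 1` is how you would spot a changed field, while the older rows keep the history.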
Another accusation Elon made was that payments are going to people missing SSNs. The best explanation I have for that is that various state departments have their own on-premises databases, with their own structure and design that do not necessarily mirror the federal master database. There are likely some databases where the SSN field is set up to accept strings only, since in real life the SSN on your card has dashes, and those dashes make the number a string. If the SSN is stored as a string in a state database, then when it’s brought over to the federal database (assuming the federal DB uses a number field instead of text), there can be some data loss, resulting in a NULL.
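A minimal sketch of that failure mode, assuming a lenient load step that turns anything it can’t parse as an integer into NULL (`None` in Python). The helper `to_number_or_null` is hypothetical; real ETL tools behave differently, and many would reject the row instead.

```python
def to_number_or_null(raw):
    """Hypothetical lenient string-to-integer load step.

    Returns None (a NULL) when the text isn't a plain integer.
    """
    try:
        return int(raw)
    except (TypeError, ValueError):
        return None

state_rows = ["123-45-6789", "123456789", "012345678", None]
loaded = [to_number_or_null(v) for v in state_rows]
print(loaded)  # [None, 123456789, 12345678, None]
```

The dashed value becomes NULL, and the SSN starting with 0 quietly loses its leading zero, so even the rows that “succeed” are not always intact.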
JFC: married individuals, or divorced people who changed their name back, would be totally fucked. And that’s just the very surface of his fuckery.
Hypothetically, you could have a separate “previous names” table where you keep the previous names, and in the main table you only keep the current name. There are a lot of ways to design a DB to not unnecessarily duplicate SSNs, but without knowing the implementation it’s hard to say how wrong Musk is. But it’s obvious he doesn’t know what he’s talking about, because we know that due to human error SSNs are not unique, and you can’t enforce uniqueness on SSNs without completely fucking up the system. Complaining about it the way he did indicates that he doesn’t really understand why things are the way they are.
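A minimal sketch of the “previous names” design, with hypothetical `person` and `previous_name` tables. The main table keys on a surrogate `person_id`, and the SSN column is deliberately not declared `UNIQUE`, in line with the point about not enforcing uniqueness on SSNs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (
        person_id    INTEGER PRIMARY KEY,   -- surrogate key, not the SSN
        ssn          TEXT NOT NULL,         -- deliberately NOT declared UNIQUE
        current_name TEXT NOT NULL
    );
    CREATE TABLE previous_name (
        person_id   INTEGER NOT NULL REFERENCES person(person_id),
        name        TEXT NOT NULL,
        replaced_on TEXT NOT NULL           -- ISO date the name stopped being current
    );
""")

def rename(conn, person_id, new_name, when):
    """Move the current name into previous_name, then overwrite it."""
    (old,) = conn.execute(
        "SELECT current_name FROM person WHERE person_id = ?", (person_id,)
    ).fetchone()
    conn.execute("INSERT INTO previous_name VALUES (?, ?, ?)", (person_id, old, when))
    conn.execute(
        "UPDATE person SET current_name = ? WHERE person_id = ?", (new_name, person_id)
    )

conn.execute("INSERT INTO person VALUES (1, '123456789', 'Ann Smith')")
rename(conn, 1, "Ann Jones", "2015-09-12")  # marriage
rename(conn, 1, "Ann Smith", "2021-02-03")  # divorce, name changed back

all_names = conn.execute(
    "SELECT name FROM previous_name WHERE person_id = 1 ORDER BY replaced_on"
).fetchall()
current = conn.execute("SELECT current_name FROM person WHERE person_id = 1").fetchone()[0]
person_rows = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
```

The person stays a single row in the main table; the married and divorced cases just add rows to `previous_name`.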
Another accusation Elon made was that payments are going to people missing SSNs.
A much simpler answer is that not all Americans actually have an SSN. The Amish, for example, have religious objections to insurance, so they were allowed to opt out of Social Security and therefore don’t get an SSN.
It’s true that some Americans don’t have Social Security numbers, but those Americans can’t collect Social Security benefits unless/until they get one.
My bad, I thought it was about payments in general (including other programs) but it says social security database. Sorry.
The statement “this [guy] thinks the government uses SQL” demonstrates a complete and total lack of knowledge as to what SQL even is. Every government on the planet makes extensive and well-documented use of it.
The initial statement I believe is down to a combination of the above and also the lack of domain knowledge around social security. The primary key on the social security table would be a composite key of both the SSN and a date of birth—duplicates are expected of just parts of the key.
If he knew the domain, he would know this isn’t an issue. If he knew the technology, he would be able to see the constraint and, after some investigation, reach the conclusion that it’s not an issue.
The man continues to be a malignant moron
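A minimal sketch of the composite key the comment describes, assuming a hypothetical `beneficiary` table. Nobody here has seen the real schema; the point is only that a primary key on `(ssn, birth_date)` allows the SSN by itself to repeat while still rejecting a true duplicate of the pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE beneficiary (
        ssn        TEXT NOT NULL,
        birth_date TEXT NOT NULL,
        name       TEXT NOT NULL,
        PRIMARY KEY (ssn, birth_date)   -- uniqueness is on the pair, not on ssn
    )
""")
conn.execute("INSERT INTO beneficiary VALUES ('123456789', '1950-01-02', 'A. Person')")
# Same SSN, different birth date: allowed, because the pair is new.
conn.execute("INSERT INTO beneficiary VALUES ('123456789', '1962-07-30', 'B. Person')")

try:
    # Same SSN and same birth date: rejected by the primary key.
    conn.execute("INSERT INTO beneficiary VALUES ('123456789', '1950-01-02', 'C. Person')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

ssn_count = conn.execute(
    "SELECT COUNT(*) FROM beneficiary WHERE ssn = '123456789'"
).fetchone()[0]
print(ssn_count, rejected)  # 2 True
```

Counting rows per SSN in a table like this would report “duplicates” even though the key constraint is doing exactly what it was designed to do.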
The initial statement I believe is down to a combination of the above and also the lack of domain knowledge around social security. The primary key on the social security table would be a composite key of both the SSN and a date of birth—duplicates are expected of just parts of the key.
Since SSNs are never reused, what would be the purpose of using the SSN and birth date together as part of the primary key? I guess it is the one thing that isn’t supposed to ever change (barring a clerical error) so I could see that as a good second piece of information, just not sure what it would be adding.
Note: if duplicate SSNs are accidentally issued, my understanding is that they issue a new one to one of the people. Also, I don’t know how to find the start of the thread on Twitter, since I only use it when I accidentally click on a link to it.
https://www.ssa.gov/history/hfaq.html
Q20: Are Social Security numbers reused after a person dies?
A: No. We do not reassign a Social Security number (SSN) after the number holder’s death. Even though we have issued over 453 million SSNs so far, and we assign about 5 and one-half million new numbers a year, the current numbering system will provide us with enough new numbers for several generations into the future with no changes in the numbering system.
Take this with a grain of salt, as I’m not a dev, but I do work on CMS reporting for a health information tech company. Depending on how the database is designed, an SSN could appear in multiple tables.
In my experience reduplication happens as part of generating a report so that all relevant data related to a key and scope of the report can be gathered from the various tables.
A given SSN appearing in multiple tables actually makes sense. To someone not familiar with SQL (i.e. at about my level of understanding), I could see that being misinterpreted as having multiple SSN repeated “in the database”.
Of all the comments so far, I find yours the most compelling.
Theoretically, yeah, that’s one solution. The more reasonable thing to do would be to use the foreign key though. So, for example:
SSN_Table
ID | SSN | Other info
Other_Table
ID | SSN_ID | Other info
When you want to connect them to have both sets of info, it’d be the following:
SELECT * FROM SSN_Table JOIN Other_Table ON SSN_Table.ID = Other_Table.SSN_ID
EDIT: Oh, just to clear up any confusion, the SSN_ID in this simple example is not the SSN itself. To access that in this example query, it’d be SSN_Table.SSN
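A runnable version of that example using Python’s built-in `sqlite3`. The table names and the join come from the comment; the column `Other_info` is spelled with an underscore (a space would need quoting), and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SSN_Table (
        ID         INTEGER PRIMARY KEY,
        SSN        TEXT NOT NULL,
        Other_info TEXT
    );
    CREATE TABLE Other_Table (
        ID         INTEGER PRIMARY KEY,
        SSN_ID     INTEGER NOT NULL REFERENCES SSN_Table(ID),  -- foreign key
        Other_info TEXT
    );
    INSERT INTO SSN_Table VALUES (1, '123456789', 'person row');
    INSERT INTO Other_Table VALUES (10, 1, 'benefit claim'), (11, 1, 'address change');
""")

rows = conn.execute("""
    SELECT SSN_Table.SSN, Other_Table.Other_info
    FROM SSN_Table
    JOIN Other_Table ON SSN_Table.ID = Other_Table.SSN_ID
    ORDER BY Other_Table.ID
""").fetchall()
print(rows)  # [('123456789', 'benefit claim'), ('123456789', 'address change')]
```

The SSN is stored once, yet it appears twice in the joined result, once per related row. That repetition is what a join is supposed to produce.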
This is true, but there are many instances where denormalization makes sense and is frequently used.
A common example is a table that is frequently read. Instead of going to the “central” table the data is denormalized for faster access. This is completely standard practice for every large system.
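A minimal sketch of that kind of denormalization, with hypothetical `person`, `payment` and `payment_report` tables. The report table deliberately copies the name and state onto every payment row, so a frequently run report doesn’t need a join; the copy is rebuilt from the normalized source tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person  (person_id INTEGER PRIMARY KEY, name TEXT, state TEXT);
    CREATE TABLE payment (payment_id INTEGER PRIMARY KEY, person_id INTEGER, amount INTEGER);

    -- Read-optimized copy: name and state are duplicated onto every payment
    -- row, so the frequent report doesn't need a join at read time.
    CREATE TABLE payment_report (
        payment_id INTEGER PRIMARY KEY, name TEXT, state TEXT, amount INTEGER
    );

    INSERT INTO person  VALUES (1, 'Ann Lee', 'KS'), (2, 'Bob Ray', 'OH');
    INSERT INTO payment VALUES (100, 1, 900), (101, 1, 900), (102, 2, 750);
""")

def refresh_report(conn):
    """Rebuild the denormalized table from the normalized source tables."""
    conn.execute("DELETE FROM payment_report")
    conn.execute("""
        INSERT INTO payment_report
        SELECT p.payment_id, s.name, s.state, p.amount
        FROM payment p JOIN person s ON s.person_id = p.person_id
    """)

refresh_report(conn)
ks_total = conn.execute(
    "SELECT SUM(amount) FROM payment_report WHERE state = 'KS'"
).fetchone()[0]
print(ks_total)  # 1800
```

In this design, “Ann Lee” appearing twice in `payment_report` is intentional, not a data-quality problem.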
There’s nothing inherently wrong with it, but it can be easily misused. With SSNs, I’d think the most stupid thing to do is to use one as the primary key. The second would be to ignore the security risks that are ingrained in an SSN. The federal government, being as large as it is, I’m sure has instances of both; however, since Musky is using his posse of young, arrogant brogrammers, I’m positively certain they’re completely ignoring the security aspect.
To be a bit more generic here, when you’re at government scale you’re generally deep in trade-off territory. Time and space are frequently opposed values and you have to choose which one is most important, and consider the expenses of both.
E.g. caching is duplicating data to save time. Without it we’d have lower storage costs, but longer wait times and more network traffic.
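A minimal illustration of that trade-off using the standard library’s `functools.lru_cache`. `slow_lookup` is a hypothetical stand-in for a network or database round trip: the cache keeps its own copy of each result, spending memory to avoid repeat trips.

```python
from functools import lru_cache

backend_calls = 0

def slow_lookup(record_id: int) -> str:
    """Hypothetical stand-in for a network/database round trip."""
    global backend_calls
    backend_calls += 1
    return f"record-{record_id}"

@lru_cache(maxsize=1024)
def cached_lookup(record_id: int) -> str:
    # The cache stores a duplicate of each result: more memory, fewer trips.
    return slow_lookup(record_id)

for _ in range(1000):
    cached_lookup(42)

print(backend_calls)                    # 1
print(cached_lookup.cache_info().hits)  # 999
```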
Yeah, no one appreciates security.
I probably overused that saying to explain it: ‘If there are no break-ins, why do we pay for security? Oh, there was a break-in, what do we even pay security for?’
Yeah, I work daily with a database with a very important non-ID field that is denormalized throughout most of the database. It’s not a common design pattern, but it is done from time to time.
Yeah, databases are complicated and make my head hurt. Glancing through resources from other comments, I’m realizing I know next to nothing about database optimization. Like, my gut reaction to your comment is that it seems like unnecessary overhead to have that data across two tables, but if one sub-department didn’t need access to the raw SSN but did need access to less personal data, I could see those being stored in separate tables.
But anyway, you’re helping clear things up for me. I really appreciate the pseudo code level example.
It’s necessary to split it out into different tables if you have a one-to-many relationship. Let’s say you have a list of driver licenses the person has had over the years, for example. Then you’d need the second table. So something like this:
SSN_Table
ID | SSN | Other info
Driver_License_Table
ID | SSN_ID | Issue_Date | Expiry_Date | Other_Info
Then you could do something like pull up a person’s latest driver’s license, or list all the ones they had, or pull up the SSN associated with that license.
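A runnable sketch of those three queries, using the table names from the comment. The dates and IDs are made up, and dates are stored as ISO strings so they sort correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SSN_Table (ID INTEGER PRIMARY KEY, SSN TEXT NOT NULL);
    CREATE TABLE Driver_License_Table (
        ID          INTEGER PRIMARY KEY,
        SSN_ID      INTEGER NOT NULL REFERENCES SSN_Table(ID),
        Issue_Date  TEXT NOT NULL,   -- ISO dates sort correctly as text
        Expiry_Date TEXT NOT NULL
    );
    INSERT INTO SSN_Table VALUES (1, '123456789');
    INSERT INTO Driver_License_Table VALUES
        (1, 1, '2005-06-01', '2011-06-01'),
        (2, 1, '2011-05-20', '2017-05-20'),
        (3, 1, '2017-05-15', '2023-05-15');
""")

# All licenses for one person (one SSN row, many license rows).
all_licenses = conn.execute(
    "SELECT ID FROM Driver_License_Table WHERE SSN_ID = 1 ORDER BY Issue_Date"
).fetchall()

# The person's latest license.
latest = conn.execute("""
    SELECT ID, Issue_Date FROM Driver_License_Table
    WHERE SSN_ID = 1
    ORDER BY Issue_Date DESC
    LIMIT 1
""").fetchone()

# The other direction: which SSN does license 2 belong to?
owner_ssn = conn.execute("""
    SELECT s.SSN FROM SSN_Table s
    JOIN Driver_License_Table d ON d.SSN_ID = s.ID
    WHERE d.ID = 2
""").fetchone()[0]
```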
I think a likely scenario would be for name changes, such as taking your partner’s surname after marriage.
The SSN is likely to appear in multiple tables, because they will reference a central table that ties it all together. This central table will likely only contain the SSN, the birth date (from what others have been saying), as well as potentially first and last name. In this table, the entries have to be unique.
But then you might have another table, like a table listing all the physical exams, which has the SSN to be able to link it to the person’s name, but ultimately just adds more information to this one person. It does not duplicate the SSN in a way that would be bad.
It is common for long-lived databases with a rotating cast of devs to use different formats in different tables as well! One might have it as a string, one might have it as a number, and another might have it with hyphens, all in the same database.
Hell, I work in a state agency, and one of our older databases has a dozen tables with phone numbers.
- One has the whole thing as a long int: 2223334444
- One has the whole thing as a string: 2223334444 (which of course can’t be directly compared to the one that is a long int…)
- One has separate fields for area code and the rest with a hyphen: 222 and 333-4444
- One has the whole thing with parentheses, a space, and a hyphen as a string: (222) 333-4444
The main reasons for the discrepancy are not looking at what was used before, or not understanding that the formatting can always be applied at display time, so the parentheses and hyphens don’t need to be stored in the database itself.
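A minimal sketch of reconciling the four formats listed above: reduce every stored variant to a plain digit string for storage and comparison, and only add parentheses and hyphens when displaying. `normalize_phone` and `display_phone` are hypothetical helpers.

```python
import re

def normalize_phone(*parts) -> str:
    """Join the stored pieces and keep only the digits (hypothetical helper).

    Covers an integer, a digit string, separate area-code/number fields,
    and '(222) 333-4444' style strings.
    """
    return re.sub(r"\D", "", "".join(str(p) for p in parts))

def display_phone(digits: str) -> str:
    """Formatting belongs at display time, not in the stored value."""
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"

stored = [
    (2223334444,),         # long int
    ("2223334444",),       # plain string
    ("222", "333-4444"),   # split area code + rest
    ("(222) 333-4444",),   # pre-formatted string
]
normalized = {normalize_phone(*row) for row in stored}
print(normalized)                                # {'2223334444'}
print(display_phone(next(iter(normalized))))     # (222) 333-4444
```

All four legacy representations collapse to one value, which is what makes them comparable again.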
Okay, but if that happens, Musk is right that that’s a bit of a denormalization issue that maybe needs resolving.
SSNs should be stored as strings without any hyphen or additional markup, nothing else.
- Storing as a number can cause issues if you ever wanna support leading zeros
- Any “styling” like hyphens should be handled by a consuming front-end system; you want only the important data in the DB, to keep queries fast
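A minimal sketch of that rule: canonicalize to exactly nine digit characters on the way in, format only on the way out. `canonical_ssn` and `display_ssn` are hypothetical helpers, and the sample number is made up.

```python
import re

def canonical_ssn(raw: str) -> str:
    """Store SSNs as exactly nine digit characters, no hyphens.

    Raises ValueError when the input doesn't contain nine digits.
    """
    digits = re.sub(r"\D", "", raw)
    if len(digits) != 9:
        raise ValueError(f"expected 9 digits, got {len(digits)}")
    return digits

def display_ssn(digits: str) -> str:
    """Presentation-layer formatting: AAA-GG-SSSS."""
    return f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"

stored = canonical_ssn("012-34-5678")
print(stored)               # 012345678
print(int(stored))          # 12345678  (as a number, the leading zero is gone)
print(display_ssn(stored))  # 012-34-5678
```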
It’s more likely though it’s just a composite key…
This is not what he is actively doing though. He isn’t trying to improve databases.
He is tearing down entire departments and agencies and using shit like this to justify it.
Sure but my point is, if it was the scenario you described, then Elon would be talking about the right kind of denormalization problem.
Denormalization due to multiple different tables storing their own copies of the same data, in different formats worse yet, would actually be the kind of problem he’s tweeting about.
As opposed to a composite key on one table which means him being an ultracrepidarian, as usual.
Musk canceled the support for the long running Common Education Data Standards (CEDS) which is an initiative to promote better database standards and normalization for the states to address this kind of thing.
It does not fucking matter if he is technically correct about one tiny detail, because he is only using it to destroy, not to improve efficiency.
My guess would be around your note. If someone mistakenly has two SSNs (due to fraud, error, or name changes), combining DOB helps detect inconsistencies.
Some other possibilities, and I’m just throwing out ideas at this point:
- Adding DOB could help with manual lookups and verification.
- Using SSN + DOB ensures a standard key format across agencies, making it easier to link records.
- Prevents accidental duplication if an SSN is mistyped.
- Maybe the databases were optimized for fixed-length fields, and combining SSN + DOB fit within memory constraints.
- It was easier to locate records with a “human-readable” key, whereas something like a UUID is harder for humans to read or sift through.
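To illustrate the “detect inconsistencies” and “mistyped SSN” ideas, here is a minimal sketch with hypothetical `master` and `incoming` tables. A claim is flagged when its `(SSN, DOB)` pair isn’t in the master table, and the extra column shows whether the SSN alone would have matched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE master   (ssn TEXT, birth_date TEXT, PRIMARY KEY (ssn, birth_date));
    CREATE TABLE incoming (claim_id INTEGER, ssn TEXT, birth_date TEXT);
    INSERT INTO master VALUES ('123456789', '1950-01-02'), ('987654321', '1971-11-30');
    INSERT INTO incoming VALUES
        (1, '123456789', '1950-01-02'),   -- SSN and DOB agree
        (2, '123456798', '1950-01-02'),   -- last two SSN digits transposed
        (3, '987654321', '1917-11-30');   -- right SSN, DOB mistyped
""")

# Flag claims whose (SSN, DOB) pair has no match, and note whether the
# SSN by itself exists (1) or not (0).
flagged = conn.execute("""
    SELECT i.claim_id,
           EXISTS (SELECT 1 FROM master m2 WHERE m2.ssn = i.ssn) AS ssn_known
    FROM incoming i
    LEFT JOIN master m ON m.ssn = i.ssn AND m.birth_date = i.birth_date
    WHERE m.ssn IS NULL
    ORDER BY i.claim_id
""").fetchall()
print(flagged)  # [(2, 0), (3, 1)]
```

Claim 3 would sail through an SSN-only check; the pair catches it.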
Beat me to asking this follow-up, though you linking additional resources is probably more effort than I would have put in. Thanks for that!
Having never seen the database schema myself, my read is that the SSN is used as a primary key in one table, and many other tables likely use that as a foreign key. He probably doesn’t understand that foreign keys are used as links and should not be de-duplicated, as that breaks the key relationship in a relational database. As others have mentioned, even in the main table there are probably reused or updated SSNs that would then be multiple rows that have timestamps and/or Boolean flags for current/expired.
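A minimal sketch of why foreign-key repeats shouldn’t be “de-duplicated”, assuming hypothetical `person` and `payment` tables where the SSN is the primary key of one and a foreign key in the other. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE person (ssn TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE payment (
        payment_id INTEGER PRIMARY KEY,
        ssn        TEXT NOT NULL REFERENCES person(ssn),  -- foreign key
        amount     INTEGER
    );
    INSERT INTO person VALUES ('123456789', 'Ann Lee');
    INSERT INTO payment VALUES (1, '123456789', 900), (2, '123456789', 900);
""")

def total_paid():
    return conn.execute("SELECT SUM(amount) FROM payment").fetchone()[0]

# The SSN repeats in payment by design: one row per monthly payment.
repeats = conn.execute(
    "SELECT COUNT(*) FROM payment WHERE ssn = '123456789'"
).fetchone()[0]
total_before = total_paid()

# Deleting the person row would orphan the payments, so SQLite refuses.
try:
    conn.execute("DELETE FROM person WHERE ssn = '123456789'")
    blocked = False
except sqlite3.IntegrityError:
    blocked = True

# "De-duping" payments on the foreign key silently throws away a real payment.
conn.execute(
    "DELETE FROM payment WHERE payment_id NOT IN "
    "(SELECT MIN(payment_id) FROM payment GROUP BY ssn)"
)
total_after = total_paid()
print(repeats, blocked, total_before, total_after)  # 2 True 1800 900
```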
If this is true, then by this time we are all fucked. Like, Monday someone checks their banking or retirement and it’s all gone. That’s gonna be a crazy day.
I hope they’re not using the actual SSN as the primary key. I hope it’s a big-ass number that is otherwise unrelated.
TL;DR: de-duplication in that sense refers to a technique where two different pieces of data in the file system reference one single piece of data on the drive, the intention being to optimize file storage size and minimize fragmentation.
You can imagine this would be very useful when taking backups, for instance. We call this a “copy-on-write” approach, since generally it works by making a second reference to the existing file, where you can then add an edit on top of the original file while keeping the original fully intact, so both copies of the file exist (it’s more complicated than this, obviously, but you get the idea).
Now, just to be clear: if you did implement this in a DB, which you could do fairly trivially, it would change nothing about how the DB operates. It wouldn’t remove “duplicates”; it would only coalesce duplicate data into one single tree to optimize disk usage. I have no clue what Elon thinks it does.
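A toy sketch of storage-level de-duplication as a content-addressed block store. `DedupStore` and the 4-byte block size are illustrative assumptions; real file systems use much larger blocks and far more machinery. Identical blocks are stored once, yet every file still reads back in full, so nothing disappears from the logical view.

```python
import hashlib

BLOCK_SIZE = 4  # tiny on purpose; real systems use KiB-sized blocks

class DedupStore:
    """Content-addressed block store: identical blocks are kept once."""

    def __init__(self):
        self.blocks = {}  # sha256 digest -> block bytes (stored once)
        self.files = {}   # file name -> list of digests (references)

    def write(self, name: str, data: bytes) -> None:
        refs = []
        for i in range(0, len(data), BLOCK_SIZE):
            chunk = data[i:i + BLOCK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            self.blocks.setdefault(digest, chunk)  # only the first copy is kept
            refs.append(digest)
        self.files[name] = refs

    def read(self, name: str) -> bytes:
        return b"".join(self.blocks[d] for d in self.files[name])

store = DedupStore()
store.write("a.txt", b"AAAABBBBCCCC")
store.write("b.txt", b"AAAABBBBDDDD")  # shares two blocks with a.txt

print(len(store.blocks))    # 4 unique blocks instead of 6
print(store.read("b.txt"))  # b'AAAABBBBDDDD'
```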
The problem here, as a non-programmer, is that I don’t understand why you would ever de-duplicate a database. Maybe there’s a reason to do it, but I genuinely cannot think of a single instance where you would want to delete one entry and replace it with a reference to another, or do what Elon is implying here (remove “duplicate” entries, however that’s supposed to work).
Elon doesn’t know what “de-duplication” is, and I don’t know why you would ever want that in a DB; it seems like a really good way to explode everything.
i genuinely cannot think of a single instance where you would want to delete one entry, and replace it with a reference to another
Well, there’s not always a benefit to keeping historical data. Sometimes you only want the most up-to-date information in a particular table or database, so you’d just update the row (replace). It depends on the use case of a given table.
what elon is implying here (remove “duplicate” entries, however that’s supposed to work)
Elon believes that each row in a table should be unique based on the SSN only, so a given SSN should appear only once with the person’s name and details on it. Yes, it’s an extremely dumb idea, but he’s a famously stupid person.
Well, there’s not always a benefit to keeping historical data. Sometimes you only want the most up-to-date information in a particular table or database, so you’d just update the row (replace). It depends on the use case of a given table.
In this case you would just overwrite the existing row; you wouldn’t use de-duplication, because it would do the opposite of what you wanted in that case. Maybe even use historical backups or CoW to retain that kind of data.
Elon believes that each row in a table should be unique based on the SSN only, so a given SSN should appear only once with the person’s name and details on it. Yes, it’s an extremely dumb idea, but he’s a famously stupid person.
And naturally, he doesn’t know what the term “de-duplication” means. Definitionally, the actual identity of the person MUST be unique; otherwise you’re going to somehow return two rows when you ask for one, which is functionally impossible given how a DB is designed.
in this case you would just overwrite the existing row, you wouldn’t use de-duplication because it would do the opposite of what you wanted in that case.
… That’s what I said, you’d just update the row, i.e. replace the existing data, i.e. overwrite what’s already there
Definitionally, the actual identity of the person MUST be unique, otherwise you’re going to somehow return two rows, when you call one, which is functionally impossible given how a DB is designed.
… I don’t think you understand how modern databases are designed
… That’s what I said, you’d just update the row, i.e. replace the existing data, i.e. overwrite what’s already there
You were talking about not keeping historical data, which is one of the proposed reasons you would have “duplicate” entries; I was just clarifying that.
… I don’t think you understand how modern databases are designed
It’s my understanding that, when it comes to storing data, it shouldn’t be possible to have two independent stores of the exact same thing in two separate places. You could have duplicate data entries, but that’s irrelevant to the discussion of de-duplication, aside from data consolidation, which I don’t imagine is an intended use case for a DB, considering that you literally already have one identical entry. Of course you could simply make it non-identical; that goes without saying.
Also, we’re talking about the DB used for the Social Security database, not fucking TigerBeetle.
SSN being unique isn’t a dumb idea; it’s a very smart idea, but due to the US SSN format it’s impossible to do. Hence, to implement the idea, you first need to change the SSN format so that it is unique.
Also, Elon’s remark is stupid as is. I’m sure the row has a unique ID, even if it’s just a rowid column.
Also, Elon’s remark is stupid as is. I’m sure the row has a unique ID, even if it’s just a rowid column.
Even then, I wonder if there’s some sort of “row hash function” that takes a hash of all the data in a single entry and generates a universally unique hash of that entry, as a form of “global ID”.
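Such a row hash is easy to sketch: SHA-256 over a canonical encoding of the row. `row_fingerprint` is a hypothetical helper, and the canonical encoding (JSON with sorted keys) is an assumption. One caveat worth seeing in code: two identical rows get the same hash, so this is a content fingerprint that finds exact duplicates, not an ID that tells identical rows apart (a surrogate key or rowid does that).

```python
import hashlib
import json

def row_fingerprint(row: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the row (sorted keys)."""
    canonical = json.dumps(row, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

r1 = {"ssn": "123456789", "name": "Ann Lee", "year": 2023}
r2 = {"year": 2023, "name": "Ann Lee", "ssn": "123456789"}  # same data, other key order
r3 = {"ssn": "123456789", "name": "Ann Lee", "year": 2024}

print(row_fingerprint(r1) == row_fingerprint(r2))  # True
print(row_fingerprint(r1) == row_fingerprint(r3))  # False
```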
As a data engineer for the past 20+ years: there is absolutely no fucking way that the US gov doesn’t use SQL. This is what shows that he’s stupid not only about SQL but about data science in general.
Regarding duplications: it’s more nuanced than the statements either side made. There can be duplications in certain situations. In some situations there shouldn’t be. And I don’t really see how duplications in a DB open the door to fraud.
Well, we heard what the White House press secretary has to say about the fraud they found 2 days ago. They found massive amounts and she brought receipts! All of them were examples of money being spent that disagree with Trump’s new policies. Like money spent on DEI initiatives and aid sent to countries in Africa to help slow the spread of HIV. That receipt was for a laughable $57,000.
Then when asked how any of it was fraud she said, well they consider that fraud because it wasn’t used to help Americans.
So the 27-year-old married to a billionaire 32 years older than her is complaining that the money wasn’t directly spent on her gold-digging ass, and if it’s not spent directly on her, it’s fraud.
Biggest disgrace of a government that has ever existed.
Yeah, obviously ol’ boy is tripping if he thinks SQL isn’t used in the government.
Big thing I’m prying at is whether there would be a legitimate purpose to have duplicated SSNs in the database (thus showing the First Bro doesn’t understand how SQL works).
If it’s used as an identifier to link together rows from different tables. Also known as “joining” tables. SSN (with birthdate) is a unique identifier, and so it’s natural to choose as a primary/foreign key.
It really is baffling trying to make sense of what he is saying. It’s like the only explanation that makes any sense at all is that he has no idea what he is talking about. If he had even cursory knowledge of database cardinality, he wouldn’t say stuff so stupid.
Oh yeah? How about SCD? I bet all SSNs are in an SCD.
It doesn’t matter without scope. Are we looking at a database of SSNs? tax records? A sign in log? The social security number database might require uniques in some way, but tax records could be the same person over multiple years. A sign in gives a unique identifier but you could be signing in every day.
It’s like saying a car VIN shows up multiple times in a database. Where? What database? Was it sold? Tickets? Registered every year?
This is nothing more than an “assume I mean immigrants or tax fraud and get mad!” inflammatory statement with no proof or reason.
The sheer size of the federal government and its age would mean there are thousands of databases out there. Some may be so old that they predate RDBMS/SQL.
That alone makes his comment come from a place of ignorance. Of course it’s confident ignorance. The worst kind.
Definitely “Confidently Incorrect” material.
Some may be so old that they predate RDBMS/SQL.
I don’t follow. Wouldn’t that lend credence to his assertion that it’s incorrect to assume that everything in government is SQL?
People here are being irrationally obtuse about the possibility that an agency that’s existed since the 1930s may keep business-critical records on legacy systems predating relational databases. Systems serving a national agency may not migrate databases frequently.
What he’s arguing is that the government doesn’t use SQL at all.
Were those his exact words? When words are ambiguous, are we selecting interpretations that serve best in the contention? Does the context suggest something obvious was left unstated? Yours seems like a forced interpretation.
- He complains about 1 specific database.
- Some rando assumes it’s SQL & retorts he doesn’t know it.
- He literally writes “This retard thinks the government uses SQL.”
Always, sometimes, here? In typical Twitter fashion, it’s brief and leaves room for interpretation.
In context, always or here makes the most sense as in “This dumbass thinks the government always uses SQL.” or “This dumbass thinks the government uses SQL here.” Does it matter some other database is SQL if this one isn’t? No. With your interpretation, he pointlessly claims that it does matter for no better reason than to discredit himself. With narrower interpretations, he doesn’t. In a contention, people don’t typically make pointless claims to discredit themselves. Therefore, narrower interpretations make more sense. Use context.
All I did here was apply textbook guidelines for analyzing arguments & strawman fallacies as explained in The Power of Logic. I welcome everyone to do the same.
A problem with objecting to a proposition that misrepresents the original proposition is that the objector fails to engage with the actual argument. Instead, they argue with themselves & their illusions, which looks foolish & isn’t a valid argument. That’s why strawman is a fallacy.
The fact is there’s very little information here. We don’t know which database he’s referring to exactly. We don’t know its technology. Some of us have worked enough with local government & legacy enterprise systems to know that following any sort of common industry standards is an unsafe assumption. No one here has introduced concrete information on any of that to draw clear conclusions, though there’s an awful lot of conjecture & overreading.
He seemed to use the word de-duplicated incorrectly. However, he also explained exactly what he meant by that, so the word hardly matters. Is there a good chance he’s wrong that multiple records with the same SSN indicate fraud? Without a clear explanation of the data architecture, I think so.
I despise idiocy. Therefore, I despise what Musk is doing to the government. Therefore, I despise it when everyone else does it.
Seeing this post keep popping up in the lemmy feed is annoying when it’s clear from context that there’s nothing there but people reading more into it.
We don’t have to become idiots to denounce idiocy.
He literally writes “This retard thinks the government uses SQL.”
That is all you need. He’s not saying “This retard thinks the SSA uses SQL”. He is saying “the government” which means all of it. Saying someone is a retard because they think the government uses SQL means Elon doesn’t think they do because we all know he doesn’t consider himself a retard.
You are looking for ambiguity where there is none.
Nah, that’s ignoring context irrationally. Context matters. I’ll show.
He’s not saying “This retard thinks the SSA uses SQL”.
Can SSA not be called “the government”?
He is saying “the government” which means all of it.
So, let’s try your suggested interpretation.
This retard thinks all the government uses SQL.
That seems to agree with mine.
However, you denied ambiguity of language, and that context matters, so let’s explore that: which government? The Brazilian government? Your state government? Your local government? No? How do you know? That’s right: context.
Why stop there? There’s more context: a Social Security database was specifically mentioned.
Does “the government” always mean all of it? When a federal agent knocks on someone’s door & someone gripes “The goddamn government is after me!” do they literally mean the entire government? I know from context I or anyone else can informally refer to any part of the government at any level as “the government”. I think you know this.
Likewise, when people refer to the ocean or the sky or the people, they don’t necessarily mean all of it or all of them.
Another way to check meaning is to test whether a proposition still makes sense when something obvious unstated is explicitly written out.
This retard thinks the government uses SQL. Why assume they use SQL here?
Still make sense? Yes. Could that be understood from context without explicitly writing it out? Yes.
A refrain:
Use context.
Elon Musk is also an idiot. He thinks he’s smart enough to quickly understand complex situations and complex problems about which he knows next to nothing, within just a few minutes.
Most people would only try to claim that level of understanding in areas with which they have professional experience or about which they’re extremely geeky. He does it with everything, and nobody can be an expert in everything, and everybody knows that except for narcissists.
I suppose for non-tech people it might be convenient to assume that because someone knows something about some kind of tech, they therefore know a lot about all kinds of tech, and the reality is that’s just not true. There are so many fields that are totally different. But even if it were true, he would actually look even more idiotic, because Twitter is a train wreck, so clearly he’s incompetent in the tech field, right?
Lol talk about burying the lede… The issue here is that the government absolutely uses SQL to traverse a DB and anyone who thinks otherwise is an idiot.
Naw, I definitely meant to be asking about duplication of data in databases (vs if the government actually uses SQL).
Sorry to have communicated that so poorly. Everyone seems to be taking the angle you’re arguing though. Guess I’ll need to work on that.
The Social Security FAQ page (Q20, specifically) says that they do not reuse old SSNs when people die.
The SSN is 9 digits long, so technically they would have to start reusing them after the billionth one. Given the current population size, and how many people have been born and died since its implementation, it’s fair to say they haven’t had to reuse any numbers yet.
The number is structured though. Some positions represent things like the geographic region you were born in, others relate to the year you were born. That drastically cuts down the available numbers as the entire range isn’t available in all situations.
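Some of that structure can be checked mechanically. A minimal sketch, where `is_issuable_ssn` is a hypothetical helper encoding SSA’s published “never issued” rules (area 000, 666 or 900-999; group 00; serial 0000). Passing it only means the number has a valid shape, not that it was ever issued, and it doesn’t model the older area-to-state assignments.

```python
def is_issuable_ssn(ssn: str) -> bool:
    """Check the structural rules for an SSN (AAA-GG-SSSS, hyphens optional).

    Never issued: area 000, 666 or 900-999; group 00; serial 0000.
    """
    digits = ssn.replace("-", "")
    if len(digits) != 9 or not (digits.isascii() and digits.isdigit()):
        return False
    area, group, serial = int(digits[:3]), int(digits[3:5]), int(digits[5:])
    return (
        area != 0 and area != 666 and area < 900
        and group != 0
        and serial != 0
    )

for candidate in ["123-45-6789", "666-12-3456", "912-34-5678", "123-00-4567"]:
    print(candidate, is_issuable_ssn(candidate))
```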
Not to mention, anyone who has worked in the US gets a SSN, not just citizens or current residents.
I know a bunch of people over here in Europe who have them after working a few months or years in the US.
They have several generations to go. Literally not a problem for us or our grandchildren.
Right. Fingers crossed we figure out national IDs before then.
Papers please
Elon Musk’s degree is in economics. He might be a script kiddie.
But I was assured he was a materials engineer, rocket scientist, computer programmer, and businessman extraordinaire!
Elon Musk is a (criminal) scammer, he always has been.
He was fired for incompetence from his own company
Pretty much everything he’s promised for every company he has headed has been a lie. Tesla Full Self-Driving? Lie. Hyperloop? All lies to successfully kill high-speed rail and start a movement that wasted billions of dollars, including taxpayer money. Even SpaceX, the least shit of all, is shit. Once you really look at it, it’s all promises with no results and lots of cheering when millions of taxpayer dollars, yet again, blow up in the sky.
The guy has one quality: convincing people that he’s smart even though he literally doesn’t know shit
Also, he doesn’t know what an SCD (slowly changing dimension) is.
Wait, SSNs weren’t designed to be GUIDs? I mean, I fully follow that they aren’t and we’ve had to reuse them when the circle of life does its thing, but I thought they were just designed poorly and we found out the hard way they don’t work as GUIDs. What purpose were they designed for if not to act as GUIDs?
They were designed to be only used for the administration of social security. Since they were sending monthly checks, they needed a way to know that the person going to the office and saying their address changed was who they said they were. This was at a time before driver’s licences were common and they didn’t have any other type of ID, and there were just a lot fewer people.
Later on the SSN started to be used by banks and other entities even though it was never meant for that, and the risks associated with the relatively insecure design just compounded, because instead of just fraudulently claiming someone else’s social security checks (which, unless the target died, would probably be figured out within a month), it opened up all sorts of extra avenues for fraud.
I’m not arguing that Elon musk is anything but an absolute tool.
SS numbers have 999 million options. Are we already repeating them?
We have over 300 million people in the US right now. Social security started in the US in 1935 with just over 127 million people then.
Yeah, we probably have gone through 999 million options by now.
I don’t think we’ve gone through 999 million options yet. Only about 350 million people have been born since 1933, so even if we add all 127 million US citizens alive in 1935, that’s just over half of the possible social security numbers.
The reason we’ve likely reused numbers is because they weren’t randomly assigned until like 2011. Knowing that I was born in 1995 in Wichita, KS, you could make an educated guess at the first three digits of my SSN
We have 335 million people in this country literally right now. I don’t think “350 million born since 1933” makes sense. There gotta be a lot of churn just from early deaths alone.
Edit: number fixin
Not every person in the United States was born in the United States and even temporary workers can get a SSN
I mean you can check my math, I just added up all the births per year in this article
Rounding to one decimal place, it’s 311.9 million people born in the US between 1933 and 2018. Adding an average of 4 million births per year since then, it’s 335.9. I rounded up to 350 to bring it to a nice round number.
A bit of research tells me that around 44.8 million of us are first generation immigrants, so 291.1 million were born here. Is it reasonable to assume that 291.1 out of the 335.9 million people born since 1933 have survived so far? I have absolutely no idea, I’m not a professional census taker
Well, I think this is twice in the same thread where my intuition was considerably off base. Lesson learned, I suppose.
Said this elsewhere, but wanted to be sure you had the chance to see the linked material. The Social Security FAQ page (Q 20, specifically) says that they do not re-use old SSNs when people die.
Just read that, and it says they’ve only issued 453 million numbers so far. Huh. I really thought it would’ve been a lot more than that.
I don’t want to come off as a bot spamming this in a bunch of different comment threads, but the Social Security FAQ page (Q 20, specifically) says that they do not re-use old SSNs when people die.
Billionaires are stealing our dollars, tax or otherwise.
It’s more than just SQL. Social Security Numbers can be re-used over time. It is not a unique identifier by itself.
i’ve heard conflicting reports on this, i have no idea to what degree this is true, but i would be cautious about making this statement unless you demonstrate it somehow.
As read on Wikipedia (https://en.wikipedia.org/wiki/Social_Security_number), the format only allows +/- 100k numbers per area code (which is also limited to 999 codes?), so over time you are forced to reuse some codes. In total the format allows 99m unique codes, and the US currently has 334mil people sooooo :')
On June 25, 2011, the Social Security Administration changed the SSN assignment process to “SSN randomization”,[36] which did the following:
The Social Security Administration does not reuse Social Security numbers. It has issued over 450 million since the start of the program, about 5.5 million per year. It says it has enough to last several generations without reuse and without changing the number of digits. https://www.ssa.gov/history/hfaq.html
evidently they must be doing something else on the backend for this to be working, assuming there are quite literally 100M numbers, which is going to be static due to math, obviously, but they clearly can’t be reassigning numbers to 3 people on average at any given time, without some sort of external mechanism.
There are approximately 420 million numbers available for assignment.
https://www.ssa.gov/employer/randomization.html
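Using the two SSA figures quoted in this thread (roughly 420 million numbers available for assignment, and about 5.5 million issued per year), the runway is simple division. A quick Python sketch of the arithmetic; the flat issuance rate and the 25-year "generation" are assumptions, and the 420 million figure may date from the 2011 randomization change, so treat the result as rough.

```python
# Back-of-envelope runway using the SSA figures quoted above.
# ASSUMPTIONS: issuance stays flat at ~5.5M/year, and a "generation"
# is a hypothetical round 25 years.
AVAILABLE_MILLIONS = 420
ISSUED_PER_YEAR_MILLIONS = 5.5
YEARS_PER_GENERATION = 25

years_left = AVAILABLE_MILLIONS / ISSUED_PER_YEAR_MILLIONS
generations_left = years_left / YEARS_PER_GENERATION

print(f"~{years_left:.0f} years, ~{generations_left:.1f} generations")
```

That comes out to roughly 76 years, i.e. about three generations at the current pace.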
that certainly doesn’t seem like it would support several generations, possibly at our current birth rate i suppose.
DDG AI bullshit tells me that there are a billion codes. https://www.marketplace.org/2023/03/10/will-we-ever-run-out-of-social-security-numbers/ this article says it’s 1 billion
https://www.ssn-verify.com/how-many-ssns
this website also lists it as approximately 1 billion.
I think I see the change. They mention the SSN is 9 numbers long, which is 1 longer than the 3-3-2 format Wikipedia mentions. That does mean it’s around 999mil numbers, which, yeah, allows for a few generations (like, 1 or 2 lol).
It’s an insanely idiotic thing to say. Federal government IT is sprawling and handled at a per-agency level. Any relational database system, which the federal government uses plenty of, uses SQL in one way or another. Elon doesn’t know what he is talking about at all, and is being an ultimate idiot about this. Even for mainframe projects, if we give Elon the benefit of the doubt that that’s what he’s referring to, most COBOL shops I know have adapted to addressing internal data records through an SQL interface, although obviously that legacy world is insanely fractured and arcane.
Yeah, obviously ol’ boy is tripping if he thinks SQL isn’t used in the government.
Big thing I’m prying at is whether there would be a legitimate purpose to have duplicated SSNs in the database (thus showing the First Bro doesn’t understand how SQL works).
Another commenter pointed out a legitimate use case, but it’s not even worth thinking about that much. “De-duplicated” is usually a word you use in data science when talking about making sure your dataset is “hygienic” and that you aren’t duplicating data points. A database is much different, because it is less about representing data and more about storing it in a way that allows you to perform transactions at scale: retrieval, storage, modification, etc. Relational schemas are designed around cardinality (one-to-one, one-to-many, many-to-many relationships), and deciding how much to normalize involves tradeoffs between speed of retrieval (duplication good) and storage efficiency (duplication bad).
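To make the de-duplication point concrete, here is a minimal sketch using Python’s built-in sqlite3 (SQLite standing in for whatever engine an agency actually runs; the table, names and SSNs are invented). Whole-row `DISTINCT` removes only exact copies. Finding one SSN attached to several names is a different query, and its result is a list of cases to investigate, not rows to delete.

```python
import sqlite3

# Toy table; every name and SSN here is made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (ssn TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [
        ("123-45-6789", "Jane Doe"),
        ("123-45-6789", "Jane Doe"),    # exact duplicate row
        ("987-65-4321", "John Roe"),
        ("987-65-4321", "Johnny Roe"),  # same SSN, different name
    ],
)

# Whole-row de-duplication, in the data-science sense: only the exact
# copy collapses; the same-SSN/different-name pair survives untouched.
distinct_rows = conn.execute(
    "SELECT DISTINCT ssn, name FROM people ORDER BY ssn, name"
).fetchall()

# Flagging SSNs that map to more than one name is a separate query.
ssns_with_many_names = conn.execute(
    """
    SELECT ssn, COUNT(DISTINCT name) AS n_names
    FROM people
    GROUP BY ssn
    HAVING COUNT(DISTINCT name) > 1
    """
).fetchall()

print(distinct_rows)
print(ssns_with_many_names)
```

Note that the “Johnny Roe” row could just as easily be a nickname, a typo, or a name change as fraud, which is exactly why nobody deletes it automatically.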
The issue is that Elon is so vague and so off the mark that it is very hard to believe that he even has the first clue about what he is talking about. Even you are confused just by reading it. It is all a tactic to convince others that he is smarter than he is while doing extreme damage to the hardworking people that actually make this stuff possible. Have you noticed that the man has never come to a conclusion that wasn’t in his interests? This is not honest intellectualism, or discussion based on technical merit. It’s self-serving propaganda.
Well, if someone changes their name you’d add a new record with the same SSN to hold their new name, that way it keeps the records consistent with the paperwork; old papers say their old name and reference the retired record, new papers use their new name and reference the new record.
You can use the SSN as the key to find all records associated with a person, it doesn’t have to be a single row per SSN, in fact that would make the data harder to manage and less accurate.
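A minimal sketch of “SSN as a key, not a single row”, again in Python’s built-in sqlite3 with an invented contributions table. The plain (non-unique) index on `ssn` makes per-person lookups fast without forbidding repeats, and `GROUP BY` collapses the repeats at query time when you want one line per person.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per contribution, so a given SSN
# legitimately appears many times.
conn.execute("CREATE TABLE contributions (ssn TEXT, year INTEGER, amount REAL)")
# Non-unique index: speeds up lookups by SSN, does NOT enforce uniqueness.
conn.execute("CREATE INDEX idx_contributions_ssn ON contributions (ssn)")
conn.executemany(
    "INSERT INTO contributions VALUES (?, ?, ?)",
    [
        ("123-45-6789", 2021, 5000.0),
        ("123-45-6789", 2022, 5200.0),
        ("123-45-6789", 2023, 5400.0),
        ("987-65-4321", 2023, 3100.0),
    ],
)

# Everything recorded for one person, found via the SSN.
rows = conn.execute(
    "SELECT year, amount FROM contributions WHERE ssn = ? ORDER BY year",
    ("123-45-6789",),
).fetchall()

# One summary line per person, computed from the repeated rows.
totals = conn.execute(
    "SELECT ssn, COUNT(*), SUM(amount) FROM contributions "
    "GROUP BY ssn ORDER BY ssn"
).fetchall()

print(rows)
print(totals)
```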
E.g. if someone changes their last name after getting married, it could be useful to be able to have their current and former name in the database for reference.
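Keeping both the old and new name is the textbook case for what data-warehouse people call a Type 2 slowly changing dimension (SCD): instead of overwriting the old row, you close it out and add a new one, so the same SSN legitimately appears more than once. A minimal sketch in Python’s built-in sqlite3; the table, columns, `change_name` helper and data are all hypothetical, not how SSA actually stores anything.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per (person, name period). Table and column names are hypothetical.
conn.execute(
    """
    CREATE TABLE person_names (
        record_id  INTEGER PRIMARY KEY,
        ssn        TEXT NOT NULL,
        name       TEXT NOT NULL,
        valid_from TEXT NOT NULL,
        valid_to   TEXT            -- NULL means "current"
    )
    """
)
conn.execute(
    "INSERT INTO person_names (ssn, name, valid_from) VALUES (?, ?, ?)",
    ("123-45-6789", "Jane Smith", "1990-01-01"),
)


def change_name(conn, ssn, new_name, effective):
    """Close the current row and open a new one; nothing is overwritten."""
    with conn:  # commits both statements together, rolls back on error
        conn.execute(
            "UPDATE person_names SET valid_to = ? "
            "WHERE ssn = ? AND valid_to IS NULL",
            (effective, ssn),
        )
        conn.execute(
            "INSERT INTO person_names (ssn, name, valid_from) VALUES (?, ?, ?)",
            (ssn, new_name, effective),
        )


change_name(conn, "123-45-6789", "Jane Doe", "2015-06-20")

# Full history for one SSN: old paperwork can still be matched to the old name.
history = conn.execute(
    "SELECT name, valid_from, valid_to FROM person_names "
    "WHERE ssn = ? ORDER BY valid_from",
    ("123-45-6789",),
).fetchall()
print(history)
```

A naive “one row per SSN” de-dupe on this table would throw away exactly the history the design is meant to keep.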
Musk’s statement about the government not using SQL is false. I worked for FEMA for fourteen years, a decade of which was as a Reports Analyst. I wrote Oracle SQL*Plus code to pull data from a database and put it into spreadsheets. I know, I know. You’re shocked that Elon Musk is wrong. Please remain calm.
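For anyone wondering what “pull data from a database and put it into spreadsheets” looks like in practice, here is a rough analogue. The commenter used Oracle’s tooling; this sketch swaps in Python’s built-in sqlite3 and csv modules so it runs anywhere, and the `claims` table and its data are invented.

```python
import csv
import io
import sqlite3

# Stand-in for a reporting database; the table and rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claims (region TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO claims VALUES (?, ?)",
    [("East", "open"), ("East", "closed"), ("West", "open")],
)

# A typical report query: counts per region and status.
cursor = conn.execute(
    "SELECT region, status, COUNT(*) AS n FROM claims "
    "GROUP BY region, status ORDER BY region, status"
)

# Write the result set as CSV, which any spreadsheet program can open.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(col[0] for col in cursor.description)  # header row
writer.writerows(cursor)
print(buffer.getvalue())
```

Writing to a file instead of `io.StringIO` (opened with `newline=""`) would give you a `.csv` to hand straight to Excel.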
I work for a Crown corporation in Canada; we have, off the top of my head, about 800 MSSQL, Oracle, MySQL/MariaDB, and Postgres databases across the org (I manage our CMDB). Musk is a retard. The world runs on SQL.
He wouldn’t know this, though, because he’s a techbro who builds apps with MongoDB because he doesn’t understand what normalizing data is and why SQL is the best option for 99.9999999% of applications.
Fucking idiots.
As a former DOD contractor I can also confirm we built whole platforms that use Oracle (shudder) SQL
Elmo Susk surely thinks they store everything in Excel.
Pandas and Pickle man….
Elon Musk is the walking talking embodiment of the Dunning-Kruger effect.
100%
What’s fascinating is you can take pretty much ANY topic you have some knowledge about (besides scamming at scale, because there he truly is a master) and see very fast that he has no fucking clue. From engineering to video games, the guy has no idea. Sure, his entourage, paid or not, might actually be world experts on said topic, but not him. So obvious.
The US government pays lots of money to Oracle to use their database. And it’s not for Berkeley DB either. (Poor sleepy cat.) Oracle provides them support for their relational databases… and those databases use… SQL.
Now if Musk tries to end the Oracle contracts, then Oracle’s lawyers will go after his lawyers and I’m a gonna get me some popcorn. (But we all know that won’t happen in any timeline… Elon gotta keep Larry happy.)
Big thing I’m prying at is whether there would be a legitimate purpose to have duplicated SSNs in the database
Formally, changing someone’s identity would be a very explicit reason to keep a “duplicate” SSN entry, if only for historical reasons. I’m sure there are a myriad of technical reasons to be doing this.
Because everyone hates IPv6?
Why not reuse SSNs that are no longer in service for whatever reason?
He gonna write everything in Pandas. Who the fuck needs to pay hundreds of millions a year to Oracle? (And I bet that’s really how much they pay Oracle.)
Also, oh boy, Oracle’s lawyers… those you don’t wanna mess with.
139 comments and no one addresses his use of a slur.
Because that’s really just to be expected at this point, and what his audience would want…
Better to focus on constantly poking at him for being dumb, which he and his fans hate, rather than give them what they want, ie being upset at their hateful language
it seems that nobody really cares about the word retard anymore, it’s quite funny how it went from super common language, to being less common, to people just saying it again now.
I’m curious how many people actually consider the word a slur, and how many people even care these days.
The ignorance of Elon is truly concerning, but somehow the worst part to me is Elon calling someone a retard for pointing that out.
Ableist, racist white supremacist doing their ableist-racist-white-supremacist thing.
He called a rescuer a pedophile for trying to rescue children…
After his own offer to rescue the children was turned down