The last thing a file-system-based database needs is file permissions and all the other weird stuff that goes on in a file system. They're really complicated for the functionality they support. But a file system is always available to software, so it seems intuitive and simple. I just wish computers didn't rely on them so much.
I'm not sure what you're proposing. The Unix philosophy is really valuable; it basically enabled all modern computing (even as it's less followed now, we only got this far because of it). And if we stop storing data as strings in file systems, basically all interoperability fails, because APIs become too bespoke.
You can't (or shouldn't) keep a lot of data in memory, and serialised data stored in DBs is super hard to handle and often less performant, since you're transcoding it for every manipulation. What do you propose instead? Models like what Android apps use still sit on file systems and are basically the worst of both worlds, being horribly anti-user.
File systems are mad intuitive: look at a bookshelf and you're 90% there. What else do you propose?
A lot of storage these days is in object storage like S3. There are no partial writes, no file permissions, no directories, and no extended metadata (well, S3 actually does let you attach some user metadata). It just does what you want most storage to do: store a bit of data that can be retrieved later using some key. You could imagine storing a shared library, an executable, or just about anything that way. In fact, I'm using Fedora Silverblue, a Linux distro that stores most of its root filesystem in a git-like content-addressable ostree repository (potentially even stored in object storage on the remote update servers). The filesystem is just a formality, because that's what software is currently written to understand.
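To give a feel for how little API there is, here's a minimal sketch of that key-to-bytes model using boto3 (the bucket name, key, and local file are made up for illustration):

    import boto3

    s3 = boto3.client("s3")

    # Store a blob under a key. No directories, no permission bits,
    # no partial writes: each PUT replaces the whole object.
    with open("libfoo.so.1", "rb") as f:
        s3.put_object(Bucket="my-bucket", Key="libs/libfoo.so.1", Body=f.read())

    # Retrieve it later with the same key.
    blob = s3.get_object(Bucket="my-bucket", Key="libs/libfoo.so.1")["Body"].read()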
Even documents in cloud services are stored this way. Google Docs and Office 365 are presumably just storing their documents as a group of objects. A Word document is a zip file containing XML files and other assets; in a cloud environment, that can be exploded into separate objects for faster updating.
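You can poke at that structure yourself with Python's zipfile module (the filename here is hypothetical):

    import zipfile

    # A .docx is just a zip archive of XML parts and other assets.
    with zipfile.ZipFile("report.docx") as doc:
        for name in doc.namelist():
            print(name)  # e.g. word/document.xml, word/styles.xml, docProps/core.xml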
Filesystems don't even support atomic transactions, other than file renames. Windows NTFS used to offer them (Transactional NTFS), but it was buggy and is now deprecated. The performance characteristics of filesystems are also really unpredictable (when is your flash storage going to do a garbage collection?), and they don't offer much for performance-sensitive work short of escaping the filesystem altogether. Relational databases just preallocate large files and then use filesystem-specific operations to avoid a bunch of churn in the write-ahead logs and such, since they have their own durability mechanisms.
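To be concrete, the rename trick is the one atomicity primitive you actually get. A sketch of the usual write-temp-then-rename pattern (the temp-file naming scheme is my own):

    import os

    def atomic_write(path: str, data: bytes) -> None:
        tmp = path + ".tmp"  # hypothetical temp-name scheme
        with open(tmp, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the data to stable storage first
        # POSIX rename within one filesystem is atomic: readers see the old
        # file or the new one, never a torn write. (A fully durable version
        # would also fsync the containing directory afterwards.)
        os.rename(tmp, path)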
I think a virtual-memory-like addressing system could also work for persistent storage. You keep all the bookkeeping that filesystems do to track free space, but drop the variable-size files with extents and permissions and all those shenanigans.
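That's roughly what mmap already gives you for a single file, promoted to the whole store. A rough sketch of the idea (the file name and sizes are arbitrary):

    import mmap
    import os

    # Preallocate a flat 1 MiB "address space" backed by a file.
    fd = os.open("store.bin", os.O_RDWR | os.O_CREAT, 0o600)
    os.ftruncate(fd, 1 << 20)
    buf = mmap.mmap(fd, 1 << 20)

    buf[4096:4100] = b"data"   # write at an offset, like a pointer store
    print(buf[4096:4100])      # read it back
    buf.flush()                # flush dirty pages to the backing file
    buf.close()
    os.close(fd)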
I would not really hold up S3-style object storage as particularly usable. Discoverability is arse, name collisions are easy, listing contents is extremely inefficient, and updating access control is a fucking nightmare.
I don't think the demands of distributed computing should necessarily inform design decisions for single-user or mainframe computing. I agree that performance is unpredictable, but that's why you don't rely on hammering the disk for performance; the underlying hardware is always going to be a bit unpredictable anyway. And if you want stuff like always having atomic transactions, you are going to be doing a lot of copying, which wears out hardware and is just unnecessary for a lot of applications.
I’m not saying they’re always good, just that saying they’re (always) a bad abstraction is wrong.