It's hard to shrink the Master File Table (MFT) on Windows/NTFS. Each file or di...

lisper · on Aug 16, 2021

The real problem here is that the unix file system is kinda sorta like a database but not really. So people try to use it like a database and it kinda sorta works, but not really.

Someone ought to write a clean-sheet OS with an embedded copy of SQLite built in to the kernel. That would kick some serious tushy.

smallstepforman · on Aug 16, 2021

BeOS has the Be File System, a light database on top of a filesystem, and you can query it to your heart's content.

https://web.archive.org/web/20170213221835/http://www.nobius...

lmm · on Aug 16, 2021

You can deploy your application as a unikernel and if you don't need a filesystem then you don't have to include one. I really think that's the future.

xxpor · on Aug 16, 2021

That didn't go very well for MS back in the mid-2000s.

lisper · on Aug 16, 2021

Huh? What are you referring to?

croes · on Aug 16, 2021

Probably WinFS

https://en.m.wikipedia.org/wiki/WinFS

lisper · on Aug 16, 2021

OK, yeah, that's not quite the same as what I'm suggesting. WinFS was intended to be used at the application level. I'm talking about using SQLite (or something like that) to store filesystem metadata, more like the resource fork in the original MacOS, except that the resource fork was per-file and what I'm suggesting here would use the embedded DB to store directories (in addition to per-file metadata). The schemas would be part of the OS design. Applications would not be able to modify them or add new ones.
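lisper's idea can be made concrete with Python's built-in `sqlite3` module. This is a minimal user-space sketch, not a kernel design: the `entries` table, its columns and the `make_fs`/`create`/`resolve`/`listdir` helpers are all hypothetical, chosen only to show directories stored as rows under a fixed schema that applications don't get to change.

```python
import sqlite3
from typing import List, Optional

# Hypothetical schema: every file or directory is one row, and a directory's
# contents are simply the rows whose parent_id points at it.
SCHEMA = """
CREATE TABLE entries (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES entries(id),
    name      TEXT NOT NULL,
    is_dir    INTEGER NOT NULL,
    size      INTEGER NOT NULL DEFAULT 0,
    UNIQUE (parent_id, name)  -- doubles as the per-directory name index
);
"""

ROOT_ID = 1


def make_fs() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # The root directory is the only entry without a parent.
    conn.execute(
        "INSERT INTO entries (id, parent_id, name, is_dir) VALUES (?, NULL, '', 1)",
        (ROOT_ID,),
    )
    return conn


def create(conn: sqlite3.Connection, parent_id: int, name: str,
           is_dir: bool, size: int = 0) -> int:
    # A duplicate name in the same directory raises sqlite3.IntegrityError.
    cur = conn.execute(
        "INSERT INTO entries (parent_id, name, is_dir, size) VALUES (?, ?, ?, ?)",
        (parent_id, name, int(is_dir), size),
    )
    return cur.lastrowid


def resolve(conn: sqlite3.Connection, path: str) -> Optional[int]:
    # Walk a '/'-separated path from the root: one indexed lookup per component.
    current = ROOT_ID
    for part in filter(None, path.split("/")):
        row = conn.execute(
            "SELECT id FROM entries WHERE parent_id = ? AND name = ?",
            (current, part),
        ).fetchone()
        if row is None:
            return None
        current = row[0]
    return current


def listdir(conn: sqlite3.Connection, dir_id: int) -> List[str]:
    rows = conn.execute(
        "SELECT name FROM entries WHERE parent_id = ? ORDER BY name", (dir_id,)
    )
    return [name for (name,) in rows]
```

For example, after `home = create(conn, ROOT_ID, "home", True)` and `create(conn, home, "notes.txt", False, 42)`, `resolve(conn, "/home/notes.txt")` returns the new row's id and `listdir(conn, home)` returns `['notes.txt']`. A BeFS-style query (say, every file over some size) would then just be another `SELECT` against the same table.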

Spooky23 · on Aug 16, 2021

There are other considerations with Windows as well: if you’re using SMB, large file counts in a directory will cause performance issues on SMB shares.

magicalhippo · on Aug 16, 2021

Maybe you're referring to something else, but what I noticed was due to case sensitivity.

Turning off case sensitivity led to an orders-of-magnitude difference in directory performance, and since most applications just use the filenames they get from the system, there are very few problems in practice.

My main machines are all Windows and I've been running my NAS with case sensitivity off for almost a decade now, and only a few times did I have to manually rename some file through the NAS (two files with the same name but different case). I use my NAS actively for a lot of things, including sharing files across my machines.

southerntofu · on Aug 16, 2021

> Turning off case sensitivity led to an orders-of-magnitude difference in directory performance

What filesystem are you using? I assume case-insensitivity means your filesystem does not support UTF-8 filenames. Is that the case?

zaarn · on Aug 16, 2021

No, case insensitive just means that the filesystem considers uppercase letters and lowercase letters to be the same. You have to be Unicode (or character set) aware for that.
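A small Python illustration of why the filesystem has to be character-set aware: folding only A-Z misses accented and special letters, while Unicode case folding (`str.casefold`) catches them. The helper names are made up for the example, and real filesystems use their own folding rules (NTFS, for instance, keeps an uppercase table on the volume itself), so treat this as an illustration rather than what any particular filesystem does.

```python
def ascii_fold(name: str) -> str:
    # Only maps A-Z to a-z; every other character is left as-is.
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in name)


def same_name_ascii(a: str, b: str) -> bool:
    return ascii_fold(a) == ascii_fold(b)


def same_name_unicode(a: str, b: str) -> bool:
    # str.casefold() applies Unicode full case folding, e.g. "ß" -> "ss".
    return a.casefold() == b.casefold()


for a, b in [("README.TXT", "readme.txt"), ("ÉTÉ.txt", "été.txt"), ("STRASSE", "straße")]:
    print(f"{a!r} vs {b!r}: ascii={same_name_ascii(a, b)} unicode={same_name_unicode(a, b)}")
```

Only the first pair matches under ASCII-only folding; all three match under Unicode case folding.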

You can make ZFS case-insensitive, and newer Linux kernels can do the same for some filesystems; neither really cares about UTF-8 to begin with, as long as the filename contains no NUL characters.

On NTFS, Windows only requires the filename to be somewhat valid UCS-2 (i.e., UTF-16 with the safeties off); FAT does the same for ASCII (though nothing stops a kernel from putting UTF-8 in a FAT filename).
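Both halves of that comment fit in a few lines of Python. This is a sketch with hypothetical helper names: the first function states the usual POSIX-side rule (a name is bytes; NUL and '/' are forbidden, and '.' and '..' are reserved), and the second shows what "UTF-16 with the safeties off" means in practice: a lone surrogate that strict UTF-16 decoding rejects. Individual filesystems can add stricter rules (ZFS's `utf8only` property, for example), which this ignores.

```python
from typing import Tuple


def valid_posix_component(name: bytes) -> bool:
    # Typical Linux filesystems and ZFS treat a name as opaque bytes. Apart
    # from the reserved "." and "..", only empty names, NUL and '/' (the path
    # separator) are ruled out; nothing checks that the bytes are valid UTF-8.
    return (len(name) > 0 and b"\x00" not in name and b"/" not in name
            and name not in (b".", b".."))


def decode_ntfs_name(raw: bytes) -> Tuple[str, bool]:
    """Decode little-endian 16-bit code units; report whether they were valid UTF-16."""
    try:
        return raw.decode("utf-16-le"), True
    except UnicodeDecodeError:
        # An unpaired surrogate (0xD800-0xDFFF) is fine as raw UCS-2 units but
        # invalid UTF-16; 'surrogatepass' keeps it instead of rejecting it.
        return raw.decode("utf-16-le", errors="surrogatepass"), False


print(valid_posix_component("café".encode("latin-1")))  # not UTF-8, still a valid name
print(decode_ntfs_name(bytes([0x61, 0x00, 0x00, 0xD8, 0x62, 0x00])))  # 'a', lone high surrogate, 'b'
```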

southerntofu · on Aug 16, 2021

> No, case insensitive just means that the filesystem considers uppercase letters and lowercase letters to be the same. You have to be Unicode (or character set) aware for that.

I assumed in your case that meant ASCII encoding, but I still don't understand how turning off case sensitivity would speed things up. Was that a typo, or am I missing something?

> nothing stops a kernel from putting UTF8 in a FAT filename

Except for interoperability with other systems that may access this filesystem, of course. Thanks for this explanation.

magicalhippo · on Aug 16, 2021

> I still don't understand how turning off case sensitivity would speed things up.

Maybe it's very Samba-specific, but when doing a directory listing on a case-insensitive SMB share, it parses the entire directory before returning the first result.

I forget the exact reason why; maybe it's to resolve conflicts, since "foo.txt" and "FOO.txt" can only have one entry.

Regardless, the result is a massive initial delay. Marking the share as case-sensitive skips all this. Worked wonders for me anyway.

Also, when creating "Foo.txt" on a case-insensitive share, it has to check that it doesn't collide with any existing file with different case, etc.
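A toy model of the cost being described. This is not Samba's actual code: the `Directory` class is hypothetical and stands in for a case-insensitive share backed by a case-sensitive filesystem. When the exact name misses, `find_scan` has to compare against every entry, and `create` has to rule out case-only collisions; keeping a casefolded index (`find_indexed`) turns both into a single lookup. Marking the share case-sensitive (in smb.conf, presumably the `case sensitive` share parameter) sidesteps the problem by only allowing exact-case matches.

```python
from typing import Dict, Optional, Set


class Directory:
    """Hypothetical directory on a case-sensitive store, served case-insensitively."""

    def __init__(self) -> None:
        self.names: Set[str] = set()       # exact-case names, as stored on disk
        self.folded: Dict[str, str] = {}   # casefolded name -> stored name

    def find_scan(self, wanted: str) -> Optional[str]:
        # Exact hit is cheap; a miss means comparing against every entry.
        if wanted in self.names:
            return wanted
        key = wanted.casefold()
        for name in self.names:
            if name.casefold() == key:
                return name
        return None

    def find_indexed(self, wanted: str) -> Optional[str]:
        # Same answer as find_scan, but one dictionary lookup.
        return self.folded.get(wanted.casefold())

    def create(self, name: str) -> bool:
        # Case-insensitive semantics: "Foo.txt" collides with "foo.txt".
        if self.find_indexed(name) is not None:
            return False
        self.names.add(name)
        self.folded[name.casefold()] = name
        return True


d = Directory()
d.create("foo.txt")
print(d.create("Foo.txt"))        # refused: differs only in case
print(d.find_scan("FOO.TXT"))     # found by scanning
print(d.find_indexed("FOO.TXT"))  # found by one lookup
```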

southerntofu · on Aug 17, 2021

That's also my understanding, but your initial comment suggested that turning off case-sensitivity sped things up. Sorry for the misunderstanding!