It's going to take me a bit to generate several million files, but so far I've got a single directory with 550k files in it; it takes 30s to ls it on a very busy system running FreeBSD.
1.1M files -> 120 seconds
1.8M files -> 270 seconds (this could be related to system load being over 90 heh)
Which filesystem you use will also make a big difference here. You could imagine some filesystem that uses the getdirentries(2) binary format for dirents, and that could literally memcpy cached directory pages for a syscall. In FreeBSD, UFS gets somewhat close, but 'struct direct' differs from the ABI 'struct dirent'. And the FS attempts to validate the disk format, too.
FWIW, FreeBSD uses 4kB (x86 system page size) where glibc uses 32kB in this article[1]. To the extent libc is actually the problem (I'm not confident of that yet based on the article), this will be worse than glibc's larger buffer.
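The buffer size is observable from userspace, by the way: each getdirentries/getdents call returns at most one buffer's worth of entries, so counting syscalls over a big directory tells you how large libc's buffer is. A Linux sketch (on FreeBSD the analogues are truss and getdirentries; the directory path is just an example):

```shell
# Count the getdents64 calls ls makes on a directory. Fewer calls
# means a bigger libc buffer: glibc's 32kB buffer needs roughly 8x
# fewer trips into the kernel than a 4kB buffer would.
strace -e trace=getdents64 ls -f /usr/bin 2>&1 >/dev/null | grep -c getdents64
```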
The command I quoted works verbatim on one of my Ubuntu systems. It's ~60X faster than e.g. "for i in $(seq 1 8000000); do touch $i; done" because it creates many files per fork+exec, and fork+exec is a much heavier operation than creating an empty file.
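The quoted command isn't shown upthread; from the description it's presumably a pipeline along these lines (the count is illustrative):

```shell
# xargs batches arguments, so one touch(1) invocation creates
# thousands of files and you pay for fork+exec once per batch...
seq 1 8000000 | xargs touch

# ...versus once per file in the loop version quoted above
```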
I'm actually not sure why it failed after generating 1.3M files, no error messages or anything, it was weird.
Initially I thought maybe it was like an inode/fd issue or something but no.
ok, unloaded system, 12M files.
Using old SATA 300GB Raptor disks that I had sitting around.
Fairly old E5-2650 CPUs clocked at 1.5GHz because of power usage; this is single-core performance.
"ls" is using 2.5GB of ram, 76 seconds.
"ls -f" is using 2.4GB of ram, 18 seconds.
"ls -mf" uses like 2.4GB of ram, 20 seconds.
For those who say "cache!", no, I pre-warmed the cache and this is the result after that.
There are a few other things that could be related, since the original article was about a VM. The VM is going to be affected by SPECTRE/Meltdown patches, a known performance thief. I've got them enabled on this box but I'll disable them shortly and re-test. Also, my test box has 64GB of RAM and is running FreeBSD 13 with ZFS. I get about 150MB/sec and 1100 IOPS from the spinning rust drives.
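For anyone wanting to repeat the mitigations-on/off comparison: from memory, the relevant FreeBSD knobs are roughly the ones below. Names and accepted values vary by release and CPU, so treat these as pointers rather than a recipe, and read `sysctl -d <name>` on your own box first.

```shell
# Spectre v2 / SSB / MDS mitigations are runtime sysctls
# (check the descriptions before flipping anything):
sysctl -d hw.ibrs_disable
sysctl -d hw.ssb_disable
sysctl -d hw.mds_disable

# Meltdown page-table isolation is a boot-time tunable:
# set vm.pmap.pti=0 in /boot/loader.conf and reboot to turn it off.
```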
It's interesting enough that I'm going to run my own test now.