You can list a directory containing 8M files, but not with ls (be-n.com)
172 points by _wldu on Aug 15, 2021 | 128 comments


If you are going to have a directory with millions of files, there's probably one more interesting thing to consider.

As you might know, ext* and some other filesystems store filenames right in the directory file, which means the more files you have in the directory, the bigger the directory file gets. In the majority of cases nothing unusual happens, because people have maybe a few dozen dirs / files.

However, if you put millions of files in, the directory grows to a few megabytes in size. If you decide to clean up later, you'd probably expect the directory to shrink back. But that never happens, unless you run fsck or re-create the directory.

That's because nobody believes the implementation effort is really worth it. Here's a link to the lkml discussion: https://lkml.org/lkml/2009/5/15/146

PS. Here's a previous discussion of the very same article posted in this submission. It's been 10 years already :) https://news.ycombinator.com/item?id=2888820

upd. Here's a code example:

    $ mkdir niceDir && cd niceDir
    # this might take a few moments
    $ for ((i=1;i<133700;i++)); do touch long_long_looong_man_sakeru_$i ; done
    $ ls -lhd .
    drwxr-xr-x 2 user user 8.1M Aug 2 13:37 .
    $ find . -type f -delete
    $ ls -l
    total 0
    $ ls -lhd .
    drwxr-xr-x 2 user user 8.1M Aug 2 13:37 .


Interesting - a while ago I had a runaway script create files in a directory until the system ran out of inodes. I thought I had cleaned up after myself, and while that directory only has a hundred or so files in it now, the directory itself is still 154 megs. Wild.

Fixing it by just creating a new dir and copying the files over is, thankfully, relatively straight-forward in my case :)


Generational garbage collection for the win!


Sounds more like a plain old stop-and-copy/two space GC... Generational would be if they built a chain of directories to represent one (like with an overlay filesystem of some kind) and then periodically GC'd those at different rates, moving the persistent entries into less and less frequently scanned dirs.


I haven't done any low-level Unix filesystem stuff since the System V days. My recollection is that back then directories were just files that contained name to inode mappings. All the magic that made them directories was in how the system interpreted the data in them and in the access rules the system imposed on them.

Allocating space for a directory was the same as allocating space for any other file, and the same for shrinking a directory.

It sounds like the ext family of Linux filesystems didn't work that way, with directory storage managed in some manner completely separate from file storage. If that is the case I'm curious why.


IIRC it does work that way, but deleting a file just zeros the entry for the file, so the directory file never shrinks


This issue is what makes me paranoid when putting a lot of files into a directory. Having directories whose sizes cannot be shrunk makes me really uncomfortable, so I try to avoid such a situation as much as possible. What's unfortunate is that you cannot predict at what point a directory will grow above its initial size, because the size of a directory is affected by not only the number of files, but also the varying lengths of their filenames. It's a complete mess.

I wondered if NTFS on Windows is also not capable of shrinking already-grown directories but could not find information about it. As Windows doesn't report the size of a directory itself, it is hard to test the situation.


It's hard to shrink the Master File Table (MFT) on Windows/NTFS. Each file or directory consumes 1 KiB in the MFT. So this is similar to the ext* directory-shrinking problem, but at the level of the entire volume instead.


The real problem here is that the unix file system is kinda sorta like a database but not really. So people try to use it like a database and it kinda sorta works, but not really.

Someone ought to write a clean-sheet OS with an embedded copy of SQLite built in to the kernel. That would kick some serious tushy.


BeOS has the Be File System, a light database on top of a filesystem, and you can query to your heart's content.

https://web.archive.org/web/20170213221835/http://www.nobius...


You can deploy your application as a unikernel and if you don't need a filesystem then you don't have to include one. I really think that's the future.


That didn't go very well for MS back in the mid 2000s


Huh? What are you referring to?



OK, yeah, that's not quite the same as what I'm suggesting. WinFS was intended to be used at the application level. I'm talking about using SQLite (or something like that) to store filesystem metadata, more like the resource fork in the original MacOS, except that the resource fork was per-file and what I'm suggesting here would use the embedded DB to store directories (in addition to per-file metadata). The schemas would be part of the OS design. Applications would not be able to modify them or add new ones.


There are other considerations with Windows as well - if you’re using SMB, large file counts in a directory will create performance issues with SMB shares.


Maybe you're referring to something else, but what I noticed was due to case sensitivity.

Turning off case sensitivity led to an orders-of-magnitude difference in directory performance, and since most applications just use what they get from the system for filenames, there are very few problems in practice.

My main machines are all Windows and I've been running my NAS with case sensitivity off for almost a decade now, and only a few times did I have to manually rename some file through the NAS (two files with same name but different case). I use my NAS actively for a lot of things, including sharing files across my machines.


> Turning off case sensitivity led to an orders-of-magnitude difference in directory performance

What filesystem are you using? I assume case-insensitivity means your filesystem does not support UTF-8 filenames. Is that the case?


No, case insensitive just means that the filesystem considers uppercase letters and lowercase letters to be the same. You have to be Unicode (or character set) aware for that.

You can set a filesystem to be case insensitive on ZFS and in newer Linux kernels, and neither really cares about UTF-8 to begin with as long as the filename contains no NUL characters.

Windows only requires the filename to be somewhat valid UCS-2 (i.e., UTF-16 with the safeties off) on NTFS; FAT does the same for ASCII (though nothing stops a kernel from putting UTF-8 in a FAT filename).


> No, case insensitive just means that the filesystem considers uppercase letters and lowercase letters to be the same. You have to be Unicode (or character set) aware for that.

I assumed in your case that meant ASCII encoding, but i still don't understand how turning off case sensitivity would speed up things. Was that a typo, or am i missing something?

> nothing stops a kernel from putting UTF8 in a FAT filename

Except interoperability with other systems who may access this filesystem of course. Thanks for this explanation.


> i still don't understand how turning off case sensitivity would speed up things.

Maybe it's very Samba specific, but when doing a directory listing on a case-insensitive SMB share it parses the entire directory first before returning the first result.

I forgot the exact reason why, maybe to resolve conflicts, say "foo.txt" and "FOO.txt" can only have one entry.

Regardless, the result is a massive initial delay. Marking the share as case-sensitive skips all this. Worked wonders for me anyway.

Also, when creating "Foo.txt" on a case-insensitive share, it has to check that it doesn't collide with any existing file with different case, etc.
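If anyone wants to try it, the relevant knob is Samba's per-share "case sensitive" option; a hypothetical share section would look roughly like this (path and share name made up):

    [share]
        path = /srv/nas
        case sensitive = yes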


That's also my understanding, but your initial comment suggested that turning off case-sensitivity sped things up. Sorry for the misunderstanding!


Mlocate can also be used for this so long as the updatedb index config contains the path of interest. Just run “updatedb” then run “locate /full/path/of/interest > file”


It’s worth noting that the last problem is Linux-specific. In the BSD world, directories are “shrunk” when you create a new file or subdirectory; you don’t need to recreate the directory. (They are not compacted though, so if the last directory entry remains, the directory size won’t shrink.)


> if the last directory entry remains, the directory size won’t shrink

So if you keep appending data the directory will never be shrunk? If so i would say "the BSD world" is also affected by this problem, in practice. I'm assuming it all depends on the specifics of your filesystem, though.


If you keep appending the directory obviously won’t shrink because those entries need to fit somewhere - to shrink you’d need to remove stuff.

Let me rephrase, though. In UFS/FFS, the kernel will try to truncate the directory size every time you add a new entry. However, it will only cut off the free space at the end, it won’t punch a hole inside. This means that if you remove a bunch of files/subdirectories, but leave the one occupying the last (highest numbered) directory entry, that directory won’t shrink. It will shrink after you remove that last one.


Yeah that's what i mean, that for a sort of append-only cache where a garbage collector kicks in to remove old entries (<-- the part i forgot to mention), the directory size will in fact grow forever.

I must say that came as a surprise to me (i've used this pattern more than once, but never with millions of files), so thanks for telling me!


Can you mv niceDir/* newNiceDir/ && rm -fr niceDir to reset the directory size?


I'd probably

1) hardlink all the files from the old directory into a temporary new directory, and

2) rename the new directory to the old directory's name.

I'm no Unix guru but I believe that this should be completely unintrusive to even running processes, or at least less intrusive than your approach. Also failing in the middle seems more graceful.
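A rough sketch of what I mean, assuming GNU userland and made-up directory names (with millions of files you'd use find -exec ln rather than a glob, and note the swap itself is two renames, so it isn't atomic):

    mkdir foobar_tmp
    ln foobar/* foobar_tmp/          # second hard link to every file, no data copied
    mv foobar foobar_old && mv foobar_tmp foobar
    rm -rf foobar_old                # drops the old links and the bloated directory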


> completely unintrusive to even running processes

Unless between the time that you create the hardlinks and the time that you delete the old directory, a process creates new files in the old directory. Unlikely of course, but not impossible.


Yep, I've already thought of it in the several minutes after posting it. If it were possible to make a hardlink to a directory, my solution for a directory with a varying set of files would be:

1) hardlink all the files from the old directory ("foobar") into a temporary new directory ("foobar_tmp1"),

2) create a hardlink ("foobar_tmp2") to the old directory ("foobar"),

3) rename the new directory ("foobar_tmp1") to the old directory's name ("foobar"),

4) patch the "old" directory ("foobar" now linking to the shrunk directory) with the new files in "foobar_tmp2",

5) get rid of "foobar_tmp2".

Still doesn't take care of all possible issues such as a program creating a file in "foobar" between steps 2) and 3) and reopening it very quickly after step 3), but at least it seems to avoid the issue of lost files.

Sadly step 2) seems impossible. I still believe the original version should be transparent for a directory with a fixed set of files, but of course very large directories probably have a higher chance on average of their set of files changing in a short period of time. That may or may not be a problem.

EDIT: Maybe a mandatory lock on the directory around the original step 2) with an additional check whether the directory's file set didn't change would work on Linux, if such a thing works the way I think it does (quite possibly it doesn't).


I assume that if you could create hardlinks to directories, it would be to the same directory file and thereby point to the same garbage you want to get rid of. Don’t do that and you have a symlink.


No, the point is to preserve the original set of files to account for lost file creations after the directory rename. What point would a symlink achieve? The original directory (the i-node, NOT the entry in the parent directory) with the new, unaccounted-for files would be gone and the symlink would point to a directory with only those files that would have been already preserved by the rename. So pointing "to the same old garbage" is exactly the point.


You can't hardlink directories on Linux.


They seem well aware of that :p


You can't rename over a non-empty directory, so you can't do this atomically without using a symlink.


Oh crap, I've never noticed that (or maybe I did many years ago and promptly ignored that because it wasn't what I was looking for at that time). Well, there goes the whole idea. Symlinks are of no use here, or at least I don't see how they help.


If you know in advance that you'll need to switch a directory atomically, you can make `dir` a symlink to `dir_v1`. Then you can atomically replace it with a symlink to `dir_v2`.
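Concretely, the trick looks something like this (dir_v1/dir_v2 are made-up names; mv -T is the GNU way to rename(2) onto the existing symlink):

    mkdir dir_v1 && ln -s dir_v1 dir             # "dir" is really just a symlink
    mkdir dir_v2                                 # build the replacement contents here
    ln -s dir_v2 dir.tmp && mv -T dir.tmp dir    # rename(2) atomically swaps the symlink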

This doesn't really solve the situation we're discussing though.


Wouldn't you need a step 1.5: rm -rf <old directory>


Well, apparently I would, which mildly sucks because it takes a lot of time, which is exactly what I wanted to avoid with the ole' atomic rename technique. Worst case, at least one could possibly rename the old directory to a temporary name and then immediately rename the new directory with the new file links to the old name and hope for the best. Sadly the atomic technique doesn't seem to work here. I never noticed that I can't do this with a directory. Well, shit.


On top of what others said, creating a copy of an existing directory isn’t trivial. You have to copy

  - owner and group,
  - rwx flags,
  - OS-specific flags such as the FreeBSD immutable ones,
  - ACLs,
  - extended file attributes
  - possibly stuff I forgot
Some of these you should set before copying in the files for security reasons, some of these you can only set after doing that.

If the directory isn’t writable, for example, you can’t simply create it writable, copy in the files, and make the directory read only, as that opens a hole where others can write files to the directory.

I think the only safe way is to create a directory that’s only readable and writable by you, copy in the files, then set the attributes right.
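For the easy parts, something like this works (GNU/Linux tools assumed; "old" and "new" are placeholder names), but it still doesn't cover the OS-specific flags:

    mkdir -m 700 new                             # start out private to the creator
    # ... hardlink or copy the files in here ...
    chown --reference=old new                    # owner and group
    chmod --reference=old new                    # permission bits
    getfacl old | setfacl --set-file=- new       # POSIX ACLs
    # extended attributes still need getfattr/setfattr (or cp/rsync) on top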


Is this an issue on ZFS?


Way way back in 2005, before ZFS shipped, I actually checked this. Solaris' ls(1) with the `-f` option was able to list a directory with millions of files very quickly. But before I tried the `-f` flag I tried plain `ls`, and that failed miserably because a) ls(1) wants to sort the listing, which means reading the whole directory into memory then sorting it, b) I was using en_US.UTF-8 as my locale, and sorting in UTF-8 locales was way slower than sorting in the C locale.

Any time you're dealing with such massive directories, you really must use the `-f` option to ls(1).


This is a practical issue that can cause real problems. I was looking after an AWS-hosted web application where an unimportant plugin was stuck in a crash-restart loop for weeks. It had created something like tens of millions of tiny log files in the same directory over that period.

It turns out that cleaning this up is decidedly non-trivial. Worse still, this was "managed" cloud storage with strict IOPS limits. The "one op per file delete" model seems logical until you're staring down the barrel of multi-week completion times.

I tried various things, but in the end the only practical solution was to copy the valid data across to a new volume and drop the old one. There was almost no reasonable way to delete that many files. As in this article, even the most basic command line tools like "ls" were useless!

You know you're in a bad spot when you're using "kill" on "ls" multiple times per hour...


> Worse still, this was "managed" cloud storage with strict IOPS limits.

It almost sounds like the cloud was someone else's computer all along, and we don't have any control over it. Thanks for sharing this anecdote!


Meta, if the author is around: there seems to be some kind of encoding problem, on my system (Linux, Firefox) I see a lot of strange characters where there should probably be punctuation.

The first section header reads "Why doesn’t ls work?", for instance.


This is because the page has no doctype, thus putting the browser in "quirks mode", defaulting to a charset of ISO-8859-1 (as the page does not specify one). The author can fix this either by specifying the charset, or adding the HTML5 doctype (HTML5 defaults to UTF-8).


> HTML5 defaults to UTF-8

I’m not sure this is correct, though the WHATWG docs[1] are kind of confusing. From what I can tell, it seems like HTML5 documents are required to be UTF-8, but also this is required to be explicitly declared either in the Content-Type header, a leading BOM, or a <meta> tag in the first 1024 bytes of the file. Reading this blog post[2] it sounds like there is a danger that if you don’t do this then heuristics will kick in and try to guess the charset instead; the documented algorithm for this doesn't seem to consider the doctype at all.

[1]: https://html.spec.whatwg.org/dev/semantics.html#charset

[2]: https://blog.whatwg.org/the-road-to-html-5-character-encodin...


Oddly if you select "automatic" then Firefox correctly detects it as UTF-8. I would have expected quirks mode to use automatic rather than ISO-8859-1.


The single quote in "doesn't" is an ASCII character though, why does that one become ’?


Here's the heuristic-based hypothesis from the Python package ftfy:

    >>> import ftfy
    >>> ftfy.fix_and_explain("’")
    ExplainedText(
        text="'",
        explanation=[
            ('encode', 'sloppy-windows-1252'),
            ('decode', 'utf-8'),
            ('apply', 'uncurl_quotes')
        ]
    )


Note that uncurl_quotes is a FTFY fix unrelated to character encoding, it’s basically just s/’/'/. (FTFY turns all of its fixes on by default, which sometimes results in it doing more than you might want it to.)

You can play around with FTFY here (open the “Decoding steps” to see the explanation of what it did and why): https://www.linestarve.com/tools/mojibake/?mojibake=’


Thanks!


It's not. It's a Unicode 'RIGHT SINGLE QUOTATION MARK' (U+2019), which in UTF-8 is encoded as 0xe2 0x80 0x99.

0xe2 is â in iso8859-1. 0x80 is not in iso8859-1, but is € in windows-1252. 0x99 is not in iso8859-1, but is ™ in windows-1252.

So, the browser here appears to be defaulting to windows-1252.
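You can reproduce the mangling at the shell (assuming a UTF-8 terminal, bash's printf \u escape, and GNU iconv):

    $ printf '\u2019' | od -An -tx1              # U+2019 encoded as UTF-8
     e2 80 99
    $ printf '\u2019' | iconv -f CP1252 -t UTF-8
    â€™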


Maybe browsers should default to UTF-8 already. It's 2021.


Why? Defaulting to UTF-8 for modern HTML, and to ISO-8859-1 for legacy pages, makes a lot of sense.

Pages that haven't been adapted to HTML 5 in the last 10 years or so are exceedingly unlikely to do so in year 11.


ISO-8859-1 is a subset of UTF-8 isn't it? No harm done by defaulting to the superset.


No. ASCII is a subset of UTF-8, ISO-8859-1 is not. The first 256 codepoints of Unicode match ISO-8859-1, which is probably the source of your confusion. However, codepoints 128-255 are encoded differently in UTF-8: they are represented by a single byte when encoded as ISO-8859-1, while they turn into two bytes when encoded in UTF-8.

Plus "ISO-8859-1" is treated as Windows-1252 by browsers, while Unicode uses ISO-8859-1 extended with the ISO 6429 control characters for its initial 256 codepoints.


Ah I see, thanks.


If it were, the characters in question would already display correctly for this website, since they are within ISO-8859-1. ASCII is a subset of UTF-8.


We need to handle a lot of crappy data-in-text-files at work, and for most of them using the UTF-8 duck test seems to be the most reliable.

If it decodes successfully as UTF-8 it's probably UTF-8.


That requires scanning the whole file before guessing the encoding, which browsers don’t do for performance reasons (and also because an HTML document may never end, it’s perfectly valid for the server to keep appending to the document indefinitely). The HTML5 spec does recommend doing this on the first 1024 bytes, though.


Browsers are quite happy on re-rendering the whole document multiple times though, so it could just switch and re-decode when UTF-8 fails. Sure it wouldn't be the fast path, but sure beats looking at hieroglyphs.

And yeah, add some sensible limits to this logic of course. Most web pages aren't never-ending nor multi-GB of text.


The “sensible limit” is 1024. That’s what the standard is saying.


Not really. I can write a lot of Norwegian text, especially with HTML overhead of scripts, inline CSS and whatnot, before needing a non-ASCII Norwegian character.

Also the only reference I find to this 1024 limit is for the charset meta element[1] in HTML5, which also seems highly redundant given that the very same page states UTF-8 must be used regardless.

Limits are good, but 1k is way too low.

[1]: https://html.spec.whatwg.org/multipage/semantics.html#charse...


https://hsivonen.fi/utf-8-detection/ explains why Firefox doesn’t.


Use your browser to override the encoding. For example in Firefox choose "View > Repair Text Encoding" from the menu or in Safari choose "View > Text Encoding > Unicode (UTF-8)" from the menu. Many browsers still default to Latin 1, but this page is using UTF-8.

(This used to happen a lot ~15 years ago. Did the dominance of UTF-8 make people forget about these encoding issues?)


Chrome removed this setting altogether, I have an extension installed to set the decode charset from a menu.


Same here


I once had a directory on OpenZFS with more than a billion files, and after cleaning it up with only a handful of folders remaining, running ls in it still took a few seconds. I guess some large but almost empty tree structure remained.

https://0kalmi.blogspot.com/2020/02/quick-moving-of-billion-...


tl;dr: try "ls -1 -f". It's fast.

This doesn't pass my smell test:

> Putting two and two together I could see that the reason it was taking forever to list the directory was because ls was reading the directory entries file 32K at a time, and the file was 513M. So it would take around 16416 system calls of getdents() to list the directory. That is a lot of calls, especially on a slow virtualized disk.

16,416 system calls is a little inefficient but not that noticeable in human terms. And the author is talking as if each one waits 10 ms for a disk head to move to the correct position. That's not true. The OS and drive both do readahead, and they're both quite effective. I recently tried to improve performance of a long-running sequential read on an otherwise-idle old-fashioned spinning disk by tuning the former ("sudo blockdev --setra 6144 /path/to/device"). I found it made no real difference: "iostat" showed OS-level readahead reduces the number of block operations (as expected) but also that total latency doesn't decrease. It turns out in this scenario the disk's cache is full of the upcoming bytes so those extra operations are super fast anyway.

The real reason "ls" takes a while to print stuff is that by default it will buffer everything before printing anything so that it can sort it and (when stdout is a terminal) place it into appropriately-sized columns. It also (depending on the options you are using) will stat every file, which obviously will dwarf the number of getdents calls and access the inodes (which are more scattered across the filesystem).

"ls -1 -f" disables both those behaviors. It's reasonably fast without changing the buffer size.

    moonfire-nvr@nuc:/media/14tb/sample$ time ls -1f | wc -l
    1042303

    real    0m0.934s
    user    0m0.403s
    sys     0m0.563s
That's on Linux with ext4.
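If you want to check where the time goes on your own system, counting syscalls is one crude way (assuming strace is installed; syscall names are for x86-64 Linux):

    $ strace -c -e trace=getdents64,statx,newfstatat ls --color=always > /dev/null
    $ strace -c -e trace=getdents64,statx,newfstatat ls -1 -f > /dev/null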


Agree re smell test. Those directory blocks are cached, even in front of a slow virtualized disk, and most of those syscalls are hitting in cache. Author is likely running into (1) stat calls and (2) buffer and sort behavior, exactly as you describe.


Interesting, I tried it myself on a test VM:

    ~/test$ time for I in `seq -w 1 1000000`; do touch $I; done

    real    27m8.663s
    user    14m15.410s
    sys     12m24.411s

OK

    ~/test$ time ls -1f | wc -l
    1000002

    real    0m0.604s
    user    0m0.180s
    sys     0m0.422s

    ~/test$ time ls -f | wc -l
    1000002

    real    0m0.574s

    ~/test$ time perl -E 'opendir(my $d,".");say while readdir $d' |wc -l
    1000002

    real    0m0.597s

All seems reasonable. Directory size alone is 23M, somewhat larger than the typical 4096 bytes.


Having worked in the past with some pathological codebase that made a horribly huge directory of source, I can confirm that this is unrelated to the filesystem - otherwise opening a file would also be slowed down - and I have found exactly the same behaviour by disabling sorting and other output niceties.


Did you try ls -1? In the far past I had the same problem listing millions of files in a dir. Edit: if I remember correctly ls buffers the results for sorting; with -1 it just dumps the values.


It’s not “ls -1” but “--sort=none” or “-U”.


"ls -f" in POSIX ls (which GNU ls also implements). Also, avoid "-F", which will stat each file.


`ls | more` works too, right?


At least with GNU ls, 'ls | more' does not disable sorting. It disables automatic color (which is important -- coloring requires 'stat(2)'ing every file in the directory).


No - it still does all of the work to sort the entries, which is the slow part since it prevents the first entry from being displayed until the last has been retrieved.


I think it also tries to neatly format into columns and this requires it to know name lengths for all files. If you do -1 it basically outputs one file per line and can be done more efficiently.


  find . 
would be another alternative


Makes me think that findfirst and findnext were not that bad after all.


I know the article is for Linux...

I recently was involved with a Windows filesystem driver. Windows lowest-level usermode APIs support iterating through directory contents for infinitely sized directories. (You just need to make sure whatever wrapper your language provides doesn't wrap the result in a pre-allocated array.)

For example, in C#, the System.IO.Directory static methods that return IEnumerables will let you iterate over millions of files without issue, at least on Windows. They probably will do the same on Linux and Mac.


Isn’t that comparing apples with oranges? Low-level APIs on both OSes can deal with very large directories — the question is whether the high-level APIs can.

(For that matter, streaming isn’t a panacea if your goal is, as the article states, to determine the number of files in the directory.)


Yes, the high-level APIs can. As I mentioned, "For example, in C#, the System.IO.Directory static methods that return IEnumerables will let you iterate over millions of files without issue, at least on Windows."

In C#, at that point counting the number of files in a giant directory is trivial. (I was wrong about the method being static) Basically, "new DirectoryInfo(path).EnumerateFiles().Count()" is all you need. You're going to hit the most efficient APIs on Windows, so all you're limited by is hardware speed. (It's also worth checking on Linux.)


This happened to me a couple months ago; I screwed up a logrotate entry on some of our servers (thankfully, mostly in the dev dc) and ended up with log.txt.0, log.txt.0.0, log.txt.0.0.0.0, &c &c. If your CWD was the directory with those files, anything you did that called readdir just hung. I ended up having to write a tiny program that looped getdents, and was surprised at how poorly documented/integrated it was with the rest of the API.


If you haven't prepared for this eventuality then odds are you're going to run out of inodes first. And it's probably not useful to just dump all those filenames to your terminal. And don't even say you were piping the output of `ls` to something else!

Anyway the coreutils shouldn't have arbitrary limits like this, at least if they do then the limits should be so high that you have to be really trying hard in order to reach them.


There isn't actually an arbitrary limit, it's just that glibc's readdir() implementation is really, really slow with millions of files, according to the article. Presumably if you waited a while, `ls` would eventually get the whole list.


It's not readdir() that is slow - it's the way ls needs to load the whole directory into memory and then do a bunch of things on it (sorting, calculating column sizes, etc.), and depending on what you have in your defaults or passed options it might also try to stat() every one of those files first.


The glibc functions are just bad wrappers for the real system calls which no doubt work much more efficiently. I fully expected to find the system call solution in the article and was not disappointed.


I remember crazy bug that happened in our system because of lots of files in one directory.

We had a special directory for temporary files produced when generating labels for barcode printer, and as they were very small (and always caused problems) - we never removed them so we can check out later if something went wrong.

After a few years the customer complained that some barcodes didn't print. The error message was "cannot create a file XYZ" or something similar.

I logged in, checked the disk usage (plenty of space), checked the privileges (correct), did a "touch random_file_name" in that directory - worked fine, copied a few files into this directory - still worked fine. Then I did "touch XYZ" and it failed :) But the file XYZ didn't already exist in that directory.

I don't remember the exact cause (I think it was a problem with the index growing too big, and depending on the filename it was assigned to different levels, or something similar). Or maybe it was a hash collision in the filesystem? I don't even remember what filesystem it was - probably reiserfs or ext3.

The solution was to remove files after printing them, obviously.


This is good to know. The other week, I ended up breaking a flat directory approaching that number of files into a hierarchy of subdirectories.

Fortunately, the filenames were timestamp-based (down to the nanosecond), so it could be solved in one line of `sed`, and the hierarchy made tons of sense.

Normally, especially knowing older filesystem limits, I would've done such a filesystem hierarchy from the start (even if something wasn't as inherently hierarchical as timestamp strings, like if it used hash strings).

But this wealth of files (originally a secondary QA/debugging check, now a potentially valuable training data set) was due to the unexpected success of an MVP launch, and it held up perfectly in production. I only cleaned it up when saving that data offline the other week, so it would be convenient and correct to use with tools that might not scale as well as that Linux filesystem and GNU tools did.
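For anyone facing the same cleanup, a sketch of the kind of one-liner involved (filenames here are hypothetical, e.g. 20210815T123456.123456789.json, sharded into YYYY/MM/DD/):

    for f in 20*; do
      d=$(echo "$f" | sed -E 's|^(....)(..)(..).*|\1/\2/\3|')
      mkdir -p "$d" && mv "$f" "$d/"
    done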


Seems the author didn't read the man page. ls -1f, as others have pointed out, is a much better solution.

Additionally, having 8 million anything in a single directory screams bad planning. It's common for some hashing of the directory structure to be planned.


I'm sure I am just showing my ignorance, but in the man page it says:

"-1 list one file per line." and "-f do not sort, enable -aU, disable -ls --color"

It's not obvious to me that this would bypass the buffer and fix the problem of being able to list a large number of files. Just that I would get the files unsorted, one line at a time.


Because the buffer is not the problem.

ls is slow because it will first load everything into RAM, then do a bunch of operations (and it's not really optimized for those, it seems, at least not for speed), and if you have certain options enabled it will also need to stat() each file first.

Been there, done that, learnt not to run ls without -1f in a directory that would result in a half-hour run.


In the linked article it seems his conclusion was that the issue was the small buffer size ls uses. But it's not the buffer itself so much as the things done to the data in the buffer that slow things down?


The transfer buffer is not a problem (if you get a slowdown in actually reading the directory, the reason is probably that the amount of data is huge and you have slow I/O).

The problem is when ls then tries to do all kinds of things after reading that data in order to sort, columnize, colorize, etc.

For example, just figuring out whether you should add / after a filename to indicate a directory can, in the worst case, require a stat() call on every returned file - because there's no guarantee that the file type will be filled in in the response from readdir()/getdents(). Sorting and displaying in columns is particularly slow. All sorts of things that are nice to have, cheap quality-of-life benefits, break down and become huge slowdowns with such insane directories.


As a programmer you should know that sorting generally requires everything to be buffered and therefore affects performance / stops streaming from working. Agreed that the stuff about knowing how many columns etc. is not obvious though.


I'm not a programmer ;)


https://github.com/hpc/mpifileutils handles this pretty well -- with SYS_getdents64. It has a few other tricks in there in addition to this one.


Just ran into this. I had the video app motion running for a few weeks and had accumulated several million small jpegs, and wasn't paying attention to the updater, which had stopped pushing images. Basically my webserver was timing out because of how I was searching for new files: there were too many files and I was using a bash script in a cron job that used ls. I found a workaround, but it is cool to see this discussion.

Kinda reminds me of MS Excel when it first started defaulting to 1048576 rows. You could ctrl-downArrow to the bottom row, but if you ever tried to fill 1M rows it would hang.


perl -E 'opendir(my $d,".");say while readdir $d'


Found the author's project history here:

http://be-n.com/portfolio/


I ran into this issue a couple years back, and while googling found this stackoverflow answer which I thought was pretty neat.

https://unix.stackexchange.com/questions/120077/the-ls-comma...


Interesting point, this does appear to be Linux- and situation-specific, though.

It's interesting enough that I'm going to run my own test now.


It's going to take me a bit to generate several million files, but so far I've got a single directory with 550k files in it; it takes 30s to ls it on a very busy system running FreeBSD.

1.1M files -> 120 seconds

1.8M files -> 270 seconds (this could be related to system load being over 90 heh)


Try "ls -f" (don't sort)?

Which filesystem you use will also make a big difference here. You could imagine some filesystem that uses the getdirentries(2) binary format for dirents, and that could literally memcpy cached directory pages for a syscall. In FreeBSD, UFS gets somewhat close, but 'struct direct' differs from the ABI 'struct dirent'. And the FS attempts to validate the disk format, too.

FWIW, FreeBSD uses 4kB (x86 system page size) where glibc uses 32kB in this article[1]. To the extent libc is actually the problem (I'm not confident of that yet based on the article), this will be worse than glibc's larger buffer.

[1]: https://github.com/freebsd/freebsd-src/blob/main/lib/libc/ge...


with "ls -f" on 1.9M files its 45 seconds, much better than regular ls (and system load of 94)

2.25M and its 60 seconds

I'm also spamming about 16-18 thousand new files per second to disk using a very inefficient set of csh scripts...


A more efficient one-liner:

    seq 1 8000000 | xargs touch


Interestingly, that didn't work.

Strangely "jot -w file 8000000 1 | xargs touch" worked.

I'm going to try to replicate this because I sense some kind of tomfoolery.


The command I quoted works verbatim on one of my Ubuntu systems. It's ~60X faster than eg "for i in $(seq 1 8000000); do touch $i; done" because it creates many files per fork+exec, and fork+exec is a much heavier operation than creating an empty file.


Seq isn’t installed by default in FreeBSD, while jot is. That’s all. I think seq may be part of coreutils and installed as ‘gseq’? Not entirely sure.


seq is in /usr/bin; it's default in FreeBSD as of FreeBSD 9.


I'm actually not sure why it failed after generating 1.3M files, no error messages or anything, it was weird. Initially I thought maybe it was like an inode/fd issue or something but no.


OK, unloaded system, 12M files. Using old SATA 300GB Raptor disks that I had sitting around. Fairly old E5-2650 CPUs clocked at 1.5GHz because of power usage; this is single-core performance.

"ls" is using 2.5GB of ram, 76 seconds.

"ls -f" is using 2.4GB of ram, 18 seconds.

"ls -mf" uses like 2.4GB of ram, 20 seconds.

For those who say "cache!", no, I pre-warmed the cache and this is the result after that.

There are a few other things that could be related since the original article was about a VM. The VM is going to be affected by SPECTRE/Meltdown patches, a known performance thief. I've got them enabled on this box but I'll disable them shortly and re-test. Also my test box has 64GB of ram and is running FreeBSD 13 with ZFS. I get about 150MB/sec and 1100IOPs with the spinning rust drives.

Update: Turned off Spectre/Meltdown patches, "ls -f" takes 17 seconds, "ls -mf" takes 18 seconds, "ls" takes 70 seconds.


At 3,000 files my Windows 7 OS freezes. Not bad for a million.


You may want to disable short name generation on Windows when putting many files in one directory.


Non-Loaded system with 2.8M files -> 13 seconds


ls -f


       -f      do not sort, enable -aU, disable -ls --color
       
       -a      do not ignore entries starting with .
       -U      do not sort; list entries in directory order
       -l      use a long listing format
       -s      print the allocated size of each file, in blocks
       --color colorize the output
I assume you mean to imply that by turning off sorting/filtering/formatting ls will run in a more optimized mode where it can avoid buffering and just dump the dentries as described in the article?


Yeah, exactly. OP is changing 3 variables and concluding that the getdents buffer size was the significant one, but actually the problem was likely (1) stat calls, for --color, and (2) buffer and sort, which adds O(N log N) sorting time to the total run+print time. (Both of which are avoided by using getdents directly.)



Page has mojibake


xargs?


Yeah ...

However, let's just accept that regular people don't know those tricks and we should keep files in subfolders? I have that logic in any app that has the potential to spam a directory. You can still show them as a single folder (sometimes called a branch view) if you like, but every other tool that uses ls will work like a charm (such as your backup shell script).


Then anything working on it needs to recurse.


Only if you have multiple layers of depth. You can easily have a single layer, and then it's a flat loop to list everything, broken up into multiple much faster operations.

I haven't looked into the details, but one system I can imagine is hashing the file names and then placing the file in the folder named with the first x characters of the hash. Then to look a file up you can use the filename to work out exactly which folder it is in, and you never exceed a set limit on files in one folder.

If we say 3000 files is perfectly safe in a single folder, 3000 folders with 3000 files holds 9M files.
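A sketch of that scheme (names and layout are illustrative only): two hex characters of a hash give 256 buckets, and the bucket is fully determined by the filename, so lookups never need to scan.

    name="some_file.dat"
    bucket=$(printf '%s' "$name" | md5sum | cut -c1-2)
    mkdir -p "data/$bucket"
    mv "$name" "data/$bucket/$name"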



