Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I think here is a good argument for not using case-insensitive filesystems - because every single filename comparison gets affected and it can lead to vulnerabilities like this (I wonder what others are out there...) Case-insensitive initially feels like a good idea to some, but I think it's a good example of "trying to do too much" and often in subtle ways that even the user might not fully understand - the definition of "case" changes with locale, for instance. In contrast, with filenames that are treated as dumb and simple, plain sequences of bytes that just cannot contain certain characters (and thus compared accordingly for equality, bit-by-bit), there is no need to even consider the concept of "case", and no ambiguity: It either matches exactly or doesn't match.

(I am aware of all the - quite frankly ridiculous - complexity of Unicode characters that are visually identical and "should be treated as such for the purposes of comparison", but I think that's another example of excess complexity leading to things like directory-traversal attacks.)



> Unicode characters that are visually identical

This was actually a further bug, reported as part of the same CVE - you could also overwrite .git/config by adding any of a number of zero-width Unicode characters that many filesystems ignore when checking for filename equality (but string comparison doesn't, of course).


http://en.m.wikipedia.org/wiki/Unicode_equivalence

http://en.m.wikipedia.org/wiki/IDN_homograph_attack

What seems really scary about this is that even Unicode has several different ways of comparing strings, and the correct one depends on the exact situation, so the common response of "just use a library" doesn't work; for example, if a user were searching for a filename it might make sense for full-width characters to compare equal to half-width ones, but not if opening a file where you wouldn't want e.g. the full width version of /etc/passwd to be equivalent to the half-width one.


I don't understand. Why can't a library just compare strings at the code-point level, ignoring "canonical equivalence"?


Then you run into problems with how characters are represented. For instance, é (lowercase latin e with an acute accent) can be represented either by one unicode codepoint (U+00E9, 'LATIN SMALL LETTER E WITH ACUTE'), or by two unicode codepoints (U+0065 U+0301 -- LATIN SMALL LETTER E, COMBINING ACUTE ACCENT). There are normalization forms that will convert these two representations into the same representation for easier comparison.

If you don't perform canonical equivalence checking, you could search for "café" and not find a file named "café.txt" if it uses the other representation.


"for example, if a user were searching for a filename"

It's useful when I search for "café", if I also get results for "cafe" - Chrome's search does this. Not to mention searching for "don't" and getting hits including "don’t". But that should definitely be restricted to text data operations, rather than lower-level ones, including filenames.


But that should definitely be restricted to text data operations, rather than lower-level ones, including filenames.

The issue arises when this "text data" includes filenames. Having café.txt and cafe.txt be equivalent when searching is useful, but the real problem is if a filesystem decides that two "equivalent" filenames are essentially identical - to contrive an example, suppose it thought /étc/passwd was referring to the same file as /etc/passwd . It makes checking for and filtering out "sensitive" filenames far more difficult. For example, just take a look at all the ways Unicode homoglyphs and "special" characters can be used to bypass forum wordfilters, and you'll see how difficult that problem is.

(I know permissions, ACLs, etc. can help here with access control, but the problem of distinguishing between filenames still stands.)


Linus actually has a great rant about brain-dead filesystems that mangle people's data. It's eerily prescient. http://thread.gmane.org/gmane.comp.version-control.git/70688...


He even mentioned the security aspect here:

http://article.gmane.org/gmane.comp.version-control.git/7076...

"Having programs that get different results back from what they actually wrote, that tends to be a security issue"


I think the main argument against case-insensitivity is that every file system driver must contain a large [1] code blob that does the case-insensitive comparison. That blob cannot be shared between drives or with the OS because one must guarantee that it stays the same forever. It is almost a sure bet that case-insensitivity in NTFS is different from that in HFS+ (even disregarding their different canonicalization).

Directory-traversal attacks work just as well with ASCII or byte sequences.

[1] of course, what is large becomes less and less important over time, thanks to Moore's law. On embedded devices, this still may be quite significant, though.


I once tried to install OS X with case sensative filesystem. Turns out Photoshop for OS X does not work in that case. Had to go back to case-insensitive. Not sure how many other apps have the same issue.


Also had issues with steam. The solution ? Install it in a disk image: https://github.com/kdeldycke/dotfiles/commit/05cef3c1de4a208...


World of Warcraft (and any other Blizzard game) won't run off of a case sensitive filesystem.


Steam also requires case-insensitive filesystem on OSX.


Steam on Linux does not. That probably means that Linux doesn't get any steam games that require a case-insensitive fs even if they would work otherwise (unity3d, monogame, ...), at least if there isn't a special linux version that does not require it.

I never really thought about this issue.


The issue here is not case-insensitive filesystems, they are a huge benefit to novice users. But that the type system does not distinguish between paths and strings. A path is distinctly different from a string, and should never be compared as one. The type system should always enforce this and never allow you to mistakingly do the comparison you propose, for exactly the reasons you state. Modern filesystem libraries (for type-safe languages) do this, the problem is (as is becoming more and more common lately) the abundance of old tools that were not designed with security in mind.


As you point out, a path is different from a string. It makes sense in many cases to do a case-insensitive comparison on strings. It NEVER makes sense to do that on paths.

A path is an identifier. Just as it's infuriating and horrendous to have programming language identifiers act case-insensitively, it's a poor design choice to have path identifiers act case-insensitively.

I suspect that what you describe as a huge benefit to novice users is entirely seen in search functionality, where you are comparing a string (the search needle) against a number of paths (the haystack). Since this search should be conducted by converting the path to a string (never the other way around, that's nonsensical) it's perfectly natural to support CI operations on strings but forbid them on paths.


so true


The OS should always be in service of the user. And users (in my experience) vastly prefer case insensitivity.

Sorry, but there's just no way around it.


Can you elaborate on that experience? I would have thought users not typing filenames wouldn't care less, and that users who are typing filenames would tend to be developers or 'advanced' users who aren't confused by case-sensitivity, and recognise the advantages.


That's how I see it too - inexperienced users will only be typing in filenames to name new files and likely use a GUI filechooser for selecting existing ones, while the ones typing in filenames are probably using CLIs.

The common complaint "but it's annoying to have to type in the filenames exactly" can be solved with tab-completion (I'm surprised how many CLI users don't know their shell has this feature), and using sane naming conventions like not GiVinG yOuR fIleS_stUpiD-NaMESLikEth1s.


Users shouldn't have access to the filesystem. The usability issues are too significant even for competent users. A good laugh/example: http://xkcd.com/1459/


The length at which I have to go to remember how to get stuff onto an ipad through itunes is an exercise that always reminds me that I always want access to the filesystem, even if I don't normally use it.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: