Then you run into problems with how characters are represented. For instance, é (lowercase Latin e with an acute accent) can be represented either by one Unicode code point (U+00E9, LATIN SMALL LETTER E WITH ACUTE) or by two Unicode code points (U+0065 U+0301: LATIN SMALL LETTER E followed by COMBINING ACUTE ACCENT). The two sequences are canonically equivalent, and Unicode defines normalization forms (NFC, which composes, and NFD, which decomposes) that convert both into the same representation for easier comparison.
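As a rough illustration, here is a minimal Python sketch using the standard library's `unicodedata.normalize`. The variable names are only for this example, and the strings are written with escapes so the two representations stay distinct regardless of how this text is stored.

```python
import unicodedata

composed = "\u00e9"          # one code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "\u0065\u0301"  # two code points: e + COMBINING ACUTE ACCENT

# The strings look identical on screen but differ code point by code point.
raw_equal = composed == decomposed

# Normalizing both sides to the same form makes them compare equal.
nfc_equal = (unicodedata.normalize("NFC", composed)
             == unicodedata.normalize("NFC", decomposed))
nfd_equal = (unicodedata.normalize("NFD", composed)
             == unicodedata.normalize("NFD", decomposed))

print(raw_equal, nfc_equal, nfd_equal)
```

Which form you pick matters less than applying the same one to both sides of the comparison.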

If you don't check for canonical equivalence, you could search for "café" and fail to find a file named "café.txt" simply because the file name uses the other representation.
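The search failure can be reproduced with a small sketch. `find_matches` is a hypothetical helper written for this example, not part of any library, and the file names are made up. It assumes NFC as the common form, though NFD would work equally well as long as both sides use it.

```python
import unicodedata

def find_matches(query, names, form="NFC"):
    """Return the names containing query, comparing in one normalization form."""
    key = unicodedata.normalize(form, query)
    return [n for n in names if key in unicodedata.normalize(form, n)]

# The file name uses the decomposed form; the query uses the composed form.
names = ["cafe\u0301.txt", "notes.txt"]
query = "caf\u00e9"

naive = [n for n in names if query in n]  # raw substring test misses the file
matches = find_matches(query, names)      # normalized test finds it

print(len(naive), len(matches))
```

The original, unnormalized name is returned, so it can still be used to open the file as it is stored on disk.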