User talk:Dvasil

From Ext4
Revision as of 05:28, 21 October 2008 by Captainobsequious

Hi, Dhimitrios.

You have an interesting proposal, but I'd like to take issue (respectfully, of course) with some of its points.

A lesser issue is that your character repertoire still isn't restrictive enough. What do you gain from permitting the non-breaking space (U+00A0)? It can only be confused with the regular space (U+0020), which is a disadvantage. The XML suitability guidelines don't work well as filename guidelines: filenames contain no line breaks, and even BiDi text is displayed on an LTR screen. I personally think ZWNJ (U+200C, needed for Farsi) is one of the few formatting characters that are really necessary.
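To make the repertoire point concrete, here is a minimal Python sketch of a stricter per-character filter. The function name `is_allowed` and the exact rule set are assumptions made for illustration, not part of the proposal: it permits U+0020 as the only space, permits ZWNJ as the only format character, and rejects controls, other separators and other format characters.

```python
import unicodedata

# Assumption for illustration: ZWNJ is the only format character let through.
ALLOWED_FORMAT_CHARS = {"\u200c"}  # ZERO WIDTH NON-JOINER

def is_allowed(ch: str) -> bool:
    """Hypothetical per-character filter for filenames."""
    category = unicodedata.category(ch)
    if category == "Cc":                      # control characters
        return False
    if category == "Cf":                      # format characters (BiDi marks etc.)
        return ch in ALLOWED_FORMAT_CHARS
    if category in ("Zs", "Zl", "Zp"):        # space, line and paragraph separators
        return ch == " "                      # only U+0020 survives
    return True

print(is_allowed("\u00a0"))  # non-breaking space
print(is_allowed("\u200c"))  # ZWNJ
```

Under these assumed rules the non-breaking space is rejected because it shares the `Zs` category with the regular space, while ZWNJ passes through the explicit exception.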

There's a much bigger issue, though: your choice of UTF-8. I'm all for exclusive use of Unicode, and the restriction to the ASCII range when dealing with foreign filesystems looks well-reasoned to me too. However, I think UTF-8 has a lot of serious disadvantages:

  1. The use of UTF-8 requires draconian enforcement against illegal sequences. Control characters and the slash can be smuggled into UTF-8 by using overlong sequences (C0 AF or E0 80 AF for the slash), which a lax decoder will accept. The security risk is considerable.
  2. Cutting the number of available characters to a third of the byte limit is a huge waste. BMP characters can take up to three bytes each in UTF-8, so a 255-byte name holds as few as 85 of them. That limit is little better than Joliet's 64 (only 21 more characters; people often need more than that in their filenames, particularly when they save web page titles with the source website name and date of authorship).
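Both objections in the list above can be checked with a short Python sketch. The helper names and the 255-byte name limit are assumptions for illustration. `naive_decode_2byte` is a deliberately lax decoder that skips the overlong check, shown only to demonstrate how the slash slips through.

```python
NAME_MAX_BYTES = 255   # assumed byte limit for a filename
JOLIET_MAX_CHARS = 64  # Joliet limit mentioned in this discussion

OVERLONG_SLASH_2 = bytes([0xC0, 0xAF])        # overlong 2-byte form of U+002F
OVERLONG_SLASH_3 = bytes([0xE0, 0x80, 0xAF])  # overlong 3-byte form of U+002F

def decodes_strictly(raw: bytes) -> bool:
    """True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        raw.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False

def naive_decode_2byte(raw: bytes) -> str:
    """Hypothetical lax decoder: combines payload bits, no overlong check."""
    return chr(((raw[0] & 0x1F) << 6) | (raw[1] & 0x3F))

def worst_case_bmp_chars(byte_limit: int) -> int:
    """Characters guaranteed to fit when every one needs three bytes."""
    return byte_limit // 3

print(decodes_strictly(OVERLONG_SLASH_2))    # strict decoder
print(naive_decode_2byte(OVERLONG_SLASH_2))  # lax decoder
print(worst_case_bmp_chars(NAME_MAX_BYTES) - JOLIET_MAX_CHARS)
```

A strict decoder rejects both overlong forms, but any code path that decodes without that check yields a plain slash, which is why enforcement has to be applied everywhere names are handled.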

I think UTF-16 is a much better choice than UTF-8. You rightly note how widely supported Joliet is. You could object that Joliet is limited to 64 characters, but that's only because it needs to be compatible with ISO 9660. If you're writing a filesystem proposal from scratch (as I gather is the case here), then you can go for a maximum length of 127 characters. I agree with you on prohibiting characters outside the BMP, though: outlaw all surrogate code points and you're golden. --Captain Obsequious 05:28, 21 October 2008 (UTC)
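As a rough illustration of the UTF-16 rules suggested in the comment above, here is a minimal Python sketch. The function names are hypothetical and it checks only the two rules discussed (BMP only with no surrogates, at most 127 characters); other restrictions such as slash or control characters are left out.

```python
MAX_UNITS = 127  # maximum length suggested in the comment, in 16-bit units

def is_valid_name(name: str) -> bool:
    """Hypothetical check: BMP only, no surrogate code points, 1..127 chars."""
    for ch in name:
        cp = ord(ch)
        if cp > 0xFFFF:              # outside the BMP (would need a surrogate pair)
            return False
        if 0xD800 <= cp <= 0xDFFF:   # lone surrogate code point
            return False
    return 0 < len(name) <= MAX_UNITS

def utf16_size(name: str) -> int:
    """Size in bytes of the name encoded as UTF-16 without a byte order mark."""
    return len(name.encode("utf-16-le"))

print(is_valid_name("\u20ac" * 127), utf16_size("\u20ac" * 127))
```

Because surrogates are banned, every character is exactly one 16-bit unit, so the character count and the encoded size are in a fixed 1:2 ratio and a 127-character name always occupies 254 bytes.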
