Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: Unicode, Korean, normalization form, Mac OS X and tab completion

Thanks for the reply, Chet.

On May 31, 11:21am, Chet Ramey wrote:
} Subject: Re: Unicode, Korean, normalization form, Mac OS X and tab complet
} Your description and solution are right on the mark.  Mac OS X stores and
} returns filenames in decomposed Unicode (NFD), while Mac keyboards return
} characters in precomposed Unicode (NFC).

Hrm.  I'm rather surprised this hasn't broken something *else*, because
zsh is freely mixing keyboard and filesystem representations all over the
place.  E.g., does globbing also fail, in at least some cases?
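
To make the mismatch concrete, here is a minimal sketch (Python, not zsh
code) of how one Korean syllable looks in each form and why a plain string
or prefix comparison fails. It assumes standard Unicode NFD; the variant
used by the Mac filesystem (what iconv calls UTF-8-MAC) is close to NFD but
not identical. The helper name same_name() is made up for illustration.

```python
import unicodedata

# U+D55C HANGUL SYLLABLE HAN, precomposed (NFC), as typed at a keyboard.
typed = "\ud55c"
# The same syllable split into conjoining jamo (NFD), roughly what a
# decomposing filesystem would hand back from readdir().
stored = unicodedata.normalize("NFD", typed)


def same_name(a: str, b: str) -> bool:
    """Hypothetical helper: compare two names after bringing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    print([hex(ord(c)) for c in typed])    # one code point
    print([hex(ord(c)) for c in stored])   # three code points
    print(typed == stored)                 # raw comparison
    print(stored.startswith(typed))        # prefix test, as in completion
    print(same_name(typed, stored))        # comparison after normalizing
```

The two strings render identically but share no code points, so any
prefix match between typed input and a stored name cannot succeed until
both sides are brought to one form.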

} What I did in bash was to convert between keyboard and file system
} representations when performing filename comparisons for filename
} completion.  Zsh can do the same using iconv, which provides (on Mac
} OS X) the UTF-8-MAC encoding to do the conversion.

Unfortunately it's not isolated there.  Except for the (old, deprecated)
compctl completions, zsh does all the interesting work in shell functions
with strings that may come from glob patterns or array variables or any
number of other places.  Only sometimes are those strings passed through
the helper builtin that interprets them as file names, and even then it
can't possibly know whether they originated from readdir().

Fortunately, I think it *would* be OK to use the zreaddir() wrapper to
convert everything from NFD to NFC.  zreaddir() already applies zsh's
metafy() operation to all the file names, so as long as the OS properly
converts back to NFD (which it must, or we'd already be in deep doody
from throwing keyboard input at it) it should be safe to also iconv()
at this point.  This should cover globbing as well as completion.
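
The idea of normalizing at the directory-reading layer can be sketched as
follows. This is an illustration in Python, not the zreaddir() code; the
names to_nfc(), read_dir_nfc() and match() are hypothetical, and standard
NFC is assumed to be an acceptable stand-in for the UTF-8-MAC conversion.

```python
import fnmatch
import os
import unicodedata
from typing import Iterable, List


def to_nfc(names: Iterable[str]) -> List[str]:
    """Bring every name to NFC, whatever form it arrived in."""
    return [unicodedata.normalize("NFC", n) for n in names]


def read_dir_nfc(path: str = ".") -> List[str]:
    """Hypothetical readdir wrapper: list a directory, returning NFC names."""
    return to_nfc(os.listdir(path))


def match(pattern: str, names: Iterable[str]) -> List[str]:
    """Glob-style match of an NFC pattern against already-normalized names."""
    return [n for n in names if fnmatch.fnmatchcase(n, pattern)]


if __name__ == "__main__":
    # Simulate what a decomposing filesystem would return for one file.
    raw = [unicodedata.normalize("NFD", "\ud55c\uae00.txt"), "plain.txt"]
    print(match("\ud55c*", raw))           # pattern typed in NFC, names in NFD
    print(match("\ud55c*", to_nfc(raw)))   # same pattern after the wrapper
```

Because every consumer sees only the wrapper's output, globbing and
completion get the same treatment without either having to know where a
string came from.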

What are the configure / compile-time / run-time tests needed to detect
this situation?  Are we going to run into problems with e.g. NFS or Samba
filesystems that are NOT in NFD representation?  Do we need to handle this
as a general case where we should always be testing in some way for wonky
filesystems in order to normalize (e.g., a Mac FS mounted on Linux)?
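
One possible shape for a run-time test is sketched below, in Python rather
than C: write a file whose name is precomposed, read the directory back,
and see what form comes out. This is an assumption about how such a probe
could work, not an existing zsh or configure check; classify() and
probe_directory() are hypothetical names, and the probe only says what one
particular directory returns, not what the OS does in general.

```python
import os
import tempfile
import unicodedata

# U+00E9 is a precomposed (NFC) letter; its NFD form is "e" plus U+0301.
PROBE = "probe-\u00e9"


def classify(written: str, read_back: str) -> str:
    """Compare the name we created with the name the directory returned."""
    if read_back == written:
        return "preserved"
    if read_back == unicodedata.normalize("NFD", written):
        return "decomposed"
    return "other"


def probe_directory(parent: str) -> str:
    """Create one file with an NFC name under parent and inspect the listing."""
    with tempfile.TemporaryDirectory(dir=parent) as tmp:
        with open(os.path.join(tmp, PROBE), "w"):
            pass
        entries = os.listdir(tmp)
    if len(entries) != 1:
        return "other"
    return classify(PROBE, entries[0])


if __name__ == "__main__":
    print(probe_directory(tempfile.gettempdir()))
```

Since the answer can differ per mount point (local disk versus NFS or
Samba), a probe of this kind would have to run against the directory in
question rather than once at configure time.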
