Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: Unicode, Korean, normalization form, Mac OS X and tab completion

On 5/30/14, 11:56 PM, Kwon Yeolhyun wrote:
> I have to work with lots of files of Korean names. 
> But the problem is that zsh failed in tab completion with Korean files.
> So I’ve done research to figure out what’s going on and I found some keywords such as unicode, normalization form, Mac OS X, and decomposition.
> Also I searched mailing list and read some threads related to unicode or multibyte support. 
> But I can’t find any solution.
> I’m not an expert about Unicode, zsh, Mac OS X. So I’m asking your help..

Your description and solution are right on the mark.  Mac OS X stores and
returns filenames in decomposed Unicode (NFD), while Mac keyboards return
characters in precomposed Unicode (NFC).  Decomposed Unicode is as you
describe: certain characters are `decomposed' into multiple codepoints.
(My use of NFD and NFC is not exact, but it's useful shorthand.)
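
To make the difference concrete, here is a minimal sketch in Python.  It
assumes the standard unicodedata module is a close enough stand-in for what
the file system does; the NFD variant Mac OS X uses is not identical to
standard NFD, so treat this as an illustration rather than an exact model.
The variable names are only for the example.

```python
import unicodedata

# One Hangul syllable as a single precomposed code point (U+D55C),
# which is roughly what the keyboard delivers.
precomposed = "\ud55c"

# The same syllable split into its conjoining jamo, which is roughly
# what the file system hands back.
decomposed = unicodedata.normalize("NFD", precomposed)

print(len(precomposed), len(decomposed))
print([f"U+{ord(c):04X}" for c in decomposed])

# The two strings look the same on screen but do not compare equal.
print(precomposed == decomposed)

# Normalizing back to NFC recovers the precomposed form.
print(unicodedata.normalize("NFC", decomposed) == precomposed)
```

A byte-wise or codepoint-wise prefix comparison between the two forms fails,
which is why completion finds no match.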

What I did in bash was to convert between keyboard and file system
representations when performing filename comparisons for filename
completion.  Zsh can do the same using iconv, which provides (on Mac
OS X) the UTF-8-MAC encoding to do the conversion.

One possible strategy is to convert each filename to NFC before comparing,
along the following lines.

1.  Keyboard input stays in NFC and is converted (dequoted, for example)
    to a `raw' form for comparison.

2.  Read the directory, assume each name will be returned in NFD, and
    convert the name to NFC.

3.  Perform the comparison using whatever strategy you'd like (e.g., taking
    case into account or mapping equivalent characters).

4.  If the comparison succeeds, add the matching filename (NFC) to the
    list of completions.

5.  If you have to add the filename to the command line (e.g., there is a
    single match), you have already converted it to NFC and can insert it.
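
The steps above can be sketched roughly as follows.  This is Python rather
than shell code, and it is not how bash or zsh implement completion: the
function names to_nfc, complete, and complete_in_directory are made up for
the example, dequoting in step 1 is left out, and the comparison in step 3
is a plain prefix test.

```python
import os
import unicodedata


def to_nfc(name):
    """Convert a string to precomposed (NFC) form."""
    return unicodedata.normalize("NFC", name)


def complete(prefix, names):
    """Return the NFC forms of the names that start with the typed prefix."""
    wanted = to_nfc(prefix)                # step 1: keyboard input in NFC
    matches = []
    for name in names:                     # step 2: names as read from disk
        candidate = to_nfc(name)           # assume NFD, convert to NFC
        if candidate.startswith(wanted):   # step 3: simple prefix comparison
            matches.append(candidate)      # step 4: keep the NFC form
    return sorted(matches)                 # step 5: ready to insert as-is


def complete_in_directory(prefix, directory="."):
    """Apply complete() to the entries of a real directory."""
    return complete(prefix, os.listdir(directory))
```

Normalizing the typed prefix as well costs little and keeps the comparison
correct even if the input did not arrive in NFC.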


``The lyf so short, the craft so long to lerne.'' - Chaucer
		 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    chet@xxxxxxxx    http://cnswww.cns.cwru.edu/~chet/
