Zsh Mailing List Archive

Re: Unicode, Korean, normalization form, Mac OS X and tab completion



2014/06/02 04:13, Bart Schaefer <schaefer@xxxxxxxxxxxxxxxx> wrote:
>> $ ls u<TAB>     # completes to über (useful for some user??)
> 
> The current behavior here is pretty much by accident,

I've been thinking the NFD/NFC problem is not so serious because I can
use u<TAB> instead of ü<TAB> (u is easier to type than ü on my keyboard),
and simply guessed that Western non-English-speaking people (German, French,
Spanish, etc.) were using something like u<TAB>. But maybe I was wrong.

In Japanese, some Hiragana/Katakana can have a kind of accent, e.g.,
か + accent = が. It's OK for me to type か<TAB> instead of が<TAB>,
but many Japanese Mac/zsh users were frustrated with the problem and
one of those users came up with the patch I mentioned in the previous post.
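
A simplified model of why u<TAB> works but ü<TAB> does not, assuming
completion boils down to a plain prefix comparison against the names that
readdir() returns. This is only a sketch in Python; prefix_matches is a
hypothetical helper, not zsh's actual matching code:

```python
import unicodedata

def prefix_matches(typed, filename):
    """Plain code-point prefix comparison, with no normalization."""
    return filename.startswith(typed)

# Names as an NFD volume would return them (base letter + combining mark).
uber_nfd = unicodedata.normalize("NFD", "\u00FCber")  # u + U+0308 + "ber"
ga_nfd = unicodedata.normalize("NFD", "\u304C")       # か + U+3099

print(prefix_matches("u", uber_nfd))        # True: the base letter is a prefix
print(prefix_matches("\u00FC", uber_nfd))   # False: precomposed ü is not
print(prefix_matches("\u304B", ga_nfd))     # True: か is a prefix of decomposed が
print(prefix_matches("\u304C", ga_nfd))     # False: precomposed が is not
```

The keyboard normally produces the precomposed (NFC) character, which is
why typing the accented letter itself fails on an NFD volume.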

I was thinking Korean (and Chinese) are free from the NFC/NFD problem, but
now I know I was wrong. I didn't know that Korean filenames are completely
decomposed down to each consonant/vowel. It was a surprise to know that

$ echo '\u1100 \u1161'
ᄀ ᅡ
$ echo '\u1100\u1161'
가
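
The same decomposition can be checked outside the shell. A small sketch
using Python's unicodedata module (codepoints is a helper name I made up
for the illustration):

```python
import unicodedata

def codepoints(s):
    """Return the code points of s as a list of 'U+XXXX' strings."""
    return [f"U+{ord(ch):04X}" for ch in s]

# Precomposed (NFC) forms: Hangul 가, Hiragana が, Latin ü.
for composed in ("\uAC00", "\u304C", "\u00FC"):
    decomposed = unicodedata.normalize("NFD", composed)
    print(composed, codepoints(composed), "->", codepoints(decomposed))

# Composing the two jamo again gives back the single syllable U+AC00.
print(codepoints(unicodedata.normalize("NFC", "\u1100\u1161")))
```

For 가 the NFD form is the pair U+1100 U+1161, i.e. exactly the two jamo
used in the echo commands above.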

Anyway, I did the following quick tests concerning the file sharing
among Mac and Win/Linux. But the tests are incomplete, and I did
them in a hurry so there may be mistakes:

(1) File sharing between Mac and Windows (samba):
It seems the samba server/client on Mac does automatic conversion between
NFD and NFC. A Mac volume mounted on Win behaves as if it is an NFC volume,
and a Win volume mounted on Mac behaves as if it is an NFD volume.
This means composing readdir() output on Windows is not necessary
even if the volume is physically an NFD volume, while it must be
converted to NFC on Mac even if the volume is physically an NFC volume.

(2) A USB flash drive (FAT format):
If mounted on a Windows box it is an NFC volume, of course, and if mounted
on Mac it behaves as if it is an NFD volume (decomposed by a driver on Mac).
So the situation is the same as (1). I believe Linux behaves similarly
to Windows.

(3) File sharing between Mac and Linux (NFS):
If a Mac volume is mounted on Linux, then no NFC/NFD conversion takes
place; it seems readdir() on Linux returns NFD filenames for the volume.
(I enabled nfsd on my Mac with the default settings. I looked into the nfsd(8)
and exports(5) man pages but they don't mention anything about NFC/NFD.)
This means that zsh on Linux can't complete decomposed filenames correctly.
But it seems iconv(3) on Linux doesn't support UTF-8-MAC and I can't think
of any solution here.

I had no time to test mounting a Linux volume on Mac, but the mount_nfs(8)
man page on Mac says it has an option to convert NFD filenames on Mac
to NFC filenames on the Linux server.

I also couldn't test mounting a Mac volume on Linux via samba, but I guess
it behaves as if it is an NFC volume on Linux.

The results so far suggest that readdir() output must always be converted
to NFC on Mac.
On Linux (and maybe on Windows) no conversion is possible because iconv()
doesn't support UTF-8-MAC, but conversion is not necessary except when
mounting a Mac volume via NFS.
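
To illustrate what "converting readdir() output to NFC" would mean for
completion, here is a rough sketch in Python. It assumes the names arrive
as ordinary NFD strings and uses unicodedata in place of iconv(); nfc and
complete are hypothetical helpers, and the real change would of course
have to be made in zsh's C code:

```python
import unicodedata

def nfc(s):
    """Compose a string to Normalization Form C."""
    return unicodedata.normalize("NFC", s)

def complete(typed, names):
    """Return the NFC forms of the names that start with the typed prefix.

    Both sides are composed first, so a precomposed prefix typed on the
    keyboard can match a decomposed name from the directory listing.
    """
    typed = nfc(typed)
    composed = [nfc(name) for name in names]
    return [name for name in composed if name.startswith(typed)]

# Decomposed names, as an NFD volume would return them.
names = ["\u1100\u1161.txt", "u\u0308ber"]

print(complete("\uAC00", names))   # typing 가 now matches the decomposed name
print(complete("\u00FC", names))   # typing ü matches as well
print(complete("u", names))        # plain u no longer matches über
```

Note the last call: once the names are composed, the accidental
u<TAB> completion goes away unless it is handled separately.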


