Zsh Mailing List Archive

Unicode, Korean, normalization form, Mac OS X and tab completion



I have to work with a lot of files that have Korean names.
The problem is that zsh fails at tab completion for files with Korean names.
I did some research to figure out what’s going on and found a few keywords: Unicode, normalization form, Mac OS X, and decomposition.
I also searched the mailing list and read some threads related to Unicode and multibyte support.
But I couldn’t find a solution.

I’m not an expert on Unicode, zsh, or Mac OS X, so I’m asking for your help.

Here’s my description of the issue.

1) The Unicode spec defines normalization forms, which are related to canonical equivalence, i.e. to deciding whether two Unicode strings represent the same text.
2) The decomposed normalization forms (NFD, NFKD) break a character into its components; the composed forms (NFC, NFKC) combine them again.
    For example, Å (letter A with a ring above) -> A (letter A) + ˚ (ring above), or 가 (hangul syllable ga) -> ㄱ (hangul choseong kiyeok) + ㅏ (hangul jungseong a)
3) A Korean syllable, written in hangul, has up to three parts: choseong (initial consonant), jungseong (vowel), and jongseong (optional final consonant). For example, 가 is decomposed into the choseong ㄱ and the jungseong ㅏ.
    And 각 breaks down into ㄱ, ㅏ, ㄱ (the last one being the jongseong).
4) Mac OS X stores filenames in a normalized (decomposed) form. Assuming there’s a file named 가나다, its name is stored internally as ㄱㅏㄴㅏㄷㅏ (decomposed into hangul jamos). (Link to hangul jamos: http://www.utf8-chartable.de/unicode-utf8-table.pl?start=4352&number=1024 )
5) I guess the reason tab completion fails is that zsh compares the user input, 가나다, directly with the filename, ㄱㅏㄴㅏㄷㅏ.
    가나다 and ㄱㅏㄴㅏㄷㅏ are canonically equivalent but have different binary representations.
6) I believe that comparing two Unicode strings should be done with respect to canonical equivalence.
7) The Unicode spec has a dedicated section on hangul syllables. Fortunately, hangul can be decomposed and composed algorithmically.
( Please refer to the Unicode spec section 3.12 under “Parsing” http://www.unicode.org/faq/specifications.html )
8) On Ubuntu, tab completion works perfectly. As far as I know, this issue is restricted to Mac OS X. (I haven’t tested on any other platform.)
9) I think this is related to the COMBINING_CHARS option, but that option does not seem to cover hangul.
10) Currently, the latest version of bash is the only shell I know of with working tab completion for these names on Mac OS X.
11) ‘Hangul’ is the name of the Korean alphabet. If you are interested in it, please refer to http://en.wikipedia.org/wiki/Hangul
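
To make points 5) to 7) concrete, here is a minimal Python sketch. It is only an illustration of the kind of comparison I have in mind, not zsh code and not a proposed patch. The helper names decompose_hangul and canonically_equal are made up for this example, and I am assuming the stored filename equals the NFD form of what I type. The constants are the ones used for hangul syllable arithmetic in the Unicode spec. The sample name is written with escapes so that nothing in transit can renormalize it.

```python
import unicodedata

# Constants for the hangul syllable arithmetic in the Unicode spec.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = 19 * N_COUNT        # 11172 precomposed syllables in total


def decompose_hangul(text):
    """Split precomposed hangul syllables into conjoining jamo.

    Hypothetical helper for illustration; other characters pass through.
    """
    out = []
    for ch in text:
        s_index = ord(ch) - S_BASE
        if 0 <= s_index < S_COUNT:
            out.append(chr(L_BASE + s_index // N_COUNT))               # choseong
            out.append(chr(V_BASE + (s_index % N_COUNT) // T_COUNT))   # jungseong
            t_index = s_index % T_COUNT
            if t_index:                                                # jongseong, if any
                out.append(chr(T_BASE + t_index))
        else:
            out.append(ch)
    return "".join(out)


def canonically_equal(a, b):
    """Compare two strings after bringing both to the same normalization form."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


typed = "\uac00\ub098\ub2e4"                   # 가나다 as typed (precomposed)
stored = unicodedata.normalize("NFD", typed)   # assumed form of the stored filename

print(typed == stored)                     # naive comparison of code points
print(canonically_equal(typed, stored))    # comparison under canonical equivalence
print(decompose_hangul(typed) == stored)   # arithmetic decomposition vs. library NFD
```

The naive comparison is the one I suspect completion is doing; the second comparison is what I would expect from a matcher that respects canonical equivalence.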

Thanks for reading.



