Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: Unicode, Korean, normalization form, Mac OS X and tab completion



Kwon Yeolhyun wrote on Sun, Jun 01, 2014 at 14:30:03 +0900:
> 
> On Jun 1, 2014, at 11:25 AM, Daniel Shahaf <d.s@xxxxxxxxxxxxxxxxxx> wrote:
> 
> > Bart Schaefer wrote on Sat, May 31, 2014 at 14:29:26 -0700:
> >> On May 31,  8:16pm, Peter Stephenson wrote:
> >> }
> >> } I'm currently wondering if there is scope for normalising keyboard input
> >> } really early --- before we feed it back to the shell --- and turning it
> >> } back into the usual keyboard form right at the end
> >> 
> >> Per thread with Chet, I think normalizing the filesystem is the easier
> >> way to go.  Keyboard input is already as close to normalized as it needs
> >> to be, I think, and with only a couple of exceptions all the names we
> >> get from the filesystem come through zreaddir().
> > 
> > What about, say, people doing 'ls' and copy-pasting a filename from the
> > output into a command line?  Wouldn't that result in NFD keyboard
> > input?
> > 
> > FWIW, while OS X always returns NFD filenames, one could also imagine an
> > OS that is normalization-aware (forbids creating a file if its
> > normalized name is the same as the normalized name of an existing file)
> > but octet-sequence-preserving, and on such an OS both the readdir()
> > output and the user input would need to be normalized.
> > 
> > Also, other unixes allow you to have both the NFC-form and NFD-form in
> > the same directory, e.g., 'touch fooá fooá' works just fine on linux
> > ext4 (the first filename is composed, the second decomposed); in such
> > cases normalization magic should not be done.
> > 
> > Fun! :-)
> > 
> > Daniel
> 
> Fortunately, I think Mac OS X can handle input in decomposed or composed form.

Yes, AFAIK, OS X accepts filenames in any normalization form and always
returns them NFD-normalized.

> So I think we can convert decomposed filenames into composed after readdir. It will work at least for Korean.

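That would work as long as the user's input is in NFC; input pasted from
'ls' output could still be NFD and would need normalizing as well.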
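
To make the matching concern concrete, here is a minimal sketch in Python
(not the shell's C code). The names nfc() and complete() are hypothetical,
and the "entries" list only simulates what readdir() might return on OS X.
It normalizes both the directory entries and the typed prefix before
comparing, so either input form matches:

```python
import unicodedata

def nfc(s: str) -> str:
    """Return the NFC (composed) form of s."""
    return unicodedata.normalize("NFC", s)

def complete(prefix, dir_entries):
    """Hypothetical matcher: compare NFC forms of prefix and entries.

    Returns the original (un-normalized) entries that match.
    """
    p = nfc(prefix)
    return [e for e in dir_entries if nfc(e).startswith(p)]

# Simulated readdir() output: the first name is in NFD, as OS X returns it.
# "\ud55c\uae00" is the two-syllable Hangul string U+D55C U+AE00.
entries = [unicodedata.normalize("NFD", "\ud55c\uae00.txt"), "notes.txt"]

# A plain byte/codepoint comparison against composed input fails...
naive_match = entries[0].startswith("\ud55c")

# ...while comparing normalized forms succeeds for NFC and NFD input alike.
from_nfc_input = complete("\ud55c", entries)
from_nfd_input = complete(unicodedata.normalize("NFD", "\ud55c"), entries)
```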

> Detecting, composing, and decomposing hangul can be done easily.

It is easy to convert any Unicode string to NFC or to NFD, not just
strings consisting of Hangul codepoints.
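
As an illustration only, using Python's standard unicodedata module rather
than anything in zsh itself (the helper name forms() is made up for this
example), the same call handles Hangul syllables and accented Latin
letters:

```python
import unicodedata

def forms(s: str):
    """Return (NFD, NFC) forms of s using the standard library."""
    nfd = unicodedata.normalize("NFD", s)
    nfc = unicodedata.normalize("NFC", nfd)
    return nfd, nfc

hangul = "\ud55c\uae00"   # two precomposed Hangul syllables
latin = "foo\u00e1"       # "foo" + LATIN SMALL LETTER A WITH ACUTE

for s in (hangul, latin):
    nfd, nfc = forms(s)
    # NFD splits each composed character into its parts;
    # NFC of that result recomposes to the original string.
    print(len(s), len(nfd), nfc == s)
```

No Hangul-specific detection is needed; the normalization form is the only
parameter.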

Cheers,

Daniel


