Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: UTF-8



Olivier Verdier wrote:
> 
> I'm using Darwin and Mac OS X 10.1 together with zsh (zsh --version =
> zsh 4.0.4 (powerpc-apple-darwin1.4)), and I can't figure out how to make
> it work properly with UTF-8 encoding. All file names are indeed encoded
> in UTF-8 on macintosh hard-disk (HFS+ format). I use a terminal which is
> UTF-8 aware (apple Terminal.app). It works perfectly with
> UTF-8-configured 'less' and 'vim' commands.
> 
> Some examples of misbehaviors:
> 1) a 'ls' command for "Téléchargement" gives "Te??e??hargement"

The output of the ls command doesn't pass through zsh at all; it goes
straight to the terminal. So in this case it is either ls or the
Apple terminal that is failing to handle UTF-8.

>         *but* 'ls | less' gives "Téléchargement" if less is configured for
> UTF-8
>         so the output of 'ls' is correct, but is misinterpreted by the shell

That seems a little strange. I would suspect that the terminal is expecting
something like ISO-8859-1 and that less is converting to that from UTF-8.
Try a more unusual character, such as one that ISO-8859-1 cannot represent,
and see what happens then.
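
To pick a test character, it may help to check which ones ISO-8859-1 can
represent at all. The following is only a sketch in Python (not zsh), and
fits_latin1 is a hypothetical helper named for this example:

```python
def fits_latin1(ch):
    """Return True if ch can be encoded in ISO-8859-1."""
    try:
        ch.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False

e_acute = "\u00e9"  # e with acute accent, present in ISO-8859-1
euro = "\u20ac"     # euro sign, absent from ISO-8859-1

e_acute_ok = fits_latin1(e_acute)
euro_ok = fits_latin1(euro)

# Both characters take more than one byte in UTF-8.
e_acute_utf8_len = len(e_acute.encode("utf-8"))
euro_utf8_len = len(euro.encode("utf-8"))

print(e_acute_ok, euro_ok)
print(e_acute_utf8_len, euro_utf8_len)
```

If a character like the euro sign still displays correctly through less,
the ISO-8859-1 conversion theory is probably wrong.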

> 2) completion doesn't work; if 'Télé' is on the directory, Té[tab] gives
> nothing, but 'cd Télé' works...
>         *moreover* 'cd Té' writes 'cd T@' on screen, but 'cd Té[tab]' turns
> itself into 'cd Té'
> 
> 3) 'cd Télé' together with the option 'printeightbit' prints correctly
> the pwd; mkdir Télé works as expected.

I'm not quite sure why the completion doesn't work there. It doesn't help
that I don't have a UTF-8 aware terminal to experiment with.

Unfortunately, zsh was never built to handle UTF-8 correctly. Many things
work transparently because of the way UTF-8 is designed; I would expect
commands like echo and cd to work. In some areas, though, it won't work.
For example, if you assign a UTF-8 string to a variable and use $#var to
get its length, it will report the wrong length because it counts bytes
rather than characters, so each two-byte character counts as two.
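
As a rough illustration of the length problem (written in Python rather
than zsh, with variable names chosen only for this sketch), counting bytes
instead of characters overstates the length of "Télé":

```python
# "Télé" written with escapes so the precomposed form of é is unambiguous.
name = "T\u00e9l\u00e9"

# Number of characters, which is what a user expects a length to mean.
char_count = len(name)

# Number of bytes in the UTF-8 encoding, which is what a byte-oriented
# program sees; each é occupies two bytes.
byte_count = len(name.encode("utf-8"))

print(char_count)  # 4
print(byte_count)  # 6
```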

Fixing this would be quite a big job: it would affect virtually all the
code, and it would first need some thought to work out where to use wide
characters, where to use UTF-8 and where to do conversions for input
and output.

For future reference, send any zsh questions to zsh-users@xxxxxxxxxx or
zsh-workers@xxxxxxxxxxx. The address you used just goes to the people who
maintain the web pages.

Oliver Kiddle
