Zsh Mailing List Archive

Re: Unable to input multibyte characters.



On Wed, 4 Nov 2009, Ian-Xue Li wrote:

> Hi,
> my problem is that input "你好嗎" but appears some weird code like "ÃåÃÂ?
> ÃÂÃÂ".

That appears to be the UTF-8 sequence for 你好嗎 interpreted as some other
char set (e.g. ISO-8859-1):

$ echo 你好嗎 | iconv -f ISO-8859-1 -t UTF-8
ä½ å¥½å
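
If the terminal itself is suspect, the same check can be done on raw bytes. 
This is only a sketch: it assumes 'iconv' and 'od' are installed, and the 
helper names 'utf8_bytes' and 'hex' are made up for the example.  The bytes 
are the UTF-8 encoding of U+4F60 U+597D U+55CE (the code points used in the 
hex input further down), written as octal escapes so nothing depends on how 
the terminal displays them:

```shell
# UTF-8 bytes for U+4F60 U+597D U+55CE, as octal escapes.
utf8_bytes() { printf '\344\275\240\345\245\275\345\227\216'; }

# Hex dump of stdin with all whitespace removed.
hex() { od -An -tx1 | tr -d ' \n'; }

original=$(utf8_bytes | hex)

# Misread the bytes as ISO-8859-1 and re-encode them as UTF-8 (the damage).
garbled=$(utf8_bytes | iconv -f ISO-8859-1 -t UTF-8 | hex)

# Convert back the other way to undo the damage.
restored=$(utf8_bytes | iconv -f ISO-8859-1 -t UTF-8 \
                      | iconv -f UTF-8 -t ISO-8859-1 | hex)

echo "original: $original"
echo "garbled:  $garbled"
echo "restored: $restored"
```

Each of the nine original bytes turns into two bytes in the garbled form, 
and converting back restores the original, which is what you would expect 
if something between the keyboard and the shell treats the input as 
ISO-8859-1.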

So, I suspect it's a locale issue.  What do you get from the 'locale' 
command?  For me, under Gentoo Linux, with working UTF-8 support, I get:

$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE=POSIX (<-- personal preference... shouldn't matter here)
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
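
To check this from a script rather than by eye, something like the 
following might do.  It is a rough sketch: the function names 
'effective_ctype' and 'is_utf8_locale' are invented here, and it only 
looks at the environment variables (LC_ALL overrides LC_CTYPE, which 
overrides LANG); it does not verify that the locale is actually installed.

```shell
# Effective character-type locale: LC_ALL wins, then LC_CTYPE, then LANG.
# An empty value counts as unset.
effective_ctype() {
    printf '%s\n' "${LC_ALL:-${LC_CTYPE:-${LANG:-C}}}"
}

# Succeeds if the effective locale name carries a UTF-8 codeset suffix.
is_utf8_locale() {
    case $(effective_ctype) in
        *.[Uu][Tt][Ff]-8*|*.[Uu][Tt][Ff]8*) return 0 ;;
        *) return 1 ;;
    esac
}

if is_utf8_locale; then
    echo "character locale $(effective_ctype) looks like UTF-8"
else
    echo "character locale $(effective_ctype) is not UTF-8"
fi
```

Running 'locale charmap' is a more direct test; under a working UTF-8 
locale it prints UTF-8.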

> I tried setting the multibyte option by "setopt multibyte", but after 
> that, "setopt" outputs no multibyte flag in its listing. So I figure 
> there might be something wrong with the version 4.3.10 ?

If built with multibyte support, the default is that multibyte will be on.  
For me (with multibyte working):

$ set -o | grep multibyte
nomultibyte           off

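(Zsh lists options that are on by default with a 'no' prefix, so
'nomultibyte off' means multibyte is on.  Plain 'setopt' prints only the
options that differ from the defaults, which is why multibyte does not
show up in its listing.)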
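
Since the double negative is easy to misread, here is a small filter that 
turns the 'set -o' line into a plain on/off.  It is only a sketch, and the 
function name 'multibyte_state' is made up.  It handles the two forms I 
know of: 'nomultibyte off|on' as above, and 'multibyte on|off' as printed 
when the KSH_OPTION_PRINT option is set.

```shell
# Read `set -o` style lines on stdin and report the state of multibyte.
multibyte_state() {
    while read -r name state; do
        case $name in
            multibyte)
                echo "$state" ;;
            nomultibyte)
                # Negated form: "off" means the option itself is on.
                if [ "$state" = off ]; then echo on; else echo off; fi ;;
        esac
    done
}

# The line from the listing above:
printf 'nomultibyte           off\n' | multibyte_state
```

In a running Zsh you would use it as 'set -o | multibyte_state'.  Zsh can 
also test an option directly with '[[ -o multibyte ]]'.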


> Terminal is urxvt and xterm, both were unable to input Chinese and 
> Japanese characters with SCIM. (doable in bash, nothing else is 
> changed.)

With rxvt-unicode (urxvt), the following works for me:

$ scim -d
# SCIM starts as daemon
$ XMODIFIERS=@im=SCIM GTK_IM_MODULE=scim QT_IM_MODULE=scim urxvt
(... new urxvt starts ...)

$ <Ctrl+Space>
(...activates scim-pinyin... and I'm able to enter things via pinyin.)



To be sure Zsh itself is okay, you can try the following:

$ autoload insert-unicode-char
$ zle -N insert-unicode-char
$ bindkey "^U" insert-unicode-char

Then, to type your 'ni hao ma' from before, where '^U' represents Ctrl+U:

^U 4f60 ^U ^U 597d ^U ^U 55ce ^U

(The first '^U' tells Zsh to expect a hex-coded Unicode code point.  The
second '^U' tells Zsh you're finished entering the hex digits, and it
then inserts the character.)
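
If you want to know which bytes those three code points should produce in 
a UTF-8 locale, the encoding can be worked out with shell arithmetic.  The 
helper below is a hypothetical sketch ('utf8_hex' is not part of Zsh); it 
takes one hex code point and prints the UTF-8 bytes in hex, without 
checking for invalid input such as surrogates.

```shell
# Print the UTF-8 encoding (as hex) of one hexadecimal code point.
utf8_hex() {
    cp=$((0x$1))
    if [ "$cp" -lt 128 ]; then
        # 1 byte: 0xxxxxxx
        printf '%02x\n' "$cp"
    elif [ "$cp" -lt 2048 ]; then
        # 2 bytes: 110xxxxx 10xxxxxx
        printf '%02x%02x\n' $((192 | (cp >> 6))) $((128 | (cp & 63)))
    elif [ "$cp" -lt 65536 ]; then
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        printf '%02x%02x%02x\n' $((224 | (cp >> 12))) \
            $((128 | ((cp >> 6) & 63))) $((128 | (cp & 63)))
    else
        # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        printf '%02x%02x%02x%02x\n' $((240 | (cp >> 18))) \
            $((128 | ((cp >> 12) & 63))) $((128 | ((cp >> 6) & 63))) \
            $((128 | (cp & 63)))
    fi
}

for code in 4f60 597d 55ce; do
    utf8_hex "$code"
done
# prints:
# e4bda0
# e5a5bd
# e5978e
```

You can compare these against what actually ends up on the command line, 
for example by piping the inserted text through 'od -An -tx1'.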


> 
> I've also recompiled Zsh with an explicit "--enable-multibyte" && has 
> started Zsh with --multibyte flag, they did no help.
> 
> (this is vital because there are numerous files are named in these 
> characters, and I use the shell tools to manage them. So please help !)
> 

Best,
Ben

