Zsh Mailing List Archive

Re: Autocompletion doesn't work with kanji

On Tue, 22 Dec 2009, Christoph Dittmann wrote:

> Hi,
> today I stumbled upon a problem with autocompletion and Japanese kanji
> characters. My ~/.zshrc only contains the following 3 lines and nothing
> else:
> zstyle -e ':completion:*' completer '
>     _foo_bar="$HISTNO$BUFFER$CURSOR"
>     reply=(_complete _match _ignored _prefix _files)'
> Usually I use the .zshrc from the grml project. When I noticed the kanji 
> problem, I tried to extract a minimal piece of code where it still 
> happens. The above 3 lines are what I ended up with.
> I noticed that I can rename _foo_bar to anything and the problem 
> persists. However, removing the line with _foo_bar completely also makes 
> the problem disappear. But removing this line would break the real grml 
> .zshrc, so this is not a solution.

[shoot... just noticed after writing this up that you're using a UTF-8 
locale... nonetheless I think the solution might be the one I suggest...]

Something makes me doubt that's the essential line in the real grml .zshrc 
that's doing what you want.  The command you've entered essentially says:

In all completion contexts (':completion:*') evaluate ('-e') for the 
'completer' style this text:

reply=(_complete _match _ignored _prefix _files)

The _foo_bar="$HISTNO$BUFFER$CURSOR" portion is essentially assigning to a 
variable that gets thrown away.

The problem, and its interaction with kanji and hiragana, is (I think) some
combination of not having a multibyte-capable Zsh (in 4.3.10, though?) and
your locale and/or terminal settings:

I suspect you're somehow ending up with ISO-2022-JP encoding, where kanji 
'hon' and hiragana 'a' encode to the following sequences:

本  1b 24 42 4b 5c 1b 28 42  == \033$BK\\\033(B
あ  1b 24 42 24 22 1b 28 42  == \033$B$"\033(B
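The byte sequences above can be double-checked with Python's built-in iso2022_jp codec. This is a minimal sketch for illustration only, not anything zsh does internally; the helper name iso2022jp_hex is hypothetical.

```python
# Minimal sketch: reproduce the ISO-2022-JP byte sequences quoted above
# with Python's built-in "iso2022_jp" codec (illustration only).

def iso2022jp_hex(ch):
    """Return the ISO-2022-JP encoding of ch as space-separated hex bytes."""
    return " ".join(f"{b:02x}" for b in ch.encode("iso2022_jp"))

for ch in ("本", "あ"):
    encoded = ch.encode("iso2022_jp")
    # 0x22 is the ASCII double quote; flag any sequence that contains one
    note = "contains a double quote" if b'"' in encoded else ""
    print(ch, iso2022jp_hex(ch), note)
```

Both sequences start with ESC $ B (switch to JIS X 0208) and end with ESC ( B (back to ASCII). The second JIS byte of 本 is 0x5c, a backslash, while あ carries 0x22, the double quote.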

$BUFFER contains the line as edited thus far.  So, in the hiragana case,
there's an extra double quote hanging about that messes things up.

The completer style function tries to evaluate (something like):

_foo_bar="12345grep \033$B$"\033(B7"

which fails due to the extra double-quote.
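As a rough illustration of this failure mode, the sketch below pastes the ISO-2022-JP bytes into the assignment text and asks Python's shlex module whether the result still parses. Its POSIX mode only approximates zsh's quoting rules, and the sketch assumes, as the theory above does, that the expanded text gets re-parsed. The helper names would_parse and as_raw_text, and the values 12345 and 7 standing in for $HISTNO and $CURSOR, are hypothetical.

```python
import shlex

def would_parse(buffer, histno="12345", cursor="7"):
    """Hypothetical check: paste $HISTNO, $BUFFER and $CURSOR verbatim into
    _foo_bar="..." and re-parse it with shlex's POSIX rules (only a rough
    stand-in for zsh's quoting). Returns False on an unbalanced quote."""
    text = f'_foo_bar="{histno}{buffer}{cursor}"'
    try:
        shlex.split(text)
    except ValueError:  # shlex reports "No closing quotation"
        return False
    return True

def as_raw_text(ch):
    """ISO-2022-JP bytes of ch, mapped 1:1 onto code points via latin-1."""
    return ch.encode("iso2022_jp").decode("latin-1")

hon_line = "grep " + as_raw_text("本")
a_line = "grep " + as_raw_text("あ")
print(would_parse(hon_line))  # True: quotes stay balanced
print(would_parse(a_line))    # False: the 0x22 byte closes the quote early
```

With 本 the quotes stay balanced (the stray backslash is kept literally inside the double quotes), whereas the 0x22 byte in あ closes the string early and leaves the final quote unmatched.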

It might be as simple as using 'setopt multibyte' somewhere.

I'd check your terminal's encoding.  Unless you work with very esoteric
kanji, UTF-8 is your friend.  I couldn't get this to break with any of
xterm, rxvt, mlterm, rxvt-unicode, gnome-terminal, or konsole.

Another (improbable) point of failure is your input method emitting text in
the wrong encoding.  (In olden times, IMs often defaulted to particular
locales.)  (But that seems improbable, since you're seeing the characters
correctly?)

> [...] However, I could not find a pattern in which characters work and 
> which do not.

If the ISO-2022-JP theory holds any water, あ and ア should be the only
kana with a double quote, as should the following kanji (but no others in 
the jōyō (常用) set):
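The kana part of this prediction can be checked mechanically. The sketch below, again in Python with a hypothetical helper quote_chars, walks the Hiragana and Katakana blocks and keeps every character whose ISO-2022-JP encoding contains byte 0x22. It only sees characters that Python's iso2022_jp codec can encode (JIS X 0208); anything else is skipped.

```python
# Hypothetical helper: list the characters in a Unicode range whose
# ISO-2022-JP encoding contains byte 0x22 (the ASCII double quote).

def quote_chars(start, end):
    found = []
    for cp in range(start, end + 1):
        ch = chr(cp)
        try:
            encoded = ch.encode("iso2022_jp")
        except UnicodeEncodeError:
            continue  # not in the codec's character sets; skip it
        if b'"' in encoded:
            found.append(ch)
    return found

# Hiragana and Katakana blocks, U+3040 through U+30FF
print(quote_chars(0x3040, 0x30FF))  # ['あ', 'ア']
```

Only あ (JIS 0x2422) and ア (JIS 0x2522) turn up, since kana live in JIS rows 0x24 and 0x25 and only their second byte can be 0x22. Pointing the same helper at the CJK Unified Ideographs block (0x4E00 to 0x9FFF) would list kanji candidates as well, though it makes no attempt to restrict them to any particular kanji set.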

