Zsh Mailing List Archive

Re: UTF-8 input [was Re: PATCH: zle_params.c]

X-seq: zsh-workers 20762
From: Bart Schaefer <schaefer@xxxxxxxxxxxxxxxx>
To: Zsh hackers list <zsh-workers@xxxxxxxxxx>
Subject: Re: UTF-8 input [was Re: PATCH: zle_params.c]
Date: Mon, 31 Jan 2005 16:18:26 +0000
In-reply-to: <200501311146.j0VBki1g028832@xxxxxxxxxxxxxx>
Mailing-list: contact zsh-workers-help@xxxxxxxxxx; run by ezmlm
References: <200501261806.j0QI6Q2d021854@xxxxxxxxxxxxxx> <20050129034740.GA21742@xxxxxxxxxxx> <20050130010754.6F985863A@xxxxxxxxxxxxxxxxxxxxxxxx> <1050130063525.ZM24312@xxxxxxxxxxxxxxxxxxxxxxx> <200501311146.j0VBki1g028832@xxxxxxxxxxxxxx>

On Jan 31, 11:46am, Peter Stephenson wrote:
} Subject: Re: UTF-8 input [was Re: PATCH: zle_params.c]
}
} > Otherwise don't you have issues if what the user really means to
} > bind to self-insert is a single-byte character that happens to have
} > the high bit set?
}
} Hmmm... you mean that on a system where mbrtowc() reports that a
} single-byte character is incomplete, the user might nonetheless want to
} insert a single-byte character onto the command line?

No.  I mean, suppose the user uses the same .zshrc in both an iso-8859-*
and a UTF-8 locale, and has an explicit bindkey command which is intended
to work only in the iso-8859-* locale.  That bindkey happens to use a
character for which, in the UTF-8 locale, mbrtowc() reports incomplete.
This was in part why I added the footnote asking about plans for UTF-8
in shell scripts; is it even possible to have the same .zshrc in these
cases?

However, I wasn't thinking very clearly, since mbrtowc() won't report
incomplete for an iso-8859-* character if LC_CTYPE is set correctly.
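
To make the "incomplete" case concrete, here is a minimal sketch of how the
result of mbrtowc() depends on LC_CTYPE.  The helper classify_bytes() and the
seq_status enum are hypothetical names used only for illustration; they are
not part of zsh.  The locale name passed in is also an assumption, since the
names available differ from system to system, so the helper reports when the
locale cannot be selected.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Possible outcomes of converting the first character of a byte sequence. */
enum seq_status { SEQ_COMPLETE, SEQ_INCOMPLETE, SEQ_INVALID, SEQ_NO_LOCALE };

/*
 * Hypothetical helper (not zsh code): select LC_CTYPE by name, then ask
 * mbrtowc() how it sees the first `len` bytes of `buf`.
 */
static enum seq_status
classify_bytes(const char *locale_name, const char *buf, size_t len)
{
    mbstate_t state;
    wchar_t wc;
    size_t ret;

    assert(buf != NULL && len > 0);

    /* The requested locale may not be installed on this system. */
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return SEQ_NO_LOCALE;

    /* Start from the initial conversion state. */
    memset(&state, 0, sizeof state);
    ret = mbrtowc(&wc, buf, len, &state);

    if (ret == (size_t)-2)
        return SEQ_INCOMPLETE;  /* valid prefix, more bytes needed */
    if (ret == (size_t)-1)
        return SEQ_INVALID;     /* not a valid sequence in this locale */
    return SEQ_COMPLETE;        /* one whole character was converted */
}
```

With a UTF-8 locale selected, the single byte "\xc3" should come back as
SEQ_INCOMPLETE, because it is only the lead byte of a two-byte sequence, while
"\xc3\xa9" is a complete character.  With a single-byte iso-8859-* locale
selected, the same lone byte would be expected to convert as a complete
character, which is the point made above.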

I'm still worried about the case where that bindkey exists but is for a
function other than self-insert.  If multibyte translation is handled by
a widget at the same priority as all other widgets, that "stray" bindkey
can mess up the whole scheme.

} In other words, are you supposing this is some kind of fallback in
} case the locale isn't set correctly, e.g. it's set to UTF-8 but on an
} xterm with character set ISO-8859-1?

That was probably what was in my head, but on reflection it's not really
something that the shell can deal with.
