Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: Some groundwork for Unicode in Zle



Peter wrote:
> from how the line is encoded internally.  We can use wchar_t inside and
> pass back a multibyte string.

Good to see this being addressed. How do you plan to cope with encoding
nulls if you use wchar_t? (or does zle not bother?) The whole meta stuff
is what really scared me off ever touching this.

> I've made a very dull patch that does a few things that might make adding
> Unicode support to Zle easier.  Actually, I think within Zle it should
> be easy to use generic wchar_t's and not worry about whether they're
> really Unicode, but I still propose to rely on __STDC_ISO_10646__ to

Why? Relying on __STDC_ISO_10646__ will rule out a good number of
systems that do otherwise have good support for multibyte encodings such
as UTF-8. __STDC_ISO_10646__ is defined on surprisingly few systems. We
really don't care about what wchar_t is internally if we let libc do our
conversions.
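
To illustrate the point, here is a minimal sketch that walks a multibyte string with the standard mbrtowc() and never looks at what a wchar_t value means, so it does not depend on __STDC_ISO_10646__. The helper name count_chars is hypothetical and not part of any patch; the result depends on the locale the caller has set with setlocale().

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: count the characters in a NUL-terminated
 * multibyte string in the current locale's encoding.
 * Returns (size_t)-1 if the string is invalid or ends mid-character.
 */
size_t count_chars(const char *s)
{
    mbstate_t st;
    size_t left = strlen(s);
    size_t count = 0;

    memset(&st, 0, sizeof st);      /* start in the initial shift state */
    while (left > 0) {
        wchar_t wc;                 /* value is never inspected */
        size_t n = mbrtowc(&wc, s, left, &st);

        if (n == (size_t)-1 || n == (size_t)-2)
            return (size_t)-1;      /* invalid or truncated sequence */
        if (n == 0)
            n = 1;                  /* defensive: NUL consumes one byte */
        s += n;
        left -= n;
        count++;
    }
    return count;
}
```

The only thing the code relies on is that libc can convert the locale's encoding to some wide form; whether that form is UCS-4 never matters.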

> Before I get to details of what I've patched so far, one question: how
> do we turn input into characters?  My first thought was to do it at a low
> level around getkey, possibly in getkeybuf which already does

That would seem more sensible to me. Allowing partial multi-byte
sequences to be bound is not very nice and probably not very useful.
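
A rough sketch of what assembling characters at that low level could look like, assuming input arrives one byte at a time. The name feed_byte and its return convention are made up for illustration and are not taken from getkey or getkeybuf; it relies only on the standard mbrtowc() and its mbstate_t argument, which keeps track of a partially read sequence between calls.

```c
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper: feed one input byte into the conversion state.
 * Returns  1 and stores the character in *out once one is complete,
 *          0 if more bytes are needed to finish the character,
 *         -1 if the bytes seen so far do not form a valid character.
 */
int feed_byte(mbstate_t *st, unsigned char byte, wchar_t *out)
{
    char c = (char)byte;
    size_t n = mbrtowc(out, &c, 1, st);

    if (n == (size_t)-2)
        return 0;                   /* incomplete: progress is kept in *st */
    if (n == (size_t)-1) {
        memset(st, 0, sizeof *st);  /* state is unspecified after an error */
        return -1;
    }
    return 1;                       /* n is 0 for a NUL byte, otherwise 1 */
}
```

A caller would only hand a character to the key-binding layer when the function returns 1, so partial sequences never become bindable. In a UTF-8 locale one would expect a two-byte sequence to produce 0 for its first byte and 1 for its second. A NUL byte comes back as a complete character with value L'\0', so the caller has to carry lengths rather than rely on termination.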

> The actual Unicode-related changes are minimal.  system.h shows how I

Did you mean to attach an actual patch?

> #if defined(HAVE_WCHAR_H) && defined(HAVE_WCTOMB) && defined (__STDC_ISO_10646__)
> # include <wchar.h>

You'll probably want to include wchar.h even if __STDC_ISO_10646__ is
not defined. For \u/\U, wchar_t was only useful when converting from
Unicode to wchar_t could be done trivially, that is, when
__STDC_ISO_10646__ is defined. Otherwise that code uses iconv or a
hardcoded UTF-8 conversion. For zle, I can't think of any instance where
you would care whether wchar_t is Unicode.
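
For reference, a hardcoded UTF-8 conversion of the kind mentioned above can be sketched as follows. This is an illustration, not the code zsh uses for \u/\U: the name encode_utf8 is hypothetical, and the sketch assumes the modern four-byte limit (code points up to U+10FFFF, surrogates rejected).

```c
#include <stddef.h>

/*
 * Hypothetical helper: encode code point cp as UTF-8 into buf, which
 * must have room for at least 4 bytes. Returns the number of bytes
 * written, or 0 if cp is a surrogate or lies above U+10FFFF.
 */
size_t encode_utf8(unsigned long cp, unsigned char *buf)
{
    if (cp < 0x80) {                        /* 1 byte: 0xxxxxxx */
        buf[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                       /* 2 bytes: 110xxxxx 10xxxxxx */
        buf[0] = (unsigned char)(0xC0 | (cp >> 6));
        buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                           /* surrogates are not characters */
    if (cp < 0x10000) {                     /* 3 bytes */
        buf[0] = (unsigned char)(0xE0 | (cp >> 12));
        buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                   /* 4 bytes */
        buf[0] = (unsigned char)(0xF0 | (cp >> 18));
        buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                               /* out of range */
}
```

Nothing here touches wchar_t at all, which is why this path works whether or not __STDC_ISO_10646__ is defined.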

> /*
>  * More stringent requirements to enable complete Unicode conversion
>  * between wide characters and multibyte strings.
>  */
> #if defined(HAVE_MBTOWC)
> /*#define ZLE_UNICODE_SUPPORT	1*/

I don't quite follow the logic of that check.

I wouldn't have thought ZLE_UNICODE_SUPPORT is a good name for the
define. The requirement is to support multibyte character encodings, not
specifically "unicode" and the same define will probably be extended to
areas outside of zle. How about ENABLE_MULTIBYTE, perhaps linked to a
configure --disable-multibyte option?

> typedef wchar_t *ZLE_STRING_T;
> #else
> typedef int ZLE_CHAR_T;

Why int and not unsigned char? Is it really worth having the separate
STRING type? Again, I wouldn't use "ZLE" in the name given that we may
want to use it outside zle someday.

> All the tests still pass, so I will commit this some time today.

Would it be worth creating a separate branch for multibyte support? It
could later become 4.3. If so I'd suggest we continue to commit
everything non-multibyte related to the current branch to avoid the old
issue of the current release being very old.

Oliver


