Zsh Mailing List Archive

Re: [PATCH] zle_refresh multibyte fix



Andrey Borzenkov wrote:
> The patch allows you to edit multibyte input (do not press TAB; it will
> crash zsh).

Hmm, I think Clint had already tried to write it so that it used
multibyte strings.  But whatever works.

> There are some bits missing, and most confusing is {lr,}prompt 
> treatment that is still mb and not wc.

I think these could be converted when zleread() starts (and freed at the
end if necessary).
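
A minimal sketch of that convert-once, free-at-the-end idea follows. The
helper name prompt_to_wide is hypothetical and not part of zsh, and the
sketch assumes the POSIX behaviour of mbstowcs() with a null destination
(it returns the number of wide characters needed). Real prompts also
contain escape sequences, which this ignores.

```c
#include <stdlib.h>
#include <wchar.h>

/*
 * Hypothetical helper: convert a multibyte prompt to a freshly
 * allocated wide-character string in the current locale.
 * Returns NULL if the prompt is not a valid multibyte string or if
 * allocation fails.  The caller frees the result with free().
 */
wchar_t *prompt_to_wide(const char *mb, size_t *outlen)
{
    /* With a null destination, POSIX mbstowcs() only counts. */
    size_t n = mbstowcs(NULL, mb, 0);
    wchar_t *w;

    if (n == (size_t)-1)
        return NULL;            /* invalid multibyte sequence */

    w = malloc((n + 1) * sizeof(wchar_t));
    if (w == NULL)
        return NULL;

    mbstowcs(w, mb, n + 1);     /* n characters plus the terminator */
    if (outlen != NULL)
        *outlen = n;
    return w;
}
```

The conversion would be done once on entry to the editor and the result
freed on exit, so the refresh code only ever sees wide characters.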

> Actually I find wc stuff very easy and suitable for using as internal 
> representation in zsh core. But this is separate topic.

Apart from the inefficiency of extending every byte that comes into the
shell into (typically) a four-byte integer, we can't rely on input and
output bytes being a valid wide character in the current locale at all.
I think the shell has to handle arbitrary strings of bytes without
mutilating them.  Consider, for example:

  # Pass secret byte to my utility
  my_utility $'\xff'

(or any other string you like, the only point being that it isn't a
valid multibyte character string).  I don't see why we should
arbitrarily decide that doesn't work because it doesn't convert to a
wide character.  It will simply break far too many things.
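
To make the point concrete, a check along the following lines reports
whether a byte string decodes cleanly in the current locale.  The name
is_valid_mb is made up for illustration and is not zsh code.  In a UTF-8
locale a lone 0xff byte never decodes, yet the shell still has to pass
it through unchanged.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical check: return 1 if the len bytes at s form a complete,
 * valid multibyte string in the current locale, 0 otherwise.
 * The caller is assumed to have set the locale already, e.g. with
 * setlocale(LC_CTYPE, "").
 */
int is_valid_mb(const char *s, size_t len)
{
    mbstate_t st;

    memset(&st, 0, sizeof st);  /* start in the initial shift state */
    while (len > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, s, len, &st);

        if (n == (size_t)-1 || n == (size_t)-2)
            return 0;           /* invalid or truncated sequence */
        if (n == 0)
            n = 1;              /* embedded NUL: assume one byte consumed */
        s += n;
        len -= n;
    }
    return 1;
}
```

A string that fails this check is still a perfectly good argument to
pass to a command, which is why it can't simply be rejected or rewritten.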

However, in any case this isn't going to change soon.

> This does not use VARARR as is, I can add it in committed patch if deemed 
> necessary. Where can I find more info about it?

See system.h; it's fairly simple: type, name, size.
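
As a rough illustration of the (type, name, size) shape only, a macro of
this kind could be written as below.  VARARR_SKETCH and reverse_bytes are
invented names and the sketch assumes a C99 compiler with variable-length
arrays; the real VARARR definition in system.h may well differ.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for VARARR: declares `name` as an array of
 * `size` elements of `type`, using a C99 variable-length array. */
#define VARARR_SKETCH(type, name, size) type name[size]

/* Reverse n bytes in place via a temporary buffer sized at run time. */
void reverse_bytes(char *buf, size_t n)
{
    size_t i;

    if (n == 0)
        return;                 /* a VLA must have a nonzero size */

    VARARR_SKETCH(char, tmp, n);
    memcpy(tmp, buf, n);
    for (i = 0; i < n; i++)
        buf[i] = tmp[n - 1 - i];
}
```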

> Please test it without ZLE_UNICODE_SUPPORT.

There's a comma missing between fwrite arguments, and ZS_memset is
incorrectly defined to wmemset in this case.  Otherwise it seems OK
after a quick test.
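
For the ZS_memset point, the intended shape is roughly the following.
This is a sketch under assumptions, not the definitions from the patch
or from zle.h: the fill_blank helper is invented, and the macro bodies
are a guess at the pattern being described.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

#ifdef ZLE_UNICODE_SUPPORT
/* Wide-character build: elements are wchar_t, so use wmemset. */
typedef wchar_t ZLE_CHAR_T;
#define ZS_memset(d, c, n) wmemset((d), (c), (n))
#else
/* Byte build: elements are single bytes, so plain memset is right.
 * Using wmemset here would write sizeof(wchar_t) bytes per element
 * and overrun the buffer. */
typedef unsigned char ZLE_CHAR_T;
#define ZS_memset(d, c, n) memset((d), (c), (n))
#endif

/* Hypothetical helper: blank out n elements of an editor buffer. */
void fill_blank(ZLE_CHAR_T *buf, size_t n)
{
    ZS_memset(buf, ' ', n);
}
```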

> I may have got confused by ZLE_CHAR_T vs. ZLE_STRING_T; Peter, please take a 
> look at whether the usage is right.

Basically, any existing int that holds a character should be ZLE_CHAR_T.
(I'm coming to the view that I should have made it wint_t, not wchar_t,
and dropped ZLE_INT_T, since the whole point of using int instead of a
character in the old code was to hold EOF; or maybe ZLE_INT_T is the
right one to keep.)  Any char * or unsigned char * that refers to an
array which now holds wide characters should be ZLE_STRING_T.
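
The distinction between the three roles can be sketched with the standard
wide-character types.  The SK_* typedefs and sk_next below are invented
stand-ins for illustration, not the zsh definitions; the point is only
that the "int" type must be able to hold an end-of-input value (WEOF) in
addition to every character.

```c
#include <stddef.h>
#include <wchar.h>

typedef wchar_t  SK_CHAR_T;     /* stands in for ZLE_CHAR_T   */
typedef wchar_t *SK_STRING_T;   /* stands in for ZLE_STRING_T */
typedef wint_t   SK_INT_T;      /* stands in for ZLE_INT_T    */

/*
 * Return the next character of a wide string of length len, advancing
 * *pos, or WEOF once the end is reached.  The return type is the
 * "int" flavour precisely so that WEOF fits alongside real characters.
 */
SK_INT_T sk_next(SK_STRING_T s, size_t len, size_t *pos)
{
    if (*pos >= len)
        return WEOF;
    return (SK_INT_T)s[(*pos)++];
}
```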

-- 
Peter Stephenson <pws@xxxxxxx>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070




