Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: bug in completion/expansion of files with LANG=C

On Fri, 6 Jan 2006 13:58:29 -0800
Wayne Davison <wayned@xxxxxxxxxxxxxxxxxxxxx> wrote:
> What should zsh do with characters that are outside the current
> character set?  Display them as \M-* values?  A zsh without multibyte
> support displays the name as hmm-\M-C\M-$ when being listed by the
> completion system, but inserts the name into the command-line as literal
> characters.  Perhaps a multibyte-enabled zsh should edit these illegal
> characters on the command-line as 4-byte-wide \M-* values; and go back
> to displaying them in completion lists that way too?

There are two difficulties that I can see at the moment.

First, and more fundamentally, we don't have any way of representing
invalid characters now that zle uses wchar_t internally (and that's not
going to change since it works very well).  If we can't convert a
multibyte character to a wchar_t we therefore can't do anything with it.
We would need to add a flag for each character position to indicate that
it contained, say, a byte-wide chunk of a character that we couldn't
convert.  That's a little hairy and messes up any attempt to convert a
complete wide string in one go.
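
To make the per-position flag idea concrete, here is a rough sketch in C. It is not zle code: cell_t, decode_line() and encode_line() are hypothetical names made up for illustration, and the only real library calls are the standard mbrtowc() and wcrtomb(). Each cell holds either a converted wchar_t or one raw byte that could not be converted, so the original bytes can be written back unchanged.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical cell type (not a zle structure): one command-line position. */
typedef struct {
    wchar_t ch;        /* converted character, or the byte value if is_raw_byte */
    int is_raw_byte;   /* non-zero: ch is one byte that failed to convert */
} cell_t;

/* Convert len bytes into cells; out must have room for len cells.
 * Returns the number of cells written. */
size_t decode_line(const char *in, size_t len, cell_t *out)
{
    mbstate_t st;
    size_t i = 0, n = 0;

    assert(in != NULL && out != NULL);
    memset(&st, 0, sizeof st);
    while (i < len) {
        wchar_t wc;
        size_t r = mbrtowc(&wc, in + i, len - i, &st);

        if (r == (size_t)-1 || r == (size_t)-2) {
            /* Invalid or incomplete sequence: keep one byte verbatim. */
            out[n].ch = (wchar_t)(unsigned char)in[i];
            out[n].is_raw_byte = 1;
            n++;
            i++;
            memset(&st, 0, sizeof st);   /* state is undefined after an error */
        } else {
            if (r == 0)
                r = 1;                   /* embedded NUL consumes one byte */
            out[n].ch = wc;
            out[n].is_raw_byte = 0;
            n++;
            i += r;
        }
    }
    return n;
}

/* Convert cells back to bytes. Returns the byte count, or (size_t)-1 if
 * a character cannot be converted or the buffer is too small. */
size_t encode_line(const cell_t *in, size_t n, char *out, size_t outsize)
{
    mbstate_t st;
    size_t i, used = 0;

    memset(&st, 0, sizeof st);
    for (i = 0; i < n; i++) {
        if (in[i].is_raw_byte) {
            if (used + 1 > outsize)
                return (size_t)-1;
            out[used++] = (char)in[i].ch;   /* raw byte goes out untouched */
        } else {
            char buf[MB_LEN_MAX];
            size_t r = wcrtomb(buf, in[i].ch, &st);

            if (r == (size_t)-1 || used + r > outsize)
                return (size_t)-1;
            memcpy(out + used, buf, r);
            used += r;
        }
    }
    return used;
}
```

Feeding the six bytes of the file name from Wayne's example ("hmm-" followed by 0xC3 0xA4) through decode_line() and then encode_line() should give back the same six bytes whatever the locale.  The cost is the one mentioned above: the loop has to go character by character, so a whole string can no longer be handed to a single conversion call.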

Alternatively, we could convert the characters to a \M- or other
representation on input, but that doesn't help much, since we would
still need some way to mark that a group of characters has to be
converted back to a single byte on output.

The big difference from the old single-byte code is that there we knew
that every byte could be treated in that fashion.  It's mixing the two
that's the difficulty.

Second, and less difficult, it's quite a big change to have characters
in the command line displayed differently from the way they would
naturally be output.  No doubt it could be done, perhaps by an extra
stage of mapping in zrefresh().  It would be quite helpful to display
them with some terminal effect, too, so that they stand out from
ordinary text.

Once that were done, it wouldn't be too hard to have different sorts of
mapping, so you could pick between a \M- representation and a hex code.
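
As an illustration of such a selectable mapping, here is a small C sketch.  The names format_raw_byte(), RAW_META and RAW_HEX are hypothetical and nothing like them exists in zsh as far as this message is concerned.  The \M- form follows the usual meta convention of clearing the top bit (with ^X for control characters), which matches the hmm-\M-C\M-$ listing quoted above; the exact hex format is just an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical display styles for a byte that could not be converted. */
enum raw_style { RAW_META, RAW_HEX };

/* Write a printable form of byte b into buf (at most size bytes, always
 * NUL-terminated).  Returns the length of the full representation, as
 * snprintf() does. */
int format_raw_byte(unsigned char b, enum raw_style style,
                    char *buf, size_t size)
{
    unsigned char low = b & 0x7f;   /* byte with the top bit cleared */

    assert(buf != NULL && size > 0);

    /* Hex style; also used for 7-bit bytes, which have no meta form. */
    if (style == RAW_HEX || b < 0x80)
        return snprintf(buf, size, "\\x%02X", (unsigned)b);

    /* Control characters (and DEL) under the meta bit: \M-^A, \M-^? */
    if (low < 0x20 || low == 0x7f)
        return snprintf(buf, size, "\\M-^%c", low ^ 0x40);

    return snprintf(buf, size, "\\M-%c", low);
}
```

With RAW_META, the bytes 0xC3 and 0xA4 come out as \M-C and \M-$, i.e. the form the non-multibyte completion listing already shows; with RAW_HEX they come out as \xC3 and \xA4.  A mapping stage in zrefresh() would presumably call something of this shape for each position marked as an unconverted byte.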

Peter Stephenson <p.w.stephenson@xxxxxxxxxxxx>
Web page still at http://www.pwstephenson.fsnet.co.uk/
