Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: zsh/complist colours improperly handle multibyte characters



On Sun, 23 Oct 2016 12:34:16 -0700
Bart Schaefer <schaefer@xxxxxxxxxxxxxxxx> wrote:
> On Oct 23,  7:59pm, Peter Stephenson wrote:
> } Subject: Re: zsh/complist colours improperly handle multibyte characters
> }
> } On Sun, 23 Oct 2016 10:56:52 -0700
> } Bart Schaefer <schaefer@xxxxxxxxxxxxxxxx> wrote:
> } > No, sorry, this is a UTF-8 full-line-height vertical-bar, not ascii pipe.
> } > It's incorrectly interpreted as a left angle bracket pattern character,
> } > if that BUG message is accurate.
> } 
> } Ah, then there's a good chance this is indeed a problem with
> } zshtokenize.  We probably ought at least to pass through metafied
> } characters.  I don't know that fits this particular case, but it's the
> } obvious problem.
> 
> Nope, the zshtokenize patch doesn't help in this case at all.  I still
> get the BUG: message.  Strangely (?) I do NOT get that message if I use
> the character directly in a pattern expression such as [[ ... ]], so
> it has something to do with the way compdescribe is passing it around.

For me, metafying the string in complist before it goes to patcompile()
was enough to fix this.  The patch I posted makes it a bit safer in
theory, though I don't think we actually hit that problem in practice.

In the previous code, the input string is

* E2 94 82 *

That 94 byte looks like a token.  After tokenisation we get

87 E2 94 82 87

The * has become a token (87), but 94 still looks like a token because
nothing protects it.  So when the pattern compiler meets what looks like
an incomplete multibyte character, it turns 94 back into the
corresponding string form, '<'.  This makes the pattern look invalid, so
it gives up.  Later, "<" turns up as a token, which fails because no
numeric range expression (as in <1-10>) follows it.

To fix this safely, we first need to metafy the input string,

* E2 83 B4 82 *

then tokenise it with the change I posted earlier, which skips Meta,

87 E2 83 B4 82 87

What the extra change does is make sure that 83 B4 passes through
unchanged: a metafied character is by definition escaped from
tokenisation.  However, because metafication only applies to bytes with
bit 7 set, and we never tokenise such a byte, I don't think it actually
makes a difference.  But I've left it in as it respects the intention.

pws

diff --git a/Src/Zle/complist.c b/Src/Zle/complist.c
index 39ac782..d4672a1 100644
--- a/Src/Zle/complist.c
+++ b/Src/Zle/complist.c
@@ -415,6 +415,7 @@ getcoldef(char *s)
 		break;
 	    *s++ = '\0';
 	}
+	p = metafy(p, strlen(p), META_USEHEAP);
 	tokenize(p);
 	if ((prog = patcompile(p, 0, NULL))) {
 	    Patcol pc, po;
diff --git a/Src/glob.c b/Src/glob.c
index a845c5f..50f6dce 100644
--- a/Src/glob.c
+++ b/Src/glob.c
@@ -3499,6 +3499,10 @@ zshtokenize(char *s, int flags)
     for (; *s; s++) {
       cont:
 	switch (*s) {
+	case Meta:
+	    /* skip both Meta and following character */
+	    s++;
+	    break;
 	case Bnull:
 	case Bnullkeep:
 	case '\\':


