Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: Another idea on how to insert illegal multibyte characters



On Thu, Jan 12, 2006 at 09:23:19AM +0000, Peter Stephenson wrote:
> The completion system is a bit more quoting aware: it knows whether or
> not it needs to insert a backslash before special characters because of
> quotes earlier on the line.  Ideally it should handle unprintable
> characters at the same point where it tries to do that.  That doesn't
> need to be done at the same time, though.  (I would hope it could be
> done independently and prevent the equivalent code inside zle kicking
> in.)

The attached patch is an alternative to my older patch that changed
stringaszleline().  This one changes add_match_data() instead, so the
conversion happens early enough that zsh could be taught how to insert
the $'\123' sequences into single- or double-quoted strings (though it
does not do this yet).  This patch also fixes the updating glitch that
I mentioned my last patch had.
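
For anyone who wants to try the conversion outside of zsh, here is a
minimal standalone sketch of the same loop.  It is an illustration
rather than the patch itself: the names octal_escape() and
escape_invalid_bytes() are hypothetical, it uses malloc() where the
patch uses zhalloc(), and it spells out (size_t)-1 and (size_t)-2
(the standard mbrtowc() error returns) where the patch uses the
MB_INVALID and MB_INCOMPLETE macros.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: write the 7-byte sequence $'\ooo' (three octal
 * digits) for one byte into "out".  Returns the number of bytes written. */
static size_t octal_escape(unsigned char b, char *out)
{
    out[0] = '$';
    out[1] = '\'';
    out[2] = '\\';
    out[3] = (char)('0' + ((b >> 6) & 7));
    out[4] = (char)('0' + ((b >> 3) & 7));
    out[5] = (char)('0' + (b & 7));
    out[6] = '\'';
    return 7;
}

/* Hypothetical helper: return a malloc()ed copy of "str" in which every
 * byte that mbrtowc() cannot convert in the current locale is replaced
 * by a $'\ooo' sequence.  The caller frees the result. */
static char *escape_invalid_bytes(const char *str)
{
    size_t len = strlen(str);
    /* Worst case: every input byte expands to 7 output bytes. */
    char *out = malloc(len * 7 + 1);
    char *t = out;
    const char *f = str;
    mbstate_t mbs;
    int eol = 0;

    if (!out)
        return NULL;
    memset(&mbs, 0, sizeof mbs);
    while (len > 0) {
        wchar_t wc;
        /* After an incomplete sequence at the end of the string, treat
         * every remaining byte as invalid. */
        size_t cnt = eol ? (size_t)-1 : mbrtowc(&wc, f, len, &mbs);

        if (cnt == (size_t)-2)
            eol = 1;
        if (cnt == (size_t)-1 || cnt == (size_t)-2) {
            /* Get mbs out of its undefined state, then escape one byte. */
            memset(&mbs, 0, sizeof mbs);
            t += octal_escape((unsigned char)*f, t);
            f++;
            len--;
        } else {
            /* A converted '\0' reports 0 but still occupies one byte. */
            if (cnt == 0)
                cnt = 1;
            memcpy(t, f, cnt);
            t += cnt;
            f += cnt;
            len -= cnt;
        }
    }
    *t = '\0';
    return out;
}
```

Which bytes count as invalid depends on the locale in effect (call
setlocale(LC_CTYPE, "") first to pick up the user's charset), so only
plain ASCII input is guaranteed to come back unchanged everywhere.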

I think this would be good enough to include in the next release.  It
would at least make it possible to complete filenames that contain
invalid charset sequences, which is better than the current truncation.
Thoughts?

One caveat about my renaming of "sl" to "stl":  add_match_data() had two
variables with the same name (one more deeply nested), so I changed the
outer one (which holds the length of "str") to be "stl".

..wayne..
--- Src/Zle/compcore.c	15 Nov 2005 08:44:18 -0000	1.78
+++ Src/Zle/compcore.c	11 Feb 2006 09:44:45 -0000
@@ -2227,10 +2227,15 @@ add_match_data(int alt, char *str, char 
 	       char *psuf, Cline sline,
 	       char *suf, int flags, int exact)
 {
+#ifdef MULTIBYTE_SUPPORT
+    mbstate_t mbs;
+    char *t, *f, *new_str = NULL;
+    int fl, eol = 0;
+#endif
     Cmatch cm;
     Aminfo ai = (alt ? fainfo : ainfo);
     int palen, salen, qipl, ipl, pl, ppl, qisl, isl, psl;
-    int sl, lpl, lsl, ml;
+    int stl, lpl, lsl, ml;
 
     palen = salen = qipl = ipl = pl = ppl = qisl = isl = psl = 0;
 
@@ -2445,6 +2450,59 @@ add_match_data(int alt, char *str, char 
 	    line = p;
 	}
     }
+
+    stl = strlen(str);
+#ifdef MULTIBYTE_SUPPORT
+    /* If "str" contains a character that won't convert into a wide
+     * character, change it into a $'\123' sequence. */
+    memset(&mbs, '\0', sizeof mbs);
+    for (t = f = str, fl = stl; fl > 0; ) {
+	wchar_t wc;
+	size_t cnt = eol ? MB_INVALID : mbrtowc(&wc, f, fl, &mbs);
+	switch (cnt) {
+	case MB_INCOMPLETE:
+	    eol = 1;
+	    /* FALL THROUGH */
+	case MB_INVALID:
+	    /* Get mbs out of its undefined state. */
+	    memset(&mbs, '\0', sizeof mbs);
+	    if (!new_str) {
+		/* Be very pessimistic about how much space we'll need. */
+		new_str = zhalloc(stl*7 + 1);
+		memcpy(new_str, str, t - str);
+		t = new_str + (t - str);
+	    }
+	    *t++ = '$';
+	    *t++ = '\'';
+	    *t++ = '\\';
+	    *t++ = '0' + ((STOUC(*f) >> 6) & 7);
+	    *t++ = '0' + ((STOUC(*f) >> 3) & 7);
+	    *t++ = '0' + (STOUC(*f) & 7);
+	    *t++ = '\'';
+	    f++;
+	    fl--;
+	    break;
+	case 0:
+	    /* Converting '\0' returns 0, but a '\0' is a real
+	     * character for us, so we should consume 1 byte
+	     * (certainly true for Unicode and unlikely to be false
+	     * in any non-pathological multibyte representation). */
+	    cnt = 1;
+	    /* FALL THROUGH */
+	default:
+	    fl -= cnt;
+	    while (cnt--)
+		*t++ = *f++;
+	    break;
+	}
+    }
+    if (new_str) {
+	*t = '\0';
+	str = new_str;
+	stl = strlen(str);
+    }
+#endif
+
     /* Allocate and fill the match structure. */
     cm = (Cmatch) zhalloc(sizeof(struct cmatch));
     cm->str = str;
@@ -2539,10 +2597,9 @@ add_match_data(int alt, char *str, char 
     if (!ai->firstm)
 	ai->firstm = cm;
 
-    sl = strlen(str);
     lpl = (cm->ppre ? strlen(cm->ppre) : 0);
     lsl = (cm->psuf ? strlen(cm->psuf) : 0);
-    ml = sl + lpl + lsl;
+    ml = stl + lpl + lsl;
 
     if (ml < minmlen)
 	minmlen = ml;
@@ -2566,7 +2623,7 @@ add_match_data(int alt, char *str, char 
 		    e += lpl;
 		}
 		strcpy(e, str);
-		e += sl;
+		e += stl;
 		if (cm->psuf)
 		    strcpy(e, cm->psuf);
 		comp_setunset(0, 0, CP_EXACTSTR, 0);

