Zsh Mailing List Archive

Re: Unicode problem



On Mon, 21 Jan 2008 10:16:49 -0800
Bart Schaefer <schaefer@xxxxxxxxxxxxxxxx> wrote:
> In the line editor I'm not so sure.  Treating it like a non-printable
> character seems like a good first step.

OK, here is a first step.  It turns out we haven't done very well with
unprintable wide characters anyway:  the only special handling is for
control characters, and the code is the same as for ASCII control
characters, which doesn't really work, since any control character above
the 8-bit range simply comes out as ^?.

So this covers any zero-width or unprintable character outside the range 0
to 255 when multibyte support is enabled.  Note that it uses the native wide
character type, not necessarily Unicode; I don't think it's appropriate at
this level to assume Unicode.  The character shows up as hex digits in
angle brackets: four digits, or eight if the value is above 0xffff.
Suggest improvements if you like, but it needs to be short.

Play with this and see if it works:  you can use insert-unicode-char to
insert character 0xfeff (ZERO WIDTH NO-BREAK SPACE).

As a possible way forward, I'd quite like to add functionality for
highlighting parts of the command line after 4.3.5.  (To be more accurate,
I'd quite like someone else to add it, but I don't think that's going to
happen.)  Doing this within zle_refresh.c is the easy (or easiest) bit.
The non-printable character could then be shown in reverse video, which
is clearer.

This obviously doesn't preclude adding combining character support, but
that's not going to happen today.

Index: Src/Zle/zle_refresh.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/Zle/zle_refresh.c,v
retrieving revision 1.52
diff -u -r1.52 zle_refresh.c
--- Src/Zle/zle_refresh.c	8 Jan 2008 15:07:02 -0000	1.52
+++ Src/Zle/zle_refresh.c	22 Jan 2008 09:54:30 -0000
@@ -447,6 +447,10 @@
     int tmpalloced;		/* flag to free tmpline when finished        */
     int remetafy;		/* flag that zle line is metafied            */
     struct rparams rpms;
+#ifdef MULTIBYTE_SUPPORT
+    int width;                  /* width of wide character                   */
+#endif
+
     
     /* If this is called from listmatches() (indirectly via trashzle()), and *
      * that was called from the end of zrefresh(), then we don't need to do  *
@@ -633,8 +637,7 @@
 		while ((++t0) & 7);
 	}
 #ifdef MULTIBYTE_SUPPORT
-	else if (iswprint(*t)) {
-	    int width = wcwidth(*t);
+	else if (iswprint(*t) && (width = wcwidth(*t)) > 0) {
 	    if (width > rpms.sen - rpms.s) {
 		/*
 		 * Too wide to fit.  Insert spaces to end of current line.
@@ -649,7 +652,7 @@
 		    rpms.nvcs = rpms.s - nbuf[rpms.nvln = rpms.ln];
 		}
 	    }
-	    if (width > rpms.sen - rpms.s) {
+	    if (width > rpms.sen - rpms.s || width == 0) {
 		/*
 		 * The screen width is too small to fit even one
 		 * occurrence.
@@ -663,7 +666,11 @@
 	    }
 	}
 #endif
-	else if (ZC_icntrl(*t)) {	/* other control character */
+	else if (ZC_icntrl(*t)
+#ifdef MULTIBYTE_SUPPORT
+		 && (unsigned)*t <= 0xffU
+#endif
+	    ) {	/* other control character */
 	    *rpms.s++ = ZWC('^');
 	    if (rpms.s == rpms.sen) {
 		/* text wrapped */
@@ -671,9 +678,42 @@
 		    break;
 	    }
 	    *rpms.s++ = (((unsigned int)*t & ~0x80u) > 31) ? ZWC('?') : (*t | ZWC('@'));
-	} else {			/* normal character */
+	}
+#ifdef MULTIBYTE_SUPPORT
+	else {
+	    /*
+	     * Not printable or zero width.
+	     * Resort to hackery.
+	     */
+	    char dispchars[11];
+	    char *dispptr = dispchars;
+	    wchar_t wc;
+
+	    if ((unsigned)*t > 0xffffU) {
+		sprintf(dispchars, "<%.08x>", (unsigned)*t);
+	    } else {
+		sprintf(dispchars, "<%.04x>", (unsigned)*t);
+	    }
+	    while (*dispptr) {
+		if (mbtowc(&wc, dispptr, 1) == 1 /* paranoia */)
+		{
+		    *rpms.s++ = wc;
+		    if (rpms.s == rpms.sen) {
+			/* text wrapped */
+			if (nextline(&rpms, 1))
+			    break;
+		    }
+		}
+		dispptr++;
+	    }
+	    if (*dispptr) /* nextline said stop processing */
+		break;
+	}
+#else
+	else {			/* normal character */
 	    *rpms.s++ = *t;
 	}
+#endif
 	if (rpms.s == rpms.sen) {
 	    /* text wrapped */
 	    if (nextline(&rpms, 1))
-- 
Peter Stephenson <pws@xxxxxxx>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070


