Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: problem in prompt in utf-8



"Zvi Har'El" wrote:
> I have started using zsh-4.3.0 from the CVS, in a utf-8 locale. I enjoy it
> very much. However, I have a problem with the prompting. This is not new, but
> since the completion now works nicely, I thought I'd mention it, since it is
> not solved yet.

> /home/rl$ cd אבגדהוזחטיךכלםמןנסעףפץצקרשת 
> 
> The next prompt had invalid utf-8 sequences:
> 
> 
> /home/rl/������������לםמןנסעףפץצקרשת$ 

[This message uses raw 8-bit UTF-8, as the original did; hope this
came through OK, since I hacked the headers by hand.  MH in Emacs is a
bit antiquated.  I'm only surprised my system managed to display Hebrew
characters OK...  It doesn't actually matter apart from the quoted text
above.]

There was an inconsistency when formatting a string that contained a
character in the range reserved for tokens: conversion to the zsh
internal form (metafication) wasn't done correctly.  This particular
problem wasn't actually within zle, it was in the main shell and (as you
sort of indicated) wasn't directly related to multibyte characters.
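Metafication stores each byte that cannot appear raw in an internal string as the Meta byte followed by the original byte XORed with 32, the same `c ^ 32` used in the patch below. Here is a minimal sketch of the round trip. It assumes Meta is 0x83 and the tokens occupy 0x84 to 0x9b, which is consistent with the garbled letters before ל in the prompt above: their second UTF-8 bytes run from 0x90 to 0x9b. The hypo_ names and constants are hypothetical and are not zsh's own definitions.

```c
#include <stddef.h>

/* Assumed values for illustration only; zsh defines its own Meta and
 * token bytes, and the token range has changed between versions. */
#define HYPO_META      0x83
#define HYPO_TOK_FIRST 0x84
#define HYPO_TOK_LAST  0x9b

/* Bytes that cannot appear raw in the internal form: NUL (so strings stay
 * NUL-terminated), Meta itself (so it is never ambiguous) and the tokens. */
static int hypo_imeta(unsigned char c)
{
    return c == 0 || c == HYPO_META ||
           (c >= HYPO_TOK_FIRST && c <= HYPO_TOK_LAST);
}

/* Metafy len raw bytes into out, which must hold at least 2 * len + 1
 * bytes.  Returns the length of the result, not counting the final NUL. */
static size_t hypo_metafy(const unsigned char *in, size_t len, char *out)
{
    size_t o = 0;

    for (size_t i = 0; i < len; i++) {
        if (hypo_imeta(in[i])) {
            out[o++] = (char)HYPO_META;
            out[o++] = (char)(in[i] ^ 32);   /* escaped byte, never NUL */
        } else
            out[o++] = (char)in[i];
    }
    out[o] = '\0';
    return o;
}

/* Reverse hypo_metafy(): out needs at most strlen(in) bytes.  Returns the
 * number of raw bytes written (the output is not NUL-terminated). */
static size_t hypo_unmetafy(const char *in, unsigned char *out)
{
    size_t o = 0;

    for (const unsigned char *s = (const unsigned char *)in; *s; s++) {
        if (*s == HYPO_META && s[1])
            out[o++] = (unsigned char)(*++s ^ 32);
        else
            out[o++] = *s;
    }
    return o;
}
```

For the alef bytes d7 90 the internal form is d7 83 b0. Writing that form out, or interpreting it, without first undoing the escape is the kind of mismatch that puts invalid UTF-8 on the terminal.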

This should fix the immediate problem, but note that the width of the
prompt isn't calculated correctly yet: we don't scan prompts for
multibyte characters.  Hence you might see oddities with the display
since the shell doesn't know the position of the cursor after the
prompt.  This is another thing on the list of fixes needed in zle.  (It
should come under the "not rocket science" heading, unlike the
completion code, so I hope it will be fixed relatively soon.)
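To see why the cursor position goes wrong, compare the byte length of a prompt with the number of characters it displays. This is an illustration only, not zsh code. hypo_utf8_chars is a hypothetical helper that assumes valid UTF-8 and one column per character.

```c
#include <stddef.h>

/* Count the characters in a UTF-8 string by skipping continuation bytes
 * (bit pattern 10xxxxxx).  Assumes valid UTF-8 and one column per
 * character; double-width and combining characters would need mbrtowc()
 * and wcwidth() instead. */
static size_t hypo_utf8_chars(const char *s)
{
    size_t n = 0;

    for (const unsigned char *p = (const unsigned char *)s; *p; p++)
        if ((*p & 0xC0) != 0x80)
            n++;
    return n;
}
```

For the prompt "/home/rl/אב$ " this gives 13 characters against 15 bytes. A shell that measured the prompt in bytes would therefore believe the cursor to be two columns further right than it really is.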

Please do report any more of these inconsistencies; users who regularly
encounter character sets other than Latin-based ones are especially
valuable for finding them.

I hope I haven't caused any new problems... I think I caught all the
uses of nicechar() and made sure they expected metafied strings (output
now goes through zputs() rather than fputs()).  The first hunk is
tangential to the rest: on the way in, I noticed that the variable pwd
was metafied and so needed to be unmetafied on output.
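fputs() copies a metafied string byte for byte, so a Meta pair reaches the terminal as two bytes instead of the one raw byte it encodes. Below is a minimal sketch of an unmetafying writer in the spirit of zsh's zputs(), but not its actual code. hypo_zputs and HYPO_META (assumed to be 0x83) are hypothetical names.

```c
#include <stdio.h>

#define HYPO_META 0x83  /* assumed value of the Meta byte, for illustration */

/* Write a metafied string to f, turning each Meta pair back into the raw
 * byte it stands for.  Returns 0 on success or EOF on a write error, so a
 * caller can test the result with "< 0" as it would for fputs(). */
static int hypo_zputs(const char *s, FILE *f)
{
    for (const unsigned char *p = (const unsigned char *)s; *p; p++) {
        int c = *p;

        if (c == HYPO_META && p[1])
            c = *++p ^ 32;      /* undo the escape */
        if (fputc(c, f) == EOF)
            return EOF;
    }
    return 0;
}
```

Given the internal form of "/home/rl/א" (ending d7 83 b0), this writes the raw bytes d7 90. Written with fputs(), the same string would arrive as d7 83 b0, which is not valid UTF-8.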

Index: Src/builtin.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/builtin.c,v
retrieving revision 1.148
diff -u -r1.148 builtin.c
--- Src/builtin.c	9 Sep 2005 16:06:48 -0000	1.148
+++ Src/builtin.c	17 Sep 2005 18:09:24 -0000
@@ -699,7 +699,7 @@
 	else
 	    fmt = " ";
 	if (OPT_ISSET(ops,'l'))
-	    fputs(pwd, stdout);
+	    zputs(pwd, stdout);
 	else
 	    fprintdir(pwd, stdout);
 	for (node = firstnode(dirstack); node; incnode(node)) {
Index: Src/utils.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/utils.c,v
retrieving revision 1.89
diff -u -r1.89 utils.c
--- Src/utils.c	9 Sep 2005 20:34:42 -0000	1.89
+++ Src/utils.c	17 Sep 2005 18:10:08 -0000
@@ -146,7 +146,7 @@
 		putc('%', stderr);
 		break;
 	    case 'c':
-		fputs(nicechar(num), stderr);
+		zputs(nicechar(num), stderr);
 		break;
 	    case 'e':
 		/* print the corresponding message for this errno */
@@ -195,15 +195,21 @@
     return 0;
 }
 
-/* Turn a character into a visible representation thereof.  The visible *
- * string is put together in a static buffer, and this function returns *
- * a pointer to it.  Printable characters stand for themselves, DEL is  *
- * represented as "^?", newline and tab are represented as "\n" and     *
- * "\t", and normal control characters are represented in "^C" form.    *
- * Characters with bit 7 set, if unprintable, are represented as "\M-"  *
- * followed by the visible representation of the character with bit 7   *
- * stripped off.  Tokens are interpreted, rather than being treated as  *
- * literal characters.                                                  */
+/*
+ * Turn a character into a visible representation thereof.  The visible
+ * string is put together in a static buffer, and this function returns
+ * a pointer to it.  Printable characters stand for themselves, DEL is
+ * represented as "^?", newline and tab are represented as "\n" and
+ * "\t", and normal control characters are represented in "^C" form.
+ * Characters with bit 7 set, if unprintable, are represented as "\M-"
+ * followed by the visible representation of the character with bit 7
+ * stripped off.  Tokens are interpreted, rather than being treated as
+ * literal characters.
+ *
+ * Note that the returned string is metafied, so that it must be
+ * treated like any other zsh internal string (and not, for example,
+ * output directly).
+ */
 
 /**/
 mod_export char *
@@ -238,7 +244,17 @@
 	c += 0x40;
     }
     done:
-    *s++ = c;
+    /*
+     * The resulting string is still metafied, so check if
+     * we are returning a character in the range that needs metafication.
+     * This can't happen if the character is printed "nicely", so
+     * this results in a maximum of two bytes total (plus the null).
+     */
+    if (itok(c)) {
+	*s++ = Meta;
+	*s++ = c ^ 32;
+    } else
+	*s++ = c;
     *s = 0;
     return buf;
 }
@@ -292,7 +308,7 @@
 nicefputs(char *s, FILE *f)
 {
     for (; *s; s++)
-	fputs(nicechar(STOUC(*s)), f);
+	zputs(nicechar(STOUC(*s)), f);
 }
 #endif
 
@@ -3177,7 +3193,7 @@
 static char *
 nicedup(char const *s, int heap)
 {
-    int c, len = strlen(s) * 5;
+    int c, len = strlen(s) * 5 + 1;
     VARARR(char, buf, len);
     char *p = buf, *n;
 
@@ -3190,11 +3206,13 @@
 	}
 	if (c == Meta)
 	    c = *s++ ^ 32;
+	/* The result here is metafied */
 	n = nicechar(c);
 	while(*n)
 	    *p++ = *n++;
     }
-    return metafy(buf, p - buf, (heap ? META_HEAPDUP : META_DUP));
+    *p = '\0';
+    return heap ? dupstring(buf) : ztrdup(buf);
 }
 
 /**/
@@ -3228,7 +3246,7 @@
 	}
 	if (c == Meta)
 	    c = *s++ ^ 32;
-	if(fputs(nicechar(c), stream) < 0)
+	if(zputs(nicechar(c), stream) < 0)
 	    return EOF;
     }
     return 0;
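
The nicechar() and nicedup() changes in the patch can be modelled with a simplified sketch. Everything here is hypothetical. hypo_nicechar() replaces zsh's locale test with a flag saying whether bytes with bit 7 set may be shown as they are. The Meta (0x83) and token (0x84 to 0x9b) values are assumptions, not zsh's definitions. The sketch shows the two points of the patch. First, only a byte passed through unchanged can need escaping, so the metafied result is at most two bytes in that case. Second, a nicedup()-style builder needs 5 bytes per input byte plus one for the terminating NUL. Unlike zsh's nicedup(), hypo_nicedup() takes a raw string rather than a metafied one.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed values for illustration only, not zsh's own definitions. */
#define HYPO_META      0x83
#define HYPO_TOK_FIRST 0x84
#define HYPO_TOK_LAST  0x9b

/* Meta itself is escaped too, so the result is always well-formed. */
static int hypo_needs_meta(int c)
{
    return c == HYPO_META || (c >= HYPO_TOK_FIRST && c <= HYPO_TOK_LAST);
}

/* Simplified nicechar(): high_printable says whether bytes with bit 7 set
 * may be shown as they are.  The result is metafied and lives in a static
 * buffer, as in the original. */
static char *hypo_nicechar(int c, int high_printable)
{
    static char buf[6];          /* "\M-^?" is 5 bytes, plus the NUL */
    char *s = buf;

    c &= 0xff;
    if (c >= 0x80 && !high_printable) {
        *s++ = '\\';
        *s++ = 'M';
        *s++ = '-';
        c &= 0x7f;
    }
    if (c == 0x7f) {
        *s++ = '^';
        c = '?';
    } else if (c == '\n') {
        *s++ = '\\';
        c = 'n';
    } else if (c == '\t') {
        *s++ = '\\';
        c = 't';
    } else if (c < 0x20) {
        *s++ = '^';
        c += 0x40;
    }
    /* Only a byte passed through unchanged can need escaping here. */
    if (hypo_needs_meta(c)) {
        *s++ = (char)HYPO_META;
        *s++ = (char)(c ^ 32);
    } else
        *s++ = (char)c;
    *s = '\0';
    return buf;
}

/* nicedup()-style builder for a raw string.  Each input byte expands to
 * at most 5 bytes, so 5 * len + 1 leaves room for the terminating NUL.
 * The caller frees the result. */
static char *hypo_nicedup(const char *s, int high_printable)
{
    char *buf = malloc(strlen(s) * 5 + 1), *p = buf;

    if (!buf)
        return NULL;
    for (; *s; s++) {
        const char *n = hypo_nicechar((unsigned char)*s, high_printable);

        while (*n)
            *p++ = *n++;
    }
    *p = '\0';
    return buf;
}
```

With the flag set, the alef bytes d7 90 come back as d7 83 b0. That is an internal string, so it has to be written with an unmetafying function such as zputs(), which is why the callers in the patch move away from fputs().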

-- 
Peter Stephenson <pws@xxxxxxxxxxxxxxxxxxxxxxxx>
Work: pws@xxxxxxx
Web: http://www.pwstephenson.fsnet.co.uk


