Zsh Mailing List Archive

Fw: Phil's prompt is not working when LANG is set to UTF-8



On Fri, 15 Feb 2008 22:55:58 +0300
Andrey Borzenkov <arvidjaar@xxxxxxxxxx> wrote:
> On Friday 15 February 2008, Andrey Borzenkov wrote:  
> > The actual prompt lengths are (see screenshot)
> > 
> > lpromptw = 13
> > rpromptw = 16 (it has one space in it)
> > 
> > this perfectly corresponds to something (zsh?) ignoring invalid characters
> > with high bit set.  
> 
> For sure.
> 
> Src/prompt.c:countprompt()
> 
>             case MB_INVALID:
>                 memset(&mbs, 0, sizeof mbs);
>                 /* FALL THROUGH */
>             case 0:
>                 /* Invalid character or null: assume no output. */
>                 multi = 0;
>                 break;
> 
> Oops.
> 
> I do not actually see how we can fix it except by introducing prompt
> expansion syntax for ACS (or maybe for any terminfo sequence in general)
> and simply assuming the characters in any of them are of width 1.  

Thanks for looking.  I think I've now roughly caught up; tell me if I'm
mistaken.

- Both terminal and shell start correctly in UTF-8 mode.
- However, Phil's prompt (http://aperiodic.net/phil/prompt/) uses
  the Alternate Character Set (ACS) by appropriate terminfo trickery.
- The ACS is an old-fashioned grungy VT100 thing from the days
  when nobody had heard of multibyte character sets.
- Hence it falls foul of the multibyte tests: the bytes are treated
  as invalid and counted as zero width.  In principle they might
  instead look like a valid UTF-8 character and get the wrong width
  that way, so assuming width 1 for an unknown character is not
  necessarily better than assuming width 0.
- Anyway, assumptions are best avoided if possible.
- Nobody is worrying about editing the ACS, only using it in prompts,
  so a prompt-specific fix is fine.  (Editing with ACS would be
  stupid since the glyphs on the screen wouldn't actually reflect what
  the bytes meant to any programme to which they got fed, right?)

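To make the width problem concrete, here is a rough sketch in C (not
the zsh code) of a counter that, like the quoted countprompt() fragment,
gives an invalid byte a width of zero.  The names utf8_seq_len and
count_cells are made up for illustration, the UTF-8 check is only
structural, and every valid character is assumed to be one cell wide.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not zsh code: return the length of the UTF-8
 * sequence starting at s (n bytes available), or 0 if the bytes do not
 * form a structurally valid sequence.  Overlong forms and surrogates
 * are not rejected; this is only a sketch.
 */
size_t utf8_seq_len(const unsigned char *s, size_t n)
{
    size_t len, i;

    if (n == 0)
        return 0;
    if (s[0] < 0x80)
        len = 1;
    else if ((s[0] & 0xE0) == 0xC0)
        len = 2;
    else if ((s[0] & 0xF0) == 0xE0)
        len = 3;
    else if ((s[0] & 0xF8) == 0xF0)
        len = 4;
    else
        return 0;               /* stray continuation or bad lead byte */
    if (len > n)
        return 0;               /* truncated sequence */
    for (i = 1; i < len; i++)
        if ((s[i] & 0xC0) != 0x80)
            return 0;           /* missing continuation byte */
    return len;
}

/*
 * Count screen cells: a valid character is one cell (simplifying
 * assumption: no wide or combining characters); an invalid byte is
 * skipped and contributes nothing, as in "assume no output".
 */
int count_cells(const unsigned char *s, size_t n)
{
    int cells = 0;

    while (n > 0) {
        size_t len = utf8_seq_len(s, n);

        if (len == 0) {
            s++;                /* invalid byte: width 0 */
            n--;
        } else {
            cells++;
            s += len;
            n -= len;
        }
    }
    return cells;
}
```

With the four bytes 'a', 0xC4, 0xC4, 'b' (two stray high-bit bytes of
the kind an ACS mapping might produce), count_cells reports 2 even
though a terminal in ACS mode would draw four cells.  That is the same
kind of shortfall as in the prompt widths reported above.
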
How about the following tweak to prompts to support this?  The upshot is
that you put any funny characters inside %{...%G%}, where each %G (for
`glitch'; it may be repeated or take a numeric argument) indicates a
screen cell taken up by the sequence.  I like this because it uses
facilities that have been present in the shell for a long time, and
hence was trivial to implement and might work.

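The counting side of this idea can be sketched as follows.  This is an
illustration under stated assumptions, not the shell's implementation:
the marker values IN_PAR, OUT_PAR and NUL_ARG are invented for the
example (zsh's real tokens differ), and text outside the braces is
taken to be plain ASCII, one cell per byte.

```c
#include <stddef.h>

/* Hypothetical marker values for this sketch only. */
enum { IN_PAR = 0x01, OUT_PAR = 0x02, NUL_ARG = 0x03 };

/*
 * Count the cells of an already expanded prompt buffer.  Bytes between
 * IN_PAR and OUT_PAR are assumed to take no space, except that each
 * NUL_ARG marker (one per %G) accounts for exactly one cell.  Nested
 * brace pairs are tracked with a depth counter.  A NUL_ARG outside any
 * brace pair is ignored here; that choice is an assumption.
 */
int prompt_cells(const unsigned char *buf, size_t n)
{
    int cells = 0, depth = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (buf[i] == IN_PAR) {
            depth++;
        } else if (buf[i] == OUT_PAR) {
            if (depth > 0)
                depth--;
        } else if (buf[i] == NUL_ARG) {
            if (depth > 0)
                cells++;        /* a glitch: one cell, declared by hand */
        } else if (depth == 0) {
            cells++;            /* ordinary visible byte */
        }
        /* any other byte inside braces is assumed to have no width */
    }
    return cells;
}
```

A buffer holding 'a', IN_PAR, two raw ACS bytes, two NUL_ARG markers,
OUT_PAR and 'b' then counts as four cells, whatever the raw bytes are.
The width comes from the markers and not from guessing about the bytes.
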
I played with this in simple cases, but would anybody like to confirm
this works in the cases that matter (and maybe produce an updated Phil's
Prompt)?  To put it another way:  I am happy to support this fix but
have no interest in doing anything with it myself.

I think this is clean and useful enough that I will commit it anyway.

Index: Doc/Zsh/prompt.yo
===================================================================
RCS file: /cvsroot/zsh/zsh/Doc/Zsh/prompt.yo,v
retrieving revision 1.9
diff -u -r1.9 prompt.yo
--- Doc/Zsh/prompt.yo	29 Jan 2008 17:51:02 -0000	1.9
+++ Doc/Zsh/prompt.yo	15 Feb 2008 23:34:06 -0000
@@ -188,6 +188,18 @@
 The string within the braces should not change the cursor
 position.  Brace pairs can nest.
 )
+item(tt(%G))(
+Within a tt(%{)...tt(%}) sequence, include a `glitch': that is, assume
+that a single character width will be output.  This is useful when
+outputting characters that otherwise cannot be correctly handled by the
+shell, such as the alternate character set on some terminals.
+The characters in question can be included within a tt(%{)...tt(%})
+sequence together with the appropriate number of tt(%G) sequences to
+indicate the correct width.  An integer between the `tt(%)' and `tt(G)'
+indicates a character width other than one.  Hence tt(%{)var(seq)tt(%2G%})
+outputs var(seq) and assumes it takes up the width of two standard
+characters.
+)
 enditem()
 
 sect(Conditional Substrings in Prompts)
Index: Src/prompt.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/prompt.c,v
retrieving revision 1.44
diff -u -r1.44 prompt.c
--- Src/prompt.c	20 Nov 2007 09:55:10 -0000	1.44
+++ Src/prompt.c	15 Feb 2008 23:34:06 -0000
@@ -473,6 +473,16 @@
 		    *bp++ = Inpar;
 		}
 		break;
+	    case 'G':
+		if (arg > 0) {
+		    addbufspc(arg);
+		    while (arg--)
+			*bp++ = Nularg;
+		} else {
+		    addbufspc(1);
+		    *bp++ = Nularg;
+		}
+		break;
 	    case /*{*/ '}':
 		if (trunccount && trunccount >= dontcount)
 		    return *fm;

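For anyone who wants to experiment outside the shell, the expansion
side of the patch can be approximated as below.  This is a standalone
sketch, not a copy of Src/prompt.c: expand_glitch and the marker values
are hypothetical, only %{, %} and %G (with an optional decimal count)
are understood, and any other % escape is reported as an error.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical marker values for this sketch only. */
enum { IN_PAR = 0x01, OUT_PAR = 0x02, NUL_ARG = 0x03 };

/*
 * Expand fmt into out (capacity outsz bytes).  "%{" and "%}" become
 * IN_PAR and OUT_PAR; "%G" becomes one NUL_ARG and "%<n>G" becomes n
 * of them, mirroring the arg > 0 test in the patch.  Returns the
 * number of bytes written, or (size_t)-1 on overflow or on an
 * unsupported escape.  Input bytes equal to a marker value are not
 * escaped, which a real implementation would have to handle.
 */
size_t expand_glitch(const char *fmt, unsigned char *out, size_t outsz)
{
    size_t len = 0;

    while (*fmt) {
        size_t arg = 0, count;
        unsigned char tok;

        if (*fmt != '%') {
            if (len >= outsz)
                return (size_t)-1;
            out[len++] = (unsigned char)*fmt++;
            continue;
        }
        fmt++;                          /* skip the '%' */
        while (isdigit((unsigned char)*fmt)) {
            if (arg > outsz)            /* already too large to fit */
                return (size_t)-1;
            arg = arg * 10 + (size_t)(*fmt++ - '0');
        }
        switch (*fmt) {
        case '{': tok = IN_PAR;  count = 1; break;
        case '}': tok = OUT_PAR; count = 1; break;
        case 'G': tok = NUL_ARG; count = arg > 0 ? arg : 1; break;
        default:  return (size_t)-1;    /* includes a trailing '%' */
        }
        fmt++;
        if (count > outsz - len)
            return (size_t)-1;
        while (count--)
            out[len++] = tok;
    }
    return len;
}
```

Given "a%{xy%2G%}b" and a 16-byte buffer, this writes eight bytes: 'a',
IN_PAR, 'x', 'y', two NUL_ARG markers, OUT_PAR and 'b'.
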

-- 
Peter Stephenson <p.w.stephenson@xxxxxxxxxxxx>
Web page now at http://homepage.ntlworld.com/p.w.stephenson/


