Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: UNICODE Private Use Area characters in BUFFER



> 2022/10/24 2:29, Roman Perepelitsa <roman.perepelitsa@xxxxxxxxx> wrote:
> 
> You are right, iswprint(0xE0B0) returns 0.
> 
> I'm compiling zsh with --enable-unicode9, so instead of iswprint() it
> goes into u9_iswprint(). This function explicitly handles this case
> and returns 0, just like iswprint(). So we get this:
> 
>    WCWIDTH(0xE0B0) => 1
>    WC_ISPRINT(0xE0B0) => 0

I think iswprint(0xe0b0) (or WC_ISPRINT()) returns 1 (in a UTF-8 locale).
The reason it doesn't work in Zle seems to be in Zle/zle_refresh.c
(around line 1328):

#ifdef MULTIBYTE_SUPPORT
        else if (
#ifdef __STDC_ISO_10646__
                 !ZSH_INVALID_WCHAR_TEST(*t) &&
#endif
                 WC_ISPRINT(*t) && (width = WCWIDTH(*t)) > 0) {

__STDC_ISO_10646__ is defined on (probably all) Linux systems (but not on
macOS), and ZSH_INVALID_WCHAR_TEST() is defined in Zle/zle.h (around
line 512):

/* The start of the private range we use, for 256 characters */
#define ZSH_INVALID_WCHAR_BASE  (0xe000U)
/* Detect a wide character within our range */
#define ZSH_INVALID_WCHAR_TEST(x)                       \
    ((unsigned)(x) >= ZSH_INVALID_WCHAR_BASE &&         \
     (unsigned)(x) <= (ZSH_INVALID_WCHAR_BASE + 255u))

ZSH_INVALID_WCHAR_TEST() returns true for a wide character wc in the
range 0xe000 <= wc <= 0xe0ff. It seems zsh assumes that this range
is not used by users, and uses it to represent "invalid" (or incomplete)
characters (see line 452 in Zle/zle_utils.c).

If characters in this range need to be output as is, then we need an
option or something similar to disable this feature.

On macOS __STDC_ISO_10646__ is not defined (I think this is a bug in
macOS), and the character U+E0B0 is output as is. But on standard
macOS there is no font that has a glyph for this character, and
it is rendered as "a square with ? inside" (double width).
If you install a font that has a glyph for this character, and if the
glyph is single width, then I guess it will work OK in Zle.



