Zsh Mailing List Archive

bug report : printf %.1s outputting more than 1 character



I'm using the zsh that ships with macOS 13.2.1, version 5.8.1. I understand this isn't the latest release (5.9), so this bug may already have been addressed.

For the 4-byte sequence shown below (defined via explicit octal codes), there is no Unicode interpretation under which printf %.1s should print all 4 bytes.

 - The first byte, \377 (0xFF), is explicitly invalid under UTF-8 (even under the oldest definitions, which allowed longer multi-byte sequences).
 - The 4-byte value is too large to constitute a single character under either endianness of UTF-32.
 - It's not a pair of beyond-BMP UTF-16 surrogates either, regardless of endianness.

At best, if treated as UTF-16 of either endianness, this 4-byte sequence represents 2 code points, in which case only 2 bytes should be printed, not 4.
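For anyone who wants to double-check the arithmetic behind the three points above, here is a rough sketch in plain shell arithmetic. The variable names and the is_surrogate helper are made up for illustration; hex constants are used because zsh does not treat a leading 0 as octal in arithmetic by default.

```shell
# The four input bytes from the report (\377 \210 \234 \256), as hex constants.
b1=$((0xFF)) b2=$((0x88)) b3=$((0x9C)) b4=$((0xAE))

# Hypothetical helper: print "yes" if the 16-bit code unit $1 lies in the
# UTF-16 surrogate range U+D800..U+DFFF (55296..57343), otherwise "no".
is_surrogate() {
    if [ "$1" -ge 55296 ] && [ "$1" -le 57343 ]; then
        echo yes
    else
        echo no
    fi
}

# UTF-32: combine all four bytes in both byte orders and compare against
# the largest Unicode scalar value, U+10FFFF.
utf32_be=$(( (b1 << 24) | (b2 << 16) | (b3 << 8) | b4 ))
utf32_le=$(( (b4 << 24) | (b3 << 16) | (b2 << 8) | b1 ))
max_scalar=$((0x10FFFF))

for v in "$utf32_be" "$utf32_le"; do
    if [ "$v" -gt "$max_scalar" ]; then
        printf '0x%X is beyond U+10FFFF\n' "$v"
    fi
done

# UTF-16: two code units per byte order; none should be a surrogate.
for u in $(( (b1 << 8) | b2 )) $(( (b3 << 8) | b4 )) \
         $(( (b2 << 8) | b1 )) $(( (b4 << 8) | b3 )); do
    printf '0x%04X surrogate: %s\n' "$u" "$(is_surrogate "$u")"
done
```

If every UTF-16 line reports "no", the sequence is two ordinary BMP code units under either byte order, not one surrogate pair.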

My high-level understanding of printf %.1s is that it should output the first locale-valid character of the input string and, in its absence, the first byte instead, if any. Setting LC_ALL=C or POSIX would therefore defeat the purpose of this bug report.
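A shorter way to compare implementations against that expectation is to count the bytes each printf emits rather than reading od dumps. This is only a sketch: count_bytes is a hypothetical helper, and the number it prints depends on which shell and which printf it is run with.

```shell
# Build the 4-byte input from the report using octal escapes in the format.
input=$(printf '\377\210\234\256')

# Hypothetical helper: run the given printf command with '%.1s' on the input
# under the same locale settings as the report, and print the byte count.
count_bytes() {
    LC_ALL= LANG=en_US.UTF-8 "$@" '%.1s' "$input" | wc -c | tr -d ' '
}

count_bytes printf           # the shell's own printf
count_bytes command printf   # what 'command printf' resolves to in this shell
# count_bytes gprintf        # GNU printf, if installed under that name
```

Under the expectation described above, each call would print 1; a result of 4 corresponds to the behaviour reported here.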

The reproducible sample shell command below includes the output of the zsh built-in printf, the macOS-provided printf, and GNU printf, all else being equal. The testing shell was invoked via

    zsh --restricted --no-rcs --nologin --verbose -xtrace -f -c

In all 3 test scenarios, LC_ALL is explicitly cleared, while LANG is explicitly set to a widely used locale (en_US.UTF-8).

The od used is the macOS one, not the GNU one.

To the best of my knowledge, the other two printfs have produced the correct output.

Thanks for your time.

====================================================================

echo; echo "$ZSH_VERSION"; echo; uname -a; echo; LC_ALL= LANG="en_US.UTF-8" builtin printf '\n\n\t[%.1s]\n\n' $'\377\210\234\256' | od -bacx ;  echo; LC_ALL= LANG="en_US.UTF-8" command printf '\n\n\t[%.1s]\n\n' $'\377\210\234\256' | od -bacx ;  echo; LC_ALL= LANG="en_US.UTF-8" gprintf '\n\n\t[%.1s]\n\n' $'\377\210\234\256' | od -bacx ;  echo;
+zsh:1> echo

+zsh:1> echo 5.8.1
5.8.1
+zsh:1> echo

+zsh:1> uname -a
Darwin m1mx4CT 22.3.0 Darwin Kernel Version 22.3.0: Mon Jan 30 20:38:37 PST 2023; root:xnu-8792.81.3~2/RELEASE_ARM64_T6000 arm64
+zsh:1> echo

+zsh:1> LC_ALL='' LANG=en_US.UTF-8 +zsh:1> printf '\n\n\t[%.1s]\n\n' $'\M-\C-?\M-\C-H\M-\C-\\M-.'
+zsh:1> od -bacx
0000000   012 012 011 133 377 210 234 256 135 012 012
          nl  nl  ht   [   ?  88  9c   ?   ]  nl  nl
          \n  \n  \t   [ 377 210 234 256   ]  \n  \n
             0a0a    5b09    88ff    ae9c    0a5d    000a
0000013
+zsh:1> echo

+zsh:1> LC_ALL='' LANG=en_US.UTF-8 printf '\n\n\t[%.1s]\n\n' $'\M-\C-?\M-\C-H\M-\C-\\M-.'
+zsh:1> od -bacx
0000000   012 012 011 133 377 135 012 012
          nl  nl  ht   [   ?   ]  nl  nl
          \n  \n  \t   [ 377   ]  \n  \n
             0a0a    5b09    5dff    0a0a
0000010
+zsh:1> echo

+zsh:1> LC_ALL='' LANG=en_US.UTF-8 gprintf '\n\n\t[%.1s]\n\n' $'\M-\C-?\M-\C-H\M-\C-\\M-.'
+zsh:1> od -bacx
0000000   012 012 011 133 377 135 012 012
          nl  nl  ht   [   ?   ]  nl  nl
          \n  \n  \t   [ 377   ]  \n  \n
             0a0a    5b09    5dff    0a0a
0000010
+zsh:1> echo

zsh 5.8.1 (x86_64-apple-darwin22.0)


