Zsh Mailing List Archive

Re: sh emulation POSIX non-conformances (printf %10s and bytes vs character)



2021-04-13 15:57:44 +0000, Daniel Shahaf:
> Stephane Chazelas wrote on Sun, Apr 11, 2021 at 20:42:05 +0100:
> > Another POSIX bug fixed by zsh (but which makes it non-compliant):
> > 
> > With multibyte characters:
> > 
> > $ printf '|%10s|\n' Stéphane Chazelas
> > |  Stéphane|
> > |  Chazelas|
> > 
> > POSIX requires:
> > 
> > | Stéphane|
> > |  Chazelas|
> > 
> > (with a UTF-8 é encoded on 2 bytes)
> 
> Note that e-with-acute has two encodings in Unicode:
> 
> é (U+00E9), one codepoint, two UTF-8 bytes
> é (U+0065 U+0301), two codepoints, three UTF-8 bytes
> 
> https://en.wikipedia.org/wiki/Unicode_equivalence#Normal_forms

That was shown already in the part of my message you didn't
quote, where I pointed out how ksh93 addresses it with its %Ls
(zsh also has ${(ml[10])var} for that though).

See also:

https://unix.stackexchange.com/questions/350240/why-is-printf-shrinking-umlaut

Cheers,
Stephane
