Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: In POSIX mode, ${#var} measures length in bytes, not characters



ZyX wrote on 07-06-15 at 02:34:
> Do you have a reference where “character” is defined?

Yes:
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap06.html#tag_06_02

POSIX specifically allows any character encoding, including multibyte
characters, depending on the user's locale, and on the condition that
the portable character set (basically US-ASCII) is a subset of the
locale's character set.

Now that UTF-8 is the de facto standard locale encoding, and UTF-8
includes multibyte characters, it has become important for shells to get
this right.

> This behaviour is the same in posh and dash:

Yes, dash and pdksh/mksh/posh unfortunately have this bug, too.

But bash, ksh93, and yash correctly measure characters, not bytes. (yash
is supposed to be the most POSIX-compliant of them all.)
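
A minimal sketch for checking a given shell, assuming it is run with a
UTF-8 locale active. The helper names byte_len and shell_len are made up
for illustration, and the test string is built from octal escapes so the
script does not depend on the encoding of the file it is saved in:

```sh
#!/bin/sh
# Hypothetical helper: count the bytes in a string, independent of locale.
byte_len() {
    # wc -c counts bytes; $(( )) strips any padding wc adds to its output.
    echo $(( $(printf '%s' "$1" | wc -c) ))
}

# Hypothetical helper: report what the running shell gives for ${#parameter}.
shell_len() {
    echo "${#1}"
}

# "héllo" in UTF-8: 5 characters, 6 bytes (é is the two bytes \303\251).
s=$(printf 'h\303\251llo')

echo "bytes:  $(byte_len "$s")"
echo "\${#s}: $(shell_len "$s")"
```

In a UTF-8 locale, a shell that counts characters should report 5 on the
second line, while one that counts bytes reports 6, matching the first
line.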

Thanks,

- Martijn


