Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: [PATCH] [[:blank:]] only matches on SPC and TAB



2018-05-14 14:50:56 +0100, Peter Stephenson:
[...]
> It wouldn't be ridiculous to change the documentation for this case and
> require "unsetopt multibyte" for strict byte-by-byte comparisons, which
> is already how it works in the vast majority of other cases.
[...]

But note that here it's not about multibyte vs singlebyte but
whether [:blank:] honours the locale like the other POSIX
character classes (alpha, punct...) do.

There are locales on some systems (NetBSD, as already mentioned)
that use a single-byte charset where more than SPC and TAB are
classified as "blank": for instance 0xA0 (nbsp) in locales using
iso8859-x charsets, or 0x9A in KOI8-R on NetBSD.

IMO, without the "multibyte" option, we should still call
isblank(), which on most systems and in most locales matches
only SPC and TAB, but is not guaranteed to (and in practice does
not, for instance on NetBSD).
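
A minimal sketch of how one might check which bytes isblank()
accepts in a given locale, assuming a hosted C99 environment. The
function name blank_bytes_in_locale is hypothetical, and the result
depends entirely on the system's locale data:

```c
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: switch LC_CTYPE to locale_name and record every
 * byte value 0x00..0xFF for which isblank() returns true.
 * Stores at most cap bytes in out, in ascending order.
 * Returns the total number of blank bytes, or -1 if the locale
 * could not be selected. */
int blank_bytes_in_locale(const char *locale_name, unsigned char *out, size_t cap)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;

    int n = 0;
    for (int c = 0; c <= 0xFF; c++) {
        /* c is always representable as unsigned char, as isblank() requires */
        if (isblank(c)) {
            if ((size_t)n < cap)
                out[n] = (unsigned char)c;
            n++;
        }
    }
    return n;
}
```

Called with "C" this should report only TAB and SPC. Called with the
name of a single-byte locale installed on the system, it would show
whether that locale adds bytes such as 0xA0 to the "blank" class.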

I just noticed that on NetBSD, in locales using UTF-8 or
GB18030, isblank() returns true for \v (vertical TAB), but not in
any other locale! So does iswblank(). So out goes my claim that
"blank" should be for horizontal spaces. On OpenBSD (where UTF-8
is the only charset supported in locales other than C/POSIX),
iswblank() matches on \v and \f.
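
The wide-character side can be probed the same way. This is only a
sketch, and the function name wide_blank_in_locale is hypothetical.
Which locale names are available is system-dependent:

```c
#include <locale.h>
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: switch LC_CTYPE to locale_name and ask
 * iswblank() about a single wide character.
 * Returns 1 if wc is classified as blank, 0 if it is not,
 * and -1 if the locale could not be selected. */
int wide_blank_in_locale(const char *locale_name, wchar_t wc)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;
    return iswblank((wint_t)wc) ? 1 : 0;
}
```

Passing L'\v' and L'\f' together with the name of a UTF-8 locale
installed on the system would show whether that system treats
vertical tab and form feed as "blank".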

What a mess!

-- 
Stephane


