Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: file globbing

2021-08-05 09:05:19 -0700, Ray Andrews:
> On 2021-08-05 8:36 a.m., Peter Stephenson wrote:
> > > On 05 August 2021 at 16:27 Peter Stephenson <p.w.stephenson@xxxxxxxxxxxx> wrote:
> > > 
> > d,1^[[:digit:]]*
> > 
> > 
> Cool.  I thought zsh always used ' [:digit:]' not the double bracket form. 

I would avoid [[:digit:]] in zsh globs / patterns especially for
input validation.

[:digit:] within a bracket expression is a POSIX character
class; it is a POSIX invention. It is recognised, but within
bracket expressions only, by anything that is specified by POSIX
and uses shell filename patterns or regular expressions
(basic or extended), such as sh (for globs or case constructs),
find (for -name/-path matching) or grep/sed/ed...

[X[:digit:]] would match on any single character that is either X
or classified as a decimal digit in the locale.
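
To make the bracket-expression point concrete, here is a minimal sketch
using a case construct. The function name classify is made up for the
illustration; the pattern is the one discussed above.

```shell
# Hypothetical helper: report whether $1 is a single character
# matched by the bracket expression [X[:digit:]].
classify() {
  case $1 in
    ([X[:digit:]]) echo match ;;     # X itself, or a character in the digit class
    (*)            echo no-match ;;  # anything else, including longer strings
  esac
}

classify X   # match
classify 7   # match
classify a   # no-match
```

Only ASCII inputs are shown here, since which non-ASCII characters
(if any) count as digits is exactly the part that varies.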

What that matches in practice depends on the system and locale. 
In 2016, someone pointed out to POSIX that isdigit() in the C
standard was not locale dependent and matched on 0123456789 only
(https://www.austingroupbugs.net/view.php?id=1078), so, to align
with that, future versions of the standard will restrict
[:digit:] to match on 0123456789 only and will forbid it from
matching on any other decimal digits. I wouldn't be surprised if that's
later reverted again though as it's quite unintuitive /

Still, there are systems where iswdigit() matches on a lot more
than 0123456789 in some locales, and as a consequence, the
[[:digit:]] of zsh globs and most other tools will too. For
instance, on FreeBSD 12.2 and in an en_US.UTF-8 locale, [[:digit:]]
matches on

All decimal digits, some variations on the 0123456789 Arabic
ones, and some other decimal digits in some other scripts.

[0-9] itself, in general, is even worse. No two systems or
utilities or library functions, or versions thereof, agree on what
characters are ranked between 0 and 9. It could even match on
sequences of characters (collating elements).

That's not the case for zsh globs though, where [0-9] only matches
on 0123456789, as ranges in zsh are based on the wide char value
of the characters (or byte value if the multibyte option is
off), and for those 0123456789 characters specifically, in
practice, the wide char values are consecutive and in that order
regardless of the locale and system.
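
As a sketch of how that can be used to check a whole string rather
than a single character, assuming zsh as the shell (the function name
is_digits_glob is made up for the illustration):

```shell
# Hypothetical helper, intended for zsh: succeed only if $1 is a
# non-empty string with no character outside the [0-9] glob range.
is_digits_glob() {
  [[ -n $1 && $1 != *[^0-9]* ]]
}

is_digits_glob 2021 && echo "2021: digits only"
is_digits_glob 20x1 || echo "20x1: rejected"
```

The same lines also parse in other shells that have [[ ]], but there
the range is not guaranteed to mean 0123456789 only.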

Beware though that this only applies to zsh globs. It doesn't
apply to [0-9] in regexps, which use the system's extended
regexp matching functions (or pcre with the rematchpcre
option; see also \d there).

The only thing guaranteed to match only 0123456789 regardless of
locale and system is [0123456789]; do not use [[:digit:]] or \d
for that. In zsh, you can use [0-9], but only with globs.

[[ $d = [0-9] ]] && echo is one of 0123456789

is correct (in zsh, not in bash / ksh93)

[[ $d =~ '^[0-9]$' ]] && echo is one of 0123456789

is not (at least on some systems/locales).

With set -o rematchpcre

[[ $d =~ '^[0-9]\Z' ]] && echo is one of 0123456789

should be OK (so would the same with \d, though I wouldn't trust
it as it could vary with the version and what flags are passed
to the matcher as \d can be told to match other digits under
some circumstances).

Also beware that regexp matching doesn't work properly on non-text.
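
For validating a whole string without relying on ranges, classes or
regexps at all, the explicit [0123456789] list can be put in a case
construct. This is only a sketch; the function name is_ascii_decimal is
made up for the illustration.

```shell
# Hypothetical helper: succeed only if $1 is a non-empty string made
# solely of the characters 0123456789, listed explicitly so that no
# locale-dependent range or character class is involved.
is_ascii_decimal() {
  case $1 in
    (''|*[!0123456789]*) return 1 ;;  # empty, or contains some other character
    (*)                  return 0 ;;
  esac
}

is_ascii_decimal 12345 && echo "12345: accepted"
is_ascii_decimal 12a45 || echo "12a45: rejected"
```

The check is inverted (reject if anything outside the list is found)
because plain sh patterns have no "one or more of" operator.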

See also
https://www.mail-archive.com/bug-bash@xxxxxxx/msg25885.html for
a glimpse at the (messier) situation in the bash shell.

