Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: =~ doesn't work with NUL characters



On 2017-06-13 at 11:02 +0100, Stephane Chazelas wrote:
> [[ $'a\0b' =~ 'a$' ]]
> 
> returns true both with and without rematchpcre

Let's break this down, non-PCRE and PCRE, and consider appropriate
behaviour for each separately.

Without rematchpcre, this is ERE matching via the POSIX regcomp()/regexec()
APIs, which don't portably support length-delimited strings; they rely
instead on C-string NUL termination.
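To illustrate, here is a minimal sketch. It assumes a POSIX system with <regex.h>, and ere_match() is a hypothetical helper, not zsh's code. regexec() takes no length argument, so given the three bytes of $'a\0b' it only ever sees "a":

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 on match, 0 on no match, -1 on error.
 * regexec() has no length parameter, so `subject` is read only up to
 * its first NUL byte; anything after an embedded NUL is invisible. */
int ere_match(const char *pattern, const char *subject)
{
    regex_t re;
    int rc;

    assert(pattern != NULL && subject != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    if (rc == 0)
        return 1;
    return rc == REG_NOMATCH ? 0 : -1;
}
```

With the C literal "a\0b" as the subject, ere_match("a$", ...) returns 1 and ere_match("b", ...) returns 0, which is the behaviour reported above for the non-PCRE case.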

Current macOS has regnexec(), but it is not in the system regex
library I see on Ubuntu Trusty or FreeBSD 10.3.  It appears to be an
extension dating from when macOS 10.8 switched to the TRE
implementation.  <https://laurikari.net/tre/>

Trying to support this would make behaviour vary across systems,
which I think would be undesirable.  The whole point of adding the
non-PCRE implementation was to match Bash behaviour by default, and
Bash does the same thing.

So for non-PCRE, I think this current behaviour is the only sane choice.

For PCRE, I'm inclined to agree: the PCRE API lets us portably supply
the subject length, so there would be no cross-platform behavioural
variance.  I think it's also reasonable for PCRE matching to diverge
further from ERE matching.  Others might disagree?

We've "always" used strlen here; the most recent change was to handle
meta/unmeta (by me), but the strlen usage has been present since the
pcre module was introduced in commit bff61cf9e1 in 2001.

Thus: do we want to change behaviour, after 16 years, to allow embedded
NUL in the PCRE case, making it differ from the ERE case?

There's enough room for disagreement here that I'm not rushing to write
a patch, but am instead deferring to those with the commit bit.  My
personal inclination is to handle NUL in the PCRE case.  It should just
be a matter of passing an int* instead of NULL as the second parameter
to unmetafy() and using the length it returns instead of strlen().
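A rough sketch of why the int* matters. unmetafy_sketch() below is hypothetical and simplified, not the real unmetafy(); it assumes zsh's Meta byte is 0x83 and that an escaped byte (NUL among them) is stored as Meta followed by (byte ^ 32). Once the metafied form of $'a\0b' is decoded, strlen() reports 1, while the out-parameter reports the true length of 3:

```c
#include <assert.h>
#include <string.h>

/* Assumption: mirrors zsh's metafication, where an escaped byte is
 * stored as META followed by (byte ^ 32). */
#define META ((char) 0x83)

/* Hypothetical stand-in for unmetafy(): decodes `s` in place and, if
 * `len` is non-NULL, stores the decoded length there.  The length is
 * the only reliable way to find the end of the result, because a
 * decoded NUL looks like a terminator to strlen(). */
char *unmetafy_sketch(char *s, int *len)
{
    char *src = s, *dst = s;

    assert(s != NULL);
    while (*src) {
        if (*src == META && src[1]) {
            /* META, x  ->  x ^ 32 (so META, ' ' decodes to NUL) */
            *dst++ = (char) (src[1] ^ 32);
            src += 2;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
    if (len)
        *len = (int) (dst - s);
    return s;
}
```

That decoded length, rather than strlen() of the buffer, is what would be handed to pcre_exec() as its subject-length argument.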

-Phil
