Zsh Mailing List Archive
Re: =~ doesn't work with NUL characters
- X-seq: zsh-workers 41293
- From: Phil Pennock <zsh-workers+phil.pennock@xxxxxxxxxxxx>
- To: Zsh hackers list <zsh-workers@xxxxxxx>
- Subject: Re: =~ doesn't work with NUL characters
- Date: Wed, 14 Jun 2017 16:49:38 -0400
- In-reply-to: <20170613100217.GA9529@chaz.gmail.com>
- List-id: Zsh Workers List <zsh-workers.zsh.org>
- Mailing-list: contact zsh-workers-help@xxxxxxx; run by ezmlm
- Openpgp: url=https://www.security.spodhuis.org/PGP/keys/0x4D1E900E14C1CC04.asc
- References: <20170613100217.GA9529@chaz.gmail.com>
On 2017-06-13 at 11:02 +0100, Stephane Chazelas wrote:
> [[ $'a\0b' =~ 'a$' ]]
> returns true both with and without rematchpcre
Let's break this down, non-PCRE and PCRE, and consider appropriate
behaviour for each separately.
Without rematchpcre, this is ERE per POSIX APIs, which don't portably
support size-supplied strings, relying instead upon C-string
Current macOS has regnexec() but this is not in the system regexp
library I see on Ubuntu Trusty or FreeBSD 10.3. It appears to be an
extension from when they switched to the TRE implementation in macOS
Trying to support this would result in variations in behaviour across
systems in a way which I think might be undesirable. The whole point of
adding the non-PCRE implementation was to match Bash behaviour by
default, and Bash does the same thing.
So for non-PCRE, I think this current behaviour is the only sane choice.
For PCRE, I'm inclined to agree that we should be able to portably
supply the length and there would not be any cross-platform behavioural
variances. I think it's also reasonable that PCRE matching could
diverge from ERE matching even more. Others might disagree?
We've "always" used strlen here; the most recent change was to handle
meta/unmeta (by me), but the strlen usage has been present since the
pcre module was introduced in commit bff61cf9e1 in 2001.
Thus: do we want to change behaviour, after 16 years, to allow embedded
NUL for the PCRE case, being different from the ERE case?
There's enough room for disagreement here that I'm not rushing to write
a patch, but instead deferring to those with commit-bit. My personal
inclination is to handle NUL in the PCRE case. It should just be a
case of passing an int* instead of NULL as the second parameter to