Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: please consider using PCRE_DOLLAR_ENDONLY (and PCRE_DOTALL) for rematchpcre



2018-01-22 00:28:29 -0500, Phil Pennock:
[...]
> Changing the default behavior of valid semantics risks hard-to-debug
> breakage of existing scripts and I am erring on the side of being
> against this change.  It's not hard opposition, but I'd like to see
> stronger justification before risking breaking changes.
> 
> I know that I myself have scripts which rely upon PCRE matching against
> multiline data behaving as per the defaults of pcrepattern(3).
> 
> In addition, while the DOTALL change can be turned off in-regex, the
> dollar-endonly one can't, AFAIK, so that becomes a breaking change which
> can't be worked around.
[...]

dollar-endonly is not really about multiline matching:

[[ $'a\nb' =~ 'a$' ]]

will not match with or without it and

[[ $'a\nb' =~ '(?m)a$' ]]

will match with or without it.

It's more about single-line where the line delimiter happens to
be included (and you want the $ to match on the end of that line
as opposed to the end of the string).

$ matches before a trailing newline in a string in perl because
of how its <> operator works. perl is a text processing utility;
its regexps are primarily matched against single lines where the
newline is included (contrary to traditional text processing
utilities like sed/grep/awk where the record separator is not
included).

In:

    perl -pe 's/.$//'

(which calls <>),

you want to remove the last character of the line, not the
newline character.

That $ behaviour makes a lot of sense there. Even if you use:

   perl -lpe 's/.$//'

where that -l causes the delimiter to be removed on input and
added back on output like in sed/awk, that behaviour does no
harm because the record does *not* contain any newline
delimiter.

But zsh is not a text processing utility, and its "read" builtin
(the closest equivalent to perl's <>) does not include the
delimiter. It's actually hard to have a trailing newline when
processing text in shells given that $(...) strips them.
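
A minimal sketch of both points, written in portable sh so it should
behave the same in zsh; the variable names are arbitrary:

```shell
# "read" stores the line without its newline delimiter.
read_len=$(printf 'abc\n' | { IFS= read -r line; printf '%s' "${#line}"; })

# Command substitution strips all trailing newlines.
subst=$(printf 'abc\n\n\n')
subst_len=${#subst}

echo "length after read : $read_len"
echo "length after \$(): $subst_len"
```

Both lengths come out as 3, so neither route hands a trailing newline to
a later regex match.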

On the other hand, having

[[ $file =~ '\.txt$' ]]

match on files that don't end in .txt is a concern (and in my
experience, file names (as opposed to text lines with
delimiters) are the kind of thing I deal with most often in zsh).

And again, note that it only happens with rematchpcre; it works as
expected with EREs.
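
For illustration, a sketch of the file name case. The ERE part should
behave the same in zsh (without rematchpcre) and in bash; the pattern is
kept in a variable so that both shells treat it as a regex. The last
part only runs in a zsh that can load the zsh/pcre module, and its
result is left for the reader to observe. File names and variable names
are made up.

```shell
re='\.txt$'
file_ok='notes.txt'
file_nl='notes.txt
'                       # a file name that ends in a newline, not in .txt

# ERE: "$" matches only at the very end of the string.
if [[ $file_ok =~ $re ]]; then ere_ok="match"; else ere_ok="no match"; fi
if [[ $file_nl =~ $re ]]; then ere_nl="match"; else ere_nl="no match"; fi
echo "ERE,  notes.txt     : $ere_ok"
echo "ERE,  notes.txt<NL> : $ere_nl"

# PCRE: zsh only, and only if the pcre module is available.
if [ -n "${ZSH_VERSION-}" ] && zmodload zsh/pcre 2>/dev/null; then
  setopt rematchpcre
  if [[ $file_nl =~ $re ]]; then pcre_nl="match"; else pcre_nl="no match"; fi
  unsetopt rematchpcre
  echo "PCRE, notes.txt<NL> : $pcre_nl"
fi
```

With EREs the second name is rejected. The PCRE branch is where the
trailing newline is expected to slip through, as described above.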


-- 
Stephane


