Zsh Mailing List Archive

pcre_match() option "-n" broken under zsh 5.0.6



zsh 5.0.6 appears to have broken pcre_match() with respect to option "-n".
As a minimal example, running a slightly embellished variant of the
"man zshmodules" example loops forever:

    % string="The following zip codes: 78884 90210 99513"
    % pcre_compile -m "\d{5}"
    % accum=()
    % pcre_match -b -- $string
    % while [[ $? -eq 0 ]] do
    .     print "match: $MATCH; ZPCRE_OP: $ZPCRE_OP"
    .     b=($=ZPCRE_OP)
    .     accum+=$MATCH
    .     pcre_match -b -n $b[2] -- $string
    . done
    match: 78884; ZPCRE_OP: 25 30
    match: 78884; ZPCRE_OP: 25 30
    match: 78884; ZPCRE_OP: 25 30
    match: 78884; ZPCRE_OP: 25 30
    match: 78884; ZPCRE_OP: 25 30
    match: 78884; ZPCRE_OP: 25 30
                    .
                    .
                    .

The new behaviour seems rather... unproductive.

Under zsh 5.0.5, the same example terminates successfully with the
following standard output:

    match: 78884; ZPCRE_OP: 25 30
    match: 90210; ZPCRE_OP: 31 36
    match: 99513; ZPCRE_OP: 37 42

A unit test might prove helpful here.

While on the topic, it may also be worth noting that the manpage
documentation for pcre_match() appears to be incorrect. It reads:

    For example, a ZPCRE_OP set to "32 45" indicates that the matched
    portion began on byte offset 32 and ended on byte offset 44. Here, byte
    offset position 45 is the position directly after the matched portion.

But that isn't the case, at least when bytes are counted from 1 as zsh
subscripts do. The first word of ZPCRE_OP is then the offset of the byte
preceding the first byte of the matched substring, while the second word is
the offset of the last byte of the matched substring, the opposite of what
is documented. Hence, such documentation should read:

    For example, a ZPCRE_OP set to "32 45" indicates that the matched
    portion began on byte offset 33 and ended on byte offset 45. Here, byte
    offset position 32 is the position directly before the matched portion.
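
This reading can be checked against the 5.0.5 output above. The following is
only a sketch: op_substr is a hypothetical helper, not part of zsh, and it
uses cut(1) rather than pcre_match() itself. cut -c counts bytes from 1, as
zsh subscripts do, so the match should run from the first word of ZPCRE_OP
plus one through the second word.

```shell
# Hypothetical helper: print the bytes of $1 described by a ZPCRE_OP pair.
# $2 and $3 are the two words of ZPCRE_OP. cut -c counts from 1, so the
# match runs from byte ($2 + 1) through byte $3.
op_substr() {
    printf '%s\n' "$1" | cut -c"$(( $2 + 1 ))-$3"
}

string="The following zip codes: 78884 90210 99513"
op_substr "$string" 25 30    # prints 78884
op_substr "$string" 31 36    # prints 90210
op_substr "$string" 37 42    # prints 99513
```

Counting from 0 instead, the same pair reads as the offset of the first
matched byte and the offset just past the last one, which may be what the
existing manpage wording has in mind.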

Given that, one would expect the line "pcre_match -b -n $b[2] -- $string"
of the manpage example to be incorrect as well. Since "$b[2]" is the offset
of the last byte of the prior match, passing that offset to option "-n"
should make pcre_match() begin searching one byte earlier than intended.

But that isn't the case. pcre_match() searches correctly, as can be
verified by replacing "\d{5}" with "\d{2}" in the example. This implies that
option "-n" begins searching at the byte following the passed byte offset
(rather than at that offset), so that option is also incorrectly
documented. It reads:

    A -n option starts searching for a match from the byte offset position
    in string.

Correcting for clarity and grammar, that should read:

    If the -n option is given, a match will be searched for starting at the
    byte following the passed byte offset in the string.
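
To make the proposed wording concrete, here is a rough model of where the
search would resume. It is a sketch under the reading above, not a call to
pcre_match(): rest_after is a hypothetical helper built on cut(1), and the
offset 27 assumes that a first "\d{2}" match of "78" reports "25 27" in
ZPCRE_OP.

```shell
# Hypothetical helper: print the part of $1 that a search would see if it
# starts at the byte following offset $2 (counting bytes from 1).
rest_after() {
    printf '%s\n' "$1" | cut -c"$(( $2 + 1 ))-"
}

string="The following zip codes: 78884 90210 99513"
rest_after "$string" 30    # after the first \d{5} match: " 90210 99513"
rest_after "$string" 27    # after an assumed first \d{2} match: "884 90210 99513"
```

In both cases the remainder starts just past the previous match, so passing
"$b[2]" to "-n" does not rescan any byte of that match.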

In any case, thanks all for the continued grit, fortitude, and hard shell
work.

Humbly yours,
Cecil

