Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: [Bug] S-flag imposes non-greedy match where it shouldn't

On Fri, 27 Dec 2019 at 06:30, Daniel Shahaf <d.s@xxxxxxxxxxxxxxxxxx> wrote:
> Sebastian Gniazdowski wrote on Thu, Dec 26, 2019 at 19:35:05 +0100:
> > +++ b/Doc/Zsh/expn.yo
> > @@ -1399,6 +1399,20 @@ from the beginning and with tt(%) start from the end of the string.
> >  With substitution via tt(${)...tt(/)...tt(}) or
> >  tt(${)...tt(//)...tt(}), specifies non-greedy matching, i.e. that the
> >  shortest instead of the longest match should be replaced.
> > +The substring search means that the pattern is matched skipping the
> > +parts of the input string starting from the direction set by the use
> > +of tt(#) or tt(%).
> I don't understand this sentence.  What does "skipping" mean?

It means that, while the search moves towards the other end of the
string, the parts that don't match are skipped. Does the sentence need
an update?

> > +For example, to match a pattern starting from the
> > +end, one could use:
> > +
> > +example(str="abcXXXdefXXXghi"
> > +out=${(S)str%%(#b)([^X])X##}
> > +out=$out${match[1]}
> > +)
> > +
> > +The result is tt(abcXXXdefghi).
> That's not correct.  The output is abcXXXdefXXXghi (in 'zsh -f') or
> abcXXXdeghif (with extendedglob set), but not abcXXXdefghi.

I sent an updated patch half an hour before your email. It contains
the correct example.

> I doubt this example would clarify the meaning of ${(S)} to people who
> encounter it for the first time.  Please use a more minimal example.
> Specific issues:
>   - (...) This is documentation, not
>   a homework problem; the answer should be obvious.  Something like
>   «out="${out}+${match[1]}"» would address this — but…

I think that many examples in the man pages are like that: they don't
take the obvious path of just demonstrating the usage; instead, they
cover some edge case that, after (sometimes quite long) thinking,
reveals something very peculiar about the feature. There are better
examples of this; however, the best one that I've found so far is the
one used for the #b glob flag:

             foo="a string with a message"
             if [[ $foo = (a|an)' '(#b)(*)' '* ]]; then
               print ${foo[$mbegin[1],$mend[1]]}
             fi

The example prints `string with a', and the user has the "homework" of
untangling a few points:
- why it isn't "string with a message" (because the final ' '* part
requires a space after the last word matched by the (*) part),
- why the answer isn't "message" (the same as above, plus the fact that
there's no * before (a|an), plus the greediness).

If it weren't for the homework attitude of the examples in the man
page, the example would have been

             if [[ "a string with a message" = (#b)a' '(*) ]]; then

and would give the answer "string with a message". This would have
been the obvious-demonstration attitude that I've referred to.

> - … the use of advanced pattern matching features needlessly raises the
>   learning curve.

I can add a mention that the example needs EXTENDED_GLOB. Overall I
think that the example:
- is nice because it shows how to make the (S)...%% substitution
behave as intuition would suggest,
- is the only place in the documentation that uses the (#b) flag
with #/% substitution, showing that it's possible to use it in that
context,
- isn't that complex for someone who knows the #b flag and the $match
parameter.

> > It would have been tt(abcXXXdefXXghif)
> > +if not the tt([^X]) part, as despite the tt(%%) specifies a greedy
> > +match, the substring matching works by trying matches from right to
> > +left and stops at a first valid match.
> There are some grammatical errors here (e.g., s/(?<=specif)ies/ying/), but
> let's not worry about them until the rest of the patch isn't a moving target.

I think the grammar is correct here. Did you maybe misread the sentence?

Sebastian Gniazdowski
News: https://twitter.com/ZdharmaI
IRC: https://kiwiirc.com/client/chat.freenode.net:+6697/#zplugin
Blog: http://zdharma.org
