Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: Issue with ${var#(*_)(#cN,M)}



On Tue, 27 Oct 2015 10:00:34 +0000
Peter Stephenson <p.stephenson@xxxxxxxxxxx> wrote:
> Original problem
> > } ~$ a='1_2_3_4_5_6'
> > } ~$ echo ${a#(*_)(#c2)}
> > } 2_3_4_5_6
> 
> On Tue, 20 Oct 2015 16:04:22 -0700
> Bart Schaefer <schaefer@xxxxxxxxxxxxxxxx> wrote:
> > What's messing it up is the "*" operator and the backtracking that is
> > implied because * can match anything.
> 
> Exactly.  What's backtracking over what in what order here is a bit of
> a nightmare, and I'm not sure I'm likely to get my mind round it.
> 
> Unless someone does, you'll be better off sticking to
> 
> % a='1_2_3_4_5_6'
> % echo ${a#([^_]#_)(#c2)}
> 3_4_5_6
> 
> and then we don't have the "*" within the group to worry about.
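
For a small, fixed number of fields, one way to sidestep both "*" inside
a group and (#cN,M) altogether is to strip one field at a time with the
ordinary shortest-match "#" expansion.  The following is only a minimal
sketch, not anything taken from the zsh sources: strip_fields is a
hypothetical helper name, and "_" is assumed to be the separator.  It
uses nothing beyond POSIX parameter expansion, so it should behave the
same in zsh and in sh.

```shell
# Hypothetical helper: remove the first N "field_" prefixes from a string.
# $1 = string, $2 = number of leading fields to strip.
strip_fields() {
  sf_str=$1
  sf_count=$2
  while [ "$sf_count" -gt 0 ]; do
    # Shortest prefix ending in "_"; the string is left unchanged
    # when no "_" remains.
    sf_str=${sf_str#*_}
    sf_count=$((sf_count - 1))
  done
  printf '%s\n' "$sf_str"
}

strip_fields '1_2_3_4_5_6' 2
```

With the string from the original report this prints 3_4_5_6, the same
result as the ([^_]#_)(#c2) form quoted above.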

Indeed, I've just noticed that with
% egrep --version
egrep (GNU grep) 2.8

the following:

% egrep '^(*_){2}$' <<<'1_2_'

fails to match completely, i.e. the backtracking is too complicated
to handle, whereas

% egrep '^([^_]+_){2}$' <<<'1_2_'

succeeds.  At this point, I'm going to document the difficulty and
slowly retreat backwards from the dark corner.
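
One caveat when repeating the comparison: in an extended regular
expression "*" is a repetition operator rather than a wildcard, so the
closest regex counterpart of the glob group (*_) is (.*_).  Below is a
small sketch for trying both the restricted and the wildcard form;
ere_matches is a hypothetical helper, and the only tool assumed is
"grep -E".

```shell
# Hypothetical helper: report whether an ERE matches a whole input line.
# $1 = pattern, $2 = subject string.
ere_matches() {
  # grep -q prints nothing and reports the result via its exit status.
  if printf '%s\n' "$2" | grep -Eq -- "$1"; then
    echo "match"
  else
    echo "no match"
  fi
}

ere_matches '^([^_]+_){2}$' '1_2_'   # restricted group, as in the message
ere_matches '^(.*_){2}$' '1_2_'      # regex counterpart of the glob (*_)
```

Both calls print "match" for the input 1_2_.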

pws

diff --git a/Doc/Zsh/expn.yo b/Doc/Zsh/expn.yo
index 5ea8610..49a0f0d 100644
--- a/Doc/Zsh/expn.yo
+++ b/Doc/Zsh/expn.yo
@@ -2192,6 +2192,16 @@ inclusive.  The form tt(LPAR()#c)var(N)tt(RPAR()) requires exactly tt(N)
 matches; tt(LPAR()#c,)var(M)tt(RPAR()) is equivalent to specifying var(N)
 as 0; tt(LPAR()#c)var(N)tt(,RPAR()) specifies that there is no maximum
 limit on the number of matches.
+
+Note that if the previous group of characters contains wildcards,
+results can be unpredictable to the point of being logically incorrect.
+It is recommended that the pattern be trimmed to match the minimum
+possible.  For example, to match a string of the form `tt(1_2_3_)', use
+a pattern of the form `tt(LPAR()[[:digit:]]##_+RPAR()LPAR()#c3+RPAR())', not
+`tt(LPAR()*_+RPAR()LPAR()#c3+RPAR())'.  This arises from the
+complicated interaction between attempts to match a number of
+repetitions of the whole pattern and attempts to match the wildcard
+`tt(*)'.
 )
 vindex(MATCH)
 vindex(MBEGIN)


