Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: Pattern matching



"Andrej Borsenkow" wrote:
> Just using the occasion. I once suggested replacing ad hoc code with conversion
> to standard regexps. I believe that nowadays all systems have standard POSIX
> regexps available. This will be much cleaner and, I hope, faster. Zsh patterns
> are pretty close to normal regular expressions, so it should not be a problem
> (even SAMBA takes this course for wildcard matching :-)

I don't think zsh is close enough to be able to do this, even without the
possible vagaries of pattern matching on some of the machines zsh supports.
I'm converting some existing regexp code (it's Henry Spencer's, which has a
relaxed copyright) which should do this more smoothly, keeping pretty much
all of the existing behaviour.

> Just some points:
> 
>  - currently code scans pattern for every match. This may really be inefficient
> for globbing (even more so, as code has to dequote string every time). Using
> regexps, a pattern can be compiled once - for recursive globbing quite a gain.

The current system *does* compile patterns.  The complete path for a glob
pattern is stored in a struct complist, and each segment in a struct comp,
which is also used for standard patterns; each pattern is compiled only
once each time it is encountered, not once per match.  Maybe you mean there's no easy way of
duplicating patterns for future use, which is true, but there's currently
no real application for that since patterns have to be recompiled
separately for each command line after expansion anyway.  When I've
finished, the code will all be in a single string, so it can be duplicated in
one go, but I still don't see how this can easily be used.  You could cache
simple patterns, those which previous expansions (parameter, etc.) don't
change, and use the cached programme inside a function, but that's already
a lot of work to do properly.
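
To make the compile-once point concrete, here is a minimal C sketch of compiling a single glob path segment into a small instruction array and then running that array against many names. The opcodes, `struct inst`, `compile_segment()` and `run_segment()` are hypothetical illustrations, not zsh's struct comp/complist or its real compiled format; only literal characters, `?` and `*` are handled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical opcodes for a tiny compiled glob segment.
 * This is NOT zsh's internal format, just an illustration. */
enum op { OP_END, OP_CHAR, OP_ANY, OP_STAR };

struct inst {
    enum op op;
    char c;            /* only meaningful for OP_CHAR */
};

/* Compile one path segment (no '/') such as "foo*.c" once.
 * Supports literal characters, '?' and '*'. Caller frees the result;
 * returns NULL if allocation fails. */
static struct inst *compile_segment(const char *pat)
{
    assert(pat != NULL);
    size_t n = strlen(pat);
    struct inst *prog = malloc((n + 1) * sizeof *prog);
    if (prog == NULL)
        return NULL;

    size_t i = 0;
    for (; pat[i] != '\0'; i++) {
        prog[i].c = '\0';
        if (pat[i] == '*')
            prog[i].op = OP_STAR;
        else if (pat[i] == '?')
            prog[i].op = OP_ANY;
        else {
            prog[i].op = OP_CHAR;
            prog[i].c = pat[i];
        }
    }
    prog[i].op = OP_END;
    prog[i].c = '\0';
    return prog;
}

/* Run a compiled segment against a whole name; returns 1 on a match.
 * '*' backtracks by trying every possible length, shortest first. */
static int run_segment(const struct inst *p, const char *s)
{
    for (;;) {
        switch (p->op) {
        case OP_END:
            return *s == '\0';
        case OP_CHAR:
            if (*s != p->c)
                return 0;
            p++, s++;
            break;
        case OP_ANY:
            if (*s == '\0')
                return 0;
            p++, s++;
            break;
        case OP_STAR:
            for (const char *t = s;; t++) {
                if (run_segment(p + 1, t))
                    return 1;
                if (*t == '\0')
                    return 0;
            }
        }
    }
}
```

Once compile_segment("foo*.c") has been called, the same array can be run with run_segment() against every entry of a directory and freed afterwards; the pattern text itself is not rescanned for each name.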

>  - this may automatically solve the original problem of this post. Regexps are
> required to match the longest string, even if every subregexp is not the longest
> one.

Yes, but there's no equivalent for numerical matches in normal regexps, nor
any obvious way that I can see of telling a regexp how to match a numeric
range at all without some immensely complicated pattern.  That's why I
think the only solution is to write an ad-hoc pattern matcher, but do it
properly.  The rough basics are already working.  Globbing flags are going
to be harder to fit in --- but these would be quite impossible to do with a
standard regexp matcher anyway, particularly approximate matching.
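
As a rough illustration of the numeric-range point, here is a C sketch that matches a zsh-style `<lo-hi>` range by converting the digits to a number, next to a hand-derived POSIX ERE that does the same job for the single range 1-100 only. `match_num_range()` and `regex_1_100()` are hypothetical names; the sketch assumes the whole string must be digits, accepts leading zeros, and ignores how a range interacts with the rest of a pattern (deciding how many digits to take), which a real matcher also has to handle.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <regex.h>
#include <stdio.h>

/* Hypothetical matcher for a zsh-style numeric range <lo-hi>.
 * Assumes the whole string must be decimal digits; leading zeros
 * are accepted. Returns 1 if the value lies in [lo, hi]. */
static int match_num_range(const char *s, unsigned long lo, unsigned long hi)
{
    unsigned long v = 0;
    int over = 0;              /* set once the value is known to exceed hi */

    assert(s != NULL);
    if (*s == '\0')
        return 0;
    for (; *s != '\0'; s++) {
        if (!isdigit((unsigned char)*s))
            return 0;
        if (over)
            continue;          /* keep checking that the rest are digits */
        unsigned long d = (unsigned long)(*s - '0');
        if (v > (ULONG_MAX - d) / 10)
            over = 1;          /* would overflow, so certainly > hi */
        else {
            v = v * 10 + d;
            if (v > hi)
                over = 1;
        }
    }
    return !over && v >= lo;
}

/* The "same" test as a POSIX ERE, derived by hand for 1-100 only.
 * Returns 1 on match, 0 on no match, -1 if the regex fails to compile. */
static int regex_1_100(const char *s)
{
    regex_t re;
    int ok;

    if (regcomp(&re, "^0*([1-9][0-9]?|100)$", REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    ok = regexec(&re, s, 0, NULL, 0) == 0;
    regfree(&re);
    return ok;
}
```

The hand-written matcher takes lo and hi as parameters, while the regexp has to be re-derived by hand for every range and grows quickly for less convenient bounds such as 17-4321.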

-- 
Peter Stephenson <pws@xxxxxxxxxxxxxxxxx>       Tel: +39 050 844536
WWW:  http://www.ifh.de/~pws/
Dipartimento di Fisica, Via Buonarroti 2, 56127 Pisa, Italy


