Zsh Mailing List Archive

Re: Questions about completion matchers

X-seq: zsh-users 27228
From: Marlon Richert <marlon.richert@xxxxxxxxx>
To: Oliver Kiddle <opk@xxxxxxx>
Cc: Zsh Users <zsh-users@xxxxxxx>
Subject: Re: Questions about completion matchers
Date: Sun, 10 Oct 2021 23:14:22 +0300
Archived-at: <https://zsh.org/users/27228>
In-reply-to: <CAHLkEDueYdgTN9zvvwAoerRqFuD72iJFTpDnDPEshOxPkGu9Dg@mail.gmail.com>
List-id: <zsh-users.zsh.org>
References: <CAHLkEDuT6iGYEivrHcML-dEaTMJUiGy-p39=2=kDEw1gL0i=Ew@mail.gmail.com> <20296-1632661753.678317@ipjb.25sX.Whnd> <CAHLkEDv=+A7BXF6nKADNJDnd-zLW5baZ8xU3erp28it+jAKNrg@mail.gmail.com> <12742-1633816758.622067@mB95.qJqC.4--_> <CAHLkEDueYdgTN9zvvwAoerRqFuD72iJFTpDnDPEshOxPkGu9Dg@mail.gmail.com>

I have to say, after having processed both of your explanations, it
appears that r:lanchor||ranchor=tpat and l:lanchor||ranchor=tpat are
not working as intended. It intuitively feels like they should cover
this very common case:

If lanchor and ranchor are present and adjacent in the command line
string, then apply m:=tpat to the empty string between them. That is
to say: Enable completion between lanchor and ranchor, just like we
can enable completion to the left or right of an anchor.
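
To make the proposal concrete, here is a minimal Python sketch of the
behaviour argued for above, expressed as a regular expression over trial
strings. It models the proposed reading only, not what zsh currently
does. The helper name between_anchors_regex is hypothetical, the anchors
are assumed to match single characters, and tpat is assumed to be *.

```python
import re


def between_anchors_regex(typed: str, lanchor: str, ranchor: str) -> str:
    """Regex for trial strings under the *proposed* reading of
    lanchor||ranchor=*: wherever a character matching lanchor is directly
    followed by one matching ranchor, any text may sit between them."""
    parts = []
    for i, ch in enumerate(typed):
        parts.append(re.escape(ch))
        nxt = typed[i + 1] if i + 1 < len(typed) else ""
        if nxt and re.fullmatch(lanchor, ch) and re.fullmatch(ranchor, nxt):
            # The empty string between the two anchors is treated as tpat (*).
            parts.append(".*")
    # The cursor at the end of the word matches anything.
    return "".join(parts) + ".*"


pattern = between_anchors_regex("fB", "[a-z]", "[A-Z]")
print(pattern)  # f.*B.*
for trial in ("fooBar", "fBar", "foobar"):
    print(trial, bool(re.fullmatch(pattern, trial)))
```

With a lowercase lanchor and an uppercase ranchor, "fB" accepts "fooBar"
and "fBar" but not "foobar", which is the camel-case style of completion
this case is meant to cover.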

In terms of syntax, this treats the void between || as an empty lpat,
just like it is in l:lanchor|= or r:|ranchor=. The || form (and indeed,
the | form) is essentially a conditional version of one of the other
matchers.

This actually extrapolates to a consistent interpretation of the
symbols in the matching syntax:
* lpat is always the substring whose meaning is "transformed": That is
to say, it (and only it) is made to be considered equal to any trial
substring matching tpat. It is permitted for lpat to be equal to the
empty string or the beginning/end of the command line string.
* Each |ranchor or lanchor| adds a constraint: A substring matching
them needs to be directly to the right or left of lpat -- or lpat's
meaning won't be "transformed". The meaning of the anchors themselves
is never "transformed": Any substring matching the anchor on the
command line needs to be matched literally in the trial string.
* For the first anchor in a matcher, the substring matching lpat will
not be considered equivalent to a trial substring that matches the
anchor. This clause is essentially there to prevent the matcher from
becoming too "greedy".
* For the second anchor, there is no such restriction. (Otherwise,
the matcher could easily become too constrained and unable to match
any trial string at all.)

From this follows the meaning of each matcher:
* m:lpat=tpat - Treat each substring matching lpat on the command line
as being equal to any substring matching tpat in the trial string.
* r:lpat|ranchor=** - The same as m:lpat=*, but only if the substring
matching lpat has directly to its right a substring matching ranchor.
* r:lpat|ranchor=tpat - The same as m:lpat=tpat~ranchor, but only if
the substring matching lpat has directly to its right a substring
matching ranchor.
* r:lanchor||ranchor=tpat - The same as r:|ranchor=tpat, but only if
the substring matching ranchor is immediately preceded by a substring
matching lanchor.
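
A minimal Python sketch of the * versus ** distinction in the list
above, using the regular-expression view of matching. It assumes an
empty lpat and a single-character ranchor. r_anchor_regex is a
hypothetical helper, not part of zsh, and it models the reading given
here rather than zsh's actual matching code.

```python
import re


def r_anchor_regex(typed: str, ranchor: str, tpat: str) -> str:
    """Regex view of r:|ranchor=tpat for a single-character ranchor.
    "*" is modelled as "any characters except the anchor",
    "**" as "any characters at all"."""
    filler = {"*": f"[^{re.escape(ranchor)}]*", "**": ".*"}[tpat]
    parts = []
    for ch in typed:
        if ch == ranchor:
            parts.append(filler)  # text allowed directly before the anchor
        parts.append(re.escape(ch))
    return "".join(parts) + ".*"  # cursor at the end of the word


for tpat in ("*", "**"):
    rx = r_anchor_regex("c.u", ".", tpat)
    for trial in ("comp.unix", "comp.sources.unix"):
        print(tpat, trial, bool(re.fullmatch(rx, trial)))
```

Under these assumptions both forms accept "comp.unix" for the typed word
"c.u", but only the ** form accepts "comp.sources.unix", because only
there may the inserted text itself contain the anchor.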

One could even continue this pattern, as || is nothing more than
|lpat| with lpat equal to the empty string:
* r:lanchor|lpat|ranchor=tpat - The same as r:lpat|ranchor=tpat, but
only if the substring matching lpat is immediately preceded by a
substring matching lanchor.

However, in practice, the more constraints a matcher has, the more
likely it is to break consistency with this pattern. As a result, the
|| matchers no longer support the case for which they appear to have
been intended, namely completing the missing substring between lanchor
and ranchor, which is now, unfortunately, a missing feature.

I would hope the implementation of the || matchers could be modified
to restore this feature -- which I assume must (or was intended to)
have been there at some point.

> On Sun, Sep 26, 2021 at 4:09 PM Oliver Kiddle <opk@xxxxxxx> wrote:
> >
> > With matching control, it is often easiest if you view it as converting
> > what is on the command-line into a regular expression. I haven't probed
> > the source code to get a precise view of how these are mapped. For my
> > own purposes, I keep a list but don't trust it in all cases because I've
> > found contradictory examples and tweaked it more than once, perhaps
> > making it less accurate in the process. So with the caveat that this
> > may contain errors, my current list is as follows:
> >
> > Note that the starting point is:
> >   [cursor position] → .*
> > Then:
> >   'm:a=b'       – a     → b             (* doesn't work on rhs)
> >   'r:|b=*'      – b     → [^b]*b
>
> The appearance of [^a] and [^b] in your patterns was a complete
> surprise to me. I would've expected * to work as * in a glob
> expression. This is not clear from the docs. Now that I know that the
> matcher syntax was based on regex, it makes more sense, but I still
> wouldn't have figured this out intuitively. A clearer explanation
> about this in the docs would be helpful. Yes, it's mentioned somewhere
> in the examples, but it should be explained more clearly earlier on.
>
> >   'r:a|b=*'     – ab    → [^b]*a?b
>
> This one looks incorrect to me as it does not match the example in the
> docs. From that example, it appears to me that it is supposed to work
> like this:
>  'r:a|b=*'     – b    → [^b]*ab
>
> >   'r:a|b=c'     – ab    → cb
> >   'l:a|=*'      – a     → [^a]*a
> >   'l:a|b=*'     – ab    → [^a]*ab?
>
> Shouldn't these last two result in a[^a]* and ab[^a]*, respectively,
> since the anchor goes to the left?
>
> >   'l:a|b=c'     – ab    → ac
> >   'b:a=*'       – ^a    → .*
>
> Oh, but here * does work like a * glob? So, I guess * behaves
> differently only when anchors are involved?
>
> >   'b:a=c'       – ^a    → ^c
> >   'e:a=*'       – a$    → .*
> >   'r:a||b=*'    – b     → [^a]*ab       (only * works on rhs, empty a or b has no use)
> >   'l:a||b=*'    – ^a    → a.*           (only * on rhs, empty a no use, b ignored?!)
>
> The comments on the last two items sound like bugs to me. Also,
> 'l:a||b=*' should work on just 'a' and not require '^a'.
>
>
> On Sun, Oct 10, 2021 at 12:59 AM Oliver Kiddle <opk@xxxxxxx> wrote:
> >
> > The difference between b: and l: with an empty anchor (or e/r) is not
> > encapsulated by my regular expressions. They only differ in how strict
> > the anchoring to the start of the match is where another matching
> > control allowed extra characters to be inserted at the beginning.
>
> So, does that mean then that matchers are not evaluated strictly left-to-right?
>
> > The example given when this was added was zsh option completion where
> > underscores are ignored and a prefix of NO is allowed.
>
> About that example, what exactly is the difference between L: and B:
> that lets B: complete '_NO_f' to '_NO_foo' and 'NONO_f' to 'NONO_f'
> but not L:? It's not clear from the example, let alone from the
> description of the matchers.
>
> > I took a look at the source code and dug out original -workers posts and
> > it does seem that the intention for the two anchor || forms was as I
> > thought. Even as designed I don't think either is ideal for camel case -
> > the l: form excludes characters from the wrong anchor for that.
> > The matching code looks a lot like regular expression matching with a
> > back tracking algorithm.
>
> Y02compmatch.ztst contains a lot of examples that could be added to
> the docs to better explain how the different matchers are intended to
> be used. It would help to better understand their workings.
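
For the disagreement quoted above about 'l:a|=*', a small Python sketch
can at least show how the two candidate readings differ. It makes no
claim about which one zsh implements. The helper name
l_anchor_candidates is hypothetical, and the anchor is assumed to be a
single character.

```python
import re


def l_anchor_candidates(anchor: str) -> dict:
    """Return both regex readings of l:a|=* for a single-character anchor."""
    a = re.escape(anchor)
    return {
        "listed": f"[^{a}]*{a}",    # [^a]*a, as in the quoted list
        "expected": f"{a}[^{a}]*",  # a[^a]*, as suggested in the reply
    }


candidates = l_anchor_candidates(".")
for name, rx in candidates.items():
    full = rx + ".*"  # the cursor after the typed anchor matches anything
    for trial in (".bar", "foo.bar"):
        print(name, trial, bool(re.fullmatch(full, trial)))
```

Both readings accept ".bar" when only "." has been typed. They differ on
"foo.bar": the listed form allows text before the anchor, the expected
form only after it.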
