Zsh Mailing List Archive

Re: Questions about completion matchers

X-seq: zsh-users 27137
From: Oliver Kiddle <opk@xxxxxxx>
To: Marlon Richert <marlon.richert@xxxxxxxxx>
Cc: Zsh Users <zsh-users@xxxxxxx>
Subject: Re: Questions about completion matchers
Date: Sun, 26 Sep 2021 15:09:13 +0200
Archived-at: <https://zsh.org/users/27137>
In-reply-to: <CAHLkEDuT6iGYEivrHcML-dEaTMJUiGy-p39=2=kDEw1gL0i=Ew@mail.gmail.com>
List-id: <zsh-users.zsh.org>
References: <CAHLkEDuT6iGYEivrHcML-dEaTMJUiGy-p39=2=kDEw1gL0i=Ew@mail.gmail.com>

Marlon Richert wrote:
> How can I make a matcher that completes the right-most part (and only
> the right-most part) of each subword? That is, given a target
> completion 'abcDefGhi', how do I make a match specification that
> completes inputs

If you're trying to do camel-case matching, one option is:
  'r:|[A-Z]=* r:|=*'

The following was used by the original creator of matching control; it
works and breaks for the same cases as above in your example:
  'r:[^ A-Z0-9]||[ A-Z0-9]=* r:|=*'

These allow extra characters at the beginning, so in your example D
and DG match the target. There are also oddities with consecutive runs
of upper-case characters. Consider, e.g., completion after ssh -o, where
"TCPKeepAlive" is one of the options: TKA won't match but ideally
would.
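
To try either specification for all completions, it can go in the
matcher-list style. A minimal sketch for ~/.zshrc, assuming the catch-all
':completion:*' context is acceptable. The leading empty string means plain
matching is tried first and the camel-case spec is only a fallback. The
ZSH_VERSION guard is only there so the line does nothing if another shell
reads the file.

```shell
# Sketch only: the ':completion:*' context is an assumption; narrow it if
# the matcher should apply to some commands only.
if [ -n "${ZSH_VERSION:-}" ]; then
  # '' = no matching control first, then the camel-case spec as a fallback.
  zstyle ':completion:*' matcher-list '' 'r:|[A-Z]=* r:|=*'
fi
```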

With matching control, it is often easiest if you view it as converting
what is on the command-line into a regular expression. I haven't probed
the source code to get a precise view of how these are mapped. For my
own purposes, I keep a list but don't trust it in all cases because I've
found contradictory examples and tweaked it more than once, perhaps
making it less accurate in the process. So with the caveat that this
may contain errors, my current list is as follows:

Note that the starting point is:
  [cursor position] → .*
Then:
  'm:a=b'       – a   → b           (* doesn't work on rhs)
  'r:|b=*'      – b   → [^b]*b
  'r:a|b=*'     – ab  → [^b]*a?b
  'r:a|b=c'     – ab  → cb
  'l:a|=*'      – a   → [^a]*a
  'l:a|b=*'     – ab  → [^a]*ab?
  'l:a|b=c'     – ab  → ac
  'b:a=*'       – ^a  → .*
  'b:a=c'       – ^a  → ^c
  'e:a=*'       – a$  → .*
  'r:a||b=*'    – b   → [^a]*ab     (only * works on rhs, empty a or b has no use)
  'l:a||b=*'    – ^a  → a.*         (only * on rhs, empty a no use, b ignored?!)

Something like [A-Z] becomes its concrete form from the command-line in the regex.
For correspondence classes, the corresponding form goes in the regex; these only work with the m:/M: forms.
** is like * but with .* instead of [^x]*.

In all cases, the original unchanged form also passes: a matching
control does not have to be used. I've excluded those from the regular
expressions above. Including them, however, note the following
potentially useful effects with an empty lpat:

  'r:|b=c'      – b   → c?b
  'l:a|=c'      – a   → ac?

When composing multiple matching controls, zsh doesn't apply each one
to the results of the previous one. You can consider the whole as an
alternation of the effects of the individual matching controls.

So 'r:a|b=* l:a|b=*' would be: ab → (ab|[^b]*a?b|[^a]*ab?)
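
As a rough illustration of that alternation, the regex can be tried against
a few strings with grep -E. This only exercises the regular-expression view
from the list above, not zsh's matcher itself. The helper name model_matches
and the candidate strings are made up for the example, and the trailing .*
stands for the cursor at the end of the word.

```shell
# Regex view of 'r:a|b=* l:a|b=*' for the typed word "ab", with the
# cursor at the end (hence the trailing .*). Illustrative model only.
model_matches() {
  # Print every argument that the alternation accepts.
  printf '%s\n' "$@" | grep -E '^(ab|[^b]*a?b|[^a]*ab?).*$'
}

# Hypothetical candidates: the first four fit one of the alternatives;
# "cde" contains neither a nor b, so it is rejected.
model_matches ab xab xyb zzza cde
```

Under this model the call prints ab, xab, xyb and zzza, one per line.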

For the most part there are certain common forms and if you stick to
those, you find fewer bugs than when being creative.

The || forms seem buggy to me. From the documentation, my assumption
would be that one means a[^a]*b and the other a[^b]*b.
That could be more helpful for camel-case but I would need to generate
tests to say for sure.
b even seems to be ignored for the l form.

> Additionally, the following are unclear to me from the manual:
> * What is the exact difference between l:lanchor||ranchor=tpat and
> r:lanchor||ranchor=tpat ?

From the documentation and assuming some actual symmetry I would assume
the difference to be that lanchor needs to match the completion
candidate but not the command-line, while a tpat of * will not match
ranchor – swap l and r anchors for l and r forms in the description.
If that's what it did do, it might possibly bring us closer to a good
solution for camel-case matching.

But as the regex above indicates, that isn't the case. I don't really
see the logic in l:lanchor||ranchor=tpat seeming to be anchored to
the beginning. I think those forms came about as an attempt to get
camel-case to work.

> * Why do the examples in the manual add r:|=* to the end of each
> matcher? This appears to make no difference at all.

It matters for the case where the cursor is in the middle rather than at
the end. For the example from the manual with Usenet group names like
comp.sources.unix, try c.s.u with the cursor after the s.

There are three components. Two have a dot anchor at the end. The final
has an end-of-string anchor.
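
Those three components can be sketched in the regex view: each typed dot
may be preceded by any run of non-dot characters, and the end of the word
may be preceded by anything. The snippet below is only a model built with
sed and grep -E. It treats the whole typed word and does not model the
cursor position. The helper name typed_to_regex and the two extra group
names are assumptions for the example.

```shell
# Regex view of 'r:|.=* r:|=*'. Illustrative model only; the typed word
# must not contain regex metacharacters other than ".".
typed_to_regex() {
  # c.s.u  ->  ^c[^.]*\.s[^.]*\.u.*$
  printf '^%s.*$\n' "$(printf '%s' "$1" | sed 's/\./[^.]*\\./g')"
}

# comp.sources.unix is from the manual's example; the others are made up
# to show candidates that the model rejects.
newsgroups='comp.sources.unix
comp.sources.misc
comp.os.linux'

printf '%s\n' "$newsgroups" | grep -E "$(typed_to_regex c.s.u)"
```

Only comp.sources.unix survives: the other two fail at the second and
third component respectively.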

> * It appears that the order of "match descriptions" in a matchers
> matters, but it is unclear to me in what way and it isn't mentioned in
> the manual. For example, the pairs of matchers below differ only in
> the order of their match descriptions, yet each produces a different
> behavior. How are the match descriptions inside a matcher evaluated
> and what causes the difference between these?

Order shouldn't really matter (apart from the x: matcher).

As I mentioned earlier, you can consider it as being the alternation of all
of them, at every point in the command-line where one of them can do
something. So a single match may rely on more than one matching control
to be matched. I can imagine that order might matter where you have mixed
up anchors. An example would be interesting.

>   * 'r:|[[:punct:]]=** l:?|=[[:punct:]]' completes 'cd a/b' to 'cd
> a/bc', but 'l:?|=[[:punct:]] r:|[[:punct:]]=**' does not.

In my testing, neither does. Where is the cursor? You can think of the
matching as adding .* at the cursor position so a/b completes to a/bc
with no matching control if the cursor is at the end. The lack of other
candidate completions can also confuse testing of this because with
prefix completion, a/bc can be the only unambiguous match. Are you sure
you don't have other customisations that are allowing the first case to
match?

The l: pattern allows punctuation after any character so a/b becomes the
pattern a(|[[:punct:]])/(|[[:punct:]])b(|[[:punct:]])

The r: pattern allows anything before the punctuation so a/b becomes the
pattern a*/b
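
The two patterns can be compared side by side by translating them to
extended regular expressions: (|x) becomes x? and the glob * becomes .*.
This is a sketch of the model only, run through grep -E rather than zsh.
The helper name check and the candidate names are hypothetical, and the
trailing .* stands for the cursor at the end of the word.

```shell
# ERE translations of the two patterns for the typed word a/b.
l_regex='^a[[:punct:]]?/[[:punct:]]?b[[:punct:]]?.*$'   # l:?|=[[:punct:]]
r_regex='^a.*/b.*$'                                      # r:|[[:punct:]]=**

check() {
  # $1 = label, $2 = regex, $3 = candidate
  if printf '%s\n' "$3" | grep -Eq "$2"; then
    printf '%s %s: match\n' "$1" "$3"
  else
    printf '%s %s: no match\n' "$1" "$3"
  fi
}

# Hypothetical candidates; abc/b is accepted by the r: pattern only.
for candidate in a/bc a_/b abc/b; do
  check l: "$l_regex" "$candidate"
  check r: "$r_regex" "$candidate"
done
```

Note that a/bc passes both patterns purely because of the trailing .*,
which is the point made above about the cursor position.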

>   * Given two target completions 'a-b' and 'a_b', both 'l:?|=[-_]
> m:{-}={_}' and 'm:{-}={_} l:?|=[-_]' will insert 'a-b' as the
> unambiguous substring on the first try, but on the second try, only
> the former will then list both completions, whereas the latter will
> complete only 'a-b'.

I'm not sure I follow what you mean by the first and second try. If you
mean a second press of <tab>, matching is done completely anew with the
new command-line contents.

With just
  compadd -M 'l:?|=[_-]' - a-b a_b
ab<tab> offers both candidates as matches.
Adding 'm:-=_' just means that completion after a-b will also match
a_b.
Single-element correspondence classes are pointless, by the way.
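
For experimenting with such a compadd call, a throwaway completion function
is enough. A minimal sketch, meant for an interactive zsh in which compinit
has already run. The function name _sep_demo and the command name sep-demo
are placeholders, not anything that exists in zsh.

```shell
# Throwaway completion function: offers a-b and a_b with the l: spec.
_sep_demo() {
  compadd -M 'l:?|=[_-]' - a-b a_b
}

# compdef only exists once compinit has run, so register conditionally.
if command -v compdef >/dev/null 2>&1; then
  compdef _sep_demo sep-demo
fi
```

After that, typing sep-demo ab and pressing <tab> exercises the matcher
without any other candidates getting in the way.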

Especially with the uppercase forms (L: etc) it is easy to create
situations where an unambiguous substring is inserted and the set of
candidate matches is quite different with the new command-line contents.
The effect can be somewhat jarring and has the appearance of a bug.

Oliver

Follow-Ups:
- Re: Questions about completion matchers
  - From: Marlon Richert

References:
- Questions about completion matchers
  - From: Marlon Richert
