Re: [RFC] Add xfail tests for || form of completion matchers

X-seq: zsh-workers 49516
From: Oliver Kiddle <opk@xxxxxxx>
To: Marlon Richert <marlon.richert@xxxxxxxxx>
Cc: Zsh hackers list <zsh-workers@xxxxxxx>
Subject: Re: [RFC] Add xfail tests for || form of completion matchers
Date: Fri, 22 Oct 2021 15:02:50 +0200
Archived-at: <https://zsh.org/workers/49516>
In-reply-to: <CAHLkEDsHrjbCBSJS0pYoCaCN3szKWsyh5xf6j9zVuJgh3yJTmQ@mail.gmail.com>
List-id: <zsh-workers.zsh.org>
References: <CAHLkEDvf+tX1U5V0Ap6HnSGyBiARJ_U35ry_6jBH9TqVhyft+A@mail.gmail.com> <CAHLkEDsHrjbCBSJS0pYoCaCN3szKWsyh5xf6j9zVuJgh3yJTmQ@mail.gmail.com>

On 12 Oct, Marlon Richert wrote:
> On Mon, Oct 11, 2021 at 5:34 PM Marlon Richert <marlon.richert@xxxxxxxxx> wrote:
> > The tests show how :||= matchers should behave in order to provide
> > completion features that cannot be implemented with :|= matchers.

Thanks. These confirm that we both have the same understanding of the intended
behaviour. That's of more importance than anything in the comments I've
added below. The change should at least clarify things for anyone else
while the || form remains buggy.

> I've now added an accompanying documentation update to the patch.

The documentation is also definitely an improvement. Separating out
l:/r: with an empty anchor and using the term coanchor both help.

Once you've had a chance to consider the comments I made in 49488 and here, I'd
be happy to push the work to git. Regarding my comment about the term "trial
completions" in that mail, it occurs to me that the t in tpat stands for
"trial" so that may need changing if the term is changed.

I also attach a small patch here for the matching code: it bails out
early if the command line is too short for the anchors and patterns.
The existing check doesn't include the coanchor but there's no reason why
it shouldn't. It's early enough in the matching code for me to still be
fairly confident of what's going on at that stage and to target the
condition with a debugger.
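
For anyone reading along without the source open, the condition can be
sketched on its own in C. This is only an illustration: the helper name
too_short is hypothetical, and the parameter meanings are my reading of the
attached diff (in particular, aol is assumed to be the length of the other
anchor, i.e. the coanchor, and 0 when there is none).

```c
#include <assert.h>

/* Hypothetical standalone version of the patched test in match_str().
 * Assumed meanings, read off the diff rather than confirmed:
 *   ll   - characters remaining on the command line
 *   lw   - characters remaining in the trial completion word
 *   llen - length of the line pattern
 *   alen - length of the anchor
 *   aol  - length of the other anchor (coanchor), 0 if absent
 * Returns 1 when the matcher can be skipped without further work. */
static int too_short(int ll, int lw, int llen, int alen, int aol)
{
    /* Lengths are counts, so negative values would be a caller bug. */
    assert(ll >= 0 && lw >= 0 && llen >= 0 && alen >= 0 && aol >= 0);

    /* The line needs room for its pattern plus the anchor; the word
     * needs room for the anchor and, with the patch, the coanchor. */
    return ll < llen + alen || lw < alen + aol;
}
```

With a one-character anchor and a one-character coanchor, a word with only
one character left is rejected by the new test, whereas the old test
(lw < alen) would have let it through to the more expensive matching.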

> +tt(-M) option of the tt(compadd) builtin command.  Note that this is not used
> +if the command line contains a glob pattern and the shell
> +optiontt(GLOB_COMPLETE) is set or the tt(pattern_match) of the tt(compstate)

You're missing a space here, which prevents tt() from working for GLOB_COMPLETE.
The wording isn't right for pattern_match either. Perhaps remove "the" before it
and change "of" to "in". Or perhaps use the word "key".

> +ifnzman(noderef(Completion Builtin
> +Commands))\
> +) requires a var(match-spec) as it argument, consisting of one or more matching

This should be either as "its" argument or as "an" argument.

> +descriptions separated by whitespace.  Each description consists of a letter,
> +followed by a colon, and then patterns describing which substrings on the
> +command line map onto which substrings in the trial completion.  Descriptions

Maybe add "can" before "map".

> +are evaluated from left to right and are cumulative.  An earlier mapping can
> +thus potentially change the outcome of a later mapping.  Finally, any unmapped
> +substrings will be mapped using the default mapping of identical substrings.

Identical strings will always match. Matching control only defines additional
ways to match. This last sentence might imply otherwise.

> +When using the completion system (see
> +ifzman(zmanref(zshcompsys))\
> +ifnzman(noderef(Completion System))\
> +), users can define match specifications that are to be used for specific
> +contexts by using the tt(matcher) and tt(matcher-list) styles.  The values for
> +the latter will be used everywhere.

matcher-list is not used "everywhere". It is looked up early but you can have
different values for different completers. Perhaps just remove that last
sentence; people can refer to zshcompsys for details.

> +Correspondence classes are defined like character classes, but with two
> +differences: They are delimited by a pair of braces, and negated classes are
> +not allowed, so the characters tt(!) and tt(^) have no special meaning directly
> +after the opening brace.  They indicate that a range of characters on the line

"They" here could be understood to refer to ! and ^ so we probably need to
spell out "Correspondence classes" again. This text was not your addition; it
was there before.

> +tt(r:|=*) lets (the empty substring at) the right edge of the command line
> +string be completed to any number of characters at the edge of each trial
> +completion.

Could add a note here that this would only have any effect if the cursor is not
already at the end of the command line.

> -preceded by the pattern var(lanchor).  The var(lanchor) can be blank to
> -anchor the match to the start of the command line string; otherwise the

Where "command line" is used as a compound adjective, I'd hyphenate it
("command-line"). There are other cases. Mostly, it occurs as a noun which I'd
leave with a space.

> +Let all substrings matching var(lpat) at the beginning (for tt(b:) and tt(B:))
> +or end (for tt(e:) and tt(E:)) of the command line be completed to the same
> +number of substrings matching var(tpat) in each trial completion in the same
> +relative position.

I'd restore some of the old sentence where we acknowledge that b/e are very
similar to l/r with an empty anchor, just with scope for multiple applications
of the same or different matching controls.

> +
> +Example:
> +
> +tt(B:[nN][oO]=) adds all occurences of `tt(no)', `tt(nO)', `tt(No)' and

A more useful example is B:0= for initial zeros. A fairly good demonstration of
the differences is:
    compadd -M 'B:0= L:|-=' 1 2 3
A single minus sign that is really first is allowed. Multiple initial zeros,
including after the minus, are allowed. So this allows -1 002 -03 but not
0-1 or --2.
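
The accepted inputs can be modelled with a few lines of C. This is a toy
model of the behaviour described above, not zsh's matching algorithm, and
the function name accepts is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Toy model (not zsh's algorithm): would the typed string be accepted
 * for the candidate under 'B:0= L:|-=' as described above? */
static int accepts(const char *typed, const char *cand)
{
    assert(typed != NULL && cand != NULL);

    if (*typed == '-')      /* L:|-= : one minus, only at the very start */
        typed++;
    while (*typed == '0')   /* B:0=  : any number of initial zeros */
        typed++;

    /* Whatever remains must be a prefix of the candidate. */
    return strncmp(typed, cand, strlen(typed)) == 0;
}
```

Under this model, "-1", "002" and "-03" are accepted for the candidates 1, 2
and 3 respectively, while "0-1" and "--2" are rejected because the second
minus is not the first character on the line.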

> +`tt(NO)' at the beginning of the command line to the beginning of each trial
> +completion.  If tt(r:|=*) is added to this, then given a trial completion
> +`tt(foo)', it lets `tt(noNOf)' be completed to `tt(noNOfoo)'.

Not sure the r:|=* really helps understanding here. The cursor at the end of
"foo" would be just as good and it isn't a useful example to begin with.

> +xitem(tt(l:)var(anchor)tt(|)var(lpat)tt(=)var(tpat))
> +xitem(tt(L:)var(anchor)tt(|)var(lpat)tt(=)var(tpat))
> +xitem(tt(r:)var(lpat)tt(|)var(anchor)tt(=)var(tpat))
> +item(tt(R:)var(lpat)tt(|)var(anchor)tt(=)var(tpat))(
> +Let any command line substring, which is left/right-adjacent (respectively) to
> +a substring matching var(anchor) and which matches var(lpat), be completed to
> +any trial completion substring, which

I'd consider it to be the anchor which is left(/right)-adjacent to the
substring, not the other way around. How about:

   Let any command-line substring matching var(lpat) complete to a trial
   completion substring matching var(tpat) where both are adjacent to an
   identical substring matching var(anchor). The l: and r: forms allow for
   anchors appearing to the left or right, respectively.

If you specify something like [.-] as the anchor, it can't be . on the line and
- in the candidate so noting that the anchors need to be identical is useful.
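
To make the "identical" requirement concrete, here is a minimal sketch in C.
The helper anchors_agree is hypothetical and handles only a single-character
anchor drawn from a plain character set such as ".-"; it is not taken from
compmatch.c.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: a single-character anchor matches only if the
 * character on the line belongs to the anchor class AND the candidate
 * has that very same character in the corresponding position. */
static int anchors_agree(char on_line, char in_cand, const char *cls)
{
    assert(cls != NULL);

    /* Guard against '\0', which strchr() would find at the terminator. */
    return on_line != '\0'
        && strchr(cls, on_line) != NULL
        && on_line == in_cand;
}
```

So with the class ".-", a '.' on the line paired with a '-' in the candidate
does not count as an anchor, even though both characters are in the class.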

> +startitemize()
> +itemiz(\
> +is adjacent to the same substring and which
> +)
> +itemiz(\
> +matches var(tpat), but which
> +)
> +itemiz(\
> +does not contain any substrings matching var(anchor).

That is only applicable to *. Even ? can match the anchor.

> --- a/Test/Y02compmatch.ztst
> +++ b/Test/Y02compmatch.ztst

> - comptest $'tst c...pag\t'
> -0:Documentation example using input c...pag\t
> + comptest $'tst ...pag\t'
> +0:Documentation example using input ...pag

It is good to have tests matching exactly the examples in the documentation but
in some cases there could be value in preserving the essence of the old test
too. To get good test coverage, we want empty and partial components in both
the middle and beginning of the command line to be tested.

> + test_code $example4b_matcher example4_list
> + comptest $'tst ...pag\t^[bc\t^Fg^F^Fa\t'
> +0f:Documentation example using input ...pag with double anchor
> +>line: {tst .g.}{}

I don't think I follow how ...pag would be transformed to .g.
I assume this is copied from two tests before and should be unchanged.

> + example5b_matcher='r:[^.,_-]||[.,_-]=* r:|=*'
> + test_code $example5b_matcher example5_list
> + comptest $'tst  .c\t^[bv\t.h\t^[bv'
> +0f:Documentation example using input .c but with double anchor

The second tab doesn't really test the matcher because the cursor is located
where characters need to be added. And the final tab at the end seems to be
missing - and the same issue would apply if it is added.

>   example7_matcher="r:[^A-Z0-9]||[A-Z0-9]=** r:|=*"
>   example7_list=($example6_list)
>   test_code $example7_matcher example7_list
> - comptest $'tst H\t2\t'
> -0:Documentation example using "r:[^A-Z0-9]||[A-Z0-9]=** r:|=*"
> + comptest $'tst H\t^[bF\to2\t^[b5\tb\t'
                            ^
                            Is there meant to be a tab after that o?

Thanks again for this. You've also helped me to get a clearer picture of
matching control.

Oliver

diff --git a/Src/Zle/compmatch.c b/Src/Zle/compmatch.c
index cc4c3eca9..95eff1e92 100644
--- a/Src/Zle/compmatch.c
+++ b/Src/Zle/compmatch.c
@@ -693,8 +693,9 @@ match_str(char *l, char *w, Brinfo *bpp, int bc, int *rwlp,
 			alen = mp->ralen; aol = mp->lalen;
 		    }
 		    /* Give up if we don't have enough characters for the
-		     * line-string and the anchor. */
-		    if (ll < llen + alen || lw < alen)
+		     * line-string and the anchor, or for both anchors in
+		     * the case of the trial completion word. */
+		    if (ll < llen + alen || lw < alen + aol)
 			continue;

 		    if (mp->flags & CMF_LEFT) {
