Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: compmatch behaviour



On Sunday 18 May 2008, Peter Stephenson wrote:
> 
> Plea for help.
> 
> I usually face the completion code on my own, but I'm just looking at
> trying to update the matching code to make it more general.  (I believe
> I'm already relying on a change of Andrey's that simplifies it in one
> respect.  Thank you.)
> 
> However, there's one bit that's got me stumped, and unfortunately it's
> the core of the whole business.  bld_line() in Src/Zle/compmatch.c works
> as follows:
> 
> - Input a "word pattern" (the test completion) and a "line pattern" (what
>   we're matching it against).
> - If we haven't yet got to the end of the line pattern
>   - If the line pattern is an equivalence class,

Nope, this is other way round. Equivalence (actually it is called
correspondence in documentation) class is simply reduced to standard
character class; this condition later returns a single match:

(lpat->equiv && c) ? (c == lpat->tab[i])

The problem we have is *really* with standard character class.

>                                                  then for *every* character 
>     that can match the character in the test word (yes, you read that
>     correctly---if we're looking at upper case characters, for example,
>     we will try every possible upper case character until it works)

Nope, we will check only uppercase variant of character in the commad line

>     - set the character in the string from the command line
>     - recurse to test with this character in place with the line
>       pattern advanced but the same word pattern
>     - if it succeeded, return success.
>   - If it's not an equivalence class, no problem: only one character
>     to try.  Try it (same recursive logic but no nasty loop).
> - If we've got to the end of the line pattern (i.e. have recursed to the
>   extent where we've got a complete string from the command line,
>   - try matching the test completion and the trial word
>   - return success or failure.  (This causes the loop above either
>     to return success or try with a new character in the equivalence
>     class.)
> 
> In other words, to detect a match we try every possible character that
> could possibly match and see if it does.  This is crazy.  Obviously this
> doesn't generalize to larger groups of characters.
> 
> I think the basic reason for this is something along the lines of the
> following (I realise this isn't particularly coherent but this is the
> best I've got for now): because we can have patterns associated with
> both the trial string and the word on the command line, we have got
> ourselves into a position where the logic is naturally qudratic: both
> sides can in principle change and consequently we need to change one
> side to see if it can match the other.
> 
> The code for bld_line() is in Src/Zle/compmatch.c.  If anyone can see a
> way out of this mess, I'd be glad to hear even tentative theories.

I tried to replace this function with something else and failed because I
still do not understand most of this file. Some clue what's going on may
give comment in join_strs() (where one use case for bld_line is):

/* This builds a string that may be put on the line that fully matches the
 * given strings. The return value is NULL if no such string could be built
 * or that string in local static memory, dup it. */

So we *actually* try to *build* line for whatever reasons; and this
perfectly explains what happens in bld_line(). But I failed to grasp full
sequence of transformations that happen in this file so I must admit I
do not understand why and when would we need to do this.

> Obviously, I will continue to look at it.  For now, however, I'm
> stumped.
> 
> Another problem is that the match code makes extensive use of lengths,
> which need to become character counts, which means that anything that
> touches this code needs to use wide characters,

Hmm ... but if we ever want to move to "normal" pattern mathing code it
operates on multibyte not wide characters, does not it? This would imply
quite a lot of coversions; and some functions from compmatch.c are also
used in very unexpected places.

> which is a lot of 
> tortuous code.  However, that problem is in principle soluble.  We need
> to get the first problem solved.
> 


Attachment: signature.asc
Description: This is a digitally signed message part.



Messages sorted by: Reverse Date, Date, Thread, Author