Zsh Mailing List Archive

Re: compmatch behaviour

X-seq: zsh-workers 25069
From: Bart Schaefer <schaefer@xxxxxxxxxxxxxxxx>
To: zsh-workers@xxxxxxxxxx (Zsh hackers list)
Subject: Re: compmatch behaviour
Date: Sun, 18 May 2008 16:57:53 -0700
In-reply-to: <10710.1211137299@pws-pc>
Mailing-list: contact zsh-workers-help@xxxxxxxxxx; run by ezmlm
References: <10710.1211137299@pws-pc>

On May 18,  8:01pm, Peter Stephenson wrote:
}
} However, there's one bit that's got me stumped, and unfortunately it's
} the core of the whole business.  bld_line() in Src/Zle/compmatch.c works
} as follows:

So if I comprehend your question, it's not that you need help figuring
out what the code is doing.  You want help figuring out what it should
do instead, because the space of all possible wide characters is much
too big to brute-force it the way Sven originally did.

Right?

} ... because we can have patterns associated with both the trial
} string and the word on the command line, we have got ourselves into
} a position where the logic is naturally quadratic: both sides can in
} principle change and consequently we need to change one side to see if
} it can match the other.

I'm not sure this is quite right (so maybe it's just your coherency or
my comprehension that's off).  There are two situations being handled
simultaneously here, and maybe the first thing to do is to separate
them.  The first situation is where wpat is a correspondence class
and we need to select the corresponding position out of lpat.  The
second case is where lpat is an equivalence class and we need to try
every possible character in the class at line position *lp.

The two cases don't actually overlap as far as I can tell -- Sven has
branched the same loop that searches for c in lpat->tab in the first
case, to do double duty as the loop that acts on every character in
lpat->tab in the second case.
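
To make the distinction concrete, here is a minimal sketch of the two
cases as separate functions.  Everything in it is an assumption made for
illustration: struct toy_class, toy_class_init(), toy_corresponding() and
toy_candidates() are hypothetical names, and the table layout (0 means
"not in the class", anything else is a 1-based position) is a simplified
model rather than the real Cpattern.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_NCHARS 256

/* Hypothetical dense table, NOT zsh's Cpattern:
 * tab[c] == 0 means c is not in the class, otherwise it is the
 * 1-based position of c within the class. */
struct toy_class {
    unsigned char tab[TOY_NCHARS];
};

static void toy_class_init(struct toy_class *cls, const char *members)
{
    size_t i, n = strlen(members);

    assert(n < TOY_NCHARS);  /* positions must fit in an unsigned char */
    memset(cls->tab, 0, sizeof cls->tab);
    for (i = 0; i < n; i++)
        cls->tab[(unsigned char)members[i]] = (unsigned char)(i + 1);
}

/* Case 1: wpat is a correspondence class.  Find the position of the
 * word character in wpat, then search lpat for the character at the
 * same position.  Returns -1 if there is no such character. */
static int toy_corresponding(const struct toy_class *wpat,
                             const struct toy_class *lpat,
                             unsigned char wc)
{
    unsigned char pos = wpat->tab[wc];
    int c;

    if (!pos)
        return -1;
    for (c = 0; c < TOY_NCHARS; c++)
        if (lpat->tab[c] == pos)
            return c;
    return -1;
}

/* Case 2: lpat is an equivalence class.  Collect every member so the
 * caller can try each one at the current line position.  out must
 * have room for TOY_NCHARS entries. */
static size_t toy_candidates(const struct toy_class *lpat,
                             unsigned char *out)
{
    size_t n = 0;
    int c;

    for (c = 0; c < TOY_NCHARS; c++)
        if (lpat->tab[c])
            out[n++] = (unsigned char)c;
    return n;
}
```

Both functions walk the same 256-entry table; only the loop body
differs, which is why a single loop can be made to serve both purposes.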

Either (but not both) of the two situations could occur at every value
of lp, which is what the recursion is covering.  This is limited by the
length of the line --- there might be an optimization opportunity in
testing sooner whether *(lp + 1) will be the end of line, but the depth
of the recursion is not related to the size of the character class.
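
As a rough illustration of that point, the toy recursion below tries
each candidate character at each line position and records how deep it
went.  It is only a sketch under assumptions: toy_build() and its
arguments are made-up names, the "match" test is a plain strcmp()
against a wanted string, and none of this is the real bld_line().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* choices[pos] is a NUL-terminated list of the characters that may
 * appear at line position pos (hypothetical stand-in for a class).
 * line must have room for len + 1 bytes.  Returns 1 if some
 * combination equals want, leaving that combination in line. */
static int toy_build(const char *const *choices, size_t len, size_t pos,
                     char *line, const char *want, size_t *depth_seen)
{
    const char *p;

    assert(line != NULL && depth_seen != NULL);
    if (pos > *depth_seen)
        *depth_seen = pos;       /* deepest level reached so far */
    if (pos == len) {
        line[len] = '\0';
        return strcmp(line, want) == 0;
    }
    /* The size of the class affects the width of this loop only. */
    for (p = choices[pos]; *p; p++) {
        line[pos] = *p;
        if (toy_build(choices, len, pos + 1, line, want, depth_seen))
            return 1;
    }
    return 0;
}
```

With three positions the recursion never goes deeper than three levels,
however many characters each class holds; a bigger class only makes the
loop at each level longer.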

I think the first thing that's needed is to change the Cpattern struct
from a dense array indexed by the ASCII value of a character, into ...
well, something else, so that it's not necessary to iterate over it
in the first case, and so the iteration is more sparse in the second
case.  Of course, that may make pattern_match() more complicated ....
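
One possible shape for "something else", purely as a sketch: keep the
members of the class in an ordered array, so that a position can be
looked up directly and any iteration covers only the members rather
than the whole character space.  The names struct sparse_class,
sparse_position() and sparse_char_at() are hypothetical, and this is
not a proposal for the actual struct layout.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical sparse class: members in the order they were written
 * in the pattern. */
struct sparse_class {
    const wchar_t *chars;
    size_t len;
};

/* Returns the 1-based position of c in the class, or 0 if c is not a
 * member.  The scan is over the members only. */
static size_t sparse_position(const struct sparse_class *cls, wchar_t c)
{
    size_t i;

    for (i = 0; i < cls->len; i++)
        if (cls->chars[i] == c)
            return i + 1;
    return 0;
}

/* Stores the character at 1-based position pos in *out and returns 1,
 * or returns 0 if the class has no such position.  No iteration. */
static int sparse_char_at(const struct sparse_class *cls, size_t pos,
                          wchar_t *out)
{
    assert(out != NULL);
    if (pos == 0 || pos > cls->len)
        return 0;
    *out = cls->chars[pos - 1];
    return 1;
}
```

The correspondence case then becomes sparse_position() on the word side
followed by sparse_char_at() on the line side.  The equivalence case
still has to loop, but only over cls->len entries.  A linear scan is
used here for simplicity; a sorted array or ranges would be other
options, with their own cost in pattern_match().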

I'm not sure there's any way to avoid iterating in the second case.
Besides handling two possible ways of interpreting the character class,
I think (without tracing very far up the call chain) that this is also
constructing the prefix string that's going to be shown to the user in
the event of a common prefix shared by ambiguous matches.  It's not
enough to check whether the test character is in the class; you might
have to know *which* character it is in the class, so to speak.
