Zsh Mailing List Archive

Re: filename completion with umlauts (again)



On Fri, 07 Jan 2011 23:10:48 -0800
Bart Schaefer <schaefer@xxxxxxxxxxxxxxxx> wrote:
> On Jan 8, 12:35am, Andy Spiegl wrote:
> }
> } Uhm, too bad.  I am wondering whether case insensitivity in the
> } matcher could be achieved with a different trick?
> 
> As I understand it, the problem isn't case insensitivity.  The problem
> is (a) representing each set of characters in a manageable syntax and
> (b) efficiently constructing a mapping between the two sets.
> 
> This is a tractable problem for single byte characters because there
> is a single fixed ordering and no more than 256 values in each set; for
> multibyte characters, not only is the number of values much larger,
> but also the user-expected collating order is not always the same as
> the numeric order of the underlying encoding.
> 
> (And now I fully expect someone to point out that I've got that entirely
> wrong and the trouble really is something else.)

The remaining problem is the multibyte one; the matcher code is heavily
tied to one character per array position in a way that doesn't make it
easy to turn multibyte into wide characters and back (and that doesn't
always make it obvious what the @*!@! it's actually doing with the
array).

The collating order could be a problem if you use literal characters,
but that's already fixed in a general way by allowing the syntax:

  m:{[:upper:][:lower:]}={[:lower:][:upper:]}

and similar. Basically, any use of {...} allows lowercase and uppercase
characters to be matched generically.

This already works for single-byte locales using future-proof library
calls (i.e. things like iswupper() that operate on wide characters);
hence I'm reasonably confident that once we fix the multibyte problem
(if ever) the rest should fall naturally into place.
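
For anyone unfamiliar with the wide-character calls mentioned above,
here is a rough sketch of the round trip: multibyte string to wide
characters, case swapped with iswupper() and friends, then back to
multibyte. This is not the zsh matcher code; swap_case_mb() is a
hypothetical helper invented for illustration, and it assumes a POSIX
libc where mbstowcs() and wcstombs() accept a NULL destination to
report the required length.

```c
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper (not from the zsh source): swap the case of every
 * character in the multibyte string `in`, writing the result to `out`.
 * Returns 0 on success, -1 if the input is not valid in the current
 * locale, memory runs out, or `out` is too small. */
int swap_case_mb(const char *in, char *out, size_t outsz)
{
    /* POSIX: a NULL destination returns the number of wide characters. */
    size_t n = mbstowcs(NULL, in, 0);
    if (n == (size_t)-1)
        return -1;

    wchar_t *w = malloc((n + 1) * sizeof *w);
    if (w == NULL)
        return -1;
    mbstowcs(w, in, n + 1);

    /* One wide character per array position, whatever the byte length. */
    for (size_t i = 0; i < n; i++) {
        wint_t c = (wint_t)w[i];
        if (iswupper(c))
            w[i] = (wchar_t)towlower(c);
        else if (iswlower(c))
            w[i] = (wchar_t)towupper(c);
    }

    /* Convert back, checking that the result (plus the NUL) fits. */
    size_t need = wcstombs(NULL, w, 0);
    int rc = -1;
    if (need != (size_t)-1 && need < outsz) {
        wcstombs(out, w, outsz);
        rc = 0;
    }
    free(w);
    return rc;
}
```

A caller would normally run setlocale(LC_ALL, "") first so that the
conversions follow the user's locale; in a UTF-8 locale the same loop
should then handle umlauts without any change, since the classification
happens per wide character rather than per byte.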

-- 
Peter Stephenson <p.w.stephenson@xxxxxxxxxxxx>
Web page now at http://homepage.ntlworld.com/p.w.stephenson/


