Zsh Mailing List Archive

Pattern changes, part 2

I did some more work on pattern matching over the weekend.  The main
idea is to make it easier to handle multibyte characters by using
the normal string representation whenever convenient.  All tests still
pass.

- The test string is now unmetafied for comparing against the pattern.
  Literal strings in the pattern are also unmetafied.  I've turned
  the METAINCs in the pattern matcher into CHARINCs where appropriate;
  this is currently a trivial increment but is a placeholder to say "go
  to next character".  (Nothing has changed in places where the string
  remains metafied; those will still need more thought.)  The new code
  should be significantly more efficient during pattern matching,
  since it doesn't have to test for Meta characters in many
  places, although I haven't benchmarked it.

- Character sets [...] are still metafied; we need the special
  characters to indicate ranges and Posix ctype names.

- Pure strings are still metafied.  (These are signalled by a special
  flag indicating the value stored is a string rather than the normal
  pattern programme.)  It became clear that changing this would be
  inefficient, particularly in globbing where we use the result of the
  pattern matcher to add to the (metafied) path buffer.  There are
  actually two cases:
  o We can spot immediately that the string doesn't have special
    characters.  This is the normal case and is handled fairly
    efficiently.
  o There are special characters around but nonetheless the string is a
    pure string.  There is one case where we need to handle this
    properly, which is when the string in question is ".." or ".", since
    those are never matched by globbing.  An example where this could
    occur would be a path segment (#i).. with extended globbing.  Here,
    we only find out we have a pure string after unmetafying into the
    pattern programme, so we need to metafy again.  This isn't so hot,
    but it's actually a rare corner case.

- The interface used by parameter substitutions has been tidied up.
  o The call patmatchlen() gets the length of the match, so that nothing
    outside pattern.c needs pointers into the test string.  This was
    necessary since the strings may now be reallocated, but is neater
    anyway.  (This is the metafied length, which is what the parameter
    code needs --- and this will probably continue; I don't think
    there's a case for unmetafying there.  There is some minor
    inefficiency in counting metafiable characters in the matched part
    of the trial string.)
  o The horrible global patoffset has disappeared.  Now the offset to
    be added to indices into parameters is passed as an argument.  I
    should have done it this way all along.

- Minor fix for numeric ranges: <num-> will now also match any integer
  that is too large to represent in the internal integer type.  This has
  worked for <-> for some time, but it wasn't special-cased if there was a
  lower bound.
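
To illustrate the numeric range fix, here is a rough, self-contained
sketch of the idea.  It is not the code in pattern.c: the function name
matches_open_range() and the use of long and strtol() are assumptions
made for the example.  The point is only that a string of digits which
overflows the integer type is necessarily above any representable lower
bound, so it should count as a match.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not from pattern.c): does the decimal string s
 * match an open-ended range <lo->?  Returns 1 on a match, 0 otherwise.
 */
static int matches_open_range(const char *s, long lo)
{
    const char *p;
    long val;

    /* Only non-empty strings of decimal digits can match. */
    if (!*s)
        return 0;
    for (p = s; *p; p++)
        if (!isdigit((unsigned char)*p))
            return 0;

    errno = 0;
    val = strtol(s, NULL, 10);
    if (errno == ERANGE)
        return 1;   /* too large to represent, so above any lower bound */
    return val >= lo;
}
```

With this, matches_open_range("5", 3) succeeds, matches_open_range("2", 3)
fails, and a string of digits far too long for a long still succeeds
rather than being compared after truncation.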

I will commit this directly (with a ChangeLog entry, this time).

By the way, we really need a lot more tests which require the use of the
Meta character, and not just for pattern matching.  Adding these while
the character representation is in flux is probably not particularly
useful, however.
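
For anyone not familiar with the Meta encoding that all of this
revolves around, the following is a rough sketch of unmetafying a string
and of computing the metafied length of a raw match, as mentioned for
patmatchlen() above.  It assumes the usual zsh convention that a special
byte is stored as Meta (0x83) followed by the byte XORed with 32.  The
names sketch_unmetafy(), metafied_len() and needs_meta() are invented
for the example, and the set of bytes treated as needing Meta is an
assumption; the real set is whatever the shell's own tables define.

```c
#include <stddef.h>

#define META ((unsigned char)0x83)   /* assumed value of zsh's Meta byte */

/*
 * Assumption for illustration only: NUL and bytes 0x83..0xa2 need
 * escaping.  The real test is defined by the shell's type tables.
 */
static int needs_meta(unsigned char c)
{
    return c == 0 || (c >= 0x83 && c <= 0xa2);
}

/* Unmetafy s in place; return the length of the raw result. */
static size_t sketch_unmetafy(unsigned char *s)
{
    unsigned char *in = s, *out = s;

    while (*in) {
        if (*in == META && in[1]) {
            /* Meta marks the next byte as stored XOR 32. */
            *out++ = (unsigned char)(in[1] ^ 32);
            in += 2;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
    return (size_t)(out - s);
}

/*
 * Metafied length of a raw buffer: one extra byte for every
 * character that would need a Meta prefix.
 */
static size_t metafied_len(const unsigned char *raw, size_t len)
{
    size_t i, n = len;

    for (i = 0; i < len; i++)
        if (needs_meta(raw[i]))
            n++;
    return n;
}
```

The second function is where the minor inefficiency mentioned above
comes from: once the trial string is unmetafied, getting back to a
metafied length means scanning the matched part for metafiable bytes.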

-- 
Peter Stephenson <pws@xxxxxxx>                  Software Engineer
CSR Ltd., Science Park, Milton Road,
Cambridge, CB4 0WH, UK                          Tel: +44 (0)1223 692070
