Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: clarification on (#U) in pattern matching.



Sorry, this just went to Stephane.

pws

> On 07 February 2022 at 11:30 Peter Stephenson <p.w.stephenson@xxxxxxxxxxxx> wrote:
> 
> 
> > On 06 February 2022 at 08:42 Stephane Chazelas <stephane@xxxxxxxxxxxx> wrote:
> > $ set -o extendedglob
> > $ a='Stéphane€'
> > $ print -rn -- ${a//(#U)?} | hd
> > 00000000  a9 82 ac                                          |...|
> > 00000003
> > 
> > It seems that with (#U) (and here in a locale using UTF-8 as
> > charmap), ? matches only the first byte of multibyte
> > characters. Is that how it's meant to be?
> 
> I think what you're hitting is probably, as you suspected, a
> difference between the pattern matching code and the substitution
> code.  The underlying pattern matching really is byte by byte,
> but this doesn't force any substitution such as // to behave
> in the same way.  As far as I know, the MULTIBYTE option is
> the only higher level consistency measure we have.
> 
> I think there might be a parameter matching flag that you can
> also set that would help.  I'd have to look in more detail.
> 
> pws
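
For reading the hex dump quoted above, it may help to see the full byte layout of the test string. The sketch below is not from the thread: it builds the string from octal escapes (an assumption made so the bytes do not depend on how this page is encoded) and dumps it with od rather than hd. The variable name bytes is only illustrative.

```shell
# Build 'Stéphane€' from explicit UTF-8 bytes:
#   é = c3 a9 (octal 303 251), € = e2 82 ac (octal 342 202 254)
a=$(printf 'St\303\251phane\342\202\254')

# Dump every byte as hex and normalise the whitespace od emits.
bytes=$(printf '%s' "$a" | od -An -tx1 | tr -s ' \n' ' ')
bytes=${bytes# }   # drop the single leading space
bytes=${bytes% }   # drop the single trailing space

printf '%s\n' "$bytes"
```

Set against this layout, the three bytes left over in the quoted output (a9 82 ac) are the second byte of é and the second and third bytes of €. The lead bytes c3 and e2 are gone, along with all the single-byte characters.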



