Zsh Mailing List Archive

[bug] locale ctype not always honoured properly in pcre matching



$ locale charmap
UTF-8
$ set -o rematchpcre
$ LC_ALL=C [ $'\xc3\xa9' '=~' '^..\z' ] && echo yes
yes

OK: in the C locale, those two bytes are treated as two characters, so '^..\z' matches.

$ [ $'\xc3\xa9' '=~' '^..\z' ] && echo yes
$

OK: in UTF-8, those two bytes form a single é character, so '^..\z' does not match.
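
As a cross-check outside zsh and PCRE, here is a minimal sketch using only POSIX printf and wc. It assumes nothing beyond the C locale being available. It shows that é in UTF-8 is two bytes and that those count as two characters in the C locale. The variable names are illustrative only.

```shell
# é (U+00E9) encoded as UTF-8 is the two bytes 0xC3 0xA9 (octal 303 251).
bytes=$(printf '\303\251' | LC_ALL=C wc -c)
# In the C locale every byte is one character, so wc -m should agree with wc -c.
chars_c=$(printf '\303\251' | LC_ALL=C wc -m)
# $(( )) strips the padding some wc implementations put around the number.
echo "bytes=$((bytes)) chars_in_C_locale=$((chars_c))"
```

In a UTF-8 locale, wc -m on the same input would be expected to report 1 instead, which is the same distinction the '^..\z' and '^.\z' patterns are probing.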

$ LC_ALL=C [ $'\xc3\xa9' '=~' '^..\z' ] && echo yes
$

Same command as the first one, but now it doesn't match (?!). Instead:

$ LC_ALL=C [ $'\xc3\xa9' '=~' '^.\z' ] && echo yes
yes

That one matches, i.e. it behaves as if the match were still being done in UTF-8, despite LC_ALL=C.

The same goes for [[ ... =~ ... ]] in a fresh shell:

$ PS1='$ ' zsh -f
$ set -o rematchpcre
$ (LC_ALL=C; [[ $'\xc3\xa9' =~ '^..\z' ]] && echo yes )
yes
$ [[ $'\xc3\xa9' =~ '^..\z' ]] && echo yes
$ (LC_ALL=C; [[ $'\xc3\xa9' =~ '^..\z' ]] && echo yes )
$

-- 
Stephane




