Zsh Mailing List Archive

[PATCH] [[:blank:]] only matches on SPC and TAB



I noticed that [[:blank:]] does not match non-ASCII blank
characters. In a typical UTF-8 locale on a GNU system, the blank
class normally includes:

 U+0009 CHARACTER TABULATION
 U+0020 SPACE
 U+1680 OGHAM SPACE MARK
 U+2000 EN QUAD
 U+2001 EM QUAD
 U+2002 EN SPACE
 U+2003 EM SPACE
 U+2004 THREE-PER-EM SPACE
 U+2005 FOUR-PER-EM SPACE
 U+2006 SIX-PER-EM SPACE
 U+2008 PUNCTUATION SPACE
 U+2009 THIN SPACE
 U+200A HAIR SPACE
 U+205F MEDIUM MATHEMATICAL SPACE
 U+3000 IDEOGRAPHIC SPACE

On FreeBSD:

 U+0009 CHARACTER TABULATION
 U+0020 SPACE
 U+00A0 NO-BREAK SPACE
 U+FEFF ZERO WIDTH NO-BREAK SPACE

(Strangely enough, U+00A0 is not classified as blank there in
single-byte charsets like ISO8859-1.)
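
One way to reproduce lists like the ones above is to loop over code
points and ask iswblank() directly. The following is a minimal sketch
and not part of the patch: the helper name list_blanks is made up for
illustration, and it assumes wchar_t values are Unicode code points
(as on glibc, or wherever __STDC_ISO_10646__ is defined).

```c
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>

/*
 * list_blanks (hypothetical helper): print each code point in
 * [first, last] that the current LC_CTYPE locale classifies as
 * blank, and return how many were found.
 * Assumption: wchar_t values are Unicode code points.
 */
static int
list_blanks(unsigned long first, unsigned long last)
{
    unsigned long c;
    int count = 0;

    for (c = first; c <= last; c++) {
        if (iswblank((wint_t)c)) {
            printf("U+%04lX\n", c);
            count++;
        }
    }
    return count;
}
```

Calling setlocale(LC_CTYPE, "") and then list_blanks(0, 0xFFFF) from
main() should print the set for the user's locale. Without the
setlocale() call the process stays in the "C" locale, where only
U+0009 and U+0020 are expected.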

The code does indeed match SPC and TAB explicitly, in both the
multibyte and single-byte cases. (The non-breaking space is one
non-ASCII character that appears in a few single-byte charsets
and is considered blank on some systems, though not GNU ones.)
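
To make the difference concrete, here is a rough stand-alone sketch of
the two single-byte tests side by side. The function names are
invented for this example and do not exist in Src/pattern.c. The cast
to unsigned char is my addition, because isblank() is only defined for
EOF and for values representable as unsigned char.

```c
#include <ctype.h>

/* The test as currently written: SPC and TAB only, whatever the locale. */
static int
blank_explicit(int ch)
{
    return ch == ' ' || ch == '\t';
}

/* The test after the patch: whatever the locale's blank class contains. */
static int
blank_locale(int ch)
{
    /* Cast added for this sketch; see the note above. */
    return isblank((unsigned char)ch) != 0;
}
```

The two should agree on SPC and TAB in any locale. They can only
differ on bytes such as 0xA0, in single-byte locales where the system
classifies NO-BREAK SPACE as blank.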

In case that was not intentional, this patch should fix it:

diff --git a/Src/pattern.c b/Src/pattern.c
index fc7c737..d3eac44 100644
--- a/Src/pattern.c
+++ b/Src/pattern.c
@@ -3605,7 +3605,7 @@ mb_patmatchrange(char *range, wchar_t ch, int zmb_ind, wint_t *indptr, int *mtp)
 		    return 1;
 		break;
 	    case PP_BLANK:
-		if (ch == L' ' || ch == L'\t')
+		if (iswblank(ch))
 		    return 1;
 		break;
 	    case PP_CNTRL:
@@ -3840,7 +3840,7 @@ patmatchrange(char *range, int ch, int *indptr, int *mtp)
 		    return 1;
 		break;
 	    case PP_BLANK:
-		if (ch == ' ' || ch == '\t')
+		if (isblank(ch))
 		    return 1;
 		break;
 	    case PP_CNTRL:

-- 
Stephane


