Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: PATCH: character sets for internal zsh tests



Bart Schaefer wrote:
> Just in case I wasn't clear before, in all-caps I think [:IFS:] and
> [:IFSSPACE:] are OK.  I didn't like lower-case [:ifs:] or [:ifsw:].

OK, let's keep the link between [:IFS:] and $IFS explicit.

Index: Doc/Zsh/expn.yo
===================================================================
RCS file: /cvsroot/zsh/zsh/Doc/Zsh/expn.yo,v
retrieving revision 1.53
diff -u -r1.53 expn.yo
--- Doc/Zsh/expn.yo	24 Apr 2005 18:38:04 -0000	1.53
+++ Doc/Zsh/expn.yo	28 Apr 2005 16:08:18 -0000
@@ -1224,19 +1224,82 @@
 first character in the list.
 cindex(character classes)
 There are also several named classes of characters, in the form
-`tt([:)var(name)tt(:])' with the following meanings:  `tt([:alnum:])'
-alphanumeric, `tt([:alpha:])' alphabetic,
-`tt([:ascii:])' 7-bit,
-`tt([:blank:])' space or tab,
-`tt([:cntrl:])' control character, `tt([:digit:])' decimal
-digit, `tt([:graph:])' printable character except whitespace,
-`tt([:lower:])' lowercase letter, `tt([:print:])' printable character,
-`tt([:punct:])' printable character neither alphanumeric nor whitespace,
-`tt([:space:])' whitespace character, `tt([:upper:])' uppercase letter, 
-`tt([:xdigit:])' hexadecimal digit.  These use the macros provided by
+`tt([:)var(name)tt(:])' with the following meanings.
+The first set use the macros provided by
 the operating system to test for the given character combinations,
-including any modifications due to local language settings:  see
-manref(ctype)(3).  Note that the square brackets are additional
+including any modifications due to local language settings, see
+manref(ctype)(3):
+
+startitem()
+item(tt([:alnum:]))(
+The character is alphanumeric
+)
+item(tt([:alpha:]))
+(
+The character is alphabetic
+)
+item(tt([:ascii:]))(
+The character is 7-bit, i.e. is a single-byte character without
+the top bit set.
+)
+item(tt([:blank:]))(
+The character is either space or tab
+)
+item(tt([:cntrl:]))(
+The character is a control character
+)
+item(tt([:digit:]))(
+The character is a decimal digit
+)
+item(tt([:graph:]))(
+The character is a printable character other than whitespace
+)
+item(tt([:lower:]))(l
+The character is a lowercase letter
+)
+item(tt([:print:]))(
+The character is printable
+)
+item(tt([:punct:]))(
+The character is printable but neither alphanumeric nor whitespace
+)
+item(tt([:space:]))(
+The character is whitespace
+)
+item(tt([:upper:]))(
+The character is an uppercase letter
+)
+item(tt([:xdigit:]))(
+The character is a hexadecimal digit
+)
+enditem()
+
+Another set of named classes is handled internally by the shell and
+is not sensitive to the locale:
+
+startitem()
+item(tt([:IDENT:]))(
+The character is allowed to form part of a shell identifier, such
+as a parameter name
+)
+item(tt([:IFS:]))(
+The character is used as an input field separator, i.e. is contained in the
+tt(IFS) parameter
+)
+item(tt([:IFSSPACE:]))(
+The character is an IFS white space character; see the documentation
+for tt(IFS) in
+ifzman(the zmanref(zshparams) manual page)\
+ifnzman(noderef(Parameters Used By The Shell))\
+.
+)
+item(tt([:WORD:]))(
+The character is treated as part of a word; this test is sensitive
+to the value of the tt(WORDCHARS) parameter
+)
+enditem()
+
+Note that the square brackets are additional
 to those enclosing the whole set of characters, so to test for a
 single alphanumeric character you need `tt([[:alnum:]])'.  Named
 character sets can be used alongside other types,
Index: Src/pattern.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/pattern.c,v
retrieving revision 1.26
diff -u -r1.26 pattern.c
--- Src/pattern.c	26 Apr 2005 09:51:29 -0000	1.26
+++ Src/pattern.c	28 Apr 2005 16:08:18 -0000
@@ -193,8 +193,12 @@
 #define PP_SPACE  11
 #define PP_UPPER  12
 #define PP_XDIGIT 13
-#define PP_UNKWN  14
-#define PP_RANGE  15
+#define PP_IDENT  14
+#define PP_IFS    15
+#define PP_IFSSPACE   16
+#define PP_WORD   17
+#define PP_UNKWN  18
+#define PP_RANGE  19
 
 #define	P_OP(p)		((p)->l & 0xff)
 #define	P_NEXT(p)	((p)->l >> 8)
@@ -1118,6 +1122,14 @@
 			    ch = PP_UPPER;
 			else if (!strncmp(patparse, "xdigit", len))
 			    ch = PP_XDIGIT;
+			else if (!strncmp(patparse, "IDENT", len))
+			    ch = PP_IDENT;
+			else if (!strncmp(patparse, "IFS", len))
+			    ch = PP_IFS;
+			else if (!strncmp(patparse, "IFSSPACE", len))
+			    ch = PP_IFSSPACE;
+			else if (!strncmp(patparse, "WORD", len))
+			    ch = PP_WORD;
 			else
 			    ch = PP_UNKWN;
 			patparse = nptr + 2;
@@ -2724,6 +2736,22 @@
 		if (isxdigit(ch))
 		    return 1;
 		break;
+	    case PP_IDENT:
+		if (iident(ch))
+		    return 1;
+		break;
+	    case PP_IFS:
+		if (isep(ch))
+		    return 1;
+		break;
+	    case PP_IFSSPACE:
+		if (iwsep(ch))
+		    return 1;
+		break;
+	    case PP_WORD:
+		if (iword(ch))
+		    return 1;
+		break;
 	    case PP_RANGE:
 		range++;
 		r1 = STOUC(UNMETA(range));
Index: Test/D02glob.ztst
===================================================================
RCS file: /cvsroot/zsh/zsh/Test/D02glob.ztst,v
retrieving revision 1.9
diff -u -r1.9 D02glob.ztst
--- Test/D02glob.ztst	16 Mar 2005 11:51:15 -0000	1.9
+++ Test/D02glob.ztst	28 Apr 2005 16:08:18 -0000
@@ -323,3 +323,28 @@
  print glob.tmp/ra=1.0_et=3.5/???
 0:Bug with intermediate paths with plain strings but tokenized characters
 >glob.tmp/ra=1.0_et=3.5/foo
+
+ doesmatch() {
+   setopt localoptions extendedglob
+   print -n $1 $2\ 
+   if [[ $1 = $~2 ]]; then print yes; else print no; fi;
+ }
+ doesmatch MY_IDENTIFIER '[[:IDENT:]]##'
+ doesmatch YOUR:IDENTIFIER '[[:IDENT:]]##'
+ IFS=$'\n' doesmatch $'\n' '[[:IFS:]]'
+ IFS=' ' doesmatch $'\n' '[[:IFS:]]'
+ IFS=':' doesmatch : '[[:IFSSPACE:]]'
+ IFS=' ' doesmatch ' ' '[[:IFSSPACE:]]'
+ WORDCHARS="" doesmatch / '[[:WORD:]]'
+ WORDCHARS="/" doesmatch / '[[:WORD:]]'
+0:Named character sets handled internally
+>MY_IDENTIFIER [[:IDENT:]]## yes
+>YOUR:IDENTIFIER [[:IDENT:]]## no
+>
+> [[:IFS:]] yes
+>
+> [[:IFS:]] no
+>: [[:IFSSPACE:]] no
+>  [[:IFSSPACE:]] yes
+>/ [[:WORD:]] no
+>/ [[:WORD:]] yes

-- 
Peter Stephenson <pws@xxxxxxx>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070


**********************************************************************
This email and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to whom they
are addressed. If you have received this email in error please notify
the system manager.

**********************************************************************



Messages sorted by: Reverse Date, Date, Thread, Author