Zsh Mailing List Archive

Re: ${a[(i)pattern]} if a=()



On Tue, 18 Mar 2008 08:47:28 -0700
Bart Schaefer <schaefer@xxxxxxxxxxxxxxxx> wrote:
> Looking at documentation for this, I was reminded about this recent bit:
> 
>      Note that in subscripts with both `r' and `R' pattern characters
>      are active even if they were substituted for a parameter
>      (regardless of the setting of GLOB_SUBST which controls this
>      feature in normal pattern matching).  It is therefore necessary to
>      quote pattern characters for an exact string match.
> 
> Maybe we could press the (e) flag into service here?  I haven't looked
> at how hard that would be to do, but it's semantically similar to the
> existing use

Yes, that seems perfectly reasonable, and it was easy to do (except I've
just got back from holiday so it's appeared a week late).  It might look
a little bizarre that with the flag we untokenize() and without it we
tokenize(): you might think we'd need just one or the other.  Both are
needed because the substitution may or may not be inside double quotes:
if it is, we need to tokenize to do pattern matching, while if it isn't
we need to untokenize to make sure we don't.
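
As a rough illustration of the user-visible effect only (not of the
tokenization itself), here is a minimal C sketch.  It assumes POSIX
fnmatch() as a stand-in for zsh's own pattern matcher and strcmp() for the
plain string comparison; subscript_matches() and first_index() are
hypothetical helpers invented for this sketch and do not exist in the zsh
source, which instead compiles the subscript with patcompile().

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: compare one array element against the subscript.
 * With quote_arg set (the `e' flag) the subscript is a literal string;
 * otherwise it is treated as a pattern.  fnmatch() is only a stand-in
 * for zsh's pattern engine.
 */
static bool subscript_matches(const char *subscript, const char *element,
                              bool quote_arg)
{
    if (quote_arg)
        return strcmp(subscript, element) == 0;
    return fnmatch(subscript, element, 0) == 0;
}

/*
 * Hypothetical helper mimicking the `i' subscript flag: return the
 * 1-based index of the first matching element, or len + 1 if none match.
 */
static size_t first_index(const char *const *array, size_t len,
                          const char *subscript, bool quote_arg)
{
    for (size_t i = 0; i < len; i++) {
        if (subscript_matches(subscript, array[i], quote_arg))
            return i + 1;
    }
    return len + 1;
}
```

With the five elements used in the new test case below ('%', '$', 'j',
'*', '$foo'), first_index() gives 1 for the subscript "*" without the flag
and 4 with it, and 5 for the literal "$foo", in line with the expected
output of that test.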

It's still necessary to use a parameter as the key to guarantee all
characters are interpreted literally.  The issue is that we don't do
full argument parsing on the subscript; it's handled a bit like a
special case of double quoting (but with a different terminator), so
single and double quotes don't have their quoting effect there.  I don't
think we want to change this in a hurry.

I noticed meanwhile that the optimization for pattern-character-free
strings was being defeated in multibyte mode, because the test treated
the multibyte flag like any other globbing flag; the only difference is
speed, so it's unlikely anybody would have noticed.
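
The change to that test can be sketched in isolation as follows.  The flag
values and the name GF_OTHER_EXAMPLE are assumptions made for this sketch
(zsh defines its real GF_* constants in its own headers), and
fast_path_old() and fast_path_new() are hypothetical wrappers around the
two conditions that appear in the pattern.c hunk.

```c
#include <stdbool.h>

/* Assumed values for illustration only; not zsh's real definitions. */
enum {
    GF_OTHER_EXAMPLE = 1 << 0,  /* hypothetical: any other globbing flag */
    GF_MULTIBYTE     = 1 << 1   /* multibyte mode is active */
};

/* Old condition: any set flag, GF_MULTIBYTE included, rules out the
 * pure-string fast path. */
static bool fast_path_old(int patglobflags)
{
    return !patglobflags;
}

/* New condition: GF_MULTIBYTE is masked out first, so only the other
 * flags can rule out the fast path. */
static bool fast_path_new(int patglobflags)
{
    return !(patglobflags & ~GF_MULTIBYTE);
}
```

Under these assumptions, a flag word containing only GF_MULTIBYTE fails
the old condition but passes the new one, while any other flag still
disables the fast path in both.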

Index: Doc/Zsh/params.yo
===================================================================
RCS file: /cvsroot/zsh/zsh/Doc/Zsh/params.yo,v
retrieving revision 1.41
diff -u -r1.41 params.yo
--- Doc/Zsh/params.yo	25 Oct 2007 09:33:01 -0000	1.41
+++ Doc/Zsh/params.yo	25 Mar 2008 17:08:38 -0000
@@ -227,16 +227,14 @@
 If tt(KSH_ARRAYS) is in effect, the tt(-le) should be replaced by tt(-lt).
 
 Note that in subscripts with both `tt(r)' and `tt(R)' pattern characters
-are active even if they were substituted for a parameter (regardless
-of the setting of tt(GLOB_SUBST) which controls this feature in normal
-pattern matching).  It is therefore necessary to quote pattern characters
-for an exact string match.  Given a string in tt($key), and assuming
-the tt(EXTENDED_GLOB) option is set, the following is sufficient to
-match an element of an array tt($array) containing exactly the value of
-tt($key):
+are active even if they were substituted for a parameter (regardless of the
+setting of tt(GLOB_SUBST) which controls this feature in normal pattern
+matching).  The flag `tt(e)' can be added to inhibit pattern matching.  As
+this flag does not inhibit other forms of substitution, care is still
+required; using a parameter to hold the key has the desired effect:
 
-example(key2=${key//(#m)[\][+LPAR()+RPAR()\\*?#<>~^]/\\$MATCH}
-print ${array[(R)$key2]})
+example(key2='original key'
+print ${array[(Re)$key2]})
 )
 item(tt(R))(
 Like `tt(r)', but gives the last match.  For associative arrays, gives
@@ -283,11 +281,15 @@
 The delimiter character tt(:) is arbitrary; see above.
 )
 item(tt(e))(
-This flag has no effect and for ordinary arrays is retained for backward
-compatibility only.  For associative arrays, this flag can be used to
-force tt(*) or tt(@) to be interpreted as a single key rather than as a
-reference to all values.  This flag may be used on the left side of an
-assignment.
+This flag causes any pattern matching that would be performed on the
+subscript to use plain string matching instead.  Hence
+`tt(${array[(re)*]})' matches only the array element whose value is tt(*).
+Note that other forms of substitution such as parameter substitution are
+not inhibited.
+
+This flag can also be used to force tt(*) or tt(@) to be interpreted as
+a single key rather than as a reference to all values.  It may be used
+for either purpose on the left side of an assignment.
 )
 enditem()
 
Index: Src/params.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/params.c,v
retrieving revision 1.141
diff -u -r1.141 params.c
--- Src/params.c	10 Jan 2008 10:25:31 -0000	1.141
+++ Src/params.c	25 Mar 2008 17:08:38 -0000
@@ -1007,7 +1007,7 @@
     int hasbeg = 0, word = 0, rev = 0, ind = 0, down = 0, l, i, ishash;
     int keymatch = 0, needtok = 0, arglen, len;
     char *s = *str, *sep = NULL, *t, sav, *d, **ta, **p, *tt, c;
-    zlong num = 1, beg = 0, r = 0;
+    zlong num = 1, beg = 0, r = 0, quote_arg = 0;
     Patprog pprog = NULL;
 
     ishash = (v->pm && PM_TYPE(v->pm->node.flags) == PM_HASHED);
@@ -1058,8 +1058,7 @@
 		sep = "\n";
 		break;
 	    case 'e':
-		/* Compatibility flag with no effect except to prevent *
-		 * special interpretation by getindex() of `*' or `@'. */
+		quote_arg = 1;
 		break;
 	    case 'n':
 		t = get_strarg(++s, &arglen);
@@ -1286,7 +1285,10 @@
 	    }
 	}
 	if (!keymatch) {
-	    tokenize(s);
+	    if (quote_arg)
+		untokenize(s);
+	    else
+		tokenize(s);
 	    remnulargs(s);
 	    pprog = patcompile(s, 0, NULL);
 	} else
Index: Src/pattern.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/pattern.c,v
retrieving revision 1.41
diff -u -r1.41 pattern.c
--- Src/pattern.c	23 Oct 2007 16:09:10 -0000	1.41
+++ Src/pattern.c	25 Mar 2008 17:08:38 -0000
@@ -511,7 +511,7 @@
 
     if (!(patflags & PAT_ANY)) {
 	/* Look for a really pure string, with no tokens at all. */
-	if (!patglobflags
+	if (!(patglobflags & ~GF_MULTIBYTE)
 #ifdef __CYGWIN__
 	    /*
 	     * If the OS treats files case-insensitively and we
Index: Test/D04parameter.ztst
===================================================================
RCS file: /cvsroot/zsh/zsh/Test/D04parameter.ztst,v
retrieving revision 1.32
diff -u -r1.32 D04parameter.ztst
--- Test/D04parameter.ztst	11 Mar 2008 10:00:39 -0000	1.32
+++ Test/D04parameter.ztst	25 Mar 2008 17:08:43 -0000
@@ -282,6 +282,7 @@
   print ${(P)bar}
 0:${(P)...}
 >I'm nearly out of my mind with tedium
+#' deconfuse emacs
 
   foo=(I could be watching that programme I recorded)
   print ${(o)foo}
@@ -375,6 +376,7 @@
   print ${(QX)foo}
 1:${(QX)...}
 ?(eval):2: unmatched "
+# " deconfuse emacs
 
   array=(characters in an array)
   print ${(c)#array}
@@ -411,6 +413,7 @@
   print ${(pl.10..\x22..X.)foo}
 0:${(pl...)...}
 >Xresulting """"Xwords roariously """Xpadded
+#" deconfuse emacs
 
   print ${(l.5..X.r.5..Y.)foo}
   print ${(l.6..X.r.4..Y.)foo}
@@ -870,6 +873,7 @@
 0:Parameters associated with backreferences
 >match 12 16 match
 >1 1 1
+#' deconfuse emacs
 
   string='and look for a MATCH in here'
   if [[ ${(S)string%%(#m)M*H} = "and look for a  in here" ]]; then
@@ -1010,3 +1014,36 @@
 >fields
 >in
 >it
+
+  array=('%' '$' 'j' '*' '$foo')
+  print ${array[(i)*]} "${array[(i)*]}"
+  print ${array[(ie)*]} "${array[(ie)*]}"
+  key='$foo'
+  print ${array[(ie)$key]} "${array[(ie)$key]}"
+  key='*'
+  print ${array[(ie)$key]} "${array[(ie)$key]}"
+0:Matching array indices with and without quoting
+>1 1
+>4 4
+>5 5
+>4 4
+
+# Ordering of associative arrays is arbitrary, so we need to use
+# patterns that only match one element.
+  typeset -A assoc_r
+  assoc_r=(star '*' of '*this*' and '!that!' or '(the|other)')
+  print ${(kv)assoc_r[(re)*]}
+  print ${(kv)assoc_r[(re)*this*]}
+  print ${(kv)assoc_r[(re)!that!]}
+  print ${(kv)assoc_r[(re)(the|other)]}
+  print ${(kv)assoc_r[(r)*at*]}
+  print ${(kv)assoc_r[(r)*(ywis|bliss|kiss|miss|this)*]}
+  print ${(kv)assoc_r[(r)(this|that|\(the\|other\))]}
+0:Reverse subscripting associative arrays with literal matching
+>star *
+>of *this*
+>and !that!
+>or (the|other)
+>and !that!
+>of *this*
+>or (the|other)


-- 
Peter Stephenson <p.w.stephenson@xxxxxxxxxxxx>
Web page now at http://homepage.ntlworld.com/p.w.stephenson/


