Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: field splitting with empty fields



On Tue, 30 Oct 2007 12:30:25 +0000
Stephane Chazelas <Stephane_Chazelas@xxxxxxxx> wrote:
> Best would probably be to do a quick search for (f) and (s:...:)
> in /usr/share/zsh to see if any of them rely on that.
> 
> I can see for instance:
> 
> 4.3.4/functions/Completion/Unix/_java_class:for i in "${(s.:.)classpath}"; do

It does look suspiciously like we can't rely on people expecting the
behaviour I'd like them to expect, although in that particular case it
wouldn't matter.

Here's a patch so that an explicit (@) forces splitting to do the right
thing, i.e. keep empty fields inside double quotes, together with some
documentation and tests.  Luckily it's quite a simple change.

It's not impossible that this breaks something, somewhere, but I don't have
a great deal of sympathy in that case since the code would be entirely
against the spirit of "$@"-style substitution.

Index: Doc/Zsh/expn.yo
===================================================================
RCS file: /cvsroot/zsh/zsh/Doc/Zsh/expn.yo,v
retrieving revision 1.82
diff -u -r1.82 expn.yo
--- Doc/Zsh/expn.yo	11 Oct 2007 09:06:20 -0000	1.82
+++ Doc/Zsh/expn.yo	30 Oct 2007 13:07:12 -0000
@@ -960,6 +960,17 @@
 characters means that all of them must match in sequence; this differs from
 the treatment of two or more characters in the tt(IFS) parameter.
 See also the tt(=) flag and the tt(SH_WORD_SPLIT) option.
+
+For historical reasons, the usual behaviour that empty array elements
+are retained inside double quotes is disabled for arrays generated
+by splitting; hence the following:
+
+example(line="one::three"
+print -l "${(s.:.)line}")
+
+produces two lines of output for tt(one) and tt(three) and elides the
+empty field.  To override this behaviour, supply the "(@)" flag as well,
+i.e.  tt("${(@s.:.)line}").
 )
 enditem()
 
Index: Src/subst.c
===================================================================
RCS file: /cvsroot/zsh/zsh/Src/subst.c,v
retrieving revision 1.79
diff -u -r1.79 subst.c
--- Src/subst.c	27 Jun 2007 13:56:11 -0000	1.79
+++ Src/subst.c	30 Oct 2007 13:07:14 -0000
@@ -1260,13 +1260,16 @@
      * parameter (the value v) to storing them in val and aval.
      * However, sometimes you find v reappearing temporarily.
      *
-     * The values -1 and 2 are special to isarr.  It looks like 2 is
-     * some kind of an internal flag to do with whether the array's been
-     * copied, in which case I don't know why we don't use the copied
-     * flag, but they do both occur close together so they presumably
-     * have different effects.  The value -1 is used to force us to
-     * keep an empty array.  It's tested in the YUK chunk (I mean the
-     * one explicitly marked as such).
+     * The values -1 and 2 are special to isarr.  The value -1 is used
+     * to force us to keep an empty array.  It's tested in the YUK chunk
+     * (I mean the one explicitly marked as such).  The value 2
+     * indicates an array has come from splitting a scalar.  We use
+     * that to override the usual rule that in double quotes we don't
+     * remove empty elements (so "${(s.:.):-foo::bar}" produces two
+     * words).  This seems to me to be quite the wrong thing to do,
+     * but it looks like code may be relying on it.  So we require (@)
+     * as well before we keep the empty fields (look for assignments
+     * like "isarr = nojoin ? 1 : 2").
      */
     int isarr = 0;
     /*
@@ -2453,7 +2456,7 @@
 		    char *arr[2], **t, **a, **p;
 		    if (spsep || spbreak) {
 			aval = sepsplit(val, spsep, 0, 1);
-			isarr = 2;
+			isarr = nojoin ? 1 : 2;
 			l = arrlen(aval);
 			if (l && !*(aval[l-1]))
 			    l--;
@@ -2772,7 +2775,7 @@
 	    else if (!aval[1])
 		val = aval[0];
 	    else
-		isarr = 2;
+		isarr = nojoin ? 1 : 2;
 	}
 	if (isarr)
 	    l->list.flags |= LF_ARRAY;
@@ -2974,7 +2977,7 @@
 	    val = getdata(firstnode(list));
 	else {
 	    aval = hlinklist2array(list, 0);
-	    isarr = 2;
+	    isarr = nojoin ? 1 : 2;
 	    l->list.flags |= LF_ARRAY;
 	}
 	copied = 1;
Index: Test/D04parameter.ztst
===================================================================
RCS file: /cvsroot/zsh/zsh/Test/D04parameter.ztst,v
retrieving revision 1.28
diff -u -r1.28 D04parameter.ztst
--- Test/D04parameter.ztst	23 Aug 2007 22:04:25 -0000	1.28
+++ Test/D04parameter.ztst	30 Oct 2007 13:07:14 -0000
@@ -942,3 +942,35 @@
 >some
 >sunny
 >day
+
+  foo="line:with::missing::fields:in:it"
+  print -l ${(s.:.)foo}
+0:Removal of empty fields in unquoted splitting
+>line
+>with
+>missing
+>fields
+>in
+>it
+
+  foo="line:with::missing::fields:in:it"
+  print -l "${(s.:.)foo}"
+0:Hacky removal of empty fields in quoted splitting with no "@"
+>line
+>with
+>missing
+>fields
+>in
+>it
+
+  foo="line:with::missing::fields:in:it"
+  print -l "${(@s.:.)foo}"
+0:Retention of empty fields in quoted splitting with "@"
+>line
+>with
+>
+>missing
+>
+>fields
+>in
+>it

-- 
Peter Stephenson <pws@xxxxxxx>                  Software Engineer
CSR PLC, Churchill House, Cambridge Business Park, Cowley Road
Cambridge, CB4 0WZ, UK                          Tel: +44 (0)1223 692070


