Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: [PATCH] Support the mksh's ${|func;} substitution



On Mon, Jul 3, 2023 at 6:55 PM Bart Schaefer <schaefer@xxxxxxxxxxxxxxxx> wrote:
>
> Also in that thread Stephane points out that it would be ideal if
> (paraphrasing his example) "${| echo { }" were parsed similarly to "{
> echo { }" and that to do so would require changes to the parser

More specifically, it's going to require changes to the tokenizer.
There would need to be an analog of skipcomm() that knows how to stop
at the appropriately significant Outbrace instead of at Outpar, and
similarly a skipparens() analog that does the same for a string.  For
the nonce it gets most of the way there to require that braces inside
${|...} are either balanced or quoted.

One of the harder things to get right is the "%_" prompt replacement
for this.  I ended up with "braceparam cursh" because that's otherwise
impossible, as opposed to "braceparam cmdsubst" or just "cmdsubst" or
even "cmdsubst cursh" any of which could result from other valid
syntaxes.

As an aside ... compatibility-wise we're stuck with using braces for
this, but why not parens like $(|...) instead, by analogy with
$(<...)?  Parsing would be so much ... saner ... and the whole
fake-it-with $REPLY bit could be skipped, go straight to capturing
stdout.

On Tue, Jul 4, 2023 at 10:32 PM Bart Schaefer <schaefer@xxxxxxxxxxxxxxxx> wrote:
>
> -- REPLY is treated as a local, i.e., it's value gets saved and
> restored around the substitution

I've implemented this, but after experimenting with it both ways, I've
done so only for REPLY, that is, other cases of $VAR as in ${|VAR|...}
remain non-localized.  Again I wonder whether instead of being
initialized as unset, REPLY should be initialized to the value from
the calling scope, so that ${+REPLY} is usable.  The drawback of
course is that the called code must then explicitly clobber $REPLY if
an empty result is desired.

> -- "local" works inside the substitution as it would inside a function
> body, but $@ refers to the calling environment.

Implemented this.  Combined with the foregoing, this means that
${|VAR|local VAR; ...} substitutes the value of $VAR from the caller,
not from the command (so, don't do that unless you mean it).

> -- "return" also behaves as if in a function body.

Also implemented this.  Following Chet's lead, though, "exit" does
exit the shell.  I could instead make "exit" work like ${notset?error}
and thus exit only if the shell is not interactive.  Thoughts?

I have not yet implemented the anonymous tempfile for ${
capture_stdout }.  However, this equivalent construction does work
(extra newlines for clarity):
  ${|
    () {
      capture_stdout > $1
      REPLY=$(<$1)
    } =(</dev/null)
  }

Both $(<...) and =(<...) are non-forking shortcuts already, so this
acts as proof of concept.

Should test cases go in D04parameter, or D08cmdsubst, or a new D10 file?

Attached patch should apply in either order with the named references
patch from workers/51945 although there will be line-number fuzz in
the docs.
diff --git a/Doc/Zsh/expn.yo b/Doc/Zsh/expn.yo
index 7bc736470..44867e655 100644
--- a/Doc/Zsh/expn.yo
+++ b/Doc/Zsh/expn.yo
@@ -1875,23 +1875,51 @@ sect(Command Substitution)
 cindex(command substitution)
 cindex(substitution, command)
 A command enclosed in parentheses preceded by a dollar sign, like
-`tt($LPAR())...tt(RPAR())', or quoted with grave
-accents, like `tt(`)...tt(`)', is replaced with its standard output, with
-any trailing newlines deleted.
-If the substitution is not enclosed in double quotes, the
-output is broken into words using the tt(IFS) parameter.
+`tt($LPAR())...tt(RPAR())', or quoted with grave accents, like
+`tt(`)...tt(`)', is executed in a subshell and replaced by its
+standard output, with any trailing newlines deleted.  If the
+substitution is not enclosed in double quotes, the output is broken
+into words using the tt(IFS) parameter.
 vindex(IFS, use of)
 
 The substitution `tt($LPAR()cat) var(foo)tt(RPAR())' may be replaced
 by the faster `tt($LPAR()<)var(foo)tt(RPAR())'.  In this case var(foo)
 undergoes single word shell expansions (em(parameter expansion),
 em(command substitution) and em(arithmetic expansion)), but not
-filename generation.
+filename generation.  No subshell is created.
 
 If the option tt(GLOB_SUBST) is set, the result of any unquoted command
 substitution, including the special form just mentioned, is eligible for
 filename generation.
 
+A command with a leading pipe character, enclosed in braces prefixed by
+a dollar sign, as in `tt(${|)...tt(})', is executed in the current shell
+context, rather than in a subshell, and is replaced by the value of the
+parameter tt(REPLY) at the end of the command.  There em(must not) be
+any whitespace between the opening brace and the pipe character.  Any
+prior value of tt($REPLY) is saved and restored around this substitution,
+in the manner of a function local parameter.  Other parameters declared
+within the substitution also behave as locals, as if in a function,
+unless `tt(typeset -g)' is used.  Trailing newlines are em(not) deleted
+from the final replacement in this case, and it is subject to filename
+generation in the same way as `tt($LPAR())...tt(RPAR())' but is em(not)
+split on tt(IFS) unless the tt(SH_WORD_SPLIT) option is set.
+
+Substitutions of the form `tt(${|)var(param)tt(|)...tt(})' are similar,
+except that the substitution is replaced by the value of the parameter
+named by var(param).  No implicit save or restore applies to var(param)
+except as noted for tt(REPLY), and var(param) should em(not) be declared
+within the command.  If var(param) names an array, array expansion rules
+apply.
+
+COMMENT(To be implemented later:
+A command enclosed in braces preceded by a dollar sign, and set off from
+the braces by whitespace, like `tt(${ )...tt( })', is replaced by its
+standard output.  Like `tt(${|)...tt(})' and unlike
+`tt($LPAR())...tt(RPAR())', the command executes in the current shell
+context with function local behaviors and does not create a subshell.
+)
+
 texinode(Arithmetic Expansion)(Brace Expansion)(Command Substitution)(Expansion)
 sect(Arithmetic Expansion)
 cindex(arithmetic expansion)
diff --git a/Src/lex.c b/Src/lex.c
index 2f7937410..73d94264e 100644
--- a/Src/lex.c
+++ b/Src/lex.c
@@ -937,7 +937,7 @@ static enum lextok
 gettokstr(int c, int sub)
 {
     int bct = 0, pct = 0, brct = 0, seen_brct = 0, fdpar = 0;
-    int intpos = 1, in_brace_param = 0;
+    int intpos = 1, in_brace_param = 0, cmdsubst = 0;
     int inquote, unmatched = 0;
     enum lextok peek;
 #ifdef DEBUG
@@ -1157,8 +1157,11 @@ gettokstr(int c, int sub)
 	    if (in_brace_param) {
 		cmdpop();
 	    }
-	    if (bct-- == in_brace_param)
-		in_brace_param = 0;
+	    if (bct-- == in_brace_param) {
+		if (cmdsubst)
+		    cmdpop();
+		in_brace_param = cmdsubst = 0;
+	    }
 	    c = Outbrace;
 	    break;
 	case LX2_COMMA:
@@ -1405,10 +1408,15 @@ gettokstr(int c, int sub)
        }
        add(c);
        c = hgetc();
-	if (intpos)
+       if (intpos)
 	    intpos--;
-	if (lexstop)
+       if (lexstop)
 	    break;
+       if (!cmdsubst && in_brace_param && act == LX2_STRING &&
+	   (c == '|' || c == Bar || inblank(c))) {
+	   cmdsubst = in_brace_param;
+	   cmdpush(CS_CURSH);
+       }
     }
   brk:
     if (errflag) {
@@ -1459,7 +1467,7 @@ gettokstr(int c, int sub)
 static int
 dquote_parse(char endchar, int sub)
 {
-    int pct = 0, brct = 0, bct = 0, intick = 0, err = 0;
+    int pct = 0, brct = 0, bct = 0, intick = 0, err = 0, cmdsubst = 0;
     int c;
     int math = endchar == ')' || endchar == ']' || infor;
     int zlemath = math && zlemetacs > zlemetall + addedx - inbufct;
@@ -1529,11 +1537,21 @@ dquote_parse(char endchar, int sub)
 		c = Qstring;
 	    }
 	    break;
+	case '{':
+	    if (cmdsubst && !intick) {
+		/* In nofork substitution, tokenize as if unquoted */
+		c = Inbrace;
+		bct++;
+	    }
+	    break;
 	case '}':
 	    if (intick || !bct)
 		break;
 	    c = Outbrace;
-	    bct--;
+	    if (bct-- == cmdsubst) {
+		cmdpop();
+		cmdsubst = 0;
+	    }
 	    cmdpop();
 	    break;
 	case '`':
@@ -1588,12 +1606,24 @@ dquote_parse(char endchar, int sub)
 	if (err || lexstop)
 	    break;
 	add(c);
+	if (!cmdsubst && c == Inbrace) {
+	    /* Check for ${|...} nofork command substitution */
+	    if ((c = hgetc())) {
+		if (c == '|' || inblank(c)) {
+		    cmdsubst = bct;
+		    cmdpush(CS_CURSH);
+		}
+		hungetc(c);
+	    }
+	}
     }
     if (intick == 2)
 	ALLOWHIST
     if (intick) {
 	cmdpop();
     }
+    if (bct && bct == cmdsubst)
+	cmdpop();
     while (bct--)
 	cmdpop();
     if (lexstop)
diff --git a/Src/subst.c b/Src/subst.c
index 14947ae36..92a53d99a 100644
--- a/Src/subst.c
+++ b/Src/subst.c
@@ -1860,6 +1860,8 @@ paramsubst(LinkList l, LinkNode n, char **str, int qt, int pf_flags,
      * joining the array into a string (for compatibility with ksh/bash).
      */
     int quoted_array_with_offset = 0;
+    /* Indicates ${|...;} */
+    char *rplyvar = NULL;
 
     *s++ = '\0';
     /*
@@ -1887,8 +1889,104 @@ paramsubst(LinkList l, LinkNode n, char **str, int qt, int pf_flags,
      * flags in parentheses, but also one ksh hack.
      */
     if (c == Inbrace) {
+	/* The command string to be run by ${|...;} */
+	char *cmdarg = NULL;
+	size_t slen = 0;
 	inbrace = 1;
 	s++;
+
+        /* Short-path for the nofork command substitution ${|cmd;}
+	 * See other comments about kludges for why this is here.
+	 *
+         * The command string is extracted and executed, and the
+         * substitution assigned. There's no (...)-flags processing,
+         * i.e. no ${|(U)cmd;}, because it looks quite awful and
+         * should not be part of command substitution in any case.
+         * Use ${(U)${|cmd;}} as you would for ${(U)$(cmd;)}.
+	 */
+	if (*s == '|' || *s == Bar) {
+	    char *outbracep = s;
+	    char sav = *s;
+	    *s = Inbrace;
+	    if (skipparens(Inbrace, Outbrace, &outbracep) == 0) {
+		slen = outbracep - s - 1;
+		if ((*s = sav) != Bar) {
+		    sav = *outbracep;
+		    *outbracep = '\0';
+		    tokenize(s);
+		    *outbracep = sav;
+		}
+	    }
+	}
+	if (slen > 1) {
+	    char *outbracep = s + slen;
+	    if (*outbracep == Outbrace) {
+		if ((rplyvar = itype_end(s+1, INAMESPC, 0))) {
+		    if (*rplyvar == Inbrack &&
+			(rplyvar = parse_subscript(++rplyvar, 1, ']')))
+			++rplyvar;
+		}
+		if (rplyvar == s+1 && *rplyvar == Bar) {
+		    /* Is ${||...} a subtitution error or a syntax error?
+		    zerr("bad substitution");
+		    return NULL;
+		    */
+		    rplyvar = NULL;
+		}
+		if (rplyvar && *rplyvar == Bar) {
+		    cmdarg = dupstrpfx(rplyvar+1, outbracep-rplyvar-1);
+		    rplyvar = dupstrpfx(s+1,rplyvar-s-1);
+		} else {
+		    cmdarg = dupstrpfx(s+1, outbracep-s-1);
+		    rplyvar = "REPLY";
+		}
+		s = outbracep;
+	    }
+	}
+
+	if (rplyvar) {
+	    Param pm;
+	    /* char *rplyval = getsparam("REPLY"); */
+	    startparamscope(); /* "local" behaves as if in a function */
+	    pm = createparam("REPLY", PM_LOCAL|PM_UNSET);
+	    if (pm)	/* Shouldn't createparam() do this? */
+		pm->level = locallevel;
+	    /* if (rplyval) setsparam("REPLY", ztrdup(rplyval)); */
+	}
+
+	if (rplyvar && cmdarg && *cmdarg) {
+	    /* Execute the shell command */
+	    untokenize(cmdarg);
+	    execstring(cmdarg, 1, 0, "cmdsubst");
+	}
+
+	if (rplyvar) {
+	    if (strcmp(rplyvar, "REPLY") == 0) {
+		if ((val = ztrdup(getsparam("REPLY"))))
+		    vunset = 0;
+		else {
+		    vunset = 1;
+		    val = dupstring("");
+		}
+	    } else {
+		s = dyncat(rplyvar, s);
+		rplyvar = NULL;
+	    }
+	    endparamscope();
+	    if (exit_pending) {
+		if (mypid == getpid()) {
+		    /*
+		     * paranoia: don't check for jobs, but there
+		     * shouldn't be any if not interactive.
+		     */
+		    stopmsg = 1;
+		    zexit(exit_val, ZEXIT_NORMAL);
+		} else
+		    _exit(exit_val);
+	    }
+	    retflag = 0; /* "return" behaves as if in a function */
+	}
+
 	/*
 	 * In ksh emulation a leading `!' is a special flag working
 	 * sort of like our (k).  This is true only for arrays or
@@ -2583,14 +2681,14 @@ paramsubst(LinkList l, LinkNode n, char **str, int qt, int pf_flags,
 	 * we let fetchvalue set the main string pointer s to
 	 * the end of the bit it's fetched.
 	 */
-	if (!(v = fetchvalue(&vbuf, (subexp ? &ov : &s),
-			     (wantt ? -1 :
-			      ((unset(KSHARRAYS) || inbrace) ? 1 : -1)),
-			     scanflags)) ||
-	    (v->pm && (v->pm->node.flags & PM_UNSET)) ||
-	    (v->flags & VALFLAG_EMPTY))
+	if (!rplyvar &&
+	    (!(v = fetchvalue(&vbuf, (subexp ? &ov : &s),
+			      (wantt ? -1 :
+			       ((unset(KSHARRAYS) || inbrace) ? 1 : -1)),
+			      scanflags)) ||
+	     (v->pm && (v->pm->node.flags & PM_UNSET)) ||
+	     (v->flags & VALFLAG_EMPTY)))
 	    vunset = 1;
-
 	if (wantt) {
 	    /*
 	     * Handle the (t) flag: value now becomes the type


Messages sorted by: Reverse Date, Date, Thread, Author