Zsh Mailing List Archive

[PATCH] zformat: add -qQ options to auto-escape %s



we were discussing w/54573 last week and mikael suggested that zformat
could have an option that quotes %s in the specs for you, for when the
result will be subject to further expansion (as is often the case in
completion functions)

i like that idea because: (1) the little ${...//\%/%%} dance is annoying
and (2) using that method to escape can actually break width specifiers
and ternary test numbers in the format string. extremely contrived
example:

  % d=%x
  % zformat -F REPLY '2 chars i wanted: %.2d' d:${d//\%/%%}
  % echo $REPLY
  2 chars i wanted: %%  # oops, garbage
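
a rough way to see the same failure without zformat, as a minimal sketch in plain sh. assumptions: printf's %.2s stands in for zformat's %.2d truncation, and a sed pass stands in for the later % expansion. neither is part of the patch:

```shell
d='%x'

# double each % by hand (same effect as ${d//\%/%%} in zsh)
escaped=$(printf '%s' "$d" | sed 's/%/%%/g')

# truncate to 2 characters, standing in for zformat's %.2d
truncated=$(printf '%.2s' "$escaped")

# stand-in for the later expansion pass, which collapses %% to %
final=$(printf '%s' "$truncated" | sed 's/%%/%/g')

printf 'escaped=%s truncated=%s final=%s\n' "$escaped" "$truncated" "$final"
# prints: escaped=%%x truncated=%% final=%
```

the truncation counts the escape characters, so only one character survives the second pass instead of the two that were asked for.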

with these new options, zformat will quote %s in the specs and adjust
width specifiers and ternary test numbers to account for the extra
characters:

  % d=%x
  % zformat -qF REPLY '2 chars i wanted: %.2d' d:$d
  % echo $REPLY
  2 chars i wanted: %%x  # extra % ignored, will have 2 chars as
                         # expected after subsequent processing
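
the intended effect on max-width specifiers can be sketched in plain sh like this. quoted_truncate is a hypothetical helper, not something in the patch: it counts characters of the unescaped spec and doubles each % on output, whereas the patch itself widens smin/smax while scanning the already-doubled spec. for the cases in the new tests the two should give the same result:

```shell
# quoted_truncate SPEC MAX (hypothetical helper): keep the first MAX
# characters of SPEC, emitting each % doubled, so that MAX characters
# remain once a later pass collapses %% back to %
quoted_truncate() {
    rest=$1 max=$2 out= kept=0
    while [ -n "$rest" ] && [ "$kept" -lt "$max" ]; do
        c=${rest%"${rest#?}"}   # first character of what is left
        rest=${rest#?}          # drop it from the remainder
        case $c in
            %) out="$out%%" ;;  # escaped % still counts as one character
            *) out="$out$c" ;;
        esac
        kept=$((kept + 1))
    done
    printf '%s\n' "$out"
}

quoted_truncate '%x' 2      # prints %%x
quoted_truncate '%foo%' 1   # prints %%
quoted_truncate '%foo%' 4   # prints %%foo
quoted_truncate '%foo%' 5   # prints %%foo%%
```

the last three calls mirror the '%.1d', '%.4d' and '%.5d' cases in the test file below.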

the -qn form can be used to implement oliver's suggestion in w/54578
that some specs be left un-escaped. though i haven't fully thought that
idea through yet
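
for reference, the selection rule behind -qn (escape only the first n specs, counted in argument order) can be sketched in plain sh. escape_first_n is a hypothetical helper that just prints the specs as zformat would see them after quoting; it is not how the patch is structured:

```shell
# escape_first_n N SPEC... (hypothetical helper): print each spec on its
# own line, doubling % only in the first N specs, in argument order
escape_first_n() {
    n=$1
    shift
    i=1
    for spec in "$@"; do
        if [ "$i" -le "$n" ]; then
            printf '%s\n' "$spec" | sed 's/%/%%/g'
        else
            printf '%s\n' "$spec"
        fi
        i=$((i + 1))
    done
}

escape_first_n 1 'd:%foo%' 'D:%bar%'
# prints:
# d:%%foo%%
# D:%bar%
```

this matches the -Fq1 example in the doc change below, where d is escaped and D is left alone regardless of where they appear in the format string.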

basically this is phase one of an alternate solution for w/54573

as part of this change i made zformat use the normal option-parsing
facilities. this shouldn't cause any issues for existing scripts

dana


diff --git a/Doc/Zsh/mod_zutil.yo b/Doc/Zsh/mod_zutil.yo
index a36104167..0bda242c9 100644
--- a/Doc/Zsh/mod_zutil.yo
+++ b/Doc/Zsh/mod_zutil.yo
@@ -158,11 +158,13 @@ var(pattern) matches at least one of the strings in the value.
 enditem()
 )
 findex(zformat)
-xitem(tt(zformat -f) var(param) var(format) var(spec) ...)
-xitem(tt(zformat -F) var(param) var(format) var(spec) ...)
+xitem(tt(zformat -f) [ tt(-qQ) [ var(n) ] ] var(param) var(format) var(spec) ...)
+xitem(tt(zformat -F) [ tt(-qQ) [ var(n) ] ] var(param) var(format) var(spec) ...)
 item(tt(zformat -a) var(array) var(sep) var(spec) ...)(
-This builtin provides different forms of formatting. The first form
-is selected with the tt(-f) option. In this case the var(format)
+This builtin provides different forms of formatting.
+
+The first form is selected with the tt(-f) option.
+In this case the var(format)
 string will be modified by replacing sequences starting with a percent 
 sign in it with strings from the var(spec)s.  Each var(spec) should be
 of the form `var(char)tt(:)var(string)' which will cause every
@@ -176,9 +178,16 @@ width makes the result be padded with spaces to the right if the
 var(string) is shorter than the requested width.  Padding to the left
 can be achieved by giving a negative minimum field width.  If a maximum 
 field width is specified, the var(string) will be truncated after that
-many characters.  After all `tt(%)' sequences for the given var(spec)s
-have been processed, the resulting string is stored in the parameter
-var(param).  The sequence `tt(%%)' can be used to produce a literal tt(%).
+many characters.
+
+Any `tt(%)' sequence in var(format) that does not match a given var(spec)
+(or one of the special sequences described below) is output as-is.  If
+desired, these sequences may be processed by a second round of
+formatting, by prompt expansion, etc. DASH()- see also tt(-q).
+
+The sequence `tt(%%)' can be used to produce a literal tt(%).
+After all `tt(%)' sequences have been processed, the resulting string is
+stored in the parameter var(param).
 
 The tt(%)-escapes also understand ternary expressions in the form used by
 prompts.  The tt(%) is followed by a `tt(LPAR())' and then an ordinary
@@ -213,7 +222,36 @@ number the condition is true when the width is em(greater than) that
 number, and with a negative number the condition is true when the width
 is em(less than or equal to) the absolute value of that number.
 
-The form, using the tt(-a) option, can be used for aligning
+With tt(-q), `tt(%)' characters in the var(spec)s are escaped as they
+are inserted into the formatted string, and pre-escaped `tt(%)'
+characters in the format string are left as they are.  For example:
+
+example(zformat -qF REPLY '%%foo%% %B%d%b' d:%bar%)
+
+outputs `tt(%%foo%% %B%%bar%%%b)' to tt(REPLY).  This is useful when the
+formatted string will undergo further expansion DASH()- in this example
+the tt(%B)...tt(%b) sequences could be used with prompt expansion to
+produce bold text.  One notable use case is formatting a description to
+be passed to tt(compadd -x) in a completion function.
+
+tt(-q) may be followed by an optional integer argument var(n) to escape
+only the first var(n) var(spec)s.  For example,
+
+example(zformat -Fq1 REPLY '%D %d' d:%foo% D:%bar%)
+
+outputs `tt(%bar% %%foo%%)' to tt(REPLY).  This is the only case where
+the order that var(spec)s are given in is significant.
+
+Since the output with tt(-q) is expected to be subject to further
+processing, width specifiers don't count the extra escape characters,
+ensuring that the widths are correct em(after) that processing.
+Additionally, with tt(-F), ternary-expression test numbers are compared
+against the em(pre-escaped) spec lengths.
+
+tt(-Q) is like tt(-q) except that it interprets pre-escaped `tt(%)'
+characters in the format string as normal.
+
+The form using the tt(-a) option can be used for aligning
 strings.  Here, the var(spec)s are of the form
 `var(left)tt(:)var(right)' where `var(left)' and `var(right)' are
 arbitrary strings.  These strings are modified by replacing the colons
diff --git a/Src/Modules/zutil.c b/Src/Modules/zutil.c
index 53b3abe72..0367669cf 100644
--- a/Src/Modules/zutil.c
+++ b/Src/Modules/zutil.c
@@ -809,11 +809,13 @@ bin_zstyle(char *nam, char **args, UNUSED(Options ops), UNUSED(int func))
  *   olenp	*olenp is the size allocated for *outp
  *   endchar    Terminator character in addition to `\0' (may be '\0')
  *   presence   -F: Ternary expressions test emptyness instead
+ *   quote      -q: >0 if instr should be %-quoted
+ *   qspecs     qspecs[c] is the number of %s added by %-quoting
  *   skip	If 1, don't output, just parse.
  */
 static char *zformat_substring(char* instr, char **specs, char **outp,
 			       int *ousedp, int *olenp, int endchar,
-			       int presence, int skip)
+			       int presence, int quote, int *qspecs, int skip)
 {
     char *s;
 
@@ -848,7 +850,12 @@ static char *zformat_substring(char* instr, char **specs, char **outp,
 	    // literally
 	    if (!testit && (!*s || *s == '%' || *s == ')' || *s == '-' || *s == '.')) {
 		// but swallow the % if this is %% or %)
-		start += (s - start == 1 && (*s == '%' || *s == ')'));
+		if (!quote) {
+		   start += (s - start == 1 && (*s == '%' || *s == ')'));
+		// if quoting, only swallow with %). admittedly this is confusing
+		} else {
+		   start += (s - start == 1 && *s == ')');
+		}
 		s = start;
 	    }
 
@@ -869,6 +876,8 @@ static char *zformat_substring(char* instr, char **specs, char **outp,
 				actval = strlen(specs[(unsigned char) *s]);
 		        else
 			    actval = 1;
+			// don't count extra %s from quoting when testing this
+			actval -= qspecs[(unsigned char) *s];
 			actval = right ? (testval < actval) : (testval >= actval);
 		    } else {
 			if (right) /* put the sign back */
@@ -887,20 +896,36 @@ static char *zformat_substring(char* instr, char **specs, char **outp,
 		 * Either skip true text and output false text, or
 		 * vice versa... unless we are already skipping.
 		 */
-		if (!(s = zformat_substring(s+1, specs, outp, ousedp,
-			    olenp, endcharl, presence, skip || actval)) || !*s)
+		if (!(s = zformat_substring(s+1, specs, outp, ousedp, olenp,
+			    endcharl, presence, quote, qspecs,
+			    skip || actval)) || !*s)
 		    return NULL;
-		if (!(s = zformat_substring(s+1, specs, outp, ousedp,
-			    olenp, ')', presence, skip || !actval)) || !*s)
+		if (!(s = zformat_substring(s+1, specs, outp, ousedp, olenp,
+			    ')', presence, quote, qspecs,
+			    skip || !actval)) || !*s)
 		    return NULL;
 	    } else if (skip) {
 		continue;
 	    } else if ((spec = specs[(unsigned char) *s])) {
-		int len;
+		int len, smin = min, smax = max;
+
+		// the assumption with quoted specs is that the output will be
+		// subject to further % expansion -- adjust width specifiers so
+		// that the result will be correct *after* that expansion
+		if ((smin > 0 || smax > 0) && qspecs[(unsigned char) *s]) {
+		    int i;
+		    for (i = 0; spec[i]; i++) {
+			if (spec[i] == '%') {
+			    smin += (smin > 0 && i < smin) ? 1 : 0;
+			    smax += (smax > 0 && i < smax) ? 1 : 0;
+			    i++;
+			}
+		    }
+		}
 
-		if ((len = strlen(spec)) > max && max >= 0)
-		    len = max;
-		outl = (min >= 0 ? (min > len ? min : len) : len);
+		if ((len = strlen(spec)) > smax && smax >= 0)
+		    len = smax;
+		outl = (smin >= 0 ? (smin > len ? smin : len) : len);
 
 		if (*ousedp + outl >= *olenp) {
 		    int nlen = *olenp + outl + 128;
@@ -960,40 +985,91 @@ static char *zformat_substring(char* instr, char **specs, char **outp,
 }
 
 static int
-bin_zformat(char *nam, char **args, UNUSED(Options ops), UNUSED(int func))
+bin_zformat(char *nam, char **args, Options ops, UNUSED(int func))
 {
-    char opt;
-    int presence = 0;
+    unsigned char qopt = OPT_ISSET(ops, 'q') ? 'q' : OPT_ISSET(ops, 'Q') ? 'Q' : 0;
+    int presence = 0, quote = INT_MAX;
 
-    if (args[0][0] != '-' || !(opt = args[0][1]) || args[0][2]) {
-	zwarnnam(nam, "invalid argument: %s", args[0]);
+    if (OPT_ISSET(ops, 'q') && OPT_ISSET(ops, 'Q')) {
+	zwarnnam(nam, "only one of -qQ allowed");
+	return 1;
+    }
+    // the error here is more meaningful than the following ones with e.g. -q1F
+    if (OPT_HASARG(ops, qopt)) {
+	char *qptr;
+	quote = (int) zstrtol(OPT_ARG(ops, qopt), &qptr, 10);
+	if (quote < 0 || *qptr) {
+	    zwarnnam(nam, "bad argument to -%c: %s", qopt, OPT_ARG(ops, qopt));
+	    return 1;
+	}
+    }
+    if (OPT_ISSET(ops, 'a') + OPT_ISSET(ops, 'f') + OPT_ISSET(ops, 'F') < 1) {
+	zwarnnam(nam, "one of -afF expected");
+	return 1;
+    }
+    if (OPT_ISSET(ops, 'a') + OPT_ISSET(ops, 'f') + OPT_ISSET(ops, 'F') > 1) {
+	zwarnnam(nam, "only one of -afF allowed");
+	return 1;
+    }
+    if (OPT_ISSET(ops, 'a') && OPT_ISSET(ops, 'q')) {
+	zwarnnam(nam, "-q not allowed with -a");
 	return 1;
     }
-    args++;
 
-    switch (opt) {
+    switch (OPT_ISSET(ops, 'a') ? 'a' : OPT_ISSET(ops, 'f') ? 'f' : 'F') {
     case 'F':
 	presence = 1;
 	/* fall-through */
     case 'f':
 	{
 	    char **ap, *specs[256] = {0}, *out;
-	    int olen, oused = 0;
+	    int i, olen, oused = 0;
+	    int qspecs[256] = {0};
 
 	    /* Parse the specs in argv. */
-	    for (ap = args + 2; *ap; ap++) {
+	    for (i = 1, ap = args + 2; *ap; i++, ap++) {
 		if (!ap[0][0] || ap[0][0] == '-' || ap[0][0] == '.' ||
 		    ap[0][0] == '%' || ap[0][0] == ')' ||
 		    idigit(ap[0][0]) || ap[0][1] != ':') {
 		    zwarnnam(nam, "invalid spec: %s", *ap);
 		    return 1;
 		}
-		specs[(unsigned char) ap[0][0]] = ap[0] + 2;
+
+		// need to quote specs here because zformat_substring() won't
+		// know the order
+		if (qopt && quote >= i) {
+		    int len = 0, pct = 0;
+		    char *aptr, *sptr, *spec = *ap + 2;
+
+		    for (aptr = *ap + 2; *aptr; aptr++, len++) {
+			if (*aptr == '%') {
+			    len++, pct++;
+			}
+		    }
+
+		    if (pct) {
+			spec = (char *) zhalloc(len + 1);
+			sptr = spec;
+			for (aptr = *ap + 2; *aptr; aptr++) {
+			    *sptr++ = *aptr;
+			    if (*aptr == '%') {
+				*sptr++ = *aptr;
+			    }
+			}
+			*sptr = '\0';
+		    }
+
+		    specs[(unsigned char) ap[0][0]] = spec;
+		    qspecs[(unsigned char) ap[0][0]] = pct;
+		} else {
+		    specs[(unsigned char) ap[0][0]] = ap[0] + 2;
+		}
 	    }
+
 	    out = (char *) zhalloc(olen = 128);
 
 	    if (!zformat_substring(args[1], specs, &out, &oused, &olen, '\0',
-			presence, 0)) {
+			presence, qopt == 'q', qspecs, 0)) {
 		zwarnnam(nam, "malformed format string: %s", args[1]);
 		return 1;
 	    }
@@ -1093,7 +1169,7 @@ bin_zformat(char *nam, char **args, UNUSED(Options ops), UNUSED(int func))
 	}
 	break;
     }
-    zwarnnam(nam, "invalid option: -%c", opt);
+    DPUTS(1, "BUG: unhandled option");
     return 1;
 }
 
@@ -2069,7 +2145,7 @@ bin_zparseopts(char *nam, char **args, Options ops, UNUSED(int func))
 }
 
 static struct builtin bintab[] = {
-    BUILTIN("zformat", 0, bin_zformat, 3, -1, 0, NULL, NULL),
+    BUILTIN("zformat", 0, bin_zformat, 2, -1, 0, "afFq:%Q:%", NULL),
     BUILTIN("zparseopts", 0, bin_zparseopts, 0, -1, 0, "a:A:DEFGKMn:v:", NULL),
     BUILTIN("zregexparse", 0, bin_zregexparse, 3, -1, 0, "c", NULL),
     BUILTIN("zstyle", 0, bin_zstyle, 0, -1, 0, NULL, NULL),
diff --git a/Test/V13zformat.ztst b/Test/V13zformat.ztst
index 545d5e615..32245b7e2 100644
--- a/Test/V13zformat.ztst
+++ b/Test/V13zformat.ztst
@@ -93,8 +93,53 @@
   zformat REPLY ''
   zformat REPLY '' x:
 1:one of -f -F -a required
+?(eval):zformat:1: one of -afF expected
+?(eval):zformat:2: one of -afF expected
+
+  zformat -fff REPLY ''
+  zformat -FFF REPLY ''
+  zformat -aaa reply .
+0:duplicate -f -F -a ignored
+
+  zformat -af REPLY ''
+  zformat -fF REPLY '' x:
+1:more than one of -f -F -a not allowed
+?(eval):zformat:1: only one of -afF allowed
+?(eval):zformat:2: only one of -afF allowed
+
+  zformat -f
+  zformat -f REPLY
+  zformat -F
+  zformat -F REPLY
+1:-f and -F: param and format string required
+?(eval):zformat:1: not enough arguments
+?(eval):zformat:2: not enough arguments
+?(eval):zformat:3: not enough arguments
+?(eval):zformat:4: not enough arguments
+
+  zformat -a
+  zformat -a reply
+1:-a: param and separator required
 ?(eval):zformat:1: not enough arguments
-?(eval):zformat:2: invalid argument: REPLY
+?(eval):zformat:2: not enough arguments
+
+  zformat -a reply '' a:b && print -rl - $reply
+0:-a with empty separator
+>ab
+
+  zformat -F REPLY '<%1d>'  'd:é' && print -r - $REPLY
+  zformat -F REPLY '<%2d>'  'd:é' && print -r - $REPLY
+  zformat -F REPLY '<%3d>'  'd:é' && print -r - $REPLY
+  zformat -F REPLY '<%.1d>' 'd:é' && print -r - $REPLY
+  zformat -F REPLY '<%.2d>' 'd:é' && print -r - $REPLY
+  zformat -F REPLY '<%.3d>' 'd:é' && print -r - $REPLY
+0f:width specifier is multi-byte-aware
+><é>
+><é >
+><é  >
+><é>
+><é>
+><é>
 
   zformat -F REPLY %B  && print -r - $REPLY
   zformat -F REPLY %3B && print -r - $REPLY
@@ -227,3 +272,79 @@
 0:ternary expression returning literal % or )
 >%
 >)
+
+  zformat -qa reply .
+  zformat -aq reply .
+1:-a + -q not allowed
+?(eval):zformat:1: -q not allowed with -a
+?(eval):zformat:2: -q not allowed with -a
+
+  zformat -Fq     REPLY F && print -r - $REPLY
+  zformat -qF     REPLY F && print -r - $REPLY
+  zformat -FqF    REPLY F && print -r - $REPLY
+  zformat -q1 -F  REPLY F && print -r - $REPLY
+  zformat -q1F    REPLY F && print -r - $REPLY
+  zformat -q-1 -F REPLY F && print -r - $REPLY
+1:optional argument to -q
+>F
+>F
+>F
+>F
+?(eval):zformat:5: bad argument to -q: 1F
+?(eval):zformat:6: bad option: --
+
+# the spec order in the format string differs from the order in the arguments
+# here to make sure we're testing -qn's effects on the latter
+  for 1 in '' 0 1 2 3; do
+    zformat -Fq$1   REPLY '%%x %) %. %X %D %d' d:%foo% D:%bar% && print -r - $REPLY
+  done
+0:-q with and without optarg
+>%%x ) %. %X %%bar%% %%foo%%
+>%%x ) %. %X %bar% %foo%
+>%%x ) %. %X %bar% %%foo%%
+>%%x ) %. %X %%bar%% %%foo%%
+>%%x ) %. %X %%bar%% %%foo%%
+
+  zformat -Fq REPLY '%(x.%%/%d.%%/%D)' x:1 d:%foo% D:%bar% && print -r - $REPLY
+  zformat -Fq REPLY '%(X.%%/%d.%%/%D)' x:1 d:%foo% D:%bar% && print -r - $REPLY
+0:-q with ternary expression
+>%%/%%foo%%
+>%%/%%bar%%
+
+  zformat -Fq REPLY '<%1d>'  d:%foo% && print -r - $REPLY
+  zformat -Fq REPLY '<%5d>'  d:%foo% && print -r - $REPLY
+  zformat -Fq REPLY '<%6d>'  d:%foo% && print -r - $REPLY
+  zformat -Fq REPLY '<%7d>'  d:%foo% && print -r - $REPLY
+  zformat -Fq REPLY '<%-7d>' d:%foo% && print -r - $REPLY
+  zformat -Fq REPLY '<%-7d>' d:foo   && print -r - $REPLY
+0:-q: min-width specifier ignores extra %s
+><%%foo%%>
+><%%foo%%>
+><%%foo%% >
+><%%foo%%  >
+><  %%foo%%>
+><    foo>
+
+  zformat -Fq REPLY '%.1d' d:foo   && print -r - $REPLY
+  zformat -Fq REPLY '%.1d' d:%foo% && print -r - $REPLY
+  zformat -Fq REPLY '%.4d' d:%foo% && print -r - $REPLY
+  zformat -Fq REPLY '%.5d' d:%foo% && print -r - $REPLY
+0:-q: max-width specifier ignores extra %s
+>f
+>%%
+>%%foo
+>%%foo%%
+
+  zformat -Fq REPLY '%4(d.t.f) %5(d.t.f) %6(d.t.f)' d:%foo% && print -r - $REPLY
+  zformat -Fq REPLY '%-6(d.t.f) %-5(d.t.f) %-4(d.t.f)' d:%foo% && print -r - $REPLY
+0:-q: ternary width test ignores extra %s
+>t f f
+>t t f
+
+  zformat -FQ  REPLY '%%x %) %. %X %D %d' d:%foo% D:%bar% && print -r - $REPLY
+  zformat -FQ0 REPLY '%%x %) %. %X %D %d' d:%foo% D:%bar% && print -r - $REPLY
+  zformat -FQ1 REPLY '%%x %) %. %X %D %d' d:%foo% D:%bar% && print -r - $REPLY
+0:-Q, -Q0, -Q1
+>%x ) %. %X %%bar%% %%foo%%
+>%x ) %. %X %bar% %foo%
+>%x ) %. %X %bar% %%foo%%



