Zsh Mailing List Archive
[PATCH] Add zsh/re2 module with conditions

X-seq: zsh-workers 39234
From: Phil Pennock <zsh-workers+phil.pennock@xxxxxxxxxxxx>
To: zsh-workers@xxxxxxx
Subject: [PATCH] Add zsh/re2 module with conditions
Date: Thu, 8 Sep 2016 00:15:57 -0400
Dkim-signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=spodhuis.org; s=d201608; h=Content-Type:MIME-Version:Message-ID:Subject:To: From:Date; bh=soI/w+Dl0J0Ty5Po38J2ktdgATvlbpWXB6jL+URray8=; b=oc3tLV8FxYmBCD0 GqsIbNHs8anYbvDZuCUPqs9mpVD8Nn5nhBm1oOQzaD3urYPtURtjm46pHwrvFqDdIa48xgvQdomTn KU73w2R9KrRa5Yi9FWaqcQ7JAbyd+sDEKfRdj4scevEnKbu+N0FJ1+EXWvzcTVhAajG1fNAJ8qxvU j8NTKAqhfLJRL2pWsTsMvZl2XYokiLljlxoJ1O3;
List-help: <mailto:zsh-workers-help@zsh.org>
List-id: Zsh Workers List <zsh-workers.zsh.org>
List-post: <mailto:zsh-workers@zsh.org>
Mail-followup-to: zsh-workers@xxxxxxx
Mailing-list: contact zsh-workers-help@xxxxxxx; run by ezmlm
Openpgp: url=https://www.security.spodhuis.org/PGP/keys/0x4D1E900E14C1CC04.asc
Folks,

I tend to get automatically kicked off the -workers list by ezmlm
because I reject mails which are self-declared as spam, so please CC
replies to me.  Also: my commit-bit is currently
surrendered-for-safekeeping because I've not been doing much with Zsh, so
someone else will need to merge this, if it's accepted.

RE2 is a regular expression library, written in C++, from Google.  It
offers most of the features of PCRE, excluding those which can't be
handled without backtracking.  It's BSD-licensed.  This patch adds the
zsh/re2 module.  It uses the `cre2` library for its C-language bindings.

At this point, I haven't done anything about rebinding =~ to handle
this.  It's purely new infix-operators based on words.  I'm thinking
perhaps something along the lines of zsh_reop_modules=(regex), with
`setopt rematch_pcre` becoming a compatibility interface that acts as
though `pcre` were prepended to that list and

  zsh_reop_modules=(pcre regex)

having the same effect.  Then I could use `zsh_reop_modules=(re2 regex)`.
Does this seem sane?  Anyone have better suggestions?  I do want to have
=~ able to use this module, but the current work stands alone and should
be merge-able as-is.

Is there particular interest in having command-forms too?  There's no
"study" concept, but I suppose compiling a hairy regexp only once might
be good in some situations (but why use shell for those?)

This has been tested on MacOS 10.10.5.

My ulterior motive is that I want "better than zsh/regex" available by
default on MacOS, where Apple build without GPL modules for the system
Zsh.  I hope that by offering this option, Apple's engineers might
incorporate this one day and I can be happier. :)

I've also pushed this code to a GitHub repo, philpennock/zsh-code on the
re2 branch: https://github.com/philpennock/zsh-code/tree/re2

Tested with re2 20160901 installed via Brew, cre2 installed via:

    git clone https://github.com/marcomaggi/cre2
    cd cre2
      LIBTOOLIZE=glibtoolize sh ./autogen.sh
      CXX=g++-6 CC=gcc-6 ./configure --prefix=/opt/regexps
      make doc/stamp-vti
      make
      make install

and Zsh configured with:

    CPPFLAGS=-I/opt/regexps/include LDFLAGS=-L/opt/regexps/lib \
      ./configure --prefix=/opt/zsh-devel --enable-pcre --enable-re2 \
         --enable-cap --enable-multibyte --enable-zsh-secure-free \
         --with-tcsetpgrp --enable-etcdir=/etc

Feedback welcome.
(Oh, I can't spell "tough", it seems; deferring fix for now).

Regards,
-Phil

----------------------------8< git patch >8-----------------------------
Add support for Google's BSD-licensed RE2 library, via the cre2
C-language bindings (also BSD-licensed).

Guard with --enable-re2 for now.

Adds 4 infix conditions.  Currently no commands, no support for changing
how =~ binds.

Includes tests & docs
---
 Doc/Makefile.in     |   2 +-
 Doc/Zsh/mod_re2.yo  |  65 +++++++++++
 INSTALL             |   8 ++
 Src/Modules/re2.c   | 324 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 Src/Modules/re2.mdd |   5 +
 Test/V11re2.ztst    | 170 +++++++++++++++++++++++++++
 configure.ac        |  14 +++
 7 files changed, 587 insertions(+), 1 deletion(-)
 create mode 100644 Doc/Zsh/mod_re2.yo
 create mode 100644 Src/Modules/re2.c
 create mode 100644 Src/Modules/re2.mdd
 create mode 100644 Test/V11re2.ztst

diff --git a/Doc/Makefile.in b/Doc/Makefile.in
index 2752096..8c00876 100644
--- a/Doc/Makefile.in
+++ b/Doc/Makefile.in
@@ -65,7 +65,7 @@ Zsh/mod_datetime.yo Zsh/mod_db_gdbm.yo Zsh/mod_deltochar.yo \
 Zsh/mod_example.yo Zsh/mod_files.yo Zsh/mod_langinfo.yo \
 Zsh/mod_mapfile.yo Zsh/mod_mathfunc.yo Zsh/mod_newuser.yo \
 Zsh/mod_parameter.yo Zsh/mod_pcre.yo Zsh/mod_private.yo \
-Zsh/mod_regex.yo Zsh/mod_sched.yo Zsh/mod_socket.yo \
+Zsh/mod_re2.yo Zsh/mod_regex.yo Zsh/mod_sched.yo Zsh/mod_socket.yo \
 Zsh/mod_stat.yo  Zsh/mod_system.yo Zsh/mod_tcp.yo \
 Zsh/mod_termcap.yo Zsh/mod_terminfo.yo \
 Zsh/mod_zftp.yo Zsh/mod_zle.yo Zsh/mod_zleparameter.yo \
diff --git a/Doc/Zsh/mod_re2.yo b/Doc/Zsh/mod_re2.yo
new file mode 100644
index 0000000..5527440
--- /dev/null
+++ b/Doc/Zsh/mod_re2.yo
@@ -0,0 +1,65 @@
+COMMENT(!MOD!zsh/re2
+Interface to the RE2 regular expression library.
+!MOD!)
+cindex(regular expressions)
+cindex(re2)
+The tt(zsh/re2) module makes available the following test conditions:
+
+startitem()
+findex(re2-match)
+item(var(expr) tt(-re2-match) var(regex))(
+Matches a string against an RE2 regular expression.
+On successful match, the
+matched portion of the string will normally be placed in the tt(MATCH)
+variable.  If there are any capturing parentheses within the regex, then
+the tt(match) array variable will contain those.
+If the match is not successful, then the variables will not be altered.
+
+In addition, the tt(MBEGIN) and tt(MEND) variables are updated to point
+to the offsets within var(expr) for the beginning and end of the matched
+text, with the tt(mbegin) and tt(mend) arrays holding the beginning and
+end of each substring matched.
+
+If the option tt(BASH_REMATCH) is set, then the array tt(BASH_REMATCH) will be set
+instead of all of the other variables.
+
+Canonical documentation for the syntax accepted by this regular expression
+engine can be found at:
+uref(https://github.com/google/re2/wiki/Syntax)
+)
+enditem()
+
+startitem()
+findex(re2-match-posix)
+item(var(expr) tt(-re2-match-posix) var(regex))(
+Matches as per tt(-re2-match) but configuring the RE2 engine to use
+POSIX syntax.
+)
+enditem()
+
+startitem()
+findex(re2-match-posixperl)
+item(var(expr) tt(-re2-match-posixperl) var(regex))(
+Matches as per tt(-re2-match) but configuring the RE2 engine to use
+POSIX syntax, with the Perl classes and word-boundary extensions re-enabled
+too.
+
+This thus adds support for:
+tt(\d), tt(\s), tt(\w), tt(\D), tt(\S), tt(\W), tt(\b), and tt(\B).
+)
+enditem()
+
+startitem()
+findex(re2-match-longest)
+item(var(expr) tt(-re2-match-longest) var(regex))(
+Matches as per tt(-re2-match) but configuring the RE2 engine to find
+the left-most longest match, instead of the left-most first match.
+
+For example, given
+
+example([[ abb -re2-match-longest ^a+LPAR()b|bb+RPAR() ]])
+
+This will match the right-branch, thus tt(abb), where tt(-re2-match) would
+instead match only tt(ab).
+)
+enditem()
diff --git a/INSTALL b/INSTALL
index 99895bd..887dd8e 100644
--- a/INSTALL
+++ b/INSTALL
@@ -558,6 +558,14 @@ only be searched for if the option --enable-pcre is passed to configure.
 
 (Future versions of the shell may have a better fix for this problem.)
 
+--enable-re2:
+
+The RE2 library is written in C++, so a C-library shim layer is needed for
+use by Zsh.  We use https://github.com/marcomaggi/cre2 for this, which is
+currently at version 0.3.1.  Both re2 and cre2 need to be installed for
+this option to successfully enable the zsh/re2 module.  The Zsh
+functionality is currently experimental.
+
 --enable-cap:
 
 This searches for POSIX capabilities; if found, the `cap' library
diff --git a/Src/Modules/re2.c b/Src/Modules/re2.c
new file mode 100644
index 0000000..e542723
--- /dev/null
+++ b/Src/Modules/re2.c
@@ -0,0 +1,324 @@
+/*
+ * re2.c
+ *
+ * This file is part of zsh, the Z shell.
+ *
+ * Copyright (c) 2016 Phil Pennock
+ * All Rights Reserved.
+ *
+ * Permission is hereby granted, without written agreement and without
+ * license or royalty fees, to use, copy, modify, and distribute this
+ * software and to distribute modified versions of this software for any
+ * purpose, provided that the above copyright notice and the following
+ * two paragraphs appear in all copies of this software.
+ *
+ * In no event shall Phil Pennock or the Zsh Development Group be liable
+ * to any party for direct, indirect, special, incidental, or consequential
+ * damages arising out of the use of this software and its documentation,
+ * even if Phil Pennock and the Zsh Development Group have been advised of
+ * the possibility of such damage.
+ *
+ * Phil Pennock and the Zsh Development Group specifically disclaim any
+ * warranties, including, but not limited to, the implied warranties of
+ * merchantability and fitness for a particular purpose.  The software
+ * provided hereunder is on an "as is" basis, and Phil Pennock and the
+ * Zsh Development Group have no obligation to provide maintenance,
+ * support, updates, enhancements, or modifications.
+ *
+ */
+
+/* This is heavily based upon my earlier regex module, with Peter's fixes
+ * for the tought stuff I had skipped / gotten wrong. */
+
+#include "re2.mdh"
+#include "re2.pro"
+
+/*
+ * re2 itself is a C++ library; zsh needs C language bindings.
+ * These come from <https://github.com/marcomaggi/cre2>.
+ */
+#include <cre2.h>
+
+/* the conditions we support */
+#define ZRE2_COND_RE2		0
+#define ZRE2_COND_POSIX		1
+#define ZRE2_COND_POSIXPERL	2
+#define ZRE2_COND_LONGEST	3
+
+/**/
+static int
+zcond_re2_match(char **a, int id)
+{
+    cre2_regexp_t *rex;
+    cre2_options_t *opt;
+    cre2_string_t *m, *matches = NULL;
+    char *lhstr, *lhstr_zshmeta, *rhre, *rhre_zshmeta;
+    char **result_array, **x;
+    char *s;
+    char **mbegin, **mend, **bptr, **eptr;
+    size_t matchessz = 0;
+    int return_value, ncaptures, matched, nelem, start, n, indexing_base;
+    int remaining_len, charlen;
+    zlong offs;
+
+    return_value = 0; /* 1 => matched successfully */
+
+    lhstr_zshmeta = cond_str(a,0,0);
+    rhre_zshmeta = cond_str(a,1,0);
+    lhstr = ztrdup(lhstr_zshmeta);
+    unmetafy(lhstr, NULL);
+    rhre = ztrdup(rhre_zshmeta);
+    unmetafy(rhre, NULL);
+
+    opt = cre2_opt_new();
+    if (!opt) {
+	zwarn("re2 opt memory allocation failure");
+	goto CLEANUP_UNMETAONLY;
+    }
+    /* nb: we can set encoding here; re2 assumes UTF-8 by default */
+    cre2_opt_set_log_errors(opt, 0); /* don't hit stderr by default */
+    if (!isset(CASEMATCH)) {
+	cre2_opt_set_case_sensitive(opt, 0);
+    }
+
+    /* "The following options are only consulted when POSIX syntax is enabled;
+     * when POSIX syntax is disabled: these features are always enabled and
+     * cannot be turned off."
+     * Seems hard to mis-parse, but I did.  Okay, Perl classes \d,\w and friends
+     * always on normally, can _also_ be enabled in POSIX mode. */
+
+    switch (id) {
+    case ZRE2_COND_RE2:
+	/* nothing to do, this is default */
+	break;
+    case ZRE2_COND_POSIX:
+	cre2_opt_set_posix_syntax(opt, 1);
+	break;
+    case ZRE2_COND_POSIXPERL:
+	cre2_opt_set_posix_syntax(opt, 1);
+	/* we enable Perl classes (\d, \s, \w, \D, \S, \W)
+	 * and boundaries/not (\b \B) */
+	cre2_opt_set_perl_classes(opt, 1);
+	cre2_opt_set_word_boundary(opt, 1);
+	break;
+    case ZRE2_COND_LONGEST:
+	cre2_opt_set_longest_match(opt, 1);
+	break;
+    default:
+	DPUTS(1, "bad re2 option");
+	goto CLEANUP_OPT;
+    }
+
+    rex = cre2_new(rhre, strlen(rhre), opt);
+    if (!rex) {
+	zwarn("re2 regular expression memory allocation failure");
+	goto CLEANUP_OPT;
+    }
+    if (cre2_error_code(rex)) {
+	zwarn("re2 regexp compilation failed: %s", cre2_error_string(rex));
+	goto CLEANUP;
+    }
+
+    ncaptures = cre2_num_capturing_groups(rex);
+    /* the nmatch for cre2_match follows the usual pattern of index 0 holding
+     * the entire matched substring, index 1 holding the first capturing
+     * sub-expression, etc.  So we need ncaptures+1 elements. */
+    matchessz = (ncaptures + 1) * sizeof(cre2_string_t);
+    matches = zalloc(matchessz);
+
+    matched = cre2_match(rex,
+			 lhstr, strlen(lhstr), /* text to match against */
+			 0, strlen(lhstr), /* substring of text to consider */
+			 CRE2_UNANCHORED, /* user should explicitly anchor */
+			 matches, (ncaptures+1));
+    if (!matched)
+	goto CLEANUP;
+    return_value = 1;
+
+    /* We have a match, we will return success, we have array of cre2_string_t
+     * items, each with .data and .length fields pointing into the matched text,
+     * all in unmetafied format.
+     *
+     * We need to collect the results, put together various arrays and offset
+     * variables, while respecting options to change the array set, the indexing
+     * of that array and everything else that 26 years of history has endowed
+     * upon us. */
+    /* option BASHREMATCH set:
+     *    set $BASH_REMATCH instead of $MATCH/$match
+     *    entire matched portion in index 0 (useful with option KSH_ARRAYS)
+     * option _not_ set:
+     *    $MATCH scalar gets entire string
+     *    $match array gets substrings
+     *    $MBEGIN $MEND scalars get offsets of entire match
+     *    $mbegin $mend arrays get offsets of substrings
+     *    all of the offsets depend upon KSHARRAYS to determine indexing!
+     */
+
+    if (isset(BASHREMATCH)) {
+	start = 0;
+	nelem = ncaptures + 1;
+    } else {
+	start = 1;
+	nelem = ncaptures;
+    }
+    result_array = NULL;
+    if (nelem) {
+	result_array = x = (char **) zalloc(sizeof(char *) * (nelem + 1));
+	for (m = matches + start, n = start; n <= ncaptures; ++n, ++m, ++x) {
+	    /* .data is (const char *), metafy can modify in-place so takes
+	     * (char *) but doesn't modify given META_DUP, so safe to drop
+	     * the const. */
+	    *x = metafy((char *)m->data, m->length, META_DUP);
+	}
+	*x = NULL;
+    }
+
+    if (isset(BASHREMATCH)) {
+	setaparam("BASH_REMATCH", result_array);
+	goto CLEANUP;
+    }
+
+    indexing_base = isset(KSHARRAYS) ? 0 : 1;
+
+    setsparam("MATCH", metafy((char *)matches[0].data, matches[0].length, META_DUP));
+    /* count characters before the match */
+    s = lhstr;
+    remaining_len = matches[0].data - lhstr;
+    offs = 0;
+    MB_CHARINIT();
+    while (remaining_len) {
+	offs++;
+	charlen = MB_CHARLEN(s, remaining_len);
+	s += charlen;
+	remaining_len -= charlen;
+    }
+    setiparam("MBEGIN", offs + indexing_base);
+    /* then the characters within the match */
+    remaining_len = matches[0].length;
+    while (remaining_len) {
+	offs++;
+	charlen = MB_CHARLEN(s, remaining_len);
+	s += charlen;
+	remaining_len -= charlen;
+    }
+    /* zsh ${foo[a,b]} is inclusive of end-points, [a,b] not [a,b) */
+    setiparam("MEND", offs + indexing_base - 1);
+    if (!nelem) {
+	goto CLEANUP;
+    }
+
+    bptr = mbegin = (char **)zalloc(sizeof(char *)*(nelem+1));
+    eptr = mend = (char **)zalloc(sizeof(char *)*(nelem+1));
+    for (m = matches + start, n = 0;
+	 n < nelem;
+	 ++n, ++m, ++bptr, ++eptr)
+    {
+	char buf[DIGBUFSIZE];
+	if (m->data == NULL) {
+	    /* FIXME: have assumed this is the API for non-matching substrings; confirm! */
+	    *bptr = ztrdup("-1");
+	    *eptr = ztrdup("-1");
+	    continue;
+	}
+	s = lhstr;
+	remaining_len = m->data - lhstr;
+	offs = 0;
+	/* Find the start offset */
+	MB_CHARINIT();
+	while (remaining_len) {
+	    offs++;
+	    charlen = MB_CHARLEN(s, remaining_len);
+	    s += charlen;
+	    remaining_len -= charlen;
+	}
+	convbase(buf, offs + indexing_base, 10);
+	*bptr = ztrdup(buf);
+	/* Continue to the end offset */
+	remaining_len = m->length;
+	while (remaining_len) {
+	    offs++;
+	    charlen = MB_CHARLEN(s, remaining_len);
+	    s += charlen;
+	    remaining_len -= charlen;
+	}
+	convbase(buf, offs + indexing_base - 1, 10);
+	*eptr = ztrdup(buf);
+    }
+    *bptr = *eptr = NULL;
+
+    setaparam("match", result_array);
+    setaparam("mbegin", mbegin);
+    setaparam("mend", mend);
+
+CLEANUP:
+    if (matches)
+	zfree(matches, matchessz);
+    cre2_delete(rex);
+CLEANUP_OPT:
+    cre2_opt_delete(opt);
+CLEANUP_UNMETAONLY:
+    free(lhstr);
+    free(rhre);
+    return return_value;
+}
+
+
+static struct conddef cotab[] = {
+    CONDDEF("re2-match", CONDF_INFIX, zcond_re2_match, 0, 0, ZRE2_COND_RE2),
+    CONDDEF("re2-match-posix", CONDF_INFIX, zcond_re2_match, 0, 0, ZRE2_COND_POSIX),
+    CONDDEF("re2-match-posixperl", CONDF_INFIX, zcond_re2_match, 0, 0, ZRE2_COND_POSIXPERL),
+    CONDDEF("re2-match-longest", CONDF_INFIX, zcond_re2_match, 0, 0, ZRE2_COND_LONGEST),
+};
+
+
+static struct features module_features = {
+    NULL, 0,
+    cotab, sizeof(cotab)/sizeof(*cotab),
+    NULL, 0,
+    NULL, 0,
+    0
+};
+
+
+/**/
+int
+setup_(UNUSED(Module m))
+{
+    return 0;
+}
+
+/**/
+int
+features_(Module m, char ***features)
+{
+    *features = featuresarray(m, &module_features);
+    return 0;
+}
+
+/**/
+int
+enables_(Module m, int **enables)
+{
+    return handlefeatures(m, &module_features, enables);
+}
+
+/**/
+int
+boot_(UNUSED(Module m))
+{
+    return 0;
+}
+
+/**/
+int
+cleanup_(Module m)
+{
+    return setfeatureenables(m, &module_features, NULL);
+}
+
+/**/
+int
+finish_(UNUSED(Module m))
+{
+    return 0;
+}
diff --git a/Src/Modules/re2.mdd b/Src/Modules/re2.mdd
new file mode 100644
index 0000000..b20838c
--- /dev/null
+++ b/Src/Modules/re2.mdd
@@ -0,0 +1,5 @@
+name=zsh/re2
+link='if test "x$enable_re2" = xyes && test "x$ac_cv_lib_cre2_cre2_version_string" = xyes; then echo dynamic; else echo no; fi'
+load=no
+
+objects="re2.o"
diff --git a/Test/V11re2.ztst b/Test/V11re2.ztst
new file mode 100644
index 0000000..d6e327c
--- /dev/null
+++ b/Test/V11re2.ztst
@@ -0,0 +1,170 @@
+%prep
+
+  if ! zmodload -F zsh/re2 C:re2-match 2>/dev/null
+  then
+    ZTST_unimplemented="the zsh/re2 module is not available"
+    return 0
+  fi
+# Load the rest of the builtins
+  zmodload zsh/re2
+  ##FIXME#setopt rematch_pcre
+# Find a UTF-8 locale.
+  setopt multibyte
+# Don't let LC_* override our choice of locale.
+  unset -m LC_\*
+  mb_ok=
+  langs=(en_{US,GB}.{UTF-,utf}8 en.UTF-8
+	 $(locale -a 2>/dev/null | egrep 'utf8|UTF-8'))
+  for LANG in $langs; do
+    if [[ é = ? ]]; then
+      mb_ok=1
+      break;
+    fi
+  done
+  if [[ -z $mb_ok ]]; then
+    ZTST_unimplemented="no UTF-8 locale or multibyte mode is not implemented"
+  else
+    print -u $ZTST_fd Testing RE2 multibyte with locale $LANG
+    mkdir multibyte.tmp && cd multibyte.tmp
+  fi
+
+%test
+
+  [[ 'foo→bar' -re2-match .([^[:ascii:]]). ]]
+  print $MATCH
+  print $match[1]
+0:Basic non-ASCII regexp matching
+>o→b
+>→
+
+  [[ alphabeta -re2-match a([^a]+)a ]]
+  echo "$? basic"
+  print $MATCH
+  print $match[1]
+  [[ ! alphabeta -re2-match a(.+)a ]]
+  echo "$? negated op"
+  [[ alphabeta -re2-match ^b ]]
+  echo "$? failed match"
+# default matches on first, then takes longest substring
+# -longest keeps looking
+  [[ abb -re2-match a(b|bb) ]]
+  echo "$? first .${MATCH}.${match[1]}."
+  [[ abb -re2-match-longest a(b|bb) ]]
+  echo "$? longest .${MATCH}.${match[1]}."
+  [[ alphabeta -re2-match ab ]]; echo "$? unanchored"
+  [[ alphabeta -re2-match ^ab ]]; echo "$? anchored"
+  [[ alphabeta -re2-match '^a(\w+)a$' ]]
+  echo "$? perl class used"
+  echo ".${MATCH}. .${match[1]}."
+  [[ alphabeta -re2-match-posix '^a(\w+)a$' ]]
+  echo "$? POSIX-mode, should inhibit Perl class"
+  [[ alphabeta -re2-match-posixperl '^a(\w+)a$' ]]
+  echo "$? POSIX-mode with Perl classes enabled .${match[1]}."
+  unset MATCH match
+  [[ alphabeta -re2-match ^a([^a]+)a([^a]+)a$ ]]
+  echo "$? matched, set vars"
+  echo ".$MATCH. ${#MATCH}"
+  echo ".${(j:|:)match[*]}."
+  unset MATCH match
+  [[ alphabeta -re2-match fr(.+)d ]]
+  echo "$? unmatched, not setting MATCH/match"
+  echo ".$MATCH. ${#MATCH}"
+  echo ".${(j:|:)match[*]}."
+0:Basic matching & result codes
+>0 basic
+>alpha
+>lph
+>1 negated op
+>1 failed match
+>0 first .ab.b.
+>0 longest .abb.bb.
+>0 unanchored
+>1 anchored
+>0 perl class used
+>.alphabeta. .lphabet.
+>1 POSIX-mode, should inhibit Perl class
+>0 POSIX-mode with Perl classes enabled .lphabet.
+>0 matched, set vars
+>.alphabeta. 9
+>.lph|bet.
+>1 unmatched, not setting MATCH/match
+>.. 0
+>..
+
+  m() {
+    unset MATCH MBEGIN MEND match mbegin mend
+    [[ $2 -re2-match $3 ]]
+    print $? $1: m:${MATCH}: ma:${(j:|:)match}: MBEGIN=$MBEGIN MEND=$MEND mbegin="(${mbegin[*]})" mend="(${mend[*]})"
+  }
+  data='alpha beta gamma delta'
+  m uncapturing $data '\b\w+\b'
+  m capturing $data '\b(\w+)\b'
+  m 'capture 2' $data '\b(\w+)\s+(\w+)\b'
+  m 'capture repeat' $data '\b(?:(\w+)\s+)+(\w+)\b'
+0:Beginning and end testing
+>0 uncapturing: m:alpha: ma:: MBEGIN=1 MEND=5 mbegin=() mend=()
+>0 capturing: m:alpha: ma:alpha: MBEGIN=1 MEND=5 mbegin=(1) mend=(5)
+>0 capture 2: m:alpha beta: ma:alpha|beta: MBEGIN=1 MEND=10 mbegin=(1 7) mend=(5 10)
+>0 capture repeat: m:alpha beta gamma delta: ma:gamma|delta: MBEGIN=1 MEND=22 mbegin=(12 18) mend=(16 22)
+
+
+  unset match mend
+  s=$'\u00a0'
+  [[ $s -re2-match '^.$' ]] && print OK
+  [[ A${s}B -re2-match .(.). && $match[1] == $s ]] && print OK
+  [[ A${s}${s}B -re2-match A([^[:ascii:]]*)B && $mend[1] == 3 ]] && print OK
+  unset s
+0:Raw IMETA characters in input string
+>OK
+>OK
+>OK
+
+  [[ foo -re2-match f.+ ]] ; print $?
+  [[ foo -re2-match x.+ ]] ; print $?
+  [[ ! foo -re2-match f.+ ]] ; print $?
+  [[ ! foo -re2-match x.+ ]] ; print $?
+  [[ foo -re2-match f.+ && bar -re2-match b.+ ]] ; print $?
+  [[ foo -re2-match x.+ && bar -re2-match b.+ ]] ; print $?
+  [[ foo -re2-match f.+ && bar -re2-match x.+ ]] ; print $?
+  [[ ! foo -re2-match f.+ && bar -re2-match b.+ ]] ; print $?
+  [[ foo -re2-match f.+ && ! bar -re2-match b.+ ]] ; print $?
+  [[ ! ( foo -re2-match f.+ && bar -re2-match b.+ ) ]] ; print $?
+  [[ ! foo -re2-match x.+ && bar -re2-match b.+ ]] ; print $?
+  [[ foo -re2-match x.+ && ! bar -re2-match b.+ ]] ; print $?
+  [[ ! ( foo -re2-match x.+ && bar -re2-match b.+ ) ]] ; print $?
+0:Regex result inversion detection
+>0
+>1
+>1
+>0
+>0
+>1
+>1
+>1
+>1
+>1
+>0
+>1
+>0
+
+# Subshell because crash on failure
+  ( [[ test.txt -re2-match '^(.*_)?(test)' ]]
+    echo $match[2] )
+0:regression for segmentation fault (pcre, dup for re2), workers/38307
+>test
+
+  setopt BASH_REMATCH KSH_ARRAYS
+  unset MATCH MBEGIN MEND match mbegin mend BASH_REMATCH
+  [[ alphabeta -re2-match '^a([^a]+)(a)([^a]+)a$' ]]
+  echo "$? bash_rematch"
+  echo "m:${MATCH}: ma:${(j:|:)match}:"
+  echo MBEGIN=$MBEGIN MEND=$MEND mbegin="(${mbegin[*]})" mend="(${mend[*]})"
+  echo "BASH_REMATCH=[${(j:, :)BASH_REMATCH[@]}]"
+  echo "[0]=${BASH_REMATCH[0]} [1]=${BASH_REMATCH[1]}"
+0:bash_rematch works
+>0 bash_rematch
+>m:: ma::
+>MBEGIN= MEND= mbegin=() mend=()
+>BASH_REMATCH=[alphabeta, lph, a, bet]
+>[0]=alphabeta [1]=lph
+
diff --git a/configure.ac b/configure.ac
index 0e0bd53..9c23691 100644
--- a/configure.ac
+++ b/configure.ac
@@ -442,6 +442,11 @@ AC_ARG_ENABLE(pcre,
 AC_HELP_STRING([--enable-pcre],
 [enable the search for the pcre library (may create run-time library dependencies)]))
 
+dnl Do you want to look for re2 support?
+AC_ARG_ENABLE(re2,
+AC_HELP_STRING([--enable-re2],
+[enable the search for cre2 C-language bindings and re2 library]))
+
 dnl Do you want to look for capability support?
 AC_ARG_ENABLE(cap,
 AC_HELP_STRING([--enable-cap],
@@ -683,6 +688,15 @@ if test "x$ac_cv_prog_PCRECONF" = xpcre-config; then
 fi
 fi
 
+if test x$enable_re2 = xyes; then
+AC_CHECK_LIB([re2],[main],,
+  [AC_MSG_FAILURE([test for RE2 library failed])])
+AC_CHECK_LIB([cre2],[cre2_version_string],,
+  [AC_MSG_FAILURE([test for CRE2 library failed])])
+AC_CHECK_HEADERS([cre2.h],,
+  [AC_MSG_ERROR([test for CRE2 header failed])])
+fi
+
 AC_CHECK_HEADERS(sys/time.h sys/times.h sys/select.h termcap.h termio.h \
 		 termios.h sys/param.h sys/filio.h string.h memory.h \
 		 limits.h fcntl.h libc.h sys/utsname.h sys/resource.h \
-- 
2.10.0
Follow-Ups:
- Re: [PATCH] Add zsh/re2 module with conditions
  - From: Oliver Kiddle
- [PATCH] re2: fix clean-up path; fix two comments
  - From: Phil Pennock