Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: Match length and multibyte characters



2015/09/10 20:35, Erik Bernstein <erik@xxxxxxxxxxx> wrote:
> % array=(a ä a)
> % print ${${(O)array//(#m)*/${#MATCH}}[1]} ${${(ON)array%%*}[1]}
> 1 2
> 
> Can maybe someone shed some light on whether the second version is
> supposed to work with multibyte characters and,

The second version returns 2 because ä is a 2-byte character in UTF-8.
This is a bug in the current zsh: the flags N, B and E report byte
counts and byte offsets rather than character counts, so they do not
work well with multibyte characters in ${...#...}, ${...%...} etc.

The patch below may fix the bug.

BTW, in your example, it is better to replace the flag (O) by (On)
so that the lengths are sorted in numerical order. Otherwise they are
compared as strings, and 2 is sorted ahead of 10.
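
To illustrate the difference between the two sort orders, here is a rough
C sketch (not zsh code). The helper first_after_sort() and both comparators
are hypothetical names made up for this example; strcmp() and strtol() only
approximate what zsh does for (O) and (On).

```c
#include <stdlib.h>
#include <string.h>

/* Descending string comparison: roughly what a plain (O) sort does. */
static int cmp_string_desc(const void *a, const void *b)
{
    const char *sa = *(const char *const *)a;
    const char *sb = *(const char *const *)b;
    return strcmp(sb, sa);
}

/* Descending numeric comparison: roughly what (On) does for digit strings. */
static int cmp_numeric_desc(const void *a, const void *b)
{
    long na = strtol(*(const char *const *)a, NULL, 10);
    long nb = strtol(*(const char *const *)b, NULL, 10);
    return (nb > na) - (nb < na);
}

/* Sort items in place and return the first one, like ${...[1]}. */
static const char *first_after_sort(const char **items, size_t n, int numeric)
{
    qsort(items, n, sizeof *items,
          numeric ? cmp_numeric_desc : cmp_string_desc);
    return items[0];
}
```

With the lengths 2, 10 and 1, the string sort puts 2 first while the
numeric sort puts 10 first, which is what you want when picking the
longest match.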


diff --git a/Src/glob.c b/Src/glob.c
index dea1bf5..43d135b 100644
--- a/Src/glob.c
+++ b/Src/glob.c
@@ -2491,17 +2491,17 @@ get_match_ret(char *s, int b, int e, int fl, char *replstr,
 	ll += 1 + (l - (e - b));
     if (fl & SUB_BIND) {
 	/* position of start of matched portion */
-	sprintf(buf, "%d ", b + 1);
+	sprintf(buf, "%d ", MB_METASTRLEN2END(s, 0, s+b) + 1);
 	ll += (bl = strlen(buf));
     }
     if (fl & SUB_EIND) {
 	/* position of end of matched portion */
-	sprintf(buf + bl, "%d ", e + 1);
+	sprintf(buf + bl, "%d ", MB_METASTRLEN2END(s, 0, s+e) + 1);
 	ll += (bl = strlen(buf));
     }
     if (fl & SUB_LEN) {
 	/* length of matched portion */
-	sprintf(buf + bl, "%d ", e - b);
+	sprintf(buf + bl, "%d ", MB_METASTRLEN2END(s+b, 0, s+e));
 	ll += (bl = strlen(buf));
     }
     if (bl)
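
For readers who do not know the zsh source, the following standalone C
sketch shows the idea behind the change: the byte offsets b and e are
converted to character counts before being printed. It is only an
illustration. utf8_chars() is a hypothetical stand-in for
MB_METASTRLEN2END() that assumes valid, unmetafied UTF-8, whereas the real
macro works on zsh's metafied strings and follows the current locale;
struct match_info and describe_match() are likewise invented here.

```c
#include <stddef.h>

/* Count UTF-8 characters in [start, end): every byte that is not a
 * continuation byte (10xxxxxx) begins a new character. */
static size_t utf8_chars(const char *start, const char *end)
{
    size_t n = 0;
    for (const char *p = start; p < end; p++)
        if (((unsigned char)*p & 0xC0) != 0x80)
            n++;
    return n;
}

/* Values corresponding to the B, E and N flags, counted in characters. */
struct match_info {
    size_t begin;  /* 1-based position of start of matched portion */
    size_t end;    /* 1-based position just past the matched portion */
    size_t len;    /* length of matched portion */
};

/* s is the whole string; the match occupies bytes [b, e). */
static struct match_info describe_match(const char *s, size_t b, size_t e)
{
    struct match_info m;
    m.begin = utf8_chars(s, s + b) + 1;   /* was: b + 1 */
    m.end   = utf8_chars(s, s + e) + 1;   /* was: e + 1 */
    m.len   = utf8_chars(s + b, s + e);   /* was: e - b */
    return m;
}
```

For the string "ä" matched in full (b = 0, e = 2), the old arithmetic
gives a length of 2, while the character count gives 1.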






