Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: read -d $'\200' doesn't work with set +o multibyte (and [PATCH])



Jun T wrote:
> Thanks, I think it fixes the problem for the '#ifdef MULTIBYTE_SUPPORT' section.
>
> When MULTIBYTE_SUPPORT is not defined, delim is char, so we need
> STOUC() not when assigning to delim but when using delim.
> But instead of adding STOUC() to every use of delim (in nondef
> MULTIBYTE_SUPPORT section), it would be easier to define delim as int.

At least in my testing, it appears to also work to define delim as
unsigned char which I would find less confusing.

> + print -n $'first line\x80second line\x80' |
> + while read -d $'\x80' line; do print $line; done
> +0:read with a delimeter >= 0x80

There's a typo in "delimiter"

The patch below needs to be applied on top of your patch. It adds a few
more test cases, documents (and tests) the empty string being an
alternative way to set the delimiter to NUL. It also addresses the
additional problem I was hitting when trying to reproduce the original
problem. Rather than follow the 0xdc00 + byte suggestion it was
easier to simply set a separate flag variable and follow the
!isset(MULTIBYTE) path through the later code.

Oliver

diff --git a/Doc/Zsh/builtins.yo b/Doc/Zsh/builtins.yo
index b6217f66d..56428a714 100644
--- a/Doc/Zsh/builtins.yo
+++ b/Doc/Zsh/builtins.yo
@@ -1589,7 +1589,8 @@ Input is read from the coprocess.
 )
 item(tt(-d) var(delim))(
 Input is terminated by the first character of var(delim) instead of
-by newline.
+by newline.  For compatibility with other shells, if var(delim) is an
+empty string, input is terminated at the first NUL.
 )
 item(tt(-t) [ var(num) ])(
 Test if input is available before attempting to read.  If var(num)
diff --git a/Src/builtin.c b/Src/builtin.c
index a6fadb622..09d0ca2f0 100644
--- a/Src/builtin.c
+++ b/Src/builtin.c
@@ -6282,6 +6282,7 @@ bin_read(char *name, char **args, Options ops, UNUSED(int func))
     long izle_timeout = 0;
 #ifdef MULTIBYTE_SUPPORT
     wchar_t delim = L'\n', wc;
+    int rawbyte = 0;
     mbstate_t mbs;
     char *laststart;
     size_t ret;
@@ -6412,9 +6413,11 @@ bin_read(char *name, char **args, Options ops, UNUSED(int func))
 	    wi = WEOF;
 	if (wi != WEOF)
 	    delim = (wchar_t)wi;
-	else
+	else {
 	    delim = (wchar_t)STOUC((delimstr[0] == Meta) ?
 			      delimstr[1] ^ 32 : delimstr[0]);
+	    rawbyte = 1;
+	}
 #else
         delim = STOUC((delimstr[0] == Meta) ? delimstr[1] ^ 32 : delimstr[0]);
 #endif
@@ -6841,7 +6844,7 @@ bin_read(char *name, char **args, Options ops, UNUSED(int func))
 		break;
 	    }
 	    *bptr = (char)c;
-	    if (isset(MULTIBYTE)) {
+	    if (isset(MULTIBYTE) && !rawbyte) {
 		ret = mbrtowc(&wc, bptr, 1, &mbs);
 		if (!ret)	/* NULL */
 		    ret = 1;
diff --git a/Test/B04read.ztst b/Test/B04read.ztst
index a2f03c9b3..f50c43682 100644
--- a/Test/B04read.ztst
+++ b/Test/B04read.ztst
@@ -82,6 +82,10 @@
 >Testing the
 >null hypothesis
 
+ read -ed '' <<<$'one\0two'
+0:empty delimiter terminates at nulls
+>one
+
  print -n $'first line\x80second line\x80' |
  while read -d $'\x80' line; do print $line; done
 0:read with a delimeter >= 0x80
diff --git a/Test/D07multibyte.ztst b/Test/D07multibyte.ztst
index 6909346cb..413c4fe73 100644
--- a/Test/D07multibyte.ztst
+++ b/Test/D07multibyte.ztst
@@ -212,6 +212,20 @@
 >first
 >second
 
+  read -ed £
+0:read with multibyte delimiter where bytes of delimiter also occur in input
+<one¤twoãthree£four
+>one¤twoãthree
+
+  read -ed $'\xa0' <<<$'first\xa0second'
+0:read delimited by a byte that isn't a valid multibyte character
+>first
+
+  read -ed $'\xc2'
+0:read delimited by a single byte terminates if the byte is part of a multibyte character
+<one£two
+>one
+
   (IFS=«
   read -d » -A array
   print -l $array)




Messages sorted by: Reverse Date, Date, Thread, Author