Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: A comment about "slurp" and -o multibyte



On Wed, Jan 17, 2024 at 4:46 AM Bart Schaefer <schaefer@xxxxxxxxxxxxxxxx> wrote:
>
> On Sun, Jan 14, 2024 at 2:34 AM Roman Perepelitsa
> <roman.perepelitsa@xxxxxxxxx> wrote:
> >
> >     function slurp() {
> >       emulate -L zsh -o no_multibyte
> > [...]
> >       typeset -g REPLY=${(j::)content}
> >     }
>
> Although the function faithfully reads the input stream into $REPLY,
> later references to $REPLY with the multibyte option back in effect
> will (re-)interpret the content as multibyte characters.  This may not
> be what's desired.
>
> % slurp < =zsh
> % () {
> print $#REPLY
> print ${(m)#REPLY}
> print ${(mm)#REPLY}
> setopt localoptions nomultibyte
> print $#REPLY
> }
> 872903  <-- number of characters
> 873259  <-- width of printable characters
> 872383  <-- number of glyphs
> 878288  <-- actual number of bytes
>
> (Of course those first three numbers are all garbage because it's just
> interpreting an executable as wide character text.)

To me this behavior looks as expected. It's consistent with `read`,
`sysread` (from the `zsh/system` module) and command substitution
(`$(<file)`, stored in `procsubst` below).

    % head -c $((1 << 20)) </dev/urandom | tr '\0' x >1MB
    % slurp <1MB
    % IFS= read -rd '' read <1MB
    % sysread -s $((1 << 20)) sysread <1MB
    % procsubst=${"$(<1MB; print -n .)"%.}
    % () {
      print -r -- $#REPLY $#read $#sysread $#procsubst
      setopt local_options no_multibyte
      print -r -- $#REPLY $#read $#sysread $#procsubst
    }
    1008389 1008389 1008389 1008389
    1048576 1048576 1048576 1048576

Roman.



