Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: [bug] busyloop upon $=var with NULs when $IFS contains both NUL and a byte > 0x7f



> 2022/11/18 23:27, Stephane Chazelas <stephane@xxxxxxxxxxxx> wrote:
> 
> $ LC_ALL=C zsh -c 'IFS=é$IFS; echo $=IFS'
> ^C
> 
> (busy loop had to be interrupted with ^C).

This is not simple to solve. The basic question is:
   What should we do if IFS contains invalid characters?

When IFS changes, ifssetfn() calls inittyptab(), which in turn calls
set_widearray() (at line 4172 in utils.c) to fill in the structure
ifs_wide. The origin of the problem seems to be in set_widearray()
(also in utils.c):

  95             mblen = mb_metacharlenconv(mb_array, &wci);
  ..
  99             /* No good unless all characters are convertible */
 100             if (wci == WEOF)
 101                 return;

mb_array is the current IFS (metafied), and it contains
é = \xc3\xa9. In the C locale (and at least on Linux), \xc3 is
an invalid character, and wci is set to WEOF. Then the function
returns without setting ifs_wide (ifs_wide.chars=NULL and
ifs_wide.len=0).

The comment at line 99 may look reasonable, but leaving ifs_wide
empty is equally 'no good', I think.

Because ifs_wide is empty, itype_end() (and wcsitype()) do not
work as expected for characters >= \x80.

The 'busy loop' is in wordcount() (utils.c):

  3834         for (; *s; r++) {
  3835             char *ie = itype_end(s, ISEP, 1);
  3836             if (ie != s) {
  3837                 s = ie;
  ....
  3840             }
  3841             (void)findsep(&s, NULL, 0);
  ....
  3845         }

Here, the pointer s already points to an ISEP (\x83\x20 = metafied NUL),
but itype_end() does not recognize it as an ISEP (ie == s) because
ifs_wide is empty, and findsep() does not move s because *s is already
an ISEP. The result is an infinite loop with the same s.
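
To make the stall concrete, here is a minimal model of that loop. It is
not zsh code: skip_seps() and find_sep() are hypothetical stand-ins for
itype_end() and findsep(), working on plain (non-metafied) bytes, and the
two separator sets are passed separately only to imitate the two
functions disagreeing about what is an ISEP. The "no progress" check is
an assumption added for illustration, not a proposed patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for itype_end(): skip leading bytes found in
 * `skippable`. Returns s unchanged if the first byte is not in that set. */
static const char *skip_seps(const char *s, const char *skippable)
{
    while (*s && strchr(skippable, *s))
        s++;
    return s;
}

/* Hypothetical stand-in for findsep(): advance to the next byte found in
 * `seps`. Does not move if *s is already a separator. */
static const char *find_sep(const char *s, const char *seps)
{
    while (*s && !strchr(seps, *s))
        s++;
    return s;
}

/* Count words. Returns -1 instead of spinning when one iteration leaves
 * s where it started, which is the situation described above. */
static int count_words(const char *s, const char *seps, const char *skippable)
{
    int r = 0;

    assert(s != NULL && seps != NULL && skippable != NULL);
    for (; *s; r++) {
        const char *start = s;
        const char *ie = skip_seps(s, skippable);

        if (ie != s) {
            s = ie;
            if (!*s)
                break;          /* only trailing separators were left */
        }
        s = find_sep(s, seps);
        if (s == start)
            return -1;          /* neither helper advanced: would loop forever */
    }
    return r;
}
```

With seps = " :" and skippable = " ", the input "a:b" stalls at the ':'
in the same way: find_sep() treats it as a separator and refuses to move
past it, while skip_seps() does not know about it and so does not skip it.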

So the basic question is:
What should we do if IFS contains invalid character(s)?

I think, at least if the MULTIBYTE option is on, it would be better to
forcibly reset IFS to the default, rather than leaving ifs_wide empty.

Or store only valid characters in ifs_wide.chars?
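
A rough sketch of what "store only valid characters" could look like,
using standard mbrtowc() on plain bytes. This is only an illustration
under assumptions: widen_valid_only() is a hypothetical name, the real
code works on metafied strings via mb_metacharlenconv(), and which bytes
count as invalid depends on the locale and the C library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Convert the first n bytes of src to wide characters, dropping any byte
 * that does not start a valid character in the current locale.
 * dst must have room for n elements. Returns the number stored. */
static size_t widen_valid_only(const char *src, size_t n, wchar_t *dst)
{
    mbstate_t st;
    size_t count = 0;

    assert(src != NULL && dst != NULL);
    memset(&st, 0, sizeof st);
    while (n > 0) {
        wchar_t wc;
        size_t len = mbrtowc(&wc, src, n, &st);

        if (len == (size_t)-1 || len == (size_t)-2) {
            /* Invalid or truncated sequence: drop one byte, reset the
             * conversion state, and try again from the next byte. */
            memset(&st, 0, sizeof st);
            src++;
            n--;
            continue;
        }
        if (len == 0)
            len = 1;            /* an embedded NUL byte converts to L'\0' */
        dst[count++] = wc;
        src += len;
        n -= len;
    }
    return count;
}
```

With this approach the ASCII separators (space, tab, newline, NUL) would
still end up in the wide array even when some other byte in IFS cannot
be converted, instead of the whole array being left empty.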

BTW, in set_widearray():

  89             if (STOUC(*mb_array) <= 0x7f) {
  90                 mb_array++;
  91                 *wcptr++ = (wchar_t)*mb_array;

I think lines 90 and 91 should be
	*wcptr++ = (wchar_t)*mb_array++;
But fixing this does not solve the current problem.
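
For reference, the difference between the two forms can be seen in
isolation. The functions below are hypothetical and assume an all-ASCII
input; they only mirror the order of the increment and the store, not
the rest of the real function.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Same order as the quoted lines 90-91: the pointer is advanced first,
 * so the byte stored is the one AFTER the byte that was tested. */
static size_t widen_ascii_buggy(const char *src, wchar_t *dst)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    while (*src) {
        src++;
        dst[n++] = (wchar_t)*src;
    }
    return n;
}

/* Suggested order: store the current byte, then advance. */
static size_t widen_ascii_fixed(const char *src, wchar_t *dst)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    while (*src)
        dst[n++] = (wchar_t)*src++;
    return n;
}
```

For the two-byte input " \t", the first version stores '\t' followed by
the terminating NUL, while the second stores ' ' followed by '\t'.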


