Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: fatal flaw zsh 4.0.1 on irix 6.3 & 6.5: suspend "ls -l|less" then resume hangs



On Tue, 12 Jun 2001 10:15:59 +0200 (MET DST), Sven Wischnowsky <wischnow@xxxxxxxxxxxxxxxxxxxxxxx> wrote:
> Timothy Miller wrote:
> 
> > I continue to not be a subscriber to this list :-)
> > 
> > If I invoke zsh as "zsh-4.0.1 -f" and then run "ls -l|less" on irix 6.3
> > or 6.5, control-z to suspend, and then "fg" to resume, the shell prints out
> > 
> > [1]  + done       ls -l | 
> >        continued  less
> > 
> > and then hangs, unresponsive to all input (ctrl-c, ctrl-z, ctrl-\, other keys,
> > etc). I include the results of Util/reporter at the end of this email. This
> > bug does not happen on Solaris 2.7, AIX 4.3.2, or redhat 7.0 linux 2.2.16.
> > The version of less I'm using is 290 on irix 6.3 and 332 on irix 6.5,
> > solaris, and aix, and 358 on linux.
> 
> Hm, that's weird -- it's not even one of the complicated cases. The
> reporter output isn't of much help here. Hence some questions:
> 
> - What does the output of `ps j' (or equivalent, showing pids and parent
>   pids) show when the job hangs? (ps output with signal masks might
>   help, too.)

On irix 6.3 (where there doesn't seem to be any way to get ps to report
signal masks):
Running zsh 3.1.6 with -f, then ls -l | less with both procs left running,
ps -fjl:

  F S      UID   PID  PPID  PGID   SID  C PRI NI  P    SZ:RSS      WCHAN    STIME TTY     TIME CMD
 b0 S      tsm 13041 12927 13041 12927  0  39 20  *   666:227   8039d510 09:58:18 ttyq10  0:00 zsh-beta -f 
 b0 S      tsm 13043 13041 13042 12927  0  28 20  *   450:124   8039dc80 09:58:46 ttyq10  0:00 less 

after suspend:

 b0 S      tsm 13041 12927 13041 12927  0  28 20  *   666:227   8039dc80 09:58:18 ttyq10  0:00 zsh-beta -f 
 b0 T      tsm 13043 13041 13042 12927  0  60 20  *   450:124          - 09:58:46 ttyq10  0:00 less 

after resume:

 b0 S      tsm 13041 12927 13041 12927  0  39 20  *   666:239   8039d510 09:58:18 ttyq10  0:00 zsh-beta -f 
 b0 S      tsm 13043 13041 13042 12927  0  28 20  *   451:125   8039dc80 09:58:46 ttyq10  0:00 less 

Running zsh 4.0.1 with -f, both procs still running:
 b0 S      tsm 13097 12927 13097 12927  4  39 20  *   722:285   8039d510 10:07:57 ttyq10  0:00 zsh-4.0.1 -f 
 b0 S      tsm 13100 13097 13100 12927  0  28 20  *   450:124   8039dc80 10:08:00 ttyq10  0:00 less 

after suspend:

 b0 S      tsm 13097 12927 13097 12927  0  28 20  *   722:286   8039dc80 10:07:57 ttyq10  0:00 zsh-4.0.1 -f 
 b0 T      tsm 13100 13097 13100 12927  0  60 20  *   450:124          - 10:08:00 ttyq10  0:00 less 

after resume:

 b0 S      tsm 13097 12927 13097 12927  0  39 20  *   722:288   8039d510 10:07:57 ttyq10  0:00 zsh-4.0.1 -f 
 b0 T      tsm 13100 13097 13100 12927  0  60 20  *   450:124          - 10:08:00 ttyq10  0:00 less 

On irix 6.5, zsh 3.1.6, both running:

  F S      UID        PID       PPID       PGID        SID  C PRI NI  P    SZ:RSS      WCHAN    STIME TTY     TIME CMD
  0 S      tsm      35634      35740      35634      35740  0  20 20  *   186:131   23f900b8 10:20:30 ttyq4   0:00 zsh-beta -f
  0 S      tsm      31635      35634      36065      35740  0  20 20  *   130:84    203fe018 10:20:32 ttyq4   0:00 less

after suspend:
  0 S      tsm      35634      35740      35634      35740  0  20 20  *   186:131   203fe018 10:20:30 ttyq4   0:00 zsh-beta -f
 40 T      tsm      31635      35634      36065      35740  0  20 20  *   131:85           - 10:20:32 ttyq4   0:00 less

after resume:
  0 S      tsm      35634      35740      35634      35740  0  20 20  *   186:134   23f900b8 10:20:30 ttyq4   0:00 zsh-beta -f
  0 S      tsm      31635      35634      36065      35740  0  20 20  *   131:85    203fe018 10:20:32 ttyq4   0:00 less

zsh 4.0.1 both running:
  0 S      tsm      35817      35740      35817      35740  0  20 20  *   220:145   23f900b8 10:22:06 ttyq4   0:00 zsh-4.0.1 -f
  0 S      tsm      36183      35817      36183      35740  0  20 20  *   130:84    203fe018 10:22:09 ttyq4   0:00 less

after suspend:
  0 S      tsm      35817      35740      35817      35740  0  20 20  *   220:145   203fe018 10:22:06 ttyq4   0:00 zsh-4.0.1 -f
 40 T      tsm      36183      35817      36183      35740  0  20 20  *   131:85           - 10:22:09 ttyq4   0:00 less

after resume:
  0 S      tsm      35817      35740      35817      35740  0  20 20  *   220:145   23f900b8 10:22:06 ttyq4   0:00 zsh-4.0.1 -f
 40 T      tsm      36183      35817      36183      35740  0  20 20  *   131:85           - 10:22:09 ttyq4   0:00 less

The odd thing here is the difference in flags and flags behavior between the 
two machines. The documented meaning for the flags is the same on both machines:
     F     (l)      Flags (hexadecimal and additive) associated with the
                    process:

                    001   Process is a system (resident) process.
                    002   Process is being traced.
                    004   Stopped process has been given to parent via
                          wait(2).
                    008   Process is sleeping at a non-interruptible priority.
                    010   Process is in core.
                    020   Process user area is in core.
                    040   Process has enabled atomic operator emulation.
                    080   Process in stream poll or select.
                    100   Process is a kernel thread.

From the evidence, though, I suspect that this is an error for 6.5.
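
For reference, here is a minimal Python sketch that splits the F values
from the ps listings into the documented bits. The function name
decode_ps_flags is made up for illustration, and the table is copied from
the man page excerpt quoted above, so it is only as accurate as that
documentation.

```python
# Flag bits as listed in the ps(1) excerpt quoted above (hexadecimal, additive).
PS_FLAGS = {
    0x001: "system (resident) process",
    0x002: "being traced",
    0x004: "stopped process given to parent via wait(2)",
    0x008: "sleeping at a non-interruptible priority",
    0x010: "in core",
    0x020: "user area in core",
    0x040: "atomic operator emulation enabled",
    0x080: "in stream poll or select",
    0x100: "kernel thread",
}


def decode_ps_flags(field):
    """Split a hexadecimal F field from ps into the documented flag meanings."""
    value = int(field, 16)
    # Keep every documented bit that is set, lowest bit first.
    return [name for bit, name in sorted(PS_FLAGS.items()) if value & bit]


# The three F values that appear in the listings above.
for field in ("b0", "0", "40"):
    print(field, "->", decode_ps_flags(field) or ["(no flags set)"])
```

By that table, b0 is 080 + 020 + 010 (in stream poll or select, user area
in core, in core), and 40 is the atomic operator emulation bit alone.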

On both systems, zsh 4.0.1 was run as a subshell under zsh 3.1.6, hence the
different session id.

I wrote a small program to get the pending and held signal masks as well as
a bit of other information:

For irix 6.3, zsh 3.1.6 with -f, at prompt:
zsh: no signals held or pending, asleep on syscall 4

while ls -l|less running:
zsh: asleep on syscall 166, all signals from 1 to 64 held EXCEPT 1, 9, 18,
23.
less: asleep on syscall 4, no signals held or pending

after suspend:
zsh asleep on syscall 4, no sigs held or pending
less: stopped, no sigs

after resume:
zsh back to while running
less back to while running

zsh 4.0.1 with -f, at prompt:
zsh: asleep on syscall 4

after ls -l|less:
zsh: asleep on syscall 166, all sigs from 1 to 64 held except 1, 9, 18, 23.
less asleep on syscall 4

after suspend:
zsh: asleep on syscall 4 no sigs held or pending
less: stopped

after resume:
zsh: asleep on syscall 166, all sigs 1-64 held except 1, 9, 18, 23.
less: stopped

The only info I can find on syscall numbers says that they start at 1000,
which doesn't seem to be the case here, but if I subtract 1000 from the
documented numbers, syscall 4 is write() and 166 is poll(). Signal 1 is HUP,
9 is KILL, 18 is CHLD, 23 is STOP.
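
To illustrate what "held" and "pending" mean in the reports above, here is
a rough Python sketch that prints both sets for the calling process only.
This is not the program used for the numbers above: that one looked at
other processes, which needs a system-specific interface. The function name
signal_state is hypothetical, and the sketch assumes a Unix system where
Python's signal module provides pthread_sigmask() and sigpending().

```python
import signal


def signal_state():
    """Return (held, pending) signal numbers for the calling thread/process."""
    # SIG_BLOCK with an empty set changes nothing and returns the current mask.
    held = signal.pthread_sigmask(signal.SIG_BLOCK, [])
    # Signals that have been raised but not yet delivered because they are held.
    pending = signal.sigpending()
    return sorted(int(s) for s in held), sorted(int(s) for s in pending)


if __name__ == "__main__":
    held, pending = signal_state()
    print("held:", held or "none")
    print("pending:", pending or "none")
```

Note that signal numbers differ between systems, so the numbers printed on
another platform will not necessarily match the irix numbering given above.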

> - Have you tried it with earlier versions of zsh? Does it work there?

Yes, works with 3.1.6, 3.0.6 and all the earlier versions of zsh I've
installed on irix (unfortunately I can't recall exactly which).

> - Does it work if you replace `less' with another program that doesn't
>   program the terminal (so much), e.g. `more' and `cat'?

more fails as well. I can't type fast enough to suspend cat before it exits!
   Tim


