Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: 5.0.8 regression when waiting for suspended jobs

On Jul 31,  8:56am, Bart Schaefer wrote:
} zsh-5.0.7
}  - "wait $!" blocks (looping on repeated wait3() nonzero)
}  - "wait %1" returns immediately
}  - "wait" returns immediately
} zsh-5.0.8
}  - "wait $!" loops but also printing status every time
}  - "wait %1" returns immediately
}  - "wait" returns immediately

I still only suspect what changed to make 5.0.8 different from 5.0.7 in
this regard, but here's what's going on:

- "wait $!" -

bin_fg() calls waitforpid() which discovers the job is stopped and goes
into a loop calling kill(pid, SIGCONT) to try to get the job to run
again.  In the 5.0.8 case, each time this happens the job briefly wakes
up, gets stopped with SIGTTIN, thus causes another SIGCHLD to go to the
parent zsh, which then prints the "suspended" message and loops right
back to kill(pid, SIGCONT) again.

All of this is exactly the same as in 5.0.7 except that because of the
SIGCONT change in workers/35032 we notice the stopped -> continued ->
stopped again status change and therefore print the new status even
though it's actually the same as the last time we printed the status,
because we skipped printing the "continued" status.  Or so I surmise.
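
To make the stop / SIGCONT / stop-again cycle concrete, here is a small
standalone sketch.  It is NOT the zsh source: count_restops() is a
hypothetical helper, and the child uses raise(SIGSTOP) as a stand-in for
the SIGTTIN stop a real background job would hit.  It only assumes plain
POSIX fork(), kill() and waitpid() with WUNTRACED.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, not zsh code.  Forks a child that stops itself
 * again every time it is resumed, sends it SIGCONT `rounds` times, and
 * counts how many fresh "stopped" reports the parent collects.
 * Returns that count, or -1 on error.
 */
int count_restops(int rounds)
{
    int status, i, stops = 0;
    pid_t pid;

    assert(rounds >= 0);

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        for (;;)
            raise(SIGSTOP);     /* stand-in for the SIGTTIN stop */
    }

    /* Collect the initial stop, i.e. the already-suspended job. */
    if (waitpid(pid, &status, WUNTRACED) != pid || !WIFSTOPPED(status)) {
        kill(pid, SIGKILL);
        waitpid(pid, &status, 0);
        return -1;
    }

    for (i = 0; i < rounds; i++) {
        kill(pid, SIGCONT);     /* wake the job up ... */
        /* ... and see it report "stopped" again right away. */
        if (waitpid(pid, &status, WUNTRACED) != pid || !WIFSTOPPED(status))
            break;
        stops++;
    }

    /* Clean up: SIGKILL also terminates a stopped process. */
    kill(pid, SIGKILL);
    waitpid(pid, &status, 0);
    return stops;
}
```

Every SIGCONT produces exactly one new stop report here, so a caller
that prints the status on each report and then signals again never makes
progress, which is the shape of the loop described above.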

- "wait %1" -

bin_fg() calls zwaitjob(), which does NOT do kill(pid, SIGCONT); instead
it simply returns as soon as it finds that the job is already stopped.

If a signal *is* received and the waiting shell is a subshell, *then*
the awaited job is SIGCONT'd, but I don't recall why and it doesn't
matter for this bug anyway.

This does however raise the question of why zwaitjob() is not calling
waitforpid().  If it did so, we'd have the ksh behavior for all three
cases of "wait", and we could even add the bit where interrupting the
wait sends the signal through to the waited-for job.

- "wait" -

bin_fg() goes into a loop calling zwaitjob() on every entry in the job
table; i.e., identical to "wait %1" repeated for every job number.


So what do we do about this?  Skip the SIGCONT in waitforpid()?  Only
try SIGCONT once in waitforpid() rather than every time around the
loop?  Some other thing involving the WIFCONTINUED() test?  Assuming
we work that out, should zwaitjob() be changed to use waitforpid(), or
do we think someone is relying on the bash-like immediate return?
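
As a rough illustration of the "only try SIGCONT once" option, here is a
standalone sketch.  It is not a patch against waitforpid(): the names
wait_cont_once(), spawn_self_stopping_child() and JOB_STILL_STOPPED are
all made up for the example, and the child again uses raise(SIGSTOP) in
place of a real SIGTTIN stop.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define JOB_STILL_STOPPED (-2)  /* hypothetical return code */

/* Hypothetical test helper: child stops itself `nstops` times, then
 * exits with `code`.  Returns the child's pid (or -1 if fork fails). */
pid_t spawn_self_stopping_child(int nstops, int code)
{
    pid_t pid = fork();
    if (pid == 0) {
        while (nstops-- > 0)
            raise(SIGSTOP);
        _exit(code);
    }
    return pid;
}

/*
 * Wait for `pid`, sending SIGCONT at most once if it is found stopped.
 * Returns the exit status, 128 + signal number if it was killed,
 * JOB_STILL_STOPPED if it stopped again after the one SIGCONT,
 * or -1 if waitpid() fails.
 */
int wait_cont_once(pid_t pid)
{
    int status, continued = 0;

    assert(pid > 0);
    for (;;) {
        if (waitpid(pid, &status, WUNTRACED) != pid)
            return -1;
        if (WIFEXITED(status))
            return WEXITSTATUS(status);
        if (WIFSIGNALED(status))
            return 128 + WTERMSIG(status);
        /* Otherwise the child is stopped. */
        if (continued)
            return JOB_STILL_STOPPED;   /* give up instead of looping */
        kill(pid, SIGCONT);
        continued = 1;
    }
}
```

With this shape a job that runs to completion after one SIGCONT is waited
for as before, while a job that immediately stops again makes the wait
return instead of spinning.  What status "wait" should report in that
second case is left open here, just as it is in the questions above.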
