Zsh Mailing List Archive

Re: 5.0.8 regression when waiting for suspended jobs



Preface:  I think I've figured out why zwaitjob() does not use the same
logic as waitforpid(); zwaitjob() may be waiting for an entire pipeline,
needing to record status of multiple actual processes which may exit in
any order, and only finish when all the processes are complete.
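To illustrate the preface, here is a minimal sketch of waiting for a whole group of processes, as opposed to a single pid. It is not zsh's code: spawn_children() and wait_for_all() are hypothetical names made up for this example, and the real zwaitjob() works from the job table and SIGCHLD rather than calling waitpid() in a loop.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork n children; child i exits with codes[i]. */
static int spawn_children(pid_t *pids, const int *codes, int n)
{
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(codes[i]);        /* child: exit immediately */
        pids[i] = pid;
    }
    return 0;
}

/* Hypothetical helper: reap children in whatever order they exit,
 * storing each raw wait status in the slot that matches its pid.
 * Returns n once every listed process has been reaped, -1 on error. */
static int wait_for_all(const pid_t *pids, int *statuses, int n)
{
    int remaining = n;

    assert(n > 0);
    while (remaining > 0) {
        int status;
        pid_t pid = waitpid(-1, &status, 0);

        if (pid < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;
        }
        for (int i = 0; i < n; i++) {
            if (pids[i] == pid) {   /* one of ours: record and count it */
                statuses[i] = status;
                remaining--;
                break;
            }
        }
    }
    return n;
}
```

The point of the sketch is only that the wait cannot finish on the first status it sees: it has to keep a per-process record and return when the last member is done.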

On Aug 12, 10:43am, Peter Stephenson wrote:
} Subject: Re: 5.0.8 regression when waiting for suspended jobs
}
} On Tue, 11 Aug 2015 16:56:55 -0700
} Bart Schaefer <schaefer@xxxxxxxxxxxxxxxx> wrote:
} > } zsh-5.0.7
} > }  - "wait $!" blocks (looping on repeated wait3() nonzero)
} > } zsh-5.0.8
} > }  - "wait $!" loops but also printing status every time
} > 
} > bin_fg() calls waitforpid() which discovers the job is stopped and goes
} > into a loop calling kill(pid, SIGCONT) to try to get the job to run
} > again.
} > 
} > All of this is exactly the same as in 5.0.7 except that because of the
} > SIGCONT change in workers/35032 we notice the stopped -> continued ->
} > stopped again status change and therefore print the new status
} 
} So you might have thought the right thing to do was note it had been
} stopped immediately, possibly warn the user, and not try to continue it
} again without further user action?  Is that easy?

No, not really.  I suppose we could do something baroque like examine
the rusage cputime but otherwise the CHLD could be arriving at any point.

We could special-case the SIGTT* signals, since we obviously know (from
the status that's printed) which signal stopped the job.
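A sketch of what such a special case might test, assuming the status comes from a wait call that reports stopped children (wait3() or waitpid() with WUNTRACED). The function name stopped_on_tty_access() is hypothetical and does not exist in zsh.

```c
#include <signal.h>
#include <stdbool.h>
#include <sys/wait.h>

/* Hypothetical helper: true if the wait status says the child was
 * stopped by a terminal-access signal (SIGTTIN or SIGTTOU), i.e. a
 * stop that a plain SIGCONT from the shell is unlikely to cure. */
static bool stopped_on_tty_access(int status)
{
    if (!WIFSTOPPED(status))
        return false;               /* exited, killed, or continued */

    int sig = WSTOPSIG(status);
    return sig == SIGTTIN || sig == SIGTTOU;
}
```

A caller could use this to skip the SIGCONT retry for such jobs, while still retrying for a job stopped by, say, SIGSTOP or SIGTSTP.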

} Clearly there's a race in the real world
} where the programme could get SIGTTIN at any time, but in the general
} case (i.e. where a background process got SIGTTIN when the foreground
} was doing something irrelevant) you clearly *don't* want it to continue
} every time.

This only happens for the "wait" command, not for handling the signal
while something else is in the foreground.  There might be some weird
edge case where you could cause it to happen with command substitution
(the only other place waitforpid() is used) but I can't come up with it.

} Do we even understand what the loop with SIGCONT is doing for us?  Under
} what circumstances would this help?

It would seem that it's trying very hard not to have "wait" either fail
immediately (bash behavior) or block forever (ksh behavior).  Doing the
ksh thing makes a bit of sense when "wait" will propagate the signals
(so interrupting wait also interrupts the stopped job).
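The alternatives can be sketched roughly as follows. This is an illustration under stated assumptions, not the code of any of the three shells: wait_with_policy(), enum stop_policy and the max_continues bound are hypothetical. Blocking forever would correspond to calling waitpid() without WUNTRACED, so that a stop is never reported.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical policies for a child that turns out to be stopped. */
enum stop_policy { FAIL_ON_STOP, CONTINUE_ON_STOP };

/* Wait for pid.  Returns its exit status, 128+signal if it was killed,
 * or -1 if it is stopped and we give up (or on error).  With
 * CONTINUE_ON_STOP, send SIGCONT and wait again, at most max_continues
 * times; the bound is an assumption of this sketch. */
static int wait_with_policy(pid_t pid, enum stop_policy policy,
                            int max_continues)
{
    for (;;) {
        int status;

        if (waitpid(pid, &status, WUNTRACED) < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;
        }
        if (WIFEXITED(status))
            return WEXITSTATUS(status);
        if (WIFSIGNALED(status))
            return 128 + WTERMSIG(status);

        /* Otherwise the child is stopped. */
        if (policy == FAIL_ON_STOP || max_continues-- <= 0)
            return -1;
        kill(pid, SIGCONT);         /* nudge it and wait again */
    }
}
```

Without a bound, a child that stops again right after each SIGCONT (for example on SIGTTIN) keeps the CONTINUE_ON_STOP loop going indefinitely, which is the looping behavior reported at the top of the thread.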

} Some (other sort of) race where something else (what? Not zsh and
} not the process that's suspended) takes a while to get going, so the
} SIGCONT only succeeds after a few attempts?

Reasoning lost to history, I fear (predates source code control).

} > - "wait %1" -
} > 
} > bin_fg() calls zwaitjob() which does NOT do kill(pid, SIGCONT) instead
} > simply blocking forever waiting for a SIGCHLD that will never arrive.

I actually got this one wrong -- yes, zwaitjob() would block forever if
it reached that signal_suspend() call, but in fact it won't even try if
the job status is STAT_STOPPED.  It just silently returns.

-- 
Barton E. Schaefer


