Zsh Mailing List Archive

Re: Bug#482346: zsh doesn't always wait for its children (-> zombie)



On 2008-05-25 at 23:37 +0200, Vincent Lefevre wrote:
> On 2008-05-24 18:23:21 -0700, Phil Pennock wrote:
> > Then I'd be inclined to start looking into hardware issues, since
> > _something's_ probably getting stuck in disk IO; I'll suspect that
> > before kernel bugs, but it might also be worth seeing if there are other
> > problems with threaded programs on powerpc, if init really can't reap
> > something that has already become a zombie.
> 
> I've looked at /var/log/kern.log and there's something each time
> I interrupted vlc, e.g.
> 
> May 24 14:33:36 ay kernel: Unable to handle kernel paging request for data at address 0x481e7000
> May 24 14:33:36 ay kernel: Faulting instruction address: 0xc00131e8
> May 24 14:33:36 ay kernel: Oops: Kernel access of bad area, sig: 11 [#1]

That's a segfault; the kernel is then oopsing while trying to page in
memory to write the core dump. It looks like a problem in the MMU logic
for powerpc.

So, the problems are:

 * vlc is segfaulting when it receives SIGINT;

 * the powerpc Linux kernel has a bug whereby, in some cases, it ends up
   not letting the parent wait on the process (from what I understand of
   the details so far), so it looks as if the process isn't actually
   ending and transitioning to zombie status. Note that even init is
   unable to reclaim these processes. It might be worth asking your
   distribution's architecture maintainers about known issues. Have you
   tried sending a SIGKILL to force vlc to exit, to see whether either
   zsh or init can reap the process then?

 * zsh is somehow tickling the kernel bug; once we know what it is that's
   tickling it, it might be worth adding configure logic to deal with
   this, even after the kernel problem is fixed.
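A minimal Python sketch of the SIGKILL experiment, seen from the parent's side, assuming Linux and a parent that still holds the child's PID: send SIGKILL, then poll waitpid() with WNOHANG until the child is reaped or a timeout expires. The helper name kill_and_reap and the 5-second timeout are hypothetical, chosen only for illustration. A None result would correspond to the symptom described above, where the parent never gets to wait on the process.

```python
import os
import signal
import time


def kill_and_reap(pid, timeout=5.0):
    """SIGKILL a child of this process, then try to reap it with waitpid().

    Hypothetical helper. Returns the raw wait status if the child was
    reaped, or None if it could not be reaped before `timeout` seconds.
    """
    os.kill(pid, signal.SIGKILL)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # With WNOHANG, waitpid() returns (0, 0) instead of blocking while
        # the child is not yet waitable.
        reaped, status = os.waitpid(pid, os.WNOHANG)
        if reaped == pid:
            return status
        time.sleep(0.05)
    return None
```

For example, for a forked child that just sleeps, `os.WTERMSIG(kill_and_reap(pid))` would be expected to equal `signal.SIGKILL` on a kernel without the bug. Only the parent (or init, after reparenting) can reap a child, which is why the sketch plays the parent's role.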

> May 24 14:33:36 ay kernel: note: vlc[21850] exited with preempt_count 1

My nasty, suspicious mind thinks that special kernel logic for handling
and logging a weird exit condition is less-tested code that is already
doing something different from the default path, so this is likely close
to the root cause. I have no powerpc available to test on, though.

It seems unlikely that there'd be enough bugs for a zombie to also be
contributing to the load average, so I suspect that the process has not
in fact exited yet: it's still running, and that's where the load comes
from. Does ps(1) actually show the 'Z' state for zombie?
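On Linux, the state letter that ps(1) shows in its STAT column comes from /proc/<pid>/stat, so the zombie question can also be checked directly. A minimal sketch, assuming a Linux /proc filesystem; the function name proc_state is hypothetical:

```python
import os


def proc_state(pid):
    """Return the one-letter state of a process from /proc/<pid>/stat (Linux).

    'Z' means a zombie (exited, not yet reaped by its parent), while 'R'
    (running), 'S' (sleeping) or 'D' (uninterruptible sleep, usually I/O)
    mean the process has not actually exited.
    """
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # Field 2 (comm) is wrapped in parentheses and may itself contain spaces
    # or ')', so split after the LAST ')' rather than naively on whitespace.
    return data[data.rindex(")") + 1:].split()[0]
```

Called with the PID of the stuck vlc, a 'Z' would mean the process really is a zombie; any other letter would support the suspicion that it has not exited yet.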

-Phil


