Zsh Mailing List Archive

Re: deadlock caused by gettext usage in a signal handler



On Fri, 30 Nov 2007 20:35:34 +0100
Guillaume Chazarain <guichaz@xxxxxxxx> wrote:
> I just had a Zsh process (using zsh-4.2.6-6.fc7) deadlock, the
> backtrace seems to show it is initializing the gettext infrastructure
> to print "Input/output error" in a signal handler.

OK, so after Guillaume's last point I've been looking at this at a more
fundamental level.

The shell tries to queue signals any time it might be doing something
that could cause problems in the signal handler.  This seems to be
reasonable; at least it's a long time since we had an obvious bug with
queueing itself (we've had plenty in signal handling more widely).
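
For anyone not familiar with the idea, here is a rough sketch of what
"queueing signals" means in general.  This is not the zsh code; all the
names (begin_critical, end_critical, deferring_handler and so on) are
made up for illustration, and I'm assuming a single-threaded process.
While the depth counter is non-zero the handler only records the signal
number, and the real work runs in normal context once the outermost
critical section ends.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

#define MAX_PENDING 64

/* All names here are hypothetical, not taken from the zsh source. */
static volatile sig_atomic_t queue_depth = 0;       /* > 0: defer signals */
static volatile sig_atomic_t pending[MAX_PENDING];  /* deferred signal numbers */
static volatile sig_atomic_t pending_count = 0;
static volatile sig_atomic_t handled_count = 0;     /* for demonstration only */

/* Stand-in for the real work a handler would do. */
static void real_handler(int sig)
{
    (void)sig;
    handled_count++;
}

/* Installed handler: record the signal if queueing, else act on it now. */
static void deferring_handler(int sig)
{
    if (queue_depth > 0) {
        if (pending_count < MAX_PENDING)
            pending[pending_count++] = sig;  /* overflow is silently dropped */
        return;
    }
    real_handler(sig);
}

static int install_deferring_handler(int sig)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = deferring_handler;
    sigfillset(&sa.sa_mask);  /* no nested handlers while this one runs */
    return sigaction(sig, &sa, NULL);
}

static void begin_critical(void)
{
    queue_depth++;
}

static void end_critical(void)
{
    sigset_t all, old;
    int local[MAX_PENDING];
    int i, n;

    assert(queue_depth > 0);
    if (--queue_depth > 0)
        return;  /* still inside an outer critical section */

    /* Block signals while copying so the handler cannot touch the queue. */
    sigfillset(&all);
    sigprocmask(SIG_BLOCK, &all, &old);
    n = pending_count;
    for (i = 0; i < n; i++)
        local[i] = pending[i];
    pending_count = 0;
    sigprocmask(SIG_SETMASK, &old, NULL);

    /* Run the deferred work in normal (non-handler) context. */
    for (i = 0; i < n; i++)
        real_handler(local[i]);
}
```

The point of the pattern is that anything not async-signal-safe only
ever runs from end_critical(), never from inside the handler while the
interrupted code is in the middle of something.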

Looking to see why what was going on here wasn't safe, I noticed...

...
> #12 0x0000000000465540 in zhandler (sig=17) at signals.c:521
> #13 <signal handler called>
> #14 0x0000003c3f030afa in *__GI___sigsuspend (set=0x7fff630adc60)
>     at ../sysdeps/unix/sysv/linux/sigsuspend.c:63
...
> #23 0x000000000040ddb6 in zexit (val=1, from_where=0) at builtin.c:4187
> #24 0x0000000000465637 in zhandler (sig=-4) at signals.c:540
> #25 <signal handler called>
> #26 0x0000003c3f097642 in __libc_fork ()
>     at ../nptl/sysdeps/unix/sysv/linux/fork.c:127

The shell is at a supposedly non-critical point (actually forking)
when it gets a signal.  I don't know what -4 is supposed to be, but
possibly it's SIGINT with some extra flags (only SIGHUP, SIGINT and
SIGALRM call zexit()).  Then it tries to exit, running the exit
scripts.  The problem happens while it's handling a SIGCHLD (sig=17 in
frame #12) from something it's running at that point.

I still don't understand why that's hairy here, however.  The first
zhandler() has basically finished what it's doing and handed over to
zexit() to exit the shell.  That leaves me wondering if forking might be
the problem; do we need to queue signals around there?  It's not obvious
why that would be.

There remains my simple plan B of running strerror() once immediately
after setting the locale to do any one-off initialization, but I'm
starting to think the issue is more widespread.
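
To make plan B concrete, here is a minimal sketch of what I have in
mind.  The helper name warm_up_error_messages is hypothetical, and the
sketch assumes the expensive part really is one-off initialization done
on the first strerror() call after the locale is set.  It does not make
strerror() safe to call from a signal handler; it only tries to move
that first call into normal context.

```c
#include <assert.h>
#include <errno.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper, not zsh code.  Returns 1 if a message was produced. */
static int warm_up_error_messages(const char *locale_name)
{
    const char *msg;

    assert(locale_name != NULL);

    /* "" selects the locale from the environment; fall back to "C". */
    if (setlocale(LC_ALL, locale_name) == NULL)
        setlocale(LC_ALL, "C");

    /* Trigger any lazy setup now, in normal (non-handler) context. */
    msg = strerror(EIO);
    return msg != NULL && msg[0] != '\0';
}
```

It would be called once at startup, e.g. warm_up_error_messages(""),
and again wherever the shell changes its locale.  EIO is used only
because it is the error in the backtrace; any errno value should do if
the assumption about one-off initialization holds.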

-- 
Peter Stephenson <p.w.stephenson@xxxxxxxxxxxx>
Web page now at http://homepage.ntlworld.com/p.w.stephenson/


