Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: Deadlock when receiving kill-signal from child process



First off, great work on the last patch (zsh-workers 36022), Bart! I've
been testing with the latest git master with that patch applied on
top. At first I thought there were no more deadlocks at all, but I
managed to reproduce one repeatedly (when _not_ disowning the child
processes). I'm not sure I'm getting the complete picture, though,
since the trace contains several "in ?? ()" frames (even though I
built with debugging enabled).

My configure options for reference:

--enable-cap
--enable-multibyte
--enable-maildir-support
--with-tcsetpgrp
--enable-zsh-debug
--enable-zsh-mem-debug
--enable-zsh-mem-warning
--enable-zsh-secure-free
--enable-zsh-hash-debug

Trace:

#0  0x00007fff8abf95da in syscall_thread_switch ()
#1  0x00007fff853a982d in _OSSpinLockLockSlow ()
#2  0x00007fff896e1635 in szone_force_lock ()
#3  0x00007fff896e15e6 in _malloc_fork_prepare ()
#4  0x00007fff82cb8097 in fork ()
#5  0x0000000105d3f960 in get_match_ret ()
#6  0x0000000105d45463 in savehistfile ()
#7  0x0000000105d42b4f in iaddtoline ()
#8  0x0000000105d3d69f in gmatchcmp ()
#9  0x0000000105d3c6ce in qualisdev ()
#10 0x0000000105d3bff5 in zglob ()
#11 0x0000000105d423e8 in histstrcmp ()
#12 0x0000000105d41db3 in printreswdnode ()
#13 0x0000000105dba726 in ?? ()
#14 0x0000000105dbac89 in ?? ()
#15 0x0000000105db91ef in ?? ()
#16 0x0000000105db8839 in ?? ()
#17 <signal handler called>
#18 0x00007fff896db4fe in tiny_free_list_remove_ptr ()
#19 0x00007fff896d9b2e in szone_free_definite_size ()
#20 0x0000000105d82ef6 in getredirs ()
#21 0x0000000105d70231 in patmatch ()
#22 0x0000000105d6fd7b in patmatch ()
#23 0x0000000105d6f8d7 in pattryrefs ()
#24 0x0000000105d6e105 in patcompile ()
#25 0x0000000105db8d7d in ?? ()
#26 0x0000000105db85bf in ?? ()
#27 0x0000000105d3cf36 in scanner ()
#28 0x0000000105d7bf51 in paramsubst ()
#29 0x0000000105d47389 in chrealpath ()
#30 0x0000000105d42b4f in iaddtoline ()
#31 0x0000000105d3d69f in gmatchcmp ()
#32 0x0000000105d3c6ce in qualisdev ()
#33 0x0000000105d7ce45 in paramsubst ()
#34 0x0000000105d47389 in chrealpath ()
#35 0x0000000105d42b4f in iaddtoline ()
#36 0x0000000105d3d69f in gmatchcmp ()
#37 0x0000000105d3c6ce in qualisdev ()
#38 0x0000000105d3bff5 in zglob ()
#39 0x0000000105d66d6a in histcharssetfn ()
#40 0x0000000105d6b0f2 in par_cond_2 ()
#41 0x0000000105d1e9b2 in _mh_execute_header ()
#42 0x00007fff8610c5c9 in start ()

On Fri, Aug 7, 2015 at 8:39 AM, Bart Schaefer wrote:
}
} Specifically ferror() is attempting to acquire a mutex lock when the
} signal arrives, and then the handler calls fputc() which tries to
} acquire the same lock, and: clunk.

Aha, ok.
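
For illustration, here is a minimal sketch of the usual way to avoid
that kind of self-deadlock: the handler does no stdio at all and only
sets a flag, and the output happens later from normal context. This is
not zsh code; the names on_signal(), install_handler() and
report_pending() are hypothetical, and SIGUSR1 is just a stand-in
signal.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>

/* Set by the handler, consumed by the normal (non-handler) code path. */
static volatile sig_atomic_t got_signal = 0;

/* The handler never touches stdio, so it cannot contend for a FILE
 * lock that the interrupted code may already hold. */
static void on_signal(int signo)
{
    (void)signo;
    got_signal = 1;
}

/* Hypothetical helper: install on_signal() for SIGUSR1. */
static int install_handler(void)
{
    struct sigaction sa;

    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Called outside the handler, where taking the stdio lock is safe.
 * Returns 1 if a signal had been recorded and was reported, else 0. */
static int report_pending(FILE *out)
{
    assert(out != NULL);
    if (!got_signal)
        return 0;
    got_signal = 0;
    fputs("signal received\n", out);
    return 1;
}
```

A caller would run install_handler() once and then call
report_pending(stderr) at points where it is known that no stdio call
is in progress.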

}
} So anytime I find myself using dont_queue_signals() I worry that there
} is a calling-scope reason to avoid signal handlers.  On the other hand,
} if it's safe to run shell code at all (e.g., runshfunc()), it should
} also be safe to run signal traps.

Thanks for the thorough walkthrough, I understand it a bit better now!
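
As a generic illustration of why deferring handlers matters here (the
trace above shows a handler running in the middle of free() and then
needing the malloc lock again via fork()), the sketch below blocks a
signal around an allocator call with sigprocmask(). This is an
assumption-laden POSIX example, not how zsh's queue_signals() is
implemented; on_usr1(), install_usr1() and guarded_alloc_free() are
hypothetical names, and raise() merely stands in for a signal arriving
at an awkward moment.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdlib.h>

/* Counts how many times the handler has actually run. */
static volatile sig_atomic_t handled = 0;

static void on_usr1(int signo)
{
    (void)signo;
    handled = handled + 1;
}

/* Hypothetical helper: install on_usr1() for SIGUSR1. */
static int install_usr1(void)
{
    struct sigaction sa;

    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Block SIGUSR1 while the allocator is in use, so the handler cannot
 * run while malloc()/free() may hold their internal lock. The out
 * parameters record what was observed while the signal was blocked. */
static int guarded_alloc_free(int *pending_inside, int *handled_inside)
{
    sigset_t block, old, pending;
    void *p;

    assert(pending_inside != NULL && handled_inside != NULL);

    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &block, &old) != 0)
        return -1;

    p = malloc(64);
    raise(SIGUSR1);          /* stand-in for a signal arriving mid-operation */
    free(p);

    sigemptyset(&pending);
    sigpending(&pending);
    *pending_inside = sigismember(&pending, SIGUSR1);
    *handled_inside = handled;

    /* Restoring the old mask lets the deferred signal be delivered. */
    return sigprocmask(SIG_SETMASK, &old, NULL);
}
```

With the handler installed, the signal raised inside the guarded region
stays pending until the mask is restored, so the handler only runs once
the allocator call has finished.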

}
} The stack must go further than this?  What called __sigsuspend() ?

Actually, that is all I got out of gdb. I'll try to dig deeper (if
that's possible) the next time it happens.

}
} This trace must also go further?  The part shown is calling a handler
} during fflush() in a previous handler, but the trace doesn't
} go back far enough to see where the first handler was called.

Same as above: this was all I got out of gdb.


