Zsh Mailing List Archive

Re: fatal flaw zsh 4.0.1 on irix 6.3 & 6.5: suspend "ls -l|less" then resume hangs



[Debugging the execution code is already hard enough on the machine one
sits in front of...]


Anyway, thanks for your help in trying to find the problem!


Timothy Miller wrote:

> ...
> 
>   F S      UID   PID  PPID  PGID   SID  C PRI NI  P    SZ:RSS      WCHAN    STIME TTY     TIME CMD
>  b0 S      tsm 13097 12927 13097 12927  4  39 20  *   722:285   8039d510 10:07:57 ttyq10  0:00 zsh-4.0.1 -f 
>  b0 S      tsm 13100 13097 13100 12927  0  28 20  *   450:124   8039dc80 10:08:00 ttyq10  0:00 less 

OK, here's why the job can't be put in the foreground: `less' is in its
own process group (PGID 13100), but it should be in the group of the
now-deceased `ls'.

At first I thought this was some race condition, with the `ls' exiting
too fast, but the listing in 14865 shows the same split while the `ls'
is still running:

>   F S      UID   PID  PPID  PGID   SID  C PRI NI  P    SZ:RSS      WCHAN    STIME TTY     TIME CMD
>  b0 S      tsm 13962 13961 13962 13905  0  60 20  *   394:114   c06afbc0 11:55:29 ttyq13  0:01 ls -lR / 
>  b0 T      tsm 13963 13961 13963 13905  0  60 20  *    45:25           - 11:55:29 ttyq13  0:00 cat 
>  b0 S      tsm 13961 13905 13961 13905  0  39 20  *   732:297   8039d510 11:55:26 ttyq13  0:00 zsh-4.0.1 -f 

So the question is why pipeline-tails get their own process group
instead of using that of the pipeline leader.
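
For comparison, here is a minimal sketch of the arrangement one would
normally expect from a job-control shell: every member of a pipeline is
placed in the process group of the pipeline leader.  This is not zsh's
code; the function name tail_joins_leader_group() is hypothetical, and
it uses plain POSIX setpgid()/getpgid() rather than zsh's own wrappers.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <sys/types.h>
#include <sys/wait.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical helper, not part of zsh.
 * Forks a two-member "pipeline" and puts both children into the
 * leader's process group.  Returns 1 if the tail shares the leader's
 * group, 0 if it does not, and -1 if a fork failed. */
int tail_joins_leader_group(void)
{
    pid_t leader, tail;
    int same;

    leader = fork();
    if (leader < 0)
        return -1;
    if (leader == 0) {
        setpgid(0, 0);          /* leader: new group, pgid == own pid */
        pause();                /* stay alive so the group keeps existing */
        _exit(0);
    }
    /* The parent repeats the call so the group exists no matter
     * which of parent and child runs first. */
    setpgid(leader, leader);

    tail = fork();
    if (tail < 0) {
        kill(leader, SIGKILL);
        waitpid(leader, NULL, 0);
        return -1;
    }
    if (tail == 0) {
        setpgid(0, leader);     /* tail: join the leader's group */
        pause();
        _exit(0);
    }
    setpgid(tail, leader);      /* same call from the parent side */

    same = (getpgid(tail) == getpgid(leader));

    /* Clean up both children. */
    kill(leader, SIGKILL);
    kill(tail, SIGKILL);
    waitpid(leader, NULL, 0);
    waitpid(tail, NULL, 0);
    return same;
}
```

In the `ps' listings above the PGID column shows the opposite: the tail
(`less', `cat') carries its own PID as PGID instead of the leader's.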

We can't easily look at a diff between 3.1.6 and 4.0.1 to find out which
change might be causing this because there were so many changes.

I suspect one of two things: a problem with list_pipe_job, or a problem
similar to the one on FreeBSD that was fixed by 11247 (which *did* show
the same symptoms).

So as a somewhat wild guess, I've built the two patches below -- which
are not to be applied by anyone whose zsh is working.  If you find the
time to try them, I'd like to hear whether one of them fixes the problem.
The first one reverses 11247 (it comes first because the problem
looks so suspiciously similar) and the second one reverses both 14327 and
14503 -- the last changes I made to list_pipe_job.  Please try them
separately (or one after another and then both of them together).
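
The condition touched by the first patch combines two calls: a probe
for whether the leader's process group still exists (killpg() with
signal 0) and the attempt to join that group.  A minimal sketch of that
pattern follows, with the probe first as in the patched order.  It
assumes POSIX setpgid() in place of the setpgrp() call used in exec.c,
and the names join_group_or_fallback() and fallback_demo() are
hypothetical, not zsh functions.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif
#include <sys/types.h>
#include <sys/wait.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical helper, not part of zsh.
 * Try to join process group `leader'.  If the group does not exist
 * or cannot be joined, fall back to a group of our own.
 * Returns the pgid the caller ends up in, or -1 on failure. */
pid_t join_group_or_fallback(pid_t leader)
{
    if (killpg(leader, 0) == -1 ||      /* probe: does the group exist? */
        setpgid(0, leader) == -1) {     /* join it if it does */
        if (setpgid(0, 0) == -1)        /* fallback: own group */
            return -1;
    }
    return getpgrp();
}

/* Returns 1 if a child that is handed a dead leader ends up as the
 * leader of its own group, 0 if not, -1 on error. */
int fallback_demo(void)
{
    pid_t dead, child;
    int status;

    /* Create and reap a child; no process group with its pid exists
     * afterwards (barring pid reuse). */
    dead = fork();
    if (dead < 0)
        return -1;
    if (dead == 0)
        _exit(0);
    waitpid(dead, NULL, 0);

    child = fork();
    if (child < 0)
        return -1;
    if (child == 0)
        _exit(join_group_or_fallback(dead) == getpid() ? 0 : 1);
    if (waitpid(child, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

With a dead leader the probe fails, so the child takes the fallback
branch and becomes the leader of a new group, which is the state the
`ps' output shows for `less' and `cat'.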


Bye
  Sven

First patch:

diff -u -u -r ../ooz/Src/exec.c ./Src/exec.c
--- ../ooz/Src/exec.c	Thu Jun 14 05:10:12 2001
+++ ./Src/exec.c	Thu Jun 14 05:11:14 2001
@@ -2502,8 +2502,8 @@
 	}
     } else if (thisjob != -1 && cl) {
 	if (jobtab[list_pipe_job].gleader && (list_pipe || list_pipe_child)) {
-	    if (setpgrp(0L, jobtab[list_pipe_job].gleader) == -1 ||
-		killpg(jobtab[list_pipe_job].gleader, 0) == -1) {
+	    if (killpg(jobtab[list_pipe_job].gleader, 0) == -1 ||
+		setpgrp(0L, jobtab[list_pipe_job].gleader) == -1) {
 		jobtab[list_pipe_job].gleader =
 		    jobtab[thisjob].gleader = (list_pipe_child ? mypgrp : getpid());
 		setpgrp(0L, jobtab[list_pipe_job].gleader);


Second patch:

diff -u -u -r ../ooz/Src/exec.c ./Src/exec.c
--- ../ooz/Src/exec.c	Thu Jun 14 05:10:12 2001
+++ ./Src/exec.c	Thu Jun 14 05:15:22 2001
@@ -973,19 +973,16 @@
      * stopped, the top-level execpline() didn't get the pid for the
      * sub-shell because it was overwritten. */
     if (!pline_level++) {
+	list_pipe_job = newjob;
         list_pipe_pid = 0;
 	nowait = 0;
 	simple_pline = (WC_PIPE_TYPE(code) == WC_PIPE_END);
-	list_pipe_job = newjob;
     }
     lastwj = lpforked = 0;
     execpline2(state, code, how, opipe[0], ipipe[1], last1);
     pline_level--;
     if (how & Z_ASYNC) {
 	lastwj = newjob;
-
-        if (thisjob == list_pipe_job)
-            list_pipe_job = 0;
 	jobtab[thisjob].stat |= STAT_NOSTTY;
 	if (slflags & WC_SUBLIST_COPROC) {
 	    zclose(ipipe[1]);

-- 
Sven Wischnowsky                         wischnow@xxxxxxxxxxxxxxxxxxxxxxx


