Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: How to misplace an entire pipeline

Two patches inline with other discussion below.

On Aug 7,  2:43pm, Bart Schaefer wrote:
} (By the way, arguably "wait" ought to continue the job if it's stopped,
} which it does if you wait for it by job number but does not if you wait
} for it by PID.)

Patch for that follows; I'm not sure the makerunning() is correct:

--- ../zsh-forge/current/Src/jobs.c	2011-06-14 19:54:57.000000000 -0700
+++ Src/jobs.c	2011-08-07 18:47:58.000000000 -0700
@@ -1933,12 +1933,19 @@
 	    Process p;
 	    if (findproc(pid, &j, &p, 0)) {
-		/*
-		 * returns 0 for normal exit, else signal+128
-		 * in which case we should return that status.
-		 */
-		retval = waitforpid(pid, 1);
-		if (!retval)
+		if (j->stat & STAT_STOPPED) {
+		    retval = (killjb(j, SIGCONT) != 0);
+		    if (retval == 0)
+			makerunning(j);
+		}
+		if (retval == 0) {
+		    /*
+		     * returns 0 for normal exit, else signal+128
+		     * in which case we should return that status.
+		     */
+		    retval = waitforpid(pid, 1);
+		}
+		if (retval == 0)
 		    retval = lastval2;
 	    } else if (isset(POSIXJOBS) &&
 		       pid == lastpid && lastpid_status >= 0L) {

} Now, here's an interesting tidbit:
} torch% jobs %?foo
} jobs: job not found: ?foo
} torch% jobs %?sleep
} jobs: %?sleep: no such job
} Note the difference?  The latter message means that getjob() found the
} pipeline, but either it's _not_ STAT_INUSE or it _is_ STAT_NOPRINT.  So
} I think what we have here is a simple failure to communicate.

The following is clearly not a complete fix and maybe is even wrong if
a different problem is fixed elsewhere, but this at least allows the
suspended pipeline to be manipulated with jobs/fg/bg/wait.

--- ../zsh-forge/current/Src/exec.c	2011-07-27 01:13:48.000000000 -0700
+++ Src/exec.c	2011-08-07 19:07:59.000000000 -0700
@@ -2845,7 +2845,9 @@
 	/* This is a current shell procedure that didn't need to fork.    *
 	 * This includes current shell procedures that are being exec'ed, *
 	 * as well as null execs.                                         */
-	jobtab[thisjob].stat |= STAT_CURSH|STAT_NOPRINT;
+	jobtab[thisjob].stat |= STAT_CURSH;
+	if (!jobtab[thisjob].procs)
+	    jobtab[thisjob].stat |= STAT_NOPRINT;
     } else {
 	/* This is an exec (real or fake) for an external command.    *
 	 * Note that any form of exec means that the subshell is fake *

Even with this patch, "... | read" followed by ^Z results in "read" left
waiting for a pipeline that has been stopped.  What seems to be going on
here is that a job table entry was created for the pipeline and the
forked-off left side made the group leader, but then that job table
entry is overloaded to represent the current shell builtin that is also
running as the right-hand-side.

When the terminal driver generates a TSTP it hits the forked-off left
side (via the group leader).  This comes through zhandler() into
update_job() which does notice that the job is STAT_CURSH, but in this
case list_pipe == 0 because "read" is an ordinary builtin, not a loop
or other compound construct; so STAT_CURSH is ignored, and the group
leader is attached to the tty (I believe that's a no-op as it already
is attached).

The other side-effect of this snippet:

	if (jn->stat & STAT_CURSH)
	    inforeground = 1;
	else if (job == thisjob) {
	    lastval = val;
	    inforeground = 2;

is that inforeground != 2, so at the end of update_job() when checking
to see if the current shell should pretend to have been signaled, the
test fails; but that doesn't matter for TSTP because only INT and QUIT
are handled specially.

When "read" is the tail of the pipe, the above all happens behind the
scenes and then the I/O system call gets restarted, which is how the
shell ends up stuck.  I'm not sure how to escape from that, except
maybe to have zhandler() kill the shell with a different signal from
which the system call will not recover.

When something like "true" is the tail of the pipe, we return into
execpline at line 1500 (from waitjobs()), where list_pipe_job is set
but list_pipe and list_pipe_child are not.  If all three were nonzero,
a dummy shell would be forked off as PWS described to act as the
suspended job, but instead execpline() simply returns because the
last job in the pipeline has exited.

The only obvious thing I can think to do here is to note in zhandler()
that we have STAT_CURSH but not list_pipe, and therefore SIGCONT the
left-hand-side immediately and return as if no signal had occurred
(possibly printing a warning about not being able to suspend the job,
which is what happens elsewhere if pipe() or fork() fails).  However,
that could lead to a serious busy-loop if somehow TTIN or TTOU was
the signal instead of TSTP.

Messages sorted by: Reverse Date, Date, Thread, Author