Zsh Mailing List Archive

Shell job structure

X-seq: zsh-workers 31803
From: Peter Stephenson <p.w.stephenson@xxxxxxxxxxxx>
To: zsh-workers@xxxxxxx
Subject: Shell job structure
Date: Tue, 8 Oct 2013 21:44:37 +0100
In-reply-to: <131007074049.ZM32707@torch.brasslantern.com>
List-help: <mailto:zsh-workers-help@zsh.org>
List-id: Zsh Workers List <zsh-workers.zsh.org>
List-post: <mailto:zsh-workers@zsh.org>
Mailing-list: contact zsh-workers-help@xxxxxxx; run by ezmlm
References: <CADv1Z=pM7H4Xg9+GyWd4zw0cv0mXfbJvqip6vc7e_yrXRN=1sg@mail.gmail.com> <20131005223159.25fea6a0@pws-pc.ntlworld.com> <131006173621.ZM31831@torch.brasslantern.com> <20131007102529.5354f342@pwslap01u.europe.root.pri> <131007074049.ZM32707@torch.brasslantern.com>

On Mon, 07 Oct 2013 07:40:49 -0700
Bart Schaefer <schaefer@xxxxxxxxxxxxxxxx> wrote:
> - because external jobs can exit and be reaped in arbitrary order, even
>   in a pipeline, the job table is used to keep track of which position
>   in the array to update
> 
> - jobs that run in the current shell don't have a complete job table entry
>   [cf. all the gyrations to suspend/background a loop] and aren't "reaped"
>   in the same code path as external jobs, so the wrong array position (or
>   none at all) may get updated
> 
> - the incorrect update depends on whether the external job exited AND got
>   reaped before the current-shell job has completed, because of the way
>   reaping updates the job table, so correctness is unpredictable
> 
> - complex commands "in the current shell" may have external subjobs that
>   need a separate pipestatus (this applies only at end of a pipeline)
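
To make the first point concrete, here is a rough sketch in C of why out-of-order reaping needs the job table. When waitpid() reports a PID, the only way to know which pipestatus slot it belongs to is to look that PID up in the job. The names (struct job, struct proc, record_reaped(), pipestats) are hypothetical simplifications, not zsh's actual structures.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical, much-simplified stand-ins for zsh's job table entries. */
#define MAX_PIPE 8

struct proc {
    pid_t pid;      /* process ID of one pipeline element */
    int done;       /* nonzero once the process has been reaped */
};

struct job {
    int nprocs;                 /* number of pipeline elements */
    struct proc procs[MAX_PIPE];
    int pipestats[MAX_PIPE];    /* one status slot per pipeline position */
};

/*
 * Called when waitpid() reports that PID exited with STATUS.
 * Processes may be reaped in any order, so the position in the
 * pipeline (and hence in pipestats) is found by searching the job.
 * Returns the position updated, or -1 if PID is not part of JN.
 */
int record_reaped(struct job *jn, pid_t pid, int status)
{
    for (int i = 0; i < jn->nprocs; i++) {
        if (jn->procs[i].pid == pid) {
            assert(!jn->procs[i].done);   /* each process is reaped once */
            jn->procs[i].done = 1;
            jn->pipestats[i] = status;
            return i;
        }
    }
    return -1;
}
```

A construct running in the current shell has no entry in procs, so in this model there is simply no slot for it to update, which is the second point above.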

This may be pie in the sky, but what we probably need is for jobs to be
hierarchical.  A job at the top level would always be treated as
separate from any other top-level job, for the original purpose of
job control, but might include nested jobs representing objects such as
functions and chunks of pipelines, which might themselves have jobs
representing external commands.  Furthermore, a job would become a
first-class representation of shell state, so anything involving only
shell builtins would have a job, removing the obfuscation in the current
relationship between jobs and code execution.  In principle, it would
thus also allow better encapsulation of other shell state.

The job table would become a list of top-level jobs, while the job
structure would have pointers to nested jobs.  We might get rid of the
table altogether and simply have a single pointer to the top-level
jobs, searched in the same way those jobs are searched for subjobs.  If
this was inefficient for searching, we could hash PIDs and the like; if
it was inefficient for memory management (though I don't see why it
should be, particularly, compared with memory management for anything
else), we could still keep a pool of job structures.
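
A minimal sketch in C of that layout, assuming a first-child/next-sibling representation. The names struct hjob, toplevel_jobs and find_job_by_pid() are hypothetical. The single top-level pointer replaces the table, and the top level is searched by the same recursive routine as the subjobs.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical hierarchical job node; not zsh's actual struct job. */
struct hjob {
    pid_t pid;              /* external process, or 0 if none */
    struct hjob *subjobs;   /* first nested job, or NULL */
    struct hjob *next;      /* next job at the same level, or NULL */
};

/* Single pointer replacing the job table: the list of top-level jobs. */
struct hjob *toplevel_jobs = NULL;

/*
 * Depth-first search for the job owning PID, starting at LIST.
 * The top level is searched exactly as subjobs are searched.
 */
struct hjob *find_job_by_pid(struct hjob *list, pid_t pid)
{
    assert(pid > 0);        /* 0 marks jobs with no external process */
    for (struct hjob *jn = list; jn; jn = jn->next) {
        if (jn->pid == pid)
            return jn;
        struct hjob *found = find_job_by_pid(jn->subjobs, pid);
        if (found)
            return found;
    }
    return NULL;
}
```

If the linear search turned out to be too slow, a hash from PID to job could be kept alongside the tree without changing this layout.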

I think this means we'd have a status associated with a job, not just a
process.  How you got the job status would depend on the nature of the
job.

I think a job would be one of:

- An external command with one main process and possibly auxiliary
processes; the status of the job is just that of the process.  This is
sort of a degenerate case of a pipeline, but I'm wondering if it might be
neater to make it just one part of a pipeline so auxiliary processes get
tied to the right main process.  This also makes an external command in
a pipeline more similar to the case of a shell construct in a pipeline,
which we know needs to look like a single job (in this new sense);
that's where we came in.

- A shell structure of code being executed together (details such as
where we need new subjobs TBD) such as a function or a complex command
typed at the command line or in a particular part of a pipeline --- this
is roughly what's referred to as a "list" in the code.  This would have
arbitrarily complicated subjobs which would vary depending on what was
being executed within the structure.  There would be no external process
associated with the top-level job unless we explicitly forked to make
this happen; putting the current shell job into the background should
then be as natural as putting a chunk of shell code into a pipeline
anywhere other than as the last element.  The status of the job is the
status of the last command executed (which may be a "return" if it's a
function).

- A pipeline, a set of jobs (in this new sense --- maybe we need a
better name to separate this from job control), each of which is one of
the above two types.  (I don't think pipelines nest directly in
pipelines; you need some intervening shell structure, otherwise they
degenerate to the same pipeline.)  The status of the job is either the
status of the last subjob in the pipeline (NO_PIPE_FAIL) or follows the
PIPE_FAIL rules.
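
The three kinds of job, and how each yields a status, could look something like this in C. This is a sketch only: enum job_kind, struct sjob and job_status() are hypothetical, and the pipefail branch follows the PIPE_FAIL rule as documented (the status of the rightmost element that exited non-zero, or zero if every element succeeded).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical job kinds corresponding to the three cases above. */
enum job_kind { JOB_EXTERNAL, JOB_SHELL, JOB_PIPELINE };

struct sjob {
    enum job_kind kind;
    int status;             /* JOB_EXTERNAL: main process's status;
                               JOB_SHELL: status of last command run */
    struct sjob **subjobs;  /* JOB_PIPELINE: elements, left to right */
    int nsubjobs;
};

/*
 * Status of a job.  For a pipeline, PIPEFAIL selects between the
 * default rule (status of the last element) and the PIPE_FAIL rule
 * (status of the rightmost failing element, or 0 if none failed).
 */
int job_status(const struct sjob *jn, int pipefail)
{
    switch (jn->kind) {
    case JOB_EXTERNAL:
    case JOB_SHELL:
        return jn->status;
    case JOB_PIPELINE:
        assert(jn->nsubjobs > 0);
        if (pipefail) {
            for (int i = jn->nsubjobs - 1; i >= 0; i--) {
                int st = job_status(jn->subjobs[i], pipefail);
                if (st != 0)
                    return st;
            }
            return 0;
        }
        return job_status(jn->subjobs[jn->nsubjobs - 1], pipefail);
    }
    return -1;  /* not reached for valid kinds */
}
```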

If it worked to the extent of allowing the removal of things like
list_pipe and STAT_SUPERJOB, it would have served its purpose well.

We might even be able to move over to this gradually.  As long as we can
still search all jobs, we could introduce the new structures with the
existing logic.  For example, the pipestatus code could for now search
every process associated with the current job, but would gradually get
rewritten to pick out subjobs at the appropriate level.  You might hope
that as this process went on it would become easier to answer the
question "when has this job finished?"
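
Once jobs form a tree, that question could reduce to a simple recursive check. In this hypothetical sketch (struct fjob and job_finished() are invented names), own_done records the job's own work: a reaped process for an external command, a completed last command for shell code, and simply true for a pipeline, which has no work of its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node: a job is finished when its own work is done
   and every nested job is finished. */
struct fjob {
    bool own_done;          /* external: process reaped;
                               shell code: last command completed;
                               pipeline: always true */
    struct fjob *subjobs;   /* first nested job, or NULL */
    struct fjob *next;      /* next sibling, or NULL */
};

bool job_finished(const struct fjob *jn)
{
    assert(jn != NULL);
    if (!jn->own_done)
        return false;
    for (const struct fjob *sj = jn->subjobs; sj; sj = sj->next)
        if (!job_finished(sj))
            return false;
    return true;
}
```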

Possibly the most unpleasant bit of this is making the hierarchy of
calls in exec.c agree with this new structural hierarchy.  It might be
messy enough to put the kibosh on the whole thing --- I don't understand
execpline() and execpline2(), and I think understanding them would be
necessary.

Whether this is ever going to come about...

pws

-- 
Peter Stephenson <p.w.stephenson@xxxxxxxxxxxx>
Web page now at http://homepage.ntlworld.com/p.w.stephenson/

Follow-Ups:
- Re: Shell job structure
  - From: Bart Schaefer

References:
- Re: No pipefail option?
  - From: Bart Schaefer
- Re: No pipefail option?
  - From: Peter Stephenson
- Re: No pipefail option?
  - From: Bart Schaefer
