Zsh Mailing List Archive

Multi-core loops



Hi, I've been a very happy user of Zsh for the last 18 years (!).
Regretfully I haven't been on this list for many years, and now I have
resubscribed to propose a simple, but I think useful, feature for zsh.

Zsh, like all shells, lets you easily do something many times in a loop.
E.g.,

	for i in ...
	do
		dosomething $i
	done
	
But when "dosomething" is CPU intensive, this is *not* what you'd want to
do on a multi-core (multi-CPU) machine, and such machines have more or less
become standard nowadays.
Such a loop would only use one of the CPUs, and leave the other(s) unused.
Instead, you'll want to keep all CPUs busy all the time, running M (=number
of CPUs) processes at the same time.

This idea has been raised before on this list by others. One thread I found
dates back 10 years,
	http://www.zsh.org/mla/users/1999/msg00644.html
and another one from 7 years ago
	http://www.zsh.org/mla/users/2002/msg00117.html

But at the time, I guess the whole concept of multi-CPU machines sounded
esoteric. This is no longer the case: most people nowadays have multi-CPU
machines, and probably run into this issue often. I know I do. So I believe
zsh should make this useful case easy to handle.

The first thread I cited suggested adding new loop syntax, e.g.

	for i in * PARALLEL N ; do job $i ; done

I think this is a very interesting idea (not necessarily with that syntax),
and of all the options I mention in this message, it is probably the best
one. However, I fear that it may be harder for the developers to
accept than the other options below because it involves new syntax and
possibly quite a bit of code (because of all the different types of loops that
are involved). I wonder what other people think - are we ready for a new
syntax for this multi-process loop feature?

If there is a chance that this option will be accepted, I will be happy to
volunteer to write a patch.

The second option, suggested in both threads, requires the user to write more
code, along the lines of this pseudo-code:

	for i in ...
	do
		if ((number_of_jobs >= number_of_processors ))
		then
			wait any_job
		fi
		command &
	done
	
The problem with this is that "wait" currently has no way to wait for
whichever job finishes first: it can wait either for a specific job or for
*all* jobs to finish. I wonder if there is a reason not to add such a feature?

Because of the lack of such a "wait for any job" feature, Bart Schaefer
suggested in the first thread an elaborate technique involving a coprocess
to do something similar.
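
To make the limitation concrete, here is a minimal sketch of the closest
thing I can get with only the "wait" that exists today: run the jobs in
batches and wait for each whole batch. It is not Bart's coprocess technique.
The names "ncpu" and "log", and the body of "dosomething", are placeholders
I made up for the demonstration.

```shell
ncpu=4                          # assumed CPU count; adjust for your machine
log=$(mktemp)                   # scratch file so the demo has visible results

# Placeholder for the real CPU-intensive command.
dosomething() {
	sleep 0.2
	echo "finished $1" >> "$log"
}

running=0
for i in 1 2 3 4 5 6 7 8 9 10
do
	dosomething "$i" &
	running=$((running + 1))
	if [ "$running" -ge "$ncpu" ]
	then
		wait            # no arguments: blocks until *all* jobs have finished
		running=0
	fi
done
wait                            # collect the last, possibly partial, batch

count=$(grep -c . "$log")       # one log line per completed item
echo "$count items processed"
rm -f "$log"
```

The drawback is that every batch lasts as long as its slowest job, so CPUs
sit idle whenever the jobs take unequal time. That idle time is exactly what
a "wait for any job" would avoid.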

A somewhat similar option I'd like to propose is to add a builtin, or better
yet, a new option for the existing builtin "jobs". "jobs -w 4" would wait
until there are fewer than 4 jobs in the job-control list. Then the 4-CPU loop
is as easy as writing:

	for i in ...
	do
		jobs -w 4
		dosomething $i &
	done
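
Until something like "jobs -w" exists, its effect can be approximated by
polling. The sketch below is only an approximation under my own assumptions:
"wait_for_slot" is a hypothetical function, it reads "jobs -w N" as "block
until fewer than N jobs remain" so that at most N run at once, and it tracks
PIDs in a variable "pids" that the loop must maintain itself. It also assumes
a "sleep" that accepts fractional seconds, which POSIX does not guarantee.

```shell
pids=                           # space-separated PIDs of the jobs we started
log=$(mktemp)                   # scratch file so the demo has visible results

# Placeholder for the real CPU-intensive command.
dosomething() {
	sleep 0.2
	echo "finished $1" >> "$log"
}

# Hypothetical stand-in for "jobs -w N": return once fewer than N of the
# recorded PIDs are still alive.
wait_for_slot() {
	max=$1
	while :
	do
		eval "set -- $pids"     # re-split the list (zsh does not split $pids)
		pids=
		alive=0
		for pid in "$@"
		do
			# kill -0 sends no signal; it only tests that the process exists
			if kill -0 "$pid" 2>/dev/null
			then
				pids="$pids $pid"
				alive=$((alive + 1))
			fi
		done
		[ "$alive" -lt "$max" ] && return 0
		sleep 0.1               # poll interval, chosen arbitrarily
	done
}

for i in 1 2 3 4 5 6 7 8
do
	wait_for_slot 4
	dosomething "$i" &
	pids="$pids $!"             # extra bookkeeping a real "jobs -w" would not need
done
wait

count=$(grep -c . "$log")       # one log line per completed item
echo "$count items processed"
rm -f "$log"
```

Polling wastes a little time between a job finishing and the next one
starting, and the loop needs the extra "pids" line. A builtin would avoid
both, since the shell already knows when its children exit.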


Another possibility I wanted to raise is adding a new parameter, say
MAXBACKGROUND. If that parameter is set to 4, then any time you run a
"command &" when there are already 4 jobs in the job-control list,
zsh does not fork immediately but first waits for one of the previous jobs
to finish, and only then runs the command. With this parameter set,
the multi-CPU loop becomes trivial:

	for i in ...
	do
		dosomething $i &
	done
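
The behaviour I have in mind for MAXBACKGROUND can be imitated today with a
pool of tokens kept in a FIFO. This is a rough sketch, not a proposal for how
zsh should implement it. "run_limited" and "maxbackground" are hypothetical
names, and the sketch assumes a system where a FIFO can be opened read-write
(Linux and the BSDs allow it; POSIX leaves it undefined).

```shell
maxbackground=4                 # stand-in for the proposed MAXBACKGROUND
log=$(mktemp)                   # scratch file so the demo has visible results

dir=$(mktemp -d)
mkfifo "$dir/slots"
exec 3<>"$dir/slots"            # read-write open, so the open does not block
rm -rf "$dir"                   # the open descriptor keeps the FIFO usable

n=0
while [ "$n" -lt "$maxbackground" ]
do
	echo >&3                    # one empty line = one free slot
	n=$((n + 1))
done

# Placeholder for the real CPU-intensive command.
dosomething() {
	sleep 0.2
	echo "finished $1" >> "$log"
}

# Hypothetical wrapper playing the role of "command &" under MAXBACKGROUND.
run_limited() {
	read -r token <&3           # blocks while all slots are taken
	{ "$@"; echo >&3; } &       # hand the slot back when the job ends
}

for i in 1 2 3 4 5 6 7 8
do
	run_limited dosomething "$i"
done
wait
exec 3>&-                       # close the token pool

count=$(grep -c . "$log")       # one log line per completed item
echo "$count items processed"
rm -f "$log"
```

Unlike polling, the "read" here wakes up as soon as any job finishes, which
is the "wait for any job" behaviour. The price is the setup code and having
to call a wrapper instead of writing a plain "&".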



Any thoughts?

Thanks,
Nadav.

-- 
Nadav Har'El                        |       Sunday, Jun 28 2009, 6 Tammuz 5769
nyh@xxxxxxxxxxxxxxxxxxx             |-----------------------------------------
Phone +972-523-790466, ICQ 13349191 |In Fortran, God is real unless declared
http://nadav.harel.org.il           |an integer.


