Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: Using file lines as "input files"

On Fri, Jul 08, 2022 at 03:04:31PM -0700, Bart Schaefer wrote:
> On Fri, Jul 8, 2022 at 1:58 PM Dominik Vogt <dominik.vogt@xxxxxx> wrote:
> >
> > Disclaimer: I _know_ this can be done in seconds with perl /
> > python, but I like to not rely on scripting languages when the
> > shell can do the job.
> This is sort of like saying "I like to not rely on hiking boots when
> shoes can do the job."

Actually, for me, scripting languages are the "shoes" because they
don't interact very well with the command pipeline, unless you
spend an absurd amount of work to make them do so.  Calling
commands for everything can be slower, but most of the time it's
just a symptom of bad scripting.  GNU coreutils are faster than
anything I'll ever be willing to code (or any perl or python
script or C or C++ library for that matter).  The trick is keeping
the process spawning overhead low.
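To illustrate the point about spawn overhead, here is a small sketch (file names are temporary, made up for the example) contrasting one process per file with a single batched invocation via xargs, which passes many arguments to one coreutils process:

```shell
# Sketch: batching arguments into one coreutils invocation instead of
# spawning one process per item.  All paths here are temporary.
tmp=$(mktemp -d)
for i in 1 2 3 4 5; do echo "line $i" > "$tmp/f$i"; done

# One spawn per file: five separate wc processes.
slow=$(for f in "$tmp"/f*; do wc -l < "$f"; done | wc -l)

# One spawn total: xargs hands all five names to a single wc,
# which prints one line per file plus a "total" line.
fast=$(printf '%s\n' "$tmp"/f* | xargs wc -l | wc -l)

echo "$slow $fast"
rm -r "$tmp"
```

Both variants do the same work; the second keeps the per-process overhead to a single fork/exec.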

> >   $ chksum Fline1 Fline2 Fline3 ... Fline265000
> >
> > (Of course without actually splitting the input file
> If "not actually splitting" means what it seems to mean, and you
> literally want to run cksum, the answer is no.


This does the job pretty well, relying entirely on existing Unix tools:

  ulimit -s 100000          # big stack => big enough argument list for ff*
  split -l 1 "$INPUTF" ff   # one fragment file per input line
  cksum ff*                 # checksum every fragment in one invocation
  rm ff*                    # remove the fragments again

That cuts runtime down to seven seconds instead of four minutes,
at the cost of a few hundred MB on the RAM disk.  Splitting the
source file and removing the fragments takes about three to four
seconds of that.
Thanks for the comments which put me on the right track.


(I prefer to have a huge stack size anyway to be able to do things
like "grep foobar **/*(.)".)


Dominik ^_^  ^_^


Dominik Vogt
