Zsh Mailing List Archive

Re: Get file size with highest performance

On Mar 24,  3:39pm, meino.cramer@xxxxxx wrote:
}  What would be the best way to get the size of a file
}  performancewise?

You'd have to try it different ways and compare.  Using a zsh glob
will potentially stat each of the files during the match, and then
you'd still have to stat them again to capture the size.  On the
other hand, "find ... -ls" probably stats each file only once, but
then you have to text-process the output.

So which one is better probably depends on how fast your CPU is as
compared to how fast your disk is, and whether your OS does file
system caching, etc.
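To give a sense of the text-processing side: with "find -ls" (GNU and
BSD find; -ls isn't in POSIX) the size is the 7th field of the
ls -dils-style output, so totalling sizes means extracting that column.
A minimal sketch, using an illustrative scratch file:

```shell
d=$(mktemp -d)                  # scratch directory (illustrative)
printf 'hello' > "$d/demo"      # a 5-byte file
# Field 7 of "find -ls" output is the size in bytes:
find "$d" -type f -ls | awk '{ total += $7 } END { print total }'
rm -r "$d"
```

which prints 5 for the single 5-byte file.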

The way to do it with a zsh glob is:

    zmodload -i zsh/stat
    stat -n +size **/*

(Adjust the glob as necessary to match the files you care about.)

On my machine I can get sizes for 9300 files through 1400 directories
in 0.293 seconds by globbing, 0.05 seconds slower than "find . -ls".
However, if I change it to **/*(.) vs. "find . -type f -ls" then find
speeds up a little and globbing slows down a little.

The next consideration is what you want to do with the sizes after you
have them.  If you want to do anything else with them in the shell,
the zsh/stat method gives you a hook:

    typeset -A sizes
    stat -A sizes -n +size **/*

This is an undocumented trick; -A expects to assign elements to an
ordinary array, but by a quirk of the way that's done internally, if
the array is a hash, the pairwise name/value assignment works just as
if you had done:

    typeset -a tmp
    stat -A tmp -n +size **/*
    typeset -A sizes
    sizes=( $tmp )

Now ${(k)sizes} gives the names and ${(v)sizes} gives the sizes, and
${sizes[somefile]} is the size of somefile.  This is almost certainly
faster than reading and parsing the output of find.  Once you have
this you can do other tricks like:

    print Total size of all files is $(( ${(j:+:v)sizes} ))

(My 9300 files are 1360562692 bytes.)
