Zsh Mailing List Archive

Re: find duplicate files



On Mon, Apr 8, 2019 at 4:18 AM Charles Blake <charlechaud@xxxxxxxxx> wrote:
>
> >I find that a LOT more understandable than the python code.
>
> Understandability is, of course, somewhat subjective (e.g. some might say
> every 15th field is unclear relative to a named label)

Yes, the lack of multi-dimensional data structures is a limitation of
the shell implementation.

I could have done it this way:

zmodload -F zsh/stat b:zstat    # if not already loaded
names=( **/*(.l+0) )            # plain files, recursively (l+0: link count greater than zero)
zstat -tA stats $names          # stat every file; -t prefixes each element with its field name
sizes=( ${(M)stats:#size *} )   # keep just the "size <n>" elements, one per file, in glob order

I chose the other way so that the name and size would be directly
connected in the stats array rather than relying on implicit ordering
(and, to one of your later points, bad things happen with the above if
a file is removed between generating the list of names and collecting
the file stats).
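
Something like this (just a sketch, not the script from my earlier
message) keeps each name tied to its size by stat-ing one file at a
time into an associative array, so a file that disappears mid-scan is
simply skipped:

zmodload -F zsh/stat b:zstat          # provides the zstat builtin
typeset -A sizes                      # file name -> size in bytes
for f in **/*(.N); do
  zstat -A s +size $f 2>/dev/null || continue  # file may have vanished since the glob
  sizes[$f]=$s[1]
done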

> >unless you're NOT going to consider linked files as duplicates you
> >might as well just compare sizes.  (It would be faster to get inodes
>
> What may have been underappreciated is that handling hard-link identity also
> lets you skip recomputing hashes over a hard-link cluster.

Yes, this could be used to reduce the number of names passed to
"cksum" or the equivalent.

> Almost everything you say needs a "probably/maybe"
> qualifier.  I don't think you disagree.  I'm just elaborating a little
> for passers-by.

Absolutely.  The flip side of this is that shells and utilities are
generally optimized for the average case, not for the extremes.


