Zsh Mailing List Archive

Re: special characters in file names issue



On Sat, Nov 11, 2023 at 10:28 AM Jim <linux.tech.guy@xxxxxxxxx> wrote:
>
>>       local i files fname hash orig
>>       files=( $(shasum -ba 256 -- "$@") ) || return
>>
>> This code has an added advantage of forking only once. It also handles
>> file names with backslashes and linefeeds in them.
>
> there are some issues. The files I'm working on are in excess of 96K, and most
> utilities, including shasum, report the input line is too long.

If you're already putting the hashes in a gdbm database, it should be
possible to write a zargs command that automatically batches the files
up and populates the database.  Once that's working on a few files as a
test case, you can use zargs -P N to run up to N copies of the hashing
job at once.

> So a few changes
> are needed. Even with "groups" of files, shasum takes over two and a half
> hours to do 96K.

For your purposes, do you need to generate a hash of the file contents
(which shasum is doing) or just hash the file name to hide special
characters?  Roman's example needs the former because it is searching
for duplicated content.



