Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: Compare two (or more) filenames and return what is common between them



On Tue, 18 Mar 2014 03:05:27 -0400
TJ Luoma <luomat@xxxxxxxxx> wrote:
> What I am trying to do:
> 
> Given a folder/directory full of files (and, possibly, some existing
> folders/directories), I want to create folders which will group files
> with similar files names, but which will leave folders alone.

I'm still not quite sure after reading your description what it is you
want, but below is a function for you to play with.  It deals with array
entries rather than files, but fixing that part should be
straightforward.  Somewhere you'll have a '*(.)' pattern to select all
the regular files in a directory, somewhere else a mkdir or possibly mkdir -p,
and somewhere else a mv.
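
A minimal sketch of that file-handling part, assuming a hypothetical
helper called move_into_dir (the name and argument order are mine, not
part of the function in this message).  It is written in plain shell
syntax that zsh also accepts.  It creates the target directory with
mkdir -p only when there is something to move, and it moves regular
files only, so existing directories stay where they are.

```shell
# Hypothetical helper, not part of the original function.
# Usage: move_into_dir DIR FILE...
# Moves each regular FILE into DIR, creating DIR on first use.
move_into_dir() {
  local dir="$1" file
  shift
  for file in "$@"; do
    # Skip anything that is not a regular file, so directories
    # (including DIR itself) are left alone.
    [ -f "$file" ] || continue
    mkdir -p -- "$dir" || return 1
    mv -- "$file" "$dir/" || return 1
  done
}
```

In zsh the caller could select the candidates with the '*(.)' qualifier,
for example

  move_into_dir 'One Two' 'One Two'*(.N)

where the added N only makes the pattern expand to nothing, instead of
raising an error, when no file matches.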

The upshot is that for the input

  "One Two Nineteen"
  "One Two Three"
  "One Two Buckle My Shoe"
  "One Two Buckle My Belt"
  "One Three Four"
  "Two Three Sixteen"
  "Two Three Seventeen"
  "Three Forty Five"

it prints

  Extracting common prefixes 'One Two Buckle My'...
  'One Two Buckle My Shoe' goes in directory 'One Two Buckle My'
  'One Two Buckle My Belt' goes in directory 'One Two Buckle My'
  Extracting common prefixes 'One Two', 'Two Three'...
  'One Two Nineteen' goes in directory 'One Two'
  'One Two Three' goes in directory 'One Two'
  'Two Three Sixteen' goes in directory 'Two Three'
  'Two Three Seventeen' goes in directory 'Two Three'
  Unmatched files:
  'One Three Four'
  'Three Forty Five'

which may or may not be what you want.  I handled suffixes by stripping
off everything from the earliest "." to the end of the name before
looking for common prefixes.
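
To illustrate what stripping from the earliest "." means, here is a
small sketch comparing the two standard parameter expansion forms.  The
helper names are hypothetical and exist only for this comparison.  The
last call shows the catch: a dot early in the name cuts the name short.

```shell
# Hypothetical helpers comparing the two standard suffix-stripping forms.
strip_all_suffixes() { printf '%s\n' "${1%%.*}"; }   # cut at the earliest "."
strip_last_suffix() { printf '%s\n' "${1%.*}"; }     # cut at the last "."

strip_all_suffixes "One Two Three.tar.gz"   # prints: One Two Three
strip_last_suffix "One Two Three.tar.gz"    # prints: One Two Three.tar
strip_all_suffixes "Vol. 1 Notes.txt"       # prints: Vol
```

If your file names can contain dots before the extension, the ${1%.*}
form is probably the safer starting point.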

I have to admit I was within an ace of switching to Ruby for this.


##start
emulate -L zsh
setopt extendedglob

local -a words match mbegin mend split restwords

words=(
	"One Two Nineteen"
	"One Two Three"
	"One Two Buckle My Shoe"
	"One Two Buckle My Belt"
	"One Three Four"
	"Two Three Sixteen"
	"Two Three Seventeen"
	"Three Forty Five"
)

typeset -A groups foundgroups
integer maxwords
local word initial pat make

for word in $words; do
  initial=${word%%.*}
  split=(${=initial})
  if (( ${#split} > maxwords )); then
    maxwords=${#split}
  fi
done

words_getinitial() {
  local word=$1
  initial=${word%%.*}
  if (( maxwords > 1 )); then
    pat="(#b)(([^[:blank:]]##[[:blank:]]##)(#c$((maxwords-1)))([^[:blank:]]##))"
  else
    pat="(#b)([^[:blank:]]##)"
  fi
  # Match against the suffix-stripped name, not the original word.
  initial=${(M)initial##${~pat}}
}
# functions -T words_getinitial

while (( maxwords && ${#words} )); do
  restwords=()
  groups=()
  foundgroups=()
  for word in $words; do
    words_getinitial $word
    [[ -z $initial ]] && continue
    if [[ -n $groups[$initial] ]]; then
      foundgroups[$initial]=1
    else
      groups[$initial]=1
    fi
  done
  if (( ${#foundgroups} )); then
    print "Extracting common prefixes '${(kj.', '.)foundgroups}'..."
    for word in $words; do
      words_getinitial $word
      if [[ -z $initial ]]; then
        restwords+=($word)
      elif [[ -n $foundgroups[$initial] ]]; then
        print "'$word' goes in directory '$initial'"
      else
        restwords+=($word)
      fi
    done
    # Keep only the still-unmatched entries for the next, shorter pass.
    words=($restwords)
  fi
  (( maxwords-- ))
done

if (( ${#words} )); then
  print "Unmatched files:"
  print "'${(pj.'\n'.)words}'"
fi
##end


-- 
Peter Stephenson <p.w.stephenson@xxxxxxxxxxxx>
Web page now at http://homepage.ntlworld.com/p.w.stephenson/


