Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: Surprising behaviour with numeric glob sort



2017-06-03 17:07:24 -0700, Bart Schaefer:
> On Jun 3, 10:16pm, Stephane Chazelas wrote:
> }
> } When comparing "zsh-3" with "zsh2", we compare the non-numeric
> } prefix: "zsh-" and "zsh". And already, at that point, "zsh" is
> } less than "zsh-", so we stop here (zsh2 < zsh-3)
> 
> Explain how to do that without either re-implementing strcoll(),
> copying the input strings into temporaries, or requiring the input
> strings to be writable so that we can plug in temporary '\0' bytes
> after each non-numeric substring, and at that point we can discuss
> using this algorithm.  Otherwise it's going to be unusably slow
> for any moderately large input set.

"Slow" (though probably quite negligible compared to strcoll(),
which already does things like that in addition to a lot more
hard work) but working.

Copying into temporaries was what I had in mind (for instance
allocated on the stack at twice the size of the input), either
in the comparing function, or by preparing the list beforehand
and linking each string to its prepared form (though some of
that "preparation" may not be needed in the end).
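
To make the copy-into-temporaries idea concrete, here is a rough
sketch in C. It is not taken from the zsh source: the names
chunked_numeric_cmp, text_run, digit_run and cmp_digits are made up
for illustration, and it uses malloc() rather than stack allocation
to keep the sketch short. Each non-numeric chunk is copied into a
NUL-terminated temporary and handed to strcoll(); the digit runs are
only compared when those chunks collate equal.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Length of the leading run of non-digit bytes in s. */
static size_t text_run(const char *s)
{
    size_t n = 0;
    while (s[n] && !isdigit((unsigned char)s[n]))
        n++;
    return n;
}

/* Length of the leading run of digit bytes in s. */
static size_t digit_run(const char *s)
{
    size_t n = 0;
    while (isdigit((unsigned char)s[n]))
        n++;
    return n;
}

/* Compare two digit runs by value without converting them,
 * so numbers of any width can be handled. */
static int cmp_digits(const char *a, size_t la, const char *b, size_t lb)
{
    int c;
    /* Ignore leading zeros, keeping at least one digit. */
    while (la > 1 && *a == '0') { a++; la--; }
    while (lb > 1 && *b == '0') { b++; lb--; }
    if (la != lb)
        return la < lb ? -1 : 1;   /* more digits means a larger value */
    c = memcmp(a, b, la);
    return (c > 0) - (c < 0);
}

/* Hypothetical comparator: alternate strcoll() on non-numeric chunks
 * with numeric comparison on digit runs, stopping at the first
 * difference.  Strings differing only in leading zeros compare equal. */
int chunked_numeric_cmp(const char *a, const char *b)
{
    size_t la, lb, cap;
    char *ta, *tb;
    int r = 0;

    assert(a != NULL && b != NULL);
    la = strlen(a);
    lb = strlen(b);
    cap = la > lb ? la : lb;
    ta = malloc(cap + 1);
    tb = malloc(cap + 1);
    if (!ta || !tb) {              /* fall back to a plain byte comparison */
        free(ta);
        free(tb);
        return strcmp(a, b);
    }

    while (r == 0 && (*a || *b)) {
        size_t na = text_run(a), nb = text_run(b);
        size_t da, db;

        /* Temporaries: NUL-terminated copies of the text chunks. */
        memcpy(ta, a, na); ta[na] = '\0';
        memcpy(tb, b, nb); tb[nb] = '\0';
        r = strcoll(ta, tb);
        if (r != 0)
            break;
        a += na;
        b += nb;

        da = digit_run(a);
        db = digit_run(b);
        r = cmp_digits(a, da, b, db);
        a += da;
        b += db;
    }
    free(ta);
    free(tb);
    return r;
}
```

With the "C" locale in effect, this puts zsh2 before zsh-3 because
"zsh" already collates before "zsh-", and zsh2 before zsh10 because
the prefixes are equal and 2 < 10. Note that the result can differ
from whole-string zero-padding, since here the prefixes "zsh" and
"zsh+" are compared on their own.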

I quite like the zero-padding approach myself, though if we want
to allow numbers of any width, we'd need to do two scans of the
list, once to find the widest number and a second time to pad
(and memory allocation is more complicated).
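
The two scans could look roughly like the following C sketch. Again
this is only an illustration under my own assumptions, not zsh code:
widest_number and pad_numbers are hypothetical names, and the padded
keys are heap-allocated copies that would then be sorted with
strcoll() in place of the original strings.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* First scan: width of the longest digit run in any of the n strings. */
size_t widest_number(const char *const *list, size_t n)
{
    size_t widest = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        size_t run = 0;
        const char *p;
        for (p = list[i]; *p; p++) {
            run = isdigit((unsigned char)*p) ? run + 1 : 0;
            if (run > widest)
                widest = run;
        }
    }
    return widest;
}

/* Second scan: copy s, left-padding every digit run with '0' up to
 * `width` digits.  Returns a malloc()ed string that the caller must
 * free, or NULL if allocation fails. */
char *pad_numbers(const char *s, size_t width)
{
    size_t out_len = 0;
    const char *p;
    char *out, *q;

    assert(s != NULL);

    /* Work out the exact size of the padded copy. */
    for (p = s; *p; ) {
        size_t run = 0;
        while (isdigit((unsigned char)p[run]))
            run++;
        if (run) {
            out_len += run > width ? run : width;
            p += run;
        } else {
            out_len++;
            p++;
        }
    }

    out = malloc(out_len + 1);
    if (!out)
        return NULL;

    /* Build the padded copy. */
    q = out;
    for (p = s; *p; ) {
        size_t run = 0, i;
        while (isdigit((unsigned char)p[run]))
            run++;
        if (run) {
            for (i = run; i < width; i++)
                *q++ = '0';
            memcpy(q, p, run);
            q += run;
            p += run;
        } else {
            *q++ = *p++;
        }
    }
    *q = '\0';
    return out;
}
```

For the four file names used in this thread, the widest number has
two digits, so zsh2 would get the key zsh02 and zsh+3 the key zsh+03.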


> Otherwise you're describing what's already done:  strcoll() is used,
> and then the non-numeric prefixes of the two strings are compared,
> and only if the non-numeric parts are identical is numeric comparison
> applied.  (Of course this is already slightly wrong, because it means
> the non-numeric parts have to be *identical* not merely collated the
> same, but to fix that we're back to "re-implement strcoll()".)

I don't see how that's more "re-implement strcoll" than what zsh
already does.

What do you propose?

$ print -rl -- *(n)
zsh+10
zsh2
zsh10
zsh+3

is broken (and not slightly, IMO) and needs fixing.

We can already fix that numerical sort by hand with

$ n() REPLY=${REPLY//(#m)<->/${(l:20::0:)MATCH}}
$ print -rl -- *(o+n)
zsh2
zsh+3
zsh10
zsh+10
$ (LC_ALL=C; print -rl -- *(o+n))
zsh+3
zsh+10
zsh2
zsh10

But that's obviously less "efficient" than if zsh did it
internally.

-- 
Stephane


