Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: Surprising behaviour with numeric glob sort

On Jun 2, 10:03am, Stephane Chazelas wrote:
} $ echo *(n)
} zsh-10 zsh2 zsh10 zsh-3
} (here in my en_GB.UTF-8 GNU locale)
} is unexpected/broken. "zsh" sorts before "zsh-" in my locale, so
} I'd expect the zsh2, zsh10 to come before zsh-3, zsh-10 which is
} the basis of my proposal. In any case, zsh-3 should come before
} zsh-10, nobody can argue against that.

Well, one could argue that "-10" should be treated as negative ten
and therefore should sort before negative three, but I'm not sure
we want to get into that.

The reason you get the result above is of course because most sort
algorithms assume there is a total order and therefore that it is
not necessary to compare every possible pairing of elements.
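
To make the missing total order concrete, here is a toy model in Python. It is not zsh's code: toy_collate is a hypothetical stand-in for strcoll() that assumes the locale ignores "-" on its first pass, and toy_cmp is a hypothetical comparator that compares numerically only when the first differing characters are both digits. Under those assumptions the four names from the example form a cycle, so no ordering can satisfy every pairwise comparison.

```python
def toy_collate(a, b):
    # Stand-in for strcoll(): ASSUMES the locale ignores "-" on its
    # first pass and uses it only to break ties.
    ka, kb = a.replace("-", ""), b.replace("-", "")
    if ka != kb:
        return -1 if ka < kb else 1
    return (a > b) - (a < b)

def run_value(s, i):
    # Numeric value of the whole digit run that contains index i.
    start = end = i
    while start > 0 and s[start - 1].isdigit():
        start -= 1
    while end < len(s) and s[end].isdigit():
        end += 1
    return int(s[start:end])

def toy_cmp(a, b):
    # Hypothetical mixed rule: numeric comparison only when the first
    # differing characters are both digits, collation otherwise.
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    if i < n and a[i].isdigit() and b[i].isdigit():
        va, vb = run_value(a, i), run_value(b, i)
        if va != vb:
            return -1 if va < vb else 1
    return toy_collate(a, b)

# Each name compares "less than" the next one, and the last compares
# "less than" the first: a cycle rather than an order.
chain = ["zsh-10", "zsh2", "zsh10", "zsh-3", "zsh-10"]
results = [toy_cmp(x, y) for x, y in zip(chain, chain[1:])]
print(results)
```

A sort that trusts transitivity can skip some of these pairs, so its output depends on which comparisons it happens to make.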

Your proposal was
> } break down the strings
> } between non-numeric and numeric parts and use strcoll() on the
> } non-numeric and number comparison on the numeric parts

As far as I can tell that's exactly what zstrbcmp in zle_tricky.c does.
zstrcmp in sort.c on the other hand first attempts strcoll and
only compares numeric parts if it can find corresponding numeric
substrings in both input strings.  That is, "zsh-3" is never
compared numerically to "zsh2" because "zsh2" and "zsh-" are
considered already to differ.
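
For comparison, the quoted proposal can be sketched in Python as follows. This is only a reading of the proposal as worded above, not a transcription of zstrbcmp or zstrcmp; the helper names chunks and chunk_cmp are made up for the illustration, and locale.strcoll here runs under whatever LC_COLLATE the process has (plain C collation unless setlocale has been called).

```python
import locale
import re
from functools import cmp_to_key

def chunks(s):
    # Split into alternating non-digit and digit runs,
    # e.g. "zsh-3" -> ["zsh-", "3"].
    return re.findall(r"\d+|\D+", s)

def chunk_cmp(a, b):
    ca, cb = chunks(a), chunks(b)
    for x, y in zip(ca, cb):
        if x.isdecimal() and y.isdecimal():
            # Both parts are numbers: compare by value.
            c = (int(x) > int(y)) - (int(x) < int(y))
        else:
            # At least one part is text: defer to the collation order.
            c = locale.strcoll(x, y)
        if c:
            return c
    # All shared parts are equal: the name with fewer parts goes first.
    return len(ca) - len(cb)

names = ["zsh-10", "zsh2", "zsh10", "zsh-3"]
ordered = sorted(names, key=cmp_to_key(chunk_cmp))
print(ordered)
```

Because "zsh" collates before "zsh-", both "zsh" names come out ahead of both "zsh-" names, and each pair is then ordered by its number.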

In either case, if zstrcmp or zstrbcmp find a digit, they consume
more digits until they hit a non-digit or two not-equal digits, and
then look both backward and forward for digits to calculate the
numeric value for comparison.
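
The backward-and-forward scan can be pictured with a small Python sketch. The function name numeric_span and the file names are invented for the example, and this only illustrates the idea described above rather than the actual C code.

```python
def numeric_span(s, i):
    # Given an index i that sits on a digit, widen the span in both
    # directions until it covers the whole run of digits.
    start = end = i
    while start > 0 and s[start - 1].isdigit():
        start -= 1
    while end < len(s) and s[end].isdigit():
        end += 1
    return start, end

a, b = "file109.txt", "file119.txt"

# Walk the common prefix; the first mismatch is "0" versus "1".
i = 0
while i < min(len(a), len(b)) and a[i] == b[i]:
    i += 1

sa, ea = numeric_span(a, i)
sb, eb = numeric_span(b, i)
num_a, num_b = int(a[sa:ea]), int(b[sb:eb])
print(i, num_a, num_b)
```

The mismatch is found in the middle of the number, so the leading "1" is only picked up by looking backward, and the trailing digit by looking forward.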

So I think what you propose is that when "zsh1" is found to have a
difference with "zsh-", the algorithm should look forward across
"zsh-" to find "3" and at that point end up comparing "10" to "3"?
That would lead to the order in your example becoming
    zsh2 zsh-3 zsh10 zsh-10.
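
One possible reading of that look-ahead rule, sketched in Python. Everything here is an assumption made for illustration: the names next_number and lookahead_cmp are hypothetical, and ties between equal numbers are broken by comparing the text before the number with a plain string comparison, which puts "zsh" before "zsh-".

```python
from functools import cmp_to_key

def next_number(s, i):
    # From index i, skip forward to the first digit, then widen to the
    # whole digit run. Returns (text before the run, value) or None.
    j = i
    while j < len(s) and not s[j].isdigit():
        j += 1
    if j == len(s):
        return None
    start = end = j
    while start > 0 and s[start - 1].isdigit():
        start -= 1
    while end < len(s) and s[end].isdigit():
        end += 1
    return s[:start], int(s[start:end])

def lookahead_cmp(a, b):
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    na, nb = next_number(a, i), next_number(b, i)
    if na and nb:
        (pa, va), (pb, vb) = na, nb
        if va != vb:
            return -1 if va < vb else 1   # the numbers decide first
        if pa != pb:
            return -1 if pa < pb else 1   # then the text before them
    return (a > b) - (a < b)

names = ["zsh-10", "zsh2", "zsh10", "zsh-3"]
ordered = sorted(names, key=cmp_to_key(lookahead_cmp))
print(ordered)
```

Under these assumptions the four names sort as zsh2 zsh-3 zsh10 zsh-10, with the number outranking the prefix.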

However, that would also mean that in strings with different sets
of numeric substrings the numeric comparisons might be "detected"
after different prefixes for different pairs of strings; I think
the result there might be even more confusing, but I can't come up
with a specific example.

It also means having to copy non-numeric substrings during every
comparison, so as to be able to call strcoll without modifying the
input strings.  (What's the alternative?)  This would probably make
sorting prohibitively slow.
