Zsh Mailing List Archive

Re: Surprising behaviour with numeric glob sort



2017-06-02 16:19:05 -0700, Bart Schaefer:
> On Jun 2, 10:03am, Stephane Chazelas wrote:
> }
> } $ echo *(n)
> } zsh-10 zsh2 zsh10 zsh-3
> } 
> } (here in my en_GB.UTF-8 GNU locale)
> } 
> } is unexpected/broken. "zsh" sorts before "zsh-" in my locale, so
> } I'd expect the zsh2, zsh10 to come before zsh-3, zsh-10 which is
> } the basis of my proposal. In any case, zsh-3 should come before
> } zsh-10, nobody can argue against that.
> 
> Well, one could argue that "-10" should be treated as negative ten
> and therefore should sort before negative three, but I'm not sure
> we want to get into that.

The main use for *(n) (mine at least) is to sort names with
version numbers like zsh-3.0, zsh-3.1, zsh-4. So handling
negative numbers wouldn't help in those cases.

[...]
> That is, "zsh-3" is never
> compared numerically to "zsh2" because "zsh2" and "zsh-" are
> considered already to differ.
[...]
> So I think what you propose is that when "zsh1" is found to have a
> difference with "zsh-", the algorithm should look forward across
> "zsh-" to find "3" and at that point end up comparing "10" to "3"?
> That would lead to the order in your example becoming
>     zsh2 zsh-3 zsh10 zsh-10.
[...]

No, what I propose is very simple.

When comparing "zsh-3" with "zsh2", we first compare the
non-numeric prefixes: "zsh-" and "zsh". Already at that point,
"zsh" is less than "zsh-", so we stop there (zsh2 < zsh-3).

If it was

zsh-3.1 vs zsh-3

["zsh-", 3, ".", 1] vs ["zsh-", 3]

- strcoll("zsh-", "zsh-") => 0
- 3 == 3
- strcoll(".", "") > 0 => zsh-3 < zsh-3.1
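As a rough sketch of the comparison described above (in Python, purely for illustration; this is not zsh's implementation, and split_chunks and compare_names are made-up names): split each name into alternating text and number chunks, compare text chunks with strcoll() and number chunks as integers, and stop at the first difference. The final strcoll() tie-break for names such as "a01" vs "a1" is my own assumption, as is pinning the collation locale to "C" so the result is reproducible.

```python
import locale
import re
from functools import cmp_to_key

# Assumption: pin collation to the C locale for a reproducible demo.
# Use locale.setlocale(locale.LC_COLLATE, "") to follow the user's locale.
locale.setlocale(locale.LC_COLLATE, "C")


def split_chunks(name):
    """Split "zsh-3.1" into ["zsh-", 3, ".", 1, ""].

    Even positions hold text (possibly empty), odd positions hold integers.
    """
    parts = re.split(r"([0-9]+)", name)
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]


def compare_names(a, b):
    """Return <0, 0 or >0, comparing chunk by chunk and stopping early."""
    ca, cb = split_chunks(a), split_chunks(b)
    for i, (x, y) in enumerate(zip(ca, cb)):
        if i % 2:
            # Numeric chunk: compare as integers.
            if x != y:
                return -1 if x < y else 1
        else:
            # Text chunk: compare with the locale's collation.
            r = locale.strcoll(x, y)
            if r:
                return r
    # All shared chunks are equal: the name with fewer chunks sorts first,
    # then fall back to a plain strcoll() of the whole names.
    return (len(ca) - len(cb)) or locale.strcoll(a, b)


names = ["zsh-10", "zsh2", "zsh10", "zsh-3"]
print(sorted(names, key=cmp_to_key(compare_names)))
# In the C locale: ['zsh2', 'zsh10', 'zsh-3', 'zsh-10']
```

With this scheme "zsh2" and "zsh-3" are decided by their text prefixes alone, and numbers are only ever compared when everything before them collates equal.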

Now there are some aspects of the current implementation that
one might find useful like:

$ echo *
a a-3.1 a-3+1 a-3.2 a-3+2
$ (LC_ALL=C; echo *)
a a-3+1 a-3+2 a-3.1 a-3.2
$ echo *(n)
a a-3.1 a-3+1 a-3.2 a-3+2
$ (LC_ALL=C; echo *(n))
a a-3+1 a-3+2 a-3.1 a-3.2


The fact that those "-" and "." are ignored in the first
strcoll() pass in some locales makes for a more "numerical"
sort. Though again, it's easily broken with:

$ touch a-3.10
$ echo *(n)
a a-3.1 a-3+1 a-3.2 a-3.10 a-3+2

Ideally, we'd want to hook into the strcoll() algorithm to
introduce the numerical comparisons in there. Maybe that can be
done with zero-padding: for the files above, transform the
strings first (a sort of pre-strxfrm()) and then do a plain
strcoll() comparison on the results. That is, transform from:

a a-3.1 a-3+1 a-3.2 a-3.10 a-3+2

to:

a
a-03.01
a-03+01
a-03.02
a-03.10
a-03+02

adjusting the length of the padding as needed.

The above would sort to

a
a-03.01
a-03+01
a-03.02
a-03+02
a-03.10

in my GNU British locale and

a
a-03+01
a-03+02
a-03.01
a-03.02
a-03.10

in the C locale.
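A minimal sketch of that zero-padding idea, again in Python and only as an illustration (pad_numbers and padded_sort are hypothetical names, not anything in zsh): pad every run of digits to the width of the longest run in the list, then sort on the locale's strxfrm() of the padded string. The collation locale is pinned to "C" here as an assumption so the result is reproducible; with another LC_COLLATE the punctuation may be weighted differently.

```python
import locale
import re

# Assumption: pin collation to the C locale for a reproducible demo.
locale.setlocale(locale.LC_COLLATE, "C")


def pad_numbers(name, width):
    """Zero-pad every run of ASCII digits in name to at least width digits."""
    return re.sub(r"[0-9]+", lambda m: m.group().zfill(width), name)


def padded_sort(names):
    """Sort names by the locale collation of their zero-padded forms."""
    # The padding length is adjusted to the longest digit run in the list.
    width = max(
        (len(run) for n in names for run in re.findall(r"[0-9]+", n)),
        default=0,
    )
    return sorted(names, key=lambda n: locale.strxfrm(pad_numbers(n, width)))


names = "a a-3.1 a-3+1 a-3.2 a-3.10 a-3+2".split()
print([pad_numbers(n, 2) for n in names])
print(padded_sort(names))
# In the C locale: ['a', 'a-3+1', 'a-3+2', 'a-3.1', 'a-3.2', 'a-3.10']
```

Names that differ only in leading zeros (a-3 and a-03, say) get the same key here, so a real implementation would still need some tie-break.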

-- 
Stephane


