Mailing-List: contact zsh-workers-help@zsh.org; run by ezmlm
Precedence: bulk
X-No-Archive: yes
List-Id: Zsh Workers List <zsh-workers.zsh.org>
List-Post: <mailto:zsh-workers@zsh.org>
List-Help: <mailto:zsh-workers-help@zsh.org>
Date: Mon, 18 Jul 2016 11:17:35 +0100
From: Peter Stephenson <p.stephenson@samsung.com>
To: zsh-workers@zsh.org
Subject: Re: Incorrect sorting of Polish characters
Message-id: <20160718111735.6adea125@pwslap01u.europe.root.pri>
In-reply-to: <20160718103329.7acbb1b1@pwslap01u.europe.root.pri>
References:
 <CAJzQX7rXddxjFA2reSiGYHpU9Razo2vwO5F5g-4Ddz-ZjXZsUQ@mail.gmail.com>
 <160716130718.ZM4513@torch.brasslantern.com>
 <20160718103329.7acbb1b1@pwslap01u.europe.root.pri>
Organization: Samsung Cambridge Solution Centre
X-Mailer: Claws Mail 3.7.9 (GTK+ 2.22.0; i386-redhat-linux-gnu)
MIME-version: 1.0
Content-type: text/plain; charset=UTF-8
X-Seq: zsh-workers 38879

On Mon, 18 Jul 2016 10:33:29 +0100
Peter Stephenson <p.stephenson@samsung.com> wrote:
> On Sat, 16 Jul 2016 13:07:18 -0700
> Bart Schaefer <schaefer@brasslantern.com> wrote:
> > On Jul 16,  7:17pm, M. Bartoszkiewicz wrote:
> > } I have noticed that some Polish characters
> > } are sorted incorrectly in glob expansion (but
> > } correctly in other contexts).
>
> A simple-minded change to pass strcoll() unmetafied versions of the
> strings does seem to fix the problem, so it looks like this is the
> case.  However, that's not the right fix as we only want to unmetafy
> once per input string, not once per comparison, and below the call to
> qsort() there's quite a lot of internal string handling.  An equally
> simple-minded fix around the call to qsort() (saving and restoring the
> strings) didn't seem to work.  So this needs a bit more thought.

Adding an unmetafied entry to the glob match that only gets used for
sorting seems to do the trick.  I think an additional single pass
through the array of matches isn't a big deal.  The sort code may
need checking to confirm it really is safe with unmetafied strings
for globbing, as there are different ways in.  Any other suggestions?
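
To make the idea concrete, here is a minimal standalone sketch of the
same pattern: build the unmetafied sort key once per match in a single
pass, then let the comparator look only at the precomputed keys.  It is
not the zsh code.  The names struct match, META_BYTE, unmeta_in_place(),
prepare_sort_keys() and cmp_by_uname() are all made up for illustration,
and the encoding it assumes (marker byte 0x83 followed by the original
byte XOR 32) is my understanding of zsh's Meta scheme rather than
something stated in this thread.  It uses malloc() where the patch uses
heap-allocated dupstring().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed value of zsh's Meta marker byte; illustration only. */
#define META_BYTE 0x83

/* Hypothetical stand-in for struct gmatch. */
struct match {
    char *name;   /* metafied name, as stored internally */
    char *uname;  /* unmetafied name, used only as the sort key */
};

/*
 * Simplified stand-in for unmetafy(): rewrite s in place, turning each
 * META_BYTE,x pair into the single byte x ^ 32.  Returns the new length.
 */
static size_t unmeta_in_place(char *s)
{
    unsigned char *rd = (unsigned char *)s;
    unsigned char *wr = (unsigned char *)s;

    while (*rd) {
        if (*rd == META_BYTE && rd[1] != '\0') {
            *wr++ = (unsigned char)(rd[1] ^ 32);
            rd += 2;
        } else {
            *wr++ = *rd++;
        }
    }
    *wr = '\0';
    return (size_t)(wr - (unsigned char *)s);
}

/*
 * Single pass before sorting: fill in uname for every entry.  Names
 * containing no META_BYTE simply share storage with name.  Returns 0
 * on success, -1 if an allocation fails.
 */
static int prepare_sort_keys(struct match *m, size_t n)
{
    size_t i;

    assert(m != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (strchr(m[i].name, META_BYTE) != NULL) {
            size_t len = strlen(m[i].name) + 1;
            char *copy = malloc(len);

            if (copy == NULL)
                return -1;
            memcpy(copy, m[i].name, len);
            unmeta_in_place(copy);
            m[i].uname = copy;
        } else {
            m[i].uname = m[i].name;
        }
    }
    return 0;
}

/* qsort() comparator: collate on the precomputed keys only. */
static int cmp_by_uname(const void *a, const void *b)
{
    const struct match *ma = a;
    const struct match *mb = b;

    return strcoll(ma->uname, mb->uname);
}
```

A caller would run prepare_sort_keys() once over the array and then
qsort() it with cmp_by_uname, so the cost of unmetafying is linear in
the number of matches instead of being paid on every comparison.  Unlike
gmatchcmp() this sketch sorts in ascending order, and a real caller
would also need to free any uname that differs from name.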

pws

diff --git a/Src/glob.c b/Src/glob.c
index 2051016..146b4db 100644
--- a/Src/glob.c
+++ b/Src/glob.c
@@ -41,7 +41,10 @@
 typedef struct gmatch *Gmatch;
 
 struct gmatch {
+    /* Metafied file name */
     char *name;
+    /* Unmetafied file name; embedded nulls can't occur in file names */
+    char *uname;
     /*
      * Array of sort strings:  one for each GS_EXEC sort type in
      * the glob qualifiers.
@@ -911,7 +914,8 @@ gmatchcmp(Gmatch a, Gmatch b)
     for (i = gf_nsorts, s = gf_sortlist; i; i--, s++) {
 	switch (s->tp & ~GS_DESC) {
 	case GS_NAME:
-	    r = zstrcmp(b->name, a->name, gf_numsort ? SORTIT_NUMERICALLY : 0);
+	    r = zstrcmp(b->uname, a->uname,
+			gf_numsort ? SORTIT_NUMERICALLY : 0);
 	    break;
 	case GS_DEPTH:
 	    {
@@ -1859,6 +1863,7 @@ zglob(LinkList list, LinkNode np, int nountok)
 	int nexecs = 0;
 	struct globsort *sortp;
 	struct globsort *lastsortp = gf_sortlist + gf_nsorts;
+	Gmatch gmptr;
 
 	/* First find out if there are any GS_EXECs, counting them. */
 	for (sortp = gf_sortlist; sortp < lastsortp; sortp++)
@@ -1910,6 +1915,29 @@ zglob(LinkList list, LinkNode np, int nountok)
 	    }
 	}
 
+	/*
+	 * Where necessary, create unmetafied version of names
+	 * for comparison.  If no Meta characters just point
+	 * to original string.  All on heap.
+	 */
+	for (gmptr = matchbuf; gmptr < matchptr; gmptr++)
+	{
+	    char *nptr;
+	    for (nptr = gmptr->name; *nptr; nptr++)
+	    {
+		if (*nptr == Meta)
+		    break;
+	    }
+	    if (*nptr == Meta)
+	    {
+		int dummy;
+		gmptr->uname = dupstring(gmptr->name);
+		unmetafy(gmptr->uname, &dummy);
+	    } else {
+		gmptr->uname = gmptr->name;
+	    }
+	}
+
 	/* Sort arguments in to lexical (and possibly numeric) order. *
 	 * This is reversed to facilitate insertion into the list.    */
 	qsort((void *) & matchbuf[0], matchct, sizeof(struct gmatch),
diff --git a/Test/D07multibyte.ztst b/Test/D07multibyte.ztst
index dedf241..1b1d042 100644
--- a/Test/D07multibyte.ztst
+++ b/Test/D07multibyte.ztst
@@ -562,3 +562,20 @@
   }
   : $functions)
 0:Multibtye handled of functions parameter
+
+  if [[ -n ${$(locale -a 2>/dev/null)[(R)pl_PL.utf8]} ]]; then
+  (
+    export LC_ALL=pl_PL.UTF-8
+    local -a names=(a b c d e f $'\u0105' $'\u0107' $'\u0119')
+    print -o $names
+    mkdir -p plchars
+    cd plchars
+    touch $names
+    print ?
+  )
+  else
+    ZTST_skip="No Polish UTF-8 locale found, skipping sort test"
+  fi
+0:Sorting of metafied Polish characters
+>a ą b c ć d e ę f
+>a ą b c ć d e ę f

