Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: num_in_chars incremented after each mbrtowc()



On Sun, 6 Dec 2015 10:37:21 +0100
Sebastian Gniazdowski <sgniazdowski@xxxxxxxxx> wrote:
> Hello,
> while working my hands off on implementing display width handling in
> params.c rather than subst.c I encountered a bug in mb_metastrlenend.
> It will reveal itself only on improper unicode strings.

I don't understand your patch: the change is to increment num_in_char in
exactly the cases where it is deliberately set to 0 later to reflect the
fact we've now got a complete character and the effect is included in
num instead.

Somebody complained about this function a couple of months ago and I
explained, then, too; it suggests it needs some more comments, so I've
added some.

It may be the real difficulty is with the API, in which case you'll need
to say (in words, not videos, please) what you're expecting.  As long as
this is consistently dealt with in callers of the function it might be
possible to change --- I guess you're only worried about the case for
returning a width, which is uncommon in the code and indeed doesn't
really have a well defined result for incomplete/invalid characters.
Maybe you have a particular strategy in mind.

Consequently I haven't looked at your other patch.

pws

diff --git a/Src/utils.c b/Src/utils.c
index d131383..45f8286 100644
--- a/Src/utils.c
+++ b/Src/utils.c
@@ -5179,6 +5179,17 @@ mb_metastrlenend(char *ptr, int width, char *eptr)
 	ret = mbrtowc(&wc, &inchar, 1, &mb_shiftstate);
 
 	if (ret == MB_INCOMPLETE) {
+	    /*
+	     * "num_in_char" is only used for incomplete characters.  The
+	     * assumption is that we will output this octet as a single
+	     * character (of single width) if we don't get a complete
+	     * character; if we do get a complete character, num_in_char
+	     * becomes irrelevant and is set to zero.
+	     *
+	     * This is in contrast to "num" which counts the characters
+	     * or widths in complete characters.  The two are summed,
+	     * so we don't count characters twice.
+	     */
 	    num_in_char++;
 	} else {
 	    if (ret == MB_INVALID) {



Messages sorted by: Reverse Date, Date, Thread, Author