Zsh Mailing List Archive
Messages sorted by:
Reverse Date,
Date,
Thread,
Author
PATCH: fix bug with packing strings into wordcode
- X-seq: zsh-workers 54104
- From: Oliver Kiddle <opk@xxxxxxx>
- To: Zsh workers <zsh-workers@xxxxxxx>
- Subject: PATCH: fix bug with packing strings into wordcode
- Date: Sun, 23 Nov 2025 07:51:07 +0100
- Archived-at: <https://zsh.org/workers/54104>
- List-id: <zsh-workers.zsh.org>
In ecstrcode(), strings are packed up for wordcode by putting them in a
binary search tree to identify duplicates and track offsets. For this
we don't care about sorted order, we just want values to divide 50-50
between left and right in a reproducible way. While debugging something
else, I noticed that the same string was being stored twice where the
hash value was high.
In 41402: a call to hasher() was added as an optimisation. hasher()
returns unsigned but it was stored as an int. Then separately in 44122,
given -fsanitize=undefined diagnostics indicating integer overflows
and undefined behaviour, the two values were cast to long.
Storing it in an int caused it to be be negative and after casting to
long, it would stay negative where the same value from the fresh hash
would be positive because the long cast would come from an unsigned.
After this, the same hash value would not compare equal.
I don't think the long cast is a correct solution on a typical 32-bit
system anyway because long can be the same size as int.
The following sticks with unsigned. We don't care about subtractions
causing under/overflows in this case and my understanding is that for
unsigned they are not considered undefined behaviour. Correct me if
I'm wrong but the final cast to int should be ok too.
Repeating the basic tests from 41402 with repeat and time, this shows a
very slight speed-up. If memory use matters more, substrings could be
reused where they share a common suffix.
Oliver
diff --git a/Src/parse.c b/Src/parse.c
index ab708a279..2b7f8bb59 100644
--- a/Src/parse.c
+++ b/Src/parse.c
@@ -446,7 +446,9 @@ ecstrcode(char *s)
long cmp;
for (pp = &ecstrs; (p = *pp); ) {
- if (!(cmp = p->nfunc - ecnfunc) && !(cmp = (((long)p->hashval) - ((long)val))) && !(cmp = strcmp(p->str, s))) {
+ if (!(cmp = p->nfunc - ecnfunc) &&
+ !(cmp = (int) (p->hashval - val)) &&
+ !(cmp = strcmp(p->str, s))) {
/* Re-use the existing string. */
return p->offs;
}
diff --git a/Src/zsh.h b/Src/zsh.h
index b82c35aea..fb7f6165d 100644
--- a/Src/zsh.h
+++ b/Src/zsh.h
@@ -850,7 +850,7 @@ struct eccstr {
int nfunc;
/* Hash of str. */
- int hashval;
+ unsigned hashval;
};
/*
Messages sorted by:
Reverse Date,
Date,
Thread,
Author