Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Re: TRAPINT doesn't work reliably



On Thu, 2019-09-26 at 17:10 +0000, Dennis Schwartz wrote:
> I don't fully understand what you mean with "the logic where you're
> defining TRAPINT," but I have the following code in my `.zshrc`:
> 
>     function TRAPINT {
>         VIMODE="$VIINS"
>         print $1  # for debug only
>         return $(( 128 + $1 ))
>     }

I was just wondering if there's more structure than that around,
but I think I was reading too much into what I suspect (see below) is
actually irrelevant information.

> I did manage to capture the bug with valgrind on `master` using the
> following sequence of commands (output tidied):
> 
> $ git checkout master
> $ git checkout zsh-5.7.1 -- Config/version.mk
> $ ./configure --enable-zsh-debug && make && sudo make install
> $ valgrind --leak-check=full --log-file=zsh-valgrind.log /usr/local/bin/zsh
> /usr/share/zsh-antigen/antigen.zsh:2134: parse error near `\n'
> $ ll
> TRAPINT:1: not an identifier:

Thanks, this is exactly what I was asking for.

Obviously TRAPINT is getting screwed up somehow.  Unfortunately, I
think the damage may have been done too early for this to tell us where.

> Invalid read of size 1
>    at 0x4838CC2: __strlen_sse2 (vg_replace_strmem.c:462)
>    by 0x1B0792: dupstring (string.c:39)
>    by 0x19BC70: ecgetstr (parse.c:2809)
>    by 0x144095: addvars (exec.c:2429)
>    by 0x1404DB: execsimple (exec.c:1237)
>    by 0x140A85: execlist (exec.c:1378)
>    by 0x14038F: execode (exec.c:1194)
>    by 0x14DCB0: runshfunc (exec.c:5980)
>    by 0x14D2E8: doshfunc (exec.c:5830)
>    by 0x1AF4D1: dotrapargs (signals.c:1371)
>    by 0x1AFA8F: dotrap (signals.c:1487)
>    by 0x1AF18C: handletrap (signals.c:1202)

This is saying it's trying to execute your trap.  It's getting into
trouble when it's trying to read in the variable assignment from the
trap.  Either that's the VIMODE="$VIINS" chunk that's been messed up,
or it's already got confused and is guessing what's going on.
I would suspect that actually the main function structure is still
there, since it's otherwise quite unlikely to negotiate the exec
hierarchy down to addvars().  However, it's possible it's also been
erroneously freed but malloc has only grabbed the assignment part of
it for reuse so far.

Does removing that assignment make a difference?  That's just for
testing, obviously.  But given the shell is clearly trying to do an
assignment and that's gone AWOL, it might tell us something.  (If, for
example, the error now occurs somewhere a bit later it might indicate
that indeed the entire function has been freed and malloc() is
repurposing the memory piecemeal.)
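
For that test, the trap would reduce to something like the sketch below:
the function quoted above with only the VIMODE assignment dropped.  It is
written in the name() form, which zsh treats the same as
`function TRAPINT`; nothing else is meant to differ.

```shell
# Test-only variant of the quoted TRAPINT: the VIMODE="$VIINS"
# assignment is removed, the rest is unchanged.
TRAPINT() {
    print $1  # for debug only
    return $(( 128 + $1 ))
}
```

If the "not an identifier" message goes away but something else in the
function fails instead, that would point at the whole function body
rather than just the assignment.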

>  Address 0x566b948 is 0 bytes after a block of size 328 free'd
>    at 0x48369AB: free (vg_replace_malloc.c:530)
>    by 0x13D8F3: zcontext_restore_partial (context.c:108)
>    by 0x13DA56: zcontext_restore (context.c:119)
>    by 0x175A04: parse_subscript (lex.c:1697)
>    by 0x18B7F1: getindex (params.c:1858)
>    by 0x18C132: fetchvalue (params.c:2106)
>    by 0x1B6304: paramsubst (subst.c:2516)
>    by 0x1B1DB9: stringsubst (subst.c:322)
>    by 0x1B1108: prefork (subst.c:142)
>    by 0x14486C: execsubst (exec.c:2570)
>    by 0x1772E9: execfor (loop.c:98)
>    by 0x148469: execcmd_exec (exec.c:3913)
>  Block was alloc'd at
>    at 0x483577F: malloc (vg_replace_malloc.c:299)
>    by 0x13D5D6: zcontext_save_partial (context.c:58)
>    by 0x13D7E9: zcontext_save (context.c:82)
>    by 0x1758A7: parse_subscript (lex.c:1661)
>    by 0x18B7F1: getindex (params.c:1858)
>    by 0x18C132: fetchvalue (params.c:2106)
>    by 0x1B6304: paramsubst (subst.c:2516)
>    by 0x1B1DB9: stringsubst (subst.c:322)
>    by 0x1B1108: prefork (subst.c:142)
>    by 0x14486C: execsubst (exec.c:2570)
>    by 0x1772E9: execfor (loop.c:98)
>    by 0x148469: execcmd_exec (exec.c:3913)

So this part is saying that when we performed a substitution we had to
save and restore some memory, and we used the chunk that valgrind
reported the error on.  In other words, it had apparently been freed
somewhere else already, so malloc() just grabbed it.  So I don't think
the code being executed here is actually relevant to the original
problem; it's just the unlucky victim that got a chunk that shouldn't
have been freed in the first place.

Unfortunately this doesn't tell us where that happened.  But it does
look like it was actually freed, i.e. the problem isn't that something
is stomping on memory owned by something else, it's that the memory was
erroneously given back to the system.  (At least, that's the simple
interpretation.)
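
One thing that might help narrow down the earlier free is to rerun under
valgrind with deeper stack traces and a larger quarantine of freed
blocks.  Both traces above stop at exactly 12 frames, which is where
`--num-callers` cuts them off by default, and `--freelist-vol` makes
memcheck hold freed blocks back from reuse for longer.  This is a sketch
only: the wrapper name and the two numeric values are arbitrary choices
for illustration, and the zsh path and log file name are the ones from
the quoted session.

```shell
# Hypothetical wrapper; the numbers are illustrative, not tuned.
zsh_under_valgrind() {
    valgrind --leak-check=full \
        --num-callers=40 \
        --freelist-vol=200000000 \
        --log-file=zsh-valgrind.log \
        /usr/local/bin/zsh
}
```

With more frames visible it may be possible to see what was driving
execfor() when the block was freed.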

Not sure quite where to go from here --- but at least we have something
that's reproducible, which is quite good by the standards of memory
errors.  I think we'll need to add something to the code you're using
that marks the memory in the TRAPINT somehow.  I'll need to think what
seems propitious...

First simple step might be to see if the shell is indeed freeing the
TRAPINT() function code at some point.  That shouldn't be so hard to
find out but it'll need a bit of confection.
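
In the meantime, a rough check from the user side (no changes to the C
source) might be to snapshot the printed definition of TRAPINT and
compare it before each prompt.  This is only a sketch under assumptions:
trapint_snapshot and trapint_check are made-up helper names, and since
the comparison is on the text that `typeset -f` regenerates from the
stored function, it may not notice memory that has been freed but not
yet overwritten.

```shell
# Hypothetical helpers, not part of zsh.  typeset -f prints the
# current definition of the named function.
trapint_snapshot() {
    trapint_saved=$(typeset -f TRAPINT)
}

trapint_check() {
    # Return non-zero if the definition no longer matches the snapshot.
    if [ "$(typeset -f TRAPINT)" != "$trapint_saved" ]; then
        echo "TRAPINT definition changed or vanished" >&2
        return 1
    fi
}

# In .zshrc, after TRAPINT has been defined:
#   trapint_snapshot
#   precmd_functions+=(trapint_check)
```

If the message shows up before the first Ctrl-C, that would at least
bracket when the function code gets clobbered.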

cheers
pws


