Zsh Mailing List Archive

Re: Performance of _store_cache and _retrieve_cache



-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I have profiled this a bit, using:

    % valgrind --tool=callgrind --dump-instr=yes --simulate-cache=yes --collect-jumps=yes Src/zsh -f
    ==1513== Callgrind, a call-graph generating cache profiler
    ==1513== Copyright (C) 2002-2013, and GNU GPL'd, by Josef Weidendorfer et al.
    ==1513== Using Valgrind-3.10.0.SVN and LibVEX; rerun with -h for copyright info
    ==1513== Command: Src/zsh -f
    ==1513== 
    --1513-- warning: L3 cache found, using its data for the LL simulation.
    ==1513== For interactive control, run 'callgrind_control -h'.
    lenny% source ~/.zcompcache/pip_allpkgs.slow  
    lenny% 
    ==1513== 
    ==1513== Events    : Ir Dr Dw I1mr D1mr D1mw ILmr DLmr DLmw
    ==1513== Collected : 16367431118 6050585484 1493446409 47422 484447632 867383 8162 142249 187982
    ==1513== 
    ==1513== I   refs:      16,367,431,118
    ==1513== I1  misses:            47,422
    ==1513== LLi misses:             8,162
    ==1513== I1  miss rate:            0.0%
    ==1513== LLi miss rate:            0.0%
    ==1513== 
    ==1513== D   refs:       7,544,031,893  (6,050,585,484 rd + 1,493,446,409 wr)
    ==1513== D1  misses:       485,315,015  (  484,447,632 rd +       867,383 wr)
    ==1513== LLd misses:           330,231  (      142,249 rd +       187,982 wr)
    ==1513== D1  miss rate:            6.4% (          8.0%   +           0.0%  )
    ==1513== LLd miss rate:            0.0% (          0.0%   +           0.0%  )
    ==1513== 
    ==1513== LL refs:          485,362,437  (  484,495,054 rd +       867,383 wr)
    ==1513== LL misses:            338,393  (      150,411 rd +       187,982 wr)
    ==1513== LL miss rate:             0.0% (          0.0%   +           0.0%  )
    valgrind --tool=callgrind --dump-instr=yes --simulate-cache=yes  Src/zsh -f  491,95s user 0,33s system 92% cpu 8:52,36 total


A screenshot of kcachegrind displaying the hot spot is available at:
http://i.imgur.com/8ntTLUQ.png

I can also provide the callgrind.out.1513 file itself, if that helps.

From Src/parse.c, line 390:

        for (pp = &ecstrs; (p = *pp); ) {

The following condition is never true, yet it is the most expensive part:

            if (!(cmp = p->nfunc - ecnfunc) && !(cmp = strcmp(p->str, s)))
          > 286166892 call(s) to '__strcmp_ssse3' (libc-2.19.so: strcmp.S)
          > Jumping 286 166 892 times to parse.c:393 with 286 166 892 executions
              return p->offs;
            pp = (cmp < 0 ? &(p->left) : &(p->right));
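
To illustrate why a lookup that never matches can get this expensive, here is a minimal sketch of the same loop shape with a comparison counter. It is not the zsh code: `struct node`, `lookup_or_insert`, `free_tree` and `count_sorted_inserts` are hypothetical names, the `nfunc` test is left out, and the sketch assumes that the keys arrive in sorted order and that nothing rebalances the tree. Neither assumption is confirmed by the profile above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the node type walked by the loop above;
 * only the fields the lookup needs are modelled. */
struct node {
    struct node *left, *right;
    const char *str;
    int offs;
};

static unsigned long ncmp;      /* strcmp() calls made so far */

/* Walk down from the root as the excerpt does; insert s if it is absent. */
static int lookup_or_insert(struct node **root, const char *s, int offs)
{
    struct node **pp, *p;
    int cmp;

    for (pp = root; (p = *pp); ) {
        ncmp++;
        if (!(cmp = strcmp(p->str, s)))
            return p->offs;
        pp = (cmp < 0 ? &p->left : &p->right);
    }
    p = malloc(sizeof *p);
    assert(p != NULL);
    p->left = p->right = NULL;
    p->str = s;
    p->offs = offs;
    *pp = p;
    return offs;
}

/* Free the tree without recursion (rotate left children up, then free),
 * so that a very deep, list-like tree cannot overflow the stack. */
static void free_tree(struct node *p)
{
    struct node *q;

    while (p) {
        if (p->left) {
            q = p->left;
            p->left = q->right;
            q->right = p;
            p = q;
        } else {
            q = p->right;
            free(p);
            p = q;
        }
    }
}

/* Insert n distinct keys in ascending order (an assumption of this sketch)
 * and return the number of strcmp() calls that took. */
unsigned long count_sorted_inserts(int n)
{
    struct node *root = NULL;
    char (*keys)[16];
    int i;

    if (n <= 0)
        return 0;
    keys = malloc((size_t)n * sizeof *keys);
    assert(keys != NULL);
    ncmp = 0;
    for (i = 0; i < n; i++) {
        snprintf(keys[i], sizeof keys[i], "pkg%08d", i);
        lookup_or_insert(&root, keys[i], i);
    }
    free_tree(root);
    free(keys);
    return ncmp;
}
```

Under those assumptions every new key walks past all earlier ones, so `count_sorted_inserts(n)` returns n*(n-1)/2, for example 499500 for n = 1000. Quadratic growth of this kind would be one possible explanation for strcmp dominating the profile, but I have not verified that this is what happens here.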

Thanks,
Daniel.

On 08.02.2015 17:19, Daniel Hahler wrote:
> Hi,
> 
> I've noticed that the completion system's cache mechanism
> (_retrieve_cache and _store_cache) is slow with large lists (~50000 entries).
> 
> _store_cache saves the array like this:
> 
>     _zsh_all_pkgs=( '02exercicio' '0x10c-asm'  ... )
> 
> and _retrieve_cache then sources it from a file.
> 
> The problem is that `source ./pip_allpkgs.slow` takes about 8 seconds,
> and is slower than generating the list anew!
> 
> 
> After converting the list to one entry per line, the following is much
> faster (less than a second):
> 
>    _zsh_all_pkgs=(${(f)"$(<pip_allpkgs)"})
> 
> The same holds when reading the "formatted"/"typed" source file, i.e.
> the slow list, as-is:
> 
>    _zsh_all_pkgs=(${$(<pip_allpkgs.slow)})
> 
> 
> The initial list is generated using:
> 
>       _zsh_all_pkgs=( $(curl -s https://pypi.python.org/simple/ \
>         | sed -n '/<a href/ s/.*>\([^<]\{1,\}\).*/\1/p' \
>         | tr '\n' ' ') )
> 
> 
> Regards,
> Daniel.
> 
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iD8DBQFU16qtfAK/hT/mPgARAgNAAJ9y//ybvVz0MPwu9XzxC/6/js2PSACeLgQp
vWt3CCIPbOOeaD0+I0flWWg=
=5aY6
-----END PGP SIGNATURE-----


