Zsh Mailing List Archive
Messages sorted by: Reverse Date, Date, Thread, Author

Idea for optimization (use case: iterate string with index parameter)



Hello,
iterating string with index parameter is quite slow, because unicode characters are skipped and counted using mbrtowc(). This function takes much CPU time. It is called theta(0.5 * n^2) times when iterating string of length n. The goal is to make this theta(n).

The idea is to avoid calling mbrtowc by holding array of 5-10 structures with:
- pointer to beginning of string
- requested index (request to getarg() / getindex())
- computed in-string pointer for that index (by getarg() / getindex())

In general, the array would hold #N (5-10 or so) last string-index requests. If new request would target the same string, but index greater by 1, getarg() would call mbrtowc() once (via MB_METACHARLEN macro) reusing the previous in-string pointer.

So the idea is to catch not too large loops (using not too many scalar indexing) into the cache. For example, I saw z-sy-h uses such loops, my projects sometimes use them too. The point is that iterating a string and doing something with letters, e.g. counting brackets, is a very common use case, and the optimization would be triggered often.

I'm sharing as maybe there's better way to do this and I'm not sure when I'll get time to write this, and maybe someone has time and energy to do it. It would solve the rather shocking revelation of a typical C programmer, that strings are iterated like lists :)
--  
Sebastian Gniazdowski
psprint /at/ zdharma.org



Messages sorted by: Reverse Date, Date, Thread, Author