Zsh Mailing List Archive

Re: Memory usage of history?

25.06.2016, 04:49, "Bart Schaefer" <schaefer@xxxxxxxxxxxxxxxx>:
> On Fri, Jun 24, 2016 at 6:47 AM, Dominik Vogt <vogt@xxxxxxxxxxxxxxxxxx> wrote:
>>  (A colleague
>>  says his zshs use 200 MB memory each with a history size of a
>>  million lines).
> To expand on Eric's answer, zsh reads the entire $HISTFILE and retains
> the last $HISTSIZE entries. So a large $HISTFILE also slows down
> startup, even if it doesn't consume lots of memory.
> I can't imagine anyone having a million useful lines of history. A
> few tens of thousands at most. Things he might consider that would
> allow him to reduce SAVEHIST and/or HISTSIZE without losing too much
> information:
> * Set the hist_ignore_all_dups option, if he doesn't already.
> * Set the hist_save_no_dups option, similarly.
> * Define a zshaddhistory function to filter out commands that are
> unlikely to be used again.
> If he isn't already ignoring / not saving duplicates, an interesting
> experiment might be to add hist_ignore_all_dups without changing
> HISTSIZE, then run zsh and see how many lines of history actually end
> up being retained.

Actually there may be a better solution: consider the case where zsh

1. allows saving user-defined metadata in the history file, and
2. gives the user control over exactly what gets removed.

Specifically, the first could be used to save information about

1. How often the command is used (total number of uses; anything else, like “uses per month”, would be harder to determine).
2. How long the command took to type when it was typed for the first time (the time between the first self-insert, or the first $*BUFFER modification if it was constructed by a widget, and accept-line).
3. When the command was last run.
4. How long the command took to finish (averaged over all runs).
5. What the exit codes were (a hash mapping each exit code to the number of times it occurred).

The second is supposed to be a function, say `zshhistkey`, that does basically the same job as the function used for `(o+)`: it accepts a history entry with its attached metadata (passed through arguments or via a local parameter that is an associative array; metadata saved by EXTENDED_HISTORY should be passed as well) and stores a sort key in $REPLY. History entries with the smallest values in $REPLY are removed first.

On this basis it would be possible to construct a more useful filter. I guess the first three would be enough: when removing history lines, find the least often used and then the fastest to type commands and remove them first, but always keep commands run during the last hour: `zshhistkey() { REPLY="$(printf "%u-%020u-%020.2g" $[$(date +%s) - $metadata[last_run_time] < 60 * 60 ? 1 : 0] $metadata[num_runs] $metadata[type_duration])" }`. EXTENDED_HISTORY already provides 4 (though I do not think it provides the “average”) and 3, but I do not find that very useful (especially 4; 3 is needed to protect the most recent commands).
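
Spelled out a bit more, the idea could be sketched like this. Everything here is hypothetical: zsh has no `zshhistkey` hook, and the `metadata` array, its field names, the sample entries and the `prune_order` driver are made up for illustration. The duration is printed with `%.2f` rather than `%g` so the field can never switch to exponent notation, which would break plain string ordering.

```shell
# Hypothetical sketch: zsh does not call a function named zshhistkey, and the
# metadata array and its field names are assumptions made for this example.
typeset -A metadata

zshhistkey() {
  local now=${1:-$(date +%s)}   # "now" can be passed in to make runs repeatable
  local recent=0
  # Protect anything that was run during the last hour.
  if (( now - ${metadata[last_run_time]} < 60 * 60 )); then recent=1; fi
  # Fixed-width, zero-padded fields: comparing the keys as strings then gives
  # the same order as comparing (recent, num_runs, type_duration) in turn.
  REPLY=$(printf '%u-%020u-%020.2f' "$recent" \
    "${metadata[num_runs]}" "${metadata[type_duration]}")
}

# Made-up entries: "command|num_runs|type_duration|last_run_time".
now=1466800000
entries=(
  'ls|120|0.80|1466700000'
  'ffmpeg -i in.mkv -c:v libx264 out.mp4|2|14.30|1466000000'
  'echo test|1|1.10|1466799000'
)

# Toy driver: print "key<TAB>command", smallest key (first to be removed) first.
prune_order() {
  local e cmd runs dur last
  for e in "${entries[@]}"; do
    IFS='|' read -r cmd runs dur last <<< "$e"
    metadata[num_runs]=$runs
    metadata[type_duration]=$dur
    metadata[last_run_time]=$last
    zshhistkey "$now"
    printf '%s\t%s\n' "$REPLY" "$cmd"
  done | LC_ALL=C sort
}

prune_order
```

With these sample values the rarely used ffmpeg line sorts first and would be dropped first, while `echo test` sorts last because it was run within the hour before `now`, even though it was only used once.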

Without something like this, the “set $HISTSIZE and $SAVEHIST to a rather large number” strategy (in addition to the options you mentioned) is the best option; I personally have both set to 65536. I have no idea how one could construct a “zshaddhistory function” that “filters out commands that are unlikely to be used again” without somehow knowing what these stats will be in the future.
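
As a minimal sketch of that strategy, a ~/.zshrc fragment combining those sizes with the options from the quoted message could look like the following. The HISTFILE path is an assumption, and extended_history is only included because its timestamps are referred to above.

```zsh
HISTFILE=~/.zsh_history       # assumed location; keep whatever is already in use
HISTSIZE=65536                # entries kept in memory
SAVEHIST=65536                # entries written back to $HISTFILE
setopt hist_ignore_all_dups   # a new entry evicts older duplicates of itself
setopt hist_save_no_dups      # do not write duplicates to the history file
setopt extended_history       # also save start time and duration per entry
```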
