Zsh Mailing List Archive

Re: Issues with fcntl() history file locking



On Wed, 2019-02-27 at 13:27 -0800, Bart Schaefer wrote:
> On Wed, Feb 27, 2019 at 10:31 AM Philippe Troin <phil@xxxxxxxx> wrote:
> > I've been using zsh with share_history for many years and never had any
> > real issues on several networks where my home directory is mounted over
> > NFS.  Recently, it's been giving me trouble, maybe when I bumped up my
> > history file size to 10k entries.
> > 
> > I then discovered hist_fcntl_lock, which I had not ever set, and turned
> > it on.  It didn't improve anything.
> 
> Well, it wouldn't ... in fact it would likely make things worse.
> flock() historically doesn't work reliably over NFS, and if you turn
> that option on you are disabling the symlink-based file locking that
> is usually more NFS-friendly.  We used to do both kinds of locking
> when hist_fcntl_lock, but workers/32580 reverted to using only one
> kind ... I forget why I was asked to do that, probably something not
> working as fast as was desired.

Not necessarily worse.
While you're right that (BSD) flock() never worked correctly on NFS,
that is not the case with POSIX fcntl() locks.  Zsh uses the latter,
even though the zsh functions are named flock*.
Also, locking the file with fcntl() clears the NFS attribute cache for
that inode, making sure that you get the latest data.
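
To make the distinction concrete, here is a minimal sketch of a POSIX
fcntl() lock around an append, written in Python for brevity.  This is
not zsh's code: the function name append_locked and the path argument
are made up for the illustration.  Python's fcntl.lockf() is a wrapper
around the fcntl() locking calls, not BSD flock().

```python
import fcntl
import os

def append_locked(path, line):
    """Append one line to `path` while holding a POSIX fcntl() write lock.

    Hypothetical helper for illustration only; not how zsh does it.
    """
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    try:
        # LOCK_EX without LOCK_NB blocks until the whole-file lock is granted.
        fcntl.lockf(fd, fcntl.LOCK_EX)
        os.write(fd, (line + "\n").encode())
    finally:
        # Closing the descriptor would drop the lock anyway;
        # the explicit unlock is only there for clarity.
        fcntl.lockf(fd, fcntl.LOCK_UN)
        os.close(fd)
```

Calling append_locked("/tmp/histfile", "ls -l") twice from different
processes should serialize the two writes, assuming the filesystem
honours fcntl() locks.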

> > Unfortunately, POSIX states that the fcntl() lock will be released upon
> > closing the first descriptor to the file.  [...and thus...]
> > 
> >  * writehistfile writes the history file without lock
> 
> If that were the problem, you'd be likely to see corrupted entries
> (the read stopping somewhere in the middle of what's being written) or
> problems when both shells were writing to the file, which would also
> likely manifest as corrupted entries.
> 
> Do the entries from terminal 1 NEVER show up in the file?  Are they in
> the file but never show up in the history of terminal 2?  Or are they
> just slow to arrive in terminal 2?
> 
> I'd be more inclined to suspect async NFS issues rather than locking.
> Have you strace'd both processes to see when writes v. reads are
> happening?

The history file never gets corrupted.  What I'm experiencing is loss
of sync for a while.  New commands on host1 never seem to appear (or
take a long time to appear) on host2.
Given this happens randomly, it's hard to catch zsh in the act.
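
For reference, the POSIX behaviour quoted above (the lock goes away as
soon as the process closes a descriptor to the file) can be reproduced
outside zsh.  The following is a rough sketch on a local temporary file,
not on NFS; the helper names are invented for the demonstration.  A
forked child probes whether the parent's lock is still in place.

```python
import fcntl
import os
import tempfile

def other_process_can_lock(path):
    """Fork a child and report whether it can take the lock without blocking."""
    pid = os.fork()
    if pid == 0:
        # Child: fcntl() locks are per-process, so it competes with the parent.
        status = 1
        try:
            fd = os.open(path, os.O_RDWR)
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            status = 0  # lock acquired: the parent no longer holds it
        except OSError:
            status = 1  # lock refused: the parent still holds it
        finally:
            os._exit(status)
    _, wait_status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(wait_status) == 0

def lock_state_around_extra_close():
    """Return (held_before, held_after) around closing a second descriptor."""
    fd, path = tempfile.mkstemp()
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX)
        held_before = not other_process_can_lock(path)
        # Open and close another descriptor to the same file; the
        # descriptor holding the lock (fd) is left untouched.
        extra = os.open(path, os.O_RDONLY)
        os.close(extra)
        held_after = not other_process_can_lock(path)
        return held_before, held_after
    finally:
        os.close(fd)
        os.unlink(path)
```

On a POSIX-conforming local filesystem I would expect this to return
(True, False): the lock is held at first, and gone after the unrelated
close().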

> > The right and hard way is to have the various calls to open() the
> > history file to actually use the flock_fd lock file descriptor (and not
> > close it when done with it, leaving that to unlockhistfile()).
> > 
> > The easy messy way is to keep track of all the open descriptors to the
> > history file in a global variable, and delaying the actual close until
> > unlockhistfile() is called.
> 
> If this actually turns out to be necessary, the second way is more
> similar to how we handle descriptors in other parts of the shell.

I'll do further experiments.

This is my current hunch:  everything is swell as long as lines are
appended to the history file.  But when one host decides it's time to
trim the history file, that is when stuff hits the fan.  If someone has
an idea on how to force zsh to trim the history file reliably, I'm all
ears.

Phil.


