Diffstat (limited to 'doc/doc-misc/Ext-mbx-locking')
-rw-r--r-- doc/doc-misc/Ext-mbx-locking 400
1 files changed, 400 insertions, 0 deletions
diff --git a/doc/doc-misc/Ext-mbx-locking b/doc/doc-misc/Ext-mbx-locking
new file mode 100644
index 000000000..f1b0523f6
--- /dev/null
+++ b/doc/doc-misc/Ext-mbx-locking
@@ -0,0 +1,400 @@
+ UNIX Advisory File Locking Implications on c-client
+ Mark Crispin, 28 November 1995
+
+
+ THIS DOCUMENT HAS BEEN UPDATED TO REFLECT THE CODE IN THE
+ IMAP-4 TOOLKIT AS OF NOVEMBER 28, 1995. SOME STATEMENTS
+ IN THIS DOCUMENT DO NOT APPLY TO EARLIER VERSIONS OF THE
+ IMAP TOOLKIT.
+
+INTRODUCTION
+
+ Advisory locking is a mechanism by which cooperating processes
+can signal to each other their usage of a resource and whether or not
+that usage is critical. It is not a mechanism to protect against
+processes which do not cooperate in the locking.
+
+ The most basic form of locking involves a counter. This counter
+is -1 when the resource is available. If a process wants the lock, it
+executes an atomic increment-and-test-if-zero. If the value is zero,
+the process has the lock and can execute the critical code that needs
+exclusive usage of a resource. When it is finished, it sets the lock
+back to -1. In C terms:
+
+ while (++lock) /* try to get lock */
+ invoke_other_threads (); /* failed, try again */
+ .
+ . /* critical code here */
+ .
+ lock = -1; /* release lock */
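+
+ As a hedged illustration only, the same increment-and-test can be
+written with C11 atomics, which make the atomicity explicit. The
+names lock_counter, try_lock, and unlock are hypothetical and appear
+nowhere in c-client; this is a sketch, not toolkit code.
+
```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical lock word: -1 means the resource is available. */
static atomic_int lock_counter = -1;

/* One attempt: atomically increment, then test whether the result is zero.
   Returns 1 if the caller now holds the lock, 0 otherwise. */
static int try_lock(void)
{
    /* atomic_fetch_add returns the old value, so old + 1 is the new value. */
    return atomic_fetch_add(&lock_counter, 1) + 1 == 0;
}

/* Release the lock by resetting the counter to "available". */
static void unlock(void)
{
    /* While the lock is held the counter is never negative. */
    assert(atomic_load(&lock_counter) >= 0);
    atomic_store(&lock_counter, -1);
}
```
+
+ A caller would loop on try_lock(), yielding to other threads
+between attempts, and call unlock() after its critical code.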
+
+ This particular form of locking appears most commonly in
+multi-threaded applications such as operating system kernels. It
+makes several presumptions:
+ (1) it is all right to keep testing the lock (the counter will not overflow)
+ (2) the critical resource is single-access only
+ (3) there is shared writeable memory between the two threads
+ (4) the threads can be trusted to release the lock when finished
+
+ In applications programming on multi-user systems, most commonly
+the other threads are in an entirely different process, which may even
+be logged in as a different user. Few operating systems offer shared
+writeable memory between such processes.
+
+ A means of communicating this is by use of a file with a mutually
+agreed upon name. A binary semaphore can be passed by means of the
+existence or non-existence of that file, provided that there is an
+atomic means to create a file if and only if that file does not exist.
+In C terms:
+
+ /* try to get lock */
+ while ((fd = open ("lockfile",O_WRONLY|O_CREAT|O_EXCL,0666)) < 0)
+ sleep (1); /* failed, try again */
+ close (fd); /* got the lock */
+ .
+ . /* critical code here */
+ .
+ unlink ("lockfile"); /* release lock */
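+
+ A slightly more defensive sketch of the same idea follows. It
+bounds the number of retries and gives up at once on errors other
+than EEXIST, since retrying cannot cure a protection failure. The
+function names and the retry policy are assumptions made for
+illustration.
+
```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Try up to `tries` times, one second apart, to create the lock file.
   Returns 0 once the file has been created, -1 if it still exists after
   the last try or if open() fails for any other reason. */
static int lockfile_acquire(const char *path, int tries)
{
    assert(path != NULL && tries > 0);
    for (int i = 0; i < tries; i++) {
        int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0666);
        if (fd >= 0) {
            close(fd);              /* the file's existence is the lock */
            return 0;
        }
        if (errno != EEXIST)        /* e.g. EACCES: retrying will not help */
            return -1;
        if (i + 1 < tries)
            sleep(1);               /* lock is held; wait and try again */
    }
    return -1;
}

/* Release the lock by removing the file. */
static int lockfile_release(const char *path)
{
    return unlink(path);
}
```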
+
+ This form of locking makes fewer presumptions, but it still is
+guilty of presumptions (2) and (4) above. Presumption (2) limits the
+ability to have processes sharing a resource in a non-conflicting
+fashion (e.g. reading from a file). Presumption (4) leads to
+deadlocks should the process crash while it has a resource locked.
+
+ Most modern operating systems provide a resource locking system
+call that has none of these presumptions. In particular, a mechanism
+is provided for identifying shared locks as opposed to exclusive
+locks. A shared lock permits other processes to obtain a shared lock,
+but denies exclusive locks. In other words:
+
+ current state want shared want exclusive
+ ------------- ----------- --------------
+ unlocked YES YES
+ locked shared YES NO
+ locked exclusive NO NO
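+
+ The table above can be restated as a small predicate. The enum
+and function names are hypothetical; the logic is just the table.
+
```c
#include <assert.h>

enum lock_state   { UNLOCKED, LOCKED_SHARED, LOCKED_EXCLUSIVE };
enum lock_request { WANT_SHARED, WANT_EXCLUSIVE };

/* Returns 1 if the request can be granted in the given state, else 0. */
static int lock_grantable(enum lock_state state, enum lock_request want)
{
    switch (state) {
    case UNLOCKED:         return 1;                    /* anything goes */
    case LOCKED_SHARED:    return want == WANT_SHARED;  /* sharers only */
    case LOCKED_EXCLUSIVE: return 0;                    /* nobody else */
    }
    assert(!"unknown lock state");
    return 0;
}
```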
+
+ Furthermore, the operating system automatically relinquishes all
+locks held by that process when it terminates.
+
+ A useful operation is the ability to upgrade a shared lock to
+exclusive (provided there are no other shared users of the lock) and
+to downgrade an exclusive lock to shared. It is important that at no
+time is the lock ever removed; a process upgrading to exclusive must
+not relinquish its shared lock.
+
+ Most commonly, the resources being locked are files. Shared
+locks are particularly important with files; multiple simultaneous
+processes can read from a file, but only one can safely write at a
+time. Some writes may be safer than others; an append to the end of
+the file is safer than changing existing file data. In turn, changing
+a file record in place is safer than rewriting the file with an
+entirely different structure.
+
+
+FILE LOCKING ON UNIX
+
+ In the oldest versions of UNIX, the use of a semaphore lockfile
+was the only available form of locking. Advisory locking system calls
+were not added to UNIX until after the BSD vs. System V split. Both
+of these system calls deal with file resources only.
+
+ Most systems only have one or the other form of locking. AIX
+emulates the BSD form of locking as a jacket into the System V form.
+Ultrix and OSF/1 implement both forms.
+
+BSD
+
+ BSD added the flock() system call. It offers capabilities to
+acquire a shared lock, acquire an exclusive lock, and unlock. Optionally,
+the process can request an immediate error return instead of blocking
+when the lock is unavailable.
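+
+ A minimal sketch of the non-blocking form, assuming a system
+with a native flock(). The helper names are hypothetical. The
+second helper shows two independent opens of one file contending
+for the lock; this happens even inside a single process, because a
+flock() lock belongs to the open file rather than to the process.
+
```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Non-blocking lock attempt.  Returns 1 if the lock was acquired,
   0 if some other holder prevents it, -1 on any other error. */
static int try_flock(int fd, int exclusive)
{
    assert(fd >= 0);
    int op = (exclusive ? LOCK_EX : LOCK_SH) | LOCK_NB;
    if (flock(fd, op) == 0)
        return 1;
    return errno == EWOULDBLOCK ? 0 : -1;
}

/* Open `path` twice, lock the first descriptor, and report what a
   non-blocking attempt on the second descriptor returns. */
static int second_opener_result(const char *path, int first_excl,
                                int second_excl)
{
    int a = open(path, O_RDWR | O_CREAT, 0600);
    int b = open(path, O_RDWR);
    int r = -1;
    if (a >= 0 && b >= 0 && try_flock(a, first_excl) == 1)
        r = try_flock(b, second_excl);
    if (a >= 0) close(a);           /* closing also drops the lock */
    if (b >= 0) close(b);
    return r;
}
```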
+
+
+FLOCK() BUGS
+
+ flock() advertises that it permits upgrading of shared locks to
+exclusive and downgrading of exclusive locks to shared, but it does so
+by releasing the former lock and then trying to acquire the new lock.
+This creates a window of vulnerability in which another process can
+grab the exclusive lock. Therefore, this capability is not useful,
+although many programmers have been deluded by incautious reading of
+the flock() man page to believe otherwise. This problem can be
+programmed around, once the programmer is aware of it.
+
+ flock() always returns as if it succeeded on NFS files, when in
+fact it is a no-op. There is no way around this.
+
+ Leaving aside these two problems, flock() works remarkably well,
+and has shown itself to be robust and trustworthy.
+
+SYSTEM V/POSIX
+
+ System V added new functions to the fcntl() system call, and a
+simple interface through the lockf() subroutine. This was
+subsequently included in POSIX. Both offer the facility to apply the
+lock to a particular region of the file instead of to the entire file.
+lockf() only supports exclusive locks, and calls fcntl() internally;
+hence it won't be discussed further.
+
+ Functionally, fcntl() locking is a superset of flock(); it is
+possible to implement a flock() emulator using fcntl(), with one minor
+exception: it is not possible to acquire an exclusive lock if the file
+is not open for write.
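+
+ What such an emulator might look like is sketched below. This
+is not c-client's flock.c; the EMU_* flags and the function name are
+invented for the example. A whole-file lock is requested by giving
+a region that starts at offset zero and has zero length.
+
```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical operation flags, standing in for LOCK_SH and friends. */
enum { EMU_SH = 1, EMU_EX = 2, EMU_NB = 4, EMU_UN = 8 };

/* flock()-like call built on fcntl().  Returns 0 on success, -1 with
   errno set on failure. */
static int flock_via_fcntl(int fd, int op)
{
    struct flock fl = {0};

    assert(fd >= 0);
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;                   /* zero length covers the whole file */
    if (op & EMU_UN)
        fl.l_type = F_UNLCK;
    else if (op & EMU_EX)
        fl.l_type = F_WRLCK;        /* requires fd to be open for write */
    else if (op & EMU_SH)
        fl.l_type = F_RDLCK;
    else {
        errno = EINVAL;
        return -1;
    }
    /* F_SETLK fails immediately on conflict; F_SETLKW blocks. */
    return fcntl(fd, (op & EMU_NB) ? F_SETLK : F_SETLKW, &fl);
}
```
+
+ The exception noted above shows up here as an EBADF error when
+EMU_EX is requested on a descriptor that was opened read-only.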
+
+ The fcntl() locking functions are: query the lock status of a file
+region, lock/unlock a region, and lock/unlock a region and block until
+the lock is acquired. The locks may be shared or exclusive. By means
+of the statd and lockd daemons, fcntl() locking is available on NFS
+files.
+
+ When statd is started at system boot, it reads its /etc/state
+file (which contains the number of times it has been invoked) and
+/etc/sm directory (which lists all remote sites that are locking
+clients or servers of this site), and notifies the statd on
+each of these systems that it has been restarted. Each statd then
+notifies the local lockd of the restart of that system.
+
+ lockd receives fcntl() requests for NFS files. It communicates
+with the lockd at the server and requests it to apply the lock, and
+with the statd to request it for notification when the server goes
+down. It blocks until all these requests are completed.
+
+ There is quite a mythos about fcntl() locking.
+
+ One religion holds that fcntl() locking is the best thing since
+sliced bread, and that programs which use flock() should be converted
+to fcntl() so that NFS locking will work. However, as noted above,
+very few systems support both calls, so such an exercise is pointless
+except on Ultrix and OSF/1.
+
+ Another religion, which I adhere to, has the opposite viewpoint.
+
+
+FCNTL() BUGS
+
+ For all of the hairy code to do individual section locking of a
+file, it's clear that the designers of fcntl() locking never
+considered some very basic locking operations. It's as if all they
+knew about locking they got out of some CS textbook with no
+investigation of real-world needs.
+
+ It is not possible to acquire an exclusive lock unless the file
+is open for write. You could have append with shared read, and thus
+you could have a case in which a read-only access may need to go
+exclusive. This problem can be programmed around once the programmer
+is aware of it.
+
+ If the file is opened on another file designator in the same
+process, the file is unlocked even if no attempt is made to do any
+form of locking on the second designator. This is a very bad bug. It
+means that an application must keep track of all the files that it has
+opened and locked.
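+
+ POSIX states this rule in terms of close: when a process closes
+any descriptor for a file, all of that process's fcntl() locks on the
+file are released. The sketch below makes the effect observable by
+asking a child process whether the file still looks locked. All
+function names are hypothetical. On a system that follows the POSIX
+rule, lock_survives_second_descriptor() is expected to return 0.
+
```c
#include <assert.h>
#include <fcntl.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set (F_WRLCK, F_RDLCK) or clear (F_UNLCK) a whole-file lock on fd. */
static int whole_file_lock(int fd, short type)
{
    struct flock fl = {0};
    fl.l_type = type;
    fl.l_whence = SEEK_SET;         /* start 0, length 0: whole file */
    return fcntl(fd, F_SETLK, &fl);
}

/* Ask, from a child process, whether another process holds a lock on
   the file.  Returns 1 if locked, 0 if not, -1 on error. */
static int locked_as_seen_by_child(const char *path)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {                 /* child: fcntl() locks are not inherited */
        struct flock fl = {0};
        fl.l_type = F_WRLCK;
        fl.l_whence = SEEK_SET;
        int fd = open(path, O_RDWR);
        if (fd < 0 || fcntl(fd, F_GETLK, &fl) < 0)
            _exit(2);
        _exit(fl.l_type == F_UNLCK ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    int code = WEXITSTATUS(status);
    return code == 2 ? -1 : code;
}

/* Returns 1 if a lock taken on one descriptor is still held after a
   second descriptor for the same file is opened and closed, 0 if the
   lock was lost, -1 on error. */
static int lock_survives_second_descriptor(const char *path)
{
    assert(path != NULL);
    int fd1 = open(path, O_RDWR | O_CREAT, 0600);
    if (fd1 < 0)
        return -1;
    int held = -1;
    if (whole_file_lock(fd1, F_WRLCK) == 0 &&
        locked_as_seen_by_child(path) == 1) {   /* lock is visible at first */
        int fd2 = open(path, O_RDONLY);
        if (fd2 >= 0) {
            close(fd2);             /* no locking call was ever made on fd2 */
            held = locked_as_seen_by_child(path);
        }
    }
    close(fd1);
    return held;
}
```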
+
+ If there is no statd/lockd on the NFS server, fcntl() will hang
+forever waiting for them to appear. This is a bad bug. It means that
+any attempt to lock on a server that doesn't run these daemons will
+hang. There is no way for an application to request flock() style
+``try to lock, but no-op if the mechanism ain't there''.
+
+ There is a rumor to the effect that fcntl() will hang forever on
+local files too if there is no local statd/lockd. These daemons are
+running on mailer.u, although they appear not to have much CPU time.
+A useful experiment would be to kill them and see if imapd is affected
+in any way, but I decline to do so without an OK from UCS! ;-) If
+killing statd/lockd can be done without breaking fcntl() on local
+files, this would become one of the primary means of dealing with this
+problem.
+
+ The statd and lockd daemons have quite a reputation for extreme
+fragility. There have been numerous reports about the locking
+mechanism being wedged on a systemwide or even clusterwide basis,
+requiring a reboot to clear. It is rumored that this wedge, once it
+happens, also blocks local locking. Presumably killing and restarting
+statd would suffice to clear the wedge, but I haven't verified this.
+
+ There appears to be a limit to how many locks may be in use at a
+time on the system, although the documentation only mentions it in
+passing. On some of their systems, UCS has increased lockd's ``size
+of the socket buffer'', whatever that means.
+
+C-CLIENT USAGE
+
+ c-client uses flock(). On System V systems, flock() is simulated
+by an emulator that calls fcntl(). This emulator is either provided
+by the system (e.g. AIX) or is c-client's own flock.c module.
+
+
+BEZERK AND MMDF
+
+ Locking in the traditional UNIX formats was largely dictated by
+the status quo in other applications; however, additional protection
+is added against inadvertently running multiple instances of a
+c-client application on the same mail file.
+
+ (1) c-client attempts to create a .lock file (mail file name with
+``.lock'' appended) whenever it reads from, or writes to, the mail
+file. This is an exclusive lock, and is held only for short periods
+of time while c-client is actually doing the I/O. There is a 5-minute
+timeout for this lock, after which it is broken on the presumption
+that it is a stale lock. If it can not create the .lock file due to
+an EACCES (protection failure) error, it once silently proceeded
+without this lock; this was for systems which protect /usr/spool/mail
+from unprivileged processes creating files. Today, c-client reports
+an error unless it is built otherwise. The purpose of this lock is to
+protect against unfavorable interactions with mail delivery.
+
+ (2) c-client applies a shared flock() to the mail file whenever
+it reads from the mail file, and an exclusive flock() whenever it
+writes to the mail file. This lock is freed as soon as it finishes
+reading or writing. The purpose of this lock is to protect against
+unfavorable interactions with mail delivery.
+
+ (3) c-client applies an exclusive flock() to a file on /tmp
+(whose name represents the device and inode number of the file) when
+it opens the mail file. This lock is maintained throughout the
+session, although c-client has a feature (called ``kiss of death'')
+which permits c-client to forcibly and irreversibly seize the lock
+from a cooperating c-client application that surrenders the lock on
+demand. The purpose of this lock is to protect against unfavorable
+interactions with other instances of c-client (rewriting the mail
+file).
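+
+ Naming the /tmp lock file after the device and inode number,
+rather than after the path, means that two paths reaching the same
+mail file contend for the same lock. The sketch below illustrates
+the idea. The "/tmp/.<dev>.<inode>" format and the function names
+are assumptions for this example; the names c-client really uses may
+differ.
+
```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Build "/tmp/.<dev>.<inode>" (both in hex) for the file at `path`.
   Returns 0 on success, -1 if stat() fails or buf is too small. */
static int tmp_lock_name(const char *path, char *buf, size_t buflen)
{
    struct stat sb;

    assert(path != NULL && buf != NULL);
    if (stat(path, &sb) < 0)
        return -1;
    int n = snprintf(buf, buflen, "/tmp/.%lx.%lx",
                     (unsigned long) sb.st_dev, (unsigned long) sb.st_ino);
    return (n > 0 && (size_t) n < buflen) ? 0 : -1;
}

/* Returns 1 if both paths map to the same lock name, else 0. */
static int same_lock_name(const char *p1, const char *p2)
{
    char a[64], b[64];
    return tmp_lock_name(p1, a, sizeof a) == 0 &&
           tmp_lock_name(p2, b, sizeof b) == 0 &&
           strcmp(a, b) == 0;
}
```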
+
+ Mail delivery daemons use lock (1), (2), or both. Lock (1) works
+over NFS; lock (2) is the only one that works on sites that protect
+/usr/spool/mail against unprivileged file creation. Prudent mail
+delivery daemons use both forms of locking, and of course so does
+c-client.
+
+ If only lock (2) is used, then multiple processes can read from
+the mail file simultaneously, although in real life this doesn't
+really change things. The normal state of locks (1) and (2) is
+unlocked except for very brief periods.
+
+
+TENEX AND MTX
+
+ The design of the locking mechanism of these formats was
+motivated by a desire to enable multiple simultaneous read/write
+access. It is almost the reverse of how locking works with
+bezerk/mmdf.
+
+ (1) c-client applies a shared flock() to the mail file when it
+opens the mail file. It upgrades this lock to exclusive whenever it
+tries to expunge the mail file. Because of the flock() bug that
+upgrading a lock actually releases it, it will not do so until it has
+acquired an exclusive lock (2) first. The purpose of this lock is to
+prevent an expunge from taking place while some other c-client has the
+mail file open (and thus knows where all the messages are).
+
+ (2) c-client applies a shared flock() to a file on /tmp (whose
+name represents the device and inode number of the file) when it
+parses the mail file. It applies an exclusive flock() to this file
+when it appends new mail to the mail file, as well as before it
+attempts to upgrade lock (1) to exclusive. The purpose of this lock
+is to prevent data from being appended while some other c-client is
+parsing mail in the file (to prevent reading of incomplete messages).
+It also protects against the lock-releasing timing race on lock (1).
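+
+ The guarded upgrade can be sketched roughly as follows. This is
+an illustration, not c-client's code: the function name is
+hypothetical, and keeping the guard lock held after a successful
+upgrade (so that appends stay excluded until the caller is done) is an
+assumption of the sketch. Because a failed upgrade may already have
+dropped the shared lock, the sketch takes it again before giving up.
+
```c
#include <assert.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* mail_fd holds a shared flock() (lock (1)); guard_fd is an open
   descriptor on the /tmp lock file (lock (2)).
   Returns 0 with mail_fd locked exclusive and the guard still held,
   or -1 with mail_fd locked shared and the guard released. */
static int guarded_upgrade(int mail_fd, int guard_fd)
{
    assert(mail_fd >= 0 && guard_fd >= 0 && mail_fd != guard_fd);

    /* Take the guard first so that nobody can slip into the window
       in which flock() has dropped the shared lock. */
    if (flock(guard_fd, LOCK_EX) < 0)
        return -1;

    if (flock(mail_fd, LOCK_EX | LOCK_NB) == 0)
        return 0;                   /* upgraded; caller releases both later */

    flock(mail_fd, LOCK_SH);        /* other sharers exist; go back to shared */
    flock(guard_fd, LOCK_UN);
    return -1;
}
```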
+
+OBSERVATIONS
+
+ In a perfect world, locking works. You are protected against
+unfavorable interactions with the mailer and against your own mistake
+of running more than one instance of your mail reader. In tenex/mtx
+formats, you have the additional benefit that multiple simultaneous
+read/write access works, with the sole restriction being that you
+can't expunge if there are any sharers of the mail file.
+
+ If the mail file is NFS-mounted, then flock() locking is a silent
+no-op. This is the way BSD implements flock(), and c-client's
+emulation of flock() through fcntl() tests for NFS files and
+duplicates this functionality. There is no locking protection for
+tenex/mtx mail files at all, and only protection against the mailer
+for bezerk/mmdf mail files. This has been the accepted state of
+affairs on UNIX for many sad years.
+
+ If you can not create .lock files, it should not affect locking,
+since the flock() locks suffice for all protection. This is, however,
+not true if the mailer does not check for flock() locking, or if the
+mail file is NFS-mounted.
+
+ What this means is that there is *no* locking protection at all
+in the case of a client using an NFS-mounted /usr/spool/mail that does
+not permit file creation by unprivileged programs. It is impossible,
+under these circumstances, for an unprivileged program to do anything
+about it. Worse, if EACCES errors on .lock file creation are
+no-op'ed, the user won't even know about it. This is arguably a site
+configuration error.
+
+ The problem with not being able to create .lock files exists on
+System V as well, but the failure modes for flock() -- which is
+implemented via fcntl() -- are different.
+
+ On System V, if the mail file is NFS-mounted and either the
+client or the server lacks a functioning statd/lockd pair, then the
+lock attempt would have hung forever if it weren't for the fact that
+c-client tests for NFS and no-ops the flock() emulator in this case.
+Systemwide or clusterwide failures of statd/lockd have been known to
+occur which cause all locks in all processes to hang (including
+local?). Without the special NFS test made by c-client, there would
+be no way to request BSD-style no-op behavior, nor is there any way to
+determine that this is happening other than the system being hung.
+
+ The additional locking introduced by c-client was shown to cause
+much more stress on the System V locking mechanism than has
+traditionally been placed upon it. If it was stressed too far, all
+hell broke loose. Fortunately, this is now past history.
+
+TRADEOFFS
+
+ c-client based applications have a reasonable chance of winning
+as long as you don't use NFS for remote access to mail files. That's
+what IMAP is for, after all. It is, however, very important to
+realize that you can *not* use the lock-upgrade feature by itself
+because it releases the lock as an interim step -- you need to have
+lock-upgrading guarded by another lock.
+
+ If you have the misfortune of using System V, you are likely to
+run into problems sooner or later having to do with statd/lockd. You
+basically end up with one of five unsatisfactory choices:
+ 1) Grit your teeth and live with it.
+ 2) Try to make it work:
+ a) avoid NFS access so as not to stress statd/lockd.
+ b) try to understand the code in statd/lockd and hack it
+ to be more robust.
+ c) hunt out the system limit of locks, if there is one,
+ and increase it. Figure on at least two locks per
+ simultaneous imapd process and four locks per Pine
+ process. Better yet, make the limit be 10 times the
+ maximum number of processes.
+ d) increase the socket buffer (-S switch to lockd) if
+ it is offered. I don't know what this actually does,
+ but giving lockd more resources to do its work can't
+ hurt. Maybe.
+ 3) Decide that it can't possibly work, and turn off the
+ fcntl() calls in your program.
+ 4) If nuking statd/lockd can be done without breaking local
+ locking, then do so. This would make SVR4 have the same
+ limitations as BSD locking, with a couple of additional
+ bugs.
+ 5) Check for NFS, and don't do the fcntl() in the NFS case.
+ This is what c-client does.
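+
+ Choice (5) might look something like the sketch below. How to
+test for NFS is system-specific. This version is an assumption-laden
+example for Linux only, where fstatfs() reports a filesystem type
+and NFS uses the magic number 0x6969. It is not the test c-client
+performs, and the function names are hypothetical.
+
```c
#include <assert.h>
#include <fcntl.h>
#include <sys/vfs.h>                /* Linux: fstatfs() and struct statfs */
#include <unistd.h>

#define NFS_MAGIC 0x6969L           /* value of NFS_SUPER_MAGIC on Linux */

/* Returns 1 if fd is on NFS, 0 if it is not, -1 if that is unknown. */
static int fd_is_nfs(int fd)
{
    struct statfs sfs;

    assert(fd >= 0);
    if (fstatfs(fd, &sfs) < 0)
        return -1;
    return sfs.f_type == NFS_MAGIC;
}

/* Whole-file fcntl() lock that turns into a silent no-op on NFS.
   `type` is F_RDLCK, F_WRLCK, or F_UNLCK.  Returns 0 on success or
   no-op, -1 on failure. */
static int lock_unless_nfs(int fd, short type)
{
    if (fd_is_nfs(fd) == 1)
        return 0;                   /* pretend success, as BSD flock() does */

    struct flock fl = {0};
    fl.l_type = type;
    fl.l_whence = SEEK_SET;         /* start 0, length 0: whole file */
    return fcntl(fd, F_SETLKW, &fl);
}
```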
+
+ Note that if you are going to use NFS to access files on a server
+which does not have statd/lockd running, your only choices are (3), (4),
+or (5). Here again, IMAP can bail you out.
+
+ These problems aren't unique to c-client applications; they have
+also been reported with Elm, Mediamail, and other email tools.
+
+ Of the other two SVR4 locking bugs:
+
+ Programmer awareness is necessary to deal with the bug that you
+can not get an exclusive lock unless the file is open for write. I
+believe that c-client has fixed all of these cases.
+
+ The problem about opening a second designator smashing any
+current locks on the file has not been addressed satisfactorily yet.
+This is not an easy problem to deal with, especially in c-client which
+really doesn't know what other files/streams may be open by Pine.
+
+ Aren't you so happy that you bought a System V system?