LU-8760 lib: avoid unexpected out of order execution 22/28322/4
author Fan Yong <fan.yong@intel.com>
Fri, 4 Nov 2016 01:04:39 +0000 (09:04 +0800)
committer John L. Hammond <john.hammond@intel.com>
Wed, 6 Sep 2017 16:31:26 +0000 (16:31 +0000)
commit 9785fb53d0c939b2d94a69a580bdf0b6d968a25e
tree 023711c60be7ff611d1fa9647b795e2e164fdc62
parent 8e2cd001a9640c5e9959341c5af6da680c609eee
LU-8760 lib: avoid unexpected out of order execution

There is a race condition in __l_wait_event() caused by
out-of-order execution between changing the thread state and
checking the condition. It may block the thread (the one to be
woken) forever. Consider the following real execution order:

1. Thread1 checks condition on CPU1, gets false.
2. Thread2 sets condition on CPU2.
3. Thread2 calls wake_up() on CPU2 to wake the threads with
   state TASK_INTERRUPTIBLE | TASK_UNINTERRUPTIBLE. But the
   Thread1's state is TASK_RUNNING at that time.
4. Thread1 sets its state to TASK_INTERRUPTIBLE on CPU1,
   then schedules.

If the '__timeout' variable is zero, Thread1 will have
no chance to check the condition again.
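
The four steps above can be replayed deterministically as a rough
illustration. This is a single-threaded sketch, not Lustre or kernel
code: the names sim_state, sim and replay_lost_wakeup() are
hypothetical, and the wake-up is modelled as "only act on a thread
that is not running".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel task states; not the real values. */
enum sim_state { SIM_RUNNING, SIM_INTERRUPTIBLE };

struct sim {
	bool condition;          /* what Thread1 is waiting for */
	enum sim_state t1_state; /* Thread1's task state */
	bool t1_woken;           /* did the modelled wake_up() act on Thread1? */
};

/* Replays steps 1-4 in the listed order; returns true if Thread1 is stuck. */
static bool replay_lost_wakeup(void)
{
	struct sim s = {
		.condition = false,
		.t1_state = SIM_RUNNING,
		.t1_woken = false,
	};

	/* Step 1: Thread1 checks the condition and gets false. */
	bool seen = s.condition;

	/* Step 2: Thread2 sets the condition. */
	s.condition = true;

	/* Step 3: Thread2 wakes only sleeping threads; Thread1 is still running. */
	if (s.t1_state != SIM_RUNNING) {
		s.t1_state = SIM_RUNNING;
		s.t1_woken = true;
	}

	/* Step 4: Thread1 acts on its stale check and goes to sleep. */
	if (!seen)
		s.t1_state = SIM_INTERRUPTIBLE;

	/* Stuck: the condition is true, but Thread1 sleeps and nobody woke it. */
	return s.condition && s.t1_state == SIM_INTERRUPTIBLE && !s.t1_woken;
}
```

Calling assert(replay_lost_wakeup()) passes: in this ordering the
wake-up is lost, and without a timeout nothing re-checks the condition.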

Generally, the interval between the reordered step 1 and step 4
is so tiny that steps 2 and 3 cannot happen in between. To some
degree, that explains why we seldom hit the related trouble. But
the race really exists, especially considering that Thread1 can
be interrupted between step 1 and step 4.

The patch adds a barrier between changing the thread's state and
checking the condition to avoid out-of-order execution.
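
As a hedged user-space analogy (the actual diff is not shown here),
the ordering the barrier enforces can be sketched with C11 atomics.
All names below (waiter, waiter_prepare_to_sleep, waker_set_and_wake)
are hypothetical, and the fence on the waker side is an assumption of
this sketch; it is not taken from the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for TASK_RUNNING / TASK_INTERRUPTIBLE. */
enum { W_RUNNING = 0, W_SLEEPING = 1 };

struct waiter {
	atomic_int state;      /* modelled task state */
	atomic_bool condition; /* what the waiter is waiting for */
};

static void waiter_init(struct waiter *w)
{
	atomic_init(&w->state, W_RUNNING);
	atomic_init(&w->condition, false);
}

/*
 * Waiter side: publish the sleeping state, then a full fence, then
 * check the condition. Returns true if the caller should go to sleep.
 */
static bool waiter_prepare_to_sleep(struct waiter *w)
{
	atomic_store_explicit(&w->state, W_SLEEPING, memory_order_relaxed);
	/* The barrier: the state store may not be reordered after the load. */
	atomic_thread_fence(memory_order_seq_cst);
	if (atomic_load_explicit(&w->condition, memory_order_relaxed)) {
		/* Condition already set: do not sleep. */
		atomic_store_explicit(&w->state, W_RUNNING, memory_order_relaxed);
		return false;
	}
	return true;
}

/*
 * Waker side: set the condition, then a full fence, then look at the
 * waiter's state. Returns true if a sleeping waiter was woken.
 */
static bool waker_set_and_wake(struct waiter *w)
{
	atomic_store_explicit(&w->condition, true, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	if (atomic_load_explicit(&w->state, memory_order_relaxed) == W_SLEEPING) {
		atomic_store_explicit(&w->state, W_RUNNING, memory_order_relaxed);
		return true;
	}
	return false;
}
```

With a full fence on both sides, at least one thread observes the
other's store: either the waiter sees the condition and skips the
sleep, or the waker sees the sleeping state and wakes it.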

Lustre-change: https://review.whamcloud.com/23564
Lustre-commit: c2b6030e9217e54e7153c0a33cce0c2ea4afa54c

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I32caee6b332f037d864419ea8728112da563cce0
Reviewed-by: Yang Sheng <yang.sheng@intel.com>
Reviewed-by: Dmitry Eremin <dmitry.eremin@intel.com>
Signed-off-by: Minh Diep <minh.diep@intel.com>
Reviewed-on: https://review.whamcloud.com/28322
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
lustre/include/lustre_lib.h