LU-6146 tests: race condition for check/use cfs_fail_val 81/13481/9
author Fan Yong <fan.yong@intel.com>
Tue, 4 Nov 2014 08:02:22 +0000 (16:02 +0800)
committer Oleg Drokin <oleg.drokin@intel.com>
Sun, 25 Jan 2015 01:53:26 +0000 (01:53 +0000)
commit f6ef1b797f2f6b28e7c5860b6cf16759cadfc9a4
tree 6eb58ad530cdddccb4650916a11c62bb145e8128
parent 97df2f4cae374130c057cbf1168ad1427c96cbc5

There are race conditions between checking and using cfs_fail_val.
For example, consider the following failure-injection stub for the LFSCK test:

764   if (OBD_FAIL_CHECK(OBD_FAIL_LFSCK_DELAY2) &&
765       cfs_fail_val > 0) {
766           struct l_wait_info lwi;
767
768           lwi = LWI_TIMEOUT(cfs_time_seconds(cfs_fail_val),
769                             NULL, NULL);
770           l_wait_event(thread->t_ctl_waitq,
771                        !thread_is_running(thread),
772                        &lwi);
773
774           if (unlikely(!thread_is_running(thread))) {
775                   CDEBUG(D_LFSCK, "%s: scan dir exit for engine "
776                          "stop, parent "DFID", cookie "LPX64"\n",
777                          lfsck_lfsck2name(lfsck),
778                          PFID(lfsck_dto2fid(dir)),
779                          lfsck->li_cookie_dir);
780                   RETURN(0);
781           }
782   }

The value of "cfs_fail_val" may be reset to zero by another thread
after the check at line 765 but before it is used at line 768. The
timeout is then built from zero, which means no timeout at all, so the
LFSCK engine waits until someone runs "lfsck_stop".
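
The check-then-use window described above can be illustrated with a small
standalone sketch. This is not the code from this patch, whose actual changes
are not shown here; it only assumes that a fix would read the shared value
once into a local variable. All names (fail_val, fail_check, racy_delay,
safe_delay, reset_fail_val) are hypothetical stand-ins, and the "between"
callback simulates another thread writing between the two reads.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the global cfs_fail_val. */
static atomic_ulong fail_val;

/* Hypothetical stand-in for OBD_FAIL_CHECK(); always armed in this sketch. */
static bool fail_check(void)
{
	return true;
}

/* Simulates another thread clearing the value. */
static void reset_fail_val(void)
{
	atomic_store(&fail_val, 0);
}

/*
 * Racy pattern: the shared value is read twice, once for the check and
 * once for the use. Returns the timeout that would be used for the wait,
 * or -1 if the delay is skipped.
 */
static long racy_delay(void (*between)(void))
{
	if (fail_check() && atomic_load(&fail_val) > 0) {
		if (between != NULL)
			between();	/* concurrent writer runs here */
		return (long)atomic_load(&fail_val);	/* may now be 0 */
	}
	return -1;
}

/*
 * Assumed fix: snapshot the shared value once, then check and use the
 * local copy, so a later reset cannot turn the timeout into zero.
 */
static long safe_delay(void (*between)(void))
{
	unsigned long val = atomic_load(&fail_val);	/* single read */

	if (fail_check() && val > 0) {
		if (between != NULL)
			between();	/* reset no longer affects val */
		return (long)val;
	}
	return -1;
}
```

With fail_val set to 5 and reset_fail_val passed as the callback, racy_delay()
ends up with a timeout of 0, while safe_delay() keeps the value it checked.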

Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I418621faaf6a1f42ba1d541b37374c1dc21831be
Reviewed-on: http://review.whamcloud.com/13481
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
libcfs/libcfs/fail.c
lustre/lfsck/lfsck_engine.c
lustre/lfsck/lfsck_layout.c
lustre/lfsck/lfsck_namespace.c
lustre/lfsck/lfsck_striped_dir.c
lustre/osd-ldiskfs/osd_scrub.c