LU-12485 obdclass: 0-nlink race in lu_object_find_at() 34/35834/2
author Lai Siyao <lai.siyao@whamcloud.com>
Fri, 28 Jun 2019 12:19:56 +0000 (20:19 +0800)
committer Oleg Drokin <green@whamcloud.com>
Thu, 12 Sep 2019 03:49:35 +0000 (03:49 +0000)
commit c4a91e08b1e1452e950037c135dfe9f6cf7a7c30
tree 6ddd47eb21b99cfd515af04f6979cab0c5311f6b
parent b17f4ae323ac35d38a3f89004a46603d0932d767

There is a race in lu_object_find_at(): in the gap between
lu_object_alloc() and hash insertion, another thread may
have allocated another object for the same file and unlinked
it, so we may get an object with 0-nlink, which will trigger
an assertion in osd_object_release().

To avoid this race, initialize the object after hash insertion.
This means an uninitialized object may be found in the cache;
if so, wait for the allocator to finish initializing it.
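
The ordering described above can be illustrated with a minimal
userspace sketch. It is not the Lustre code: every name in it
(sketch_object, sketch_find_at, and so on) is hypothetical, a linked
list stands in for the object hash, and pthread primitives stand in
for the kernel ones. It only shows the idea of publishing the object
first, initializing it afterwards, and making a thread that finds an
uninitialized object wait.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a cached object; not a Lustre structure. */
struct sketch_object {
	unsigned long         so_fid;    /* lookup key */
	bool                  so_inited; /* set once initialization is done */
	int                   so_nlink;  /* filled in during initialization */
	struct sketch_object *so_next;   /* list link standing in for a hash */
};

struct sketch_object *sketch_cache;
pthread_mutex_t sketch_lock = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t sketch_inited_cv = PTHREAD_COND_INITIALIZER;

/* Caller must hold sketch_lock. */
static struct sketch_object *sketch_lookup_locked(unsigned long fid)
{
	struct sketch_object *obj;

	for (obj = sketch_cache; obj != NULL; obj = obj->so_next)
		if (obj->so_fid == fid)
			return obj;
	return NULL;
}

struct sketch_object *sketch_find_at(unsigned long fid)
{
	struct sketch_object *obj;

	pthread_mutex_lock(&sketch_lock);
	obj = sketch_lookup_locked(fid);
	if (obj != NULL) {
		/* Found in cache: wait until the allocator has initialized it. */
		while (!obj->so_inited)
			pthread_cond_wait(&sketch_inited_cv, &sketch_lock);
		assert(obj->so_inited);
		pthread_mutex_unlock(&sketch_lock);
		return obj;
	}

	/* Not cached: allocate and insert BEFORE initializing, so no other
	 * thread can create a second object for the same key meanwhile. */
	obj = calloc(1, sizeof(*obj));
	if (obj == NULL) {
		pthread_mutex_unlock(&sketch_lock);
		return NULL;
	}
	obj->so_fid = fid;
	obj->so_next = sketch_cache;
	sketch_cache = obj;
	pthread_mutex_unlock(&sketch_lock);

	/* Initialization happens after insertion (placeholder work here). */
	obj->so_nlink = 1;

	/* Mark the object initialized and wake any waiting finders. */
	pthread_mutex_lock(&sketch_lock);
	obj->so_inited = true;
	pthread_cond_broadcast(&sketch_inited_cv);
	pthread_mutex_unlock(&sketch_lock);
	return obj;
}
```

In this sketch a second sketch_find_at() call for the same key
returns the object inserted by the first call instead of allocating
a duplicate. Allocating under the lock is a simplification made for
brevity and is not claimed to match the patch.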

To reproduce the race, introduce cfs_race_wait() and
cfs_race_wakeup(): cfs_race_wait() makes the calling thread
wait on the race, while cfs_race_wakeup() wakes up the
waiting thread. As with cfs_race(), CFS_FAIL_ONCE
should be set together with fail_loc.
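
The wait/wakeup pairing can be pictured with the userspace sketch
below. It is an assumption-laden illustration, not the libcfs
implementation: sketch_fail_loc, sketch_race_state, sketch_race_wait()
and sketch_race_wakeup() are hypothetical names, a pthread condition
variable replaces the kernel wait queue, and CFS_FAIL_ONCE handling is
not modelled.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these are libcfs symbols. */
unsigned int sketch_fail_loc;  /* which race point is armed, 0 = none */
bool sketch_race_state;        /* true once the waiter may proceed */
pthread_mutex_t sketch_race_lock = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t sketch_race_cv = PTHREAD_COND_INITIALIZER;

/* Block the caller at race point 'id' until the wakeup side runs.
 * Does nothing unless that race point is armed. */
void sketch_race_wait(unsigned int id)
{
	pthread_mutex_lock(&sketch_race_lock);
	if (sketch_fail_loc == id) {
		sketch_race_state = false;
		while (!sketch_race_state)
			pthread_cond_wait(&sketch_race_cv, &sketch_race_lock);
	}
	pthread_mutex_unlock(&sketch_race_lock);
}

/* Release a thread blocked in sketch_race_wait() for the same 'id'.
 * Does nothing unless that race point is armed. */
void sketch_race_wakeup(unsigned int id)
{
	pthread_mutex_lock(&sketch_race_lock);
	if (sketch_fail_loc == id) {
		sketch_race_state = true;
		pthread_cond_broadcast(&sketch_race_cv);
	}
	pthread_mutex_unlock(&sketch_race_lock);
}
```

A test would arm the race point, let one thread reach
sketch_race_wait() inside the window under test, and have a second
thread call sketch_race_wakeup() once it has done the conflicting
operation. In this sketch the waiter must already be blocked when the
wakeup is issued, otherwise the wakeup is lost.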

Add sanityn test_84.

Lustre-change: https://review.whamcloud.com/35360
Lustre-commit: 2ff420913b9718ee8d80ae51fddc6e5df4a3148a

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I0869f254544256987b73f0ff92f75e4d1562e566
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/35834
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
libcfs/include/libcfs/libcfs_fail.h
lustre/include/lu_object.h
lustre/include/obd_support.h
lustre/mdt/mdt_reint.c
lustre/obdclass/lu_object.c
lustre/tests/sanityn.sh