LU-12485 obdclass: 0-nlink race in lu_object_find_at() 60/35360/11
author Lai Siyao <lai.siyao@whamcloud.com>
Fri, 28 Jun 2019 12:19:56 +0000 (20:19 +0800)
committer Oleg Drokin <green@whamcloud.com>
Thu, 15 Aug 2019 07:54:08 +0000 (07:54 +0000)
commit 2ff420913b9718ee8d80ae51fddc6e5df4a3148a
tree 2df252b21b22382d487822700f29f2e314336792
parent 7e0cba246a7f2408c8266574a657e4459f691570

There is a race in lu_object_find_at(): in the gap between
lu_object_alloc() and hash insertion, another thread may
have allocated another object for the same file and unlinked
it, so we may get an object with 0-nlink, which will trigger
an assertion in osd_object_release().

To avoid this race, initialize the object after hash insertion.
But this means an uninitialized object may be found in the
cache; if so, wait for the allocator to finish initializing it.
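
The insert-then-initialize ordering described above can be sketched
in user space with POSIX threads. This is only an illustration under
stated assumptions, not the code from lustre/obdclass/lu_object.c:
the names demo_obj, demo_cache, demo_find_at() and demo_run() are
hypothetical, and a fixed-size array guarded by one mutex stands in
for the real object hash.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a cached object; not the Lustre lu_object. */
struct demo_obj {
	int key;
	int initialized;	/* set by the allocating thread once init is done */
	int payload;		/* filled in by demo_obj_init() */
};

#define DEMO_SLOTS 16

static struct demo_obj *demo_cache[DEMO_SLOTS];
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t demo_cond = PTHREAD_COND_INITIALIZER;

/* Placeholder for the (possibly slow) initialization step. */
static void demo_obj_init(struct demo_obj *o)
{
	o->payload = o->key * 2;
}

struct demo_obj *demo_find_at(int key)
{
	struct demo_obj *o;

	if (key < 0 || key >= DEMO_SLOTS)
		return NULL;

	pthread_mutex_lock(&demo_lock);
	o = demo_cache[key];
	if (o != NULL) {
		/* Found in cache: it may still be uninitialized, so wait
		 * until the allocating thread marks it ready. */
		while (!o->initialized)
			pthread_cond_wait(&demo_cond, &demo_lock);
		pthread_mutex_unlock(&demo_lock);
		return o;
	}

	/* Not cached: allocate a bare object and insert it first, so no
	 * other thread can create a second object for the same key. */
	o = calloc(1, sizeof(*o));
	if (o == NULL) {
		pthread_mutex_unlock(&demo_lock);
		return NULL;
	}
	o->key = key;
	demo_cache[key] = o;
	pthread_mutex_unlock(&demo_lock);

	/* Initialize only after insertion, outside the lock. */
	demo_obj_init(o);

	pthread_mutex_lock(&demo_lock);
	o->initialized = 1;
	pthread_cond_broadcast(&demo_cond);	/* wake threads waiting above */
	pthread_mutex_unlock(&demo_lock);
	return o;
}

static void *demo_worker(void *arg)
{
	return demo_find_at(*(int *)arg);
}

/* Four threads look up the same key; returns 1 if all of them got the
 * same, fully initialized object. */
int demo_run(void)
{
	pthread_t tid[4];
	void *res[4];
	int key = 3;
	int i, ok = 1;

	for (i = 0; i < 4; i++)
		if (pthread_create(&tid[i], NULL, demo_worker, &key) != 0)
			return 0;
	for (i = 0; i < 4; i++)
		pthread_join(tid[i], &res[i]);
	for (i = 0; i < 4; i++) {
		struct demo_obj *o = res[i];

		if (o == NULL || o != res[0] || !o->initialized ||
		    o->payload != 6)
			ok = 0;
	}
	return ok;
}
```

Because the object is visible in the cache before it is initialized,
every lookup path has to check the ready flag; that is the cost of
closing the window between allocation and insertion.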

To reproduce the race, introduce cfs_race_wait() and
cfs_race_wakeup(): cfs_race_wait() makes the calling thread
wait on the race, while cfs_race_wakeup() wakes up the
waiting thread. As with cfs_race(), CFS_FAIL_ONCE should be
set together with fail_loc.
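
The wait/wakeup pairing can be pictured with the following user-space
sketch. It is an assumption-laden analogue built on POSIX threads,
not the libcfs implementation: race_arm(), race_point_wait(),
race_point_wakeup(), race_demo_run() and RACE_DEMO_LOC are
hypothetical names, and the one-shot disarm only loosely mimics what
CFS_FAIL_ONCE is described as doing.

```c
#include <assert.h>
#include <pthread.h>

#define RACE_DEMO_LOC 0x1u	/* hypothetical fail location id */

static unsigned int race_fail_loc;	/* currently armed location, 0 = none */
static int race_woken;
static pthread_mutex_t race_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t race_cond = PTHREAD_COND_INITIALIZER;

/* Arm a race point, as a test would by setting fail_loc. */
void race_arm(unsigned int id)
{
	pthread_mutex_lock(&race_lock);
	race_fail_loc = id;
	race_woken = 0;
	pthread_mutex_unlock(&race_lock);
}

/* Block the caller at an armed race point until it is woken up. */
void race_point_wait(unsigned int id)
{
	pthread_mutex_lock(&race_lock);
	if (race_fail_loc == id) {
		while (!race_woken)
			pthread_cond_wait(&race_cond, &race_lock);
		race_fail_loc = 0;	/* one-shot: disarm after first use */
	}
	pthread_mutex_unlock(&race_lock);
}

/* Release the thread blocked at the matching race point. */
void race_point_wakeup(unsigned int id)
{
	pthread_mutex_lock(&race_lock);
	if (race_fail_loc == id) {
		race_woken = 1;
		pthread_cond_broadcast(&race_cond);
	}
	pthread_mutex_unlock(&race_lock);
}

static int race_seq, race_b_done, race_a_after;

/* Thread A: parks at the race point, then records when it resumed. */
static void *race_thread_a(void *arg)
{
	(void)arg;
	race_point_wait(RACE_DEMO_LOC);
	race_a_after = ++race_seq;
	return NULL;
}

/* Returns 1 if thread B's step was ordered before A's resumption. */
int race_demo_run(void)
{
	pthread_t a;

	race_arm(RACE_DEMO_LOC);
	if (pthread_create(&a, NULL, race_thread_a, NULL) != 0)
		return 0;
	race_b_done = ++race_seq;	/* B performs its conflicting step */
	race_point_wakeup(RACE_DEMO_LOC);
	pthread_join(a, NULL);
	return race_b_done < race_a_after;
}
```

In this sketch the wakeup is remembered in a flag, so the ordering
holds even if thread B reaches the wakeup before thread A reaches
the wait.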

Add sanityn test_84.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I0869f254544256987b73f0ff92f75e4d1562e566
Reviewed-on: https://review.whamcloud.com/35360
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
libcfs/include/libcfs/libcfs_fail.h
lustre/include/lu_object.h
lustre/include/obd_support.h
lustre/mdt/mdt_reint.c
lustre/obdclass/lu_object.c
lustre/tests/sanityn.sh