Whamcloud - gitweb
LU-11020 osp: fix race during lov_objids update 67/32867/11
author Alexey Lyashkov <c17817@cray.com>
Wed, 1 Aug 2018 15:52:28 +0000 (18:52 +0300)
committer Oleg Drokin <green@whamcloud.com>
Sat, 17 Nov 2018 01:25:48 +0000 (01:25 +0000)
commit 8cd4760536d7f423db87c67bdc8214f13ede3ca8
tree 50b03a2641829c7a7fbdf8838447f6d10f5990b3
parent ebf742028b57a88817b26d6fb7748110ec15d31c
LU-11020 osp: fix race during lov_objids update

The first thread can be delayed by a read from disk, so it
may complete after the second thread and overwrite the on-disk
lov_objids data with an older OID for that OST.

If the transaction commits during this window and the MDS
then crashes, the stale lov_objids can cause an OST object
that should have been kept to be deleted during MDS->OSS
recovery.

Use a single buffer shared between threads to store lov_objids,
so that when multiple threads update the lov_objids file at
once, the latest OID is written to disk even if the threads
commit their transactions out of order.
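
The shared-buffer idea can be illustrated with a minimal userspace
sketch. It is not the code from this patch: the struct, the function
names, NUM_OSTS, and the keep-the-maximum rule under a pthread mutex
are all assumptions made for illustration, and the real OSP code may
serialize updates differently.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define NUM_OSTS 4	/* hypothetical number of OST slots */

/* Hypothetical shared buffer: one last-used OID slot per OST. */
struct objids_buf {
	pthread_mutex_t	lock;
	uint64_t	last_oid[NUM_OSTS];
};

/*
 * Record an OID for one OST. The slot only moves forward, so a
 * delayed thread carrying an older OID cannot overwrite a newer one.
 */
static void objids_record(struct objids_buf *b, unsigned int idx,
			  uint64_t oid)
{
	assert(idx < NUM_OSTS);
	pthread_mutex_lock(&b->lock);
	if (oid > b->last_oid[idx])
		b->last_oid[idx] = oid;
	pthread_mutex_unlock(&b->lock);
}

/*
 * Read the value a transaction would write for this OST. Every
 * writer reads the same shared slot, so commit order does not matter.
 */
static uint64_t objids_snapshot(struct objids_buf *b, unsigned int idx)
{
	uint64_t oid;

	assert(idx < NUM_OSTS);
	pthread_mutex_lock(&b->lock);
	oid = b->last_oid[idx];
	pthread_mutex_unlock(&b->lock);
	return oid;
}
```

In this sketch, a thread that records OID 11 after another thread has
already recorded OID 12 for the same OST leaves the slot at 12, so
whichever transaction writes last still writes the newest value.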

Cray-bug-id: LUS-5841
Change-Id: I0984e5f55d569260c1219bf87c82423cc5b8589b
Signed-off-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-on: https://review.whamcloud.com/32867
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
lustre/osp/osp_dev.c
lustre/osp/osp_internal.h
lustre/osp/osp_object.c
lustre/osp/osp_precreate.c