Whamcloud - gitweb
LU-13974 llog: check stale osp object 62/46862/2
authorAlexander Boyko <c17825@cray.com>
Tue, 24 Nov 2020 05:34:11 +0000 (00:34 -0500)
committerOleg Drokin <green@whamcloud.com>
Thu, 5 May 2022 06:10:01 +0000 (06:10 +0000)
commit6ead672b83d9f12f9760a4af0958ef5e2b9de082
tree04ce25f5170c23524b7ee23b90bce1e35268f0c8
parent3bd965a786afb04317acd1d8eb1708e594a1fc91
LU-13974 llog: check stale osp object

The logic of osp_attr_get has 2 path,
1) return attributes from a cache for health osp object
2) make an out update request and return attributes for stale
osp object, object lose stale state.

When some out update request with llog writes failed, osp object
become stale. But llog handle stay inconsistent (bitmap,count,
last_index), and a next llog_add->llog_osd_write_rec do dt_attr_get,
gets attributes and makes osp object valid, and uses wrong llog
handle data. The result is index jump at llog file - recX, recX+2.
And it makes an error during update log processing if failover take
a place.
The fix adds dt_object_stale function to check osp_object.
llog_osd_write_rec check it and return ESTALE. llog_add would fail
with ESTALE error and doesn't corrupt update log.

Lustre-change: https://review.whamcloud.com/40742
Lustre-commit: 82c6e42d6137f39a1f2394b7bc6e8d600eb36181

HPE-bug-id: LUS-9030
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: Iadf53fd816e1c5bde0a19d4c537f0408796c864a
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46862
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
lustre/include/dt_object.h
lustre/obdclass/llog_osd.c
lustre/osd-ldiskfs/osd_handler.c
lustre/osd-zfs/osd_object.c
lustre/osp/osp_internal.h
lustre/osp/osp_md_object.c
lustre/osp/osp_object.c