Whamcloud - gitweb
LU-11765 ofd: return EAGAIN during 1st CLEANUP_ORPHAN 36/33836/6
author     Sergey Cheremencev <c17829@cray.com>
           Wed, 24 Oct 2018 10:23:43 +0000 (13:23 +0300)
committer  Oleg Drokin <green@whamcloud.com>
           Fri, 15 Mar 2019 23:14:37 +0000 (23:14 +0000)
commit     dc52a88cde1e7cea093b25fc9a15509fe0ac527a
tree       799d9ebb4dd66b3fb318c85e2f4cc274976517ea
parent     75d6bd8875010a7e49b77c6913835e6a65f61a18
LU-11765 ofd: return EAGAIN during 1st CLEANUP_ORPHAN

During the 1st CLEANUP_ORPHAN after failover, some objects
could be absent because they have not been recreated yet.
The issue occurs when the MDS last_id is much greater than
the OST last_id and the OFD has to recreate thousands of
objects. Some of these objects could already be assigned
to a FID and requested by a client through a glimpse RPC.
Thus, if an object is not found during the 1st
CLEANUP_ORPHAN, return EAGAIN instead of ENOENT.
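The return-code rule described above can be sketched as a small C helper. This is a minimal sketch, assuming a boolean that records whether the 1st CLEANUP_ORPHAN after failover has completed; `demo_missing_object_rc()` and its parameters are hypothetical names for illustration, not the code this patch adds to ofd_lvb.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not part of Lustre): decide which error a glimpse
 * on a possibly missing OST object should return.
 *
 * lookup_rc          - result of the object lookup (0 or a negative errno)
 * first_cleanup_done - true once the 1st CLEANUP_ORPHAN after failover
 *                      has finished recreating objects
 */
static int demo_missing_object_rc(int lookup_rc, bool first_cleanup_done)
{
	/* Callers pass 0 or a negative errno, kernel style. */
	assert(lookup_rc <= 0);

	/* Success and errors other than ENOENT pass through unchanged. */
	if (lookup_rc != -ENOENT)
		return lookup_rc;

	/* The object may simply not have been recreated yet: ask the
	 * client to retry instead of reporting it as nonexistent. */
	if (!first_cleanup_done)
		return -EAGAIN;

	/* After the 1st CLEANUP_ORPHAN, a missing object really is gone. */
	return -ENOENT;
}
```

For example, `demo_missing_object_rc(-ENOENT, false)` yields `-EAGAIN`, so the client retries the glimpse rather than treating the object as deleted; once cleanup has finished, the same lookup yields `-ENOENT`.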

The patch also adds a test to reproduce the issue. The
test adds a delay to osd_trans_commit_cb(), so that a
large number of OST objects are not yet written to disk
when the failover happens, and then checks that all
objects have been successfully recreated after failover.
The test works only with the FAILURE_MODE=HARD option.
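The scenario the test exercises can be modelled with a toy in-memory OST: objects created after the last commit are lost at failover, the 1st CLEANUP_ORPHAN recreates everything up to the MDS last_id, and the final check verifies that every id the MDS knows about exists again. This is a minimal sketch under stated assumptions; `struct demo_ost`, `demo_create()`, `demo_failover()`, `demo_cleanup_orphan()` and `demo_all_present()` are hypothetical names, and the model ignores transactions, FID sequences, orphan destruction and the real fail_loc mechanism.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX_OBJS 4096

/* Toy model of one OST object sequence (not Lustre data structures). */
struct demo_ost {
	uint64_t last_id;                  /* highest object id created */
	bool     exists[DEMO_MAX_OBJS + 1]; /* index 0 is unused */
};

static void demo_init(struct demo_ost *ost)
{
	memset(ost, 0, sizeof(*ost));
}

/* Create @n new objects with consecutive ids. */
static void demo_create(struct demo_ost *ost, uint64_t n)
{
	assert(ost->last_id + n <= DEMO_MAX_OBJS);
	for (uint64_t i = 0; i < n; i++)
		ost->exists[++ost->last_id] = true;
}

/* Failover: only objects up to @committed_id survive on disk. */
static void demo_failover(struct demo_ost *ost, uint64_t committed_id)
{
	assert(committed_id <= ost->last_id);
	for (uint64_t id = committed_id + 1; id <= ost->last_id; id++)
		ost->exists[id] = false;
	ost->last_id = committed_id;
}

/* 1st CLEANUP_ORPHAN: recreate every object up to the MDS last_id.
 * Returns the number of recreated objects. */
static uint64_t demo_cleanup_orphan(struct demo_ost *ost,
				    uint64_t mds_last_id)
{
	uint64_t recreated = 0;

	assert(mds_last_id <= DEMO_MAX_OBJS);
	for (uint64_t id = ost->last_id + 1; id <= mds_last_id; id++) {
		ost->exists[id] = true;
		recreated++;
	}
	if (mds_last_id > ost->last_id)
		ost->last_id = mds_last_id;
	return recreated;
}

/* The test's final check: every id the MDS handed out exists. */
static bool demo_all_present(const struct demo_ost *ost,
			     uint64_t mds_last_id)
{
	for (uint64_t id = 1; id <= mds_last_id; id++)
		if (!ost->exists[id])
			return false;
	return true;
}
```

In this model, creating 1000 objects and failing over with only 200 committed leaves ids 201..1000 missing until the cleanup recreates those 800 objects; a glimpse arriving in that window is exactly the case where EAGAIN is preferable to ENOENT.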

Cray-bug-id: LUS-6414
Change-Id: Ia6899b4c1c35e1681f49faf1cb93a501ad159ec2
Signed-off-by: Sergey Cheremencev <c17829@cray.com>
Reviewed-on: https://es-gerrit.dev.cray.com/154151
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Tested-by: Alexander Lezhoev <c17454@cray.com>
Reviewed-on: https://review.whamcloud.com/33836
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
lustre/include/obd_support.h
lustre/ofd/ofd_dev.c
lustre/ofd/ofd_fs.c
lustre/ofd/ofd_internal.h
lustre/ofd/ofd_lvb.c
lustre/osd-ldiskfs/osd_handler.c
lustre/tests/replay-ost-single.sh