Whamcloud - gitweb
LU-15938 lod: prevent endless retry in recovery thread
author: Mikhail Pershin <mpershin@whamcloud.com>
Wed, 22 Jun 2022 10:27:48 +0000 (13:27 +0300)
committer: Andreas Dilger <adilger@whamcloud.com>
Fri, 15 Jul 2022 04:25:21 +0000 (04:25 +0000)
commit: 3b95b7ad1e2117e440a75f6f439ab6595ad1ac85
tree: b381e355316cee289f8a7bc99f20428f71ab5774
parent: 954e00b84f1ee8a39225d8bc58708402fb9cd773

- abort lod_sub_recovery_thread() when obd_abort_recov_mdt is set,
  in addition to obd_abort_recovery
- handle the 'short llog' situation gracefully: when the remote llog
  is shorter than the local copy's header expects, trust the remote
  llog data and consider llog processing finished
- on other errors during a remote llog read, set obd_abort_recov_mdt
  but not obd_abort_recovery, in an attempt to skip only MDT-MDT
  recovery and continue with client recovery where possible
- fix a parsing problem with 'abort_recov' and 'abort_recov_mdt' in
  lmd_parse() that resulted in MDT recovery never being aborted and
  client recovery always being aborted. Also allow the
  'abort_recovery_mdt' mount option name
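The last item is about mount-option names that share a prefix. The commit message does not say what the exact lmd_parse() mistake was; one common way such a bug arises is prefix matching, where a check for 'abort_recov' also fires on 'abort_recov_mdt'. The sketch below is a simplified illustration under that assumption, not the Lustre code: struct abort_flags, opt_is() and parse_abort_opts() are hypothetical names, and unrelated options are simply ignored.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical flags standing in for the two abort-recovery mount
 * options; this is not the actual Lustre lustre_mount_data layout. */
struct abort_flags {
	bool abort_recov;     /* abort (client) recovery */
	bool abort_recov_mdt; /* abort only MDT-MDT recovery */
};

/* Compare an option token of length len against name exactly, so
 * "abort_recov" does not match "abort_recov_mdt" or vice versa. */
static bool opt_is(const char *tok, size_t len, const char *name)
{
	return strlen(name) == len && strncmp(tok, name, len) == 0;
}

/* Parse a comma-separated option string such as
 * "noatime,abort_recov_mdt" and set the matching flags.
 * Options this sketch does not know about are skipped. */
static void parse_abort_opts(const char *opts, struct abort_flags *f)
{
	const char *p = opts;

	f->abort_recov = false;
	f->abort_recov_mdt = false;

	while (*p != '\0') {
		const char *end = strchr(p, ',');
		size_t len = end ? (size_t)(end - p) : strlen(p);

		/* Exact matches only; both MDT spellings are accepted. */
		if (opt_is(p, len, "abort_recov_mdt") ||
		    opt_is(p, len, "abort_recovery_mdt"))
			f->abort_recov_mdt = true;
		else if (opt_is(p, len, "abort_recov"))
			f->abort_recov = true;

		if (end == NULL)
			break;
		p = end + 1; /* step past the comma */
	}
}
```

With exact-length comparison the order of the checks no longer matters, so 'abort_recov_mdt' can only ever set the MDT flag and 'abort_recov' only the client flag.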

The original endless-retry case is caused by this kind of de-sync
between the local llog structures and the remote llog. The local
llog header says a record with some ID exists, so the recovery
thread tries to fetch that record from the remote llog. However,
no such record exists on the remote server, so the server reads
the whole llog and returns it correctly, but llog processing
treats the result as an incomplete llog caused by network issues
and retries endlessly.
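The retry behaviour described above, combined with the fixes listed earlier, can be modelled with a small loop. This is a toy sketch, not lod_sub_recovery_thread(): struct recov_flags, struct remote_llog, remote_read() and recovery_loop() are hypothetical, the two flags merely stand in for obd_abort_recovery and obd_abort_recov_mdt, and a "read" is reduced to comparing the record index the local header expects with the last index the remote llog actually holds.

```c
#include <stdbool.h>

/* Result of one hypothetical attempt to read the remote llog. */
enum read_result { READ_OK, READ_SHORT, READ_ERROR };

/* Outcome of the recovery loop. */
enum loop_result { LOOP_DONE, LOOP_ABORTED };

/* Stand-ins for obd_abort_recovery / obd_abort_recov_mdt. */
struct recov_flags {
	bool abort_recovery;
	bool abort_recov_mdt;
};

/* Toy remote llog: the local header expects records up to
 * expected_idx, the remote llog really ends at remote_last_idx,
 * and net_error simulates a transport failure. */
struct remote_llog {
	int expected_idx;
	int remote_last_idx;
	bool net_error;
};

static enum read_result remote_read(const struct remote_llog *rl)
{
	if (rl->net_error)
		return READ_ERROR;
	if (rl->remote_last_idx < rl->expected_idx)
		return READ_SHORT; /* whole llog read, record not there */
	return READ_OK;
}

static enum loop_result recovery_loop(struct recov_flags *fl,
				      const struct remote_llog *rl,
				      int *attempts)
{
	*attempts = 0;
	for (;;) {
		/* Either abort flag stops the loop. */
		if (fl->abort_recovery || fl->abort_recov_mdt)
			return LOOP_ABORTED;

		(*attempts)++;
		switch (remote_read(rl)) {
		case READ_OK:
		case READ_SHORT:
			/* Short llog: trust the remote data and finish
			 * instead of retrying forever. */
			return LOOP_DONE;
		case READ_ERROR:
			/* Give up on MDT-MDT recovery only; the client
			 * recovery flag is left untouched. */
			fl->abort_recov_mdt = true;
			break;
		}
	}
}
```

Because the flags are checked at the top of every pass, setting abort_recov_mdt after a read error ends the loop on the next iteration, while abort_recovery (and hence client recovery) is left alone. Without the READ_SHORT case, a short llog would be retried indefinitely, which is the failure this commit addresses.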

Lustre-change: https://review.whamcloud.com/47698
Lustre-commit: TBD (b57b8f126e0fe00e20b1a6c3164fdf902baf91c0)

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ib127fd0d1abd5289d90c7b4b3ca74ab6fc78bc71
Reviewed-on: https://review.whamcloud.com/47889
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
lustre/lod/lod_dev.c
lustre/mdt/mdt_handler.c
lustre/obdclass/llog_osd.c
lustre/obdclass/obd_mount.c