LU-15938 lod: prevent endless retry in recovery thread
- abort lod_sub_recovery_thread() on obd_abort_recov_mdt in
addition to obd_abort_recovery
- handle the 'short llog' situation gracefully: when the remote
llog is shorter than the local copy's header expects, trust the
remote llog data and consider llog processing finished
- on other errors during remote llog read, set obd_abort_recov_mdt
but not obd_abort_recovery, in an attempt to skip MDT-MDT recovery
only and continue with client recovery while possible
- fix a parsing problem with 'abort_recov' and 'abort_recov_mdt' in
lmd_parse() that caused client recovery to always be aborted and
MDT recovery never to be aborted. Also allow the
'abort_recovery_mdt' mount option name

The original endless retry was caused by a de-sync between the
local llog structures and the remote llog. The local llog header
says there is a record with some ID, so the recovery thread tries
to get that record from the remote llog. However, there is no such
record on the remote server, so it reads the whole llog and returns
it properly, but llog processing considers that an incomplete llog
caused by network issues and retries endlessly.

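
The decision logic described in this message can be modelled in a few lines. The sketch below is not code from the patch: struct recov_flags, struct remote_read and next_action() are hypothetical, and retrying a partial read that has not yet reached the end of the remote llog is an assumption made for illustration. Only the flag names mirror obd_abort_recovery and obd_abort_recov_mdt.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two abort flags named above. */
struct recov_flags {
	bool abort_recovery;  /* abort all recovery */
	bool abort_recov_mdt; /* abort MDT-MDT recovery only */
};

enum read_status { READ_OK, READ_ERROR };

/* Hypothetical result of one remote llog read. */
struct remote_read {
	enum read_status status;
	int last_idx; /* last record index the remote llog returned */
	bool at_end;  /* the whole remote llog was read */
};

enum llog_action { LLOG_DONE, LLOG_RETRY, LLOG_ABORTED };

/* Decide what the recovery loop does after one remote read. */
static enum llog_action next_action(struct recov_flags *f,
				    int expected_last_idx,
				    const struct remote_read *rr)
{
	/* Either abort flag stops the thread. */
	if (f->abort_recovery || f->abort_recov_mdt)
		return LLOG_ABORTED;

	/* Read error: skip MDT-MDT recovery only, keep client recovery. */
	if (rr->status != READ_OK) {
		f->abort_recov_mdt = true;
		return LLOG_ABORTED;
	}

	/* All records the local header expects were found. */
	if (rr->last_idx >= expected_last_idx)
		return LLOG_DONE;

	/* Short llog: the remote log ended early, so trust it and finish. */
	if (rr->at_end)
		return LLOG_DONE;

	/* Assumption: a partial read that is not at the end is retried. */
	return LLOG_RETRY;
}
```

In this model the endless retry corresponds to a read with status READ_OK, at_end set and last_idx below the expected index falling through to LLOG_RETRY every time. Treating that case as LLOG_DONE is what breaks the loop.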
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ib127fd0d1abd5289d90c7b4b3ca74ab6fc78bc71
Reviewed-on: https://review.whamcloud.com/47698
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>