LU-17365 lod: handle llog errors gracefully
Distinguish remote llog errors by their source and type
in LOD lod_sub_prep_llog() and unify the errors reported
by llog_osd_read_header() and llog_init_handle().
- a partial llog header or a 0-size llog is reinitialized
and a new header is created
- in llog_read_header(), errors from dt_attr_get() and
read_header() are printed and returned to the caller as -EIO
- an llog with invalid header data is skipped and a new
one is created to be used instead. To indicate this case,
llog_init_handle() returns -EINVAL instead of -EIO.
Network errors are therefore handled by the
lod_sub_recovery_thread() retry logic, while bad llog
content leads to immediate llog re-creation.
- lod_sub_init_llogs() tries to init all targets even
if some of them fail
- always recreate llogs after recovery abort, whether or
not ctxt->loc_handle exists
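
The error split described in the list above can be sketched roughly as
follows. This is an illustrative userspace sketch, not the code from the
patch: the enum, llog_pick_action(), init_all_targets() and demo_init()
are hypothetical names, and both the handling of error codes other than
-EIO/-EINVAL and the choice to report the first failure are assumptions
made only for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical action names for this sketch; not Lustre identifiers. */
enum llog_action {
	LLOG_USE,      /* header is complete and valid: use the llog as-is */
	LLOG_REINIT,   /* 0-size or partial header: write a new header */
	LLOG_RECREATE, /* invalid header data: skip it, create a new llog */
	LLOG_RETRY,    /* read/network error: leave it to the retry logic */
	LLOG_FAIL,     /* any other error (assumption: just propagate it) */
};

/*
 * rc is 0 or a negative errno from opening the llog, size is the number
 * of bytes currently in the llog object, hdr_size is the size of a
 * complete header.
 */
static enum llog_action llog_pick_action(int rc, size_t size, size_t hdr_size)
{
	assert(hdr_size > 0);

	if (rc == -EIO)
		return LLOG_RETRY;      /* transient, retry later */
	if (rc == -EINVAL)
		return LLOG_RECREATE;   /* bad content, re-create now */
	if (rc != 0)
		return LLOG_FAIL;
	if (size < hdr_size)
		return LLOG_REINIT;     /* empty or truncated header */
	return LLOG_USE;
}

/*
 * Try every target even if some fail. Returning the first error seen is
 * an assumption of this sketch; *tried counts the attempted targets.
 */
static int init_all_targets(int (*init)(int idx), int count, int *tried)
{
	int first_err = 0;
	int i;

	*tried = 0;
	for (i = 0; i < count; i++) {
		int rc = init(i);

		(*tried)++;
		if (rc != 0 && first_err == 0)
			first_err = rc; /* remember it, but keep going */
	}
	return first_err;
}

/* Hypothetical per-target init: target 1 fails with -EIO, others succeed. */
static int demo_init(int idx)
{
	return idx == 1 ? -EIO : 0;
}
```

Keeping -EIO and -EINVAL apart is what lets the caller choose between
waiting for a retry and re-creating the llog immediately.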
The patch tries to cover the known issues and error types seen
during update log recovery, and also provides better debugging
for similar cases in the future.
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I2705e0dc245ed4365123ce47137193a9ed769673
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53510
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>