LU-16159 lod: cancel update llogs upon recovery abort
author Lai Siyao <lai.siyao@whamcloud.com>
Sun, 28 Aug 2022 18:35:25 +0000 (14:35 -0400)
committer Andreas Dilger <adilger@whamcloud.com>
Sat, 21 Jan 2023 02:50:56 +0000 (02:50 +0000)
commit 5adc8261b062e58a3eb2fbe313ce68769c445da5
tree 9d5809e9b5a2eb869dfe5b4363d07c2811046666
parent e9f98228224edcf06095ae733361ab6560950fdb

If recovery is aborted, cancel the update catalogs from the catlist but
keep them on disk for some time (for debugging). This avoids
accumulating stale update records and also avoids recovery problems
if the update llogs are corrupt.

Update llogs are canceled after recovery completes and before regular
request processing begins. For these logs, the ctime is set and the log
header is marked with LLOG_F_MAX_AGE|LLOG_F_RM_ON_ERR; once 30 days
have passed, they are removed automatically.
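
The retention rule above can be sketched as a small model. This is an illustrative Python sketch, not the C code from the patch: the flag bit values, the `LlogHeader` class, and the helpers `cancel_update_llog` and `should_remove` are hypothetical. Only the flag names and the 30-day period come from the commit message, and treating "30 days passed" as "at least 30 days old" is an assumption.

```python
from dataclasses import dataclass

# Placeholder bit values for illustration only; the real definitions
# live in lustre_idl.h and may differ.
LLOG_F_MAX_AGE = 0x1
LLOG_F_RM_ON_ERR = 0x2

MAX_AGE_SECONDS = 30 * 24 * 60 * 60  # 30 days, per the commit message


@dataclass
class LlogHeader:
    """Hypothetical stand-in for an on-disk llog header."""
    flags: int = 0
    ctime: int = 0  # seconds since the epoch


def cancel_update_llog(hdr: LlogHeader, now: int) -> None:
    """Mark a log as canceled: stamp ctime and set both flags."""
    hdr.ctime = now
    hdr.flags |= LLOG_F_MAX_AGE | LLOG_F_RM_ON_ERR


def should_remove(hdr: LlogHeader, now: int) -> bool:
    """True once a log marked LLOG_F_MAX_AGE is at least 30 days old."""
    if not hdr.flags & LLOG_F_MAX_AGE:
        return False
    return now - hdr.ctime >= MAX_AGE_SECONDS
```

In this model a log that was never canceled carries no LLOG_F_MAX_AGE flag and is therefore never aged out, which matches the description that only canceled update llogs are removed automatically.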

Tidy up recovery abort code:
* if obd_abort_recovery is set, or OBD is stopping, stop both
  client recovery and MDT recovery.
* otherwise if obd_abort_mdt_recovery is set, stop MDT recovery only.
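
The two rules above amount to a simple precedence check. The following Python sketch models it under stated assumptions: the function `recovery_abort_scope` and its string labels are hypothetical, `abort_recovery` and `abort_mdt_recovery` stand in for the obd_abort_recovery and obd_abort_mdt_recovery flags, and `stopping` stands in for "OBD is stopping". The actual logic in the patch is C code and may be structured differently.

```python
def recovery_abort_scope(abort_recovery: bool,
                         stopping: bool,
                         abort_mdt_recovery: bool) -> set:
    """Return which recovery phases to stop: 'client', 'mdt', both, or none."""
    # A full abort or a stopping OBD takes precedence and stops everything.
    if abort_recovery or stopping:
        return {"client", "mdt"}
    # Otherwise an MDT-only abort stops MDT recovery alone.
    if abort_mdt_recovery:
        return {"mdt"}
    # No abort requested: let both recoveries continue.
    return set()
```

For example, `recovery_abort_scope(False, False, True)` yields only the MDT phase, while setting either of the first two arguments stops both phases regardless of the third.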

lctl llog_print supports printing the update log FIDs used by a
specified MDT:
* "lctl --device <MDT> llog_print update_log" will list all update
  llog FIDs used by this MDT device.

Disabled the replay-single.sh 100c stripe check because abort_recovery
cancels update llogs, so they are not replayed on the next recovery.

Added replay-single.sh 100d.

Run formatall at the end of replay-single.sh because directory unlink
may fail.

Lustre-change: https://review.whamcloud.com/48584
Lustre-commit: b054fcd7852f6a22f8ec469ce94ddf6f3331ab34

Test-Parameters: mdscount=2 mdtcount=4 testlist=replay-single
Test-Parameters: mdscount=2 mdtcount=4 testlist=replay-single
Test-Parameters: mdscount=2 mdtcount=4 testlist=replay-single
Test-Parameters: mdscount=2 mdtcount=4 testlist=replay-single
Test-Parameters: mdscount=2 mdtcount=4 testlist=replay-single
Test-Parameters: mdscount=2 mdtcount=4 testlist=replay-single
Test-Parameters: mdscount=2 mdtcount=4 testlist=replay-single
Test-Parameters: mdscount=2 mdtcount=4 testlist=replay-single
Test-Parameters: mdscount=2 mdtcount=4 testlist=replay-single
Test-Parameters: mdscount=2 mdtcount=4 testlist=replay-single
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ie2bda6c097d65f5c51cba66c2dbf6ae4a5d36dda
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49403
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
12 files changed:
lustre/include/lustre_log.h
lustre/include/obd.h
lustre/include/uapi/linux/lustre/lustre_idl.h
lustre/ldlm/ldlm_lib.c
lustre/lod/lod_dev.c
lustre/mdd/mdd_device.c
lustre/mdt/mdt_handler.c
lustre/obdclass/dt_object.c
lustre/obdclass/llog.c
lustre/obdclass/llog_cat.c
lustre/obdclass/llog_ioctl.c
lustre/tests/replay-single.sh