LU-17142 mgc: reconnection without pinger 98/52498/5
author Alexander Boyko <alexander.boyko@hpe.com>
Tue, 22 Aug 2023 09:53:14 +0000 (05:53 -0400)
committer Oleg Drokin <green@whamcloud.com>
Sat, 18 Nov 2023 21:41:53 +0000 (21:41 +0000)
commit 867ba433e3a0fce4a1b2f8d37a91d550ada41a26
tree 59f0ae125a4385664e43d8dc460bd6e542207646
parent 0e6e60b1233b08952c338b2c4f121ef749a99f8b

When the MGS has been offline for some time, the adaptive timeout (AT)
is increased and the connection request deadline is high. Reconnecting
via the pinger waits a full request deadline before the next attempt.
The situation is worse with a failover partner, when different
connections are used. Reconnection can fail with a local MGS too.
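
The delay described above can be illustrated with a toy model. This is
not Lustre code: every name here (toy_import, toy_next_attempt_pinger,
toy_next_attempt_forced, toy_seconds_saved) is hypothetical, and it
assumes the pinger schedules the next attempt exactly one request
deadline after the previous one, which simplifies the real behaviour.

```c
#include <time.h>

/* Hypothetical, simplified stand-in for an import; not the Lustre struct. */
struct toy_import {
	time_t last_attempt;	/* when the last connect request was sent */
	time_t deadline;	/* request deadline in seconds, inflated by AT */
};

/* Pinger-style reconnect: the next attempt waits out the full deadline. */
static time_t toy_next_attempt_pinger(const struct toy_import *imp)
{
	return imp->last_attempt + imp->deadline;
}

/* Forced reconnect: the next attempt can start as soon as it is requested. */
static time_t toy_next_attempt_forced(const struct toy_import *imp, time_t now)
{
	(void)imp;	/* the inflated deadline plays no role here */
	return now;
}

/* Seconds gained by forcing a reconnect at 'now' instead of waiting. */
static time_t toy_seconds_saved(const struct toy_import *imp, time_t now)
{
	time_t pinger = toy_next_attempt_pinger(imp);
	time_t forced = toy_next_attempt_forced(imp, now);

	return pinger > forced ? pinger - forced : 0;
}
```

In this model, with a 600 second deadline and a forced reconnect
requested 10 seconds after the last attempt, the next attempt starts
590 seconds earlier than it would under the pinger.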

Here is the error seen when the MGC could not connect to a local MGS
(MDT combined with MGS).

LustreError: 15c-8: MGC90@kfi:
Confguration from log kjlmo12-MDT0000 failed from MGS -5.

The patch forces reconnection by invalidating the import and aborting
in-flight requests.

ptlrpc_recover_import() aborts while waiting for the import to reach
the disconnected state. However, a disconnect can happen between
connection attempts, and that is valid. This is fixed.

Reset the adaptive timeout when a local MGS starts. This allows the
MGC to reconnect efficiently.

mgs_barrier_gl_interpret_reply() should handle EINVAL from a client;
it means the client does not have a lock.
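
A minimal sketch of that kind of EINVAL handling follows. It is not the
real mgs_barrier_gl_interpret_reply(): the helper name
toy_barrier_interpret_rc is hypothetical, and mapping -EINVAL to
success is an assumption about what "handle" means here. The actual
change in lustre/mgs/mgs_barrier.c may differ.

```c
#include <errno.h>

/*
 * Hypothetical reply classifier, for illustration only.
 * rc is 0 on success or a negative errno taken from the client's reply.
 */
static int toy_barrier_interpret_rc(int rc)
{
	/* -EINVAL: the client holds no lock, so there is nothing to fail. */
	if (rc == -EINVAL)
		return 0;

	/* Everything else, including success, is passed through unchanged. */
	return rc;
}
```

Under this assumption a client without a lock no longer counts as a
failure, while other errors such as -EIO are still reported.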

HPE-bug-id: LUS-11633
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: Ie631e04fb3e72900af076cf7f268f20f7b285445
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52498
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
lustre/mgc/mgc_request.c
lustre/mgs/mgs_barrier.c
lustre/ptlrpc/import.c
lustre/ptlrpc/pinger.c
lustre/ptlrpc/ptlrpc_internal.h
lustre/ptlrpc/recover.c