LU-17484 gss: reply error for SEC_CTX_INIT on wrong node 70/53970/6
author Sebastien Buisson <sbuisson@ddn.com>
Thu, 8 Feb 2024 12:44:21 +0000 (13:44 +0100)
committer Oleg Drokin <green@whamcloud.com>
Fri, 23 Feb 2024 07:17:19 +0000 (07:17 +0000)
commit 3d635dd3f24421c181aca5673cd81ed8f3e2c622
tree 8f4c805935bb748ef8aa832141dbf00c8ccd3fd4
parent 02caf7170762d97dac4f367651addc7d90b6eb32

When a server receives a SEC_CTX_INIT request for a target that is not
available (either stopping, or not set up yet, or moved to a failover
node), the request gets dropped. This makes the client-side RPC time
out, increasing the time it takes to establish a proper gss context
with the target, because it slows down the HA mechanism that tries
alternate failover NIDs.
Instead of dropping the SEC_CTX_INIT request, the server needs to send
back a proper error reply. The client is then able to immediately try
alternate failover NIDs, which speeds up the mount/reconnect process
and avoids potential eviction.
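
The effect described above can be illustrated with a small, self-contained
model. This is only a sketch and not the patched Lustre code: the enum, the
function name try_failover_nids() and both timing constants are hypothetical
and were chosen purely for illustration. It compares the time a client spends
before reaching the right node when a wrong node drops the request (the client
waits for the RPC timeout) and when it replies with an error (the client moves
on after one round trip).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outcomes of one SEC_CTX_INIT attempt (illustration only). */
enum ctx_init_outcome {
	CTX_INIT_OK,      /* this node serves the target */
	CTX_INIT_ERROR,   /* wrong node, but it sent an error reply */
	CTX_INIT_DROPPED, /* wrong node, request silently dropped */
};

/* Assumed, arbitrary costs in seconds; not Lustre's real values. */
#define RPC_TIMEOUT_SEC   100u
#define RPC_ROUNDTRIP_SEC 1u

/*
 * Walk the failover NIDs in order until one accepts the context init.
 * outcomes[i] is what the node behind NID i does with the request.
 * Returns the index of the NID that accepted, or -1 if none did, and
 * stores the total time spent in *elapsed_sec.
 */
int try_failover_nids(const enum ctx_init_outcome *outcomes, size_t n_nids,
		      unsigned int *elapsed_sec)
{
	size_t i;

	assert(outcomes != NULL && elapsed_sec != NULL);
	*elapsed_sec = 0;

	for (i = 0; i < n_nids; i++) {
		if (outcomes[i] == CTX_INIT_DROPPED) {
			/* No reply at all: the client waits out the timeout. */
			*elapsed_sec += RPC_TIMEOUT_SEC;
			continue;
		}

		/* A reply came back (success or error) after one round trip. */
		*elapsed_sec += RPC_ROUNDTRIP_SEC;
		if (outcomes[i] == CTX_INIT_OK)
			return (int)i;
		/* Error reply: try the next failover NID immediately. */
	}

	return -1;
}
```

With two NIDs where only the second one serves the target, this model charges
a full timeout plus one round trip when the first node drops the request, and
only two round trips when the first node replies with an error.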

Test-Parameters: trivial
Test-Parameters: kerberos=true testlist=sanity-krb5
Test-Parameters: testgroup=review-dne-selinux-ssk-part-2
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Id2cefaa7d54729a63c7be13b65d7ace579bcaa78
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53970
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
lustre/ldlm/ldlm_lib.c
lustre/ptlrpc/gss/lproc_gss.c
lustre/ptlrpc/gss/sec_gss.c
lustre/ptlrpc/import.c
lustre/utils/gss/svcgssd_proc.c