LU-15616 lnet: ln_api_mutex deadlocks
author Chris Horn <chris.horn@hpe.com>
Mon, 7 Mar 2022 17:03:50 +0000 (11:03 -0600)
committer Andreas Dilger <adilger@whamcloud.com>
Fri, 23 Sep 2022 16:42:17 +0000 (16:42 +0000)
commit 4b14d9fb00214abffcbe96a5c0759108d194cbf8
tree f44a388b286c795577800aadd68c02e984a088e2
parent ed028f9bda5a4c6185fdca0a0952577237d0f428

LNetNIFini() acquires the ln_api_mutex and holds onto it throughout
various shutdown routines. Meanwhile, LND threads (via
lnet_nid2peerni_locked()) or the discovery thread (via
lnet_peer_data_present()) may need to acquire this mutex in order to
progress.

Address these potential deadlocks by setting the_lnet.ln_state to
LNET_STATE_STOPPING earlier in LNetNIFini(), and by releasing the
mutex before any call into an LND module and before any wait.

LNetNIInit() is modified to return -ESHUTDOWN if it finds that there
is a concurrent shutdown in progress.
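
The locking pattern described above can be illustrated with a small
user-space sketch. This is not the code from the patch: it uses a
pthread mutex in place of the kernel mutex, and every name in it
(sketch_net, sketch_ni_init(), sketch_ni_fini(), sketch_lnd_shutdown())
is hypothetical. It assumes a platform whose errno.h defines ESHUTDOWN,
as Linux does. The stand-in LND callback uses a trylock so that holding
the mutex across the call shows up as -EBUSY rather than as a hang.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-ins for the LNet states named in the commit. */
enum sketch_state { SKETCH_SHUTDOWN, SKETCH_RUNNING, SKETCH_STOPPING };

struct sketch_net {
	pthread_mutex_t api_mutex;	/* plays the role of ln_api_mutex */
	enum sketch_state state;	/* plays the role of ln_state */
};

static struct sketch_net the_net = {
	.api_mutex = PTHREAD_MUTEX_INITIALIZER,
	.state = SKETCH_SHUTDOWN,
};

/* Init refuses to start while a shutdown is in progress. */
static int sketch_ni_init(void)
{
	int rc = 0;

	pthread_mutex_lock(&the_net.api_mutex);
	if (the_net.state == SKETCH_STOPPING)
		rc = -ESHUTDOWN;
	else
		the_net.state = SKETCH_RUNNING;
	pthread_mutex_unlock(&the_net.api_mutex);
	return rc;
}

/*
 * Hypothetical LND shutdown hook that needs the API mutex to progress.
 * A blocking lock here would deadlock if the caller still held the
 * mutex; trylock reports that case as -EBUSY instead.
 */
static int sketch_lnd_shutdown(void)
{
	int rc = pthread_mutex_trylock(&the_net.api_mutex);

	if (rc != 0)
		return -rc;
	pthread_mutex_unlock(&the_net.api_mutex);
	return 0;
}

/* Fini marks the state STOPPING first, then drops the mutex before
 * calling into the LND hook, and retakes it only to finish up. */
static int sketch_ni_fini(void)
{
	int rc;

	pthread_mutex_lock(&the_net.api_mutex);
	the_net.state = SKETCH_STOPPING;
	pthread_mutex_unlock(&the_net.api_mutex);

	rc = sketch_lnd_shutdown();	/* called without the mutex held */

	pthread_mutex_lock(&the_net.api_mutex);
	assert(the_net.state == SKETCH_STOPPING);
	the_net.state = SKETCH_SHUTDOWN;
	pthread_mutex_unlock(&the_net.api_mutex);
	return rc;
}
```

In this sketch, moving the call to sketch_lnd_shutdown() inside the
first locked region would make it return -EBUSY, which corresponds to
the deadlock the commit describes.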

Lustre-change: https://review.whamcloud.com/46727
Lustre-commit: 22de0bd145b649768b16dd42559d326af3c13200

Test-Parameters: trivial
HPE-bug-id: LUS-10681
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ia8b28cc95ff71e66a0f99aed4f2c22ec9d44ce1e
Reviewed-on: https://review.whamcloud.com/48384
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
lnet/lnet/api-ni.c
lnet/lnet/lib-move.c
lnet/lnet/peer.c