LU-15616 lnet: ln_api_mutex deadlocks
author: Chris Horn <chris.horn@hpe.com>
        Mon, 7 Mar 2022 17:03:50 +0000 (11:03 -0600)
committer: Oleg Drokin <green@whamcloud.com>
        Sun, 3 Apr 2022 16:08:52 +0000 (16:08 +0000)
commit: 22de0bd145b649768b16dd42559d326af3c13200
tree: bfa84161c1cf3917ef7b3c9c3f71eecf323c2695
parent: 3e3f70eb1ec95f32d9a97795d7fdf02cca82b5a0

LNetNIFini() acquires the ln_api_mutex and holds onto it throughout
various shutdown routines. Meanwhile, LND threads (via
lnet_nid2peerni_locked()) or the discovery thread (via
lnet_peer_data_present()) may need to acquire this mutex in order to
progress.

Address these potential deadlocks by setting the_lnet.ln_state to
LNET_STATE_STOPPING earlier in LNetNIFini(), and by releasing the
mutex before any call into an LND module and before any wait.

LNetNIInit() now returns -ESHUTDOWN if it finds that a shutdown is
concurrently in progress.
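
The locking pattern described above can be sketched in user space as
follows. This is a minimal illustration, not the code from this change:
it uses a pthread mutex in place of ln_api_mutex, and every name in it
(sketch_init, sketch_fini, sketch_lnd_shutdown, the SKETCH_* states) is
hypothetical. It assumes a single caller of the shutdown path.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-ins for the_lnet.ln_state values. */
enum sketch_state { SKETCH_SHUTDOWN, SKETCH_RUNNING, SKETCH_STOPPING };

/* Plays the role of ln_api_mutex (assumption: a plain, non-recursive mutex). */
static pthread_mutex_t api_mutex = PTHREAD_MUTEX_INITIALIZER;
static enum sketch_state state = SKETCH_SHUTDOWN;
static enum sketch_state seen_by_lnd = SKETCH_SHUTDOWN;

/*
 * Stand-in for a call into an LND module that itself needs the mutex.
 * Relocking a default mutex from the same thread is not safe, so this
 * only works if the caller has already released api_mutex.
 */
static void sketch_lnd_shutdown(void)
{
	pthread_mutex_lock(&api_mutex);
	seen_by_lnd = state;	/* record the state observed during shutdown */
	pthread_mutex_unlock(&api_mutex);
}

/* Init path: refuse to start while a shutdown is still in progress. */
int sketch_init(void)
{
	int rc = 0;

	pthread_mutex_lock(&api_mutex);
	if (state == SKETCH_STOPPING)
		rc = -ESHUTDOWN;
	else
		state = SKETCH_RUNNING;
	pthread_mutex_unlock(&api_mutex);
	return rc;
}

/* Shutdown path: publish STOPPING first, then drop the mutex. */
void sketch_fini(void)
{
	pthread_mutex_lock(&api_mutex);
	assert(state == SKETCH_RUNNING);
	state = SKETCH_STOPPING;	/* set early, while still holding the mutex */
	pthread_mutex_unlock(&api_mutex);

	/* The mutex is not held here, so the callee can take it. */
	sketch_lnd_shutdown();

	pthread_mutex_lock(&api_mutex);
	state = SKETCH_SHUTDOWN;
	pthread_mutex_unlock(&api_mutex);
}
```

In this sketch, any code that takes the mutex while sketch_fini() is
between its two critical sections sees SKETCH_STOPPING and can back
off, which is why the state is published before the mutex is dropped.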

HPE-bug-id: LUS-10681
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ia8b28cc95ff71e66a0f99aed4f2c22ec9d44ce1e
Reviewed-on: https://review.whamcloud.com/46727
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
lnet/lnet/api-ni.c
lnet/lnet/lib-move.c
lnet/lnet/peer.c