LU-18260 o2iblnd: fix race between REJ vs kiblnd_connd
This patch fixes a possible race between CM_EVENT_REJECTED and
kiblnd_connd().
kiblnd_connd() sets the connection state to IBLND_CONN_DISCONNECTED
before removing the QP. If CM_EVENT_REJECTED is received within this
time window, it causes the following crash:
Workqueue: ib_cm cm_work_handler [ib_cm]
Call Trace:
<TASK>
dump_stack_lvl+0x34/0x48
panic+0x100/0x2d2
lbug_with_loc.cold+0x18/0x18 [libcfs]
kiblnd_cm_callback+0x108d/0x10b0 [ko2iblnd]
cma_cm_event_handler+0x1e/0xb0 [rdma_cm]
cma_ib_handler+0x8d/0x2e0 [rdma_cm]
cm_process_work+0x22/0x190 [ib_cm]
cm_rej_handler+0xdf/0x260 [ib_cm]
cm_work_handler+0x47f/0x4d0 [ib_cm]
process_one_work+0x1e8/0x390
worker_thread+0x53/0x3d0
kthread+0x124/0x150
ret_from_fork+0x1f/0x30
</TASK>
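The window described above can be replayed with a small, simplified
model. This is only a sketch, not the ko2iblnd code: the state names,
struct conn, qp_alive, on_rejected() and the connd_* helpers are all
hypothetical, and it assumes the REJ handler treats any state other
than "connecting" as fatal and that no event arrives once the QP is
gone.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified states; the real enum lives in o2iblnd. */
enum conn_state {
	CONN_ACTIVE_CONNECT,	/* connect issued, waiting for a reply */
	CONN_DISCONNECTED,	/* teardown has started */
};

struct conn {
	enum conn_state state;
	bool qp_alive;		/* assumption: events only arrive while true */
};

/* Model of the REJ handler: returns false where the real code would LBUG. */
static bool on_rejected(const struct conn *c)
{
	return c->state == CONN_ACTIVE_CONNECT;
}

/* Teardown split into its two steps so an event can land in between. */
static void connd_mark_disconnected(struct conn *c)
{
	c->state = CONN_DISCONNECTED;
}

static void connd_destroy_qp(struct conn *c)
{
	c->qp_alive = false;
}

/* Replays the racy interleaving; returns true if the handler saw a bad state. */
static bool replay_race(void)
{
	struct conn c = { .state = CONN_ACTIVE_CONNECT, .qp_alive = true };
	bool ok;

	connd_mark_disconnected(&c);	/* step 1: state changes first */
	assert(c.qp_alive);		/* QP still present, REJ can be delivered */
	ok = on_rejected(&c);		/* REJ arrives inside the window */
	connd_destroy_qp(&c);		/* step 2: QP removed too late */
	return !ok;
}
```

In this model replay_race() reports the unexpected state because the
state change becomes visible before the QP is removed. The same
handler accepts a connection that is still in CONN_ACTIVE_CONNECT.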
Test-Parameters: trivial testlist=sanity-lnet
Fixes: 0b8c18d ("LU-17480 o2iblnd: add a timeout for rdma_connect")
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: I2d04433eb51e1a6862b788a89e127d8abb24b8a9
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56518
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>