LU-18260 o2iblnd: fix race between REJ vs kiblnd_connd 18/56518/3
author Etienne AUJAMES <etienne.aujames@cea.fr>
Fri, 27 Sep 2024 14:50:15 +0000 (16:50 +0200)
committer Oleg Drokin <green@whamcloud.com>
Sun, 24 Nov 2024 06:08:52 +0000 (06:08 +0000)
commit c37f55ebb7f337b31ab8caf914dfb064feeca2d3
tree 604fce75396a5158d479a59ddad2bc01d51f9891
parent 3177b0dc5d18a8e3d77eb10fa6f266ff83cbe222
LU-18260 o2iblnd: fix race between REJ vs kiblnd_connd

This patch fixes a possible race between CM_EVENT_REJECTED and
kiblnd_connd().

kiblnd_connd() sets the connection state to IBLND_CONN_DISCONNECTED
before removing the QP. If CM_EVENT_REJECTED is received within this
time window, it causes the following crash:

Workqueue: ib_cm cm_work_handler [ib_cm]
Call Trace:
<TASK>
dump_stack_lvl+0x34/0x48
panic+0x100/0x2d2
lbug_with_loc.cold+0x18/0x18 [libcfs]
kiblnd_cm_callback+0x108d/0x10b0 [ko2iblnd]
cma_cm_event_handler+0x1e/0xb0 [rdma_cm]
cma_ib_handler+0x8d/0x2e0 [rdma_cm]
cm_process_work+0x22/0x190 [ib_cm]
cm_rej_handler+0xdf/0x260 [ib_cm]
cm_work_handler+0x47f/0x4d0 [ib_cm]
process_one_work+0x1e8/0x390
worker_thread+0x53/0x3d0
kthread+0x124/0x150
ret_from_fork+0x1f/0x30
</TASK>
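
The window described above can be illustrated with a small userspace
model. This is only a sketch under stated assumptions, not the o2iblnd
code: the enum values, struct fields and function names below are
hypothetical stand-ins, and the model assumes that a reject event can
no longer reach the connection once the QP has been removed. It shows
why the ordering matters: between the state change and the QP removal,
a reject event finds a state its handler does not expect, which in the
trace above ends in LBUG.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the o2iblnd connection states. */
enum conn_state {
	CONN_ACTIVE_CONNECT,	/* connect request sent, reply still pending */
	CONN_DISCONNECTED,	/* stands in for IBLND_CONN_DISCONNECTED */
};

/* Hypothetical, reduced connection descriptor. */
struct conn {
	enum conn_state	state;
	bool		qp_alive;	/* model assumption: events arrive only while true */
};

/* Outcome of delivering a reject event to the model. */
enum rej_result {
	REJ_HANDLED,		/* handler found an expected state */
	REJ_NOT_DELIVERED,	/* QP already gone, nothing to handle */
	REJ_WOULD_LBUG,		/* unexpected state: the crash in the trace */
};

/* Teardown step 1 (as done by the connd thread): state changes first. */
static void connd_mark_disconnected(struct conn *c)
{
	c->state = CONN_DISCONNECTED;
}

/* Teardown step 2: the QP is removed only afterwards. */
static void connd_remove_qp(struct conn *c)
{
	c->qp_alive = false;
}

/* Model of the reject path in the CM callback. */
static enum rej_result on_rejected(const struct conn *c)
{
	if (!c->qp_alive)
		return REJ_NOT_DELIVERED;

	switch (c->state) {
	case CONN_ACTIVE_CONNECT:
		return REJ_HANDLED;
	default:
		/* state was changed under us while the QP still exists */
		return REJ_WOULD_LBUG;
	}
}
```

Calling on_rejected() between connd_mark_disconnected() and
connd_remove_qp() returns REJ_WOULD_LBUG in this model; calling it
before the first step or after the second does not.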

Test-Parameters: trivial testlist=sanity-lnet
Fixes: 0b8c18d ("LU-17480 o2iblnd: add a timeout for rdma_connect")
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: I2d04433eb51e1a6862b788a89e127d8abb24b8a9
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/56518
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
lnet/klnds/o2iblnd/o2iblnd_cb.c