From: Yang Sheng
Date: Mon, 8 Mar 2021 14:53:13 +0000 (+0800)
Subject: LU-11289 ptlrpc: fix ASSERTION on scp_rqbd_posted
X-Git-Tag: 2.14.51~28
X-Git-Url: https://git.whamcloud.com/?p=fs%2Flustre-release.git;a=commitdiff_plain;h=b635a0435d13d8431a8344735322b84cb4613b68

LU-11289 ptlrpc: fix ASSERTION on scp_rqbd_posted

A request may still be referenced by another target even after the
service threads have been stopped. This happens because some portals
are shared among different services. As a workaround, just wait for
the request to be released.

LustreError: (service.c::ptlrpc_service_purge_all()) ASSERTION( list_empty(&svcpt->scp_rqbd_posted) ) failed:
LustreError: (service.c::ptlrpc_service_purge_all()) LBUG
Pid: 21, comm: umount 3.10.0 #1 SMP
Call Trace:
[] libcfs_call_trace+0x8c/0xc0 [libcfs]
[] lbug_with_loc+0x4c/0xa0 [libcfs]
[] ptlrpc_unregister_service+0xced/0xd90 [ptlrpc]
[] ost_cleanup+0x82/0x1b0 [ost]
[] class_free_dev+0x1ca/0x630 [obdclass]
[] class_export_put+0x1e0/0x2b0 [obdclass]
[] class_unlink_export+0x135/0x170 [obdclass]
[] class_decref+0x80/0x160 [obdclass]
[] class_detach+0x1b1/0x2e0 [obdclass]
[] class_process_config+0x1a91/0x2820 [obdclass]
[] class_manual_cleanup+0x1e0/0x6d0 [obdclass]
[] server_stop_servers+0xd5/0x160 [obdclass]
[] server_put_super+0x126/0xca0 [obdclass]
[<8121068a>] generic_shutdown_super+0x6a/0xf0
[<81210a62>] kill_anon_super+0x12/0x20
[] lustre_kill_super+0x32/0x50 [obdclass]
[<81210e59>] deactivate_locked_super+0x49/0x60
[<812115a6>] deactivate_super+0x46/0x60
[<8123019f>] cleanup_mnt+0x3f/0x80
[<81230232>] __cleanup_mnt+0x12/0x20
[<810ab085>] task_work_run+0xb5/0xf0
[<8102ac12>] do_notify_resume+0x92/0xb0
[<81783c83>] int_signal+0x12/0x17
Kernel panic - not syncing: LBUG

Signed-off-by: Yang Sheng
Change-Id: Idfb19df123ceae177a0e447e9344bac6861166bf
Reviewed-on: https://review.whamcloud.com/41936
Tested-by: jenkins
Reviewed-by: Andreas Dilger
Tested-by: Maloo
Reviewed-by: Bobi Jam
Reviewed-by: Oleg Drokin
---

diff --git a/lustre/ptlrpc/service.c b/lustre/ptlrpc/service.c
index 1086f9c..acb23bc 100644
--- a/lustre/ptlrpc/service.c
+++ b/lustre/ptlrpc/service.c
@@ -3485,7 +3485,23 @@ ptlrpc_service_purge_all(struct ptlrpc_service *svc)
 			ptlrpc_server_finish_active_request(svcpt, req);
 		}
 
-		LASSERT(list_empty(&svcpt->scp_rqbd_posted));
+		/*
+		 * The portal may be shared by several services (eg:OUT_PORTAL).
+		 * So the request could be referenced by other target. So we
+		 * have to wait the ptlrpc_server_drop_request invoked.
+		 *
+		 * TODO: move the req_buffer as global rather than per service.
+		 */
+		spin_lock(&svcpt->scp_lock);
+		while (!list_empty(&svcpt->scp_rqbd_posted)) {
+			spin_unlock(&svcpt->scp_lock);
+			wait_event_idle_timeout(svcpt->scp_waitq,
+				list_empty(&svcpt->scp_rqbd_posted),
+				cfs_time_seconds(1));
+			spin_lock(&svcpt->scp_lock);
+		}
+		spin_unlock(&svcpt->scp_lock);
+
 		LASSERT(svcpt->scp_nreqs_incoming == 0);
 		LASSERT(svcpt->scp_nreqs_active == 0);
 		/*