Whamcloud - gitweb
LU-11289 ptlrpc: fix ASSERTION on scp_rqbd_posted 36/41936/3
author Yang Sheng <ys@whamcloud.com>
Mon, 8 Mar 2021 14:53:13 +0000 (22:53 +0800)
committer Oleg Drokin <green@whamcloud.com>
Sat, 13 Mar 2021 18:34:26 +0000 (18:34 +0000)
A request may still be referenced by another target even after the
service threads have been stopped. This happens because some portals
are shared among different services. As a workaround, just wait for
the request to be released.
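
The wait-until-released idea can be sketched in userspace C with pthreads. This is only an illustrative analogue, not the Lustre code: `demo_part`, `demo_release`, `demo_wait_unposted` and `demo_shutdown` are hypothetical names, a plain counter stands in for the `scp_rqbd_posted` list, and `pthread_cond_timedwait` releases the mutex while sleeping, whereas the patch drops `scp_lock` explicitly around `wait_event_idle_timeout`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical userspace stand-in for a service partition. */
struct demo_part {
	pthread_mutex_t lock;    /* plays the role of scp_lock */
	pthread_cond_t  waitq;   /* plays the role of scp_waitq */
	int             nposted; /* stands in for the scp_rqbd_posted list */
};

/* Another "target" finishes with the buffer and drops its reference. */
static void *demo_release(void *arg)
{
	struct demo_part *p = arg;
	struct timespec busy = { .tv_sec = 0, .tv_nsec = 200 * 1000 * 1000 };

	nanosleep(&busy, NULL); /* the holder is still busy for a while */
	pthread_mutex_lock(&p->lock);
	p->nposted--;
	pthread_cond_broadcast(&p->waitq);
	pthread_mutex_unlock(&p->lock);
	return NULL;
}

/* Wait, in one-second slices, until no buffer is posted any more. */
static void demo_wait_unposted(struct demo_part *p)
{
	pthread_mutex_lock(&p->lock);
	while (p->nposted > 0) {
		struct timespec deadline;

		clock_gettime(CLOCK_REALTIME, &deadline);
		deadline.tv_sec += 1;
		/* Returns on wakeup or timeout; the loop re-checks either way. */
		pthread_cond_timedwait(&p->waitq, &p->lock, &deadline);
	}
	pthread_mutex_unlock(&p->lock);
}

/* Runs the scenario once and returns the number of buffers left posted. */
static int demo_shutdown(void)
{
	struct demo_part p = { .nposted = 1 };
	pthread_t holder;
	int rc, left;

	pthread_mutex_init(&p.lock, NULL);
	pthread_cond_init(&p.waitq, NULL);
	rc = pthread_create(&holder, NULL, demo_release, &p);
	assert(rc == 0);

	demo_wait_unposted(&p); /* instead of asserting that nposted == 0 */
	left = p.nposted;

	pthread_join(holder, NULL);
	pthread_cond_destroy(&p.waitq);
	pthread_mutex_destroy(&p.lock);
	return left;
}
```

Calling `demo_shutdown()` from a `main()` exercises the path once. The bounded one-second wait mirrors the patch's choice to re-check the condition periodically rather than rely solely on a wakeup.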

LustreError: (service.c::ptlrpc_service_purge_all())
ASSERTION( list_empty(&svcpt->scp_rqbd_posted) ) failed:
LustreError: (service.c::ptlrpc_service_purge_all()) LBUG
Pid: 21, comm: umount 3.10.0 #1 SMP
Call Trace:
 [<a01c47dc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
 [<a01c488c>] lbug_with_loc+0x4c/0xa0 [libcfs]
 [<a0b534dd>] ptlrpc_unregister_service+0xced/0xd90 [ptlrpc]
 [<a005e122>] ost_cleanup+0x82/0x1b0 [ost]
 [<a08e0bfa>] class_free_dev+0x1ca/0x630 [obdclass]
 [<a08e1240>] class_export_put+0x1e0/0x2b0 [obdclass]
 [<a08e2cc5>] class_unlink_export+0x135/0x170 [obdclass]
 [<a08f8030>] class_decref+0x80/0x160 [obdclass]
 [<a08f8481>] class_detach+0x1b1/0x2e0 [obdclass]
 [<a08fef21>] class_process_config+0x1a91/0x2820 [obdclass]
 [<a08ffe90>] class_manual_cleanup+0x1e0/0x6d0 [obdclass]
 [<a092a115>] server_stop_servers+0xd5/0x160 [obdclass]
 [<a092f6c6>] server_put_super+0x126/0xca0 [obdclass]
 [<8121068a>] generic_shutdown_super+0x6a/0xf0
 [<81210a62>] kill_anon_super+0x12/0x20
 [<a09027e2>] lustre_kill_super+0x32/0x50 [obdclass]
 [<81210e59>] deactivate_locked_super+0x49/0x60
 [<812115a6>] deactivate_super+0x46/0x60
 [<8123019f>] cleanup_mnt+0x3f/0x80
 [<81230232>] __cleanup_mnt+0x12/0x20
 [<810ab085>] task_work_run+0xb5/0xf0
 [<8102ac12>] do_notify_resume+0x92/0xb0
 [<81783c83>] int_signal+0x12/0x17
 Kernel panic - not syncing: LBUG

Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: Idfb19df123ceae177a0e447e9344bac6861166bf
Reviewed-on: https://review.whamcloud.com/41936
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
lustre/ptlrpc/service.c

index 1086f9c..acb23bc 100644
@@ -3485,7 +3485,23 @@ ptlrpc_service_purge_all(struct ptlrpc_service *svc)
                        ptlrpc_server_finish_active_request(svcpt, req);
                }
 
-               LASSERT(list_empty(&svcpt->scp_rqbd_posted));
+               /*
+                * The portal may be shared by several services (e.g.
+                * OUT_PORTAL), so a request may still be referenced by another
+                * target. Wait until ptlrpc_server_drop_request() releases it.
+                *
+                * TODO: make req_buffer global rather than per service.
+                */
+               spin_lock(&svcpt->scp_lock);
+               while (!list_empty(&svcpt->scp_rqbd_posted)) {
+                       spin_unlock(&svcpt->scp_lock);
+                       wait_event_idle_timeout(svcpt->scp_waitq,
+                               list_empty(&svcpt->scp_rqbd_posted),
+                               cfs_time_seconds(1));
+                       spin_lock(&svcpt->scp_lock);
+               }
+               spin_unlock(&svcpt->scp_lock);
+
                LASSERT(svcpt->scp_nreqs_incoming == 0);
                LASSERT(svcpt->scp_nreqs_active == 0);
                /*