Whamcloud - gitweb
LU-11289 ptlrpc: fix ASSERTION on scp_rqbd_posted
author Yang Sheng <ys@whamcloud.com>
Mon, 8 Mar 2021 14:53:13 +0000 (22:53 +0800)
committer Li Xi <lixi@ddn.com>
Wed, 28 Apr 2021 13:41:45 +0000 (13:41 +0000)
The request may still be referenced by another target even after
the service threads have been stopped. This happens because some
portals are shared among different services. As a workaround, just
wait for the request to be released.

LustreError: (service.c::ptlrpc_service_purge_all())
ASSERTION( list_empty(&svcpt->scp_rqbd_posted) ) failed:
LustreError: (service.c::ptlrpc_service_purge_all()) LBUG
Pid: 21, comm: umount 3.10.0 #1 SMP
Call Trace:
 [<a01c47dc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
 [<a01c488c>] lbug_with_loc+0x4c/0xa0 [libcfs]
 [<a0b534dd>] ptlrpc_unregister_service+0xced/0xd90 [ptlrpc]
 [<a005e122>] ost_cleanup+0x82/0x1b0 [ost]
 [<a08e0bfa>] class_free_dev+0x1ca/0x630 [obdclass]
 [<a08e1240>] class_export_put+0x1e0/0x2b0 [obdclass]
 [<a08e2cc5>] class_unlink_export+0x135/0x170 [obdclass]
 [<a08f8030>] class_decref+0x80/0x160 [obdclass]
 [<a08f8481>] class_detach+0x1b1/0x2e0 [obdclass]
 [<a08fef21>] class_process_config+0x1a91/0x2820 [obdclass]
 [<a08ffe90>] class_manual_cleanup+0x1e0/0x6d0 [obdclass]
 [<a092a115>] server_stop_servers+0xd5/0x160 [obdclass]
 [<a092f6c6>] server_put_super+0x126/0xca0 [obdclass]
 [<8121068a>] generic_shutdown_super+0x6a/0xf0
 [<81210a62>] kill_anon_super+0x12/0x20
 [<a09027e2>] lustre_kill_super+0x32/0x50 [obdclass]
 [<81210e59>] deactivate_locked_super+0x49/0x60
 [<812115a6>] deactivate_super+0x46/0x60
 [<8123019f>] cleanup_mnt+0x3f/0x80
 [<81230232>] __cleanup_mnt+0x12/0x20
 [<810ab085>] task_work_run+0xb5/0xf0
 [<8102ac12>] do_notify_resume+0x92/0xb0
 [<81783c83>] int_signal+0x12/0x17
 Kernel panic - not syncing: LBUG

Lustre-change: https://review.whamcloud.com/41936
Lustre-commit: b635a0435d13d8431a8344735322b84cb4613b68

Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: Idfb19df123ceae177a0e447e9344bac6861166bf
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-on: https://review.whamcloud.com/42048
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
lustre/ptlrpc/service.c

index 05514cf..f22bb46 100644
@@ -3490,7 +3490,23 @@ ptlrpc_service_purge_all(struct ptlrpc_service *svc)
                        ptlrpc_server_finish_active_request(svcpt, req);
                }
 
-               LASSERT(list_empty(&svcpt->scp_rqbd_posted));
+               /*
+                * Several services may share a portal (e.g. OUT_PORTAL), so a
+                * request may still be referenced by another target. We have
+                * to wait until ptlrpc_server_drop_request() is invoked.
+                *
+                * TODO: make req_buffer global rather than per service.
+                */
+               spin_lock(&svcpt->scp_lock);
+               while (!list_empty(&svcpt->scp_rqbd_posted)) {
+                       spin_unlock(&svcpt->scp_lock);
+                       wait_event_idle_timeout(svcpt->scp_waitq,
+                               list_empty(&svcpt->scp_rqbd_posted),
+                               cfs_time_seconds(1));
+                       spin_lock(&svcpt->scp_lock);
+               }
+               spin_unlock(&svcpt->scp_lock);
+
                LASSERT(svcpt->scp_nreqs_incoming == 0);
                LASSERT(svcpt->scp_nreqs_active == 0);
                /*