LU-16297 ptlrpc: don't panic during reconnection 29/49029/9
author Alexander Boyko <alexander.boyko@hpe.com>
Thu, 3 Nov 2022 11:23:20 +0000 (07:23 -0400)
committer Oleg Drokin <green@whamcloud.com>
Tue, 3 Jan 2023 21:34:59 +0000 (21:34 +0000)
ptl_send_rpc() can race with ptlrpc_connect_import_locked() in the
middle of its assertion check, which leads to a spurious panic.
The assertion first evaluates

(AT_OFF || imp->imp_state != LUSTRE_IMP_FULL ||

then a reconnect changes the import state and flags, and only
afterwards is the second part evaluated:

(imp->imp_msghdr_flags & MSGHDR_AT_SUPPORT) ||
!(imp->imp_connect_data.ocd_connect_flags & OBD_CONNECT_AT)))

MSGHDR_AT_SUPPORT is cleared during client reconnection, so the
second part can fail once the reconnect has started.
Taking a lock in this hot path is undesirable, so the fix turns the
assertion into a debug report.

HPE-bug-id: LUS-10985
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: Ifc9e413c679c3e8a4c8f4f541251bebabae41c82
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49029
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
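
To illustrate the change described in the commit message, here is a minimal standalone sketch of the same pattern: an invariant that used to abort is instead checked and reported. It is not the Lustre code. The type `import_model`, the constants `IMP_FULL`, `HDR_AT_SUPPORT` and `CONN_AT`, and the three functions are hypothetical stand-ins chosen for this example, and `assert()` only approximates what LASSERT does in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the Lustre definitions; values are illustrative. */
enum imp_state { IMP_CONNECTING, IMP_FULL };
#define HDR_AT_SUPPORT 0x1u
#define CONN_AT        0x2u

struct import_model {
	enum imp_state state;
	unsigned int msghdr_flags;
	unsigned int connect_flags;
};

/*
 * The invariant from the commit: with AT enabled, an import in the FULL
 * state that negotiated CONN_AT must also carry HDR_AT_SUPPORT.
 */
bool at_invariant_holds(bool at_off, const struct import_model *imp)
{
	return at_off || imp->state != IMP_FULL ||
	       (imp->msghdr_flags & HDR_AT_SUPPORT) ||
	       !(imp->connect_flags & CONN_AT);
}

/* Old behaviour (sketch): abort when the invariant does not hold. */
void check_strict(bool at_off, const struct import_model *imp)
{
	assert(at_invariant_holds(at_off, imp));
}

/*
 * New behaviour (sketch): report the inconsistent state and carry on.
 * Returns true when a report was emitted.
 */
bool check_and_report(bool at_off, const struct import_model *imp)
{
	if (at_invariant_holds(at_off, imp))
		return false;

	fprintf(stderr,
		"Wrong state of import detected, AT=%d, imp=%d, msghdr=%d, conn=%d\n",
		at_off, imp->state != IMP_FULL,
		(imp->msghdr_flags & HDR_AT_SUPPORT) != 0,
		!(imp->connect_flags & CONN_AT));
	return true;
}
```

In this sketch, a reader that observes a half-updated import (FULL state, `CONN_AT` set, `HDR_AT_SUPPORT` already cleared) would abort in `check_strict()` but only log a message in `check_and_report()`. No locking is added, which matches the commit's stated reason for not serializing the check.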
lustre/ptlrpc/niobuf.c

index 90dac9a..64dfa70 100644
@@ -797,13 +797,21 @@ int ptl_send_rpc(struct ptlrpc_request *request, int noreply)
                LBUG();
        }
 
-       /** For enabled AT all request should have AT_SUPPORT in the
-        * FULL import state when OBD_CONNECT_AT is set */
-       LASSERT(AT_OFF || imp->imp_state != LUSTRE_IMP_FULL ||
-               (imp->imp_msghdr_flags & MSGHDR_AT_SUPPORT) ||
-               !(imp->imp_connect_data.ocd_connect_flags &
-                 OBD_CONNECT_AT));
-
+       /**
+        * For enabled AT all request should have AT_SUPPORT in the
+        * FULL import state when OBD_CONNECT_AT is set.
+        * This check has a race with ptlrpc_connect_import_locked()
+        * with low chance, don't panic, only report.
+        */
+       if (!(AT_OFF || imp->imp_state != LUSTRE_IMP_FULL ||
+           (imp->imp_msghdr_flags & MSGHDR_AT_SUPPORT) ||
+           !(imp->imp_connect_data.ocd_connect_flags & OBD_CONNECT_AT))) {
+               DEBUG_REQ(D_HA, request, "Wrong state of import detected, AT=%d, imp=%d, msghdr=%d, conn=%d\n",
+                         AT_OFF, imp->imp_state != LUSTRE_IMP_FULL,
+                         (imp->imp_msghdr_flags & MSGHDR_AT_SUPPORT),
+                         !(imp->imp_connect_data.ocd_connect_flags &
+                           OBD_CONNECT_AT));
+       }
        if (request->rq_resend) {
                lustre_msg_add_flags(request->rq_reqmsg, MSG_RESENT);
                if (request->rq_resend_cb != NULL)