LU-17325 o2iblnd: CM_EVENT_UNREACHABLE on established conn
author Serguei Smirnov <ssmirnov@whamcloud.com>
Thu, 30 Nov 2023 18:55:11 +0000 (10:55 -0800)
committer Oleg Drokin <green@whamcloud.com>
Wed, 20 Dec 2023 01:57:51 +0000 (01:57 +0000)
Field reports from RoCE setups show that RDMA_CM_EVENT_UNREACHABLE
may be received when the connection is already in the ESTABLISHED
state. This causes an assert in kiblnd_cm_callback to fail.

Handle this more gracefully: report the event as unexpected and
allow the flow to continue. If there are indeed issues on the
connection, it is expected to report transaction errors later and
get cleaned up without crashing the whole system.
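
The intended behaviour can be illustrated with a small standalone model.
This is only a sketch, not the o2iblnd code: the sketch_* names and the
enum values are hypothetical stand-ins, and returning -ENETDOWN stands in
for the call to kiblnd_connreq_done(conn, -ENETDOWN) seen in the diff.
It assumes that, for any state other than the two connecting states, the
event is merely logged.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-ins for the o2iblnd connection states named in the
 * patch; the numeric values are illustrative, not the real definitions. */
enum sketch_conn_state {
	SKETCH_CONN_INIT,
	SKETCH_CONN_ACTIVE_CONNECT,
	SKETCH_CONN_PASSIVE_WAIT,
	SKETCH_CONN_ESTABLISHED,
};

/*
 * Model of the patched UNREACHABLE branch. Returns -ENETDOWN when the
 * pending connection attempt should be failed, or 0 when the event is
 * only logged and the connection is left alone.
 */
static int sketch_handle_unreachable(enum sketch_conn_state state)
{
	/* The patch adds the connection state to the log message. */
	fprintf(stderr, "UNREACHABLE received, state: %d\n", (int)state);

	/* Receiving the event in INIT is still treated as a bug. */
	assert(state != SKETCH_CONN_INIT);

	/* A connection attempt is in flight: fail it. */
	if (state == SKETCH_CONN_ACTIVE_CONNECT ||
	    state == SKETCH_CONN_PASSIVE_WAIT)
		return -ENETDOWN;

	/* ESTABLISHED no longer trips the assert; just carry on. */
	return 0;
}
```

In this model, sketch_handle_unreachable(SKETCH_CONN_ESTABLISHED) returns 0
instead of aborting, while the two connecting states still yield -ENETDOWN.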

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: If32166fe9fc59e025609c2035cb1c03d3bed22f2
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53298
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
lnet/klnds/o2iblnd/o2iblnd_cb.c

index d0ce634..85dc9dd 100644
@@ -3341,10 +3341,10 @@ kiblnd_cm_callback(struct rdma_cm_id *cmid, struct rdma_cm_event *event)
 
        case RDMA_CM_EVENT_UNREACHABLE:
                conn = cmid->context;
-               CNETERR("%s: UNREACHABLE %d cm_id %p conn %p\n",
-                       libcfs_nid2str(conn->ibc_peer->ibp_nid), event->status, cmid, conn);
-               LASSERT(conn->ibc_state != IBLND_CONN_ESTABLISHED &&
-                       conn->ibc_state != IBLND_CONN_INIT);
+               CNETERR("%s: UNREACHABLE %d cm_id %p conn %p ibc_state: %d\n",
+                       libcfs_nid2str(conn->ibc_peer->ibp_nid),
+                       event->status, cmid, conn, conn->ibc_state);
+               LASSERT(conn->ibc_state != IBLND_CONN_INIT);
                if (conn->ibc_state == IBLND_CONN_ACTIVE_CONNECT ||
                    conn->ibc_state == IBLND_CONN_PASSIVE_WAIT) {
                        kiblnd_connreq_done(conn, -ENETDOWN);