LU-8303 lnet: make connection more stable with packet loss 74/20874/2
author Alexander Boyko <alexander.boyko@seagate.com>
Sun, 19 Jun 2016 12:56:51 +0000 (15:56 +0300)
committer Oleg Drokin <oleg.drokin@intel.com>
Tue, 5 Jul 2016 23:47:59 +0000 (23:47 +0000)
An IB network may lose the last connection handshake packet.
This problem isn't Lustre-specific; it is described, for example, at
https://oss.oracle.com/pipermail/rds-devel/2007-December/000271.html
The solution is to mark the connection as established if any packet
is received on it.

Signed-off-by: Alexander Boyko <alexander.boyko@seagate.com>
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@seagate.com>
Seagate-bug-id: MRP-2883
Change-Id: I9bf69fdca24e51a5de06c92e9ad76b4e040c5d65
Reviewed-on: http://review.whamcloud.com/20874
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@seagate.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
lnet/klnds/o2iblnd/o2iblnd_cb.c

index b13fcf9..b211f95 100644 (file)
@@ -3406,6 +3406,10 @@ kiblnd_qp_event(struct ib_event *event, void *arg)
         case IB_EVENT_COMM_EST:
                 CDEBUG(D_NET, "%s established\n",
                        libcfs_nid2str(conn->ibc_peer->ibp_nid));
+               /* A packet arrived before the connection was marked
+                * established, so the final handshake packet was probably
+                * lost; force the connection into the established state. */
+               rdma_notify(conn->ibc_cmid, IB_EVENT_COMM_EST);
                 return;
 
         default:
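
The idea behind this change can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the o2iblnd or RDMA CM implementation: the names conn_state, struct conn, conn_init, conn_on_handshake_done and conn_on_data are hypothetical and exist only for this example. It assumes the peer sends data only after it considers the handshake complete, so data arriving on a connection that is still waiting for the final handshake packet is treated as evidence that the packet was lost.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the passive side of a connection handshake.
 * None of these names come from o2iblnd or the RDMA CM. */
enum conn_state {
	CONN_WAIT_FINAL,	/* reply sent, final handshake packet pending */
	CONN_ESTABLISHED,
};

struct conn {
	enum conn_state	state;
	unsigned int	rx_packets;	/* data packets accepted so far */
};

static void conn_init(struct conn *c)
{
	assert(c != NULL);
	c->state = CONN_WAIT_FINAL;
	c->rx_packets = 0;
}

/* Normal path: the final handshake packet arrived. */
static void conn_on_handshake_done(struct conn *c)
{
	assert(c != NULL);
	c->state = CONN_ESTABLISHED;
}

/* Recovery path: data arrived while the handshake still looks incomplete.
 * Under the assumption stated above, the final handshake packet was lost,
 * so promote the connection instead of waiting for a timeout.
 * Returns true if this call forced the transition. */
static bool conn_on_data(struct conn *c)
{
	bool forced = false;

	assert(c != NULL);
	if (c->state == CONN_WAIT_FINAL) {
		c->state = CONN_ESTABLISHED;
		forced = true;
	}
	c->rx_packets++;
	return forced;
}
```

In the patch itself the equivalent of the forced transition is the rdma_notify() call made from the IB_EVENT_COMM_EST case of kiblnd_qp_event(); the model only mirrors the decision, not the kernel API.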