LU-9094 o2iblnd: reconnect peer for REJ_INVALID_SERVICE_ID 78/25378/3
author Sergey Cheremencev <sergey.cheremencev@seagate.com>
Fri, 16 Dec 2016 12:08:56 +0000 (15:08 +0300)
committer Oleg Drokin <oleg.drokin@intel.com>
Sat, 18 Feb 2017 23:51:30 +0000 (23:51 +0000)
commit 603aa7a1df6ee6ce6fe0d501a8b2bd1bfdf43bb8
tree 853db958f58e414c2819d270d13de3c74dda28fd
parent 81e010d101a667b2bc22c2caddeefd40f02a3d19
LU-9094 o2iblnd: reconnect peer for REJ_INVALID_SERVICE_ID

Don't kill the peer when a connection is rejected with
INVALID_SERVICE_ID; reconnect instead. Killing the peer produces a
huge number of peers for the same NID and may cause an OOM.
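
The intended behaviour can be sketched roughly as below. This is an
illustration only, not the code in o2iblnd_cb.c: the enum values, the
struct peer layout, and handle_reject() are hypothetical names made up
for this sketch, and REJ_OTHER merely stands in for any reject reason
that the sketch treats as fatal.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reject reasons; not the kernel's IB CM enum. */
enum rej_reason {
	REJ_INVALID_SERVICE_ID,	/* remote side has no listener yet */
	REJ_OTHER,		/* placeholder for reasons treated as fatal here */
};

/* Hypothetical outcome of handling a rejected connection request. */
enum rej_action {
	ACTION_KILL_PEER,	/* drop the peer record */
	ACTION_RECONNECT,	/* keep the peer record and retry */
};

/* Hypothetical peer record: a single entry per NID, reused on retry. */
struct peer {
	unsigned long nid;
	int reconnect_attempts;
};

/*
 * Decide what to do with a peer whose connection request was rejected.
 * For REJ_INVALID_SERVICE_ID the existing peer is kept and scheduled
 * for reconnect, so repeated rejections do not pile up new peer
 * records for the same NID.
 */
static enum rej_action handle_reject(struct peer *p, enum rej_reason why)
{
	assert(p != NULL);

	switch (why) {
	case REJ_INVALID_SERVICE_ID:
		p->reconnect_attempts++;
		return ACTION_RECONNECT;
	default:
		return ACTION_KILL_PEER;
	}
}
```

In this sketch a caller would keep using the same struct peer after
ACTION_RECONNECT and only free it after ACTION_KILL_PEER.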

The OOM was frequently seen with mlnx-ofa-kernel-2.3, where
mlx4_cq_free uses an RCU mechanism. In older mlnx4 versions, RCU
was replaced with spin locks to mitigate the issue.

Change-Id: Ib609232242c45bc9819e1cb4c593da3a490c63a0
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@seagate.com>
Seagate-bug-id: MRP-4056
Reviewed-on: https://review.whamcloud.com/25378
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
lnet/klnds/o2iblnd/o2iblnd.h
lnet/klnds/o2iblnd/o2iblnd_cb.c