Whamcloud - gitweb
LU-7646 lnet: Stop Infinite CON RACE Condition 30/19430/11
author     Doug Oucharek <doug.s.oucharek@intel.com>
           Tue, 19 Jan 2016 01:26:08 +0000 (17:26 -0800)
committer  Oleg Drokin <oleg.drokin@intel.com>
           Mon, 15 Aug 2016 21:11:56 +0000 (21:11 +0000)
commit     94f757bf67d58694201b2434f7879974c7abd622
tree       b853ccc56df6b597106791a0948f237d0d5f0b34
parent     c3e44bc813e93ea62b5618bf47a1ddd35a92b866
LU-7646 lnet: Stop Infinite CON RACE Condition

In the current code, when a connection race (CON RACE) occurs,
the passive side lets the node with the higher NID value win.

We have a field case where a node can have a "stuck" connection
that never goes away and triggers a never-ending loop of
reconnections.

This patch introduces a counter of how many times a connection in
the connecting state has caused a CON RACE rejection. After 20 such
rejections (constant MAX_CONN_RACES_BEFORE_ABORT), we assume the
connection is stuck and let the other side (the one with the lower
NID) win.

Signed-off-by: Doug Oucharek <doug.s.oucharek@intel.com>
Change-Id: I32e035806e95868b13c28c42e241b969940a35c9
Reviewed-on: http://review.whamcloud.com/19430
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Amir Shehata <amir.shehata@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
lnet/klnds/o2iblnd/o2iblnd.h
lnet/klnds/o2iblnd/o2iblnd_cb.c