LU-8429 gnilnd: Option to not reconnect after conn timeout 59/21459/4
author Chuck Fossen <chuckf@cray.com>
Fri, 15 Apr 2016 13:42:27 +0000 (13:42 +0000)
committer Oleg Drokin <oleg.drokin@intel.com>
Thu, 13 Oct 2016 23:36:37 +0000 (23:36 +0000)
commit 99bc4ba277637656f6329a67158af6cee7070b48
tree 64868be9b37502bbe947fead5afacbc16a71ca53
parent 0d68cfcf18f8f2118d5115fe3766d72c0630bb4d
LU-8429 gnilnd: Option to not reconnect after conn timeout

When a router times out a client connection during a catastrophic
network disturbance such as a cabinet EPO, file system traffic may
still be using that router as the return path to the client. That
traffic triggers new connection attempts before the network has
quiesced, and those attempts fail repeatedly. Each failed connection
must be put in purgatory because it could still connect in the
future, and the resulting registrations can consume the GART space.

To avoid this, add a module parameter, to_reconn_disable. When it is
set, a peer whose connection has timed out is moved to the
PEER_TIMED_OUT state, which acts just like PEER_DOWN: no traffic is
attempted to a peer in this state.

When the network recovers, the client will form a new connection and
the state will change back to PEER_UP.
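
The state handling described above can be sketched in user-space C as
follows. This is a minimal illustration, not the code from this patch:
the enum values borrow the state names used in this message, while
struct peer, the three helper functions, and the behaviour when
to_reconn_disable is 0 (state left unchanged) are assumptions made for
the example. The real definitions live in gnilnd.h and may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative peer states; names follow the commit message only. */
enum peer_state {
	PEER_UP,
	PEER_DOWN,
	PEER_TIMED_OUT,
};

/* Hypothetical, stripped-down peer; only the state field matters here. */
struct peer {
	enum peer_state gnp_state;
};

/* Stand-in for the to_reconn_disable module parameter. */
static int to_reconn_disable;

/* A connection to the peer has timed out (hypothetical helper). */
static void peer_conn_timed_out(struct peer *p)
{
	assert(p != NULL);
	/* Assumption: with the option off, the state is left alone. */
	if (to_reconn_disable)
		p->gnp_state = PEER_TIMED_OUT;
}

/* The client has formed a new connection (hypothetical helper). */
static void peer_conn_established(struct peer *p)
{
	assert(p != NULL);
	p->gnp_state = PEER_UP;
}

/* PEER_TIMED_OUT is treated like PEER_DOWN: no traffic is attempted. */
static bool peer_can_send(const struct peer *p)
{
	assert(p != NULL);
	return p->gnp_state == PEER_UP;
}
```

In this sketch a timed-out peer stays blocked until
peer_conn_established() runs, which mirrors the client-initiated
recovery described above.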

Changed gnp_down to gnp_state and GNILND_RCA_NODE_* to GNILND_PEER_*.

To add this option to routers, update /etc/modprobe.conf.local with:
options kgnilnd to_reconn_disable=1

To set this parameter dynamically on a booted node:
echo 1 > /sys/module/kgnilnd/parameters/to_reconn_disable

Tested by both timing out a connection and bringing down nodes, and
checking that the proper states are entered.

Test-Parameters: trivial
Signed-off-by: Chris Horn <hornc@cray.com>
Change-Id: I19cebab401208133d94e29c603eb340f77354684
Reviewed-on: http://review.whamcloud.com/21459
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Chuck Fossen <chuckf@cray.com>
Reviewed-by: James Shimek <jshimek@cray.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
lnet/klnds/gnilnd/gnilnd.c
lnet/klnds/gnilnd/gnilnd.h
lnet/klnds/gnilnd/gnilnd_cb.c
lnet/klnds/gnilnd/gnilnd_conn.c
lnet/klnds/gnilnd/gnilnd_modparams.c
lnet/klnds/gnilnd/gnilnd_proc.c
lnet/klnds/gnilnd/gnilnd_stack.c