LU-9119 lnet: fix race in lnet shutdown path 90/26690/4
author Olaf Weber <olaf@sgi.com>
Fri, 27 Jan 2017 15:13:29 +0000 (16:13 +0100)
committer Oleg Drokin <oleg.drokin@intel.com>
Fri, 5 May 2017 00:43:45 +0000 (00:43 +0000)
commit 1612925723908f4eb4bc2cefe677b7825027fe7f
tree d335d0f97e9aec3941f6a91004c5fb8153060458
parent 2b0d3ff9a3b516e240e7fb44f79e2cb4e4a064a7

The locking changes to lnet_net_lock made for Multi-Rail
introduced a race in the LNet shutdown path. The code keeps only
two states in the_lnet.ln_shutdown: 0 means LNet is either up and
running or shut down, while 1 means LNet is shutting down. When
lnet_select_pathway() needs to restart, it drops and retakes
lnet_net_lock, and in that window LNet can go from running to
stopped. Because both of those states are encoded as 0, the
function cannot tell the difference.

Replace ln_shutdown with a three-state ln_state patterned on
ln_rc_state: states are LNET_STATE_SHUTDOWN, LNET_STATE_RUNNING,
and LNET_STATE_STOPPING. Most checks against ln_shutdown now test
ln_state against LNET_STATE_RUNNING. LNet moves to RUNNING state
in lnet_startup_lndnets().
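
The difference between the two encodings can be sketched in
isolation. This is a minimal illustration, not the code from this
patch: every demo_* name is hypothetical, the numeric values of the
states are an assumption, and no locking is modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the three states named in the commit
 * message. The numeric values are an assumption of this sketch. */
enum demo_state {
	DEMO_STATE_SHUTDOWN = 0,	/* not started, or fully stopped */
	DEMO_STATE_RUNNING,		/* started up and usable */
	DEMO_STATE_STOPPING,		/* shutdown in progress */
};

/* Old scheme: one flag that is 1 only while shutting down. */
struct demo_old {
	int shutdown;
};

/* New scheme: one field that distinguishes all three states. */
struct demo_new {
	enum demo_state state;
};

/* Old-style check: passes when running AND when fully stopped,
 * because both are encoded as shutdown == 0. */
static bool demo_old_may_proceed(const struct demo_old *o)
{
	return !o->shutdown;
}

/* New-style check: passes only in the running state. */
static bool demo_new_may_proceed(const struct demo_new *n)
{
	return n->state == DEMO_STATE_RUNNING;
}

/* Startup is only valid from the shut-down state. */
static void demo_new_startup(struct demo_new *n)
{
	assert(n->state == DEMO_STATE_SHUTDOWN);
	n->state = DEMO_STATE_RUNNING;
}

/* Shutdown passes through STOPPING before reaching SHUTDOWN. */
static void demo_new_begin_shutdown(struct demo_new *n)
{
	assert(n->state == DEMO_STATE_RUNNING);
	n->state = DEMO_STATE_STOPPING;
}

static void demo_new_finish_shutdown(struct demo_new *n)
{
	assert(n->state == DEMO_STATE_STOPPING);
	n->state = DEMO_STATE_SHUTDOWN;
}
```

A caller that re-runs the check after retaking a lock gets a usable
answer only from demo_new_may_proceed(): it fails both during and
after shutdown, while demo_old_may_proceed() succeeds again once
shutdown has completed.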

Test-Parameters: trivial
Signed-off-by: Olaf Weber <olaf@sgi.com>
Change-Id: I7afcbeb793dfa4d0a361e421ae06a99b7d4db903
Reviewed-on: https://review.whamcloud.com/26690
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Doug Oucharek <doug.s.oucharek@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
lnet/include/lnet/lib-types.h
lnet/lnet/api-ni.c
lnet/lnet/lib-move.c
lnet/lnet/lib-ptl.c
lnet/lnet/peer.c
lnet/lnet/router.c