LU-9120 lnet: handle fatal device error 72/32772/15
author Amir Shehata <amir.shehata@intel.com>
Fri, 29 Jun 2018 23:54:38 +0000 (16:54 -0700)
committer Amir Shehata <ashehata@whamcloud.com>
Fri, 17 Aug 2018 20:13:05 +0000 (20:13 +0000)
commit 6b1571209a9938719b081465f1ee327380a70554
tree cfd7ecb9737ca03261236cc60584611870720c3c
parent 0b1947d14188db34de0aa12b8d21e0b09e00ae13
LU-9120 lnet: handle fatal device error

The o2iblnd can receive device status events in its QP event handler.
This patch handles three of them:
IB_EVENT_DEVICE_FATAL
IB_EVENT_PORT_ERR
IB_EVENT_PORT_ACTIVE
For DEVICE_FATAL and PORT_ERR, the NI associated with the QP is put
into fatal error mode and is no longer selected when sending
messages. When PORT_ACTIVE is received, the fatal error on the NI
associated with the QP is cleared and future messages can use that
NI again.
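
The behavior described above can be sketched in plain C. This is an
illustrative user-space model only, not the code in this patch: the
sketch_* names, the enum values, and the fatal_error field are all
hypothetical stand-ins, and the real handler lives in o2iblnd_cb.c
and operates on kernel structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the IB event codes named in this patch;
 * the numeric values are arbitrary and not the kernel's. */
enum sketch_ib_event {
	SKETCH_IB_EVENT_DEVICE_FATAL,
	SKETCH_IB_EVENT_PORT_ERR,
	SKETCH_IB_EVENT_PORT_ACTIVE,
	SKETCH_IB_EVENT_OTHER,
};

/* Hypothetical NI holding only the state this sketch needs. */
struct sketch_ni {
	int  id;
	bool fatal_error;	/* set while the device or port is unusable */
};

/* Model of the QP event handler: mark or clear the fatal error on the
 * NI that owns the QP. Other events leave the NI state unchanged. */
static void sketch_qp_event(struct sketch_ni *ni, enum sketch_ib_event ev)
{
	assert(ni != NULL);

	switch (ev) {
	case SKETCH_IB_EVENT_DEVICE_FATAL:
	case SKETCH_IB_EVENT_PORT_ERR:
		ni->fatal_error = true;
		break;
	case SKETCH_IB_EVENT_PORT_ACTIVE:
		ni->fatal_error = false;
		break;
	default:
		break;
	}
}

/* Model of NI selection on the send path: return the first NI that is
 * not in fatal error mode, or NULL if none is usable. */
static struct sketch_ni *sketch_select_ni(struct sketch_ni *nis, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		if (!nis[i].fatal_error)
			return &nis[i];
	}
	return NULL;
}
```

The first-usable-NI selection here is a simplification chosen for the
sketch; it only shows that an NI in fatal error mode is skipped and
becomes eligible again once PORT_ACTIVE clears the flag.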

Test-Parameters: forbuildonly
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I282aa463927f489c46e4e45040e93478c9823a37
Reviewed-on: https://review.whamcloud.com/32772
Reviewed-by: Sonia Sharma <sharmaso@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Tested-by: Jenkins
lnet/include/lnet/lib-types.h
lnet/klnds/o2iblnd/o2iblnd_cb.c
lnet/lnet/lib-move.c