Whamcloud - gitweb
LU-12402 lnet: handle recursion in resend 31/35431/6
author Amir Shehata <ashehata@whamcloud.com>
Sat, 6 Jul 2019 16:02:33 +0000 (09:02 -0700)
committer Oleg Drokin <green@whamcloud.com>
Wed, 21 Aug 2019 04:51:57 +0000 (04:51 +0000)
commit ad9243693c9a5a5b2c34165ad853ddf5ceec4617
tree 945384054f40e429b0dbcd0b38b028b777f28b09
parent 12d7b7af5e397368c32bc4b82609e37afa0c0a26
LU-12402 lnet: handle recursion in resend

When we're resending a message we have to decommit it first. This
could potentially result in another message being picked up from the
queue and sent, which could fail immediately and be finalized, causing
recursion. This problem was observed when a router was being shut down.

This patch uses the same mechanism used in lnet_finalize() to limit
recursion. If a thread is already finalizing a message and it gets
into a path where it starts finalizing a second one, then that second
message is queued and handled later.
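
The queue-instead-of-recurse idea described above can be illustrated with
a minimal, single-threaded sketch. It is not the code from this patch or
from lnet_finalize(): every name in it (struct msg, struct resend_ctx,
resend_msg(), handle_one(), run_demo(), MAX_MSGS) is hypothetical, and
locking and per-thread tracking are left out. It assumes that handling one
message can re-enter the resend path for another message, and shows that a
"busy" flag plus a pending queue keeps the nesting depth constant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_MSGS 8              /* hypothetical number of messages */

struct msg {                    /* hypothetical message descriptor */
	int id;
	struct msg *next;
};

struct resend_ctx {             /* hypothetical per-context state */
	struct msg *head, *tail;    /* messages waiting to be handled */
	bool busy;                  /* a drain loop is already running */
	int depth, max_depth;       /* nesting of handle_one() calls */
	int handled;                /* total messages handled */
};

static struct msg pool[MAX_MSGS];

static void handle_one(struct resend_ctx *ctx, struct msg *m);

/* Queue the message; only the outermost caller drains the queue. */
static void resend_msg(struct resend_ctx *ctx, struct msg *m)
{
	assert(m != NULL);
	m->next = NULL;
	if (ctx->tail)
		ctx->tail->next = m;
	else
		ctx->head = m;
	ctx->tail = m;

	if (ctx->busy)              /* re-entered: leave it for the outer loop */
		return;

	ctx->busy = true;
	while (ctx->head) {
		struct msg *cur = ctx->head;

		ctx->head = cur->next;
		if (!ctx->head)
			ctx->tail = NULL;
		cur->next = NULL;
		handle_one(ctx, cur);
	}
	ctx->busy = false;
}

/* Simulate handling message i causing message i+1 to fail at once. */
static void handle_one(struct resend_ctx *ctx, struct msg *m)
{
	ctx->depth++;
	if (ctx->depth > ctx->max_depth)
		ctx->max_depth = ctx->depth;
	ctx->handled++;
	if (m->id + 1 < MAX_MSGS)
		resend_msg(ctx, &pool[m->id + 1]);  /* would recurse otherwise */
	ctx->depth--;
}

/* Set up the pool and kick off the chain from message 0. */
static void run_demo(struct resend_ctx *ctx)
{
	for (int i = 0; i < MAX_MSGS; i++) {
		pool[i].id = i;
		pool[i].next = NULL;
	}
	resend_msg(ctx, &pool[0]);
}
```

Calling run_demo() on a zero-initialized struct resend_ctx handles all
MAX_MSGS messages while handle_one() is never nested more than one level
deep. Without the busy check, the same chain would nest once per message.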

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I0cb943473fc8c22573d98da63a99cf7d678d4f42
Reviewed-on: https://review.whamcloud.com/35431
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <hornc@cray.com>
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
lnet/include/lnet/lib-types.h
lnet/lnet/lib-msg.c