Whamcloud - gitweb
LU-13468 fld: repeat rpc in fld_client_rpc after EAGAIN 02/38302/15
author Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Sun, 31 Oct 2021 06:42:35 +0000 (09:42 +0300)
committer Oleg Drokin <green@whamcloud.com>
Thu, 20 Jan 2022 18:25:00 +0000 (18:25 +0000)
commit b1acf734f31c13d291c5e1534d7a01f0fbd7e972
tree af472e36a7bfaa957d249a8a14c722330b2c2ecb
parent 0feec5a3c7d4518d5c563739124b202a6a0a99f7
LU-13468 fld: repeat rpc in fld_client_rpc after EAGAIN

A timed-out RPC sent by fld_client_rpc() may cause a client operation
to fail.

Have fld_client_rpc() repeat the RPC after a delay when it fails with EAGAIN.
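The retry-on-EAGAIN behaviour described above could look roughly like the
following C sketch. It is an illustration under stated assumptions, not the
patch itself: send_fld_query(), fail_count, attempts and the 10 ms pause are
hypothetical stand-ins, while the real change operates on the FLD_QUERY
request inside fld_client_rpc() and may use a different delay.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <time.h>

/* Hypothetical stand-in for the FLD_QUERY RPC: it fails with -EAGAIN
 * (as a timed-out request might) for the first 'fail_count' calls,
 * then succeeds. 'attempts' counts every send. */
static int fail_count = 2;
static int attempts;

static int send_fld_query(void)
{
	attempts++;
	if (fail_count > 0) {
		fail_count--;
		return -EAGAIN;
	}
	return 0;
}

/* Resend after a short pause for as long as the RPC returns -EAGAIN;
 * success or any other error is returned to the caller unchanged. */
int fld_query_with_retry(void)
{
	/* Hypothetical delay: 10 ms between attempts. */
	struct timespec pause = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };
	int rc;

	while ((rc = send_fld_query()) == -EAGAIN)
		nanosleep(&pause, NULL);

	return rc;
}
```

With fail_count set to 2, fld_query_with_retry() returns 0 after three sends.
Note that the loop has no upper bound: it only terminates because the
simulated failure eventually stops, which is exactly why a test that makes
the RPC fail on every attempt would never finish.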

A test that illustrates the issue is added.

A typo in the failure-simulation code of fld_client_rpc() is fixed.
recovery-small.sh:test_110k() is changed so that fld_client_rpc()
fails only once; otherwise it would fall into an endless loop.
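Why the injected failure must fire only once can be shown with a small C
sketch of one-shot versus persistent fault injection. All names here
(inject_failure, inject_once, fail_check(), send_rpc(), retry_rpc()) are
hypothetical and only model the idea; they are not Lustre's fail_loc
machinery. The max_tries bound exists solely so the sketch terminates in the
persistent case, standing in for the endless loop the message mentions.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical fault-injection state, armed by a test. */
static bool inject_failure;	/* should the next RPC fail? */
static bool inject_once;	/* disarm after the first injected failure? */
static int rpc_calls;		/* number of RPCs sent */

/* Return true if the injected failure fires on this call. */
static bool fail_check(void)
{
	if (!inject_failure)
		return false;
	if (inject_once)
		inject_failure = false;	/* one-shot: disarm after firing */
	return true;
}

static int send_rpc(void)
{
	rpc_calls++;
	return fail_check() ? -EAGAIN : 0;
}

/* Retry while the RPC returns -EAGAIN. max_tries only keeps this sketch
 * from hanging when the failure never clears. */
int retry_rpc(int max_tries)
{
	int rc = -EAGAIN;
	int i;

	for (i = 0; i < max_tries && rc == -EAGAIN; i++)
		rc = send_rpc();
	return rc;
}
```

With inject_once set, retry_rpc() succeeds on the second send; with a
persistent failure it keeps receiving -EAGAIN until the artificial bound is
reached, and an unbounded retry loop would never return.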

HPE-bug-id: LUS-8652
Fixes: e3f6111dfd1c ("LU-11761 fld: lets caller to retry FLD_QUERY")
Signed-off-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Change-Id: I145e719ec2fb5f5dbf9b5aa4b2a5b7e62f98c19f
Reviewed-on: https://review.whamcloud.com/38302
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
lustre/fld/fld_handler.c
lustre/fld/fld_request.c
lustre/tests/recovery-small.sh
lustre/tests/sanity.sh