LU-1565 ptlrpc: resend CANCEL rpc
author Vitaly Fertman <vitaly_fertman@xyratex.com>
Tue, 6 Nov 2012 19:09:47 +0000 (23:09 +0400)
committer Oleg Drokin <green@whamcloud.com>
Thu, 29 Nov 2012 00:28:53 +0000 (19:28 -0500)
commit 99a7b72117efe0fcf32013c24e550c6fa5765868
tree cfeb5edb5e47713668aa64b8666a916613f261d8
parent e6ea6e70cd91f01334957f421b5692dbaa6ad75b
LU-1565 ptlrpc: resend CANCEL rpc

It is better to deliver the CANCEL RPC to the server reliably in the
sequence:
    RPC timeout, re-connect, CANCEL resend
because the server may have sent a BL AST and be waiting for this CANCEL.
Resending avoids possible idle time on the server and later client
evictions.
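
The timeout, re-connect, resend sequence can be sketched with a toy
model. This is only an illustration under stated assumptions, not the
code from this patch: toy_link, toy_send_cancel(), deliver_cancel() and
the no_resend parameter are hypothetical names, and the transport simply
times out a fixed number of times before succeeding.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of one send attempt; not a Lustre type. */
enum send_result { SEND_OK, SEND_TIMEOUT };

/* Toy transport: times out the first `drops` attempts, then succeeds. */
struct toy_link {
	int drops;      /* attempts that will still time out */
	int attempts;   /* send attempts made so far */
	int reconnects; /* reconnects performed so far */
};

static enum send_result toy_send_cancel(struct toy_link *l)
{
	l->attempts++;
	if (l->drops > 0) {
		l->drops--;
		return SEND_TIMEOUT;
	}
	return SEND_OK;
}

/*
 * Deliver a CANCEL, reconnecting and resending after each timeout.
 * With no_resend set, give up after the first timeout instead.
 * Returns true if the (toy) server received the CANCEL.
 */
static bool deliver_cancel(struct toy_link *l, bool no_resend,
			   int max_attempts)
{
	assert(max_attempts > 0);

	while (l->attempts < max_attempts) {
		if (toy_send_cancel(l) == SEND_OK)
			return true;
		if (no_resend)
			return false;   /* server keeps waiting for CANCEL */
		l->reconnects++;        /* re-connect, then resend */
	}
	return false;
}
```

With a link that drops two attempts, deliver_cancel() with no_resend
unset succeeds on the third attempt after two reconnects, while with
no_resend set it gives up after one attempt and the server in this model
would keep waiting until its lock callback timer expires.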

A CANCEL always has an up-to-date lock handle, or else both the enqueue
and the cancel are not replayed on recovery, with one exception:
    BL AST is sent; recovery starts, lock is re-enqueued, BL AST comes
    to client, cancel is created, recovery ends (lock handle has
    changed), CANCEL is sent, its reply gets ESTALE as the lock handle
    is not updated in the RPC.
This case is left unfixed and may still result in a lock callback
timeout and client eviction, but its race window is much shorter than
that of the case fixed by this patch.
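
The unfixed race can be illustrated with a toy server-side lock whose
handle changes on re-enqueue. This is a simplified sketch, not Lustre
code: toy_lock, toy_reenqueue() and toy_cancel() are hypothetical, the
handle is modelled as a plain 64-bit cookie, and the whole recovery is
collapsed into a single re-enqueue step.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Toy server-side lock; the handle is just a cookie in this model. */
struct toy_lock {
	uint64_t handle;  /* handle the server currently accepts */
	int granted;      /* 1 while the lock is held */
};

/* Re-enqueue during recovery: the lock gets a new server handle. */
static void toy_reenqueue(struct toy_lock *lk, uint64_t new_handle)
{
	assert(new_handle != lk->handle);
	lk->handle = new_handle;
	lk->granted = 1;
}

/* Server-side cancel: only the current handle matches the lock. */
static int toy_cancel(struct toy_lock *lk, uint64_t handle)
{
	if (!lk->granted || lk->handle != handle)
		return -ESTALE;  /* stale handle: lock stays granted */
	lk->granted = 0;
	return 0;
}
```

A CANCEL that was packed with the old handle and sent after
toy_reenqueue() gets -ESTALE and leaves the lock granted, which in this
model is the point where the server would go on waiting for the cancel.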

Also remove lock cancelling from client_common_put_super() as it is
done later in client_disconnect_export().

Change-Id: I1bfe70444299d93c3fb348b737cb9721ea63eda3
Signed-off-by: Vitaly Fertman <vitaly_fertman@xyratex.com>
Reviewed-by: Alexey Lyashkov <alexey_lyashkov@xyratex.com>
Reviewed-by: Andrew Perepechko <Andrew_Perepechko@xyratex.com>
Xyratex-bug-id: MRP-477
Reviewed-on: http://review.whamcloud.com/3189
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <tappro@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
lustre/include/obd_support.h
lustre/ldlm/ldlm_lockd.c
lustre/ldlm/ldlm_request.c
lustre/llite/llite_lib.c
lustre/ptlrpc/service.c
lustre/tests/recovery-small.sh