LU-10945 ldlm: fix l_last_activity usage 31/34131/3
author Oleg Drokin <green@whamcloud.com>
Tue, 29 Jan 2019 17:45:48 +0000 (12:45 -0500)
committer Oleg Drokin <green@whamcloud.com>
Sat, 23 Feb 2019 05:10:00 +0000 (05:10 +0000)
commit 20722fbd3d4fd5e83773332753069e8bec0d2b15
tree 2e632cbe331218457f4c05da9186c85a45555c1b
parent 4ae14186ce1958373c506e3abb12b891d46e70dc

When a race happens between ldlm_server_blocking_ast() and
ldlm_request_cancel(), at_measured() is called with a wrong
value equal to the current time. Even worse, ldlm_bl_timeout() can
then return current_time*1.5.
Before the time functions were fixed for 64-bit by LU-9019 (e920be681),
this race led to ETIMEDOUT in ptlrpc_import_delay_req() and
client eviction while the blocking AST was being sent. The wrong type
conversion took place in ptlrpc_send_limit_expired(), at cfs_time_seconds().
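
The arithmetic behind the bad sample can be sketched as follows. This is a
simplified model, not the Lustre code: the struct and the sketch_* helpers
are hypothetical names, a timestamp of 0 stands for "never set", and the
1.5x factor simply mirrors the current_time*1.5 figure quoted above.

```c
#include <stdint.h>

/* Hypothetical stand-in for a lock; only the timestamp matters here. */
struct sketch_lock {
	int64_t last_activity;	/* seconds since the epoch, 0 = never set */
};

/* Delay that would be handed to the adaptive-timeout estimator. */
static int64_t sketch_cancel_delay(const struct sketch_lock *lock, int64_t now)
{
	/* With last_activity == 0 this degenerates to "now". */
	return now - lock->last_activity;
}

/* Simplified model of a timeout computed as 1.5x the measured value. */
static int64_t sketch_bl_timeout(int64_t measured)
{
	return measured + measured / 2;
}

/* True if the value no longer fits in a signed 32-bit quantity. */
static int sketch_exceeds_int32(int64_t value)
{
	return value > INT32_MAX;
}
```

Fed with an epoch-sized "now" such as 1518731697, the delay equals the
current time and the derived timeout no longer fits in 32 bits, which is
consistent with the conversion problem described above.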

We should not take cancels into account if the BLAST has not been sent,
because l_last_activity is not properly initialised in that case and
the bogus sample destroys the AT (adaptive timeout) completely.
The patch splits l_last_activity into the client-side l_activity and the
server-side l_blast_sent for better understanding. l_blast_sent is
used for blocking ASTs only, to measure the time between the BLAST and
the cancel request.
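
A minimal sketch of the idea, under stated assumptions: the field names
follow the commit message, but the struct, the split_* helpers and the
"0 means not sent" convention are illustrative and are not taken from the
patch itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reduced lock with the two separate timestamps. */
struct split_lock {
	int64_t activity;	/* client side: time of last activity */
	int64_t blast_sent;	/* server side: BL AST send time, 0 = not sent */
};

/* Record the moment the blocking AST is sent by the server. */
static void split_send_blocking_ast(struct split_lock *lock, int64_t now)
{
	lock->blast_sent = now;
}

/*
 * On cancel: produce a delay sample only if a BL AST was sent.
 * Returns false (and leaves *delay untouched) otherwise, so the
 * caller can skip the adaptive-timeout measurement entirely.
 */
static bool split_cancel_delay(const struct split_lock *lock, int64_t now,
			       int64_t *delay)
{
	if (lock->blast_sent == 0)
		return false;

	/* Clamp to zero in case the clock stepped backwards. */
	*delay = now > lock->blast_sent ? now - lock->blast_sent : 0;
	return true;
}
```

A cancel that races ahead of the blocking AST then yields no sample at
all, instead of a sample equal to the current time.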

For example:
 server cancels blocked lock after 1518731697s
 waiting_locks_callback()) ### lock callback timer expired after 0s:
 evicting client

Signed-off-by: Alexander Boyko <c17825@cray.com>
Change-Id: I44962d2b3675b77e09182bbe062bdd78d6cb0af5
Cray-bug-id: LUS-5736
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34131
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
lustre/include/lustre_dlm.h
lustre/ldlm/ldlm_lock.c
lustre/ldlm/ldlm_lockd.c
lustre/ldlm/ldlm_request.c