LU-10945 ldlm: fix l_last_activity usage 33/32133/3
author Alexander Boyko <c17825@cray.com>
Tue, 24 Apr 2018 07:06:42 +0000 (03:06 -0400)
committer Oleg Drokin <oleg.drokin@intel.com>
Tue, 29 May 2018 04:53:41 +0000 (04:53 +0000)
commit e09d273cb5f218a857d22011ab3035f7b178b023
tree a1620fabd8f282ded7e1240bf1c60649bea5f19e
parent 4325b1e456647f519a0eca32204554e0c358646f
LU-10945 ldlm: fix l_last_activity usage

When a race happens between ldlm_server_blocking_ast() and
ldlm_request_cancel(), at_measured() is called with a wrong value
equal to the current time. Even worse, ldlm_bl_timeout() can then
return current_time*1.5.
Before the time functions were fixed for 64-bit by LU-9019
(e920be681), this race led to ETIMEDOUT in ptlrpc_import_delay_req()
and to client eviction while sending the blocking AST (BLAST). The
wrong type conversion took place in ptlrpc_send_limit_expired(), at
cfs_time_seconds().
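
A minimal sketch of how such a value can arise, assuming the
timestamp is still zero when the cancel is processed. The helper
name sketch_elapsed() is hypothetical and is not Lustre code; the
wall-clock value is taken from the log example in this message.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for "now - last activity" in seconds.
 * Not a Lustre function; it only illustrates the arithmetic.
 */
static int64_t sketch_elapsed(int64_t now, int64_t last_activity)
{
	return now - last_activity;
}

/*
 * Assumed scenario: the timestamp was never set, so it is still 0
 * and the "delay" equals the wall-clock time itself.
 */
static int64_t sketch_racy_sample(int64_t now)
{
	int64_t never_set = 0;	/* assumption: zero-initialised field */

	return sketch_elapsed(now, never_set);
}
```

With now = 1518731697 the racy sample is 1518731697 seconds, while a
timestamp set five seconds earlier yields the expected 5 seconds.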

We should not take cancels into account if the BLAST has not been
sent, because in that case last_activity is not properly initialised
and the resulting sample destroys the AT (adaptive timeout) estimate
completely.
The patch splits l_last_activity into the client-side l_activity and
the server-side l_blast_sent to make the intent clearer. l_blast_sent
is used for blocking ASTs only, to measure the time between the BLAST
and the cancel request.
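
The idea can be sketched as follows. This is a simplified userspace
illustration, not the code from the patch: only the field names
l_activity and l_blast_sent come from this message, while struct
sketch_lock, the ast_sent flag, and both helper functions are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified lock; only the two l_* names are real. */
struct sketch_lock {
	int64_t l_activity;	/* client side: last activity, seconds */
	int64_t l_blast_sent;	/* server side: when the BLAST was sent */
	bool	ast_sent;	/* hypothetical flag: BLAST has been sent */
};

/* Record the moment the blocking AST goes out. */
static void sketch_blast_sent(struct sketch_lock *lock, int64_t now)
{
	lock->l_blast_sent = now;
	lock->ast_sent = true;
}

/*
 * On cancel, produce a BLAST-to-cancel delay only if a BLAST was sent.
 * Returns true and stores the delay in *delay when there is a valid
 * sample for the adaptive timeout; returns false (leaving *delay
 * untouched) when the cancel must not be measured.
 */
static bool sketch_cancel_delay(const struct sketch_lock *lock,
				int64_t now, int64_t *delay)
{
	if (!lock->ast_sent)
		return false;

	assert(now >= lock->l_blast_sent);
	*delay = now - lock->l_blast_sent;
	return true;
}
```

A cancel that arrives before any BLAST yields no sample at all, so an
uninitialised timestamp can no longer reach the timeout estimate.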

For example:
 server cancels blocked lock after 1518731697s
 waiting_locks_callback()) ### lock callback timer expired after 0s:
 evicting client

Signed-off-by: Alexander Boyko <c17825@cray.com>
Change-Id: I44962d2b3675b77e09182bbe062bdd78d6cb0af5
Cray-bug-id: LUS-5736
Reviewed-on: https://review.whamcloud.com/32133
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Vitaly Fertman <c17818@cray.com>
Reviewed-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-by: Mike Pershin <mike.pershin@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
lustre/include/lustre_dlm.h
lustre/ldlm/ldlm_lock.c
lustre/ldlm/ldlm_lockd.c
lustre/ldlm/ldlm_request.c