LU-11771 ldlm: use hrtimer for recovery to fix timeout messages 83/33883/12
author James Simmons <uja.ornl@yahoo.com>
Mon, 25 Mar 2019 15:42:17 +0000 (11:42 -0400)
committer Oleg Drokin <green@whamcloud.com>
Mon, 8 Apr 2019 05:32:39 +0000 (05:32 +0000)
commit 1ba794f6ec9e7ce7ad65fd74f170089fffc31d91
tree 8955f4dedf7cfe9fd936118819a9e905e8edc2b1
parent 42adbae36f206a6ed4170e7619cd993c8fa80b1d
LU-11771 ldlm: use hrtimer for recovery to fix timeout messages

Currently the functions target_handle_connect/reconnect report an
incorrect time remaining until the end of recovery:

fs1-OST0000: Recovery already passed deadline 71578:57.
If you do not want to wait more, please abort the recovery by force.
...
fs1-OST0000: Denying connection for new client ...
(1 recovered, 11 in progress, and 1 evicted) to recover in 71578:57

This is due to the assumption that the monotonic clock and jiffies
start counting from the same point, which is not the case. So a
comparison between ktime_get_seconds() and jiffies converted to
seconds is invalid.

We solve this by replacing the recovery timer with a hrtimer-based
one. There are many benefits to using a hrtimer over jiffies, such as
better scaling, a better power profile, and better handling on tickless
systems. This also makes the code clearer by using just the real wall
clock in all cases.

Change-Id: I50442605686382f7afb9a1f49eb336c0ee637cdc
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/33883
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sergey Cheremencev <c17829@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
lustre/include/obd.h
lustre/ldlm/ldlm_lib.c