LU-11771 ldlm: use hrtimer for recovery to fix timeout messages 10/34710/4
author James Simmons <uja.ornl@yahoo.com>
Thu, 18 Apr 2019 23:07:39 +0000 (19:07 -0400)
committer Oleg Drokin <green@whamcloud.com>
Sat, 25 May 2019 04:57:08 +0000 (04:57 +0000)
commit 9334f1d51249c186e15b42a1717312d03385153a
tree 7dbc5bbcb66a3b81eb902ee87846e4bcdf8203a6
parent 2d0c621d21be4e67b6075b76017af6e6fcd18c64
LU-11771 ldlm: use hrtimer for recovery to fix timeout messages

Currently the functions target_handle_connect/reconnect report an
incorrect time remaining until the end of recovery:

fs1-OST0000: Recovery already passed deadline 71578:57.
If you do not want to wait more, please abort the recovery by force.
...
fs1-OST0000: Denying connection for new client ...
(1 recovered, 11 in progress, and 1 evicted) to recover in 71578:57

This happens because the code assumes that the monotonic clock and
jiffies start counting from the same origin, which is not the case.
Comparing ktime_get_seconds() with jiffies converted to seconds is
therefore invalid.

We solve this by replacing the recovery timer with an hrtimer based
one. There are many benefits to using an hrtimer over jiffies, such
as better scaling, a better power profile, and better handling on
tickless systems. This also makes the code clearer by using just the
real wall clock in all cases.

Change-Id: I9d7e7e92e67ee942bc1dc51fbb0af7d8f53e54e1
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/34710
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sergey Cheremencev <c17829@cray.com>
Reviewed-by: Petros Koutoupis <pkoutoupis@cray.com>
lustre/include/obd.h
lustre/ldlm/ldlm_lib.c