LU-11771 ldlm: use hrtimer for recovery to fix timeout messages 76/35276/3
author James Simmons <uja.ornl@yahoo.com>
Thu, 18 Apr 2019 23:07:39 +0000 (19:07 -0400)
committer Oleg Drokin <green@whamcloud.com>
Sun, 11 Aug 2019 23:34:07 +0000 (23:34 +0000)
commit 632bf95a442931b97c2ea4816fa3e56a7853a2a2
tree cf89c61b8ba3f5136fdb92cedce21ab16a5abba4
parent e0d67dab399a866027f84886343885d228a9acdb
LU-11771 ldlm: use hrtimer for recovery to fix timeout messages

Currently the functions target_handle_connect/reconnect report an
incorrect time remaining until the end of recovery:

fs1-OST0000: Recovery already passed deadline 71578:57.
If you do not want to wait more, please abort the recovery by force.
...
fs1-OST0000: Denying connection for new client ...
(1 recovered, 11 in progress, and 1 evicted) to recover in 71578:57

This happens because the code assumes that the monotonic clock and
jiffies were initialized at the same time, which is not the case. A
comparison between ktime_get_seconds() and jiffies converted to
seconds is therefore invalid.
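
The mismatch described above can be illustrated with a small userspace
sketch. It is not the Lustre code: every name below is hypothetical, the
tick rate of 1000 is an assumption, and the starting value of the tick
counter is an assumed non-zero offset chosen for illustration. The point
is only that subtracting a monotonic-clock "now" from a tick-derived
deadline folds the difference between the two clock origins into the
result, while taking both values from one clock does not.

```c
#include <stdint.h>

/* Assumed tick rate for this sketch; a real kernel uses its build-time HZ. */
#define SKETCH_HZ 1000

/*
 * Hypothetical starting value of the tick counter. The model assumes the
 * counter does NOT start at zero at boot, unlike the monotonic clock below.
 */
#define SKETCH_INITIAL_TICKS ((uint64_t)(uint32_t)(-300 * SKETCH_HZ))

/* Tick counter value uptime_sec seconds after boot (model only). */
static uint64_t sketch_ticks(uint64_t uptime_sec)
{
	return SKETCH_INITIAL_TICKS + uptime_sec * SKETCH_HZ;
}

/* Monotonic clock in seconds; starts at zero at boot in this model. */
static int64_t sketch_monotonic_sec(uint64_t uptime_sec)
{
	return (int64_t)uptime_sec;
}

/*
 * Invalid comparison: the deadline comes from the tick counter, but "now"
 * comes from the monotonic clock. The origin offset leaks into the result.
 */
static int64_t remaining_mixed(uint64_t deadline_ticks, int64_t now_sec)
{
	return (int64_t)(deadline_ticks / SKETCH_HZ) - now_sec;
}

/* Consistent comparison: deadline and "now" come from the same clock. */
static int64_t remaining_same_clock(int64_t deadline_sec, int64_t now_sec)
{
	return deadline_sec - now_sec;
}
```

With a deadline 70 seconds away (armed for uptime 1070 and checked at
uptime 1000), remaining_same_clock() returns 70, while remaining_mixed()
returns 4294737 seconds. Printed as minutes:seconds that is 71578:57,
the same kind of figure as in the log messages above, though the values
used here are assumptions.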

We solve this by replacing the recovery timer with an hrtimer-based
one. There are many benefits to using an hrtimer over jiffies, such as
better scaling, a better power profile, and better handling on tickless
systems. This also makes the code clearer by using just the real wall
clock in all cases.

Change-Id: I9d7e7e92e67ee942bc1dc51fbb0af7d8f53e54e1
Signed-off-by: James Simmons <uja.ornl@yahoo.com>
Reviewed-on: https://review.whamcloud.com/34710
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sergey Cheremencev <c17829@cray.com>
Reviewed-by: Petros Koutoupis <pkoutoupis@cray.com>
Reviewed-on: https://review.whamcloud.com/35276
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
lustre/include/obd.h
lustre/ldlm/ldlm_lib.c