LU-12956 ldlm: fix hrtimer using 13/40513/3
author Alexander Boyko <c17825@cray.com>
Mon, 2 Nov 2020 11:02:47 +0000 (06:02 -0500)
committer Oleg Drokin <green@whamcloud.com>
Thu, 19 Nov 2020 10:19:49 +0000 (10:19 +0000)
commit 01a70a56540f095b3dc30656b7135636b4b3abef
tree 01bba6f0b3744220d39ae6af0f510bea080036dc
parent 12531667529f5a327f8050f0a9f1495cf9e8b002
LU-12956 ldlm: fix hrtimer using

A race could happen between hrtimer_start() and
hrtimer_expires_remaining(), because the latter does not hold a lock
on timer->base while the former can change it when moving the timer
between CPUs.
The following failure happened:
BUG: unable to handle kernel NULL pointer dereference at 000000000028
IP: [<ffffffffc0fc773f>] target_handle_connect+0x12ff/0x2b50 [ptlrpc]
The crash was at remaining = hrtimer_expires_remaining(timer), where
timer->base was NULL.

The fix replaces hrtimer_expires_remaining() with
hrtimer_get_remaining(), which takes the lock on the timer base and
so prevents the race.
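
The difference between the two calls can be illustrated with a small
user-space model. This is only a sketch, not the kernel code or the
actual patch: every model_* name is hypothetical, a pthread mutex stands
in for the lock on the timer base, and the unlocked reader returns -1 at
the point where the kernel would dereference NULL. The locked reader
loosely follows the retry-and-recheck pattern of locking the base and
then confirming that timer->base has not changed.

```c
/* User-space model only; all model_* names are hypothetical. */
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

struct model_base {
	pthread_mutex_t lock;	/* stand-in for the lock on the timer base */
	long now;		/* stand-in for the base's current time */
};

struct model_timer {
	/* May be NULL for a moment while the timer moves between CPUs. */
	_Atomic(struct model_base *) base;
	long expires;
};

/*
 * Unlocked read, in the spirit of hrtimer_expires_remaining().
 * Returns -1 where the real code would dereference a NULL base.
 */
int model_expires_remaining(struct model_timer *t, long *out)
{
	struct model_base *b = atomic_load(&t->base);

	if (b == NULL)
		return -1;	/* the kernel would oops here */
	*out = t->expires - b->now;
	return 0;
}

/* Lock the base and re-check that the timer still belongs to it. */
static struct model_base *model_lock_base(struct model_timer *t)
{
	for (;;) {
		struct model_base *b = atomic_load(&t->base);

		if (b == NULL)
			continue;	/* migration in progress, retry */
		pthread_mutex_lock(&b->lock);
		if (b == atomic_load(&t->base))
			return b;	/* base cannot change while locked */
		pthread_mutex_unlock(&b->lock);
	}
}

/* Locked read, in the spirit of hrtimer_get_remaining(). */
long model_get_remaining(struct model_timer *t)
{
	struct model_base *b = model_lock_base(t);
	long remaining = t->expires - b->now;

	pthread_mutex_unlock(&b->lock);
	return remaining;
}
```

In this model a writer that migrates the timer is assumed to hold the
old base's lock while it changes the base pointer, so the locked reader
never observes a base that is about to disappear. With a stable base
both readers return the same value; only the unlocked one can run into
the NULL window.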

Fixes: 9334f1d51249 ("LU-11771 ldlm: use hrtimer for recovery to fix timeout messages")
HPE-bug-id: LUS-9514
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: I2cea1e5e2d523f131f1acb3346cf0324adae624e
Reviewed-on: https://review.whamcloud.com/40513
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
lustre/ldlm/ldlm_lib.c