LU-11276 ldlm: fix lock convert races
The blocking cb may be triggered in parallel, so the convert logic
of the DOM lock must handle the case where the cancel_bits have
already been zeroed by the first executor.
As there may be several parallel executors of the blocking cb and
several conversion callers, each requesting different inode bits,
set up the following logic:
- the lock keeps the aggregated set of bits requested for cancelling
by different parties, where 0 means the whole lock is to be
cancelled, and where the CBPENDING flag means there is a canceling
job pending;
- once completed, the cancel_bits are zeroed and the CBPENDING flag
is dropped, meaning the next request will be a part of the next job;
- once a local lock is converted, its state is changed appropriately
and no cleanup is left for the interpret time as the lock is ready
for the next usage;
- as the lock is unlocked in the process of conversion and more bits
may be requested meanwhile, check for them and repeat as needed;
- allow only one conversion executor to work at a time; the others
wait, similar to ldlm_cli_cancel();
- there are others who may want to cancel unused locks (cancel_lru,
cancel_resource_local), consider CANCELING as a request to cancel
the full lock independently of the cancel_bits;
Some cleanups are done:
- move the cache drop logic to the CANCELING part of the blocking cb
from the BLOCKING one;
- remove the convert RPC interpret callback, as the lock cleanups
are already done in advance; the convert RPC is re-sendable, so an
error means there is a serious network problem;
Test-Parameters: testlist=racer,racer,racer
Signed-off-by: Vitaly Fertman <c17818@cray.com>
Change-Id: I901de34241704ed801152f071cb7f610fe6f4bfe
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36466
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>