Whamcloud - gitweb
LU-17872 ldlm: switch to read_positive in reclaim_full 41/55141/7
authorPatrick Farrell <paf0187@gmail.com>
Wed, 29 May 2024 20:41:54 +0000 (16:41 -0400)
committerOleg Drokin <green@whamcloud.com>
Wed, 19 Jun 2024 01:12:23 +0000 (01:12 +0000)
commit0c16987b2233c32d775f0e3e6f6503c4b7825e02
tree0713b0782eae049338655bbc7bc7a34d93875641
parent19929e35ac5c9762feb52d104c8f8d059ba5eb1b
LU-17872 ldlm: switch to read_positive in reclaim_full

Checking reclaim full for every lock request is expensive;
it requires taking a global spinlock and can completely
clog the MDS CPU on larger systems.

If we switch to read_positive rather than sum_positive for
our counter read, we avoid this spinlock at the cost of
being off by as much as NR_CPU*32.

Since the counter is for hundreds of thousands to millions
of items and just triggers memory reclaim, this level of
error is completely fine.

This resolves the contention issue, on an OCI system with
384 cores, here's our mdtest comparison:

Operation           | Without Patch | With Patch  | %Change
---------------------|---------------|-------------|-------
Directory creation  | 69481.994     | 64373.060   | -7%
Directory stat      | 87942.757     | 274670.454  | 212%
Directory rename    | 78127.922     | 92592.239   | 19%
Directory removal   | 69901.490     | 89560.415   | 28%
File creation       | 62789.774     | 107294.450  | 71%
File stat           | 88039.061     | 480469.711  | 446%
File read           | 82192.370     | 151117.380  | 84%
File removal        | 146690.828    | 127589.655  | -13%
Tree creation       | 46.549        | 56.992      | 22%
Tree removal        | 51.531        | 53.967      | 5%

Note the *446%* improvement in stat and the 70-80% in
file creation and read.

Note this issue is likely much worse on systems with higher
core counts since the cost of summing the counter scales
with the number of CPUs.  This may be why this has not been
seen before.

Signed-off-by: Patrick Farrell <patrick.farrell@oracle.com>
Change-Id: I01a39abf5e6f0829156b413b1f44001e2c504be2
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55141
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: wangdi <di.d.wang@oracle.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
lustre/ldlm/ldlm_lockd.c
lustre/ldlm/ldlm_reclaim.c