LU-16674 obdclass: optimize job_stats reads
This patch has 2 objectives:
1/ limit the lock time on ojs_list (list of job stats)
"lctl get_param mdt.*.job_stats" cannot dump job_stats in a single
read because the seq_file buffer is limited to 4k. Several reads are
therefore needed to dump the full job list.
For each read, we have to find the job entry corresponding to the file
offset. Currently, we walk ojs_list from the beginning on every read
to find this entry.
This patch saves the last known entry and its corresponding offset so
that the next read can start from there.
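
The idea of resuming from a saved entry can be sketched in plain
userspace C as follows. This is only an illustration under stated
assumptions: struct job, struct job_cursor and job_seek() are
hypothetical names, not the Lustre structures or functions, and the
sketch ignores entries being removed while a cursor points at them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one job entry; not the Lustre type. */
struct job {
    int id;
    struct job *next;
};

/* Hypothetical per-reader state kept between two reads. */
struct job_cursor {
    struct job *last;   /* entry returned by the previous lookup, or NULL */
    size_t last_pos;    /* list position of that entry */
    size_t steps;       /* nodes visited so far, to make the cost visible */
};

/* Return the entry at position pos, resuming from the cached entry
 * when the target is at or after it; otherwise walk from the head. */
static struct job *job_seek(struct job *head, struct job_cursor *cur,
                            size_t pos)
{
    struct job *j = head;
    size_t i = 0;

    if (cur->last != NULL && cur->last_pos <= pos) {
        j = cur->last;
        i = cur->last_pos;
    }
    while (j != NULL && i < pos) {
        j = j->next;
        i++;
        cur->steps++;
    }
    /* Only remember positions that actually exist in the list. */
    if (j != NULL) {
        cur->last = j;
        cur->last_pos = pos;
    }
    return j;
}
```

With a zero-initialized cursor, looking up positions 0, 1, 2, ... in
order visits each node once in total, instead of restarting from the
head for every lookup.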
2/ avoid the lock contention when reading job_stats
This patch replaces the read lock on ojs_lock with RCU locking, so that
userspace processes reading job_stats do not interfere with the kernel
target threads.
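
As a rough userspace analogue of why readers no longer need the lock,
the sketch below uses C11 atomics to publish list nodes that a reader
can traverse without locking. It is not kernel RCU and not the code of
this patch: struct stat_node, stat_publish() and stat_count() are
hypothetical names, a single writer is assumed, and node removal (which
in the kernel requires waiting for a grace period) is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for one job stats entry. */
struct stat_node {
    int job_id;
    struct stat_node *_Atomic next;
};

struct stat_list {
    struct stat_node *_Atomic head;
};

/* Writer side (single writer assumed): fill in the node completely,
 * then publish it with a release store so that a reader which sees the
 * pointer also sees the initialized contents. */
static void stat_publish(struct stat_list *l, struct stat_node *n, int job_id)
{
    struct stat_node *old = atomic_load_explicit(&l->head,
                                                 memory_order_relaxed);

    n->job_id = job_id;
    atomic_store_explicit(&n->next, old, memory_order_relaxed);
    atomic_store_explicit(&l->head, n, memory_order_release);
}

/* Reader side: walks the list without taking any lock, so it never
 * blocks the writer. */
static size_t stat_count(struct stat_list *l)
{
    size_t count = 0;
    struct stat_node *p = atomic_load_explicit(&l->head,
                                               memory_order_acquire);

    while (p != NULL) {
        count++;
        p = atomic_load_explicit(&p->next, memory_order_acquire);
    }
    return count;
}
```

In kernel code the same roles are played by the RCU primitives (for
example rcu_read_lock()/rcu_read_unlock() around the traversal), which
additionally make it safe to free removed entries.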
Add the stress test sanity 205g to check for possible races.
Add stack_trap in sanity test 205a and 205e to restore jobid_name and
jobid_var.
* Performance *
The following command is used to capture records:
$ time grep -c job_id /proc/fs/lustre/mdt/lustrefs-MDT0000/job_stats
- job_stats dump with no fs activity
Results measured after sanity test 205g completed, run in slow mode
with job_cleanup_interval=300s.
               ___________________________________
              | nbr of job | time |     rate      |
 _____________|____________|______|_______________|
|without patch|   14749    | 1.3s | 11345 jobid/s |
|_____________|____________|______|_______________|
|with patch   |   22209    | 0.6s | 37015 jobid/s |
|_____________|____________|______|_______________|
|diff %       |    +43%    | -54% |     +226%     |
|_____________|____________|______|_______________|
- job_stats dump with fs activity
Results measured while sanity test 205g was still running, in slow
mode with job_cleanup_interval=300s.
               ___________________________________
              | nbr of job | time |     rate      |
 _____________|____________|______|_______________|
|without patch|   14849    | 2.3s |  6428 jobid/s |
|_____________|____________|______|_______________|
|with patch   |   22776    | 1.2s | 18823 jobid/s |
|_____________|____________|______|_______________|
|diff %       |    +53%    | -47% |     +192%     |
|_____________|____________|______|_______________|
Test-Parameters: testlist=sanity env=SLOW=yes,ONLY=205g,ONLY_REPEAT=10
Test-Parameters: testlist=sanity env=ONLY=205g serverversion=2.15.2
Test-Parameters: testlist=sanity env=SLOW=yes,ONLY=205
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: Ic4cd90965720af76eff0ed4e00ca897518bfbc66
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50459
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Feng Lei <flei@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>