LU-14793 hsm: record index for further HSM action scanning
author Qian Yingjin <qian@ddn.com>
Fri, 25 Jun 2021 08:22:35 +0000 (16:22 +0800)
committer Andreas Dilger <adilger@whamcloud.com>
Fri, 14 Jan 2022 06:07:29 +0000 (06:07 +0000)
commit d37d95ce403c12460428d526d7d1b24e91ce2c22
tree 7bd2999f61013ded43f605a3523d8278369d58bd
parent 345f0d8e56c464a7b4e222430b0f4728c1d13ec7

There is contention between HSM archive requests and the "hsm_cdtr"
kernel thread:
->mdt_hsm_request()
  ->mdt_hsm_add_actions()
    ->mdt_hsm_register_hal()
      ->mdt_agent_record_add()
        ->down_write(&cdt->cdt_llog_lock)
        ->llog_cat_add()
        ->up_write(&cdt->cdt_llog_lock)

->mdt_coordinator()
  ->cdt_llog_process()
    ->down_write(&cdt->cdt_llog_lock);
    ->llog_cat_process()
    ->up_write(&cdt->cdt_llog_lock);

HSM archive requests and the HSM catalog llog scan in the "hsm_cdtr"
kernel daemon both contend for the llog write lock (cdt_llog_lock)
to add or update records in the "hsm_actions" llog.

The testing used max_requests = 1000000. With the current
implementation, this means the "hsm_cdtr" kernel daemon thread
must scan nearly the whole "hsm_actions" llog from the beginning
while holding the llog write lock. This slows down HSM archive
requests, which contend for the same write lock.

As the llog is append-only, we record the latest handled position
(llog index) in the llog, so the next scan can start from the
previously recorded position instead of from the beginning.
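A minimal C sketch of the resume-from-recorded-index idea, for illustration only. It is not the mdt_coordinator.c code: the array-backed log, struct action_log, log_append() and log_scan_from_last() are hypothetical stand-ins for the "hsm_actions" llog and its scanner. The sketch assumes records before the recorded position never need to be revisited, ignores any case where the real coordinator might reset the position, and omits locking (in the real path the scan runs under cdt_llog_lock).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model, not Lustre code: an append-only array stands in for
 * the "hsm_actions" llog, and last_index for the recorded llog index.
 * Locking is omitted; in the real path the scan runs under cdt_llog_lock.
 */
#define LOG_CAP 1024

struct action_log {
	int	recs[LOG_CAP];	/* record payloads, in append order */
	size_t	count;		/* records appended so far */
	size_t	last_index;	/* first index the next scan visits */
};

/* Append one record; returns its index, or -1 if the log is full. */
static long log_append(struct action_log *alog, int rec)
{
	if (alog->count >= LOG_CAP)
		return -1;
	alog->recs[alog->count] = rec;
	return (long)alog->count++;
}

/*
 * Visit only the records appended since the previous scan, then record
 * where this scan stopped. Assumes earlier records need no revisit.
 * handle may be NULL. Returns the number of records visited.
 */
static size_t log_scan_from_last(struct action_log *alog,
				 void (*handle)(int rec, void *arg), void *arg)
{
	size_t visited;
	size_t i;

	assert(alog->last_index <= alog->count);
	visited = alog->count - alog->last_index;
	for (i = alog->last_index; i < alog->count; i++) {
		if (handle)
			handle(alog->recs[i], arg);
	}
	/* the next scan starts right after the last record handled here */
	alog->last_index = alog->count;
	return visited;
}
```

With five records appended, the first scan visits all five, a second scan with nothing new visits none, and appending one more record makes the next scan visit only that record. The cost of a scan is therefore proportional to the new records, not to the whole llog.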

Another way to mitigate this problem: when the llog scanner finds
that other processes are contending for the llog lock, it stops
scanning and releases the llog write lock so that incoming HSM
archive requests can proceed.
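The yield-on-contention idea can be sketched in C as well, again as a hypothetical model rather than the patch itself. The lock_waiters counter is an assumption (writers would increment it before blocking on the llog lock); the sketch does not claim this is how the patch detects contention. The scanner saves its position before giving up the lock, so the next pass resumes where it stopped instead of rescanning from the beginning.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define LOG_CAP 1024

/* Hypothetical model of the scanner state, not the mdt_coordinator.c code. */
struct action_log {
	int		recs[LOG_CAP];	/* record payloads, in append order */
	size_t		count;		/* records appended so far */
	size_t		last_index;	/* where the next scan resumes */
	atomic_int	lock_waiters;	/* assumed: writers waiting for the lock */
};

/*
 * Called with the llog write lock held. Handles records from last_index
 * and stops early once another thread is waiting for the lock; a check
 * happens only after a record is handled, so each pass makes progress.
 * Saves the stop position so the caller can drop the lock and resume
 * later. Returns true if the end of the log was reached.
 */
static bool log_scan_yielding(struct action_log *alog, size_t *visited)
{
	size_t i = alog->last_index;

	assert(i <= alog->count);
	*visited = 0;
	while (i < alog->count) {
		/* ... handle alog->recs[i] here ... */
		i++;
		(*visited)++;
		if (atomic_load(&alog->lock_waiters) > 0)
			break;	/* let the waiting HSM request take the lock */
	}
	alog->last_index = i;
	return i == alog->count;
}
```

If a waiter is present, a pass over ten records stops after the first one and saves index 1; once the waiter has gone, the next pass handles the remaining nine and reaches the end.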

After applying this patch, with 200000 HSM actions in the llog, the
time to queue 10000 HSM archive requests drops from 10 seconds to 4
seconds.

Lustre-change: https://review.whamcloud.com/44077
Lustre-commit: a15a5432f8063e3a04a87d74eafac0060a8f9d26

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I2e92daf34844605ee648787daf859143335c68bf
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46013
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
lustre/mdt/mdt_coordinator.c
lustre/tests/sanity-hsm.sh