LU-11675 hsm: don't allow new HSM requests during CDT_INIT 12/36212/2
author Nikitas Angelinas <nangelinas@cray.com>
Wed, 24 Jul 2019 09:43:53 +0000 (02:43 -0700)
committer Oleg Drokin <green@whamcloud.com>
Mon, 23 Sep 2019 08:43:08 +0000 (08:43 +0000)
When the HSM CDT is shut down and restarted, it resets cdt_last_cookie
using ktime_get_real_seconds() and examines the CDT llog for existing
requests, in order to set cdt_last_cookie to the highest known value,
so that newly-assigned cookies are unique. There is a window between
CDT_INIT and CDT_RUNNING during which new requests can arrive, and if
the CDT llog has not been fully examined, cookies can be reused. This
can cause the following two assertions to be triggered in
cdt_agent_record_hash_add():

LASSERT(carl0->carl_cat_idx == carl1->carl_cat_idx);
LASSERT(carl0->carl_rec_idx == carl1->carl_rec_idx);

Fix this by not allowing new HSM requests during CDT_INIT.
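
A minimal sketch of the gating idea, not the actual Lustre code: the toy_* names and the reduced state enum are hypothetical stand-ins for the coordinator structures. It models the llog scan that moves the next cookie past every logged one, and a request path that returns -EAGAIN until that scan is done and the state is RUNNING.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical, reduced model of the coordinator state machine. */
enum toy_cdt_state { TOY_CDT_STOPPED, TOY_CDT_INIT, TOY_CDT_RUNNING };

struct toy_cdt {
	enum toy_cdt_state state;
	uint64_t last_cookie;	/* next cookie to hand out */
};

/* Scan step: make sure the next cookie is past any cookie found in the log. */
static inline void toy_scan_record(struct toy_cdt *cdt, uint64_t logged_cookie)
{
	if (logged_cookie >= cdt->last_cookie)
		cdt->last_cookie = logged_cookie + 1;
}

/* Request path: refuse new requests until the log scan has completed. */
static inline int toy_add_request(struct toy_cdt *cdt, uint64_t *cookie)
{
	if (cdt->state == TOY_CDT_STOPPED || cdt->state == TOY_CDT_INIT)
		return -EAGAIN;

	*cookie = cdt->last_cookie++;
	return 0;
}
```

In this model, a request that arrives while the state is TOY_CDT_INIT cannot be assigned a cookie that a not-yet-scanned log record already holds; the caller is expected to retry later.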

Also, cookie values are incremented on a separate line before being
assigned, which causes one value to be skipped at CDT startup time.
This is not an issue, but there does not seem to be a need for it; fix
this by post-incrementing and assigning cookie values on the same line.
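
The difference between the two assignment styles can be illustrated with a small stand-alone sketch. The toy_counter type and both helper names are hypothetical and only model the counter, not the real struct coordinator.

```c
#include <stdint.h>

/* Hypothetical stand-in: only the cookie counter is modeled. */
struct toy_counter {
	uint64_t last_cookie;
};

/* Old style: increment on its own line, then assign.
 * The value the counter was seeded with is never handed out. */
static inline uint64_t toy_cookie_old(struct toy_counter *ctr)
{
	ctr->last_cookie++;
	return ctr->last_cookie;
}

/* New style: assign the current value and increment in one expression.
 * The counter always holds the next cookie to hand out. */
static inline uint64_t toy_cookie_new(struct toy_counter *ctr)
{
	return ctr->last_cookie++;
}
```

Because the counter now holds the next cookie to be assigned rather than the last one assigned, a logged cookie equal to the counter would presumably collide with the next request, which appears to be why the comparison in hsm_restore_cb() becomes >= in the same change.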

Lustre-change: https://review.whamcloud.com/33671
Lustre-commit: 39862136c3cfee127c4b0a9604ff12f560af3124

Signed-off-by: Nikitas Angelinas <nangelinas@cray.com>
Cray-bug-id: LUS-6589
Test-Parameters: trivial testlist=sanity-hsm
Change-Id: I18a1c3e85de6c50a9bf1ce598e21d83d893ad0ca
Reviewed-by: Quentin Bouget <quentin.bouget@cea.fr>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Ben Evans <bevans@cray.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/36212
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
lustre/mdt/mdt_coordinator.c
lustre/mdt/mdt_hsm_cdt_actions.c
lustre/mdt/mdt_hsm_cdt_client.c

diff --git a/lustre/mdt/mdt_coordinator.c b/lustre/mdt/mdt_coordinator.c
index 7548598..50373e1 100644
@@ -925,9 +925,10 @@ static int hsm_restore_cb(const struct lu_env *env,
 
        larr = (struct llog_agent_req_rec *)hdr;
        hai = &larr->arr_hai;
-       if (hai->hai_cookie > cdt->cdt_last_cookie)
+       if (hai->hai_cookie >= cdt->cdt_last_cookie) {
                /* update the cookie to avoid collision */
                cdt->cdt_last_cookie = hai->hai_cookie + 1;
+       }
 
        if (hai->hai_action != HSMA_RESTORE ||
            agent_req_in_final_state(larr->arr_status))
diff --git a/lustre/mdt/mdt_hsm_cdt_actions.c b/lustre/mdt/mdt_hsm_cdt_actions.c
index 93c4a05..8381101 100644
@@ -289,12 +289,10 @@ int mdt_agent_record_add(const struct lu_env *env, struct mdt_device *mdt,
        /* in case of cancel request, the cookie is already set to the
         * value of the request cookie to be cancelled
         * so we do not change it */
-       if (hai->hai_action == HSMA_CANCEL) {
+       if (hai->hai_action == HSMA_CANCEL)
                larr->arr_hai.hai_cookie = hai->hai_cookie;
-       } else {
-               cdt->cdt_last_cookie++;
-               larr->arr_hai.hai_cookie = cdt->cdt_last_cookie;
-       }
+       else
+               larr->arr_hai.hai_cookie = cdt->cdt_last_cookie++;
 
        rc = llog_cat_add(env, lctxt->loc_handle, &larr->arr_hdr, NULL);
        if (rc > 0)
diff --git a/lustre/mdt/mdt_hsm_cdt_client.c b/lustre/mdt/mdt_hsm_cdt_client.c
index b0c01c9..705e98e 100644
@@ -420,7 +420,7 @@ int mdt_hsm_add_actions(struct mdt_thread_info *mti,
        ENTRY;
 
        /* no coordinator started, so we cannot serve requests */
-       if (cdt->cdt_state == CDT_STOPPED)
+       if (cdt->cdt_state == CDT_STOPPED || cdt->cdt_state == CDT_INIT)
                RETURN(-EAGAIN);
 
        if (!hal_is_sane(hal))