LU-9075 mdt: avoid race causing mdt_coordinator_cb() err msgs 43/25243/2
author Bruno Faccini <bruno.faccini@intel.com>
Fri, 3 Feb 2017 16:55:37 +0000 (17:55 +0100)
committer Oleg Drokin <oleg.drokin@intel.com>
Sun, 23 Apr 2017 03:11:09 +0000 (03:11 +0000)
This patch mainly moves the mdt_agent_record_update() call before
mdt_cdt_remove_request() in mdt_hsm_update_request_state(), to
avoid the frequent pair of
"(mdt_coordinator.c:1473:mdt_hsm_update_request_state()) ...
Cannot find running request for cookie ..."
and
"(mdt_coordinator.c:339:mdt_coordinator_cb()) ...
cannot cleanup timed out request ..."
error messages. These most likely concern active requests that have
completed and have therefore already been removed from memory in
mdt_hsm_update_request_state() (by mdt_cdt_remove_request(), in the
context of an MDT thread handling the CT's MDS_HSM_PROGRESS
requests), while the corresponding action LLOG record update is
still stuck in mdt_agent_record_update(), waiting for the CDT to
release cdt_llog_lock.
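
The ordering issue described above can be illustrated with a minimal,
single-threaded sketch. It is not the Lustre code: struct hsm_state,
scan_sees_orphan() and the two complete_*() helpers are hypothetical
stand-ins, and the scan is simply invoked between the two steps to
mimic a coordinator pass that runs while the record update is pending.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the coordinator state. */
enum rec_status { REC_STARTED, REC_DONE };

struct hsm_state {
	bool in_memory;          /* request still on the in-memory list */
	enum rec_status record;  /* status stored in the action record  */
};

/* A record still marked STARTED with no in-memory request looks like
 * a lost request to the scan and would be reported as an error. */
static bool scan_sees_orphan(const struct hsm_state *s)
{
	return s->record == REC_STARTED && !s->in_memory;
}

/* Old order: remove from memory, then update the record.
 * Returns what a scan running between the two steps observes. */
static bool complete_remove_first(struct hsm_state *s)
{
	bool orphan;

	s->in_memory = false;
	orphan = scan_sees_orphan(s);  /* record update still pending */
	s->record = REC_DONE;
	return orphan;
}

/* New order: update the record, then remove from memory. */
static bool complete_update_first(struct hsm_state *s)
{
	bool orphan;

	s->record = REC_DONE;
	orphan = scan_sees_orphan(s);  /* request still in memory */
	s->in_memory = false;
	return orphan;
}
```

In this model only the remove-first order exposes a window in which
the scan finds a STARTED record without a matching in-memory request.
Both orders end in the same final state.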

Other related but minor changes: use arr_req_change instead of
arr_req_create to determine more accurately whether a request has
exceeded the timeout, and change the main debug message in
mdt_hsm_update_request_state() to reflect whether the action LLOG
record will be updated.
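
Why the choice of timestamp matters can be shown with a small sketch.
The struct, the helper names and the "now > last + timeout" comparison
are assumptions made for illustration; only the selection of the last
activity time mirrors the change in mdt_coordinator_cb().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the per-record timestamps, in seconds. */
struct rec_times {
	uint64_t req_create;  /* when the request was created      */
	uint64_t req_change;  /* when the record was last modified */
};

/* Pick the last activity time: the in-memory request's update time
 * when the request is found, else the record's last change time. */
static uint64_t last_activity(const struct rec_times *rec,
			      const uint64_t *car_req_update)
{
	return car_req_update != NULL ? *car_req_update : rec->req_change;
}

/* Assumed form of the timeout test. */
static bool timed_out(uint64_t last, uint64_t now, uint64_t timeout)
{
	return now > last + timeout;
}
```

With a record created at t=100 and last changed at t=950, a scan at
t=1000 with a 600 second timeout flags the request when measured from
the creation time, but not when measured from the last change.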

Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: I043813f1ff11a7e9e99c534fa8560a35e2c52543
Reviewed-on: https://review.whamcloud.com/25243
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Henri Doreau <henri.doreau@cea.fr>
Reviewed-by: Quentin Bouget <quentin.bouget@cea.fr>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
lustre/mdt/mdt_coordinator.c

index c563284..445bf7d 100644
@@ -284,7 +284,7 @@ static int mdt_coordinator_cb(const struct lu_env *env,
                 */
                car = mdt_cdt_find_request(cdt, larr->arr_hai.hai_cookie, NULL);
                if (car == NULL) {
-                       last = larr->arr_req_create;
+                       last = larr->arr_req_change;
                } else {
                        last = car->car_req_update;
                        mdt_cdt_put_request(car);
@@ -1427,15 +1427,14 @@ int mdt_hsm_update_request_state(struct mdt_thread_info *mti,
 
                rc = hsm_cdt_request_completed(mti, pgs, car, &status);
 
-               /* remove request from memory list */
-               mdt_cdt_remove_request(cdt, pgs->hpk_cookie);
-
-               CDEBUG(D_HSM, "Updating record: fid="DFID" cookie=%#llx"
-                             " action=%s status=%s\n", PFID(&pgs->hpk_fid),
-                      pgs->hpk_cookie,
+               CDEBUG(D_HSM, "%s record: fid="DFID" cookie=%#llx action=%s "
+                             "status=%s\n",
+                      update_record ? "Updating" : "Not updating",
+                      PFID(&pgs->hpk_fid), pgs->hpk_cookie,
                       hsm_copytool_action2name(car->car_hai->hai_action),
                       agent_req_status2name(status));
 
+               /* update record first (LU-9075) */
                if (update_record) {
                        int rc1;
 
@@ -1451,6 +1450,10 @@ int mdt_hsm_update_request_state(struct mdt_thread_info *mti,
                                       pgs->hpk_cookie);
                        rc = (rc != 0 ? rc : rc1);
                }
+
+               /* then remove request from memory list (LU-9075) */
+               mdt_cdt_remove_request(cdt, pgs->hpk_cookie);
+
                /* ct has completed a request, so a slot is available, wakeup
                 * cdt to find new work */
                mdt_hsm_cdt_wakeup(mdt);