Whamcloud - gitweb
LU-5195: hsm: delete HSM records not found on disk 19/11419/4
author Frank Zago <fzago@cray.com>
Mon, 11 Aug 2014 23:24:07 +0000 (18:24 -0500)
committer Oleg Drokin <oleg.drokin@intel.com>
Mon, 25 Aug 2014 16:56:50 +0000 (16:56 +0000)
After an MDS crash, it is possible that the llog file containing an HSM
record is no longer present. When the MDS restarts, we get traces like this:

(mdt_hsm_cdt_actions.c:104:cdt_llog_process()) tas01-MDT0000:
    failed to process HSM_ACTIONS llog (rc=-2)
(mdt_hsm_cdt_actions.c:104:cdt_llog_process())
    Skipped 600 previous similar messages
(llog_cat.c:192:llog_cat_id2handle()) tas01-MDD0000:
    error opening log id 0x1c:1:0: rc = -2
(llog_cat.c:192:llog_cat_id2handle()) Skipped 600 previous similar messages
(llog_cat.c:556:llog_cat_process_cb()) tas01-MDD0000:
    cannot find handle for llog 0x1c:1: -2
(llog_cat.c:556:llog_cat_process_cb()) Skipped 600 previous similar messages
(mdt_hsm_cdt_actions.c:104:cdt_llog_process()) tas01-MDT0000:
    failed to process HSM_ACTIONS llog (rc=-2)
(mdt_hsm_cdt_actions.c:104:cdt_llog_process())
    Skipped 600 previous similar messages
(llog_cat.c:192:llog_cat_id2handle()) tas01-MDD0000:
    error opening log id 0x1c:1:0: rc = -2
(llog_cat.c:192:llog_cat_id2handle()) Skipped 600 previous similar messages

No HSM operation can proceed, and the only way to clean up is to unmount
the MDT and delete hsm_actions.

If the plain llog referenced by a catalog record can't be found, let the
MDS delete that catalog record instead.
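The rule this change adds to the catalog walk can be sketched with a toy model. Everything below (`toy_catalog`, `toy_disk`, `toy_open_plain_log()`, `toy_cat_process()`) is hypothetical and is not Lustre's API; it only mirrors the behaviour described here: when the plain llog behind an index record cannot be opened with -ENOENT or -ESTALE (on Linux, the `rc = -2` in the traces above is -ENOENT), drop the stale index record and keep going rather than failing the whole walk.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define CAT_SLOTS  8   /* hypothetical catalog size */
#define DISK_SLOTS 16  /* hypothetical number of plain-log ids */

/* Hypothetical, simplified catalog: each slot may hold an index record
 * naming a plain llog by id. Not the real Lustre data structure. */
struct toy_catalog {
	bool in_use[CAT_SLOTS];   /* index record present in the catalog */
	int  log_id[CAT_SLOTS];   /* plain llog the index record points to */
};

/* Hypothetical record of which plain llogs actually exist on disk. */
struct toy_disk {
	bool exists[DISK_SLOTS];
};

/* Stand-in for opening the plain llog behind an index record
 * (the role llog_cat_id2handle() plays in the real code). */
static int toy_open_plain_log(const struct toy_disk *disk, int id)
{
	if (id < 0 || id >= DISK_SLOTS || !disk->exists[id])
		return -ENOENT;
	return 0;
}

/* Walk every index record. A missing plain log (-ENOENT or -ESTALE)
 * causes the stale index record to be removed and the walk to continue;
 * any other error is still returned. *processed counts opened logs. */
int toy_cat_process(struct toy_catalog *cat, const struct toy_disk *disk,
		    int *processed)
{
	*processed = 0;
	for (int i = 0; i < CAT_SLOTS; i++) {
		if (!cat->in_use[i])
			continue;
		int rc = toy_open_plain_log(disk, cat->log_id[i]);
		if (rc == -ENOENT || rc == -ESTALE) {
			cat->in_use[i] = false;   /* delete the stub index record */
			continue;
		}
		if (rc != 0)
			return rc;
		(*processed)++;
	}
	return 0;
}
```

For example, a catalog with one record pointing at an existing log and one pointing at a missing log processes the first, removes the second, and a later pass finds nothing left to clean up.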

Change-Id: I61f625d4c18750c8044909ff56d53042cf0b6d86
Signed-off-by: frank zago <fzago@cray.com>
Reviewed-on: http://review.whamcloud.com/11419
Reviewed-by: Niu Yawei <yawei.niu@intel.com>
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Ryan Haasken <haasken@cray.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
lustre/obdclass/llog_cat.c

index b34d5d3..05d2f26 100644
@@ -535,6 +535,17 @@ int llog_cat_process_cb(const struct lu_env *env, struct llog_handle *cat_llh,
                CERROR("%s: cannot find handle for llog "DOSTID": %d\n",
                       cat_llh->lgh_ctxt->loc_obd->obd_name,
                       POSTID(&lir->lid_id.lgl_oi), rc);
+               if (rc == -ENOENT || rc == -ESTALE) {
+                       /* After a server crash, a stub of index
+                        * record in catlog could be kept, because
+                        * plain log destroy + catlog index record
+                        * deletion are not atomic. So we end up with
+                        * an index but no actual record. Destroy the
+                        * index and move on. */
+                       rc = llog_cat_cleanup(env, cat_llh, NULL,
+                                             rec->lrh_index);
+               }
+
                RETURN(rc);
        }
 
@@ -650,6 +661,17 @@ static int llog_cat_reverse_process_cb(const struct lu_env *env,
                CERROR("%s: cannot find handle for llog "DOSTID": %d\n",
                       cat_llh->lgh_ctxt->loc_obd->obd_name,
                       POSTID(&lir->lid_id.lgl_oi), rc);
+               if (rc == -ENOENT || rc == -ESTALE) {
+                       /* After a server crash, a stub of index
+                        * record in catlog could be kept, because
+                        * plain log destroy + catlog index record
+                        * deletion are not atomic. So we end up with
+                        * an index but no actual record. Destroy the
+                        * index and move on. */
+                       rc = llog_cat_cleanup(env, cat_llh, NULL,
+                                             rec->lrh_index);
+               }
+
                RETURN(rc);
        }
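The comment in both hunks explains where the stubs come from: destroying a plain log and deleting its catalog index record are two separate updates, so a crash between them leaves an index record with no log behind it. A minimal sketch of that window, using hypothetical names (`toy_state`, `toy_retire_log()`, `toy_is_stub()`) rather than Lustre's API, and assuming the plain log is destroyed first, as the comment implies:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical two-object model: one plain log and the catalog index
 * record that points at it. In the real llog code these are separate
 * on-disk updates that are not applied atomically. */
struct toy_state {
	bool plain_log_exists;
	bool index_record_exists;
};

/* Retire a plain log in two steps. crash_between == true simulates the
 * server dying after step 1, so step 2 never runs. */
void toy_retire_log(struct toy_state *s, bool crash_between)
{
	s->plain_log_exists = false;        /* step 1: destroy plain log */
	if (crash_between)
		return;                     /* crash: index record survives */
	s->index_record_exists = false;     /* step 2: delete index record */
}

/* A stub is an index record whose plain log is gone: the state that
 * made the catalog walk fail with -ENOENT before this change. */
bool toy_is_stub(const struct toy_state *s)
{
	return s->index_record_exists && !s->plain_log_exists;
}
```

Without the patch, every restart hits such a stub and fails with -ENOENT; with it, the stub index record is destroyed through `llog_cat_cleanup()` and processing moves on.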