From: Frank Zago
Date: Mon, 11 Aug 2014 23:24:07 +0000 (-0500)
Subject: LU-5195: hsm: delete HSM records not found on disk
X-Git-Tag: 2.6.52~37
X-Git-Url: https://git.whamcloud.com/?a=commitdiff_plain;ds=sidebyside;h=3a83b4b93373db46a4ee60a7388775fa0be2eb9a;p=fs%2Flustre-release.git

LU-5195: hsm: delete HSM records not found on disk

After an MDS crash, the file containing an HSM record may no longer
be present. When the MDS restarts, we get traces like this:

  (mdt_hsm_cdt_actions.c:104:cdt_llog_process()) tas01-MDT0000: failed to process HSM_ACTIONS llog (rc=-2)
  (mdt_hsm_cdt_actions.c:104:cdt_llog_process()) Skipped 600 previous similar messages
  (llog_cat.c:192:llog_cat_id2handle()) tas01-MDD0000: error opening log id 0x1c:1:0: rc = -2
  (llog_cat.c:192:llog_cat_id2handle()) Skipped 600 previous similar messages
  (llog_cat.c:556:llog_cat_process_cb()) tas01-MDD0000: cannot find handle for llog 0x1c:1: -2
  (llog_cat.c:556:llog_cat_process_cb()) Skipped 600 previous similar messages
  (mdt_hsm_cdt_actions.c:104:cdt_llog_process()) tas01-MDT0000: failed to process HSM_ACTIONS llog (rc=-2)
  (mdt_hsm_cdt_actions.c:104:cdt_llog_process()) Skipped 600 previous similar messages
  (llog_cat.c:192:llog_cat_id2handle()) tas01-MDD0000: error opening log id 0x1c:1:0: rc = -2
  (llog_cat.c:192:llog_cat_id2handle()) Skipped 600 previous similar messages

In this state no HSM operation can proceed, and the only way to clean
up is to unmount the MDT and delete hsm_actions.

If the record can't be found, let the MDS delete it instead.

Change-Id: I61f625d4c18750c8044909ff56d53042cf0b6d86
Signed-off-by: frank zago
Reviewed-on: http://review.whamcloud.com/11419
Reviewed-by: Niu Yawei
Tested-by: Jenkins
Tested-by: Maloo
Reviewed-by: Ryan Haasken
Reviewed-by: Oleg Drokin
---

diff --git a/lustre/obdclass/llog_cat.c b/lustre/obdclass/llog_cat.c
index b34d5d3..05d2f26 100644
--- a/lustre/obdclass/llog_cat.c
+++ b/lustre/obdclass/llog_cat.c
@@ -535,6 +535,17 @@ int llog_cat_process_cb(const struct lu_env *env, struct llog_handle *cat_llh,
 		CERROR("%s: cannot find handle for llog "DOSTID": %d\n",
 		       cat_llh->lgh_ctxt->loc_obd->obd_name,
 		       POSTID(&lir->lid_id.lgl_oi), rc);
+		if (rc == -ENOENT || rc == -ESTALE) {
+			/* After a server crash, a stub of index
+			 * record in catlog could be kept, because
+			 * plain log destroy + catlog index record
+			 * deletion are not atomic. So we end up with
+			 * an index but no actual record. Destroy the
+			 * index and move on. */
+			rc = llog_cat_cleanup(env, cat_llh, NULL,
+					      rec->lrh_index);
+		}
+
 		RETURN(rc);
 	}
 
@@ -650,6 +661,17 @@ static int llog_cat_reverse_process_cb(const struct lu_env *env,
 		CERROR("%s: cannot find handle for llog "DOSTID": %d\n",
 		       cat_llh->lgh_ctxt->loc_obd->obd_name,
 		       POSTID(&lir->lid_id.lgl_oi), rc);
+		if (rc == -ENOENT || rc == -ESTALE) {
+			/* After a server crash, a stub of index
+			 * record in catlog could be kept, because
+			 * plain log destroy + catlog index record
+			 * deletion are not atomic. So we end up with
+			 * an index but no actual record. Destroy the
+			 * index and move on. */
+			rc = llog_cat_cleanup(env, cat_llh, NULL,
+					      rec->lrh_index);
+		}
+
 		RETURN(rc);
 	}
 
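
Illustrative note (not part of the patch): the error-path change can be sketched as a small toy model outside Lustre. This is a hypothetical simplification, not the Lustre implementation. Every `toy_*` name is invented for illustration; only the `-ENOENT`/`-ESTALE` check and the "destroy the index and move on" shape are taken from the diff. The sketch assumes the cleanup step returns 0 on success, so the catalog walk continues instead of aborting on the first stale index.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of one catalog index record (not a Lustre type). */
struct toy_cat_slot {
	bool in_use;       /* index record is present in the catalog */
	bool plain_exists; /* the plain log it points to still exists */
};

/* Stand-in for opening the plain log behind an index record. */
static int toy_open_plain(const struct toy_cat_slot *slot)
{
	return slot->plain_exists ? 0 : -ENOENT;
}

/* Stand-in for removing a stale index record from the catalog. */
static int toy_cat_cleanup(struct toy_cat_slot *slot)
{
	slot->in_use = false;
	return 0;
}

/* Per-record callback with the same shape as the patched error path:
 * a missing or stale plain log is cleaned up rather than reported. */
static int toy_process_cb(struct toy_cat_slot *slot)
{
	int rc = toy_open_plain(slot);

	if (rc == -ENOENT || rc == -ESTALE)
		rc = toy_cat_cleanup(slot);
	return rc;
}

/* Walk all in-use slots, stopping at the first non-zero return code.
 * *visited counts the callbacks that returned 0. */
int toy_cat_process(struct toy_cat_slot *slots, size_t n, size_t *visited)
{
	int rc = 0;
	size_t i;

	*visited = 0;
	for (i = 0; i < n; i++) {
		if (!slots[i].in_use)
			continue;
		rc = toy_process_cb(&slots[i]);
		if (rc != 0)
			break;
		(*visited)++;
	}
	return rc;
}
```

In this model, a catalog of three slots whose middle plain log is missing is walked to the end: the middle slot is marked unused and the other two are left untouched. Without the check in `toy_process_cb`, the walk would stop at the middle slot with `-ENOENT`, which corresponds to the repeated rc=-2 traces quoted in the commit message.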