LU-4365 quota: wait for global lock cancel 86/8586/3
author Niu Yawei <yawei.niu@intel.com>
Mon, 16 Dec 2013 08:08:08 +0000 (03:08 -0500)
committer Oleg Drokin <oleg.drokin@intel.com>
Tue, 17 Dec 2013 05:38:28 +0000 (05:38 +0000)
In qsd_qtype_fini(), wait for the global lock cancel to complete.

Test-Parameters: envdefinitions=SLOW=yes,ENABLE_QUOTA=yes \
mdtfilesystemtype=zfs mdsfilesystemtype=zfs ostfilesystemtype=zfs \
testlist=recovery-small

Signed-off-by: Niu Yawei <yawei.niu@intel.com>
Change-Id: Id235c7d6e07f5ce436655a6d5382e4c8c161fa3b
Reviewed-on: http://review.whamcloud.com/8586
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
lustre/quota/qsd_lib.c

index b1a29b1..f8b3b5b 100644
@@ -273,6 +273,7 @@ static void qsd_qtype_fini(const struct lu_env *env, struct qsd_instance *qsd,
                           int qtype)
 {
        struct qsd_qtype_info   *qqi;
+       int repeat = 0;
        ENTRY;
 
        if (qsd->qsd_type_array[qtype] == NULL)
@@ -290,6 +291,29 @@ static void qsd_qtype_fini(const struct lu_env *env, struct qsd_instance *qsd,
                qqi->qqi_site = NULL;
        }
 
+       /* The qqi may still be held by global locks that are being
+        * canceled asynchronously (LU-4365). The sequence is:
+        *
+        * - On server umount, we first try to clear all quota locks by
+        *   disconnecting the LWP (which invalidates the import and cleans
+        *   up all locks on it). However, if the quota reint process is
+        *   holding the global lock for reintegration at that time, the
+        *   global lock will not be cleared on LWP disconnection.
+        *
+        * - The umount process goes on and stops the reint process. The
+        *   global lock is dropped when the reint process exits, but the
+        *   lock cancel is done asynchronously, so
+        *   qsd_glb_blocking_ast() might not have been called yet when we
+        *   get here.
+        */
+       while (cfs_atomic_read(&qqi->qqi_ref) > 1) {
+               CDEBUG(D_QUOTA, "qqi reference count %u, repeat: %d\n",
+                      cfs_atomic_read(&qqi->qqi_ref), repeat);
+               repeat++;
+               cfs_schedule_timeout_and_set_state(TASK_INTERRUPTIBLE,
+                                                  cfs_time_seconds(1));
+       }
+
        /* by now, all qqi users should have gone away */
        LASSERT(cfs_atomic_read(&qqi->qqi_ref) == 1);
        lu_ref_fini(&qqi->qqi_reference);
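
The wait loop added by this patch can be illustrated outside the kernel. The following is a minimal user-space sketch, not the Lustre implementation: every `demo_*` name is hypothetical, C11 `atomic_int` stands in for `cfs_atomic_t`, and the one-second sleep is replaced by a callback so that the asynchronous cancel can be simulated deterministically (it drops its reference after a chosen number of poll intervals).

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for struct qsd_qtype_info: only the refcount matters. */
struct demo_qqi {
        atomic_int ref;
};

/* Hypothetical state for the simulated asynchronous lock cancel. */
struct demo_cancel {
        struct demo_qqi *qqi;
        int ticks_left;         /* poll intervals left before the cancel runs */
};

/* Stands in for the one-second sleep: each call is one poll interval.
 * When the countdown reaches zero, the simulated cancel callback drops
 * the reference it was holding. */
static void demo_sleep_tick(void *arg)
{
        struct demo_cancel *c = arg;

        if (c->ticks_left > 0 && --c->ticks_left == 0)
                atomic_fetch_sub(&c->qqi->ref, 1);
}

/* Poll until only the caller's own reference remains.
 * Returns the number of poll intervals waited. */
static int demo_wait_last_ref(struct demo_qqi *qqi,
                              void (*sleep_fn)(void *), void *arg)
{
        int repeat = 0;

        while (atomic_load(&qqi->ref) > 1) {
                repeat++;
                sleep_fn(arg);
        }
        return repeat;
}

/* Run the scenario: one reference is ours; if delay > 0, a second one is
 * held by a pending cancel that completes after `delay` poll intervals. */
static int demo_run(int delay)
{
        struct demo_qqi qqi;
        struct demo_cancel cancel = { &qqi, delay };
        int repeat;

        atomic_init(&qqi.ref, delay > 0 ? 2 : 1);
        repeat = demo_wait_last_ref(&qqi, demo_sleep_tick, &cancel);
        assert(atomic_load(&qqi.ref) == 1);     /* mirrors the LASSERT */
        return repeat;
}
```

Calling `demo_run(3)` waits three poll intervals before the final assertion holds, while `demo_run(0)` models the case where no cancel is pending and the loop body never runs. As in the patch, the loop has no upper bound on retries, so it relies on the pending reference eventually being dropped.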