LU-1629 ldlm: Fix recovery messages when denying new clients
author    Li Wei <liwei@whamcloud.com>
          Fri, 27 Jul 2012 04:18:02 +0000 (12:18 +0800)
committer Oleg Drokin <green@whamcloud.com>
          Thu, 2 Aug 2012 01:38:48 +0000 (21:38 -0400)

Console messages printed when denying new client connections during
target recovery show misleading client statistics.  For example, in
this case:

  Lustre: lustre-MDT0000: Denying connection for new client
  192.168.117.50@o2ib1 (at 11e711ab-a329-f07a-8312-6a40af7fc5a4),
  waiting for 0 clients in recovery for 2:38
  [...]
  Lustre: lustre-MDT0000: Recovery over after 5:00, of 112 clients 0
  recovered and 112 were evicted.

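The MDT was actually waiting for all 112 known clients to recover,
but none had connected.  The "0" in the message is only the value of
obd_lock_replay_clients, not the number of clients still expected.
In addition, the client NID and UUID are printed in the wrong order:
the format reads "%s (at %s)", but the NID is passed before the UUID.
This patch changes the first console message to look like this: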

  Lustre: lustre-MDT0000: Denying connection for new client
  939243e4-2f54-a96f-3cbb-9fcf55426e2e (at 0@lo), waiting for all 2
  known clients (1 recovered, 0 in progress, and 1 unseen) to
  recover in 0:05

The new format should be more useful to users, although the counters
are (still) read in a racy way, so the figures within a single message
may not be mutually consistent.
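
The arithmetic behind the new message can be sketched as a standalone
C fragment.  This is only an illustration under stated assumptions:
struct recovery_counts, recovered_clients(), unseen_clients() and
format_denial() are hypothetical names that do not exist in Lustre,
and snprintf() stands in for LCONSOLE_WARN().  Only the format string
and the expressions c - i, i and k - c are taken from the patch.

```c
#include <stdio.h>

/* Hypothetical snapshot of the three values the patch reads. */
struct recovery_counts {
        int connected;   /* c: obd_connected_clients */
        int in_progress; /* i: obd_lock_replay_clients */
        int known;       /* k: obd_max_recoverable_clients */
};

/* Connected clients that are not counted as in progress (c - i). */
static int recovered_clients(const struct recovery_counts *rc)
{
        return rc->connected - rc->in_progress;
}

/* Known clients that have not connected yet (k - c). */
static int unseen_clients(const struct recovery_counts *rc)
{
        return rc->known - rc->connected;
}

/* Build the message text; snprintf() replaces LCONSOLE_WARN() here. */
static int format_denial(char *buf, size_t len, const char *target,
                         const char *uuid, const char *nid,
                         const struct recovery_counts *rc, int secs_left)
{
        return snprintf(buf, len,
                        "%s: Denying connection for new client "
                        "%s (at %s), waiting for all %d known "
                        "clients (%d recovered, %d in progress, "
                        "and %d unseen) to recover in %d:%.02d\n",
                        target, uuid, nid, rc->known,
                        recovered_clients(rc), rc->in_progress,
                        unseen_clients(rc), secs_left / 60,
                        secs_left % 60);
}
```

With connected = 1, in_progress = 0, known = 2 and five seconds left,
the sketch yields the text of the example message above (without the
"Lustre: " console prefix).  With the values from the first example
(connected = 0, in_progress = 0, known = 112) it would report 0
recovered, 0 in progress and 112 unseen.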

Change-Id: Iefda085602de7967d66892b8f3567561962078ab
Signed-off-by: Li Wei <liwei@whamcloud.com>
Reviewed-on: http://review.whamcloud.com/3485
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

diff --git a/lustre/ldlm/ldlm_lib.c b/lustre/ldlm/ldlm_lib.c
index 290005e..f1d7877 100644
--- a/lustre/ldlm/ldlm_lib.c
+++ b/lustre/ldlm/ldlm_lib.c
@@ -998,19 +998,24 @@ no_export:
         if (export == NULL) {
                 if (target->obd_recovering) {
                         cfs_time_t t;
-
-                        t = cfs_timer_deadline(&target->obd_recovery_timer);
-                        t = cfs_time_sub(t, cfs_time_current());
-                        t = cfs_duration_sec(t);
-                        LCONSOLE_WARN("%s: Denying connection for new client "
-                                      "%s (at %s), waiting for %d clients in "
-                                      "recovery for %d:%.02d\n",
-                                      target->obd_name,
-                                      libcfs_nid2str(req->rq_peer.nid),
-                                      cluuid.uuid,
-                                      cfs_atomic_read(&target-> \
-                                                      obd_lock_replay_clients),
-                                      (int)t / 60, (int)t % 60);
+                       int        c; /* connected */
+                       int        i; /* in progress */
+                       int        k; /* known */
+
+                       c = cfs_atomic_read(&target->obd_connected_clients);
+                       i = cfs_atomic_read(&target->obd_lock_replay_clients);
+                       k = target->obd_max_recoverable_clients;
+                       t = cfs_timer_deadline(&target->obd_recovery_timer);
+                       t = cfs_time_sub(t, cfs_time_current());
+                       t = cfs_duration_sec(t);
+                       LCONSOLE_WARN("%s: Denying connection for new client "
+                                     "%s (at %s), waiting for all %d known "
+                                     "clients (%d recovered, %d in progress, "
+                                     "and %d unseen) to recover in %d:%.02d\n",
+                                     target->obd_name, cluuid.uuid,
+                                     libcfs_nid2str(req->rq_peer.nid), k,
+                                     c - i, i, k - c, (int)t / 60,
+                                     (int)t % 60);
                         rc = -EBUSY;
                 } else {
 dont_check_exports:
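
The time remaining at the end of the message is the recovery timer
deadline minus the current time, converted to seconds and printed as
minutes and seconds.  A minimal sketch of that step, assuming both
times are already expressed in whole seconds; format_remaining() is a
hypothetical helper, and plain long values stand in for cfs_time_t
and the cfs_* time functions:

```c
#include <stdio.h>

/*
 * Hypothetical helper: write the time left until deadline_sec as M:SS,
 * using the same "%d:%.02d" conversion as the hunk above.
 */
static int format_remaining(char *buf, size_t len,
                            long deadline_sec, long now_sec)
{
        long t = deadline_sec - now_sec; /* seconds left in recovery */

        return snprintf(buf, len, "%d:%.02d", (int)t / 60, (int)t % 60);
}
```

For 158 seconds left this gives "2:38", and for 5 seconds "0:05",
matching the two messages quoted in the commit description.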