LU-769 ptlrpc: Do not miss pending signals in ptlrpc_set_wait
author     Oleg Drokin <green@whamcloud.com>
           Mon, 7 Nov 2011 03:34:41 +0000 (22:34 -0500)
committer  Oleg Drokin <green@whamcloud.com>
           Fri, 11 Nov 2011 17:09:40 +0000 (12:09 -0500)
conf_sanity test 23a highlighted a problem in the ptlrpc_set_wait logic:
if we enter it with a signal already pending and the import is not FULL,
there is no way to interrupt the set, because we keep signals blocked
the whole time. Leaving signals enabled the whole time is not an option
either. Waiting until the import reconnects is questionable too, since
it might never come back (as in test 23a).
So the solution is to manually mark the set as interrupted when the
initial wait times out with a fatal signal still pending.

Change-Id: Iaa3e356e971b4f75fd7f21cc579c85f7487719a0
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: http://review.whamcloud.com/1657
Reviewed-by: Jinshan Xiong <jinshan.xiong@whamcloud.com>
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Mikhail Pershin <tappro@whamcloud.com>
lustre/ptlrpc/client.c

index 483ca5d..e9979df 100644
@@ -1907,7 +1907,7 @@ void ptlrpc_interrupted_set(void *data)
         cfs_list_t *tmp;
 
         LASSERT(set != NULL);
-        CERROR("INTERRUPTED SET %p\n", set);
+        CDEBUG(D_RPCTRACE, "INTERRUPTED SET %p\n", set);
 
         cfs_list_for_each(tmp, &set->set_requests) {
                 struct ptlrpc_request *req =
@@ -2024,6 +2024,23 @@ int ptlrpc_set_wait(struct ptlrpc_request_set *set)
 
                 rc = l_wait_event(set->set_waitq, ptlrpc_check_set(NULL, set), &lwi);
 
+                /* LU-769 - if we ignored the signal because it was already
+                 * pending when we started, we need to handle it now or we risk
+                 * it being ignored forever */
+                if (rc == -ETIMEDOUT && !lwi.lwi_allow_intr &&
+                    cfs_signal_pending()) {
+                        cfs_sigset_t blocked_sigs =
+                                           cfs_block_sigsinv(LUSTRE_FATAL_SIGS);
+
+                        /* In fact we only interrupt for the "fatal" signals
+                         * like SIGINT or SIGKILL. We still ignore less
+                         * important signals since ptlrpc set is not easily
+                         * reentrant from userspace again */
+                        if (cfs_signal_pending())
+                                ptlrpc_interrupted_set(set);
+                        cfs_block_sigs(blocked_sigs);
+                }
+
                 LASSERT(rc == 0 || rc == -EINTR || rc == -ETIMEDOUT);
 
                 /* -EINTR => all requests have been flagged rq_intr so next