LU-2889 ptlrpc: Race between start and stop service threads
author Hiroya Nozaki <nozaki.hiroya@jp.fujitsu.com>
Tue, 7 May 2013 03:22:32 +0000 (12:22 +0900)
committer Oleg Drokin <oleg.drokin@intel.com>
Thu, 13 Jun 2013 23:28:03 +0000 (19:28 -0400)
When ptlrpc_start_thread fails to create a new thread, it finalizes
and frees the struct ptlrpc_thread that it allocated for that thread.
This is a problem if ptlrpc_svcpt_stop_threads picks up the same
struct ptlrpc_thread right before or right after cfs_create_thread
fails: both ptlrpc_start_thread and ptlrpc_svcpt_stop_threads then
access the freed ptlrpc_thread, which can cause an OS panic.
Alternatively, ptlrpc_svcpt_stop_threads may wait forever on an
already-freed waitq.

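This patch adds error handling to ptlrpc_start_thread to fix this
problem: if the thread is already being stopped, ptlrpc_start_thread
marks it SVC_STOPPED and signals its waitq instead of freeing it
itself.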

Signed-off-by: Hiroya Nozaki <nozaki.hiroya@jp.fujitsu.com>
Change-Id: Ic25f40f8650c65e21abe4df127fc20f9c7d0fcd1
Reviewed-on: http://review.whamcloud.com/5552
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Liang Zhen <liang.zhen@intel.com>
Reviewed-by: Nikitas Angelinas <nikitas_angelinas@xyratex.com>
Reviewed-by: Keith Mannthey <keith.mannthey@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
lustre/ptlrpc/service.c

index 7e176c0..1dbdc88 100644
@@ -2865,11 +2865,19 @@ int ptlrpc_start_thread(struct ptlrpc_service_part *svcpt, int wait)
                CERROR("cannot start thread '%s': rc %d\n",
                       thread->t_name, rc);
                spin_lock(&svcpt->scp_lock);
-               cfs_list_del(&thread->t_link);
                --svcpt->scp_nthrs_starting;
-               spin_unlock(&svcpt->scp_lock);
-
-                OBD_FREE(thread, sizeof(*thread));
+               if (thread_is_stopping(thread)) {
+                       /* this ptlrpc_thread is being handled
+                        * by ptlrpc_svcpt_stop_threads now
+                        */
+                       thread_add_flags(thread, SVC_STOPPED);
+                       cfs_waitq_signal(&thread->t_ctl_waitq);
+                       spin_unlock(&svcpt->scp_lock);
+               } else {
+                       cfs_list_del(&thread->t_link);
+                       spin_unlock(&svcpt->scp_lock);
+                       OBD_FREE_PTR(thread);
+               }
                 RETURN(rc);
         }
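
The ownership handoff in the hunk above can be sketched in user space with POSIX threads primitives. This is only an illustration under stated assumptions, not the Lustre implementation: `struct worker`, `struct pool`, the single `slot` pointer (standing in for the `t_link` list), and all function names are hypothetical, and the stopper side is a guess at how a waiter would consume `SVC_STOPPED`, since the patch itself only shows the starter side.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

enum { SVC_STOPPING = 1 << 0, SVC_STOPPED = 1 << 1 };

/* Hypothetical stand-in for struct ptlrpc_thread. */
struct worker {
    unsigned int   flags;
    pthread_cond_t ctl_cond;        /* plays the role of t_ctl_waitq */
};

/* Hypothetical stand-in for struct ptlrpc_service_part. */
struct pool {
    pthread_mutex_t lock;           /* plays the role of scp_lock */
    int             nthrs_starting;
    struct worker  *slot;           /* one-entry substitute for the thread list */
};

/* Starter: allocate a worker and publish it before the thread exists. */
struct worker *worker_alloc(struct pool *p)
{
    struct worker *w = calloc(1, sizeof(*w));

    if (w == NULL)
        return NULL;
    pthread_cond_init(&w->ctl_cond, NULL);
    pthread_mutex_lock(&p->lock);
    p->nthrs_starting++;
    p->slot = w;
    pthread_mutex_unlock(&p->lock);
    return w;
}

void worker_free(struct worker *w)
{
    pthread_cond_destroy(&w->ctl_cond);
    free(w);
}

/* Stopper, step 1: claim the published worker by marking it STOPPING. */
struct worker *stop_request(struct pool *p)
{
    pthread_mutex_lock(&p->lock);
    struct worker *w = p->slot;
    if (w != NULL)
        w->flags |= SVC_STOPPING;
    pthread_mutex_unlock(&p->lock);
    return w;
}

/* Stopper, step 2: wait for STOPPED, then unlink and free the worker. */
void stop_wait_and_free(struct pool *p, struct worker *w)
{
    pthread_mutex_lock(&p->lock);
    while (!(w->flags & SVC_STOPPED))
        pthread_cond_wait(&w->ctl_cond, &p->lock);
    p->slot = NULL;
    pthread_mutex_unlock(&p->lock);
    worker_free(w);
}

/*
 * Starter error path, mirroring the patched branch.
 * Returns 1 if the worker was freed here, 0 if it was handed to the stopper.
 */
int start_failed(struct pool *p, struct worker *w)
{
    int freed_here = 0;

    pthread_mutex_lock(&p->lock);
    p->nthrs_starting--;
    if (w->flags & SVC_STOPPING) {
        /* A stopper owns the worker now: report STOPPED and wake it. */
        w->flags |= SVC_STOPPED;
        pthread_cond_signal(&w->ctl_cond);
    } else {
        /* Nobody else has claimed it: unlink under the lock, free after. */
        p->slot = NULL;
        freed_here = 1;
    }
    pthread_mutex_unlock(&p->lock);
    if (freed_here)
        worker_free(w);
    return freed_here;
}
```

The point of the design is that the decision about who frees the object is made while holding the same lock that protects the stopping flag, so exactly one side releases the memory. In this sketch, a starter that calls `start_failed` on an unclaimed worker frees it itself, while a starter whose worker was already claimed through `stop_request` leaves the free to `stop_wait_and_free`.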