LU-8562 osp: osp_precreate_thread gets stuck after disconnect
author Ned Bass <bass6@llnl.gov>
Sat, 7 Jan 2017 01:43:47 +0000 (17:43 -0800)
committer Oleg Drokin <oleg.drokin@intel.com>
Tue, 24 Jan 2017 05:21:26 +0000 (05:21 +0000)
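osp_precreate_thread() can get stuck after a disconnect because
d->opd_got_disconnected is never reset. While opd_got_disconnected is
set, osp_precreate_cleanup_orphans() returns early with EAGAIN and so
never clears d->opd_pre_recovering. And while d->opd_pre_recovering
and d->opd_imp_connected are both set, the thread breaks out of the
wait loop before reaching the code that normally resets
d->opd_got_disconnected. As a result, every subsequent call to
osp_precreate_cleanup_orphans() fails the same way.

Only break out of the wait loop when opd_got_disconnected is clear, so
that the thread falls through to the wait and the flag can be reset.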
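
The cycle described above can be reproduced with a small standalone
model. This is only a sketch under stated assumptions, not Lustre code:
struct model_dev and the model_* functions are hypothetical stand-ins
for struct osp_device, osp_precreate_cleanup_orphans() and the wait
loop in osp_precreate_thread(), and the wait itself is simulated as a
reconnect that arrives immediately and resets the flag.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the relevant osp_device flags. */
struct model_dev {
	bool pre_recovering;	/* models d->opd_pre_recovering */
	bool imp_connected;	/* models d->opd_imp_connected */
	bool got_disconnected;	/* models d->opd_got_disconnected */
};

/* Models the early return: orphan cleanup gives up with -EAGAIN while
 * the disconnect flag is set, so pre_recovering is never cleared. */
static int model_cleanup_orphans(struct model_dev *d)
{
	if (d->got_disconnected)
		return -EAGAIN;
	d->pre_recovering = false;
	return 0;
}

/* Models one pass through the wait loop. 'fixed' selects the patched
 * break condition. Assumption: if the thread does not break, it waits,
 * a reconnect arrives, and the disconnect flag is reset. */
static void model_wait_for_connection(struct model_dev *d, bool fixed)
{
	bool ready = d->pre_recovering && d->imp_connected;

	if (fixed)
		ready = ready && !d->got_disconnected;
	if (ready)
		return;		/* corresponds to the "break" in the diff */

	d->imp_connected = true;
	d->got_disconnected = false;
}

/* Returns the number of attempts until cleanup succeeds, or -1 if the
 * model is still stuck after max_tries attempts. */
static int model_recover(struct model_dev d, bool fixed, int max_tries)
{
	for (int i = 1; i <= max_tries; i++) {
		model_wait_for_connection(&d, fixed);
		if (model_cleanup_orphans(&d) == 0)
			return i;
	}
	return -1;
}
```

Starting from a state with all three flags set, the unpatched condition
never reaches the reset and model_recover() gives up, while the patched
condition lets the first pass reset the flag so cleanup can succeed.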

Signed-off-by: Ned Bass <bass6@llnl.gov>
Change-Id: I0b4f4e2e55e7a8d7ffae633a4d3c578b4a484ae2
Reviewed-on: https://review.whamcloud.com/24758
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@seagate.com>
Reviewed-by: Alex Zhuravlev <alexey.zhuravlev@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
lustre/osp/osp_precreate.c

index 8570fa6..f598387 100644
@@ -1162,7 +1162,8 @@ static int osp_precreate_thread(void *_arg)
                 */
                while (osp_precreate_running(d)) {
                        if (d->opd_pre_recovering &&
-                           d->opd_imp_connected)
+                           d->opd_imp_connected &&
+                           !d->opd_got_disconnected)
                                break;
                        l_wait_event(d->opd_pre_waitq,
                                     !osp_precreate_running(d) ||