LU-2591 lov: race between ptlrpc_rcv and mount/umount thread
The race addressed here happens in the following scenario:
1) mount runs but fails to communicate with some OSTs. The import
objects that represent those OSTs are then registered with the
pinger list.
2) The pinger succeeds in communicating with at least one OST.
ptlrpc_rcv then calls lov_set_osc_active to activate that OST and
takes a reference on lov_refcount.
3) For some reason (for example, mount eventually fails or umount
runs), ll_put_super is called.
4) ll_put_super tries to disconnect all the OSTs with
lov_disconnect, which calls lov_del_target to set every OSC
target's ltd_reap flag so that lov_putref will handle all of them.
5) The ptlrpc_rcv thread drops its lov_refcount reference. If
lov_refcount reaches 0 here, that thread has to disconnect every
OSC whose ltd_reap flag has been set.
6) Some OSCs' imports are still in the LUSTRE_IMP_CONNECTING state
because of (2), so in ptlrpc_disconnect_import the ptlrpc_rcv
thread has to wait for those imports to move to a non-recovery
state such as FULL, CLOSED or DISCON.
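Steps 2, 4 and 5 amount to deferred cleanup driven by a reference count: lov_del_target only flags a target, and whichever caller drops the last lov_refcount reference performs the disconnects. The sketch below is a simplified, hypothetical model of that behaviour, not Lustre code; every name (model_lov, model_putref, model_del_target, and so on) is invented for illustration. It also assumes, for simplicity, that the del_target step takes and drops its own reference around setting the flag. The point it shows is that the disconnect work lands on the ptlrpc_rcv thread, because that thread holds the last reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the LOV reference counting described
 * above. None of these names are real Lustre structures or functions. */
#define MODEL_NTARGETS 3

struct model_target {
    bool connected;   /* OSC still connected */
    bool reap;        /* stands in for ltd_reap */
};

struct model_lov {
    int refcount;     /* stands in for lov_refcount */
    int disconnects;  /* number of targets disconnected so far */
    struct model_target tgts[MODEL_NTARGETS];
};

static void model_getref(struct model_lov *lov)
{
    lov->refcount++;
}

/* Modelled on the behaviour described for lov_putref: whoever drops the
 * last reference disconnects every target flagged for reaping. */
static void model_putref(struct model_lov *lov)
{
    if (--lov->refcount > 0)
        return;
    for (int i = 0; i < MODEL_NTARGETS; i++) {
        struct model_target *t = &lov->tgts[i];
        if (t->reap && t->connected) {
            /* In the scenario above this disconnect is where the
             * caller can block (step 6). */
            t->connected = false;
            t->reap = false;
            lov->disconnects++;
        }
    }
}

/* Modelled on lov_del_target (assumption: it brackets the flag with its
 * own get/put): only flag the target and defer the actual disconnect to
 * the caller that drops the last reference. */
static void model_del_target(struct model_lov *lov, int idx)
{
    model_getref(lov);
    lov->tgts[idx].reap = true;
    model_putref(lov);
}

/* Replays steps 2, 4 and 5 and reports how many targets were disconnected
 * before and after the receive thread drops its reference. */
static void model_run_scenario(int *before, int *after)
{
    struct model_lov lov = { 0 };
    for (int i = 0; i < MODEL_NTARGETS; i++)
        lov.tgts[i].connected = true;

    model_getref(&lov);              /* step 2: ptlrpc_rcv holds a ref */
    for (int i = 0; i < MODEL_NTARGETS; i++)
        model_del_target(&lov, i);   /* step 4: umount flags all targets */
    *before = lov.disconnects;       /* nothing disconnected yet */

    model_putref(&lov);              /* step 5: ptlrpc_rcv drops last ref */
    *after = lov.disconnects;        /* all flagged targets reaped here */
}
```

Calling model_run_scenario leaves before at 0 and after at MODEL_NTARGETS: none of the umount-side calls disconnect anything, and all of the work happens inside the putref issued by the receive thread.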
The ptlrpc_rcv thread is now waiting for the imports to reach a
non-recovery state, but ptlrpc_rcv is itself the thread that is
supposed to move an import from a recovery state to a non-recovery
state, so ptlrpc_rcv hangs. The mount/umount thread that called
ll_put_super also waits for the state change in
ptlrpc_disconnect_import, so that thread hangs too.
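The self-wait can be illustrated with a small single-threaded model, under stated assumptions: the receive thread is represented as a queue of pending replies that only it can process, and a pending connect reply is what would move an import from CONNECTING to FULL. All names here (model_import, model_rcv, model_wait_non_recovery) are hypothetical. Instead of sleeping forever, the wait polls a bounded number of times so the hang shows up as a -1 return rather than a stuck program. Only the ptlrpc_rcv side of the hang is modelled; the mount/umount thread would block on the same state change.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: these names are illustrative, not Lustre APIs. */
enum model_imp_state { MODEL_IMP_CONNECTING, MODEL_IMP_FULL, MODEL_IMP_CLOSED };

struct model_import {
    enum model_imp_state state;
};

/* The receive "thread": a count of pending connect replies that only it
 * can apply. Applying one moves the import from CONNECTING to FULL. */
struct model_rcv {
    struct model_import *imp;
    int pending_connect_replies;
};

/* Only the receive thread processes replies, so only it changes state. */
static void model_rcv_process_one(struct model_rcv *rcv)
{
    if (rcv->pending_connect_replies > 0) {
        rcv->pending_connect_replies--;
        rcv->imp->state = MODEL_IMP_FULL;
    }
}

static bool model_in_recovery(const struct model_import *imp)
{
    return imp->state == MODEL_IMP_CONNECTING;
}

/* Stands in for the wait described for ptlrpc_disconnect_import: wait until
 * the import leaves recovery. Polls at most max_polls times so the hang is
 * observable. Returns 0 if the state changed, -1 if it never did. */
static int model_wait_non_recovery(const struct model_import *imp, int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        if (!model_in_recovery(imp))
            return 0;
        /* Nothing can change imp->state here: the only code that would
         * do so is the receive thread, which is stuck in this loop. */
    }
    return -1;
}

/* Disconnect issued from inside the receive thread (steps 5 and 6). */
static int model_disconnect_from_rcv(struct model_rcv *rcv)
{
    int rc = model_wait_non_recovery(rcv->imp, 1000);
    if (rc == 0)
        rcv->imp->state = MODEL_IMP_CLOSED;
    /* The pending reply could only be processed after this returns,
     * which is too late for the wait above. */
    return rc;
}
```

With one pending reply and the import in CONNECTING, model_disconnect_from_rcv returns -1 and the state is unchanged; only if the reply is processed first does the same disconnect succeed and close the import.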
Signed-off-by: Hiroya Nozaki <nozaki.hiroya@jp.fujitsu.com>
Change-Id: I1d74967b883f079aafd11454dca44960a8f7a3f2
Reviewed-on: http://review.whamcloud.com/5527
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>