LU-5885 lfsck: deadlock when remove striped dir
There is a potential deadlock between removing a striped
directory and the namespace LFSCK. Consider the following
scenario:
1) The LFSCK thread obtains the master object first. At that
time, the master object has not been destroyed yet.
2) An RPC service thread destroys the master object and all
its slave objects (shards). Because the LFSCK thread is still
referencing the master object, the master object is marked
as dying in RAM. The master object in turn references all
its slave objects, so all the slave objects are marked as
dying in RAM as well.
3) The LFSCK thread tries to find some slave object while still
holding its reference on the master object, and finds that the
slave object is dying. According to the object visibility rules,
an object with the dying flag cannot be returned to others,
so the LFSCK thread has to wait until the dying object has
been purged from RAM before it can allocate a new object (with
the same FID) in RAM. Unfortunately, the LFSCK thread itself
is referencing the master object, so the master object cannot
be purged, and therefore the slave object cannot be purged
either. The LFSCK thread then deadlocks.
To resolve this, the LFSCK should use the non-blocking version of
lu_object_find() to locate the slave objects of the striped
directory, and return failure immediately (instead of waiting)
when it finds a dying (slave) object.
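The difference between waiting and failing fast can be sketched
with a toy object cache. This is only an illustration under stated
assumptions: toy_object, toy_cache and toy_find_nowait() are
hypothetical names, not Lustre APIs, and the real lookup path is
considerably more involved.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names are Lustre APIs. */
struct toy_object {
	unsigned long	to_fid;		/* stand-in for a FID */
	int		to_refcount;	/* in-RAM references */
	int		to_dying;	/* set once the object is destroyed */
};

#define TOY_CACHE_SIZE 4

struct toy_cache {
	struct toy_object tc_objs[TOY_CACHE_SIZE];
	size_t		  tc_count;
};

/*
 * Non-blocking lookup: on success take a reference and return 0.
 * If the cached object is dying, return -EAGAIN at once instead of
 * waiting for it to be purged. Return -ENOENT if nothing is cached.
 */
static int toy_find_nowait(struct toy_cache *cache, unsigned long fid,
			   struct toy_object **out)
{
	size_t i;

	*out = NULL;
	for (i = 0; i < cache->tc_count; i++) {
		struct toy_object *obj = &cache->tc_objs[i];

		if (obj->to_fid != fid)
			continue;
		if (obj->to_dying)
			return -EAGAIN;	/* caller still pins the master */
		obj->to_refcount++;
		*out = obj;
		return 0;
	}
	return -ENOENT;
}
```

A caller that already holds a reference on the master would treat
-EAGAIN as "this shard is being removed" and skip it, rather than
sleeping on an object that cannot be purged while the caller's own
reference is held.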
This patch also controls the async pipeline depth between the
LFSCK main engine and the namespace assistant thread to avoid
potential RAM pressure.
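Capping the pipeline depth can be pictured as a bounded counter
shared by the producer (main engine) and the consumer (assistant).
The sketch below is a single-threaded model with hypothetical
names (toy_pipeline, toy_pipeline_push(), toy_pipeline_pop()); it
does not reproduce the locking or wakeups a real implementation
would need.

```c
#include <errno.h>

/* Hypothetical names; a single-threaded model of a bounded pipeline. */
struct toy_pipeline {
	unsigned int tp_depth;		/* requests queued, not yet handled */
	unsigned int tp_max_depth;	/* cap on outstanding requests */
};

/* Main engine side: queue one request unless the cap is reached. */
static int toy_pipeline_push(struct toy_pipeline *tp)
{
	if (tp->tp_depth >= tp->tp_max_depth)
		return -EBUSY;	/* caller should wait for the assistant */
	tp->tp_depth++;
	return 0;
}

/* Assistant side: finish one request, freeing a slot. */
static int toy_pipeline_pop(struct toy_pipeline *tp)
{
	if (tp->tp_depth == 0)
		return -ENOENT;	/* nothing queued */
	tp->tp_depth--;
	return 0;
}
```

With the cap in place, the amount of memory pinned by queued
requests is bounded by tp_max_depth times the per-request size,
no matter how far the producer could otherwise run ahead.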
This patch also makes some other code adjustments to avoid
potential data overflow that may cause incorrect LFSCK statistics.
Signed-off-by: Fan Yong <fan.yong@intel.com>
Change-Id: I00c601eca8ade5d2e4260c729463f7ecdba0ed53
Reviewed-on: http://review.whamcloud.com/12741
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>