Whamcloud - gitweb
LU-1239 ldlm: cascading client reconnects
It may happen that
- MDS is overloaded with enqueues, they consume all the threads on
MDS_REQUEST portal and waiting for a lock a client holds;
- that client tries to re-connect but MDS is out of threads and
re-connection fails;
- other clients are waiting for their enqueue completions, they try
to ping MDS if it is still alive, but despite the fact it is a HP-rpc,
there is no thread reserved for it. Thus, other clients get timed
out as well.
Ensure each service which handles HP-rpc has an extra thread reserved
for them; make MDS_CONNECT and OST_CONNECT HP-rpc.
Change-Id: I295aec6a2d2fb614d4b5f037068a3dcdda1a8b09
Signed-off-by: Vitaly Fertman <vitaly_fertman@xyratex.com>
Reviewed-by: Alexey Lyashkov <alexey_lyashkov@xyratex.com>
Reviewed-by: Andrew Perepechko <Andrew_Perepechko@xyratex.com>
Xyratex-bug-id: MRP-455
Reviewed-on: http://review.whamcloud.com/2355
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>