git://git.whamcloud.com - fs/lustre-release.git/commit

author	Vitaly Fertman <vitaly_fertman@xyratex.com>
	Tue, 26 Jun 2012 11:37:47 +0000 (15:37 +0400)
committer	Oleg Drokin <green@whamcloud.com>
	Thu, 5 Jul 2012 18:26:42 +0000 (14:26 -0400)
commit	e976ee72602477e54d26693aaeb84709ea5fd38f
tree	208eb2e54973695c82840b66ebdd679ec84fb330
parent	7358bc93e63ca3eff1cde4f73d0e1d81896b1eaa

LU-1239 ldlm: cascading client reconnects

The following cascade may happen:
- the MDS is overloaded with enqueues; they consume all the threads on
  the MDS_REQUEST portal while waiting for a lock that a client holds;
- that client tries to reconnect, but the MDS is out of threads and
  the reconnection fails;
- other clients are waiting for their enqueue completions; they ping
  the MDS to check whether it is still alive, but although the ping is
  an HP RPC, no thread is reserved for it. Thus, the other clients
  time out as well.

Ensure that each service which handles HP RPCs has an extra thread
reserved for them; make MDS_CONNECT and OST_CONNECT HP RPCs.
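
The reservation policy described above can be illustrated with a
minimal sketch. This is not the code from this patch: the struct, the
enum and the function names below are hypothetical, and the real
ptlrpc service keeps more state. The sketch only assumes that a service
tracks how many threads are running and how many are busy, and that a
normal request must leave one thread free whenever the service handles
HP RPCs.

```c
#include <stdbool.h>

/* Hypothetical opcode classes; the real Lustre opcodes are not used here. */
enum sketch_opc {
	SK_ENQUEUE,      /* ordinary lock enqueue */
	SK_PING,         /* keep-alive ping, already an HP RPC */
	SK_MDS_CONNECT,  /* made HP by this change */
	SK_OST_CONNECT   /* made HP by this change */
};

/* Hypothetical, simplified view of a service's thread pool. */
struct sketch_svc {
	int  running;   /* threads started for this service */
	int  active;    /* threads currently handling a request */
	bool has_hp;    /* true if the service handles HP RPCs */
};

/* Pings and connects are treated as high priority in this sketch. */
static bool sketch_is_hp(enum sketch_opc opc)
{
	return opc == SK_PING || opc == SK_MDS_CONNECT ||
	       opc == SK_OST_CONNECT;
}

/*
 * Decide whether a request may be handed to a thread now.
 * Normal requests may not take the last free thread of a service that
 * handles HP RPCs; HP requests may use any free thread.
 */
static bool sketch_may_start(const struct sketch_svc *svc,
			     enum sketch_opc opc)
{
	int reserved = (svc->has_hp && !sketch_is_hp(opc)) ? 1 : 0;

	return svc->active < svc->running - reserved;
}
```

With four running threads of which three are busy with enqueues, a
further enqueue is refused while a ping or a connect is still
accepted, which breaks the cascade outlined above.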

Change-Id: I295aec6a2d2fb614d4b5f037068a3dcdda1a8b09
Signed-off-by: Vitaly Fertman <vitaly_fertman@xyratex.com>
Reviewed-by: Alexey Lyashkov <alexey_lyashkov@xyratex.com>
Reviewed-by: Andrew Perepechko <Andrew_Perepechko@xyratex.com>
Xyratex-bug-id: MRP-455
Reviewed-on: http://review.whamcloud.com/2355
Tested-by: Hudson
Tested-by: Maloo <whamcloud.maloo@gmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

lustre/include/lustre_net.h
lustre/ldlm/ldlm_lockd.c
lustre/mdt/mdt_handler.c
lustre/ost/ost_handler.c
lustre/ptlrpc/events.c
lustre/ptlrpc/service.c