chgrp on a client triggers lod_sync(), which in turn loops over OST/MDT
targets with dt_sync(). dt_sync() fails with -ENOTCONN when targets
have been deactivated (i.e. set to active=0). The client retries
indefinitely, which hangs the client process and generates considerable
MDS network traffic, load, and disk I/O.

The fix is to skip dt_sync() for OST/MDT targets that have been
deactivated and also, because of possible races, to ignore connection
errors returned by targets that are still synced.
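
The loop behaviour described above can be sketched roughly as follows.
This is a simplified userspace model, not the patched lod_sync() code:
struct sketch_tgt, sketch_tgt_sync() and sketch_sync_all() are
hypothetical names, and treating only -ENOTCONN as the ignorable
connection error is an assumption made for illustration.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an OST/MDT target; not a Lustre structure. */
struct sketch_tgt {
	int active;   /* 0 when the target has been deactivated (active=0) */
	int sync_rc;  /* result the simulated sync will return */
	int synced;   /* number of sync attempts made on this target */
};

/* Simulated per-target sync, standing in for dt_sync(). */
static int sketch_tgt_sync(struct sketch_tgt *t)
{
	t->synced++;
	return t->sync_rc;
}

/*
 * Sync every target, skipping deactivated ones and ignoring -ENOTCONN
 * from targets that went away after the active check (the race case).
 * Any other error stops the loop and is returned to the caller.
 */
static int sketch_sync_all(struct sketch_tgt *tgts, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int rc;

		if (!tgts[i].active)
			continue;       /* never attempt a deactivated target */

		rc = sketch_tgt_sync(&tgts[i]);
		if (rc == -ENOTCONN)
			continue;       /* assumed ignorable connection error */
		if (rc != 0)
			return rc;      /* genuine failure, report it */
	}
	return 0;
}
```

In this model a deactivated target is never contacted, and a target
that disconnects between the check and the sync no longer turns the
whole operation into an error that the client would retry.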

Tested with Lustre 2.10.4.

Signed-off-by: Robin Humble <plaguedbypenguins@gmail.com>
Change-Id: I617509cf7944541489f4fd9762c233b771132165
Reviewed-on: https://review.whamcloud.com/32964
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>