LU-12616 obclass: fix MDS start/stop race 52/35652/5
author Alexander Boyko <c17825@cray.com>
Tue, 30 Jul 2019 11:33:15 +0000 (07:33 -0400)
committer Oleg Drokin <green@whamcloud.com>
Tue, 3 Sep 2019 05:09:36 +0000 (05:09 +0000)
commit 3cce65712d94cffe8f1626545845b95b88aef672
tree 5fbcdf603de4bc57ed8d7476019aecac3bd0c3b9
parent 4ad4a6fcabf9018ff0891b1832c8e3cd53b1cfee

The MDS is unloaded when the MDT type has no references left.
An MDT drops its reference during obd_cleanup, so the race window
lies between obd_cleanup and server_stop_servers.
Lustre can lose the MDS obd_device during server_start_targets,
between checking for the MDS and taking the type reference, if
another MDT stops.

The patch takes one more reference on the MDT type in
server_start_targets and puts it in server_stop_servers.

This patch adds sanity test 278. It reproduces the following race:
   started cleanup of MDT01
   started cleanup of MDT00
   finished cleanup of MDT00
   started MDT00 mount, checked that the MDS exists
   finished cleanup of MDT01, which also cleaned up the MDS
   assertion failed during MDT00 initialization
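
The interleaving above can be replayed deterministically in a small
model. The sketch below is not Lustre code: struct race_model and the
model_* and replay_race names are hypothetical, and the only behaviour
it assumes is what this message states (an MDT drops its type
reference in cleanup, the MDS goes away once the type has no
references, and the patch holds one extra reference from the start
path until the stop path).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not Lustre code: one counter for the MDT type, one flag for the MDS. */
struct race_model {
    int  type_refs;  /* references held on the MDT type */
    bool mds_alive;  /* the shared MDS device exists */
    bool pin;        /* true = model the patch's extra reference */
};

/* Mount path up to the MDS check; the patched variant pins the type first. */
static bool model_start_check(struct race_model *m)
{
    if (m->pin)
        m->type_refs++;
    return m->mds_alive;          /* "checked MDS exist" */
}

/* Device setup: needs the MDS and takes the MDT's own type reference. */
static bool model_mdt_init(struct race_model *m)
{
    if (!m->mds_alive)
        return false;             /* the real code asserts at this point */
    m->type_refs++;
    return true;
}

/* Full cleanup: drop the MDT's own reference, then run the stop step. */
static void model_cleanup(struct race_model *m)
{
    m->type_refs--;               /* reference dropped in obd_cleanup */
    if (m->pin)
        m->type_refs--;           /* patched stop path puts the pin */
    if (m->type_refs == 0)
        m->mds_alive = false;     /* no users left: the MDS is torn down */
}

/* Replays the interleaving; returns true if MDT00 initializes safely. */
static bool replay_race(bool pin)
{
    /* Two mounted MDTs: one reference each, plus one pin each if patched. */
    struct race_model m = {
        .type_refs = pin ? 4 : 2, .mds_alive = true, .pin = pin
    };

    /* MDT01 cleanup has started but has not dropped anything yet. */
    model_cleanup(&m);                  /* MDT00 cleanup runs to completion */
    bool seen = model_start_check(&m);  /* MDT00 remount sees the MDS */
    model_cleanup(&m);                  /* MDT01 cleanup finishes */
    return seen && model_mdt_init(&m);  /* MDT00 initialization */
}
```

Calling replay_race(false) models the unpatched ordering, where the
count reaches zero while MDT00 is between its check and its own
reference; replay_race(true) keeps the count above zero across that
window because the remounting MDT00 already holds the extra reference.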

Cray-bug-id: LUS-7275
Signed-off-by: Alexander Boyko <c17825@cray.com>
Change-Id: I9ae3bc2ec1d23c8d436f143d12e26209fdb6b083
Reviewed-on: https://review.whamcloud.com/35652
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Alexander Zarochentsev <c17826@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
lustre/include/lustre_disk.h
lustre/include/obd_support.h
lustre/obdclass/obd_mount_server.c
lustre/tests/sanity.sh