LU-12206 mdt: mdt_init0 failure handling
author Vladimir Saveliev <c17830@cray.com>
Fri, 19 Apr 2019 09:33:12 +0000 (12:33 +0300)
committer Oleg Drokin <green@whamcloud.com>
Sat, 25 May 2019 04:57:54 +0000 (04:57 +0000)
commit d1b5146eda4fdaa77dd44bc2195435bda0f83a94
tree 14fcd56e97dbfc58ce4c111fed5a241ccdaecc06
parent 9334f1d51249c186e15b42a1717312d03385153a
LU-12206 mdt: mdt_init0 failure handling

When mdt_init0 fails, it has to wait until the zombie workqueue has
destroyed all disconnected exports before mdt_device_alloc frees the
mdt_device. Otherwise, the zombie workqueue accesses the freed
mdt_device, as in this trace:
  general protection fault: 0000 [#1] SMP
  ..
  Workqueue: obd_zombid obd_zombie_exp_cull [obdclass]
  ..
  [<ffffffffc08829c5>] tgt_client_free+0x1e5/0x3c0 [ptlrpc]
  [<ffffffffc0ec2327>] mdt_destroy_export+0x57/0x200 [mdt]
  [<ffffffffc05bf20e>] class_export_destroy+0xee/0x490 [obdclass]
  [<ffffffffc05bf5c5>] obd_zombie_exp_cull+0x15/0x20 [obdclass]
  [<ffffffff93ab1d2f>] process_one_work+0x17f/0x440
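
The ordering requirement can be illustrated with a minimal user-space
sketch. It is not Lustre code: struct device, struct export,
export_disconnect, zombie_barrier and init_fail_path are hypothetical
stand-ins, and the deferred queue is drained inline instead of by a
real workqueue. It only shows that every queued export must be
destroyed before the device it points to is freed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration only, not Lustre types. */
struct device {
	int live_exports;	/* exports that still point at this device */
};

struct export {
	struct device *dev;
	struct export *next;
};

/* Exports that were disconnected but not yet destroyed. */
static struct export *zombie_list;

/* Disconnect only queues the export; destruction is deferred. */
static void export_disconnect(struct device *dev)
{
	struct export *exp = malloc(sizeof(*exp));

	if (exp == NULL)
		abort();
	exp->dev = dev;
	exp->next = zombie_list;
	zombie_list = exp;
	dev->live_exports++;
}

/* Barrier: destroy every queued export. Each destroy dereferences
 * exp->dev, so the device must still be allocated at this point. */
static void zombie_barrier(void)
{
	while (zombie_list != NULL) {
		struct export *exp = zombie_list;

		zombie_list = exp->next;
		exp->dev->live_exports--;
		free(exp);
	}
}

/* Failure path of a hypothetical init: drain the queue first, then
 * free the device. Returns the number of exports that still
 * referenced the device just before it was freed. */
static int init_fail_path(int nexports)
{
	struct device *dev = calloc(1, sizeof(*dev));
	int pending;
	int i;

	if (dev == NULL)
		abort();
	for (i = 0; i < nexports; i++)
		export_disconnect(dev);

	zombie_barrier();	/* must run before free(dev) */
	pending = dev->live_exports;
	free(dev);
	return pending;
}
```

If free(dev) were moved above zombie_barrier() in this sketch, the
barrier would dereference freed memory, which is the same class of
bug as the fault shown above.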

- mdt_init0
  The call to target_recovery_fini is moved so that it is made on
  every failure after a successful tgt_init.

  obd_zombie_barrier is called after
  target_recovery_fini->class_disconnect_exports, so that the
  disconnected exports are destroyed before the device is freed.

  obd->obd_fail is set so that mdt_export_cleanup->tgt_client_del does
  not clear the client's slot in last_rcvd when the server fails to
  start.

- mdt_quota_init
  class_manual_clean already does class_detach; a goto is added to
  avoid a repeated call to class_detach.

- qmt_device_init0
  Start the qmt rebalance thread with the SVC_STARTING flag so that
  qmt_start_reba_thread waits until the thread has started.
  Otherwise, qmt_device may get freed before the qmt rebalance thread
  is stopped.
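
The start handshake described for qmt_device_init0 can be sketched in
user space with POSIX threads. This is an analogy under stated
assumptions, not the Lustre implementation: struct worker,
worker_main, worker_start and worker_stop are hypothetical names, and
a mutex plus condition variable stands in for the SVC_STARTING flag
and its wait. The point is that the start routine does not return
until the thread has announced that it is running, so a later stop
always sees a started thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical worker state, for illustration only. */
struct worker {
	pthread_t	thread;
	pthread_mutex_t	lock;
	pthread_cond_t	cond;
	bool		running;	/* set by the thread once started */
	bool		stop;		/* set by worker_stop() */
};

static void *worker_main(void *arg)
{
	struct worker *w = arg;

	pthread_mutex_lock(&w->lock);
	w->running = true;		/* announce that we have started */
	pthread_cond_broadcast(&w->cond);
	while (!w->stop)		/* idle until asked to stop */
		pthread_cond_wait(&w->cond, &w->lock);
	w->running = false;
	pthread_mutex_unlock(&w->lock);
	return NULL;
}

/* Start the thread and wait until it reports that it is running. */
static int worker_start(struct worker *w)
{
	int rc;

	pthread_mutex_init(&w->lock, NULL);
	pthread_cond_init(&w->cond, NULL);
	w->running = false;
	w->stop = false;

	rc = pthread_create(&w->thread, NULL, worker_main, w);
	if (rc != 0) {
		pthread_cond_destroy(&w->cond);
		pthread_mutex_destroy(&w->lock);
		return rc;
	}

	pthread_mutex_lock(&w->lock);
	while (!w->running)
		pthread_cond_wait(&w->cond, &w->lock);
	pthread_mutex_unlock(&w->lock);
	return 0;
}

/* Ask the thread to stop and wait for it to exit. Only after this
 * returns is it safe to free the memory holding struct worker. */
static void worker_stop(struct worker *w)
{
	pthread_mutex_lock(&w->lock);
	w->stop = true;
	pthread_cond_broadcast(&w->cond);
	pthread_mutex_unlock(&w->lock);

	pthread_join(w->thread, NULL);
	pthread_cond_destroy(&w->cond);
	pthread_mutex_destroy(&w->lock);
}
```

In this sketch pthread_join alone would already make the stop safe;
the handshake matters when, as the description above suggests, the
stop path relies on the thread's state flags and could otherwise
return while the thread is still starting up.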

Tests for failures during mdt_init0 are added:
- conf-sanity.sh:test_5i leads to a general protection fault
- conf-sanity.sh:test_5h causes
  rmmod: ERROR: Module mdt is in use

Cray-bug-id: LUS-2403
Signed-off-by: Vladimir Saveliev <c17830@cray.com>
Test-Parameters: trivial testlist=conf-sanity envdefinitions=ONLY=5
Change-Id: Ic9dc9e167f6c2e47a5f97e59b5bd26c5231c23ce
Reviewed-on: https://review.whamcloud.com/34724
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andrew Perepechko <c17827@cray.com>
Reviewed-by: Sergey Cheremencev <c17829@cray.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
lustre/include/obd_support.h
lustre/mdt/mdt_handler.c
lustre/quota/qmt_dev.c
lustre/quota/qmt_lock.c
lustre/tests/conf-sanity.sh