LU-11243 lod: fix assertion and hang upon lod_add_device failure 50/34450/3
author Wang Shilong <wshilong@ddn.com>
Mon, 10 Dec 2018 05:45:33 +0000 (13:45 +0800)
committer Oleg Drokin <green@whamcloud.com>
Mon, 1 Apr 2019 06:19:31 +0000 (06:19 +0000)
commit 6c13ed6decc680092d3610518dc30aba40c83563
tree 2086dc2874373d7f4ebff2b17fa824b0aa04206e
parent af58b117ec0e252de641e05722768cad95bba44f

There are two problems.

The first is the following assertion failure:

    lod_add_device() lustre-OSTe42a-osc-MDT0000:
                     can't set up pool, failed with -12
    osp_disconnect() ASSERTION( imp != ((void *)0) ) failed:
    osp_disconnect() LBUG
    CPU: 1 PID: 10059 Comm: llog_process_th

The problem is that obd_disconnect() cleans up @imp and sets it to NULL:
 ->osp_obd_disconnect
    ->class_manual_cleanup
       ->class_process_config
          ->class_cleanup
             ->obd_precleanup
                ->osp_device_fini
                   ->client_obd_cleanup

ldo_process_config() then tries to access @imp again:
 ->ldo_process_config
    ->osp_shutdown
       ->osp_disconnect
          ->LASSERT(imp != NULL)
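
The two call chains above can be modelled with a toy sketch. This is
not Lustre code: struct toy_client, toy_cleanup() and
toy_disconnect_precondition() are hypothetical names that only mimic
the ordering problem, i.e. a cleanup path that clears the import
pointer followed by a disconnect path that asserts it is still set.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the import and the client obd state. */
struct toy_import {
	int state;
};

struct toy_client {
	struct toy_import *imp;
};

/* Models the end of the first chain: cleanup drops the import
 * and leaves the pointer set to NULL. */
static void toy_cleanup(struct toy_client *cli)
{
	cli->imp = NULL;
}

/* Models the condition that the second chain asserts on
 * (imp != NULL) before it disconnects the import. */
static bool toy_disconnect_precondition(const struct toy_client *cli)
{
	return cli->imp != NULL;
}
```

Once toy_cleanup() has run, the precondition no longer holds, which
corresponds to the LBUG in the log above.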

The second problem is that if we fail before obd_connect(),
the mount hangs:
 ->ldo_process_config
    ->osp_shutdown
       ->osp_disconnect
          ->ptlrpc_disconnect_import
             ->rc = l_wait_event(imp->imp_recovery_waitq,
                                 !ptlrpc_import_in_recovery(imp), &lwi);

Since connect was never called, the import state stays at
LUSTRE_IMP_NEW, so the wait above never completes.
Fix this by checking properly whether we are in recovery: only
consider the import to be in recovery when it is in one of the
following states:

 LUSTRE_IMP_CONNECTING   = 4,
 LUSTRE_IMP_REPLAY       = 5,
 LUSTRE_IMP_REPLAY_LOCKS = 6,
 LUSTRE_IMP_REPLAY_WAIT  = 7,
 LUSTRE_IMP_RECOVER      = 8,
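
A minimal sketch of such a range check follows. It is not the code
from lustre/ptlrpc/recover.c: the toy_ names are hypothetical, the
values 4..8 are taken from the list above, and the values used for
TOY_IMP_NEW and TOY_IMP_FULL are assumptions chosen only so that they
fall outside that range. Locking and any other conditions of the real
function are left out.

```c
#include <stdbool.h>

/* Values 4..8 mirror the states listed above; the other two
 * values are assumptions made for this illustration. */
enum toy_imp_state {
	TOY_IMP_NEW          = 2, /* assumed: below CONNECTING */
	TOY_IMP_CONNECTING   = 4,
	TOY_IMP_REPLAY       = 5,
	TOY_IMP_REPLAY_LOCKS = 6,
	TOY_IMP_REPLAY_WAIT  = 7,
	TOY_IMP_RECOVER      = 8,
	TOY_IMP_FULL         = 9, /* assumed: above RECOVER */
};

/* An import counts as "in recovery" only while its state lies in
 * the CONNECTING..RECOVER range. */
static bool toy_import_in_recovery(enum toy_imp_state state)
{
	return state >= TOY_IMP_CONNECTING && state <= TOY_IMP_RECOVER;
}
```

With a check of this shape, an import that never left the NEW state is
not treated as recovering, so a waiter on "not in recovery" would not
block on it.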

Lustre-change: https://review.whamcloud.com/32994
Lustre-commit: f28353b3d810cfbec018a263556ceac84ab9413e

Change-Id: I2113b95a421bae7117f3057d5f0fdf78db95caa3
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/34450
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
lustre/lod/lod_lov.c
lustre/ptlrpc/recover.c