From: Amir Shehata
Date: Fri, 31 Oct 2014 00:50:15 +0000 (-0700)
Subject: LU-5568 lnet: fix kernel crash when network failed to start
X-Git-Tag: 2.6.91~55
X-Git-Url: https://git.whamcloud.com/?a=commitdiff_plain;h=66e9055b23433bd0aa8da5e49f3b665fb1b95532;p=fs%2Flustre-release.git

LU-5568 lnet: fix kernel crash when network failed to start

When Lustre modules are loaded without a proper network configuration,
the following kernel panic is always hit:

LNetError: 105-4: Error -100 starting up LNI tcp
LNetError: 2145:0:(api-ni.c:823:lnet_unprepare()) ASSERTION( list_empty(&the_lnet.ln_nis) ) failed:
LNetError: 2145:0:(api-ni.c:823:lnet_unprepare()) LBUG
Pid: 2145, comm: modprobe
Call Trace:
 libcfs_debug_dumpstack+0x53/0x80 [libcfs]
 lbug_with_loc+0x45/0xc0 [libcfs]
 lnet_unprepare+0x297/0x340 [lnet]
 LNetNIInit+0x25c/0x3e0 [lnet]
 ? put_online_cpus+0x56/0x80
 ? init_module+0x0/0x1000 [ptlrpc]
 ptlrpc_ni_init+0x2c/0x1a0 [ptlrpc]
 ? init_module+0x0/0x1000 [ptlrpc]
 ptlrpc_init_portals+0x11/0xf0 [ptlrpc]
 ? init_module+0x0/0x1000 [ptlrpc]
 init_module+0x1c4/0x1000 [ptlrpc]
 do_one_initcall+0xe2/0x190
 load_module+0x129b/0x1a90
 ? ddebug_dyndbg_module_param_cb+0x0/0x60
 ? copy_module_from_fd.isra.43+0x53/0x150
 SyS_finit_module+0xa6/0xd0
 system_call_fastpath+0x16/0x1b
...

This happens because lnet_startup_lndnis() may add items to
@the_lnet.ln_nis and @the_lnet.ln_nis_cpt before it fails, but the
lnet_startup_lndnis() failure path did not clean up these lists, which
triggered the assertion in lnet_unprepare().

Fix the assertion by cleaning up with lnet_shutdown_lndnis() if the
startup fails. In a future enhancement, the NI startup API will be
modified to clean up after itself in case of failure.

Signed-off-by: Amir Shehata
Change-Id: Ia344fd7c0f24c87b654554dda9e57bf5525edc85
Reviewed-on: http://review.whamcloud.com/12512
Tested-by: Jenkins
Tested-by: Maloo
Reviewed-by: Liang Zhen
Reviewed-by: Isaac Huang
Reviewed-by: Oleg Drokin
---
diff --git a/lnet/lnet/api-ni.c b/lnet/lnet/api-ni.c
index 74d5b03..4fd4133 100644
--- a/lnet/lnet/api-ni.c
+++ b/lnet/lnet/api-ni.c
@@ -1447,6 +1447,10 @@ lnet_shutdown_lndni(__u32 net)
 	return 0;
 }
 
+/*
+ * Callers of lnet_startup_lndnis need to clean up using
+ * lnet_shutdown_lndnis if startup fails
+ */
 static int
 lnet_startup_lndnis(struct list_head *nilist, __s32 peer_timeout,
 		    __s32 peer_cr, __s32 peer_buf_cr, __s32 credits,
@@ -1794,7 +1798,7 @@ LNetNIInit(lnet_pid_t requested_pid)
 
 	rc = lnet_startup_lndnis(&net_head, -1, -1, -1, -1, &ni_count);
 	if (rc != 0)
-		goto failed1;
+		goto failed2;
 
 	if (the_lnet.ln_eq_waitni != NULL && ni_count > 1) {
 		lnd_type = the_lnet.ln_eq_waitni->ni_lnd->lnd_type;
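
The failure mode and the caller-side cleanup described in the commit message can be sketched in user-space C. This is a minimal illustration, not Lustre code: `struct list_node`, `struct fake_ni`, `ni_list`, `startup_nis()`, `shutdown_nis()`, `unprepare()` and `ni_init()` are hypothetical stand-ins for `struct list_head`, an LNet NI, `the_lnet.ln_nis`, `lnet_startup_lndnis()`, `lnet_shutdown_lndnis()`, `lnet_unprepare()` and `LNetNIInit()`. It assumes, as the commit message states, that the `failed2` label shuts the NIs down before the teardown that asserts the list is empty.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's struct list_head (circular, doubly linked). */
struct list_node {
	struct list_node *next, *prev;
};

static bool list_is_empty(const struct list_node *head)
{
	return head->next == head;
}

static void list_add_tail_node(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_del_node(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

/* Hypothetical NI; the link is the first member, so a node pointer can be
 * cast back to the containing struct. */
struct fake_ni {
	struct list_node link;
	int id;
};

/* Plays the role of the_lnet.ln_nis in this sketch. */
static struct list_node ni_list = { &ni_list, &ni_list };

/* Starts NIs 0..count-1 but fails at index fail_at (-1 means never fail).
 * NIs started before the failure are left on ni_list. */
static int startup_nis(int count, int fail_at)
{
	for (int i = 0; i < count; i++) {
		if (i == fail_at)
			return -ENETDOWN;	/* 100 on Linux, as in "Error -100" */
		struct fake_ni *ni = malloc(sizeof(*ni));
		if (ni == NULL)
			return -ENOMEM;
		ni->id = i;
		list_add_tail_node(&ni->link, &ni_list);
	}
	return 0;
}

/* Removes and frees every NI still on ni_list. */
static void shutdown_nis(void)
{
	while (!list_is_empty(&ni_list)) {
		struct list_node *n = ni_list.next;
		list_del_node(n);
		free((struct fake_ni *)n);
	}
}

/* Counterpart of the lnet_unprepare() check: no NI may remain registered. */
static void unprepare(void)
{
	assert(list_is_empty(&ni_list));
}

/* Caller pattern after the fix: shut down partially started NIs first. */
static int ni_init(int count, int fail_at)
{
	int rc = startup_nis(count, fail_at);

	if (rc != 0) {
		shutdown_nis();	/* what jumping to failed2 adds */
		unprepare();	/* the assertion now holds */
		return rc;
	}
	return 0;
}
```

Calling `ni_init(3, 2)` starts two NIs, fails on the third, removes both, and returns `-ENETDOWN` without tripping the assertion. Dropping the `shutdown_nis()` call reproduces the original bug: `unprepare()` asserts on a non-empty list. The sketch puts the cleanup in the caller to match this patch. The "future enhancement" mentioned above would instead move it into `startup_nis()` itself.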