LU-12678 lnet: fix small race in unloading klnd modules. 53/36853/4
author    Mr NeilBrown <neilb@suse.de>
          Sat, 18 Jan 2020 13:48:35 +0000 (08:48 -0500)
committer Oleg Drokin <green@whamcloud.com>
          Tue, 28 Jan 2020 06:03:35 +0000 (06:03 +0000)
Reference counting of klnd modules is handled by the modules
themselves.  Currently, a module can be completely unloaded
between the time it calls module_put() and the time it returns
from the function that made that call.  One or two instructions
may still need to execute in that window, and if the module is
unmapped before they run, an exception will result.

Module unload calls lnet_unregister_lnd(), which takes
the_lnet.ln_lnd_mutex, so module unload cannot complete while
that mutex is held.  lnd_startup is already called with this
mutex held to avoid any races, but lnd_shutdown is not.  Holding
the mutex across lnd_shutdown as well will close the race.
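
The ordering argument above can be illustrated with a small userspace
model. This is only a sketch using pthreads, not the kernel code: the
model_* functions, the counters, and the sleep that widens the
"one or two instructions" window are all hypothetical stand-ins chosen
for illustration. Only the names ln_lnd_mutex, module_put(),
lnd_shutdown and lnet_unregister_lnd() come from the commit itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical userspace stand-ins for the kernel state in the commit. */
static pthread_mutex_t ln_lnd_mutex = PTHREAD_MUTEX_INITIALIZER;
static atomic_int module_refs = 1;       /* reference held by the live NI */
static atomic_int module_mapped = 1;     /* 1 while the "module text" exists */
static atomic_int tail_saw_mapped = -1;  /* recorded by the shutdown tail */

static void sleep_ms(long ms)
{
	struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };

	nanosleep(&ts, NULL);
}

/* Models lnd_shutdown(): drops the last reference, then still has
 * work to do before it returns to its caller. */
static void model_lnd_shutdown(void)
{
	atomic_fetch_sub(&module_refs, 1);   /* stands in for module_put() */
	sleep_ms(50);                        /* exaggerate the race window */
	atomic_store(&tail_saw_mapped, atomic_load(&module_mapped));
}

/* Models module unload: wait for the refcount to reach zero, pass
 * through lnet_unregister_lnd() (which needs the mutex), then unmap. */
static void *model_module_unload(void *arg)
{
	(void)arg;
	while (atomic_load(&module_refs) > 0)
		sleep_ms(1);
	pthread_mutex_lock(&ln_lnd_mutex);   /* lnet_unregister_lnd() */
	pthread_mutex_unlock(&ln_lnd_mutex);
	atomic_store(&module_mapped, 0);     /* module text is now gone */
	return NULL;
}

/* Runs the shutdown with the mutex held, as the patch does.
 * Returns 1 if the tail of shutdown ran while still mapped. */
static int run_protected_shutdown(void)
{
	pthread_t unloader;
	int rc;

	atomic_store(&module_refs, 1);
	atomic_store(&module_mapped, 1);
	atomic_store(&tail_saw_mapped, -1);

	rc = pthread_create(&unloader, NULL, model_module_unload, NULL);
	assert(rc == 0);

	pthread_mutex_lock(&ln_lnd_mutex);
	model_lnd_shutdown();
	pthread_mutex_unlock(&ln_lnd_mutex);

	pthread_join(unloader, NULL);
	return atomic_load(&tail_saw_mapped);
}
```

In this model, run_protected_shutdown() returns 1 because the unloader
blocks on the mutex until the shutdown path has fully returned. If the
lock/unlock pair around model_lnd_shutdown() were removed, the unloader
would typically unmap during the sleep, which is the analogue of the
exception described above.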

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I98036ef2fc939101d085bbd6d0c76a29b848ee26
Reviewed-on: https://review.whamcloud.com/36853
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

index 966f38b..399acdb 100644 (file)
@@ -2062,7 +2062,14 @@ lnet_clear_zombies_nis_locked(struct lnet_net *net)
                islo = ni->ni_net->net_lnd->lnd_type == LOLND;
+               /* Holding the mutex makes it safe for lnd_shutdown
+                * to call module_put(). Module unload cannot finish
+                * until lnet_unregister_lnd() completes, and that
+                * requires the mutex.
+                */
+               mutex_lock(&the_lnet.ln_lnd_mutex);
                (net->net_lnd->lnd_shutdown)(ni);
+               mutex_unlock(&the_lnet.ln_lnd_mutex);
                if (!islo)
                        CDEBUG(D_LNI, "Removed LNI %s\n",