Whamcloud - gitweb
LU-12221 statahead: sa_handle_callback get lli_sa_lock earlier 60/34760/5
authorAnn Koehler <amk@cray.com>
Thu, 25 Apr 2019 19:02:19 +0000 (14:02 -0500)
committerOleg Drokin <green@whamcloud.com>
Sat, 25 May 2019 04:53:41 +0000 (04:53 +0000)
sa_handle_callback() must acquire the lli_sa_lock before calling
sa_has_callback(), which checks whether the sai_interim_entries list is
empty. Acquiring the lock avoids a race between an rpc handler
executing ll_statahead_interpret and the separate ll_statahead_thread.

When a client receives a stat request response, ll_statahead_interpret
increments sai_replied and if needed adds the request to the
sai_interim_entries list for instantiating by the ll_statahead_thread.
ll_statahead_interpret() holds the lli_sa_lock while doing this work.
On process termination, ll_statahead_thread() waits for sai_sent to
equal sai_replied and then removes any entries in the
sai_interim_entries list. It does not get the lli_sa_lock until
it determines that there are sai_interim_entries to process.

A bug occurs on weak memory model processors that do not guarantee
that both ll_statahead_interpret updates done under the lock are
visible to other processors at the same time. For example, on ARM
nodes, an ll_statahead_thread can read the updated value of
sai_replied and a non-updated value of sai_interim_lists.
ll_statahead_thread then thinks all replies have been received (true)
and all sai_interim_entries have been processed false). Later, the
update to sai_interim_entries becomes visible leaving the
ll_statahead_info struct in an unexpected state.

The bad state eventually triggers the LBUG:
statahead.c:477:ll_sai_put()) ASSERTION( !sa_has_callback(sai) )

Cray-bug-id: LUS-6243
Signed-off-by: Ann Koehler <amk@cray.com>
Change-Id: I9fc6bd664188d9ac7c26b1b6965e2b99abf5e948
Reviewed-on: https://review.whamcloud.com/34760
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Jenkins
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>

index 5379440..0b75ccc 100644 (file)
@@ -692,21 +692,19 @@ static void sa_handle_callback(struct ll_statahead_info *sai)
        lli = ll_i2info(sai->sai_dentry->d_inode);
+       spin_lock(&lli->lli_sa_lock);
        while (sa_has_callback(sai)) {
                struct sa_entry *entry;
-               spin_lock(&lli->lli_sa_lock);
-               if (unlikely(!sa_has_callback(sai))) {
-                       spin_unlock(&lli->lli_sa_lock);
-                       break;
-               }
                entry = list_entry(sai->sai_interim_entries.next,
                                   struct sa_entry, se_list);
                sa_instantiate(sai, entry);
+               spin_lock(&lli->lli_sa_lock);
+       spin_unlock(&lli->lli_sa_lock);