Whamcloud - gitweb
LU-15847 tgt: reply always with the latest assigned transno 92/47492/3
author Mikhail Pershin <mpershin@whamcloud.com>
Tue, 31 May 2022 10:38:25 +0000 (13:38 +0300)
committer Oleg Drokin <green@whamcloud.com>
Tue, 25 Oct 2022 17:23:55 +0000 (17:23 +0000)
In tgt_txn_stop_cb(), don't skip transno assignment in the case
of unexpected multiple last_rcvd updates, so that the latest
transno, rather than the first one, is reported back in the
reply.
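To make the behavioral change concrete, here is a heavily simplified model of the transno handling at transaction stop. It is a sketch under stated assumptions, not the real tgt_txn_stop_cb(): the struct sketch_tti, its has_trans/mult_trans fields, the global last_transno counter and the helper functions are all hypothetical stand-ins that only loosely mirror tti_transno and tti_mult_trans. With old_behavior set, the second unexpected update returns early and the reply keeps the first transno. Without it, a new transno is assigned and the latest one is reported.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for the per-RPC target thread info.
 * Field names only loosely mirror tti_transno / tti_mult_trans. */
struct sketch_tti {
	uint64_t transno;    /* transno to report in the reply, 0 = none */
	int      has_trans;  /* a last_rcvd update was already recorded */
	int      mult_trans; /* handler expects several transactions */
};

/* Hypothetical server-wide transno counter. */
static uint64_t last_transno;

/* Models transno handling for one last_rcvd update at transaction stop.
 * old_behavior != 0 reproduces the pre-patch early return. */
static void sketch_txn_stop(struct sketch_tti *tti, int old_behavior)
{
	if (tti->has_trans) {
		if (tti->mult_trans == 0) {
			fprintf(stderr, "More than one transaction %llu\n",
				(unsigned long long)tti->transno);
			if (old_behavior)
				return; /* pre-patch: keep the first transno */
		}
		/* we need another transno to be assigned */
		tti->transno = 0;
	}
	tti->has_trans = 1;
	tti->transno = ++last_transno;
}

/* Runs one RPC that unexpectedly performs two last_rcvd updates and
 * returns the transno that would be sent back to the client. */
static uint64_t sketch_rpc_with_two_updates(int old_behavior)
{
	struct sketch_tti tti = { 0, 0, 0 };

	last_transno = 100;
	sketch_txn_stop(&tti, old_behavior); /* first update: 101 */
	sketch_txn_stop(&tti, old_behavior); /* unexpected second update */
	assert(tti.transno != 0);
	return tti.transno;
}
```

In this model, sketch_rpc_with_two_updates(1) reports 101 (the first transno) and sketch_rpc_with_two_updates(0) reports 102 (the latest one).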

Reporting only the first transno might lead to data loss at
failover, because a partially committed operation would be
considered fully committed and the rest of the operation would
not be replayed.

The proposed approach of reporting the last assigned transno to
the client could cause replay failures in some cases, which is
still better than possible data loss. The patch therefore makes
the multiple-transaction case less severe.
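The trade-off can be sketched as a small decision function. It assumes the usual replay rule that a client keeps a request for replay only while its reported transno is greater than the server's last committed transno. The names sketch_failover and enum sketch_outcome are hypothetical, and the model ignores everything else recovery does. Take an operation that produced transnos 101 and 102 and a server that committed only 101 before failing over. Reporting 101 makes the client drop the request (data loss for 102). Reporting 102 keeps it for replay, and that replay may fail because part of the operation is already applied.

```c
#include <assert.h>
#include <stdint.h>

enum sketch_outcome {
	SKETCH_OK,              /* fully committed, nothing to replay */
	SKETCH_DATA_LOSS,       /* uncommitted part dropped, never replayed */
	SKETCH_REPLAY_CLEAN,    /* nothing committed yet, replay from scratch */
	SKETCH_REPLAY_MAY_FAIL, /* partially committed operation is replayed */
};

/* Hypothetical model: an operation produced transnos first..last, the
 * reply carried `reported`, and the server had committed everything up
 * to `last_committed` when it failed over. */
static enum sketch_outcome sketch_failover(uint64_t first, uint64_t last,
					   uint64_t reported,
					   uint64_t last_committed)
{
	/* the reported transno must be one the operation produced */
	assert(first <= reported && reported <= last);

	if (last <= last_committed)
		return SKETCH_OK;
	/* client frees requests whose transno is already committed */
	if (reported <= last_committed)
		return SKETCH_DATA_LOSS;
	if (first <= last_committed)
		return SKETCH_REPLAY_MAY_FAIL;
	return SKETCH_REPLAY_CLEAN;
}
```

For example, sketch_failover(101, 102, 101, 101) yields SKETCH_DATA_LOSS (the pre-patch case), while sketch_failover(101, 102, 102, 101) yields SKETCH_REPLAY_MAY_FAIL (the post-patch case).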

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ia07e89576127a2fc1eb2ae706551ffe8ceaa93be
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/47492
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
lustre/target/tgt_lastrcvd.c

index 6e11f8c..7f1abac 100644
@@ -1985,7 +1985,17 @@ int tgt_txn_stop_cb(const struct lu_env *env, struct thandle *th,
                if (tti->tti_mult_trans == 0) {
                        CDEBUG(D_HA, "More than one transaction %llu\n",
                               tti->tti_transno);
-                       RETURN(0);
+                       /**
+                        * if RPC handler sees unexpected multiple last_rcvd
+                        * updates with transno, then it is better to return
+                        * the latest transaction number to the client.
+                        * In that case replay may fail if part of operation
+                        * was committed and can't be re-applied easily. But
+                        * that is better than report the first transno, in
+                        * which case partially committed operation would be
+                        * considered as finished so never replayed causing
+                        * data loss.
+                        */
                }
                /* we need another transno to be assigned */
                tti->tti_transno = 0;