LU-15847 tgt: reply always with the latest assigned transno
author Mikhail Pershin <mpershin@whamcloud.com>
Tue, 31 May 2022 10:38:25 +0000 (13:38 +0300)
committer Andreas Dilger <adilger@whamcloud.com>
Mon, 31 Oct 2022 04:08:10 +0000 (04:08 +0000)
In tgt_txn_stop_cb(), don't skip transno assignment in case
of unexpected multiple last_rcvd updates, so that the latest
transno is reported back in the reply rather than the first
one.

Reporting only the first transno might lead to data loss at
failover, because a partially committed operation would be
considered fully committed and the rest of the operation
would not be replayed.

Reporting the last assigned transno to the client can cause
replay failures in some cases, which is still better than
possible data loss. So the patch makes the multiple
transaction case less severe.

Lustre-change: https://review.whamcloud.com/c/fs/lustre-release/+/47492
Lustre-commit: 4e2e8fd2fc0a9a30f47e70dc285a2101d2cbc4c2

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ia07e89576127a2fc1eb2ae706551ffe8ceaa93be
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48978
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
lustre/target/tgt_lastrcvd.c

index f4f243e..19fe244 100644
@@ -1982,7 +1982,17 @@ int tgt_txn_stop_cb(const struct lu_env *env, struct thandle *th,
                if (tti->tti_mult_trans == 0) {
                        CDEBUG(D_HA, "More than one transaction %llu\n",
                               tti->tti_transno);
-                       RETURN(0);
+                       /**
+                        * if RPC handler sees unexpected multiple last_rcvd
+                        * updates with transno, then it is better to return
+                        * the latest transaction number to the client.
+                        * In that case replay may fail if part of operation
+                        * was committed and can't be re-applied easily. But
+                        * that is better than report the first transno, in
+                        * which case partially committed operation would be
+                        * considered as finished so never replayed causing
+                        * data loss.
+                        */
                }
                /* we need another transno to be assigned */
                tti->tti_transno = 0;
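
The trade-off described in the commit message can be illustrated with a small toy model. This is a hedged sketch, not Lustre code: `reply_policy`, `reply_transno()` and `client_keeps_for_replay()` are hypothetical names, and the rule that a client keeps a request for replay while its reply transno is greater than the last committed transno is a simplifying assumption made for this example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model for illustration only; none of this is Lustre code. */
enum reply_policy {
        REPLY_FIRST_TRANSNO,    /* behaviour before the patch */
        REPLY_LAST_TRANSNO      /* behaviour after the patch */
};

/*
 * Pick the transno reported in the reply when a single RPC unexpectedly
 * produced several transactions. transnos[] is in assignment order.
 */
static uint64_t reply_transno(const uint64_t *transnos, size_t count,
                              enum reply_policy policy)
{
        if (count == 0)
                return 0;
        return policy == REPLY_FIRST_TRANSNO ? transnos[0]
                                             : transnos[count - 1];
}

/*
 * Assumption of this sketch: the client keeps a request for replay only
 * while the transno from the reply is newer than what the server has
 * reported as committed.
 */
static bool client_keeps_for_replay(uint64_t replied, uint64_t last_committed)
{
        return replied > last_committed;
}
```

For example, suppose one operation was assigned transnos 101 and 102 and the server crashed after committing only 101. With `REPLY_FIRST_TRANSNO` the client sees 101, treats the request as committed and drops it, so the part covered by 102 is never replayed. With `REPLY_LAST_TRANSNO` the client sees 102 and keeps the request; the replay may fail because part of the operation is already on disk, but the uncommitted part is not silently lost.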