Whamcloud - gitweb
LU-1573 recovery: Avoid data corruption for DIO during FOFB 80/16680/10
authorParinay Kondekar <parinay.kondekar@seagate.com>
Thu, 1 Dec 2016 20:16:52 +0000 (01:46 +0530)
committerOleg Drokin <oleg.drokin@intel.com>
Fri, 3 Feb 2017 00:26:13 +0000 (00:26 +0000)
commit1d2fbade1b658db4386091e7938d9483f7aa4a05
treeca01ec26aaca1341bad9dd995dfffa9e693f3664
parent4bca07f115e6beada1462599cf362c3b84767576
LU-1573 recovery: Avoid data corruption for DIO during FOFB

When there is a userland app doing DIO and OST fails over,
obd->obd_no_transno is set to 1 & last_committed on server
is not sent to the client.  Thus client is not sure, if the
req is _committed_ to disk or not. So it removes the req
from resend queue and adds it to replay queue.

Now trans_no > last_committed, thus after reconnect, as a
part of recovery process request is replayed. Userland app,
refills the DIO buffer with different data, thus invalid data
is committed resulting in corruption.

This change avoids the client replay by dropping the reply to
the client rather than sending a reply without any transno.
This ensures the client will resend the RPC before returning
to userspace instead of putting it in the replay queue, and
thus avoids the corruption.

The test changes require replay-ost-single test_9 to be modifed
with an additional write to the file to increase the grants
available and avoid a sync write.

Seagate-Bug-Id: MRP-542, MRP-2418
Reviewed-on: http://es-gerrit.xyus.xyratex.com:8080/3358

Change-Id: Ia30783c99e6c16a0c7ab70841eb98ed75dba1de9
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@seagate.com>
Signed-off-by: Parinay Kondekar <parinay.kondekar@seagate.com>
Signed-off-by: Andrew Perepechko <andrew.perepechko@seagate.com>
Signed-off-by: James Nunez <james.a.nunez@intel.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@seagate.com>
Tested-by: Alexander Lezhoev <alexander.lezhoev@seagate.com>
Reviewed-by: Vitaly Fertman <vitaly.fertman@seagate.com>
Reviewed-on: https://review.whamcloud.com/16680
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Bob Glossman <bob.glossman@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
lustre/target/tgt_handler.c
lustre/tests/replay-ost-single.sh
lustre/tests/replay-single.sh