b=2254
At truncate time, ext3 zeroes indirect blocks and writes a
transaction to the journal. At some point, perhaps when that
transaction commits, it hands the dirty buffer for each such indirect
block to the buffer cache to be written to disk. Meanwhile, having
marked the block as unused, it reallocates it to us as a normal data
block, into which IOR puts some data.
The obdfilter writes that block with brw_kiovec, which writes
immediately to the disk with no regard for what data might be in the
buffer cache. Shortly thereafter, the buffer cache writes its stale
block of zeroes on top of our valuable data.
The correct fix is to modify our special ext3 block allocation code to
look in the buffer cache for us and discard any pending writes to the
newly-allocated blocks, much like the direct I/O code does. As a
workaround for kernels that do not yet have this change, I added
some code to the obdfilter to do the same lookup and discard after the
call to ext3_map_inode_page returns.
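The ordering problem and the effect of discarding the pending write
can be sketched with a toy user-space model. This is only an
illustration, not kernel or Lustre code: ToyDisk, ToyBufferCache,
direct_write and run are hypothetical names invented here, and
direct_write merely stands in for a cache-bypassing write such as
brw_kiovec.

```python
# Toy model of the race; every name here is hypothetical, not a kernel API.
BLOCK_SIZE = 8


class ToyDisk:
    def __init__(self, nblocks):
        self.blocks = [b"?" * BLOCK_SIZE for _ in range(nblocks)]


class ToyBufferCache:
    """Holds dirty buffers that are written to disk at some later time."""

    def __init__(self, disk):
        self.disk = disk
        self.dirty = {}  # block number -> pending contents

    def mark_dirty(self, blkno, data):
        self.dirty[blkno] = data

    def discard(self, blkno):
        # Drop a pending write for blkno without ever writing it out.
        self.dirty.pop(blkno, None)

    def writeback(self):
        for blkno, data in self.dirty.items():
            self.disk.blocks[blkno] = data
        self.dirty.clear()


def direct_write(disk, blkno, data):
    # Stand-in for a write that bypasses the buffer cache entirely.
    disk.blocks[blkno] = data


def run(discard_on_alloc):
    """Return True if the data block survives the delayed writeback."""
    disk = ToyDisk(4)
    cache = ToyBufferCache(disk)
    blkno = 2
    payload = b"IOR data"

    # Truncate zeroes the indirect block; the zeroed buffer stays dirty.
    cache.mark_dirty(blkno, b"\0" * BLOCK_SIZE)

    # The block is freed and reallocated as a data block. With the fix,
    # the allocator discards the pending write at this point.
    if discard_on_alloc:
        cache.discard(blkno)

    direct_write(disk, blkno, payload)  # data goes straight to disk
    cache.writeback()                   # stale zeroes are flushed later
    return disk.blocks[blkno] == payload
```

In this model, run(False) loses the payload to the late writeback,
while run(True) keeps it, which is the behavior the allocation-time
discard is meant to provide.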
This introduces kernel version 32, but doesn't force an upgrade. I
updated the kernel patches for 2.4.18 and 2.4.20, but not 2.6. I also:
- tested the ext3_map_inode_page change on vanilla-2.4.20
- tested the workaround change on chaos-2.4.18
- compile-tested a version 32 chaos-2.4.18 kernel