Description: data loss for recently-modified files
Details : In some cases it is possible that recently written or created
files may not be written to disk in a timely manner (this should
normally be within 30s unless client IO load is very high).
After a client crash or client eviction, the problem appears as
zero-length files, or as files whose size is a multiple of 1MB
and which are missing data at the end of the file.
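The symptom above suggests a rough way to shortlist files worth
checking after a crash or eviction. The following is only a sketch:
it assumes "1MB" means 2**20 bytes, and the helper names
(matches_symptom, candidate_files) are hypothetical, not part of
Lustre. Many healthy files also have these sizes, so a match is a
candidate for inspection, not proof of data loss.

```python
import os

# Assumption: "1MB" in the description means 2**20 bytes.
MIB = 1024 * 1024


def matches_symptom(size):
    """True if a file size fits the described symptom:
    zero length, or an exact multiple of 1MB."""
    return size == 0 or size % MIB == 0


def candidate_files(root):
    """Yield (path, size) for regular files under root whose size
    fits the symptom. Files that vanish mid-walk are skipped."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.isfile(path):
                    continue
                size = os.path.getsize(path)
            except OSError:
                continue
            if matches_symptom(size):
                yield path, size
```

Restricting the walk to directories written shortly before the
crash keeps the list of candidates short.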
This problem is more likely to be hit on clients where files are
repeatedly created and unlinked in the same directory, clients
have a large amount of RAM, have many CPUs, the filesystem has
many OSTs, the clients are rebooted frequently, and/or the files
are not accessed by other nodes after being written.
The presence of the problem can be detected by looking at
/proc/sys/fs/inode-state. If the first number (nr_inodes) is
smaller than the second (nr_unused) then dirty files will not
be flushed automatically to disk. "sync; sleep 10" should be
run several times on the node before Lustre is unmounted for the
update (this is also safe to run on nodes without this problem).
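The check and the workaround described above can be sketched as
follows. This is a minimal illustration, not a supported tool: the
function names are hypothetical, and the default of three rounds is
an assumption, since the text only says "several times".

```python
import os
import time
from pathlib import Path

INODE_STATE = Path("/proc/sys/fs/inode-state")


def parse_inode_state(text):
    """Return (nr_inodes, nr_unused), the first two fields of
    /proc/sys/fs/inode-state."""
    fields = text.split()
    if len(fields) < 2:
        raise ValueError("expected at least two fields in inode-state")
    return int(fields[0]), int(fields[1])


def dirty_flush_stalled(text):
    """True if nr_inodes < nr_unused, the condition under which
    dirty files are not flushed automatically."""
    nr_inodes, nr_unused = parse_inode_state(text)
    return nr_inodes < nr_unused


def flush_before_unmount(rounds=3, delay=10):
    """Equivalent of running "sync; sleep 10" several times.
    rounds=3 is an assumption; the text does not give a count."""
    for _ in range(rounds):
        os.sync()
        time.sleep(delay)


if __name__ == "__main__" and INODE_STATE.exists():
    if dirty_flush_stalled(INODE_STATE.read_text()):
        print("nr_inodes < nr_unused: node appears affected")
    else:
        print("nr_inodes >= nr_unused: problem not detected")
```

Because the sync loop is safe on unaffected nodes as well,
flush_before_unmount() can be called unconditionally before
unmounting rather than only when the check reports a problem.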
There is also a related kernel bug in the RHEL4 2.6.9 kernel
that can cause this same problem, so customers using that kernel
also need to update the kernel in addition to Lustre. In order
to properly fix this bug, the RHEL3 2.4.21 kernel is also updated.
It is normal that files written just before a client crash (less
than 30s) may not yet have been flushed to disk, even for local
filesystems.
i=green(original patch), i=shadow
b=12181, b=12203