tbd Cluster File Systems, Inc. <info@clusterfs.com>
+ * version 1.6.1
+ * Support for kernels:
+ 2.4.21-47.0.1.EL (RHEL 3)
+ 2.6.5-7.283 (SLES 9)
+ 2.6.9-42.0.10.EL (RHEL 4)
+ 2.6.12.6 vanilla (kernel.org)
+ 2.6.16.27-0.9 (SLES 10)
+ * Client support for unpatched kernels:
+ (see https://mail.clusterfs.com/wikis/lustre/PatchlessClient)
+ 2.6.16 - 2.6.19 vanilla (kernel.org)
+ 2.6.9-42.0.8.EL (RHEL 4)
+ * Recommended e2fsprogs version: 1.39.cfs6
+ * Note that reiserfs quotas are disabled on SLES 10 in this kernel.
+ * bug fixes
+
+Severity : normal
+Frequency : liblustre clients only
+Bugzilla : 12229
+Description: getdirentries does not give error when run on compute nodes
+Details : getdirentries does not fail when the size specified as an argument
+ is too small to contain at least one entry
+
+Severity : enhancement
+Bugzilla : 11548
+Description: Add LNET router traceability for debug purposes
+Details : If a checksum failure occurs with a router as part of the
+ IO path, the NID of the last router that forwarded the bulk data
+ is printed so it can be identified.
+
+Severity : normal
+Frequency : rare
+Bugzilla : 11315
+Description: OST "spontaneously" evicts client; client has imp_pingable == 0
+Details : Due to a race condition, liblustre clients were occasionally
+ evicted incorrectly.
+
+Severity : enhancement
+Bugzilla : 10997
+Description: lfs setstripe uses optional parameters instead of positional
+ parameters.
+
+Severity : enhancement
+Bugzilla : 10651
+Description: Nanosecond timestamp support for ldiskfs
+Details : The on-disk ldiskfs filesystem has added support for nanosecond
+ resolution timestamps. There is not yet support for this at
+ the Lustre filesystem level.
+
+Severity : normal
+Frequency : during server recovery
+Bugzilla : 11203
+Description: MDS failing to send precreate requests due to OSCC_FLAG_RECOVERING
+Details : Requests with the rq_no_resend flag set did not wake
+ l_wait_event when they timed out.
+
+Severity : minor
+Frequency : nfs export on patchless client
+Bugzilla : 11970
+Description: connectathon hangs when testing NFS export over patchless client
+Details : Disconnected dentry cannot be found with lookup, so we do not need
+ to unhash it or make it invalid
+
+Bugzilla : 11757
+Description: fix llapi_lov_get_uuids() to allow many OSTs to be returned
+Details : Change llapi_lov_get_uuids() to read the UUIDs from /proc instead
+ of using an ioctl. This allows lfsck for > 160 OSTs to succeed.
+
+Severity : minor
+Frequency : rare
+Bugzilla : 11546
+Description: open req refcounting wrong on reconnect
+Details : If a reconnect happens between receiving the open reply from
+ the server and the call to mdc_set_replay_data() in
+ ll_file_open(), replay is scheduled for an unreferenced request
+ that is about to be freed. A subsequent close will then crash
+ in a variety of ways. Check that the request is still eligible
+ for replay in mdc_set_replay_data().
+
+Severity : minor
+Frequency : rare
+Bugzilla : 11512
+Description: disable writes to filesystem when reading health_check file
+Details : the default for reading the health_check proc file has changed
+ to NOT do a journal transaction and write to disk, because this
+ can cause reads of the /proc file to hang and block HA state
+ checking on a healthy but otherwise heavily loaded system. It
+ is possible to return to the previous behaviour during configure
+ with --enable-health-write.
+
+--------------------------------------------------------------------------------
+
+2007-05-03 Cluster File Systems, Inc. <info@clusterfs.com>
+ * version 1.6.0.1
+ * bug fixes
+
+Severity : normal
+Frequency : on some architectures
+Bugzilla : 12404
+Description: 1.6 client sometimes fails to mount from a 1.4 MDT
+Details : Uninitialized flags sometimes cause configuration commands to
+ be skipped.
+
+--------------------------------------------------------------------------------
+
+2007-04-19 Cluster File Systems, Inc. <info@clusterfs.com>
* version 1.6.0
* CONFIGURATION CHANGE. This version of Lustre WILL NOT
INTEROPERATE with older versions automatically. In many cases a
this release. See https://mail.clusterfs.com/wikis/lustre/MountConf
for details.
* Support for kernels:
- 2.6.9-42.0.10.EL (RHEL 4)
- 2.6.5-7.283 (SLES 9)
2.4.21-47.0.1.EL (RHEL 3)
+ 2.6.5-7.283 (SLES 9)
+ 2.6.9-42.0.10.EL (RHEL 4)
2.6.12.6 vanilla (kernel.org)
2.6.16.27-0.9 (SLES10)
* Client support for unpatched kernels:
(see https://mail.clusterfs.com/wikis/lustre/PatchlessClient)
2.6.16 - 2.6.19 vanilla (kernel.org)
2.6.9-42.0.8EL (RHEL 4)
- * Recommended e2fsprogs version: 1.39.cfs2-0
+ * Recommended e2fsprogs version: 1.39.cfs6
+ * Note that reiserfs quotas are disabled on SLES 10 in this kernel
* bug fixes
Severity : enhancement
Description: startup order invariance
Details : MDTs and OSTs can be started in any order. Clients only
require the MDT to complete startup.
-
+
Severity : enhancement
Bugzilla : 4899
Description: parallel, asynchronous orphan cleanup
Details : stripe assignments are now made based on ost space available,
ost previous usage, and OSS previous usage, in order to try
to optimize storage space and networking resources.
-
+
Severity : enhancement
Bugzilla : 4226
Description: Permanently set tunables
Details : All writable /proc/fs/lustre tunables can now be permanently
set on a per-server basis, at mkfs time or on a live
system.
-
+
Severity : enhancement
Bugzilla : 10547
Description: Lustre message v2
Details : Clients can be started with a list of OSTs that should be
declared "inactive" for known non-responsive OSTs.
+Severity : normal
+Bugzilla : 12123
+Description: ENOENT returned for valid filehandle during dbench.
+Details : Check if a directory has children when invalidating dentries
+ associated with an inode during lock cancellation. This fixes
+ an incorrect ENOENT sometimes seen for valid filehandles during
+ testing with dbench.
+
Severity : minor
Frequency : SFS test only (otherwise harmless)
Bugzilla : 6062
Description: SPEC SFS validation failure on NFS v2 over lustre.
Details : Changes the blocksize for regular files to be 2x RPC size,
and not depend on stripe size.
-
+
+Severity : enhancement
+Bugzilla : 10088
+Description: fine-grained SMP locking inside DLM
+Details : Improve DLM performance on SMP systems by removing the single
+ per-namespace lock and replacing it with per-resource locks.
+
+Severity : enhancement
+Bugzilla : 9332
+Description: don't hold multiple extent locks at one time
+Details : To avoid client eviction during large writes, locks are not
+ held on multiple stripes at one time or for very large writes.
+ Otherwise, clients can block waiting for a lock on a failed OST
+ while holding locks on other OSTs and be evicted.
+
Severity : enhancement
Bugzilla : 9293
Description: Multiple MD RPCs in flight.
Description: per-client statistics on server
Details : Add ldlm and operations statistics for each client in
/proc/fs/lustre/mds|obdfilter/*/exports/
-
+
Severity : enhancement
Bugzilla : 22486
Description: improved MDT statistics
Details : Add detailed MDT operations statistics in
/proc/fs/lustre/mds/*/stats
-
+
Severity : enhancement
Bugzilla : 10968
Description: VFS operations stats
Description: Fix client-side osc byte counters
Details : The osc read/write byte counters in
/proc/fs/lustre/osc/*/stats are now working
-
+
Severity : minor
Frequency : always as root on SLES
Bugzilla : 10667
Description: Failure of copying files with lustre special EAs.
Details : Client side always return success for setxattr call for lustre
special xattr (currently only "trusted.lov").
-
+
Severity : minor
Frequency : always
Bugzilla : 10345
Bugzilla : 11229
Description: Easy OST removal
Details : OSTs can be permanently deactivated with e.g. 'lctl
- conf_param lustre-OST0001.osc.active=0'
+ conf_param lustre-OST0001.osc.active=0'
Severity : enhancement
Bugzilla : 11335
Severity : enhancement
Bugzilla : 10998
Description: provide MGS failover
-Details : Added config lock reacquisition after MGS server failover.
-
+Details : Added config lock reacquisition after MGS server failover.
+
Severity : enhancement
Bugzilla : 11461
Description: add Linux 2.4 support
Severity : normal
Bugzilla : 11330
-Description: a large application tries to do I/O to the same resource and dies
+Description: a large application tries to do I/O to the same resource and dies
in the middle of it.
-Details : Check the req->rq_arrival time after the call to
+Details : Check the req->rq_arrival time after the call to
ost_brw_lock_get(), but before we do anything about
processing it & sending the BULK transfer request. This
should help move old stale pending locks off the queue as
quickly as obd_timeout.
Severity : major
-Frequency : when an incorrect nid is specified during startup
+Frequency : when an incorrect nid is specified during startup
Bugzilla : 10734
Description: ptlrpc connect to non-existant node causes kernel crash
Details : LNET can't be re-entered from an event callback, which
happened when we expire a message after the export has been
cleaned up. Instead, hand the zombie cleanup off to another
- thread.
+ thread.
Severity : enhancement
Bugzilla : 10902
and bits policy, thus improving the performance of search through
the granted list.
+Severity : major
+Frequency : only if OST filesystem is corrupted
+Bugzilla : 9829
+Description: client incorrectly hits assertion in ptlrpc_replay_req()
+Details : for a short time RPCs with bulk IO are in the replay list,
+ but replay of bulk IOs is unimplemented. If the OST filesystem
+ is corrupted due to disk cache incoherency and then replay is
+ started it is possible to trip an assertion. Avoid putting
+ committed RPCs into the replay list at all to avoid this issue.
+
+Severity : major
+Frequency : liblustre (e.g. catamount) on a large cluster with >= 8 OSTs/OSS
+Bugzilla : 11684
+Description: System hang on startup
+Details : This bug allowed the liblustre (e.g. catamount) client to
+ return to the app before handling all startup RPCs. This
+ could leave the node unresponsive to lustre network traffic
+ and manifested as a server ptllnd timeout.
+
+Severity : enhancement
+Bugzilla : 11667
+Description: Add "/proc/sys/lustre/debug_peer_on_timeout"
+Details : liblustre envirable: LIBLUSTRE_DEBUG_PEER_ON_TIMEOUT
+ boolean to control whether to print peer debug info when a
+ client's RPC times out.
+
Severity : minor
-Frequency : only for kernels with patches from Lustre below 1.4.3
+Frequency : only for kernels with patches from Lustre below 1.4.3
Bugzilla : 11248
Description: Remove old rdonly API
-Details : Remove old rdonly API which unsed from at least lustre 1.4.3
+Details : Remove the old rdonly API, which has been unused since at least
+ Lustre 1.4.3.
Severity : major
Frequency : only for devices with external journals
Bugzilla : 10719
-Description: Set external device read-only also
+Description: Set external device read-only also
Details : During a commanded failover stop, we set the disk device
read-only while the server shuts down. We now also set any
- external journal device read-only at the same time.
-
+ external journal device read-only at the same time.
+
Severity : minor
-Frequency : when upgrading from 1.4 while trying to change parameters
+Frequency : when upgrading from 1.4 while trying to change parameters
Bugzilla : 11692
Description: The wrong (new) MDC name was used when setting parameters for
upgraded MDT's. Also allows changing of OSC (and MDC)
Description: QOS code breaks on skipped indicies
Details : Add checks for missing OST indicies in the QOS code, so OSTs
created with --index need not be sequential.
+
+Severity : enhancement
+Bugzilla : 11264
+Description: Add uninit_groups feature to ldiskfs2 to speed up e2fsck
+Details : The uninit_groups feature works in conjunction with the kernel
+ filesystem code (ldiskfs2 only) and e2fsprogs-1.39-cfs6 to speed
+ up the pass1 processing of e2fsck. This is a read-only feature
+ in ldiskfs2 only, so older kernels and current ldiskfs cannot
+ mount filesystems that have had this feature enabled.
+
+Severity : enhancement
+Bugzilla : 10816
+Description: Improve multi-block allocation algorithm to avoid fragmentation
+Details : The mballoc3 code (ldiskfs2 only) adds new mechanisms to improve
+ allocation locality and avoid filesystem fragmentation.
+
+------------------------------------------------------------------------------
+
+2007-04-01 Cluster File Systems, Inc. <info@clusterfs.com>
+ * version 1.4.10
+ * Support for kernels:
+ 2.4.21-47.0.1.EL (RHEL 3)
+ 2.6.5-7.283 (SLES 9)
+ 2.6.9-42.0.10.EL (RHEL 4)
+ 2.6.12.6 vanilla (kernel.org)
+ 2.6.16.27-0.9 (SLES 10)
+ * Recommended e2fsprogs version: 1.39.cfs5
+
+ * Note that reiserfs quotas are disabled on SLES 10 in this kernel
+ * bug fixes
+
+Severity : critical
+Frequency : occasional, depends on client load and configuration
+Bugzilla : 12181, 12203
+Description: data loss for recently-modified files
+Introduced : 1.4.6
+Details : In some cases it is possible that recently written or created
+ files may not be written to disk in a timely manner (this should
+ normally be within 30s unless client IO load is very high).
+ The problem appears as zero-length files or files that are a
+ multiple of 1MB in size after a client crash or client eviction
+ that are missing data at the end of the file.
+
+ This problem is more likely to be hit on clients where files are
+ repeatedly created and unlinked in the same directory, clients
+ have a large amount of RAM, have many CPUs, the filesystem has
+ many OSTs, the clients are rebooted frequently, and/or the files
+ are not accessed by other nodes after being written.
+
+ The presence of the problem can be detected by looking at
+ /proc/sys/fs/inode-state. If the first number (nr_inodes) is
+ smaller than the second (nr_unused) then dirty files will not
+ be flushed automatically to disk. "sync; sleep 10" should be
+ run several times on the node before unmounting it to update
+ Lustre (this is also safe to run on nodes without this problem).
+
+ There is also a related kernel bug in the RHEL 4 2.6.9 kernel
+ that can cause this same problem, so customers using that kernel
+ also need to update the kernel in addition to Lustre. In order
+ to properly fix this bug, the RHEL3 2.4.21 kernel is also updated.
+
+ It is normal that files written just before a client crash (less
+ than 30s) may not yet have been flushed to disk, even for local
+ filesystems.
+
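The detection step described in the entry above can be sketched as follows. This is an illustrative helper, not part of Lustre: the function name `inode_state_suspect` is hypothetical, and it assumes only what the entry states, namely that the first two fields of /proc/sys/fs/inode-state are nr_inodes and nr_unused.

```python
def inode_state_suspect(text):
    """Return True when nr_inodes < nr_unused.

    `text` is the content of /proc/sys/fs/inode-state: whitespace-separated
    integers whose first two fields are nr_inodes and nr_unused.
    """
    fields = text.split()
    if len(fields) < 2:
        raise ValueError("expected at least two fields in inode-state")
    nr_inodes, nr_unused = int(fields[0]), int(fields[1])
    # The ChangeLog entry says dirty files are not flushed automatically
    # when the first number is smaller than the second.
    return nr_inodes < nr_unused


def read_inode_state(path="/proc/sys/fs/inode-state"):
    """Read the raw inode-state line from procfs (Linux only)."""
    with open(path) as f:
        return f.read()
```

On a live node, `inode_state_suspect(read_inode_state())` returning True suggests running "sync; sleep 10" several times before unmounting, as the entry recommends.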
+Severity : normal
+Frequency : frequent on thin XT3 nodes
+Bugzilla : 10802
+Description: UUID collision on thin XT3 Linux nodes
+Details : UUIDs on Compute Node Linux XT3 nodes were not generated
+ randomly, since we relied on an insufficiently-seeded PRNG.
Severity : normal
+Frequency : rare
+Bugzilla : 11693
+Description: OSS hangs after "All ost request buffers busy"
+Details : A deadlock between quota and journal operations caused OSS
+ hangs after printing "All ost request buffers busy."
+
+Severity : minor
+Frequency : always on liblustre builds
+Bugzilla : 11175
+Description: Cleanup compiler warnings on liblustre
+
+Severity : minor
+Frequency : always on liblustre builds on XT3
+Bugzilla : 12146
+Description: LC_CONFIG_CDEBUG is not run when building liblustre on XT3.
+
Frequency : always
Bugzilla : 3244
Description: Addition of EXT3_FEATURE_RO_COMPAT_DIR_NLINKS flag for
EXT3_FEATURE_RO_COMPAT_SUPP. This flag will be set whenever
subdirectory count crosses 32000. This will aid e2fsck to
correctly handle more than 32000 subdirectories.
-
-Severity : normal
-Frequency : always
-Bugzilla : 11090
-Description: versioning check is incomplete
-Details : Checking the version difference of client vs. server, report
- error if the gap is too big.
-
-------------------------------------------------------------------------------
-
-TBD Cluster File Systems, Inc. <info@clusterfs.com>
- * version 1.4.10
- * Support for kernels:
- 2.6.16.21-0.8 (SLES10)
- 2.6.9-42.0.8EL (RHEL 4)
- 2.6.5-7.276 (SLES 9)
- 2.4.21-47.0.1.EL (RHEL 3)
- 2.6.12.6 vanilla (kernel.org)
- * Recommended e2fsprogs version: 1.39.cfs2-0
Severity : major
Frequency : liblustre (e.g. catamount) on a large cluster with >= 8 OSTs/OSS
Frequency : always
Bugzilla : 10214
Description: make O_SYNC working on 2.6 kernels
-Details : 2.6 kernels use different method for mark pages for write,
+Details : 2.6 kernels use different method for mark pages for write,
so need add a code to lustre for O_SYNC work.
Severity : minor
Frequency : always
Bugzilla : 11110
Description: Failure to close file and release space on NFS
-Details : Put inode details into lock acquired in ll_intent_file_open.
+Details : Put inode details into lock acquired in ll_intent_file_open.
Use mdc_intent_lock in ll_intent_open to properly
- detect all kind of errors unhandled by mdc_enqueue
+ detect all kind of errors unhandled by mdc_enqueue.
Severity : major
Frequency : rare
Bugzilla : 10866
-Description: proc file read during shutdown sometimes raced obd removal,
+Description: proc file read during shutdown sometimes raced obd removal,
causing node crash
Details : Add lock to prevent obd access after proc file removal.
Frequency : Only for files larger than 4GB on 32-bit clients.
Bugzilla : 11237
Description: improperly doing page alignment of locks
-Details : Modify lustre core code to use CFS_PAGE_* defines instead of
- PAGE_*. Make CFS_PAGE_MASK a 64-bit mask.
+Details : Modify lustre core code to use CFS_PAGE_* defines instead of
+ PAGE_*. Make CFS_PAGE_MASK a 64-bit mask.
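One plausible way to picture the problem behind the 64-bit CFS_PAGE_MASK change is that a 32-bit page mask discards the upper bits of offsets beyond 4GB. The sketch below emulates fixed-width masks in Python. It is not the Lustre code: the 4096-byte page size and the names `PAGE_MASK_32`, `PAGE_MASK_64` and `page_align_down` are assumptions made for illustration.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only

# ~(PAGE_SIZE - 1) truncated to 32 and to 64 bits respectively.
PAGE_MASK_32 = ~(PAGE_SIZE - 1) & 0xFFFFFFFF
PAGE_MASK_64 = ~(PAGE_SIZE - 1) & 0xFFFFFFFFFFFFFFFF


def page_align_down(offset, mask):
    """Round a 64-bit file offset down to a page boundary using `mask`."""
    return offset & mask


offset = 5 * 2**30 + 123  # just past 5 GiB

# With the 32-bit mask the bits above 4 GiB are lost, so the "aligned"
# offset lands at 1 GiB instead of 5 GiB.
truncated = page_align_down(offset, PAGE_MASK_32)
correct = page_align_down(offset, PAGE_MASK_64)
```

A lock extent computed from `truncated` would cover the wrong part of the file, which matches the stated frequency of "files larger than 4GB on 32-bit clients".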
Severity : normal
Frequency : rarely
Details : under very unusual load conditions an assertion is hit in
ll_intent_file_open()
-Severity : major
+Severity : major
Frequency : only if OST filesystem is corrupted
Bugzilla : 9829
Description: client incorrectly hits assertion in ptlrpc_replay_req()
allocation failure the allocation is retried with a smaller
buffer and broken into smaller requests.
+Severity : enhancement
+Bugzilla : 11563
+Description: Add -o localflock option to simulate old noflock behaviour.
+Details : This makes flock/fcntl locks coherent on the local node only.
+
+Severity : normal
+Frequency : always
+Bugzilla : 11090
+Description: versioning check is incomplete
+Details : Checking the version difference of client vs. server, report
+ error if the gap is too big.
+
+Severity : major
+Bugzilla : 11710
+Frequency : always
+Description: add support for the PG_writeback bit
+Details : Add support for the PG_writeback bit in Lustre, for more careful
+ handling of the page cache in 2.6 kernels. This also fixes some
+ deadlocks and removes the hack that made O_SYNC work with 2.6
+ kernels.
+
+Severity : enhancement
+Bugzilla : 11264
+Description: Add uninit_groups feature to ldiskfs2 to speed up e2fsck
+Details : The uninit_groups feature works in conjunction with the kernel
+ filesystem code (ldiskfs2 only) and e2fsprogs-1.39-cfs6 to speed
+ up the pass1 processing of e2fsck. This is a read-only feature
+ in ldiskfs2 only, so older kernels and current ldiskfs cannot
+ mount filesystems that have had this feature enabled.
+
+Severity : enhancement
+Bugzilla : 10816
+Description: Improve multi-block allocation algorithm to avoid fragmentation
+Details : The mballoc3 code (ldiskfs2 only) adds new mechanisms to improve
+ allocation locality and avoid filesystem fragmentation.
+
------------------------------------------------------------------------------
-2006-02-09 Cluster File Systems, Inc. <info@clusterfs.com>
+2007-02-09 Cluster File Systems, Inc. <info@clusterfs.com>
* version 1.4.9
* Support for kernels:
- 2.6.16.21-0.8 (SLES10)
- 2.6.9-42.0.3EL (RHEL 4)
+ 2.6.9-42.0.3.EL (RHEL 4)
2.6.5-7.276 (SLES 9)
2.4.21-47.0.1.EL (RHEL 3)
2.6.12.6 vanilla (kernel.org)
- * bug fixes
+ 2.6.16.21-0.8 (SLES10)
+ * Recommended e2fsprogs version: 1.39.cfs2-0
* The backwards-compatible /proc/sys/portals symlink has been removed
in this release. Before upgrading, please ensure that you change
entry in /proc/sys/lnet or sysctl lnet.*. This change can be made
in advance of the upgrade on any system running Lustre 1.4.6 or
newer, since /proc/sys/lnet was added in that version.
- * Note that reiserfs quotas are temporarily disabled on SLES 10 in this
- kernel.
+ * Note that reiserfs quotas are disabled on SLES 10 in this kernel
+ * bug fixes
+
+Severity : minor
+Frequency : only when quota is used
+Bugzilla : 11286
+Description: avoid scanning export list for quota master
+Details : Change the algorithms to avoid scanning export list in order
+ to improve the efficiency.
Severity : critical
Frequency : MDS failover only, very rarely
Bugzilla : 11277
Description: clients may get ASSERTION(granted_lock != NULL)
Details : When request was taking a long time, and a client was resending
- a getattr by name lock request. The were multiple lock requests
- with the same client lock handle and
+ a getattr by name lock request. There were multiple lock requests
+ with the same client lock handle and
mds_getattr_name->fixup_handle_for_resent_request found one of the
lock handles but later failed with ASSERTION(granted_lock != NULL).
Frequency : rare
Bugzilla : 10891
Description: handle->h_buffer_credits > 0, assertion failure
-Details : h_buffer_credits is zero after truncate, causing assertion
+Details : h_buffer_credits is zero after truncate, causing assertion
failure. This patch extends the transaction or creates a new
one after truncate.
Bugzilla : 10796
Description: Various nfs/patchless fixes.
Details : fixes reuse disconected alias for lookup process - this fixes
- warning "find_exported_dentry: npd != pd",
+ warning "find_exported_dentry: npd != pd",
fix permission error with open files at nfs.
- fix apply umaks when do revalidate.
+ fix apply umask when do revalidate.
Severity : normal
Frequency : occasional
Bugzilla : 11191
Description: Crash on NFS re-export node
-Details : calling clear_page() on the wrong pointer triggered oops in
+Details : calling clear_page() on the wrong pointer triggered oops in
generic_mapping_read().
Severity : normal
reading it again it is possible to get stale data from the RAID
cache instead of reading it from disk.
+Severity : normal
+Frequency : always for sles10 kernel
+Bugzilla : 10947
+Description: sles10 support
+Details : ll_follow_link: compile fixes and use of nd_set_link
+ under newer kernels.
+
Severity : major
Frequency : depends on arch, kernel and compiler version, always on sles10
kernel and x86_64
ext3 code. The SLES10 kernel turns barrier support on by
default. The fix is to undo that change for ldiskfs.
+
------------------------------------------------------------------------------
2006-12-09 Cluster File Systems, Inc. <info@clusterfs.com>
Frequency : always on ppc64
Bugzilla : 10634
Description: the write to an ext3 filesystem mounted with mballoc got stuck
-Details : ext3_mb_generate_buddy() uses find_next_bit() which does not
+Details : ext3_mb_generate_buddy() uses find_next_bit() which does not
perform endianness conversion.
Severity : major
Description: Error of copying files with lustre special EAs as root
Details : Client side always return success for setxattr call for lustre
special xattr (currently only "trusted.lov").
-
+
Severity : normal
Frequency : rarely on clusters with both ia64+i386 clients
Bugzilla : 10672
another node, we now correctly return ETXTBSY instead of
truncating the file.
+Severity : enhancement
+Bugzilla : 4900
+Description: Async OSC create to avoid blocking unnecessarily.
+Details : If an OST has no remaining pre-created objects, the system
+ blocks when it needs to create a new object on that OST. Now,
+ always use pre-created objects when available, instead of
+ blocking on an empty osc while others are not empty. If we must
+ block, we block for the shortest possible period of time.
+
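The selection policy described in the entry above (prefer any OSC that still has pre-created objects, and block only when every OSC is empty) can be sketched as below. This is a simplified model, not the Lustre implementation: the function name `pick_osc` and the representation of each OSC as a count of pre-created objects are hypothetical.

```python
def pick_osc(precreated, start=0):
    """Pick an OSC index that still has pre-created objects.

    `precreated` is a list holding the number of pre-created objects
    available on each OSC (a hypothetical model). The search begins at
    `start` and wraps around. Returns None when every OSC is empty,
    which is the only case where the caller would have to block.
    """
    n = len(precreated)
    for step in range(n):
        idx = (start + step) % n
        if precreated[idx] > 0:
            return idx
    return None
```

For example, `pick_osc([0, 3, 2])` skips the empty first OSC and returns index 1 rather than waiting for objects to be created on index 0.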
Severity : normal
Frequency : rare
Bugzilla : 2707
Rather --with-portals=<path-to-portals-includes> is used to
enable building on the XT3. In addition to enable XT3 specific
features the option --enable-cray-xt3 must be used.
-
+
Severity : major
Frequency : rare
Bugzilla : 7407
* add hard link support
* change obdfile creation method
* kernel patch changed
-
+
2002-09-19 Peter Braam <braam@clusterfs.com>
* version 0_5_9
* bug fix