tbd Cluster File Systems, Inc. <info@clusterfs.com>
+ * version 1.6.1
+ * Support for kernels:
+ 2.6.9-42.0.10.EL (RHEL 4)
+ 2.6.5-7.283 (SLES 9)
+ 2.6.12.6 vanilla (kernel.org)
+ 2.6.16.27-0.9 (SLES 10)
+ * Client support for unpatched kernels:
+ (see https://mail.clusterfs.com/wikis/lustre/PatchlessClient)
+ 2.6.16 - 2.6.19 vanilla (kernel.org)
+ 2.6.9-42.0.8EL (RHEL 4)
+ * Recommended e2fsprogs version: 1.39.cfs6
+ * bug fixes
+ * Note that reiserfs quotas are temporarily disabled on SLES 10 in this
+ kernel.
+
+Severity : minor
+Bugzilla : 11512
+Description: Remove write from health_check, add configure option
+Details : While an OSS is under a heavy ost_destroy load reading the
+ proc entry /proc/fs/lustre/health_check can take an unreasonably
+ long time. This disrupts our ability to effectively monitor
+ the health of the filesystem. (LLNL)
+
+Severity : enhancement
+Bugzilla : 11548
+Description: Add LNET router traceability for debug purposes
+Details : If a checksum failure occurs with a router as part of the
+ IO path, the NID of the last router that forwarded the bulk data
+ is printed so it can be identified.
+
+Severity : normal
+Frequency : rare
+Bugzilla : 11315
+Description: OST "spontaneously" evicts client; client has imp_pingable == 0
+Details : Due to a race condition, liblustre clients were occasionally
+ evicted incorrectly.
+
+Severity : enhancement
+Bugzilla : 10997
+Description: lfs setstripe uses optional parameters instead of positional
+ parameters.
+
+Severity : enhancement
+Bugzilla : 10651
+Description: Nanosecond timestamp support for ldiskfs
+Details : The on-disk ldiskfs filesystem has added support for nanosecond
+ resolution timestamps. There is not yet support for this at
+ the Lustre filesystem level.
+
+Severity : normal
+Frequency : during server recovery
+Bugzilla : 11203
+Description: MDS failing to send precreate requests due to OSCC_FLAG_RECOVERING
+Details : Requests with the rq_no_resend flag set do not wake up
+ l_wait_event if they time out.
+
+Severity : minor
+Frequency : nfs export on patchless client
+Bugzilla : 11970
+Description: connectathon hangs when testing NFS export over a patchless client
+Details : Disconnected dentry cannot be found with lookup, so we do not need
+ to unhash it or make it invalid
+
+Bugzilla : 11757
+Description: fix llapi_lov_get_uuids() to allow many OSTs to be returned
+Details : Change llapi_lov_get_uuids() to read the UUIDs from /proc instead
+ of using an ioctl. This allows lfsck for > 160 OSTs to succeed.
+
+Severity : minor
+Frequency : rare
+Bugzilla : 11546
+Description: open req refcounting wrong on reconnect
+Details : If a reconnect happened between getting the open reply from the
+ server and the call to mdc_set_replay_data in ll_file_open, we
+ would schedule replay for an unreferenced request that we are
+ about to free. A subsequent close would crash in a variety of
+ ways. Check that the request is still eligible for replay in
+ mdc_set_replay_data().
+
+--------------------------------------------------------------------------------
+
+2007-05-03 Cluster File Systems, Inc. <info@clusterfs.com>
+ * version 1.6.0.1
+ * bug fixes
+
+Severity : normal
+Frequency : on some architectures
+Bugzilla : 12404
+Description: 1.6 client sometimes fails to mount from a 1.4 MDT
+Details : Uninitialized flags sometimes cause configuration commands to
+ be skipped.
+
+--------------------------------------------------------------------------------
+
+2007-04-19 Cluster File Systems, Inc. <info@clusterfs.com>
* version 1.6.0
* CONFIGURATION CHANGE. This version of Lustre WILL NOT
INTEROPERATE with older versions automatically. In many cases a
this release. See https://mail.clusterfs.com/wikis/lustre/MountConf
for details.
* Support for kernels:
- 2.6.9-42.0.3EL (RHEL 4)
- 2.6.5-7.276 (SLES 9)
2.4.21-47.0.1.EL (RHEL 3)
+ 2.6.5-7.283 (SLES 9)
+ 2.6.9-42.0.10.EL (RHEL 4)
2.6.12.6 vanilla (kernel.org)
- 2.6.16.21-0.8 (SLES10)
+ 2.6.16.27-0.9 (SLES10)
* Client support for unpatched kernels:
(see https://mail.clusterfs.com/wikis/lustre/PatchlessClient)
2.6.16 - 2.6.19 vanilla (kernel.org)
- 2.6.9-42.0.3EL (RHEL 4)
- * Recommended e2fsprogs version: 1.39.cfs2-0
+ 2.6.9-42.0.8EL (RHEL 4)
+ * Recommended e2fsprogs version: 1.39.cfs6
+ * Note that reiserfs quotas are disabled on SLES 10 in this kernel
* bug fixes
Severity : enhancement
Description: startup order invariance
Details : MDTs and OSTs can be started in any order. Clients only
require the MDT to complete startup.
-
+
Severity : enhancement
Bugzilla : 4899
Description: parallel, asynchronous orphan cleanup
Details : stripe assignments are now made based on ost space available,
ost previous usage, and OSS previous usage, in order to try
to optimize storage space and networking resources.
-
+
Severity : enhancement
Bugzilla : 4226
Description: Permanently set tunables
Details : All writable /proc/fs/lustre tunables can now be permanently
set on a per-server basis, at mkfs time or on a live
system.
-
+
Severity : enhancement
Bugzilla : 10547
Description: Lustre message v2
Details : Clients can be started with a list of OSTs that should be
declared "inactive" for known non-responsive OSTs.
+Severity : normal
+Bugzilla : 12123
+Description: ENOENT returned for valid filehandle during dbench.
+Details : Check if a directory has children when invalidating dentries
+ associated with an inode during lock cancellation. This fixes
+ an incorrect ENOENT sometimes seen for valid filehandles during
+ testing with dbench.
+
Severity : minor
Frequency : SFS test only (otherwise harmless)
Bugzilla : 6062
Description: SPEC SFS validation failure on NFS v2 over lustre.
Details : Changes the blocksize for regular files to be 2x RPC size,
and not depend on stripe size.
-
+
+Severity : enhancement
+Bugzilla : 10088
+Description: fine-grained SMP locking inside DLM
+Details : Improve DLM performance on SMP systems by removing the single
+ per-namespace lock and replace it with per-resource locks.
+
+Severity : enhancement
+Bugzilla : 9332
+Description: don't hold multiple extent locks at one time
+Details : To avoid client eviction during large writes, locks are not
+ held on multiple stripes at one time or for very large writes.
+ Otherwise, clients can block waiting for a lock on a failed OST
+ while holding locks on other OSTs and be evicted.
+
Severity : enhancement
Bugzilla : 9293
Description: Multiple MD RPCs in flight.
-Details : Further unserialise some read-only MDS RPCs - learn about intents.
- To avoid overly-overloading MDS, introduce a limit on number of
- MDS RPCs in flight for a single client and add /proc controls
+Details : Further unserialise some read-only MDT RPCs - learn about intents.
+ To avoid overly-overloading MDT, introduce a limit on number of
+ MDT RPCs in flight for a single client and add /proc controls
to adjust this limit.
Severity : enhancement
Description: per-client statistics on server
Details : Add ldlm and operations statistics for each client in
/proc/fs/lustre/mds|obdfilter/*/exports/
-
+
Severity : enhancement
Bugzilla : 22486
-Description: mds statistics
-Details : Add detailed mds operations statistics in
+Description: improved MDT statistics
+Details : Add detailed MDT operations statistics in
/proc/fs/lustre/mds/*/stats
-
+
Severity : enhancement
Bugzilla : 10968
Description: VFS operations stats
Details : Add client VFS call stats, trackable by pid, ppid, or gid
/proc/fs/lustre/llite/*/vfs_ops_stats
- /proc/fs/lustre/llite/*/track_[pid|ppid|gid]
+ /proc/fs/lustre/llite/*/vfs_track_[pid|ppid|gid]
Severity : minor
Frequency : always
Description: Fix client-side osc byte counters
Details : The osc read/write byte counters in
/proc/fs/lustre/osc/*/stats are now working
-
+
Severity : minor
Frequency : always as root on SLES
Bugzilla : 10667
Description: Failure of copying files with lustre special EAs.
Details : Client side always return success for setxattr call for lustre
special xattr (currently only "trusted.lov").
-
+
Severity : minor
Frequency : always
Bugzilla : 10345
Severity : enhancement
Bugzilla : 11229
Description: Easy OST removal
-Details : OSTs can be permanently deactivated with e.g. 'lctl
- conf_param lustre-OST0001.osc.active=0'
+Details : OSTs can be permanently deactivated with e.g. 'lctl
+ conf_param lustre-OST0001.osc.active=0'
Severity : enhancement
Bugzilla : 11335
Description: MGS proc entries
-Details : Added basic proc entries for the MGS showing what filesystems
+Details : Added basic proc entries for the MGS showing what filesystems
are served.
Severity : enhancement
Bugzilla : 10998
Description: provide MGS failover
-Details : Added config lock reacquisition after MGS server failover.
-
+Details : Added config lock reacquisition after MGS server failover.
+
Severity : enhancement
Bugzilla : 11461
Description: add Linux 2.4 support
-Details : Added support for RHEL 2.4.21 kernel for 1.6 servers and clients
+Details : Added support for RHEL 2.4.21 kernel for 1.6 servers and clients
Severity : normal
Bugzilla : 11330
-Description: a large application tries to do I/O to the same resource and dies
+Description: a large application tries to do I/O to the same resource and dies
in the middle of it.
-Details : Check the req->rq_arrival time after the call to
+Details : Check the req->rq_arrival time after the call to
ost_brw_lock_get(), but before we do anything about
processing it & sending the BULK transfer request. This
should help move old stale pending locks off the queue as
quickly as obd_timeout.
Severity : major
-Frequency : when an incorrect nid is specified during startup
+Frequency : when an incorrect nid is specified during startup
Bugzilla : 10734
Description: ptlrpc connect to non-existant node causes kernel crash
Details : LNET can't be re-entered from an event callback, which
happened when we expire a message after the export has been
cleaned up. Instead, hand the zombie cleanup off to another
- thread.
+ thread.
Severity : enhancement
Bugzilla : 10902
and bits policy, thus improving the performance of search through
the granted list.
-Severity : major
+Severity : major
Frequency : only if OST filesystem is corrupted
Bugzilla : 9829
Description: client incorrectly hits assertion in ptlrpc_replay_req()
but replay of bulk IOs is unimplemented. If the OST filesystem
is corrupted due to disk cache incoherency and then replay is
started it is possible to trip an assertion. Avoid putting
- committed RPCs into the replay list at all to avoid this issue.
+ committed RPCs into the replay list at all to avoid this issue.
+
+Severity : major
+Frequency : liblustre (e.g. catamount) on a large cluster with >= 8 OSTs/OSS
+Bugzilla : 11684
+Description: System hang on startup
+Details : This bug allowed the liblustre (e.g. catamount) client to
+ return to the app before handling all startup RPCs. This
+ could leave the node unresponsive to lustre network traffic
+ and manifested as a server ptllnd timeout.
+
+Severity : enhancement
+Bugzilla : 11667
+Description: Add "/proc/sys/lustre/debug_peer_on_timeout"
+Details : liblustre envirable: LIBLUSTRE_DEBUG_PEER_ON_TIMEOUT
+ boolean to control whether to print peer debug info when a
+ client's RPC times out.
Severity : minor
-Frequency : only for kernels with patches from Lustre below 1.4.3
+Frequency : only for kernels with patches from Lustre below 1.4.3
Bugzilla : 11248
Description: Remove old rdonly API
-Details : Remove old rdonly API which unsed from at least lustre 1.4.3
+Details : Remove old rdonly API, which has been unused since at least
+ Lustre 1.4.3
+
+Severity : major
+Frequency : only for devices with external journals
+Bugzilla : 10719
+Description: Set external device read-only also
+Details : During a commanded failover stop, we set the disk device
+ read-only while the server shuts down. We now also set any
+ external journal device read-only at the same time.
+
+Severity : minor
+Frequency : when upgrading from 1.4 while trying to change parameters
+Bugzilla : 11692
+Description: The wrong (new) MDC name was used when setting parameters for
+ upgraded MDT's. Also allows changing of OSC (and MDC)
+ parameters if --writeconf is specified at tunefs upgrade time.
+
+Severity : major
+Frequency : when setting specific OST indices
+Bugzilla : 11149
+Description: QOS code breaks on skipped indices
+Details : Add checks for missing OST indices in the QOS code, so OSTs
+ created with --index need not be sequential.
+
+Severity : enhancement
+Bugzilla : 11264
+Description: Add uninit_groups feature to ldiskfs2 to speed up e2fsck
+Details : The uninit_groups feature works in conjunction with the kernel
+ filesystem code (ldiskfs2 only) and e2fsprogs-1.39-cfs6 to speed
+ up the pass1 processing of e2fsck. This is a read-only feature
+ in ldiskfs2 only, so older kernels and current ldiskfs cannot
+ mount filesystems that have had this feature enabled.
+
+Severity : enhancement
+Bugzilla : 10816
+Description: Improve multi-block allocation algorithm to avoid fragmentation
+Details : The mballoc3 code (ldiskfs2 only) adds new mechanisms to improve
+ allocation locality and avoid filesystem fragmentation.
------------------------------------------------------------------------------
-TBD Cluster File Systems, Inc. <info@clusterfs.com>
+2007-04-01 Cluster File Systems, Inc. <info@clusterfs.com>
* version 1.4.10
* Support for kernels:
- 2.6.9-42.0.3EL (RHEL 4)
- 2.6.5-7.276 (SLES 9)
2.4.21-47.0.1.EL (RHEL 3)
+ 2.6.5-7.283 (SLES 9)
+ 2.6.9-42.0.10.EL (RHEL 4)
2.6.12.6 vanilla (kernel.org)
- * Recommended e2fsprogs version: 1.39.cfs2-0
+ 2.6.16.27-0.9 (SLES 10)
+ * Recommended e2fsprogs version: 1.39.cfs5
+
+ * Note that reiserfs quotas are disabled on SLES 10 in this kernel
+ * bug fixes
+
+Severity : normal
+Frequency : always
+Bugzilla : 3244
+Description: Addition of EXT3_FEATURE_RO_COMPAT_DIR_NLINK flag for
+ > 32000 subdirectories
+Details : Add EXT3_FEATURE_RO_COMPAT_DIR_NLINK flag to
+ EXT3_FEATURE_RO_COMPAT_SUPP. This flag will be set whenever
+ the subdirectory count crosses 32000. This helps e2fsck
+ correctly handle more than 32000 subdirectories.
+
+Severity : major
+Frequency : liblustre (e.g. catamount) on a large cluster with >= 8 OSTs/OSS
+Bugzilla : 11684
+Description: System hang on startup
+Details : This bug allowed the liblustre (e.g. catamount) client to
+ return to the app before handling all startup RPCs. This
+ could leave the node unresponsive to lustre network traffic
+ and manifested as a server ptllnd timeout.
+
+Severity : enhancement
+Bugzilla : 11667
+Description: Add "/proc/sys/lustre/debug_peer_on_timeout"
+ (liblustre envirable: LIBLUSTRE_DEBUG_PEER_ON_TIMEOUT)
+ boolean to control whether to print peer debug info when a
+ client's RPC times out.
Severity : normal
Frequency : always
Description: Failure to close file and release space on NFS
Details : Put inode details into lock acquired in ll_intent_file_open.
Use mdc_intent_lock in ll_intent_open to properly
- detect all kind of errors unhandled by mdc_enqueue
+ detect all kinds of errors unhandled by mdc_enqueue.
Severity : major
Frequency : rare
Frequency : Only for files larger than 4GB on 32-bit clients.
Bugzilla : 11237
Description: improperly doing page alignment of locks
-Details : Modify lustre core code to use CFS_PAGE_* defines instead of
- PAGE_*. Make CFS_PAGE_MASK 64bit long.
+Details : Modify lustre core code to use CFS_PAGE_* defines instead of
+ PAGE_*. Make CFS_PAGE_MASK a 64-bit mask.
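
The effect of a page mask that is too narrow can be shown with plain integer arithmetic. The following is only an illustrative sketch, not the Lustre code: the 4096-byte page size and the names `PAGE_MASK_32`, `PAGE_MASK_64` and `page_align` are assumptions made for the example. It shows how page-aligning an offset above 4GB with a mask that only covers 32 bits discards the high bits of the offset.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only

# A page mask that is only 32 bits wide: once combined with a 64-bit
# offset, every bit above bit 31 of the mask is zero.
PAGE_MASK_32 = ~(PAGE_SIZE - 1) & 0xFFFFFFFF
# A 64-bit page mask keeps the high bits of the offset intact.
PAGE_MASK_64 = ~(PAGE_SIZE - 1) & 0xFFFFFFFFFFFFFFFF


def page_align(offset: int, mask: int) -> int:
    """Round a byte offset down to the start of its page."""
    return offset & mask


offset = 5 * 2**30 + 123  # a byte offset just past 5 GiB
aligned_32 = page_align(offset, PAGE_MASK_32)  # high bits are lost
aligned_64 = page_align(offset, PAGE_MASK_64)  # start of the correct page

print(hex(aligned_32), hex(aligned_64))
```

With the narrow mask the aligned offset lands in the wrong part of the file, which is consistent with the issue only appearing for files larger than 4GB.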
Severity : normal
Frequency : rarely
Details : under very unusual load conditions an assertion is hit in
ll_intent_file_open()
+Severity : major
+Frequency : only if OST filesystem is corrupted
+Bugzilla : 9829
+Description: client incorrectly hits assertion in ptlrpc_replay_req()
+Details : for a short time RPCs with bulk IO are in the replay list,
+ but replay of bulk IOs is unimplemented. If the OST filesystem
+ is corrupted due to disk cache incoherency and then replay is
+ started it is possible to trip an assertion. Avoid putting
+ committed RPCs into the replay list at all to avoid this issue.
+
Severity : normal
Frequency : always
Bugzilla : 10901
allocation failure the allocation is retried with a smaller
buffer and broken into smaller requests.
+Severity : enhancement
+Bugzilla : 11563
+Description: Add -o localflock option to simulate old noflock behaviour.
+Details : This will achieve local-only coherency for flock/fcntl locks.
+
+Severity : normal
+Frequency : always
+Bugzilla : 11090
+Description: versioning check is incomplete
+Details : Check the version difference between client and server, and
+ report an error if the gap is too big.
+
+Severity : major
+Bugzilla : 11710
+Frequency : always
+Description: add support for PG_writeback bit
+Details : Add support for the PG_writeback bit to Lustre, for more careful
+ work with the page cache in 2.6 kernels. This also fixes some
+ deadlocks and removes a hack for O_SYNC handling on 2.6 kernels.
+
+Severity : enhancement
+Bugzilla : 11264
+Description: Add uninit_groups feature to ldiskfs2 to speed up e2fsck
+Details : The uninit_groups feature works in conjunction with the kernel
+ filesystem code (ldiskfs2 only) and e2fsprogs-1.39-cfs6 to speed
+ up the pass1 processing of e2fsck. This is a read-only feature
+ in ldiskfs2 only, so older kernels and current ldiskfs cannot
+ mount filesystems that have had this feature enabled.
+
+Severity : enhancement
+Bugzilla : 10816
+Description: Improve multi-block allocation algorithm to avoid fragmentation
+Details : The mballoc3 code (ldiskfs2 only) adds new mechanisms to improve
+ allocation locality and avoid filesystem fragmentation.
+
------------------------------------------------------------------------------
-TBD Cluster File Systems, Inc. <info@clusterfs.com>
+2007-02-09 Cluster File Systems, Inc. <info@clusterfs.com>
* version 1.4.9
* Support for kernels:
2.6.9-42.0.3EL (RHEL 4)
2.6.5-7.276 (SLES 9)
- 2.4.21-40.0.1.EL (RHEL 3)
+ 2.4.21-47.0.1.EL (RHEL 3)
2.6.12.6 vanilla (kernel.org)
+ * Recommended e2fsprogs version: 1.39.cfs2-0
+
+ * The backwards-compatible /proc/sys/portals symlink has been removed
+ in this release. Before upgrading, please ensure that you change
+ any configuration scripts or /etc/sysctl.conf files that access
+ /proc/sys/portals/* or sysctl portals.* to use the corresponding
+ entry in /proc/sys/lnet or sysctl lnet.*. This change can be made
+ in advance of the upgrade on any system running Lustre 1.4.6 or
+ newer, since /proc/sys/lnet was added in that version.
+ * Note that reiserfs quotas are disabled on SLES 10 in this kernel
* bug fixes
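
The /proc/sys/portals to /proc/sys/lnet change described in the note above can be applied to configuration text mechanically. The sketch below is a hedged illustration rather than a supported tool: the function name `migrate_portals_settings` and the sample entries `debug` and `upcall` are hypothetical, and real scripts should be reviewed by hand after any automated rewrite.

```python
import re


def migrate_portals_settings(text: str) -> str:
    """Rewrite portals sysctl keys and /proc paths to their lnet equivalents."""
    # /proc/sys/portals/<entry> -> /proc/sys/lnet/<entry>
    text = text.replace("/proc/sys/portals/", "/proc/sys/lnet/")
    # "portals.<entry>" sysctl keys -> "lnet.<entry>". The lookbehind skips
    # longer names that merely end in "portals" and other path components.
    return re.sub(r"(?<![\w./])portals\.", "lnet.", text)


# Hypothetical sample input; the entry names are placeholders.
sample = "portals.debug = 0\necho 1 > /proc/sys/portals/upcall\n"
migrated = migrate_portals_settings(sample)
print(migrated, end="")
```

Since /proc/sys/lnet exists from Lustre 1.4.6 onward, a rewrite along these lines can be done before the upgrade itself.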
+Severity : minor
+Frequency : only when quota is used
+Bugzilla : 11286
+Description: avoid scanning export list for quota master
+Details : Change the algorithms to avoid scanning export list in order
+ to improve the efficiency.
+
Severity : critical
-Frequency : rare
+Frequency : MDS failover only, very rarely
Bugzilla : 11125
Description: "went back in time" messages on mds failover
Details : The greatest transno may be lost when the current operation
Bugzilla : 11277
Description: clients may get ASSERTION(granted_lock != NULL)
Details : When request was taking a long time, and a client was resending
- a getattr by name lock request. The were multiple lock
- requests with the same client lock handle and
- mds_getattr_name->fixup_handle_for_resent_request found one
- of the lock handles but later failed with
- ASSERTION(granted_lock != NULL).
+ a getattr by name lock request. There were multiple lock requests
+ with the same client lock handle and
+ mds_getattr_name->fixup_handle_for_resent_request found one of the
+ lock handles but later failed with ASSERTION(granted_lock != NULL).
Severity : major
Frequency : rare
Bugzilla : 10891
Description: handle->h_buffer_credits > 0, assertion failure
-Details : h_buffer_credits is zero after truncate, causing assertion
+Details : h_buffer_credits is zero after truncate, causing assertion
failure. This patch extends the transaction or creates a new
one after truncate.
Bugzilla : 10796
Description: Various nfs/patchless fixes.
Details : fixes reuse disconected alias for lookup process - this fixes
- warning "find_exported_dentry: npd != pd", fix permission
- error with open files at nfs.
+ warning "find_exported_dentry: npd != pd",
+ fix permission error with open files at nfs.
+ fix applying umask during revalidate.
Severity : normal
Frequency : occasional
Bugzilla : 11191
Description: Crash on NFS re-export node
-Details : call clear_page on wrong pointer triggered oops in
+Details : calling clear_page() on the wrong pointer triggered oops in
generic_mapping_read().
Severity : normal
reading it again it is possible to get stale data from the RAID
cache instead of reading it from disk.
+Severity : normal
+Frequency : always for sles10 kernel
+Bugzilla : 10947
+Description: sles10 support
+Details : ll_follow_link: compile fixes and use of nd_set_link
+ under newer kernels.
+
Severity : major
Frequency : depends on arch, kernel and compiler version, always on sles10
- kernel and x86_64
+ kernel and x86_64
Bugzilla : 11562
Description: recursive or deep enough symlinks cause stack overflow
Details : getting rid of large stack-allocated variable in
- __vfs_follow_link
+ __vfs_follow_link
Severity : minor
Frequency : depends on hardware
Bugzilla : 11540
Description: lustre write performance loss in the SLES10 kernel
Details : the performance loss is caused by using of write barriers in the
- ext3 code. The SLES10 kernel turns barrier support on by
- default. The fix is to undo that change for ldiskfs.
+ ext3 code. The SLES10 kernel turns barrier support on by
+ default. The fix is to undo that change for ldiskfs.
+
------------------------------------------------------------------------------
{kbytes,files}_{total,free,avail} files, it may appear
as zero or be out of date.
+Severity : minor
+Frequency : systems with MD RAID1 external journal devices
+Bugzilla : 10832
+Description: lconf's call to blkid is confused by RAID1 journal devices
+Details : Use the "blkid -l" flag to locate the MD RAID device instead
+ of returning all block devices that match the journal UUID.
+
Severity : normal
Frequency : always, for aggregate stripe size over 4GB
Bugzilla : 10725
the truncated size. No file data is lost.
Severity : enhancement
-Frequency : liblustre only
Bugzilla : 10452
Description: Allow recovery/failover for liblustre clients.
Details : liblustre clients were unaware of failover configurations until
Description: Error of copying files with lustre special EAs as root
Details : Client side always return success for setxattr call for lustre
special xattr (currently only "trusted.lov").
-
+
Severity : normal
Frequency : rarely on clusters with both ia64+i386 clients
Bugzilla : 10672
another node, we now correctly return ETXTBSY instead of
truncating the file.
+Severity : enhancement
+Bugzilla : 4900
+Description: Async OSC create to avoid blocking unnecessarily.
+Details : If an OST has no remaining objects, the system blocks on
+ creation when it needs to create a new object on this OST. Now,
+ always use pre-created objects when available, instead of
+ blocking on an empty osc while others are not empty. If we must
+ block, we block for the shortest possible period of time.
+
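
The selection rule in the entry above (use an OSC that still has pre-created objects rather than waiting on an empty one) can be sketched as follows. This is a simplified model and not the actual OSC create code: the function `pick_osc_for_create`, the mapping of names to object counts, and the "most objects left" tie-break are all assumptions made for illustration.

```python
from typing import Mapping, Optional


def pick_osc_for_create(precreated: Mapping[str, int]) -> Optional[str]:
    """Return an OSC that still has pre-created objects, or None.

    None means every OSC is empty, so the caller has no choice but to wait.
    """
    candidates = [name for name, count in precreated.items() if count > 0]
    if not candidates:
        return None
    # Prefer the OSC with the most objects left (an illustrative policy only).
    return max(candidates, key=lambda name: precreated[name])


# Hypothetical pre-created object counts per OSC.
pools = {"OST0000": 0, "OST0001": 12, "OST0002": 3}
choice = pick_osc_for_create(pools)
print(choice)
```

Only when the function returns None would a caller need to block, which matches the intent of blocking for the shortest possible time.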
Severity : normal
Frequency : rare
Bugzilla : 2707
invalid by the follow_mount time.
Severity : minor
-Frequency : rare
+Frequency : liblustre clients only
Bugzilla : 10883
Description: Race in 'instant cancel' lock handling could lead to such locks
never to be granted in case of SMP MDS
Rather --with-portals=<path-to-portals-includes> is used to
enable building on the XT3. In addition to enable XT3 specific
features the option --enable-cray-xt3 must be used.
-
+
Severity : major
Frequency : rare
Bugzilla : 7407
* add hard link support
* change obdfile creation method
* kernel patch changed
-
+
2002-09-19 Peter Braam <braam@clusterfs.com>
* version 0_5_9
* bug fix