1 2011-06-30 Whamcloud, Inc.
4 2.6.18-238.12.1.el5 (RHEL 5)
5 2.6.32-131.2.1.el6 (RHEL 6) - patchless client only
6 * Recommended e2fsprogs version: 1.41.90.wc3
7 * Use ext4-based ldiskfs as default for RHEL5
8 * Add support for 24TB LUN (RHEL5 server only)
10 Severity : enhancement
12 Description: Add support for RHEL5.6 (2.6.18-238.12.1.el5)
14 Severity : enhancement
15 Jira : LU-62, LU-73, LU-402
16 Description: Add RHEL6 client support
20 Description: Fix client crash introduced in 1.8.5
21 Details : The patch from bugzilla ticket 18213 landed for 1.8.5 can cause
22 a crash in mdc_exit_request() when the process is interrupted.
26 Description: ls -l reports wrong file size
27 Details : Always update LVB from disk when a glimpse callback returns an
30 Severity : major (2.6.32-based kernels only)
32 Description: Add workaround for a race causing an assertion failure in
33 clear_inode() on lustre clients.
37 Description: Fix interoperability issue between 1.8 clients and 2.x servers
38 Details : The ldlm hash handling in Lustre 2.x is incompatible with the one
39 used in Lustre 1.8. Add support for 64-bit dir name hash to the
44 Description: Reduce cache pressure on OSS
45 Details : Don't keep pages for objects >8MB in cache to alleviate memory
46 pressure on OSSs. This is done by changing the default value
47 of readcache_max_filesize to 8MB.
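The caching policy described in this entry can be sketched as below. This is a minimal illustration and not the obdfilter code: the function name should_cache_pages and the constants are hypothetical, and treating the tunable as a byte count is an assumption drawn only from the wording of the entry.

```python
# Hypothetical model of the policy described above; not the obdfilter code.
MIB = 1024 * 1024
# New default according to this entry (assumed here to be expressed in bytes).
READCACHE_MAX_FILESIZE = 8 * MIB


def should_cache_pages(object_size, max_filesize=READCACHE_MAX_FILESIZE):
    """Return True if pages of an object of this size would stay in cache."""
    # Objects larger than the limit are not kept in the OSS read cache.
    return object_size <= max_filesize


if __name__ == "__main__":
    for size in (4 * MIB, 8 * MIB, 9 * MIB):
        print(size // MIB, "MiB ->", should_cache_pages(size))
```

Lowering the limit trades cache hits on large objects for less memory pressure on the OSS.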
51 Description: slow IO with read-intense application
52 Details : Align the readahead extent by 1M after it is trimmed by
57 Description: Skip quotacheck on administratively disabled OSTs
61 Description: Some autogen improvements
63 Severity : enhancement
64 Jira : LU-123, LU-296, LU-230
65 Description: Add support for yaml data logging and auster script.
67 Severity : enhancement
69 Description: Allow OSTs to be created with no primary node
70 Details : Add a --servicenode parameter for mkfs.lustre to treat all
71 service nodes equally.
73 -------------------------------------------------------------------------------
75 2011-05-12 Oracle, Inc.
77 * Support for kernels:
78 2.6.16.60-0.42.8 (SLES 10),
79 2.6.27.39-0.3.1 (SLES 11),
80 2.6.18-194.3.1.el5 (RHEL 5)
81 2.6.18-194.3.1.0.1.el5 (OEL 5)
82 * Client support for unpatched kernels:
83 (see http://wiki.lustre.org/index.php?title=Patchless_Client)
84 2.6.16 - 2.6.32 vanilla (kernel.org)
85 * Recommended e2fsprogs version: 1.41.10-sun2
86 * The async journal commit feature (bug 19128) and the cancel
87 lock before replay feature (bug 16774) are disabled by default.
91 Description: llap pages are moved to head of LRU list instead of tail
95 Description: disable statahead by default due to important races found in the
96 statahead implementation.
100 Description: release request ref through ptlrpc_server_drop_request()
101 Details : make sure the request is protected by rq_refcount while
102 being processed in ptlrpc_server_handle_req_in().
106 Description: correct check for transno value in filter_finish_transno()
110 Description: Set body->eadatasize in mdc_getattr_pack()
114 Description: Handle unsent requests with rq_net_err in ptlrpc_check_set()
118 Description: apps stuck in ptlrpc_check_set() during direct I/O
119 Details : start async bulk unregistering at the same time as reply unlink
123 Description: lfs setstripe --pool broken in 1.8.4
124 Details : fix llapi_search_fsname() to handle relative pathnames
128 Description: Only force the mode change if we're changing the size as well
129 Details : Fix regression introduced in 1.8.0. The offending code was added
130 by commit 77ba4b2141d04180211efa8a75c11ab0abf7fafb to remove
131 setgid/setuid bits when do_truncate() is called on the file.
132 We should only force the change when that occurs, similarly to
133 ll_setattr() in lustre/llite/llite_lib.c
135 Severity : enhancement
137 Description: add procfs tunable to enable/disable lockless direct I/O
138 Details : llite.lustre-*.lockless_direct_io=0 will disable default semantics
139 of direct I/O that forces it to be lockless. lockless_direct_io
140 value, however, will be ignored if per-file LL_FILE_LOCKED_DIRECTIO
145 Description: Lustre servers crashing with NULL pointer errors
146 Details : Make sure the request is protected by rq_refcount while being
147 processed in ptlrpc_server_handle_req_in().
151 Description: kernel BUG at drivers/scsi/sd.c
152 Details : Remove sd iostats patch from sles11 patch series.
156 Description: Modified value of at_min is not taken into account
160 Description: lustre grants flock exclusive locks to two fd's in the same process
164 Description: LBUG in mds_verify_child()
168 Description: ASSERTION(atomic_read(&client_stat->nid_exp_ref_count) == 0)
169 Details : Reconnecting from a different nid causes old per-nid stats not to be
174 Description: create proper macro check for bdi interface.
176 -------------------------------------------------------------------------------
178 2010-10-29 Oracle, Inc.
180 * Support for kernels:
181 2.6.16.60-0.69.1 (SLES 10),
182 2.6.32.19-0.2.1 (SLES11),
183 2.6.18-194.17.1.el5 (RHEL 5)
184 2.6.18-194.17.1.0.1.el5 (OEL 5)
185 * Client support for unpatched kernels:
186 (see http://wiki.lustre.org/index.php?title=Patchless_Client)
187 2.6.16 - 2.6.32 vanilla (kernel.org)
188 * Recommended e2fsprogs version: 1.41.10-sun2
189 * The async journal commit feature (bug 19128) and the cancel
190 lock before replay feature (bug 16774) are disabled by default.
194 Description: atime is not properly updated on an MDS
196 Severity : enhancement
198 Description: Update to RHEL5.5 kernel 2.6.18-194.17.1.el5.
199 Update to OEL5.5 kernel 2.6.18-194.17.1.0.1.el5.
201 Severity : enhancement
203 Description: Update to SLES10 SP3 kernel 2.6.16.60-0.69.1.
206 Frequency : only with SLES10
208 Description: Use OFED "KMP" provided by Novell
209 Details : SLES10 SP3 ships with OFED in a separate "KMP" package.
210 Lustre is now built against this package. That means you need to
211 install the ofed-kmp package from Novell for the patchless client
212 and from our download site for the server. Note that the ofed-kmp
213 that Novell ships may not exactly match the kernel version but
214 should still be compatible.
216 Severity : enhancement
218 Description: Update SLES11 SP1 kernel to 2.6.32.19-0.2.1.
222 Description: Enabling quotas fails with non-consecutive OST numbering.
226 Description: Fix kernel warning due to lookup_one_len() called without i_mutex
231 Description: Account direct i/o inflight rpcs separately from non-direct i/o so
232 that direct i/o, which is limited by max_rpcs_in_flight, should not
233 block non-direct i/o, which is not limited by max_rpcs_in_flight.
237 Description: Fix per-NID reporting on outstanding writes
241 Description: Reduce stack pressure by uninlining some mds and ptlrpc functions.
245 Description: Remove LASSERT in lprocfs_rd_conn_uuid() since conn == NULL is a
250 Description: fix obdo leak issue in ll_setattr_raw()
254 Description: limit MMP interval
256 Severity : enhancement
258 Description: add several lfs ost enhancements
262 Description: Too many default ACLs break directory access on new directories
266 Description: Lustre inode size is not coherent across nodes.
267 Details : Update lvbo from disk when AST fails with EINVAL. Lvbo is updated
268 on EINVAL error in ldlm_handle_ast_error(). The updates in
269 filter_intent_policy() and ldlm_cb_interpret() have been removed as
274 Description: Oops at __percpu_counter_add+0x1b
275 Details : Use bdi_init()/bdi_destroy() to properly initialize backing_dev_info
280 Description: add mount option to generate 32bit ino, this can be used for 32bit
281 application compatibility.
285 Description: keep reference count for "lli_sai" to prevent it from being released
286 during "statahead_enter()"
290 Description: allow quotacheck over OSTs with sparse indices
294 Description: Objects not getting deleted for files which have been removed
295 Details : ll_have_md_lock() should differentiate between CR and CW OPEN
300 Description: pin object's inode in memory to avoid certain timeouts
304 Description: fix LBUG when obdfilter-survey is interrupted.
306 -------------------------------------------------------------------------------
308 2010-07-31 Oracle, Inc.
310 * Support for kernels:
311 2.6.16.60-0.42.8 (SLES 10),
312 2.6.27.39-0.3.1 (SLES11),
313 2.6.18-194.3.1.el5 (RHEL 5)
314 2.6.18-194.3.1.0.1.el5 (OEL 5)
315 * Client support for unpatched kernels:
316 (see http://wiki.lustre.org/index.php?title=Patchless_Client)
317 2.6.16 - 2.6.30 vanilla (kernel.org)
318 * Recommended e2fsprogs version: 1.41.10-sun2
319 * The async journal commit feature (bug 19128) and the cancel
320 lock before replay feature (bug 16774) are disabled by default.
322 Severity : enhancement
324 Description: Update RHEL5.5 kernel to 2.6.18-194.8.1.el5 and OEL5.5 kernel to
325 2.6.18-194.8.1.0.1.el5.
327 Severity : enhancement
329 Description: use in-kernel OFED stack for RHEL5 & OEL5.
331 Severity : enhancement
333 Description: Add "lfs_migrate" script from manual into lustre/scripts and RPMs
334 Details : lfs_migrate does a "poor man's" migration of files from their
335 current OST layout to a new OST layout as chosen by the MDS.
339 Description: mds_orphan_add_link()) error linking orphan to PENDING
340 Details : quota limits might disallow linking orphans to PENDING
341 when unlinking a file - temporarily raise threads' privileges
342 when processing unlinks.
344 Severity : enhancement
346 Description: add conf-param -d option to remove permanent settings.
347 Details : Add the ability to remove permanent lctl conf_param settings.
348 (Previously conf_param settings could only be changed, not
349 removed.) This also provides a method to change failover
350 nid locations. Improve lctl man page.
352 Severity : enhancement
354 Description: add list_param to b1_8 and add "-R" option to list params
357 Severity : enhancement
359 Description: lfs quota output is not very convenient for awk/sed-parsing
360 Details : Some positions in lfs quota output table could be empty or
361 non-empty which made it hard to parse it with scripts, now a dash
362 is put instead of space where there is not supposed to be any data.
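With a dash standing in for every empty position, each row has a fixed number of whitespace-separated fields, so a script can split it directly. The sketch below illustrates that idea only: the column names and the sample row are made up for the example and are not real lfs quota output.

```python
# Hypothetical column names; the real lfs quota columns may differ.
COLUMNS = ("filesystem", "used", "quota", "limit", "grace")


def parse_row(line, columns=COLUMNS):
    """Split one table row into a dict, mapping the '-' placeholder to None."""
    fields = line.split()
    if len(fields) != len(columns):
        raise ValueError("expected %d fields, got %d" % (len(columns), len(fields)))
    return {name: (None if value == "-" else value)
            for name, value in zip(columns, fields)}


if __name__ == "__main__":
    # Made-up sample row; the last position holds no data.
    print(parse_row("/mnt/lustre 1024 2048 4096 -"))
```

Before this change, an empty position would shift the remaining fields left and make field-based tools such as awk read the wrong column.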
364 Severity : enhancement
366 Description: fix obdfilter-survey script to work properly with remote oss-s
368 Severity : enhancement
370 Description: add new OBDFILTER_SURVEY test suite
372 Severity : enhancement
374 Description: add new multiple mount protection (MMP) test suite
376 Severity : enhancement
378 Description: add support for async journal commit in echo client
380 Severity : enhancement
382 Description: allow userland programs to include <lustre/lustre_idl.h>
383 from standard include directories
385 Severity : enhancement
387 Description: The prune-icache-use-trylock is no longer needed now that
388 the patch from bug 20008 is landed.
392 Description: The shrink grant feature is still active on the client although the
393 connect flag is not set.
397 Description: Don't leak grant space if the write failed with quota exceeded.
401 Description: Don't consume grant space twice on recoverable resend.
405 Description: a race condition could lead to SIGBUS being sent to an
406 application using mmap-ped files from Lustre
407 Details : truncate_complete_page implementation for the patchless
408 client could arbitrarily unset PG_Uptodate flag for a
409 page being kicked from the page cache, an uptodate check
410 right after a readpage call in filemap_fault could fail
411 because of that as though the page read had been unsuccessful.
415 Description: dlm lock slab shrinking is not efficient
416 Details : The dlm_locks slab can grow significantly and consumes a lot of
417 memory on the server. Set a hardlimit to grant_plan.
421 Description: Lustre does not do 1MB IOs to HW RAID
422 Details : Bump MAX_PHYS/HW_SEGMENTS and SG_ALL to 256 in the RHEL5 kernel.
423 This is what we do already for SLES kernels.
427 Description: bump maximum number of phys/hw segments in the SLES11 kernel
428 until s/g chaining works properly.
433 Description: LSI Fusion MPT driver hacks to improve performance
434 Details : Set CONFIG_FUSION_MAX_SGE to 256 for RHEL5
436 Severity : enhancement
438 Description: increase default md stripe_cache_size to 16k
441 Bugzilla : 15587/21439
442 Description: don't handle security.capability xattr
443 Details : CONFIG_SECURITY_FILE_CAPABILITIES is enabled by default on SLES11.
444 This results in additional getxattr calls, causing VBR test
445 failures as well as a performance drop when writing.
449 Description: obdfilter-survey is no longer working
450 Details : revert patch from bug 20355 to resolve an issue with lctl --threads
451 not working correctly with $(PTHREAD_LIBS) being linked to lctl.
455 Description: ll_shrink_cache does not handle __GFP_FS properly
459 Description: lfs getstripe shows wrong info for directories
460 Details : Set correct filesystem-wide LOVEA default values.
464 Description: FSX checksum false positives due to mmap IO
465 Details : Use OBD_FL_MMAP flag for IOs on a memory mapped file. Do not print
466 checksum errors, if the flag is set on a request.
470 Description: file operations after eviction have successful return values
471 Details : use vfs ->flush callback to return any pending async errors
476 Description: mdsrate fails to write after 1.3+M files opened
477 Details : decrease memory usage on clients by recycling dentries and
482 Description: obdfilter-survey gives unreasonably high numbers
483 Details : Wait for all threads to complete when running test_brw.
487 Description: do not set lustre read_only device when server umount and keep
488 client records for recoverable ones
492 Description: move sync_on_lock_cancel tunable to the obdfilter layer
493 Details : move the tunable to trigger a journal flush on lock cancel from
494 the ost layer to the obdfilter layer. This tunable is useful
495 when using the async journal commit feature.
499 Description: exp->exp_nid_stats == NULL in filter_tally()
500 Details : fix race with per-nid stats by delaying procfs cleanup until
505 Description: extent lock cancellation on client can keep the cpu busy for too
510 Description: Do not fail OST activation when a llog is not found, just
511 issue an error message.
515 Description: Don't enable extents by default for MDT.
519 Description: Protect bitfield access to ptlrpc_request's rq_flags, since
520 the AT code can access it concurrently while sending early
525 Description: Disable lockless truncate by default since it is sometimes flawed
526 and causes the write_disjoint test to fail.
530 Description: OSSs which don't have the patch from bug 20278 can trigger an
533 Severity : enhancement
535 Description: don't print message to the console when we have not managed to
540 Description: The MDS fails to synchronize OSTs which registered with the MGS
541 after the MDT. The problem is that OBD_NOTIFY_CREATE events are
542 raised too early and thus discarded by the MDT stack.
543 The fix consists of issuing OBD_NOTIFY_CREATE event in the lov
548 Description: Fix race when the ping evictor and a service thread execute
549 target_recovery_check_and_stop() concurrently.
553 Description: quota broadcast can trigger a LBUG on the MDT if there are
558 Description: Resetting the lov_objid values to last_id reported by the OST
559 during orphan recovery is incorrect and can cause the same
560 objects to be allocated twice.
562 Severity : enhancement
564 Description: "weak-modules" support
565 Details : Implement "weak-modules" support which enables kernel modules
566 to be used with any kernel that implements the same kABI. In
567 order to achieve this modules are now installed in
568 /lib/modules/$(uname -r)/updates/kernel on all distributions.
570 Severity : enhancement
572 Description: add writeconf as mount option
574 Severity : enhancement
576 Description: produce debuginfo packages for SLES.
578 Severity : enhancement
580 Description: add failover nidlist to the import proc file.
582 Severity : enhancement
584 Description: fix LUSTRE_SEQ_MAX_WIDTH for interoperability between 1.8
585 clients and 2.0 servers.
587 Severity : enhancement
589 Description: lfs find -s does not work correctly because of a bug in
594 Description: ll_read_ahead_page() must validate the dlm lock before using
599 Description: Prevent failover nids from registering with MGS first.
603 Description: fix lock inversion in ll_setattr_raw().
607 Description: object allocation is not balanced across OSTs.
608 Details : osc_precreate() should return 0, if there are enough objects left.
610 -------------------------------------------------------------------------------
612 2010-04-30 Oracle, Inc.
614 * Support for kernels:
615 2.6.16.60-0.42.8 (SLES 10),
616 2.6.27.39-0.3.1 (SLES11),
617 2.6.18-164.11.1.el5 (RHEL 5)
618 2.6.18-164.11.1.0.1.el5 (OEL 5)
619 * Client support for unpatched kernels:
620 (see http://wiki.lustre.org/index.php?title=Patchless_Client)
621 2.6.16 - 2.6.30 vanilla (kernel.org)
622 * Recommended e2fsprogs version: 1.41.10-sun2
623 * The async journal commit feature (bug 19128) and the cancel
624 lock before replay feature (bug 16774) are disabled by default.
628 Description: open-unlinked directories trigger MDS LBUG
629 Details : Fix regression introduced by the patch from bug 19640.
630 ext3_inc_count() can reset nlink to 1 when the directory
631 is indexed and inode->i_nlink == 2. Work around the problem
632 by incrementing nlink by 2 instead of 1.
636 Description: Reconnects are not throttled
637 Details : Don't wake up pinger on reconnect failures and rely on regular
638 pings to trigger the next reconnection. Please note that the
639 pinger already uses a smaller interval if the import is
643 Frequency : only with NFS export
645 Description: Console flooded with error message from ll_inode_from_lock()
647 Details : in mds_open, initialize the child_res_id before enqueuing
648 the OPEN lock for the child inode, to avoid sending the
649 wrong ldlm_res_id to the client.
653 Description: allow multiple instances of the same nid in NID hash
654 Details : Case of multiple separate clients from the same NID (as
655 with liblustre) is legitimate and so we should allow
656 multiple instances of the same NID in nid hash.
660 Description: the readahead code can sleep on a semaphore while holding a
662 Details : in ras_update, "lov_get_info" could be called while increasing the
663 readahead window; it tries to get the mutex lock "lov_lock"
664 while holding the spin_lock "ras_lock", which then causes system
669 Description: ASSERTION(cli->cl_avail_grant >= 0) failed
670 Details : fix assertion failure in the grant code.
674 Description: Use CNETERR (which is rate limited) in specific places in
675 the portal's LNET driver to avoid flooding the console.
679 Description: include last created object in precreate slow case
683 Description: don't do rep-ack if not created anything
684 Details : mds_open currently always puts a lock into a rep-ack regardless
685 of whether something was created or not. This is pointless and only
686 creates needless contention. In fact the entire idea was to do
687 this for real creates as a recovery protection.
691 Description: Spurious error messages from smp_processor_id() on preemptible
693 Details : Disable preemption by grabbing the lock in fs_trace_get_tcd()
694 first. The function fs_trace_get_tcd() was moved up.
698 Description: interval_erase() fix
699 Details : interval_erase() calls update_maxhigh() properly when child
704 Description: "lfs df" does not print stats for all mountpoints
705 Details : Print all mounted lustre filesystems with "lfs df"
709 Description: lfs setstripe -p no longer works with a relative pathname
710 Details : Use realpath() to provide absolute pathname.
714 Description: fix for truncated reply buffer
715 Details : reply buffer could be referenced by reply_in_callback after being released
720 Description: lustre.lov error when backing up symlinks with extended attributes
721 Details : Improved logic in ll_listxattr()
725 Description: properly handle null value for setattr -n lustre.lov
726 Details : Running "setfattr -n trusted.lov ." causes a NULL dereference
727 in ll_setxattr() due to no checking if "value" is NULL.
728 This command now resets to the default striping when executed
733 Description: stack overflow on lock cancellation due to fsync call
734 Details : sync_on_lock_cancel is needed for recovery when async journal
735 is enabled, but we actually just need to make sure that
736 metadata blocks have hit the journal, so doing a fs sync
737 should be enough and should consume less stack (just create an
738 empty handle and commit it).
742 Description: using current->journal_info to store per-thread data leads
743 to problem under memory pressure
744 Details : disable the per-thread data (current->journal_info) containing
745 the lock info during I/O to work around the issue for the short term
749 Description: control DCACHE_LUSTRE_INVALID flag with MDS_INODELOCK_LOOKUP lock
750 Details : DCACHE_LUSTRE_INVALID is controlled by MDS_INODELOCK_LOOKUP
751 lock which is corresponding to "IT_LOOKUP", do not skip invalidate
756 Description: Cannot send after transport shutdown
757 Details : Clear imp_vbr_failed flag upon eviction
761 Description: soft lock in request set code during recovery
762 Details : during recovery, use req->rq_set itself to replay the request
763 instead of ptlrpcd_recovery_pc
767 Description: Use CFS_ALLOC_IO instead of _STD in llap_from_page_with_lockh
768 Details : During an ll_readahead under ll_readpage, we have seen the
769 OBD_SLAB_ALLOC hang under ldlm_pools_shrink when trying to lock
770 a page that is already locked by the readahead code.
775 Description: stop waiting for next replay transno if shutdown
776 Details : if the system is shutting down, wake up the service thread blocked
777 waiting for the next replay transno during recovery, so that all the
778 references held by queued requests can be dropped and device
783 Description: lov_merge_lvb()) ASSERTION(spin_is_locked(&lsm->lsm_lock)) failed
784 Details : Protect lli->lli_smd pointer updates with lli->lli_lock.
788 Description: per-nid stats should not access lustre-hash internal structures
793 Description: mount.lustre fails to pass some options to mount()
798 Description: ext4 extent allocation is slower than in ext3
799 Details : Increase the default value of MB_DEFAULT_ORDER2_REQS to 8,
800 enlarge ext4 preallocation table for 2048 4K blocks extents
805 Description: incorrect triggering of synchronous IO
806 Details : The OSC can mistakenly fall back to synchronous IO when the
807 max_dirty_mb limit is reached and no write requests have yet
808 been issued. This can occur when the dirty pages are spread
809 over many files all of which are below the optimal request size.
813 Description: Optimize quota_ctl operations by sending requests in parallel
814 Details : Send MDS->OST quota_ctl requests in parallel, do not resend.
815 Compiled from two attachments in the ticket.
819 Description: ordering issue between transaction start & i_mutex
820 Details : start the transaction earlier in llog_lvfs_destroy to get
821 transaction start and inode mutex lock nested properly.
825 Description: lru resize SLV can get stuck
826 Details : calculate SLV with a greater precision to not lose small
827 changes due to integer math truncation; round up SLV only
828 if the amount of granted locks is less than the limit to not
829 get stuck with this SLV
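The truncation problem named in this entry can be shown with a generic fixed-point sketch. This is not the Lustre SLV formula: the function names, the 1% growth factor and the SCALE_BITS value are assumptions chosen only to show how integer division can discard a small change on every step, and how carrying extra fractional bits preserves it.

```python
# Generic illustration of integer truncation; NOT the actual SLV computation.
SCALE_BITS = 10  # hypothetical number of extra fractional bits


def grow_truncating(value, num, den, steps):
    """Apply value = value * num // den repeatedly with plain integer math."""
    for _ in range(steps):
        value = value * num // den
    return value


def grow_scaled(value, num, den, steps):
    """Same update, but carried out on a value scaled up by 2**SCALE_BITS."""
    scaled = value << SCALE_BITS
    for _ in range(steps):
        scaled = scaled * num // den
    return scaled >> SCALE_BITS


if __name__ == "__main__":
    # A 1% increase per step applied to a small starting value.
    print(grow_truncating(50, 101, 100, 2))
    print(grow_scaled(50, 101, 100, 2))
```

In the first function each 1% step rounds back down to the starting value, so the result never moves; the scaled variant accumulates the fractional part until it becomes visible.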
833 Description: avoid divide-by-zero in lprocfs_rd_import()
837 Description: lfs quota failed when OSTs are down
838 Details : really return approximate block/inode usage when OSTs are down
842 Description: llobdstat fix and enhancement
843 Details : add a counter to set a limit to how many samples will be returned
844 fix a wildcard in the path to limit to obdfilter stats only
848 Description: recovery-small 51 hang - MDT can not stop, Mount still busy with
850 Details : abort lock enqueue processing sooner when umount is in progress.
852 Severity : enhancement
854 Description: Update RHEL5.4 kernel to 2.6.18-164.11.1.el5 and OEL5.4 kernel to
855 2.6.18-164.11.1.0.1.el5.
857 Severity : enhancement
859 Description: error message improvements
860 Details : Use INFO/WARN instead of WARN/ERROR for the slow messages.
861 Simplify MDT/OST service start message.
862 Suppress "changing the import ..." warning.
865 Bugzilla : 21961/17914
866 Description: ignore trailing -mdc when determining index number
869 Frequency : only with SLES11
871 Description: fix for a race condition in linux quotas implementation
872 Details : dq_flags(struct dquot) access is not properly locked which could
873 lead to certain inconsistencies when accessing it using non-atomic
874 bit operations like __set_bit in do_set_dqblk.
878 Description: return any pending async errors in close(2) using flush callback
880 -------------------------------------------------------------------------------
882 2010-01-29 Sun Microsystems, Inc.
884 * Support for kernels:
885 2.6.16.60-0.42.8 (SLES 10),
886 2.6.27.39-0.3.1 (SLES11),
887 2.6.18-164.11.1.el5 (RHEL 5)
888 2.6.18-164.6.1.0.1.el5 (OEL 5)
889 * Client support for unpatched kernels:
890 (see http://wiki.lustre.org/index.php?title=Patchless_Client)
891 2.6.16 - 2.6.30 vanilla (kernel.org)
892 * Recommended e2fsprogs version: 1.41.6.sun1
893 * The async journal commit feature (bug 19128) and the cancel
894 lock before replay feature (bug 16774) are disabled by default.
896 Severity : enhancement
898 Description: Update RHEL5.4 kernel to 2.6.18-164.11.1.el5 and
899 OEL5.4 kernel to 2.6.18-164.11.1.0.1.el5.
901 Severity : enhancement
902 Bugzilla : 21511/19848
903 Description: Update SLES11 kernel to 2.6.27.39-0.3.1.
905 Severity : enhancement
907 Description: Update supported SLES10 kernel to 2.6.16.60-0.42.8.
909 Severity : enhancement
911 Description: Update kernel to RHEL5.4 2.6.18-164.6.1.el5 and
912 OEL5 2.6.18-164.6.1.0.1.el5 (both with in-kernel OFED enabled).
914 Severity : enhancement
916 Description: Build kernels (RHEL5, OEL5 and SLES10/11) using the vendor's own
919 Severity : enhancement
921 Description: Vanilla kernel 2.6.30 patchless client support.
926 Description: bad entry in directory xxx: inode out of bounds
927 Details : fix locking issue in the rename path which could race with any
928 other operations updating the same directory.
930 Severity : enhancement
932 Description: Make watchdog timer messages clearer and more descriptive.
936 Description: cp -p command does not preserve the dates and timestamp
937 Details : mtime could be spoiled by a write callback
941 Description: Clear imp_force_reconnect correctly in ptlrpc_connect_interpret()
945 Description: Allow non-root access for "lfs check".
946 Details : Added a check in obd_class_ioctl() for OBD_IOC_PING_TARGET.
948 Severity : enhancement
950 Description: quotacheck performance/scaling issues
951 Details : reduce quotacheck time on empty filesystem by skipping uninit
954 Severity : enhancement
956 Description: Enhancement for lfs(1) command to use numeric uid/gid.
958 Severity : enhancement
960 Description: Adjust locks' extents on their first enqueue, so that at the time
961 they get granted, there is no need for another pass through the
962 queues since they are already shaped into the proper forms.
966 Description: Fix mds_shrink_intent_reply()/mds_intent_policy() to pass correct
967 arguments and prevent LBUG() in lustre_shrink_reply_v2().
971 Description: Change tunefs.lustre and mkfs.lustre --mountfsoptions so that
972 exactly the specified mount options are used. Leaving off
973 any "mandatory" mount options is an error. Leaving off any
974 default mount options causes a warning, but is allowed.
975 Change errors=remount-ro from mandatory to default. Sanitize
976 the mount string before storing it. Update man pages accordingly.
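The rules in this entry (missing mandatory option is an error, missing default option is a warning, and the string is sanitized before it is stored) can be sketched as follows. This is an illustration under assumptions, not the mkfs.lustre or tunefs.lustre code: the function name is hypothetical and "example_mandatory" is a placeholder, since the entry does not list the real mandatory options. Only errors=remount-ro being a default option comes from the entry itself.

```python
# Placeholder option sets; only errors=remount-ro as a default is from the entry.
MANDATORY = ("example_mandatory",)   # hypothetical mandatory option
DEFAULT = ("errors=remount-ro",)


def check_mount_options(specified, mandatory=MANDATORY, default=DEFAULT):
    """Return (sanitized_string, missing_default_options).

    Raises ValueError if any mandatory option is left off.
    """
    # Sanitize: strip whitespace and drop empty items between commas.
    opts = [o.strip() for o in specified.split(",") if o.strip()]
    missing_mandatory = [o for o in mandatory if o not in opts]
    if missing_mandatory:
        raise ValueError("missing mandatory options: " + ",".join(missing_mandatory))
    # Missing defaults are allowed; the caller would print a warning for each.
    missing_default = [o for o in default if o not in opts]
    return ",".join(opts), missing_default


if __name__ == "__main__":
    print(check_mount_options(" errors=remount-ro, ,example_mandatory "))
    print(check_mount_options("example_mandatory"))
```

Returning the missing defaults instead of printing them keeps the check easy to call from either tool.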
980 Description: mds_getattr() should return 0, even if mds_fid2entry() fails with
981 -ENOENT. Also fix in ptlrpc_expire_one_request() to print signed
984 Severity : enhancement
986 Description: Remove set_info(KEY_UNLINKED) from MDS/OSC
988 Severity : enhancement
990 Description: Clients can replay thousands of unused locks during recovery
991 Details : Don't replay unused locks (only read locks for now) during
992 recovery. This feature is disabled by default and can be
993 enabled by running the following command on the clients:
994 lctl set_param ldlm.cancel_unused_locks_before_replay=1
998 Description: can't stat file in some situation.
999 Details : improve initialization of osc data when a target is added to the mds, and add the
1000 ability to resend a too-big getattr request if the client does not have info
1005 Description: Prevent inconsistencies between linux and lustre mount structures.
1006 Details : Wait indefinitely in server_wait_finished() until mnt_count drops.
1007 Make the sleep interruptible.
1009 Severity : enhancement
1011 Description: Communicate OST degraded/readonly state via statfs to MDS
1012 Details : Flags in the statfs returned from OSTs indicate whether the
1013 OST is in a degraded RAID state, or if the filesystem has
1014 turned read-only after a filesystem error is detected.
1019 Description: don't panic if EPROTO was hit when reading symlink
1020 Details : correctly handling request reference in error cases.
1025 Description: open sometimes returns ENOENT instead of EACCES
1026 Details : permission checking should be part of the open part of mds_open, not
1027 the lookup part, so the server should set the DISP_OPEN_OPEN disposition
1028 before starting the permission check.
1029 Also, there is no need to revalidate the dentry if the client already has the LOOKUP lock.
1033 Description: enable client interface failover
1034 Frequency : on servers with multiple network interfaces
1035 Details : When a client reconnects from another NID, properly update export
1036 nid hash position and ldlm reverse import.
1038 Severity : enhancement
1040 Description: implemented direct I/O with arbitrary (nonaligned) memory
1041 addresses and file offsets.
1043 Severity : enhancement
1045 Description: added more recovery timeout options.
1047 Severity : enhancement
1049 Description: added llapi_file_open, llapi_file_create, llapi_file_get_stripe
1054 Description: Avoid deadlock for local client writes
1055 Frequency : only on systems with clients writing to an OST on the same node
1056 Details : Use new OBD_BRW_MEMALLOC flag to notify OST about writes in the
1057 memory freeing context. This allows OST threads to set the
1058 PF_MEMALLOC flag on task structures in order to allocate memory
1059 from reserved pools and complete IO.
1060 Use GFP_HIGHUSER for OST allocations for non-local client writes,
1061 so that the OST threads generate memory pressure and allow
1062 inactive pages to be reclaimed.
1067 Description: lock ordering violation between &cli->cl_sem and _lprocfs_lock
1068 Details : move ldlm namespace creation to the setup phase to avoid grabbing
1069 _lprocfs_lock with cl_sem held.
1073 Frequency : only during format of test systems
1074 Description: Unable to run several mkfs.lustre on loop devices at the same time
1075 Details : mkfs.lustre returns error 256 on the concurrent loop devices
1076 formatting. The solution is to properly handle the error.
1078 Severity : enhancement
1080 Description: implement async create (obd_async_create) method for osc, to avoid
1081 waiting too long for new ost objects while holding an ldlm lock.
1085 Frequency : occasionally during network problems
1086 Description: client not allowed to reconnect to OST because of active request
1087 Details : abort bulk requests received by the OST once the client has timed
1088 out since the client will resend the request anyway. The client
1089 also now retries to reconnect to the same server if a connect
1090 request failed with -EBUSY or -EAGAIN.
1093 Frequency : rare, if a wide-striped file is used and one ost is down.
1095 Description: don't return error if we created a subset of objects for file.
1096 Details : lov_update_create_set() uses set->set_success as index for created
1097 objects, so if some requests failed, there will be a hole at the end of
1098 the array and we can use qos_shrink_lsm to allocate the correct lsm.
1102 Description: Slow stale export processing during normal start up
1103 Details : The global mgc lock prevents OST setups from running in parallel.
1104 Replace the global lock with a per-config_llog_data semaphore.
1108 Description: Out of order replies might be lost on replay
1109 Details : In ptlrpc_retain_replayable_request, if we cannot find a retained
1110 request with a tid smaller than the one currently being added, add
1111 it to the start, not the end, of the list.
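
The insertion rule in the entry above can be sketched as follows. This is a minimal illustration, not the ptlrpc code: replay_insert() and the flat array are hypothetical stand-ins for the real linked list, and only the ordering logic is shown. The list is scanned from the tail, and when no retained entry has a smaller transaction number the new one lands at the start.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: keeps replay[] sorted by ascending transno.
 * Returns the position used, or -1 if the array is already full.
 */
int replay_insert(uint64_t *replay, size_t *count, size_t cap, uint64_t transno)
{
    size_t pos;

    assert(replay != NULL && count != NULL);
    if (*count >= cap)
        return -1;

    /* Scan back from the tail, shifting larger entries up one slot. */
    for (pos = *count; pos > 0 && replay[pos - 1] > transno; pos--)
        replay[pos] = replay[pos - 1];

    /* pos == 0 means no retained entry is smaller: insert at the start. */
    replay[pos] = transno;
    (*count)++;
    return (int)pos;
}
```

Inserting 5, 7 and then 3 leaves the array as 3, 5, 7, with the last insert placed at position 0 rather than appended.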
1115 Description: BUG: soft lockup - CPU#1 stuck for 10s! [ll_mdt_07:4523]
1116 Details : add cond_resched() calls to avoid hogging the cpu for too long
1117 in the hash code. Make also lustre_hash_for_each_empty() more
1120 Severity : enhancement
1122 Description: Performance improvements for debug messages with D_RPCTRACE,
1123 D_LDLM, D_QUOTA options.
1126 Frequency : only with NFS export
1128 Description: (lov_merge.c:74:lov_merge_lvb())
1129 ASSERTION(spin_is_locked(&lsm->lsm_lock)) failed (SR 71691004)
1130 Details : Fix a race in the nfs export code by populating inode
1131 info while the new inode is still locked
1133 Severity : enhancement
1135 Description: add a new file in procfs called force_lbug. Writing to this
1136 file triggers an LBUG. For testing purposes only.
1140 Description: OOM killer causes node hang
1141 Details : really interrupt the sleep in osc_enter_cache on signals
1145 Description: LustreError: 9153:0:(quota_context.c:622:dqacq_completion()) LBUG
1146 Details : fix race during quota release on the slave.
1148 Severity : enhancement
1150 Description: smaller hash bucket sizes, cleanups
1151 Details : increase hash table sizes and enable rehashing for pools, quota,
1152 uuid, nid & per-nid stats.
1154 Severity : enhancement
1156 Description: Add ldiskfs maxdirsize mount option
1157 Details : add max_dir size mount option
1161 Description: panic in ll_statahead_thread
1162 Details : prevent the parent thread from being killed before its child
1165 Frequency : only with 16TB device
1167 Description: unable to perform "mount -t lustre" of 16TB OST device
1168 Details : Mounting 16TB LUNs failed due to three bugs in mkfs.lustre.
1172 Description: ASSERTION(atomic_read(&imp->imp_inflight) == 0) failed
1173 Details : unregistering should be zero if no RPC inflight.
1177 Description: hyperion: Oops during metabench
1178 Details : Correct the refcount of lov_request_set
1180 Severity : enhancement
1182 Description: Add mptlinux and nxge drivers to Lustre builds
1184 Severity : enhancement
1186 Description: Fix watchdog timer message to be more clear
1187 Details : Make watchdog timer messages more clear and descriptive.
1191 Description: LNET soft lockups in socknal_cd thread
1192 Details : don't hog CPU for active-connecting if another connd is
1193 accepting a connecting-request from the same peer
1197 Description: recovery-small test_17 hang
1198 Details : Land several AT improvements & fixes.
1202 Description: MDS panic and hanging client processes
1203 Details : Replace exp_ops_stats with exp_nid_stats->nid_stats
1207 Description: OSS stuck in recovery.
1208 Details : fix race during recovery. class_unlink_export,
1209 class_set_export_delayed and target_queue_last_replay_reply
1210 may race while increasing/decreasing obd_recoverable_clients
1211 and obd_delayed_clients, causing recovery to wait forever.
1213 Severity : enhancement
1215 Description: add cascading_rw.c to lustre/tests
1219 Description: filter_last_id() NULL deref
1220 Details : lprocfs_filter_rd_last_id() should check for the fully
1221 setup obd device, before proceeding further.
1223 Severity : enhancement
1225 Description: Loadgen improvements
1226 Details : stacksize and locking fixes for loadgen
1230 Description: Quiet CERROR("dirty %d > system dirty_max %d\n"
1231 Details : The atomic_read() and the atomic_inc() are not covered
1232 by a lock. Thus they may safely race and trip this CERROR()
1233 unless we add in a small fudge factor (+1).
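
A minimal sketch of the tolerance described above, assuming the check compares a page counter against its limit. dirty_over_limit() is a hypothetical name and the counters are plain longs here rather than atomics.

```c
#include <assert.h>

/*
 * Hypothetical check: report an overrun only when the dirty count is more
 * than one above the limit, so a single unlocked read/increment race does
 * not trip the error message.
 */
int dirty_over_limit(long dirty, long dirty_max)
{
    assert(dirty >= 0 && dirty_max >= 0);
    return dirty - dirty_max > 1;
}
```

With a limit of 100, counts of 100 and 101 are tolerated and 102 is reported.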
1235 Severity : enhancement
1237 Description: shrink_slab: nr=-9223362083340912175
1238 Details : fix spurious message from shrink_slab reporting negative nr
1242 Description: Quiet bogus previously committed transno error
1243 Details : suppress the "server went back in time" error message which
1244 is always printed even in the common case after a client eviction
1246 Severity : enhancement
1248 Description: Parallel statfs() calls result in client eviction
1249 Details : cache statfs data for 1s.
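
The one-second cache can be pictured roughly as below. This is a sketch under assumptions: struct statfs_cache, cached_statfs() and demo_fetch() are hypothetical names, the caller passes the current time in seconds, and demo_fetch() merely stands in for the real statfs request.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STATFS_MAX_AGE 1 /* seconds a cached result stays usable */

struct statfs_cache {
    uint64_t free_blocks; /* last value fetched */
    int64_t  fetched_at;  /* time of the last fetch, in seconds */
    int      valid;       /* 0 until the first fetch */
};

typedef uint64_t (*statfs_fetch_t)(void);

/* Stand-in for the real request: counts how often it is called. */
unsigned int demo_fetch_calls;
uint64_t demo_fetch(void)
{
    return 1000 + demo_fetch_calls++;
}

/* Return the cached value unless it is missing or at least 1s old. */
uint64_t cached_statfs(struct statfs_cache *c, int64_t now, statfs_fetch_t fetch)
{
    assert(c != NULL && fetch != NULL);
    if (!c->valid || now - c->fetched_at >= STATFS_MAX_AGE) {
        c->free_blocks = fetch();
        c->fetched_at = now;
        c->valid = 1;
    }
    return c->free_blocks;
}
```

Two calls within the same second trigger a single fetch, so many parallel statfs() callers share one result instead of each sending a request.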
1253 Description: parallel-scale test_compilebench: @@@@@@ FAIL: compilebench
1255 Details : fix several issues in pinger code causing clients not to ping
1256 servers for too long, resulting in evictions.
1260 Description: e2fsck should warn when MMP update interval is extended
1261 Details : print mmp_check_interval and make it possible to abort mount
1262 operation in case it takes too long.
1266 Description: mdsrate-create-large.sh, BUG: soft lockup - CPU#0 stuck for 10s!
1267 Details : fix bug in the RHEL5's jbd2 callback patch.
1271 Description: drop number of active requests when queued for recovery
1272 Details : Now that we take a reference on the original request instead of
1273 making a copy of it for recovery, we need to drop the number of
1274 active requests or the queued requests will prevent all request
1275 processing when they exceed (srv->srv_threads_running - 1).
1277 Severity : enhancement
1279 Description: refuse to invalidate operational quota files when they are in use
1280 Details : an attempt to invalidate operational quota files on the quota
1281 master is not actually permitted by VFS (returning -EPERM), but we
1282 should not depend on that and should return the error earlier.
1286 Description: Applications stuck in jbd2_log_wait_commit during exit
1287 Details : fix deadlock between kjournald2 trying to acquire the page lock
1288 owned by an ost_io thread waiting for journal commit.
1290 -------------------------------------------------------------------------------
1292 2009-10-16 Sun Microsystems, Inc.
1294 * Support for kernels:
1295 2.6.16.60-0.42.4 (SLES 10),
1296 2.6.27.29-0.1 (SLES11, i686 & x86_64 only),
1297 2.6.18-128.7.1.el5 (RHEL 5),
1298 2.6.18-128.7.1.el5 (OEL 5),
1299 * Client support for unpatched kernels:
1300 (see http://wiki.lustre.org/index.php?title=Patchless_Client)
1301 2.6.16 - 2.6.27 vanilla (kernel.org)
1302 * Recommended e2fsprogs version: 1.41.6.sun1
1303 * ext4 support for RHEL5 is experimental and thus should not be
1306 Severity : enhancement
1308 Description: Add OEL5 support.
1310 Severity : enhancement
1312 Description: Update kernel to SLES11 2.6.27.29-0.1.
1316 Description: File checksum failures with OST read cache on
1317 Details : Disable page poisoning when the bulk transfer has to be aborted
1318 because the client got evicted.
1322 Description: Don't allow a backward step when assigning the osc next id.
1323 Details : a race between allocating the next id and the ll_sync thread can
1324 set a wrong osc next id and destroy valid OST objects.
1326 Severity : enhancement
1328 Description: Update kernel to RHEL5 2.6.18-128.7.1.el5.
1330 Severity : enhancement
1332 Description: Update kernel to SLES10 SP2 2.6.16.60-0.42.4.
1336 Description: Changes in raid5-large-io-rhel5.patch to calculate sectors properly
1340 Description: Increase the default BLK_DEF_MAX_SECTORS value for RHEL5 and SLES11
1344 Description: Error handling in osc_statfs_interpret() has been improved.
1345 Details : Check in osc_statfs_interpret() for EBADR.
1349 Description: Do not send statfs() requests to OSTs disabled by administrator.
1350 Details : Check in lov_prep_statfs_set() for non-NULL ltd_exp.
1354 Description: Do not update ctime for the deleted inode.
1355 Details : Check in mds_reint_unlink() before calling fsfilt_setattr().
1359 Description: Increase of the size of the LDLM resource hash.
1360 Details : Bump up RES_HASH_BITS=12.
1364 Description: correctly send lsm on open replay
1365 Details : The MDS trusts the LSM size on open replay, but the client can set a wrong size
1370 Description: Deadlock between filter_destroy() and filter_commitrw_write().
1371 Details : filter_destroy() does not hold the DLM lock over the whole
1372 operation. If the DLM lock is dropped, filter_commitrw() can go
1373 through, causing the deadlock between page lock and i_mutex.
1374 The i_alloc_sem should also be held in filter_destroy() while
1375 truncating the file.
1379 Description: truncate starts GFP_FS allocation under transaction causing deadlock
1380 Details : ldiskfs_truncate calls grab_cache_page which may start page
1381 allocation under an open transaction. This may lead to
1382 calling prune_icache with consequent lustre reentrance.
1385 Frequency : only when down/upgrading the MDS to 1.6/1.8 while 1.8 clients are
1386 still up and when the OST pool feature is used
1388 Description: interop testing got LBUG when run dd with OST pool
1389 :LustreError: 30032:0:(llite_lib.c:1913:ll_replace_lsm()) LBUG
1390 Details : down/upgrading the MDS to a version that doesn't/does support OST
1391 pool can cause clients to crash because the lsm has changed
1396 Description: missing tree_status on 1.8.1 RPM build
1397 Details : make rpms failed because the tree_status file was missing.
1401 Description: continuing LustreError "mds adjust qunit failed!"
1402 Details : don't print message on the console when ->adjust_qunit fails.
1406 Description: don't increase ldlm timeout if previous client was evicted
1407 Details : if a client doesn't respond to a blocking callback within the
1408 adaptive ldlm enqueue timeout, don't adjust the adaptive estimate
1409 when the lock is next granted.
1413 Description: ost is being unmounted w/o all writes to last_rcvd landing on disk.
1414 affects recovery negatively.
1415 Details : make sure all exports have been processed and properly destroyed
1416 by the zombie thread before stopping the target.
1420 Description: Performance degradation with O_DIRECT between 1.6 & 1.8.1 b190
1421 Details : disable write barrier for ext4/SLES11.
1425 Description: Kernel panic - not syncing: Out of memory and no killable
1426 processes... on OSS when iozone
1427 Details : fix memory leak in the journal checksum patch.
1431 Description: group quota "too many blocks" OSS crashes
1432 Details : we should keep the same uid/gid for lquota_chkquota() and
1433 lquota_pending_commit()
1437 Description: LustreError: 9153:0:(quota_context.c:622:dqacq_completion()) LBUG
1438 Details : don't LBUG on release quota error. Just a workaround until the
1439 problem is understood.
1441 ------------------------------------------------------------------------------
1442 2009-07-31 Sun Microsystems, Inc.
1444 * Support for kernels:
1445 2.6.16.60-0.39.3 (SLES 10),
1446 2.6.27.23-0.1 (SLES11, i686 & x86_64 only),
1447 2.6.18-128.1.14.el5 (RHEL 5),
1448 * Client support for unpatched kernels:
1449 (see http://wiki.lustre.org/index.php?title=Patchless_Client)
1450 2.6.16 - 2.6.27 vanilla (kernel.org)
1451 * Recommended e2fsprogs version: 1.41.6.sun1
1452 * File join has been disabled in this release, refer to Bugzilla 16929.
1453 * NFS export is disabled when stack size < 8192, since the NFSv4 export
1454 of a Lustre filesystem with a 4K stack may cause a stack overflow. For
1455 more information, please refer to bugzilla 17630.
1456 * ext4 support for RHEL5 is experimental and thus should not be
1459 Severity : enhancement
1461 Description: Update kernel to SLES10 SP2 2.6.16.60-0.39.3.
1464 Frequency : with 1.8 server and 1.6 clients
1466 Description: correctly shrink the reply to avoid sending too big a message to the client.
1467 Details : The 1.8 MDS allocates too big a buffer for LOV EA data, which causes
1468 problems when sending the reply to a 1.6 client.
1472 Description: Repeated atomic allocation failures.
1473 Details : Use GFP_HIGHUSER | __GFP_NOMEMALLOC flags for memory allocations
1474 to generate memory pressure and allow reclaiming of inactive pages.
1475 At the same time, do not allow emergency pools to be exhausted.
1476 For local clients the use of GFP_NOFS will be introduced in 1.8.2
1478 Severity : enhancement
1479 Bugzilla : 19846, 18289
1480 Description: Update kernel to RHEL5 2.6.18-128.1.14.el5.
1482 Severity : enhancement
1483 Bugzilla : 19625, 16893, 18668, 19848
1484 Description: Add support for SLES11 2.6.27.23-0.1.
1486 Severity : enhancement
1488 Description: Update client support to vanilla kernels up to 2.6.27.
1490 Severity : enhancement
1492 Description: Update kernel to SLES10 SP2 2.6.16.60-0.37.
1494 Severity : enhancement
1496 Description: Compile with -Werror by default for i686 and x86_64.
1500 Description: resolve race between obd_disconnect and class_disconnect_exports
1501 Details : if obd_disconnect is called on an already disconnected export, it
1502 forgets to release one reference and the osc module can't be unloaded.
1504 Severity : enhancement
1506 Description: move AT tunable parameters for more consistent usage
1507 Details : add AT tunables under /proc/sys/lustre, add to conf_param parsing
1511 Description: correctly skip time estimate if in recovery
1512 Details : rq_send_state isn't a bitmask, so using bitwise ops on it is forbidden.
1516 Description: OSS deadlock
1517 Details : Use trylock to prevent deadlock when shrinking the icache.
1519 Severity : enhancement
1521 Description: Allow tuning service thread via /proc
1522 Details : For each service a new
1523 /proc/fs/lustre/{service}/*/thread_{min,max,started} entry is
1524 created that can be used to set min/max thread counts, and get the
1525 current number of running threads.
1527 Severity : enhancement
1529 Description: Add state history info file, enhance import info file
1530 Details : Track import connection state changes in a new osc/mdc proc file;
1531 add overview-type data to the osc/mdc import proc file.
1535 Description: Reduce small size read RPC
1536 Details : Set read-ahead limit for every file and only do read-ahead when
1537 available read-ahead pages are bigger than 1M to avoid small size
1542 Description: free_entry erroneously used groups_free instead of put_group_info
1544 Severity : enhancement
1546 Description: Make read-ahead stripe size aligned.
1548 Severity : enhancement
1550 Description: MDS create should not wait for statfs RPC while holding DLM lock.
1553 Frequency : rare, connect and disconnect target at same time
1555 Description: ASSERTION(atomic_read(&imp->imp_inflight) == 0
1556 Details : don't call obd_disconnect under lov_lock. It is a long-running
1557 operation and can block ptlrpcd, which answers connect requests.
1560 Frequency : when starting the MDS on an uncleanly shut down MDS device
1562 Description: ll_sync thread stays waiting for mds<>ost recovery to finish
1563 Details : waiting for mds<>ost recovery to finish produces random bugs
1564 due to a race between two ll_sync threads for one lov target. Send
1565 the ACTIVATE event only if the connect has really finished and the import has
1569 Frequency : when starting the MDS on an uncleanly shut down MDS device
1571 Description: aborting recovery hang on MDS
1572 Details : don't throttle destroy RPCs for the MDT.
1576 Description: Slow reads beyond 8TB offsets.
1577 Details : Page index integer overflow in ll_read_ahead_page
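
One way to see why such an overflow would surface at 8TB is sketched below. It assumes 4 KiB pages and a signed 32-bit page index; the entry itself states neither, and all function names are hypothetical. Under those assumptions the first index that no longer fits is 2^31, which corresponds to a byte offset of exactly 8 TiB.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_4K 12 /* assumed 4 KiB pages */

/* Page index of a byte offset, kept in 64 bits so it cannot wrap. */
uint64_t page_index(uint64_t offset)
{
    return offset >> PAGE_SHIFT_4K;
}

/* Nonzero if the index still fits in a signed 32-bit integer. */
int page_index_fits_int32(uint64_t offset)
{
    return page_index(offset) <= (uint64_t)INT32_MAX;
}

/* Narrow to 32 bits; callers must have checked the range first. */
int32_t page_index_i32(uint64_t offset)
{
    assert(page_index_fits_int32(offset));
    return (int32_t)page_index(offset);
}
```

The last page below 8 TiB has index INT32_MAX, and the page at 8 TiB needs a wider type, which is why the sketch keeps the index in a 64-bit value.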
1581 Description: Soft lockup on OSS after MDS failover
1582 Details : MSG_CONNECT_INITIAL is not set on the initial MDS->OST connect.
1583 As a consequence, the patch from bug 18224 is not operational
1584 and the MDS export cannot be reused on the OSTs until it gets
1588 Frequency : rare, only if using MMP with Linux RAID
1590 Description: MMP doesn't work with Linux RAID
1591 Details : While using HA for Lustre servers with Linux RAID, it is possible
1592 that MMP will not detect multiple mounts. To make this work we
1593 need to unplug the device queue in RAID when the MMP block is being
1594 written. Also while reading the MMP block, we should read it from
1595 disk and not the cached one.
1598 Frequency : rare, during recovery
1600 Description: Assertion failure in ldlm_lock_put
1601 Details : Do not put cancelled locks into replay list, hold references on
1602 locks in replay list
1606 Description: 1.6.5 mdsrate performance is slower than 1.4.11/12 (MDS is not cpu bound!)
1607 Details : create_count always drops to the min value (=32) because grow_count
1608 is being changed before the precreate RPC completes.
1611 Frequency : Only in RHEL5 when mounting multiple ext3 filesystems
1614 Description: "kmem_cache_create: duplicate cache jbd_4k" error message
1615 Details : add proper locking for creation of jbd_4k slab cache
1619 Description: MMP check in ext3_remount() fails without displaying any error
1620 Details : When multiple mount protection fails during remount, proper error
1625 Description: Rare Client crash on resend if the file was deleted.
1626 Details : When a file is opened but the open reply is lost and the file is
1627 subsequently deleted before the resend, the resend processing logic
1628 breaks trying to open the file again. It should not try to open it.
1632 Description: add check for >8TB ldiskfs filesystems
1633 Details : ext3-based ldiskfs does not support greater than 8TB LUNs.
1634 Don't allow >8TB ldiskfs filesystems to be mounted without
1635 force_over_8tb mount option
1639 Description: Client locked up when running multiple instances of an app. on
1640 multiple mount points
1641 Details : ll_shrink_cache() can sleep while holding the ll_sb_lock.
1642 Convert ll_sb_lock to a read/write semaphore to fix the problem.
1646 Description: Cannot access an NFS-mounted Lustre filesystem
1647 Details : An NFS client cannot access the Lustre filesystem NFS-mounted
1648 from a Lustre-client exporting the Lustre filesystem via NFS.
1652 Description: panic in ll_statahead_thread
1653 Details : grab dentry reference in parent process.
1656 -------------------------------------------------------------------------------
1658 tbd Sun Microsystems, Inc.
1660 * Support for kernels:
1661 2.6.16.60-0.31 (SLES 10),
1662 2.6.18-128.1.6.el5 (RHEL 5),
1663 2.6.22.14 vanilla (kernel.org)
1664 * Client support for unpatched kernels:
1665 (see http://wiki.lustre.org/index.php?title=Patchless_Client)
1666 2.6.16 - 2.6.22 vanilla (kernel.org)
1667 * Recommended e2fsprogs version: 1.40.11-sun1
1668 * File join has been disabled, refer to Bugzilla 16929.
1669 * A new Lustre ADIO driver is available for MPICH2-1.0.7.
1670 * NFS export is disabled when stack size < 8192, since the NFSv4 export of
1671 a Lustre filesystem with a 4K stack may cause a stack overflow. For more
1672 information, please refer to bugzilla 17630.
1674 Severity : enhancement
1676 Description: Update to RHEL5.3 kernel-2.6.18-128.1.6.el5.
1678 Severity : enhancement
1680 Description: Update OFED release to 1.4.1 RC4
1682 Severity : major, only with big OST
1684 Description: Very poor metadata performance on Infiniband lustre configuration
1685 Details : OST object precreation becomes very slow on big OSTs. This is due
1686 to the ialloc patch spending too much time scanning groups.
1689 Frequency : during recovery
1691 Description: don't mix llog inodes with normal.
1692 Details : allocate inodes for log in last inode group
1697 Description: fix lqs' reference which won't be put in some situations
1698 Details : This patch fixes:
1699 1. In quota_check_common(), this function will check quota
1700 for user and group, but only send one return via "pending".
1701 In most cases, the pendings should be same. But that is not
1703 2. If quotaoff runs between lquota_chkquota() and
1704 lquota_pending_commit(), the same thing will happen too.
1705 That is why it comes:
1706 - if (!ll_sb_any_quota_active(qctxt->lqc_sb))
1709 -------------------------------------------------------------------------------
1711 2008-12-31 Sun Microsystems, Inc.
1713 * Support for kernels:
1714 2.6.5-7.314 (SLES 9),
1715 2.6.9-67.0.22.EL (RHEL 4),
1716 2.6.16.60-0.31 (SLES 10),
1717 2.6.18-92.1.17.el5 (RHEL 5),
1718 2.6.22.14 vanilla (kernel.org)
1719 * Client support for unpatched kernels:
1720 (see http://wiki.lustre.org/index.php?title=Patchless_Client)
1721 2.6.16 - 2.6.22 vanilla (kernel.org)
1722 * Client support for unpatched kernels:
1723 we do not recommend using patchless RHEL4 clients with kernels
1724 prior to 2.6.9-55EL (RHEL4U5).
1725 * Recommended e2fsprogs version: 1.40.11-sun1
1726 * Note that reiserfs quotas are disabled on SLES 10 in this kernel.
1727 * RHEL 4 and RHEL 5/SLES 10 clients behave differently on 'cd' to a
1728 removed cwd "./" (refer to Bugzilla 14399).
1729 * A new quota file format has been introduced in 1.6.5.
1730 The format conversion from prior releases is handled transparently,
1731 but releases older than 1.4.12/1.6.5 don't understand this new
1732 format. The automatic format conversion can be avoided by running
1733 the following command on the MDS:
1734 'tunefs.lustre --param="mdt.quota_type=ug1" $MDTDEV'.
1735 For more information, please refer to bugzilla 13904.
1736 * A new quota file format was introduced in 1.6.6/1.8.0 (kernels 2.6.16+).
1737 The format conversion from prior releases is handled transparently,
1738 but releases older than 1.6.6/1.8.0 don't understand this new
1739 format. The automatic format conversion can be avoided by running
1740 the following commands on the MDS and OSS servers (for
1741 pre 1.4.12-1.6.5 quota files):
1742 'tunefs.lustre --param="mdt.quota_type=ug1" $MDTDEV',
1743 'tunefs.lustre --param="ost.quota_type=ug1" $MDTDEV'
1744 or (for 1.4.12/1.6.5 quota files)
1745 'tunefs.lustre --param="mdt.quota_type=ug2" $MDTDEV',
1746 'tunefs.lustre --param="ost.quota_type=ug2" $MDTDEV'
1747 For more information, please refer to bugzilla 13904.
1748 * Output of lfs quota has been made less detailed by default,
1749 old (verbose) output can be obtained by using -v option.
1750 * File join has been disabled in this release, refer to Bugzilla 16929.
1751 * A new Lustre ADIO driver is available for MPICH2-1.0.7.
1752 * NFS export is disabled when stack size < 8192, since the NFSv4 export of
1753 a Lustre filesystem with a 4K stack may cause a stack overflow. For more
1754 information, please refer to bugzilla 17630.
1756 Severity : enhancement
1758 Description: Caching OSS
1759 Details : introduce data caching on the OSS. The OSS now relies on the linux
1760 kernel page cache to keep recently accessed data in memory.
1761 It is worth noting that all write requests are still flushed
1762 synchronously as in lustre 1.6.
1764 Severity : enhancement
1766 Description: Version based recovery
1767 Details : introduce finer grained recovery able to detect transaction
1768 dependencies and can deal with transaction gaps caused by clients
1769 failing at the same time as the server.
1771 Severity : enhancement
1773 Description: Enable adaptive timeouts by default
1774 Details : The Lustre timeout value in /proc/sys/lustre/timeout is now
1775 managed dynamically based on server load and should not need
1776 to be tuned manually based on cluster size. This allows Lustre
1777 to work under a wider variety of system sizes and loads, without
1778 unnecessarily causing lengthy recovery times.
1780 Severity : enhancement
1782 Description: Add OST Pools support
1783 Details : File striping can now be set to use an arbitrary pool of OSTs
1785 Severity : enhancement
1787 Description: add lazystatfs mount option to allow statfs(2) to skip down OSTs
1788 Details : allow statfs requests to skip disconnected OSTs and hide the error
1792 Frequency : rare, on llog test 6
1794 Description: don't allow connect to already connected import
1795 Details : allowing a connect to an already connected import hides connection problems.
1798 Frequency : rare, on failed llog setup
1800 Description: don't leak obd reference on failed llog setup
1801 Details : on failed llog setup, mgc forgets to call class_destroy_import for
1802 the client import. Move the import destruction to a more generic place.
1807 Description: allow killing a process that waits for a statahead result
1808 Details : for some reason 'ls' can get stuck waiting for a statahead result;
1809 in that case there must be a way to kill the process.
1812 Frequency : rare, at shutdown
1814 Description: panic at umount
1815 Details : llap_shrinker can race with removal of the super block from the list,
1816 which causes a panic on access to an already freed pointer
1821 Description: don't lose wakeup for imp_recovery_waitq
1822 Details : recover_import_no_retry or invalidate_import and import_close can
1823 both sleep on imp_recovery_waitq, but only one wakeup was sent
1829 Description: panic in mds_open
1830 Details : don't confuse mds_finish_transno() with PTR_ERR(-ENOENT)
1835 Description: stuck in cache_remove_extent() or panic with accessing to already
1837 Details : release the lock reference only after adding the page to the pages list.
1840 Frequency : always with long access acl
1842 Description: mds can't pack reply with long acl.
1843 Details : the mds doesn't control the size of acls, but they are limited by reint/getattr
1848 Frequency : on remount
1850 Description: external journal device not working after the remount
1851 Details : clear dev_rdonly flag for external journal devices in
1857 Description: shutdown vs evict race
1858 Details : client_disconnect_export vs connect request race.
1859 if client will evicted at this time - we start invalidate
1860 thread without referece to import and import can be freed
1866 Description: shrink LOV EAs before replying
1867 Details : correctly adjust LOV EA buffer for reply.
1872 Description: don't skip OST targets if they are assigned to a file
1873 Details : Drop slow OSCs if we can, but not for requested start idx.
1874 This means "if OSC is slow and it is not the requested
1875 start OST, then it can be skipped, otherwise skip it only
1876 if it is inactive/recovering/out-of-space".
1878 Severity : enhancement
1880 Description: Update to RHEL5 kernel-2.6.18-92.1.17.el5.
1883 Frequency : rare, need acl's on inode.
1885 Description: client can't handle OST addition correctly
1886 Details : if an OST is added after the client connected to the MDS, the client
1887 can hit lnet_try_match_md ... too big messages for widely striped files.
1888 The client needs to handle config events about added lov targets
1889 and update its max EA size on that event.
1891 Severity : enhancement
1893 Description: Update to SLES9 kernel-2.6.5-7.314.
1895 Severity : enhancement
1897 Description: Update to SLES10 SP2 kernel-2.6.16.60-0.31.
1900 Frequency : Create a symlink file with a very long name
1902 Description: ldlm_cancel_pack()) ASSERTION(max >= dlm->lock_count + count)
1903 Details : If there is no extra space in the request for early cancels,
1904 ldlm_req_handles_avail() returns 0 instead of a negative value.
1909 Description: mds is deadlocked
1910 Details : in rare cases, an inode in a catalog can have an i_no lower than its
1911 parent's i_no. This produces the wrong lock order during open, and a
1912 parallel unlink can deadlock against the open. mds_open must grab
1913 locks in resource id order, not in parent -> child order.
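
The ordering rule in the entry above can be sketched as follows. This is an illustration only: struct lock_pair and lock_order() are hypothetical, and taking the lower id first is an assumption. Any fixed total order on resource ids avoids the deadlock, provided every code path uses the same one.

```c
#include <assert.h>
#include <stdint.h>

struct lock_pair {
    uint64_t first;  /* resource id to lock first */
    uint64_t second; /* resource id to lock second */
};

/*
 * Pick the locking order from the ids alone, so open and unlink always
 * agree on it regardless of which inode is the parent.
 */
struct lock_pair lock_order(uint64_t parent_id, uint64_t child_id)
{
    struct lock_pair p;

    assert(parent_id != child_id);
    if (parent_id < child_id) {
        p.first = parent_id;
        p.second = child_id;
    } else {
        p.first = child_id;
        p.second = parent_id;
    }
    return p;
}
```

A parent with id 20 and a child with id 10 are locked in the order 10 then 20, the same order another thread gets when it names them the other way around.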
1915 Severity : enhancement
1917 Description: Add /proc entry for import status
1918 Details : The mdc, osc, and mgc import directories now have
1919 an import directory that contains useful import data for debugging
1920 connection problems.
1922 Severity : enhancement
1924 Description: Re-disable certain /proc logging
1925 Details : Enable and disable client's offset_stats, extents_stats and
1926 extents_stats_per_process stats logging on the fly.
1929 Frequency : Only on FC kernels 2.6.22+
1931 Description: oops in statahead
1932 Details : Do not drop the reference count for the dentry from VFS on lookup,
1933 VFS will do that by itself.
1935 Severity : enhancement
1937 Description: Generic /proc file permissions
1938 Details : Set /proc file permissions in a more generic way to enable
1939 non-root users to operate on some /proc files.
1943 Description: Hitting mdc_commit_close() ASSERTION
1944 Details : Properly handle request reference release in
1945 ll_release_openhandle().
1949 Frequency : only patchless client
1950 Description: add workaround for race between add/remove dentry from hash
1952 Severity : enhancement
1954 Description: Allow OST glimpses to return PW locks
1958 Description: LBUG when llog conf file is full
1959 Details : When llog bitmap is full, ENOSPC should be returned for plain
1964 Description: Prevent import from entering FULL state when server in recovery
1968 Description: service mount cannot take device name with ":"
1969 Details : Only when device name contains ":/" will mount treat it as
1975 Description: replace ptlrpcd with the statahead thread to interpret the async
1976 statahead RPC callback
1980 Frequency : on recovery
1981 Description: I/O failures after umount during fail back
1982 Details : if a client reconnects to a restarted server, it needs to join recovery
1983 instead of finding that the server handle has changed and evicting itself,
1984 cancelling all locks.
1986 Severity : enhancement
1988 Description: Update to RHEL5 kernel-2.6.18-92.1.10.el5.
1992 Description: Kernel BUG tries to release flock
1993 Details : Lustre does not destroy flock lock before last reference goes
1994 away. So always drop flock locks when client is evicted and
1995 perform unlock regardless of whether communication with the MDS succeeded.
1997 Severity : enhancement
1999 Description: Update to SLES10 SP2 kernel-2.6.16.60-0.27.
2001 Severity : enhancement
2003 Description: Upcall on Lustre log has been dumped
2004 Details : Allow for a user mode script to be called once a Lustre log has
2005 been dumped. It passes the filename of the dumped log to the
2006 script, the location of the script can be specified via
2007 /proc/sys/lnet/debug_log_upcall.
2012 Description: avoid messages about idr_remove called for id that is not allocated
2013 Details : Move the assignment of s_dev for clustered NFS to the end of
2014 initialization to avoid problems with error handling.
2019 Description: avoid Already found the key in hash [CONN_UNUSED_HASH] messages
2020 Details : When a connection is reused, it is not moved from CONN_UNUSED_HASH into
2021 CONN_USED_HASH, which produces a warning when the connection is put again
2027 Description: avoid ASSERTION(client_stat->nid_exp_ref_count == 0) failed
2028 Details : release the reference to stats when the client disconnects, not
2029 when the export is destroyed, to avoid races when the client is
2030 destroyed after the main OST export.
2034 Description: more cleanup in mds_lov
2035 Details : add a workaround to get a valid OST count and avoid warnings about
2036 dropping too big messages, don't init the llog cat under a semaphore,
2037 which can block on reconnect and break normal replay, fix access
2040 Severity : enhancement
2042 Description: Export bytes_read/bytes_write count on OSC/OST.
2046 Description: Early reply size mismatch, MGC loses connection
2047 Details : Apply the MGS_CONNECT_SUPPORTED mask at reconnect time so
2048 the connect flags are properly negotiated.
2052 Description: Properly propagate oinfo flags from lov to osc for statfs
2053 Details : restore missing copy oi_flags to lov requests.
2057 Description: exports in /proc are broken
2058 Details : recreate /proc entries for clients when they reconnect.
2060 Severity : enhancement
2062 Description: Add man pages for llobdstat(8), llstat(8), plot-llstat(8),
2063 : l_getgroups(8), lst(8), routerstat(8)
2064 Details : included man pages for llobdstat(8), llstat(8),
2065 : plot-llstat(8), l_getgroups(8), lst(8), routerstat(8)
2067 Severity : enhancement
2069 Description: Implement lustre ll_show_options method.
2071 Severity : enhancement
2073 Description: Update to SLES9 kernel-2.6.5-7.312.
2075 Severity : enhancement
2077 Description: Update to RHEL4 kernel-2.6.9-67.0.22.EL.
2081 Description: exports in /proc are broken
2082 Details : recreate /proc entries for clients when they reconnect.
2086 Description: don't fail open with -ERANGE
2087 Details : if a client connects before the MDS knows the real OST count,
2088 getting the LOV EA can fail because the MDS did not allocate enough buffer
2093 Description: Resolve device initialization race
2094 Details : Prevent proc handler from accessing devices added to the
2095 obd_devs array but not yet initialized.
2097 Severity : enhancement
2099 Description: configure's --enable-quota should check the
2100 : kernel .config for CONFIG_QUOTA
2101 Details : configure is terminated if --enable-quota is passed but
2102 : no quota support is in kernel
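
A minimal sketch of the kind of check described above, written in Python rather than the shell that configure itself uses. The function name kernel_has_quota and the sample .config text are hypothetical; only the CONFIG_QUOTA option name comes from the entry.

```python
def kernel_has_quota(config_text):
    """Return True if the given kernel .config text enables CONFIG_QUOTA."""
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as "# CONFIG_QUOTA is not set"; skip comments.
        if line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Compare the whole key so CONFIG_QUOTACTL does not match by prefix.
        if key == "CONFIG_QUOTA":
            return value == "y"
    return False


if __name__ == "__main__":
    sample = "CONFIG_QUOTA=y\nCONFIG_QUOTACTL=y\n"
    if not kernel_has_quota(sample):
        raise SystemExit("--enable-quota given but kernel has no quota support")
```

Matching the full key rather than a substring is the design point: a plain text search for "CONFIG_QUOTA" would also hit related options and commented-out lines.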
2104 Severity : enhancement
2106 Description: Update to SLES10 SP2 kernel-2.6.16.60-0.23.
2108 Severity : enhancement
2110 Description: Update to RHEL5 kernel-2.6.18-92.1.6.el5.
2114 Frequency : rare, on PPC clients
2115 Description: don't swab ost objects in response about directory, because
2117 Details : bug similar to bug 14856, but in a different function.
2119 Severity : enhancement
2121 Description: lfs quota tool enhancement
2122 Details : added unit specifier support for setquota, default to
2123 current uid/gid for quota report, short quota stats by
2124 default, nonpositional parameters for setquota, added
2125 llapi_quotactl manual page.
2127 Severity : enhancement
2129 Description: *optional* service tags registration
2130 Details : if the "service tags" package is installed on a Lustre node,
2131 a local-node service tag will be created when the filesystem is
2132 mounted. See http://inventory.sun.com/ for more information
2133 about the Service Tags asset management system.
2137 Description: Client runs out of low memory
2138 Details : Consider only lowmem when counting initial number of llap pages
2141 Frequency : occasional
2143 Description: add refcount for osc callbacks to avoid panic on shutdown
2145 Severity : enhancement
2147 Description: Update to RHEL4 kernel-2.6.9-67.0.20.
2150 Frequency : testing only
2152 Description: sanity test 65a fails if stripecount of -1 is set
2153 Details : handle -1 striping on filesystem in ll_dirstripe_verify
2156 Frequency : only in unusual configurations
2158 Description: Kernel panic with find ost index.
2159 Details : lov_obd can panic if some OSTs have sparse indexes.
2161 Severity : enhancement
2163 Description: Update to RHEL5 kernel-2.6.18-53.1.21.el5.
2166 Frequency : rarely, if filesystem is mounted with -o flock
2168 Description: do not process already freed flock
2169 Details : flock can possibly be freed by another thread before it reaches
2170 ldlm_flock_completion_ast.
2173 Frequency : rarely, if filesystem is mounted with -o flock
2175 Description: LBUG during stress test
2176 Details : Properly lock accesses to the flock deadlock detection list.
2179 Frequency : rarely, if binaries are being run from Lustre
2181 Description: oops in page fault handler
2182 Details : kernel page fault handler can return two special 'pages' in
2183 error case, don't try to dereference NOPAGE_SIGBUS and NOPAGE_OOM.
2186 Frequency : rarely, during shutdown
2188 Description: timeout with invalidate import.
2189 Details : ptlrpcd_check calls obd_zombie_impexp_cull and waits for a request
2190 which should be handled by ptlrpcd. This produces a long wait, an
2191 -ETIMEOUT in ptlrpc_invalidate_import and, as a result, an LASSERT.
2197 Description: ASSERTION(CheckWriteback(page,cmd)) failed
2198 Details : incorrectly clearing the PG_Writeback bit in ll_ap_completion can produce false
2202 Frequency : only with broken builds/installations
2204 Description: no LBUG if lquota.ko and fsfilt_ldiskfs.ko are different versions
2205 Details : just return an error to a user, put a console error message
2207 Severity : enhancement
2209 Description: Update to RHEL5 kernel-2.6.18-53.1.19.el5.
2211 Severity : enhancement
2213 Description: Update to RHEL4 kernel-2.6.9-67.0.15.
2215 Severity : enhancement
2217 Description: enable MGS and MDT services start separately
2218 Details : add a 'nomgs' option in mount.lustre to allow starting an MDT with
2219 a co-located MGS without starting the MGS; this complements
2220 the 'nosvc' mount option.
2223 Frequency : always, on big-endian systems
2225 Description: cleanup in ptlrpc code, related to PPC platform
2226 Details : store magic in native order to avoid panics in recovery on PPC
2227 nodes and prevent this error in future. Also fix the possibility
2228 of swabbing data twice. Fix getting lov striping to userspace.
2231 Frequency : rarely, if a replay gets lost on the server
2233 Description: server incorrectly drops resent replays, leading to recovery failure.
2234 Details : do not drop replay according to msg flags, instead we check the
2235 per-export recovery request queue for duplication of transno.
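
A minimal sketch of the duplicate check described above, assuming a per-export queue keyed by transaction number (transno). The ReplayQueue class and its method names are hypothetical and are not the server's actual data structures.

```python
class ReplayQueue:
    """Per-export queue of replay requests, keyed by transaction number."""

    def __init__(self):
        self._transnos = set()
        self._requests = []

    def add(self, transno, request):
        """Queue a replay; return False if this transno is already queued."""
        if transno in self._transnos:
            # A resent replay for a transno we already hold is a duplicate.
            return False
        self._transnos.add(transno)
        self._requests.append((transno, request))
        return True

    def in_order(self):
        """Return queued requests sorted by transno, the order replay needs."""
        return [req for _, req in sorted(self._requests, key=lambda item: item[0])]
```

The decision depends only on what is already queued for the export, not on flags carried in the message, which is the change the entry describes.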
2238 Frequency : after recovery
2240 Description: precreate too many objects after del orphan.
2241 Details : del orphan sets last_id == next_id in oscc and this triggers growth
2242 in the count of precreated objects. Set flag LOW to skip increasing
2243 the count of precreated objects.
2246 Frequency : rare, on clear nid stats
2248 Description: ASSERTION(client_stat->nid_exp_ref_count == 0)
2249 Details : when cleaning nid stats, a live entry is sometimes destroyed,
2250 and this produces a panic in free.
2253 Frequency : occasionally since 1.6.4
2255 Description: Stack overflow during MDS log replay
2256 Details : ease stack pressure by using a separate thread to handle llog_process.
2259 Frequency : very rare
2261 Description: MDT cannot be unmounted, reporting "Mount still busy"
2262 Details : Mountpoint references were being leaked during open reply
2263 reconstruction after an MDS restart. Drop mountpoint reference
2264 in reconstruct_open() and free dentry reference also.
2269 Description: wait until IO has finished before starting new IO during lock cancel.
2270 Details : VM protocol wants old IO finished before starting new IO; in this case
2271 we need to wait until PG_writeback is cleared before checking the dirty
2272 flag and calling writepages in the lock cancel callback.
2277 Description: mds_mfd_close() ASSERTION(rc == 0)
2278 Details : In mds_mfd_close(), we need to protect the inode's writecount change
2279 within its orphan write semaphore to prevent possible races.
2282 Frequency : rare, on OST shutdown
2284 Description: avoid livelock when unmounting an OST.
2285 Details : shrink_dcache_parent can loop for a long time destroying dentries,
2286 use shrink_dcache_sb instead.
2289 Frequency : only when echo_client is used
2291 Description: don't panic when using echo_client
2292 Details : echo client passes NULL as the client nid pointer and this produces
2293 a NULL pointer dereference.
2296 Frequency : Always on 32-bit PowerPC systems
2298 Description: fix build on PPC32
2299 Details : compiling code with the -m64 flag produces a wrong object file for PPC32.
2304 Description: MDS LBUG: ASSERTION(!IS_ERR(dchild))
2305 Details : In reconstruct_* functions, LASSERTs on both the data supplied
2306 by a client, and the data on disk are dangerous and incorrect.
2307 Replace them with client eviction.
2309 Severity : enhancement
2311 Description: skiplist implementation simplification
2312 Details : skiplists are used to group compatible locks on the granted list.
2313 This was implemented by tracking the first and last lock of each lock
2314 group; the patch changes that to use doubly linked lists
2318 Description: delete compatibility for 32bit qdata
2319 Details : as planned, when lustre is beyond b1_8, lquota won't support 32bit
2320 qunit. That means servers of b1_4 and servers of b1_8 can't be
2321 used together if users want to use quota.
2324 Frequency : only with administrator action
2326 Description: mount failure if config log has invalid conf_param setting
2327 Details : If administrator specified an incorrect configuration parameter
2328 with "lctl conf_param" this would cause an error during future
2329 client mounts. Instead, ignore the bad configuration parameter.
2332 Frequency : blocks per group < blocksize*8 and uninit_groups is enabled
2334 Description: ldiskfs error: XXX blocks in bitmap, YYY in gd
2335 Details : If blocks per group is less than blocksize*8, set rest of the
2339 Frequency : when an application does strided reads on Lustre
2341 Description: The read performance will drop a lot if the application does
2343 Details : Because the stride_start_offset is missing in stride read-ahead,
2344 clients read a lot of unused pages in read-ahead,
2345 and read performance drops.
2349 Description: more ldlm soft lockups
2350 Details : In ldlm_resource_add_lock(), the call to ldlm_resource_dump()
2351 starves other threads from the resource lock for a long time in
2352 case of a long waiting queue, so change the debug level from
2353 D_OTHER to the less frequently used D_INFO.
2355 Severity : enhancement
2357 Description: add -gid, -group, -uid, -user options to lfs find
2359 Severity : enhancement
2361 Description: ll_recover_lost_found_objs - recover objects in lost+found
2362 Details : OST corruption and subsequent e2fsck can leave objects in the
2363 lost+found directory. Using the "ll_recover_lost_found_objs"
2364 tool, these objects can be retrieved and data can be salvaged
2365 by using the object ID saved in the fid EA on each object.
2370 Description: this bug _only_ happens when inode quota limitation is very low
2371 (less than 12), so that inode quota unit is 1 at initialization.
2372 Details : if the remaining quota equals 1, it is a sign that quota
2373 is now in effect. So the minimum quota qunit should be 2.
2377 Description: Hung threads in invalidate_inode_pages2_range
2378 Details : The direct IO path doesn't call check_rpcs to submit a new RPC once
2379 one is completed. As a result, some RPCs are stuck in the queue
2384 Description: Procfs and llog threads sometimes access a destroyed import.
2385 Details : Synchronize import destruction with procfs and llog threads using
2386 the import refcount and semaphore.
2390 Description: mds fails to respond, threads stuck in ldlm_completion_ast
2391 Details : Sort source/child resource pair after updating child resource.
2396 Description: kernel BUG at ldiskfs2_ext_new_extent_cb
2397 Details : If insertion of an extent fails, then discard the inode
2398 preallocation and free data blocks else it can lead to duplicate
2403 Description: don't always update ctime in ext3_xattr_set_handle()
2404 Details : Current xattr code updates inode ctime in ext3_xattr_set_handle().
2405 In some cases the ctime should not be updated, for example for
2406 2.0->1.8 compatibility it is necessary to delete an xattr and it
2407 should not update the ctime.
2411 Description: add quota statistics
2412 Details : 1. sort out quota proc entries and proc code.
2413 2. add quota statistics
2418 Description: quotas are not honored with O_DIRECT
2419 Details : all writes with the flag O_DIRECT will use grants which leads to
2420 this problem. Now using OBD_BRW_SYNC to guard this.
2424 Bugzilla : 15713/16362
2425 Description: Assertion in iopen_connect_dentry in 1.6.3
2426 Details : looking up an inode via iopen with the wrong generation number can
2427 populate the dcache with a disconnected dentry while the inode
2428 number is in the process of being reallocated. This causes an
2429 assertion failure in iopen since the inode's dentry list contains
2430 both a connected and disconnected dentry.
2434 Description: assertion failure in ldlm_handle2lock()
2435 Details : fix a race between class_handle_unhash() and class_handle2object()
2436 introduced in lustre 1.6.5 by bug 13622.
2438 Severity : enhancement
2440 Description: superblock lock contention with many SMP cores on one client
2441 Details : several client filesystem locks were highly contended on SMP
2442 NUMA systems with 8 or more cores. Per-CPU datastructures
2443 and more efficient locking implemented to reduce contention.
2448 Description: Kernel BUG: sd_iostats_bump: unexpected disk index
2449 Details : remove the limit of 256 scsi disks in the sd_iostat patch
2454 Description: oops in sd_iostats_seq_show()
2455 Details : unloading/reloading the scsi low level driver triggers a kernel
2456 bug when trying to access the sd iostat file.
2461 Description: Kernel panics during QLogic driver reload
2462 Details : REQ_BLOCK_PC requests are not handled properly in the sd iostat
2463 patch, causing memory corruption.
2468 Description: journal_dev option does not work in b1_6
2469 Details : pass mount option during pre-mount.
2471 Severity : enhancement
2473 Description: Add a FIEMAP (FIle Extent MAP) ioctl for ldiskfs
2474 Details : FIEMAP ioctl will allow an application to efficiently fetch the
2475 extent information of a file. It can be used to map logical blocks
2476 in a file to physical blocks in the block device.
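
A minimal sketch of the logical-to-physical mapping that extent information makes possible. This does not call the FIEMAP ioctl; the Extent tuple, the function name and the block numbers are hypothetical and only illustrate the lookup.

```python
from collections import namedtuple

# One contiguous run: `length` blocks starting at logical block `logical`
# are stored starting at physical block `physical` on the device.
Extent = namedtuple("Extent", ["logical", "physical", "length"])


def logical_to_physical(extents, block):
    """Map a logical file block to a physical device block, or None for a hole."""
    for ext in extents:
        if ext.logical <= block < ext.logical + ext.length:
            return ext.physical + (block - ext.logical)
    return None


if __name__ == "__main__":
    extents = [Extent(0, 1000, 8), Extent(16, 5000, 4)]
    print(logical_to_physical(extents, 3))
    print(logical_to_physical(extents, 10))
```

Because each extent describes a whole run of blocks, an application needs one record per run rather than one query per block.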
2479 Frequency : only with adaptive timeout enabled
2481 Description: DEBUG_REQ() bad paging request
2482 Details : ptlrpc_at_recv_early_reply() should not modify req->rq_repmsg
2483 because it can be accessed by reply_in_callback() without the
2487 Frequency : only on Cray X2
2489 Description: X2 build failures
2490 Details : fix build failures on Cray X2.
2494 Description: xid & resent requests
2495 Details : Initialize RPC XID from clock at startup (randomly if clock is
2500 Description: quota recovery deadlock during mds failover
2501 Details : This patch includes att18982, att18236, att18237 in bz14840.
2503 1. fix osts hang when mds does failover with quotaon
2504 2. prevent watchdog storm when osts threads wait for the
2509 Description: kernel panic on racer
2510 Details : Do not access dchild->d_inode when IS_ERR(dchild) is true.
2512 Severity : enhancement
2514 Description: Add lustre_start utility to start or stop multiple Lustre servers
2519 Description: Lustre GPF in {:ptlrpc:ptlrpc_server_free_request+373}
2520 Details : In case of memory pressure, list_del() can be called twice on
2521 req->rq_history_list, causing a kernel oops.
2525 Description: (ptllnd_peer.c:557:kptllnd_peer_check_sends()) ASSERTION(!in_interrupt()) failed
2526 Details : fix stack overflow in the distributed lock manager by deferring
2527 export eviction after a failed ast to the elt thread instead of
2528 handling it in the dlm interpret routine.
2530 Severity : enhancement
2532 Description: More exported tunables for mballoc
2533 Details : Add support for tunable preallocation window and new tunables for
2534 large/small requests
2538 Description: Detect corruption of block bitmap and checking for preallocations
2539 Details : Checks validity of on-disk block bitmap. Also it does better
2540 checking of number of applied preallocations. When corruption is
2541 found, it turns filesystem readonly to prevent further corruptions.
2545 Frequency : only for big-endian servers
2546 Description: Check for big-endian system while mounting fs with extents feature
2547 Details : Mounting a filesystem with extents feature will fail on big-endian
2548 systems since ext3-based ldiskfs is not supported on big-endian
2549 systems. Can be over-ridden with "bigendian_extents" mount option.
2553 Description: Excessive recovery window
2554 Details : With AT enabled, the recovery window can be excessively long (6000+
2555 seconds). To address this problem, we no longer use
2556 OBD_RECOVERY_FACTOR when extending the recovery window (the connect
2557 timeout no longer depends on the service time, it is set to
2558 INITIAL_CONNECT_TIMEOUT now) and clients report the old service
2559 time via pb_service_time.
2562 Description: Don't sync journal after every i/o
2563 Details : Implement write RPC replay to allow server replies for write RPCs
2564 before data is on disk. However, this feature is disabled by
2565 default since some issues leading to data corruptions have been
2566 found during recovery (e.g. bug 19128). This feature can be enabled
2567 by running the following command on the OSSs:
2568 lctl set_param obdfilter.*.sync_journal=0
2572 Description: Watchdog triggered on MDS failover
2573 Details : enable OBD_CONNECT_MDT flag when connecting from the MDS so that
2574 the OSTs know that the MDS "UUID" can be reused for the same export
2575 from a different NID, so we do not need to wait for the export to
2580 Description: Lustre detected file system corruption with inode out of bounds
2581 Details : don't update i_size on MDS_CLOSE for directories. This causes
2582 directory corruptions on the MDT.
2584 -------------------------------------------------------------------------------
2586 2008-05-26 Sun Microsystems, Inc.
2588 * Support for kernels:
2589 2.6.5-7.311 (SLES 9),
2590 2.6.9-67.0.7.EL (RHEL 4),
2591 2.6.16.54-0.2.5 (SLES 10),
2592 2.6.18-53.1.14.el5 (RHEL 5),
2593 2.6.22.14 vanilla (kernel.org)
2594 * Client support for unpatched kernels:
2595 (see http://wiki.lustre.org/index.php?title=Patchless_Client)
2596 2.6.16 - 2.6.22 vanilla (kernel.org)
2597 * Due to problems with nested symlinks and FMODE_EXEC (bug 12652),
2598 we do not recommend using patchless RHEL4 clients with kernels
2599 prior to 2.6.9-55EL (RHEL4U5).
2600 * Recommended e2fsprogs version: 1.40.7-sun1
2601 * Note that reiserfs quotas are disabled on SLES 10 in this kernel.
2602 * RHEL 4 and RHEL 5/SLES 10 clients behave differently on 'cd' to a
2603 removed cwd "./" (refer to Bugzilla 14399).
2604 * A new quota file format has been introduced in 1.6.5.
2605 The format conversion from prior releases is handled transparently,
2606 but releases older than 1.4.12/1.6.5 will not understand this new
2607 format. The automatic format conversion can be avoided by running
2608 the following command on the MDS before upgrading:
2609 'tunefs.lustre --param="mdt.quota_type=ug1" $MDTDEV'.
2610 For more information, please refer to bugzilla 13904.
2614 Description: quota performance fix
2615 Details : quota data is written in journalled mode instead of ordered to
2616 increase performance
2620 Description: lfs support for human-readable quota grace time strings
2621 Details : lfs setquota -t and lfs quota -t represent quota grace times
2622 in "XXwXXdXXhXXmXXs" format instead of large values in seconds
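
A minimal sketch of converting between seconds and the "XXwXXdXXhXXmXXs" form mentioned above. It is not the lfs implementation: the function names are hypothetical, and omitting zero-valued components is an assumption about the output style.

```python
import re

# Unit suffixes from largest to smallest, with their size in seconds.
_UNITS = (("w", 604800), ("d", 86400), ("h", 3600), ("m", 60), ("s", 1))
_GRACE_RE = re.compile(r"(?:(\d+)w)?(?:(\d+)d)?(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?")


def format_grace(seconds):
    """Render a grace time in seconds as e.g. '1w3d', skipping zero parts."""
    parts = []
    for suffix, size in _UNITS:
        count, seconds = divmod(seconds, size)
        if count:
            parts.append("%d%s" % (count, suffix))
    return "".join(parts) or "0s"


def parse_grace(text):
    """Parse a string such as '1w1d1h1m1s' back into seconds."""
    match = _GRACE_RE.fullmatch(text)
    if not text or match is None:
        raise ValueError("bad grace time: %r" % (text,))
    return sum(int(group) * size
               for group, (_, size) in zip(match.groups(), _UNITS) if group)
```

Round-tripping a value through both functions returns the original number of seconds, which is a quick way to sanity-check the unit table.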
2625 Frequency : always with o2ib 1.3 and sles10
2627 Description: fix build with SLES10 and o2ib v3.
2628 Details : sles10 uses a different name for the Module.symver file but configure
2629 assumes this file has the same name on RHEL/SLES/vanilla kernels.
2632 Frequency : very rare, if additional xattrs are used on kernels >= 2.6.12
2634 Description: MDS may lose file striping (and hence file data) in some cases
2635 Details : If there are additional extended attributes stored on the MDS,
2636 in particular ACLs, SELinux, or user attributes (if user_xattr
2637 is specified for the client mount options) then there is a risk
2638 of attribute loss. Additionally, the Lustre file striping
2639 needs to be larger than default (e.g. striped over all OSTs),
2640 and an additional attribute must be stored initially in the
2641 inode and then increase in size enough to be moved to the
2642 external attribute block (e.g. ACL growing in size) for file
2645 Severity : enhancement
2647 Description: add message levels for liblustreapi
2652 Description: MDT cannot be unmounted, reporting "Mount still busy"
2653 Details : Mountpoint references were being leaked during open reply
2654 reconstruction after an MDS restart. Drop mountpoint reference
2655 in reconstruct_open() and free dentry reference also.
2660 Description: fix for occasional failure case of -ENOSPC in recovery-small tests
2661 Details : Move the 'good_osts' check before the 'total_bavail' check. This
2662 will result in an -EAGAIN and in the exit call path we call
2663 alloc_rr() which will with increasing aggressiveness attempt to
2664 acquire precreated objects on the minimum number of required OSCs.
2668 Description: Use old size assignment to avoid deadlock
2669 Details : This reverts the changes in bugs 2369 and bug 14138 that introduced
2670 the scheduling while holding a spinlock. We do not need locking
2671 for size in ll_update_inode() because size is only updated from
2672 the MDS for directories or files without objects, so there is no
2673 other place to do the update, and concurrent access to such inodes
2674 is protected by the inode lock.
2678 Description: Use __u64 instead of int for valid bits
2682 Description: resolve "_IOWR redefined" build error on SLES10
2686 Description: dump the memory debugging after all modules are unloaded to
2687 suppress false negative in conf_sanity test 39
2691 Description: the recovery timer never expires
2692 Details : for a new client connect request, the recovery timer should not be
2693 reset, otherwise the recovery timer will never expire if the old
2694 client never comes back. Only an old client connect and the first
2695 connection req should trigger a recovery timer reset.
2699 Description: the min numbers of lproc stats are wrong
2700 Details : add a new constant LC_MIN_INIT and use it for initialization
2704 Frequency : always with interactive lfs
2706 Description: Reinitialize optind to 0 so that interactive lfs works in all cases
2709 Frequency : with multiple concurrent readdir processes in same directory
2710 Bugzilla : 15406, 15169, 15175
2711 Description: misc fixes for directory readahead.
2712 Details : prevent previous statahead async RPC callback from processing the
2713 current "statahead_info", fix race condition between the async RPC callback
2714 adding a dentry into the dentry hash table and the "ls" thread revalidating
2715 that dentry, statahead hit/miss control for hidden items, and so on.
2717 Severity : enhancement
2719 Description: build kernel-ib packages for OFED 1.3 in our release cycle
2723 Description: incore types cleaning in quota code (with respect to 64-bit limits)
2724 Details : several u32 variable declarations are replaced with u64 declarations
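
A minimal sketch of why a 32-bit in-core type is a problem for 64-bit limits. The helper names and the 5 GiB figure are hypothetical; the masks model how an unsigned C integer of each width would store the value.

```python
U32_MASK = 0xFFFFFFFF
U64_MASK = 0xFFFFFFFFFFFFFFFF


def store_u32(value):
    """Model storing a value in a 32-bit unsigned variable (high bits lost)."""
    return value & U32_MASK


def store_u64(value):
    """Model storing a value in a 64-bit unsigned variable."""
    return value & U64_MASK


if __name__ == "__main__":
    limit = 5 * 2**30  # a hypothetical limit of 5 GiB, expressed in bytes
    print(store_u32(limit))  # silently wraps to a smaller number
    print(store_u64(limit))  # preserved
```

The wrap is silent, so a truncated limit looks like a valid but much smaller one.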
2729 Description: fix SLES kernel versioning
2730 Details : the kernel version for our SLES 10 kernel did not include a "-"
2731 before the "smp" at the end. while this was not a problem in
2732 general, it did mean that software trying to use the kernel
2733 version to try to detect a vendor specific kernel would fail.
2734 this was most evident by the OFED build scripts.
2739 Description: Don't update lov_desc members until making sure they are valid
2740 Details : When updating lov_desc members via proc fs, we need to check their
2741 validity before doing the real update.
2744 Frequency : very rare
2746 Description: don't put a request into the delay list while invalidate is in flight.
2747 Details : ptlrpc_delay_request sometimes puts a request in the delay list while
2748 invalidate import is in flight. This produces a timeout for invalidate
2749 and can sometimes cause stale data.
2751 Severity : enhancement
2753 Description: Update kernel to SLES9 2.6.5-7.311.
2755 Severity : enhancement
2757 Description: Update kernel to RHEL4 2.6.9-67.0.7.
2762 Frequency : on PPC only
2763 Description: do not convert OST objects for a directory because they do not exist.
2764 Details : ll_dir_getstripe assumes a directory has OST objects, but this is wrong.
2766 Severity : enhancement
2768 Description: Fix warnings when compiling liblustre on sles10/rhel5, which have
2769 __u64 as unsigned long long type.
2772 Frequency : rare, on shutdown
2774 Description: race between processing an ast and removing the callback
2775 Details : removing the callback before disconnecting the import opens a race
2776 with callback processing.
2778 Severity : enhancement
2780 Description: Update kernel to SLES9 2.6.5-7.311.
2782 Severity : enhancement
2784 Description: Files open for execute are not marked busy on SLES10
2785 Details : Add FMODE_EXEC to SLES10 SP1 server kernel series.
2787 Severity : enhancement
2789 Description: Add server support for vanilla-2.6.22.14.
2792 Frequency : occasional
2794 Description: Avoid lov_create() getting stuck in obd_statfs_rqset()
2795 Details : If an OST is down the MDS will hang indefinitely in
2796 obd_statfs_rqset() waiting for the statfs data. For MDS QOS
2797 usage of statfs, it should not get stuck waiting.
2799 Severity : enhancement
2801 Description: Disable adaptive timeouts by default
2804 Frequency : on network error
2806 Description: panic with double free request if network error
2807 Details : mdc_finish_enqueue finishes the request if any network error occurs,
2808 but this is correct only for synchronous enqueue; for async enqueue
2809 (via ptlrpcd) this is incorrect and ptlrpcd wants to finish the request
2813 Frequency : rare, on recovery
2815 Description: reading procfs can produce a deadlock in some situations
2816 Details : Holding the lprocfs lock while sending an rpc can block destruction of
2817 obd objects, and this also blocks reconnect with -EALREADY.
2818 This doesn't fix all lprocfs bugs, but makes them rare.
2820 Severity : enhancement
2822 Description: Update kernel to RHEL5 2.6.18-53.1.14.el5.
2825 Frequency : frequent on X2 node
2827 Description: mdc_set_open_replay_data LBUG
2828 Details : Set replay data for requests that are eligible for replay.
2833 Description: lustre_mgs: operation 101 on unconnected MGS
2834 Details : When the MGC is disconnected from the MGS long enough, the MGS will
2835 evict the MGC, and later on the MGC cannot successfully connect to the MGS,
2836 producing a lot of error messages complaining that the MGS is not connected.
2839 Frequency : rare, depends on device drivers and load
2841 Description: MDS or OSS nodes crash due to stack overflow
2842 Details : Code changes in 1.6.4 increased the stack usage of some functions.
2843 In some cases, in conjunction with device drivers that use a lot
2844 of stack, the MDS (or possibly OSS) service threads could overflow
2845 the stack. One change which was identified to consume additional
2846 stack has been reworked to avoid the extra stack usage.
2848 Severity : enhancement
2850 Description: Update to RHEL5 latest kernel-2.6.18-53.1.13.el5.
2852 Severity : enhancement
2854 Description: Update to SLES10 SP1 latest kernel-2.6.16.54-0.2.5.
2856 Severity : enhancement
2858 Description: Update to RHEL5 latest kernel-2.6.18-53.1.6.el5.
2860 Severity : enhancement
2862 Description: Update RHEL4 kernel to 2.6.9-67.0.4.
2865 Frequency : rare on shutdown OST
2867 Description: Don't allow skipping OSTs if index has been specified.
2868 Details : Don't allow skipping OSTs if index has been specified, make locking
2869 in internal create lots better.
2874 Description: ASSERTION(!PageDirty(page)) failed
2875 Details : Wrong check could lead to an assertion failure under specific
2881 Description: LBUG in ptlrpc_check_set() bad phase ebc0de00
2882 Details : access to a bitfield in a structure is always rounded to a long,
2883 and this causes problems because changing any bit is not atomic.
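
A minimal, deterministic sketch of the lost update that a non-atomic read-modify-write on a shared word can cause. It simulates two writers in plain Python; the flag names and functions are hypothetical and do not model the actual request structure.

```python
FLAG_A = 0x1  # bit owned by the first writer
FLAG_B = 0x2  # bit owned by the second writer


def interleaved_updates(word):
    """Both writers load the whole word before either one stores it back."""
    seen_by_first = word
    seen_by_second = word
    word = seen_by_first | FLAG_A   # first writer stores its modified copy
    word = seen_by_second | FLAG_B  # second writer overwrites it with a stale copy
    return word


def serialized_updates(word):
    """Each writer completes its load-modify-store before the next starts."""
    word = word | FLAG_A
    word = word | FLAG_B
    return word
```

In the interleaved case FLAG_A is lost even though the two writers touch different bits, because each store writes back the entire containing word.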
2888 Description: Lustre make rpms failed.
2889 Details : Remove the ldiskfs spec file to avoid confusing rpmbuild when
2890 building Lustre rpms from a tarball.
2892 Severity : enhancement
2894 Description: Update to SLES9 SP4 kernel-2.6.5-7.308.
2897 Frequency : rare on shutdown OST
2899 Description: If llog cancel was not sent before the clean_exports phase, this can
2900 produce a deadlock in llog code.
2901 Details : If the llog thread holds the last reference to obd and calls
2902 class_import_put, this produces a deadlock because llog_cleanup_commit_master
2903 waits for the last llog_commit_thread to exit, but this never happens because
2904 it was called from llog_commit_thread.
2907 Frequency : only if OST index is skipped
2909 Description: NULL lov_tgts causing MDS oops
2910 Details : safer checks for NULL lov_tgts to avoid oops.
2912 Severity : enhancement
2914 Description: Update to RHEL4 latest kernel-2.6.9-67.0.1.EL.
2916 Severity : enhancement
2918 Description: Update to RHEL5 latest kernel-2.6.18-53.1.4.el5.
2923 Description: make mgs_setparam() handle fsname containing dash
2924 Details : fsname containing a dash does not work with lctl conf_param
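
A minimal sketch of one way to split a conf_param device string when the fsname itself contains a dash. It is not the mgs_setparam() code; the function name, the target pattern (MDT or OST plus four hex digits) and the sample strings are assumptions made for illustration.

```python
import re

# Assumed shape of a target suffix such as MDT0000 or OST0001.
_TARGET = re.compile(r"^(?:MDT|OST)[0-9a-fA-F]{4}$")


def split_conf_param_device(param):
    """Split 'fsname-TARGET.key=value' into (fsname, target).

    The device part is everything before the first '.'. Only the text after
    the LAST dash is tested as a target, so dashes inside the fsname survive.
    Returns (fsname, None) when no target suffix is present.
    """
    device = param.split(".", 1)[0]
    head, sep, tail = device.rpartition("-")
    if sep and _TARGET.match(tail):
        return head, tail
    return device, None
```

Splitting at the first dash instead would cut a name like "my-fs" in half, which is the failure mode the entry describes.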
2926 Severity : enhancement
2928 Description: Update to RHEL4 Update-6 kernel-2.6.9-67.EL.
2931 Frequency : rare, in recovery and/or when the lovobjid file is destroyed.
2933 Description: rewrite lov objid code.
2934 Details : Cleanup for lov objid code, remove scalability problems and wrong
2935 locking. Fix sending last_id to the OST.
2937 Severity : enhancement
2939 Description: Update to SLES10 SP1 latest kernel-2.6.16.54-0.2.3.
2941 Severity : enhancement
2943 Description: Update to RHEL5 Update-1 kernel 2.6.18-53.el5.
2944 Details : Use d_move_locked instead of __d_move.
2947 Frequency : rare, at shutdown
2949 Description: access to already freed / zeroed obd_namespace.
2950 Details : if client_disconnect_export was called without the force flag set,
2951 and a connect request is in flight, this can produce access to
2952 a NULL pointer (or an already freed pointer) when connect_interpret
2953 stores ocd flags in obd_namespace.
2956 Frequency : only at startup
2958 Description: do not allocate memory with a spinlock held.
2959 Details : allocating memory with GFP_KERNEL can sleep and deadlock
2960 if any spinlock is held.
2965 Description: lfs find does not continue on file error
2966 Details : Continue other files processing when a file/dir is absent.
2970 Description: Inconsistent usage of lustre_pack_reply()
2971 Details : Standardize the usage of lustre_pack_reply() such that it
2972 always generates a CERROR on failure.
2975 Frequency : very rare
2977 Description: Fix replay if there is an un-replied request and open
2978 Details : In some cases, older replay request will revert the
2979 mcd->mcd_last_xid on MDS which is used to record the client's
2980 latest sent request.
2982 Severity : enhancement
2984 Description: Update to RHEL5 kernel 2.6.18-8.1.15.el5.
2986 Severity : enhancement
2988 Description: Update to SLES10 SP1 kernel 2.6.16.53-0.16
2990 Severity : enhancement
2992 Description: Update to SLES9 kernel-2.6.5-7.287.3.
2994 Severity : enhancement
2996 Description: Update to RHEL4 kernel-2.6.9-55.0.12.EL.
2998 Severity : enhancement
3000 Description: Build SLES10 patchless client fails
3001 Details : configure was broken when ./configure was run with the
3002 --with-linux-obj=.... argument for a patchless client. When
3003 configure uses --with-linux-obj, LINUXINCLUDE= -Iinclude
3004 can't find headers properly. Use an absolute path such as
3005 -I($LINUX)/include instead.
3007 Severity : enhancement
3009 Description: Lustre Page Accounting
3010 Details : New macros for page alloc and free which enable accounting
3011 of page allocation of Lustre. Use percpu counters to store memory
3012 and page statistics.
3015 Frequency : only if debugging is disabled
3017 Description: LASSERT_{REQ,REP}SWAB macros are buggy
3018 Details : If SWAB_PARANOIA is disabled, the LASSERT_REQSWAB and
3019 LASSERT_REPSWAB macros become no-ops, which is incorrect. Drop
3020 these macros and replace them with their definitions instead.
3025 Description: interrupted oig_wait produces panic on resend.
3026 Details : brw_redo_request can be used to resend requests from ptlrpcd and
3027 a private set, and this produces a situation where rq_ptlrpcd_data is not
3028 copied to the newly allocated request, triggering an LBUG on assert
3029 req->rq_ptlrpcd_data != NULL. But this member is used only to wake up the
3030 ptlrpcd set if the request is changed and can be safely changed to use
3033 Severity : enhancement
3035 Description: organize the server-side client stats on per-nid basis
3036 Details : Change the structure of stats under obdfilter and mds to
3044 The "uuid"s file would list the uuids of _active_ exports.
3045 And the clear entry is to clear all stats and stale nids.
3050 Description: Processes looping in ll_readdir() on Lustre clients finally causing
3051 a full node pseudo-hang
3052 Details : Concurrent access to the same directory from multiple clients with
3053 intensive file creation/removal can cause a client node to spin in
3054 ll_readdir(). i_version must be increased every time the lock is
3055 cancelled to ensure a revalidate is done.
3060 Description: touch file failed when fs is not full
3061 Details : OST in recovery should not be discarded by MDS in alloc_qos(),
3062 otherwise we can get ENOSPC while fs is not full.
3065 Frequency : only for Cray XT3
3066 Bugzilla : 12829/13455
3067 Description: Changing primary group doesn't change the group lustre assigns to
3069 Details : When CRAY_XT3 is defined, the fsgid supplied by the client is
3070 overridden with the primary group provided by the group upcall,
3071 whereas the supplied fsgid can be trusted if it is in the list of
3072 supplementary groups returned by the group upcall.
3074 Severity : enhancement
3076 Description: Root Squash Functionality
3077 Details : Implementation of NFS-like root squash capability. Specifically,
3078 don't allow someone with root access on a client node to be able
3079 to manipulate files owned by root on a server node.
3081 Severity : enhancement
3083 Description: Slow truncate/writes to huge files at high offsets.
3084 Details : Directly associate cached pages with the lock that protects them.
3085 This allows us to quickly find which pages to write and remove
3086 once a lock callback is received.
3091 Description: Too many locks accumulating on client during NFS usage
3092 Details : mds_open improperly used accmode to find out access mode to a
3093 file. Also mdc_intent_lock logic to find out if we already have
3094 lock similar to just received was flawed since introduction of
3095 skiplists - locks are now added to the front of the granted
3100 Description: Hit ASSERTION(obd->obd_stopping == 1) failed in some setup failed
3102 Details : In the obd setup failure handler, obd_stopping will not necessarily
3103 be 1, and obd_set_up should also be checked to determine whether
3104 the obd is completely set up.
3106 Severity : enhancement
3108 Description: Allow masking D_WARNING, D_ERROR messages from console
3109 Details : Console messages can now be disabled via lnet.printk.
3114 Description: User code with malformed file open parameter crashes client node
3115 Details : Before packing join_file req, all the related references should be
3116 checked carefully in case some malformed flags cause fake join_file
3122 Description: shrink/enlarge qunit size when needed; fix the problem of coarse
3123 grain of quota doing harm to quota's accuracy
3124 Details : qunit size will be changed when quota limitation is too low/high;
3125 record the pending quota write in order to get more accurate
3126 quota; delete the patch for bug12588, which is unnecessary when
3127 this patch is landed. This bug also contains fixes for bug 14526,
3128 14299, 14601 and 13794.
3132 Description: LDLM_ENQUEUE races with LDLM_CP_CALLBACK
3133 Details : ldlm_completion_ast() assumes that a lock is granted when the req
3134 mode is equal to the granted mode. However, it should also check
3135 that LDLM_FL_CP_REQD is not set.
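
The corrected check can be sketched as follows. This is an illustration only,
not the ldlm code: the Lock class, the mode strings and the numeric value
chosen for LDLM_FL_CP_REQD are hypothetical stand-ins.

```python
from dataclasses import dataclass

# Hypothetical bit value for illustration; the real flag is defined in the
# Lustre headers.
LDLM_FL_CP_REQD = 0x1


@dataclass
class Lock:
    """Simplified stand-in for a lock: requested mode, granted mode, flags."""
    req_mode: str
    granted_mode: str
    flags: int = 0


def is_granted_old(lock: Lock) -> bool:
    # Old assumption: matching modes alone mean the lock is granted.
    return lock.req_mode == lock.granted_mode


def is_granted_fixed(lock: Lock) -> bool:
    # Fixed check: modes must match AND no completion callback is pending.
    return (lock.req_mode == lock.granted_mode
            and not (lock.flags & LDLM_FL_CP_REQD))
```

With a lock whose modes match but which still carries the flag, the old check
reports it as granted while the fixed check does not.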
3139 Description: Heavy nfs access might result in deadlocks
3140 Details : After ELC code landed, it is now improper to enqueue any mds
3141 locks under och_sem, because enqueue might want to decide to
3142 cancel open locks for same inode we are holding och_sem for.
3146 Description: 35% write performance drop with ldiskfs2 when quotas are on
3147 Details : Enable ext3 journalled quota by default to improve performance
3148 when quotas are turned on.
3152 Description: Client eviction while running blogbench
3153 Details : A lot of unlink operations with concurrent I/O can lead to a
3154 deadlock causing evictions. To address the problem, the number of
3155 outstanding OST_DESTROY requests is now throttled to
3156 max_rpcs_in_flight per OSC and LDLM_FL_DISCARD_DATA blocking
3157 callbacks are processed in priority.
3160 Frequency : RHEL4 only
3162 Description: mkfs is very slow on IA64/RHEL4
3163 Details : A performance regression has been discovered in the MPT Fusion
3164 driver between versions 3.02.73rh and 3.02.99.00rh. As a
3165 consequence, we have downgraded the MPT Fusion driver in the RHEL4
3166 kernel from 3.02.99.00 to 3.02.73 until this problem is fixed.
3169 Frequency : PPC/PPC64 only
3171 Description: conflicts between asm-ppc64/types.h and lustre_types.h
3172 Details : fix duplicated definitions between asm-ppc64/types.h and
3173 lustre_types.h on PPC.
3176 Frequency : PPC/PPC64 only
3178 Description: asm-ppc/segment.h does not exist
3179 Details : fix compile issue on PPC.
3183 Description: data checksumming impacts single node performance
3184 Details : add support for several checksum algorithms. Currently, CRC32 and
3185 Adler-32 are supported. The checksum type can be changed on the fly
3186 through /proc/fs/lustre/osc/*/checksum_type.
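
For readers unfamiliar with the two algorithms, the sketch below computes both
over the same buffer using Python's zlib module. It does not reproduce the
Lustre implementation; the page_checksum() helper and the "crc32"/"adler"
type names are assumptions made for this example.

```python
import zlib

# Hypothetical mapping from a checksum type name to a function; the names
# are chosen for this example only.
CHECKSUMS = {
    "crc32": zlib.crc32,
    "adler": zlib.adler32,
}


def page_checksum(data: bytes, checksum_type: str) -> int:
    """Return the checksum of data using the selected algorithm."""
    try:
        func = CHECKSUMS[checksum_type]
    except KeyError:
        raise ValueError("unsupported checksum type: %s" % checksum_type)
    # Mask to 32 bits so the result is an unsigned value.
    return func(data) & 0xFFFFFFFF


if __name__ == "__main__":
    page = b"\0" * 4096
    for name in CHECKSUMS:
        print(name, hex(page_checksum(page, name)))
```

Adler-32 uses two running sums rather than a table-driven polynomial division,
which is why it is typically cheaper to compute than CRC32.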
3190 Description: use adler32 for page checksums
3191 Details : when available, use the Adler-32 algorithm instead of CRC32 for
3196 Description: better handle error messages in extents code
3198 Severity : enhancement
3200 Description: SNMP support enhancement
3201 Details : Adding total number of sampled requests for an MDS node in snmp
3204 Severity : enhancement
3206 Description: Optimize ldlm waiting list processing for PR extent locks
3207 Details : When processing the waiting list for a read extent lock and meeting
3208 an uncontended read lock that is the same or wider, skip
3209 processing the rest of the list and immediately return the current
3210 conflict status, since we are guaranteed there are no
3211 conflicting locks in the rest of the list.
3215 Description: Time out and refuse to reconnect
3216 Details : When the failover node is the primary node, it is possible
3217 to have two identical connections in imp_conn_list. We must
3218 compare not conn's pointers but NIDs, otherwise we
3219 can defeat connection throttling.
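
The difference between the two comparisons can be sketched as follows. The
Conn class and the helper names are hypothetical and only illustrate why
identity comparison misses a duplicate entry for the same NID.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class Conn:
    """Simplified stand-in for a connection entry; only the NID is kept."""
    nid: str


def find_by_pointer(conn_list, conn):
    # Identity comparison: two entries for the same NID look different.
    return [c for c in conn_list if c is conn]


def find_by_nid(conn_list, conn):
    # NID comparison: every entry for the same peer is matched.
    return [c for c in conn_list if c.nid == conn.nid]


if __name__ == "__main__":
    primary = Conn("192.168.0.1@tcp")
    failover = Conn("192.168.0.1@tcp")  # failover node is the primary node
    conns = [primary, failover]
    print(len(find_by_pointer(conns, primary)))  # one match
    print(len(find_by_nid(conns, primary)))      # both entries match
```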
3223 Description: Client does not clear its own cache if the reply to reconnect is lost.
3224 Details : Client gets evicted from server. Now client also thinks it is
3225 disconnected (or gets ENOTCONN on its operation) and decides to
3226 reconnect. Server receives reconnect message, but cannot find
3227 export. New export is created that is fully valid (new cookie!),
3228 but reply is lost and not reported to client. Client reconnects
3229 again and gets back a just-created connection, but it is not new
3230 so client thinks it was not evicted and does not do recovery.
3234 Description: Detect stride IO mode in read-ahead
3235 Details : When a client performs strided reads, read-ahead should detect that
3236 and read ahead pages according to the detected stride pattern.
3240 Description: build for x2 fails
3241 Details : fix compile issue on Cray systems.
3243 Severity : enhancement
3245 Description: implement readv/writev APIs (aio_read/aio_write in newer kernels)
3246 Details : This greatly improves speed of NFS writes on 2.6 kernels.
3249 Frequency : only on PPC/SLES10
3251 Description: "BITS_PER_LONG is not 32 or 64" in linux/idr.h
3252 Details : On SLES10/PPC, fs.h includes idr.h which requires BITS_PER_LONG to
3253 be defined. Add a hack in mkfs_lustre.c to work around this compile
3258 Description: LASSERT on MDS when client holding flock lock dies
3259 Details : ldlm pool logic depends on number of granted locks equal to
3260 number of released locks which is not true for flock locks, so
3261 just exclude such locks from consideration.
3265 Description: MDS deadlock with many ll_sync_lov threads and I/O stalled
3266 Details : Use fsfilt_sync() for both the whole filesystem sync and
3267 individual file sync to eliminate dangerous inode locking
3268 with I_LOCK that can lead to a deadlock.
3272 Description: Update an obsolete wirecheck.c generator
3273 Details : Update wirecheck.c/wirehdr.c and regenerate wiretest.c
3277 Description: Client can panic on open sometimes
3278 Details : It is possible that we try to free already freed request in
3279 ll_file_open in some error cases when we send request from
3284 Description: performance in 1.6.3
3285 Details : Force q->max_phys_segments to MAX_PHYS_SEGMENTS on SLES10 to be
3286 sure that 1MB requests are not fragmented by the block layer.
3290 Description: LDLM soft lockups - improvement
3291 Details : It is possible to send the lock handle along with each read
3292 or write request because the client is already doing a lock match
3293 itself so there isn't any reason the OST should have to re-do that
3299 Description: lfs quota fails with deactivated OSTS
3300 Details : With this patch, three improvements are included:
3301 1. detete the softlimit in mds and osts when use "lfs quota".
3302 2. display the inaccurate data in the output of "lfs quota".
3303 3. try to get quota info when "lfs quota" is executed.
3308 Description: Extent locks not granted with no conflicts sometimes.
3309 Details : When race occurs in glimpse handler and nothing is returned,
3310 we do not reprocess the queue after lock cancel, and that leads
3311 to a stall until next activity on a resource
3314 Frequency : failover with quotaon
3316 Description: during MDS failovers with quota on, OSTs got into a deadlock state,
3317 causing stack dumps.
3318 Details : for every quota slave, at any time, only one quota req
3319 is sent to the quota master for each uid/gid. Until that quota req
3320 returns, all threads related to the same uid/gid will wait.
3321 So if the quota req is lost because of an MDS failover or for any other
3322 reason, this bug will be hit. Now, dqacq_interpret() will handle
3323 quota reqs that time out.
3325 Severity : enhancement
3328 Description: when quota slave checks if quota is enough, there is an unnecessary
3330 Details : perform this wait only where necessary instead of always waiting.
3332 --------------------------------------------------------------------------------
3334 2007-12-07 Cluster File Systems, Inc. <info@clusterfs.com>
3336 * Support for kernels:
3337 2.6.5-7.286 (SLES 9),
3338 2.6.9-55.0.9.EL (RHEL 4),
3339 2.6.16.53-0.8 (SLES 10),
3340 2.6.18-8.1.14.el5 (RHEL 5),
3341 2.6.18.8 vanilla (kernel.org)
3342 * Client support for unpatched kernels:
3343 (see http://wiki.lustre.org/index.php?title=Patchless_Client)
3344 2.6.16 - 2.6.22 vanilla (kernel.org)
3345 * Due to recently discovered recovery problems, we do not recommend
3346 using patchless RHEL 4 clients with this or any earlier release.
3347 * Recommended e2fsprogs version: 1.40.2-cfs1
3348 * Note that reiserfs quotas are disabled on SLES 10 in this kernel.
3351 Frequency : occasional
3353 Description: MDS hangs or stays waiting for a lock
3354 Details : If a client receives a lock with the CBPENDING flag, ldlm needs to
3355 send the lock cancel as a separate RPC, to avoid a situation where the
3356 cancel request can't be processed because all I/O threads are waiting for locks.
3359 Frequency : occasional
3361 Description: Do not fail import if osc_interpret_create gets -EAGAIN
3362 Details : If osc_interpret_create gets -EAGAIN, it immediately exits and
3363 wakes up oscc_waitq. After the wakeup, oscc_wait_for_objects calls
3364 oscc_has_objects, sees that the OSC has no objects, and calls
3365 oscc_internal_create to resend the create request.
3367 Severity : enhancement
3369 Description: Update kernel patches for SLES10 2.6.16.53-0.8.
3370 Details : Update which_patch & target file for SLES10 latest kernel.
3372 Severity : enhancement
3374 Description: add --type and --size parameters to lfs find
3375 Details : Enhance lfs find by adding filetype and filesize parameters. Also
3376 multiple OBDs can now be specified for the --obd option.
3378 Severity : enhancement
3380 Description: eliminate client locks in face of contention
3381 Details : file contention detection and lockless i/o implementation
3382 for contended files.
3384 Severity : enhancement
3386 Description: Remove client patches from SLES 10 kernel.
3387 Details : This causes SLES 10 clients to behave as patchless clients
3388 even on a Lustre-patched (server) kernel.
3390 Severity : enhancement
3392 Description: use i_size_read and i_size_write in 2.6 port
3393 Details : replace inode->i_size access with i_size_read/write()
3396 Frequency : when removing large files
3398 Description: scheduling issue during removal of large Lustre files
3399 Details : Don't take the BKL in fsfilt_ext3_setattr() for 2.6 kernels.
3400 It causes scheduling issues when removing large files (17TB in the
3406 Description: 1.4.11 Can't handle directories with stripe set and extended ACLs
3407 Details : Impossible (EPROTO is returned) to access a directory that has a
3408 non-default striping and ACLs.
3411 Frequency : only on ppc
3413 Description: /proc/fs/lustre/devices broken on ppc
3414 Details : The patch as applied to 1.6.2 doesn't look correct for all arches.
3415 We should make sure the type of 'index' is loff_t and then cast
3416 explicitly as needed below. Do not assign an explicitly cast
3420 Frequency : only for rhel5
3422 Description: Kernel patches update for RHEL5 2.6.18-8.1.10.el5.
3423 Details : Modify the target file & which_kernel.
3426 Frequency : if the uninit_groups feature is enabled on ldiskfs
3428 Description: e2fsck reports "invalid unused inodes count"
3429 Details : If a new ldiskfs filesystem is created with the "uninit_groups"
3430 feature and only a single inode is created in a group then the
3431 "bg_unused_inodes" count is incorrectly updated. Creating a
3432 second inode in that group would update it correctly.
3437 Description: buffer overruns could theoretically occur
3438 Details : llapi_semantic_traverse() modifies the "path" argument by
3439 appending values to the end of the original string, and a buffer
3440 overrun may occur. Add a buffer overrun check in liblustreapi.
3442 Severity : enhancement
3444 Description: Add jbd statistics patch for RHEL5 and 2.6.18-vanilla.
3447 Frequency : only if filesystem is inconsistent
3449 Description: handle "serious error: objid * already exists" more gracefully
3450 Details : If LAST_ID value on disk is smaller than the objects existing in
3451 the O/0/d* directories, it indicates disk corruption and causes an
3452 LBUG(). If the object is 0-length, then we should use the existing
3453 object. This will help to avoid a full fsck in most cases.
3455 Severity : enhancement
3457 Description: Kernel patches update for RHEL4 2.6.9-55.0.6.
3458 Details : Modify vm-tunables-rhel4.patch.
3460 Severity : enhancement
3462 Description: Kernel config for 2.6.18-vanilla.
3463 Details : Modify targets/2.6-vanilla.target.in.
3464 Add config file kernel-2.6.18-2.6-vanilla-i686.config.
3465 Add config file kernel-2.6.18-2.6-vanilla-i686-smp.config.
3466 Add config file kernel-2.6.18-2.6-vanilla-x86_64.config.
3467 Add config file kernel-2.6.18-2.6-vanilla-x86_64-smp.config.
3470 Frequency : occasional
3472 Description: improve handling recoverable errors
3473 Details : If a request is processed with an error that is recoverable on the
3474 server, the request should be resent; otherwise the page is released from cache and
3480 Description: Kernel patches update for RHEL5 2.6.18-8.1.14.el5.
3481 Details : Modify target file & which_patch.
3482 A flaw was found in the IA32 system call emulation provided
3483 on AMD64 and Intel 64 platforms. An improperly validated 64-bit
3484 value could be stored in the %RAX register, which could trigger an
3485 out-of-bounds system call table access. An untrusted local user
3486 could exploit this flaw to run code in the kernel
3487 (i.e., a root privilege escalation). (CVE-2007-4573).
3491 Description: change order of libsysio includes
3492 Details : '#include sysio.h' should always come before '#include xtio.h'
3494 Severity : enhancement
3496 Description: adapt the lustre_config script to support the upgrade case
3497 Details : Add "-u" option for lustre_config script to support upgrading 1.4
3498 server targets to 1.6 in parallel.
3503 Description: Avoid grant space > available space when the disk is almost
3504 full. Without this patch you might see the error "grant XXXX >
3505 available" or some LBUG about grant, when the disk is almost
3507 Details : In filter_check_grant, for a non-grant cache write, we should
3508 check the space left with if (*left > ungranted + bytes) instead
3509 of (*left > ungranted), because the ungranted space should be
3510 increased only when we are sure the space left is enough for
3511 another "bytes". On the client, we should update cl_avail_grant only
3512 if OBD_MD_FLGRANT is set in the reply.
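
The effect of the two conditions can be seen in a small sketch. The function
names and the sample values are hypothetical; only the two comparisons are
taken from the entry above.

```python
def can_write_old(left: int, ungranted: int, nbytes: int) -> bool:
    # Old check: ignores the size of the write being accounted.
    return left > ungranted


def can_write_fixed(left: int, ungranted: int, nbytes: int) -> bool:
    # Fixed check: the remaining space must also cover another "nbytes".
    return left > ungranted + nbytes


if __name__ == "__main__":
    # Sample values chosen for illustration: an almost-full disk.
    left, ungranted, nbytes = 4096, 1024, 4096
    print(can_write_old(left, ungranted, nbytes))    # True
    print(can_write_fixed(left, ungranted, nbytes))  # False
    # Under the old check, ungranted would grow to 5120, exceeding left.
    print(ungranted + nbytes > left)                 # True
```

With these values the old check accepts a write that pushes the ungranted
total past the space left, while the fixed check rejects it.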
3517 Description: Update RHEL 4 kernel to fix local root privilege escalation.
3518 Details : Update to the latest RHEL 4 kernel to fix the vulnerability
3519 described in CVE-2007-4573. This problem could allow untrusted
3520 local users to gain root access.
3523 Frequency : when using O_DIRECT and quotas
3525 Description: Incorrect file ownership on O_DIRECT output files
3526 Details : block usage reported by 'lfs quota' does not take into account
3527 files that have been written with O_DIRECT.
3531 Description: (rw.c:1323:ll_read_ahead_pages()) ASSERTION(page_idx > ria->ria_stoff) failed
3532 Details : Once an unmatched stride IO mode is detected, shrink the stride-ahead
3533 window to 0. If a cache miss does occur and the read pattern is still
3534 in stride-io mode, do not reset the stride window, but also do not
3535 increase the stride window length in this case.
3537 --------------------------------------------------------------------------------
3539 2007-09-27 Cluster File Systems, Inc. <info@clusterfs.com>
3541 * Support for kernels:
3542 2.6.5-7.286 (SLES 9),
3543 2.6.9-55.0.2.EL (RHEL 4),
3544 2.6.16.46-0.14 (SLES 10),
3545 2.6.18-8.1.8.el5 (RHEL 5),
3546 2.6.18.8 vanilla (kernel.org)
3547 * Client support for unpatched kernels:
3548 (see http://wiki.lustre.org/index.php?title=Patchless_Client)
3549 2.6.16 - 2.6.21 vanilla (kernel.org)
3550 * Due to recently discovered recovery problems, we do not recommend
3551 using patchless RHEL 4 clients with this or any earlier release.
3552 * Recommended e2fsprogs version: 1.40.2-cfs1
3553 * Note that reiserfs quotas are disabled on SLES 10 in this kernel.
3557 Description: Fix errors in lfs documentation
3558 Details : Fixes man pages
3560 Severity : enhancement
3562 Description: Adaptive timeouts
3563 Details : RPC timeouts adapt to changing server load and network
3564 conditions to reduce resend attempts and improve recovery time.
3566 Severity : enhancement
3568 Description: llapi_file_create() does not allow some changes
3569 Details : add llapi_file_open() that allows specifying the file creation