X-Git-Url: https://git.whamcloud.com/?p=fs%2Flustre-release.git;a=blobdiff_plain;f=lustre%2FChangeLog;h=57ae3e7c9fe32f60ed70bc3793c392ace3ea9b5e;hp=456edeeddd23c20eb8216b0efec417c33ea4df53;hb=db32b2829f86aaa5124e45d120b900d83a206877;hpb=c4ec46c0ca89b9c8ed5dec0763b6d7537363e65f

diff --git a/lustre/ChangeLog b/lustre/ChangeLog
index 456edee..57ae3e7 100644
--- a/lustre/ChangeLog
+++ b/lustre/ChangeLog
@@ -1,22 +1,253 @@
 tbd Sun Microsystems, Inc.
 * version 1.8.0
 * Support for kernels:
- 2.6.9-67.0.4.EL (RHEL 4),
- 2.6.16.54-0.2.3 (SLES 10),
- 2.6.18-53.1.6.el5 (RHEL 5).
+ 2.6.16.54-0.2.5 (SLES 10),
+ 2.6.18-53.1.19.el5 (RHEL 5),
+ 2.6.22.14 vanilla (kernel.org).
 * Client support for unpatched kernels:
 (see http://wiki.lustre.org/index.php?title=Patchless_Client)
 2.6.16 - 2.6.21 vanilla (kernel.org)
- * Recommended e2fsprogs version: 1.40.4-cfs1
+ * Recommended e2fsprogs version: 1.40.7-sun3
 * Note that reiserfs quotas are disabled on SLES 10 in this kernel.
 * RHEL 4 and RHEL 5/SLES 10 clients behave differently on 'cd' to a
 removed cwd "./" (refer to Bugzilla 14399).
+Severity : enhancement
+Bugzilla : 15741
+Description: Update to RHEL5 kernel-2.6.18-53.1.19.el5.
+
+Severity : major
+Bugzilla : 14134
+Description: enable MGS and MDT services to start separately
+Details : add a 'nomgs' option in mount.lustre to allow starting an MDT with
+ a co-located MGS without starting the MGS; this complements the
+ 'nosvc' mount option.
+
+Severity : normal
+Frequency : always with o2ib 1.3 and sles10
+Bugzilla : 15870
+Description: fix build with SLES10 and o2ib v3.
+Details : sles10 uses a different name for the Module.symvers file, but configure
+ assumes this file has the same name on RHEL/SLES/vanilla kernels.
+
+Severity : normal
+Bugzilla : 14835
+Frequency : after recovery
+Description: precreate too many objects after del orphan.
+Details : del orphan sets last_id == next_id in oscc and this triggers a growing
+ count of precreated objects.
Set flag LOW to skip increasing the count
+ of precreated objects.
+
+Severity : normal
+Bugzilla : 15139
+Frequency : rare, on clear nid stats
+Description: ASSERTION(client_stat->nid_exp_ref_count == 0)
+Details : when cleaning nid stats, a live entry is sometimes destroyed,
+ and this produces a panic in free.
+
+Severity : major
+Bugzilla : 15575
+Description: Stack overflow during MDS log replay
+Details : ease stack pressure by using a thread to handle llog_process.
+
+Severity : normal
+Bugzilla : 15443
+Description: wait until old IO has finished before starting new IO on lock cancel.
+Details : the VM protocol wants old IO finished before new IO starts; in this case
+ we need to wait until PG_writeback is cleared before checking the dirty flag and
+ calling writepages in the lock cancel callback.
+
+Severity : enhancement
+Bugzilla : 14929
+Description: use a special macro for printing time, and clean up includes.
+
+Severity : normal
+Bugzilla : 12888
+Description: mds_mfd_close() ASSERTION(rc == 0)
+Details : In mds_mfd_close(), we need to protect the inode's writecount change
+ within its orphan write semaphore to prevent possible races.
+
+Severity : minor
+Bugzilla : 14929
+Description: Obsolete CURRENT_SECONDS and use cfs_time_current_sec() instead.
+
+Severity : minor
+Bugzilla : 14645
+Frequency : rare, on shutdown ost
+Description: don't hit livelock when unmounting ost.
+Details : shrink_dcache_parent can spin in a long loop destroying dentries;
+ use shrink_dcache_sb instead.
+
+Severity : minor
+Bugzilla : 14949
+Description: don't panic when using echo client
+Details : echo client passes NULL as client nid pointer and this produces a null
+ pointer dereference.
+
+Severity : normal
+Bugzilla : 15278
+Description: fix build on ppc32
+Details : compiling code with the -m64 flag produces a wrong object file for ppc32.
+
+Severity : normal
+Bugzilla : 12191
+Description: add message levels for liblustreapi
+
+Severity : normal
+Bugzilla : 13380
+Description: fix for occasional failure case of -ENOSPC in recovery-small tests
+Details : Move the 'good_osts' check before the 'total_bavail' check. This
+ will result in an -EAGAIN and in the exit call path we call
+ alloc_rr() which will with increasing aggressiveness attempt to
+ acquire precreated objects on the minimum number of required OSCs.
+
+Severity : major
+Bugzilla : 14326
+Description: Use old size assignment to avoid deadlock
+Details : This reverts the changes in bugs 2369 and 14138 that introduced
+ the scheduling while holding a spinlock. We do not need locking
+ for size in ll_update_inode() because size is only updated from
+ the MDS for directories or files without objects, so there is no
+ other place to do the update, and concurrent access to such inodes
+ is protected by the inode lock.
+
+Severity : normal
+Bugzilla : 14746
+Description: resolve "_IOWR redefined" build error on SLES10
+
+Severity : normal
+Bugzilla : 14763
+Description: dump the memory debugging after all modules are unloaded to
+ suppress false negative in conf_sanity test 39
+
+Severity : enhancement
+Bugzilla : 15316
+Description: build kernel-ib packages for OFED 1.3 in our release cycle
+
+Severity : minor
+Bugzilla : 13969
+Frequency : always
+Description: fix SLES kernel versioning
+Details : the kernel version for our SLES 10 kernel did not include a "-"
+ before the "smp" at the end. While this was not a problem in
+ general, it did mean that software trying to use the kernel
+ version to detect a vendor specific kernel would fail.
+ This was most evident in the OFED build scripts.
+
+Severity : normal
+Bugzilla : 14803
+Description: Don't update lov_desc members until making sure they are valid
+Details : When updating lov_desc members via proc fs, we need to check their
+ validity before doing the real update.
+
+Severity : normal
+Bugzilla : 15069
+Description: don't put request into delay list while invalidate is in flight.
+Details : ptlrpc_delay_request sometimes puts a request in the delay list while
+ an import invalidate is in flight. This produces a timeout for the invalidate
+ and sometimes can cause stale data.
+
+Severity : minor
+Bugzilla : 14856
+Frequency : on ppc only
+Description: do not convert ost objects for a directory because they do not exist.
+Details : ll_dir_getstripe assumes a directory has ost objects but this is wrong.
+
+Severity : normal
+Bugzilla : 12652
+Description: Add FMODE_EXEC file flag for SLES10 SP1 kernel.
+
+Severity : enhancement
+Bugzilla : 13397
+Description: Update to support 2.6.22.14 vanilla kernel.
+
+Severity : normal
+Bugzilla : 14533
+Frequency : rare, on recovery
+Description: reading procfs can produce a deadlock in some situations
+Details : Holding the lprocfs lock while sending an rpc can block destruction of
+ obd objects, and this also blocks reconnect with -EALREADY. This doesn't
+ fix all lprocfs bugs, but makes them rare.
+
+Severity : enhancement
+Bugzilla : 15152
+Description: Update kernel to RHEL5 2.6.18-53.1.14.el5.
+
+Severity : major
+Frequency : frequent on X2 node
+Bugzilla : 15010
+Description: mdc_set_open_replay_data LBUG
+Details : Set replay data for requests that are eligible for replay.
+
+Severity : normal
+Bugzilla : 14321
+Description: lustre_mgs: operation 101 on unconnected MGS
+Details : When MGC is disconnected from MGS long enough, MGS will evict the
+ MGC, and later on MGC cannot successfully connect to MGS, producing a lot
+ of error messages complaining that MGS is not connected.
+
+Severity : major
+Bugzilla : 15027
+Frequency : on network error
+Description: panic with double free request if network error
+Details : mdc_finish_enqueue finishes the request if any network error occurs,
+ but this is correct only for synchronous enqueue; for async enqueue
+ (via ptlrpcd) this is incorrect because ptlrpcd wants to finish the request
+ itself.
+
+Severity : enhancement
+Bugzilla : 11401
+Description: client-side metadata stat-ahead during readdir (directory readahead)
+Details : perform client-side metadata stat-ahead when the client detects
+ readdir and sequential stat of dir entries therein
+
+Severity : major
+Frequency : on start mds
+Bugzilla : 14884
+Description: Implement get_info(last_id) in obdfilter.
+
+Severity : normal
+Frequency : occasional
+Bugzilla : 13537
+Description: Correctly check stale fid; do not start epoch if ost does not support SOM
+Details : open with flag O_CREAT needs to set the old fid in op_fid3 because op_fid2
+ is overwritten with the newly generated fid, but mds can answer with either of
+ these two fids and neither is stale. setattr incorrectly starts an epoch and
+ assumes done_writing will be called, but without SOM done_writing
+ is never called.
+
+Severity : major
+Frequency : rare, depends on device drivers and load
+Bugzilla : 14529
+Description: MDS or OSS nodes crash due to stack overflow
+Details : Code changes in 1.8.0 increased the stack usage of some functions.
+ In some cases, in conjunction with device drivers that use a lot
+ of stack the MDS (or possibly OSS) service threads could overflow
+ the stack. One change which was identified to consume additional
+ stack has been reworked to avoid the extra stack usage.
+
+Severity : normal
+Frequency : occasional
+Bugzilla : 13730
+Description: Do not fail import if osc_interpret_create gets -EAGAIN
+Details : If osc_interpret_create gets -EAGAIN it immediately exits and
+ wakes up oscc_waitq. After wakeup, oscc_wait_for_objects calls
+ oscc_has_objects, sees that the OSC has no objects, and calls
+ oscc_internal_create to resend the create request.
+
+Severity : enhancement
+Bugzilla : 14858
+Description: Update to SLES10 SP1 latest kernel-2.6.16.54-0.2.5.
+
+Severity : enhancement
+Bugzilla : 14876
+Description: Update to RHEL5 latest kernel-2.6.18-53.1.13.el5.
+
 Severity : normal
 Frequency : very rare
 Bugzilla : 3462
 Description: Fix replay if there is an un-replied request and open
-Details : In some cases, older replay request will revert the
+Details : In some cases, older replay request will revert the
 mcd->mcd_last_xid on MDS which is used to record the client's latest sent request.

@@ -37,35 +268,35 @@ Frequency : rare
 Bugzilla : 13196
 Description: Don't allow skipping OSTs if index has been specified.
 Details : Don't allow skipping OSTs if index has been specified, make locking
- in internal create lots better.
+ in internal create lots better.

 Severity : normal
 Bugzilla : 12228
 Description: LBUG in ptlrpc_check_set() bad phase ebc0de00
-Details : access to bitfield in structure is always rounded to long
- and this produces a problem with non-atomic change of any bit.
+Details : access to bitfield in structure is always rounded to long
+ and this produces a problem with non-atomic change of any bit.

 Severity : normal
 Bugzilla : 13647
 Description: Lustre make rpms failed.
 Details : Remove ldiskfs spec file to avoid rpmbuild being confused when
- building Lustre rpms from tarball.
+ building Lustre rpms from tarball.

 Severity : normal
 Frequency : rare on shutdown ost
 Bugzilla : 14608
 Description: If llog cancel was not sent before clean_exports phase, this can
- produce deadlock in llog code.
+ produce deadlock in llog code.
 Details : If llog thread has last reference to obd and calls class_import_put
- this produces deadlock because llog_cleanup_commit_master waits until the
- last llog_commit_thread has exited, but this never succeeds because it was
+ this produces deadlock because llog_cleanup_commit_master waits until the
+ last llog_commit_thread has exited, but this never succeeds because it was
 called from llog_commit_thread.

 Severity : normal
 Bugzilla : 9977
 Description: allow userland application to know when one of the stripes is lost.
 Details : fill lvb_blocks with error code on ost and return it to
- application if error flag found.
+ application if error flag found.

 Severity : normal
 Bugzilla : 14607
@@ -80,16 +311,16 @@ Severity : normal
 Bugzilla : 13375
 Description: make lov_create() not get stuck in obd_statfs_rqset()
 Details : If an OST is down the MDS will hang indefinitely in
- obd_statfs_rqset() waiting for the statfs data. While for
+ obd_statfs_rqset() waiting for the statfs data. While for
 MDS QOS usage of statfs, it should not get stuck waiting.

 Severity : enhancement
 Bugzilla : 11842
 Description: remote_acl support
 Details : Support ACL-based permission check for remote user.
- Support setfacl/getfacl for remote user with the utils
- "lfs {l,r}{s,g}etfacl" which follow the same parameter format as
- the system "{s,g}etfacl" utils.
+ Support setfacl/getfacl for remote user with the utils
+ "lfs {l,r}{s,g}etfacl" which follow the same parameter format as
+ the system "{s,g}etfacl" utils.

 Severity : enhancement
 Bugzilla : 14288
@@ -118,7 +349,7 @@ Frequency : rare, at shutdown
 Description: access to already freed / zeroed obd_namespace.
 Details : if client_disconnect_export was called without force flag set,
 and a connect request is in flight, this can produce access to
- NULL pointer (or already free pointer) when connect_interpret
+ NULL pointer (or already free pointer) when connect_interpret
 stores ocd flags in obd_namespace.

 Severity : minor
@@ -136,7 +367,7 @@ Details : Make lustre randomly failed allocating memory for testing purpose.

 Severity : enhancement
 Bugzilla : 12702
 Description: lost problems with lov objid file
-Details : Fixes some scalability and access to uninitialized memory problems
+Details : Fixes some scalability and access to uninitialized memory problems
 when working with the lov objid file.

 Severity : major
@@ -178,18 +409,18 @@ Description: Update to RHEL4 latest kernel.
 Severity : enhancement
 Bugzilla : 13690
 Description: Build SLES10 patchless client fails
-Details : configure was broken when running ./configure with the
+Details : configure was broken when running ./configure with the
 --with-linux-obj=.... argument for patchless client. When
 configure uses --with-linux-obj, the LINUXINCLUDE= -Iinclude
- can't find headers properly. Use an absolute path such as
- -I($LINUX)/include instead.
+ can't find headers properly. Use an absolute path such as
+ -I($LINUX)/include instead.

 Severity : normal
 Bugzilla : 13888
 Description: interrupting oig_wait produces panic on resend.
 Details : brw_redo_request can be used for resend requests from ptlrpcd and
 private set, and this produces a situation where rq_ptlrpcd_data is not
- copied to the newly allocated request, triggering an LBUG on assert
+ copied to the newly allocated request, triggering an LBUG on assert
 req->rq_ptlrpcd_data != NULL. But this member is used only to wake up
 the ptlrpcd set if the request is changed, and can be safely changed to use
 rq_set directly.
@@ -214,10 +445,10 @@ Details : This causes SLES 10 clients to behave as patchless clients

 Severity : enhancement
 Bugzilla : 2262
 Description: self-adjustable client's lru lists
-Details : use adaptive algorithm for managing client cached locks lru
+Details : use adaptive algorithm for managing client cached locks lru
 lists according to current server load, other client's work
- pattern, memory activities, etc.
+ pattern, memory activities, etc.
Both server and client
+ side namespaces provide a number of proc tunables for controlling
 things

 Severity : enhancement
@@ -339,7 +570,7 @@ Details : set obd_health_check_timeout as 1.5x of obd_timeout

 Severity : normal
 Bugzilla : 12192
 Description: llapi_file_create() does not allow some changes
-Details : add llapi_file_open() that allows specifying the mode and
+Details : add llapi_file_open() that allows specifying the mode and
 open flags, and also returns an open file handle.

 Severity : normal
@@ -350,9 +581,9 @@ Details : Remove mnt_lustre_list in vfs_intent-2.6-rhel4.patch.

 Severity : normal
 Bugzilla : 10657
 Description: Add journal checksum support (kernel part).
-Details : The journal checksum feature adds two new flags, i.e.
- JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT and
- JBD2_FEATURE_COMPAT_CHECKSUM. JBD2_FEATURE_CHECKSUM flag
+Details : The journal checksum feature adds two new flags, i.e.
+ JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT and
+ JBD2_FEATURE_COMPAT_CHECKSUM. JBD2_FEATURE_CHECKSUM flag
 indicates that the commit block contains the checksum for
 the blocks described by the descriptor blocks.
 Now commit record can be sent to disk without waiting for descriptor
@@ -372,7 +603,7 @@ Details : execute lfs setstripe on client

 Severity : major
 Bugzilla : 12223
 Description: mds_obd_create error creating tmp object
-Details : When the user sets quota on root, llog will be affected and can't
+Details : When the user sets quota on root, llog will be affected and can't
 create files and write files.

 Severity : normal
@@ -380,7 +611,7 @@ Frequency : Always on ia64 patchless client, and possibly others.
 Bugzilla : 12826
 Description: Add EXPORT_SYMBOL check for node_to_cpumask symbol.
 Details : This allows the patchless client to be loaded on architectures
- without this export.
+ without this export.
 Severity : normal
 Bugzilla : 13039
@@ -440,10 +671,10 @@ Description: when mds and osts use different quota unit(32bit and 64bit),
 Details : avoid sending multiple quota reqs to mds, which will keep the
 status between the reqs.

-Severity : normal
+Severity : normal
 Bugzilla : 13125
 Description: osts not allocated evenly to files
-Details : change the condition to increase offset_idx
+Details : change the condition to increase offset_idx

 Severity : critical
 Frequency : Always for filesystems larger than 2TB on 32-bit systems.
@@ -458,7 +689,7 @@ Details : When generating the bio request for lustre file writes the

 Severity : normal
 Bugzilla : 11230
-Description: Tune the kernel for good SCSI performance.
+Description: Tune the kernel for good SCSI performance.
 Details : Set the value of /sys/block/{dev}/queue/max_sectors_kb
 to the value of /sys/block/{dev}/queue/max_hw_sectors_kb
 in mount_lustre.
@@ -500,8 +731,8 @@ Frequency : only on ppc
 Bugzilla : 12234
 Description: /proc/fs/lustre/devices broken on ppc
 Details : The patch as applied to 1.6.2 doesn't look correct for all arches.
- We should make sure the type of 'index' is loff_t and then cast
- explicitly as needed below. Do not assign an explicitly cast
+ We should make sure the type of 'index' is loff_t and then cast
+ explicitly as needed below. Do not assign an explicitly cast
 loff_t to an int.

 Severity : normal
@@ -520,14 +751,14 @@ Severity : normal
 Bugzilla : 13304
 Frequency : Always, for kernels after 2.6.16
 Description: Fix warning idr_remove called for id=.. which is not allocated.
-Details : Recent kernels save the old s_dev before kill super and do not allow
+Details : Recent kernels save the old s_dev before kill super and do not allow
 restoring it from the callback - restore it before calling kill_anon_super.
 Severity : minor
 Bugzilla : 12948
 Description: buffer overruns could theoretically occur
 Details : llapi_semantic_traverse() modifies the "path" argument by
- appending values to the end of the origin string, and an
+ appending values to the end of the origin string, and an
 overrun may occur. Adding buffer overrun check in liblustreapi.

 Severity : normal
@@ -560,12 +791,12 @@ Severity : critical
 Bugzilla : 13751
 Description: Kernel patches update for RHEL5 2.6.18-8.1.14.el5.
 Details : Modify target file & which_patch.
- A flaw was found in the IA32 system call emulation provided
- on AMD64 and Intel 64 platforms. An improperly validated 64-bit
- value could be stored in the %RAX register, which could trigger an
- out-of-bounds system call table access. An untrusted local user
- could exploit this flaw to run code in the kernel
- (i.e. a root privilege escalation). (CVE-2007-4573).
+ A flaw was found in the IA32 system call emulation provided
+ on AMD64 and Intel 64 platforms. An improperly validated 64-bit
+ value could be stored in the %RAX register, which could trigger an
+ out-of-bounds system call table access. An untrusted local user
+ could exploit this flaw to run code in the kernel
+ (i.e. a root privilege escalation). (CVE-2007-4573).

 Severity : major
 Bugzilla : 13093
@@ -579,7 +810,7 @@ Bugzilla : 13454
 Description: Add jbd statistics patch for RHEL5 and 2.6.18-vanilla

 Severity : minor
-Bugzilla : 13732
+Bugzilla : 13732
 Description: change order of libsysio includes
 Details : '#include sysio.h' should always come before '#include xtio.h'

@@ -687,6 +918,165 @@ Bugzilla : 13829
 Description: enable ACLs on MDS by default
 Details : ACLs must be enabled on MDS by default.

+Severity : normal
+Frequency : PPC/PPC64 only
+Bugzilla : 14845
+Description: conflicts between asm-ppc64/types.h and lustre_types.h
+Details : fix duplicated definitions between asm-ppc64/types.h and
+ lustre_types.h on PPC.
+
+Severity : normal
+Frequency : PPC/PPC64 only
+Bugzilla : 14844
+Description: asm-ppc/segment.h does not exist
+Details : fix compile issue on PPC.
+
+Severity : normal
+Bugzilla : 14864
+Description: better handle error messages in extents code
+
+Severity : normal
+Frequency : RHEL4 only
+Bugzilla : 14618
+Description: mkfs is very slow on IA64/RHEL4
+Details : A performance regression has been discovered in the MPT Fusion
+ driver between versions 3.02.73rh and 3.02.99.00rh. As a
+ consequence, we have downgraded the MPT Fusion driver in the RHEL4
+ kernel from 3.02.99.00 to 3.02.73 until this problem is fixed.
+
+Severity : enhancement
+Bugzilla : 14729
+Description: SNMP support enhancement
+Details : Add the total number of sampled requests for an MDS node in snmp
+ support.
+
+Severity : enhancement
+Bugzilla : 14748
+Description: Optimize ldlm waiting list processing for PR extent locks
+Details : When processing the waiting list for a read extent lock and meeting a
+ read lock that is the same as or wider than it and is not contended, skip
+ processing the rest of the list and immediately return the current
+ conflict status, since we are guaranteed there are no
+ conflicting locks in the rest of the list.
+
+Severity : normal
+Bugzilla : 14774
+Description: Time out and refuse to reconnect
+Details : When the failover node is the primary node, it is possible
+ to have two identical connections in imp_conn_list. We must
+ compare not conn's pointers but NIDs, otherwise we can defeat
+ connection throttling.
+
+Severity : normal
+Bugzilla : 13821
+Description: port llog fixes from b1_6 into HEAD
+Details : Port llog reference counting and some llog cleanups from b1_6
+ (bug 10800) into HEAD, to protect against panics and accesses to already
+ freed llog structures.
+
+Severity : normal
+Bugzilla : 14483
+Description: Detect stride IO mode in read-ahead
+Details : When a client does stride read, read-ahead should detect that and
+ read-ahead pages according to the detected stride pattern.
+
+Severity : normal
+Bugzilla : 13805
+Description: data checksumming impacts single node performance
+Details : add support for several checksum algorithms. Currently, only CRC32
+ and Adler-32 are supported. The checksum type can be changed on
+ the fly via /proc/fs/lustre/osc/*/checksum_type.
+
+Severity : normal
+Bugzilla : 14648
+Description: use adler32 for page checksums
+Details : when available, use the Adler-32 algorithm instead of CRC32 for
+ page checksums.
+
+Severity : normal
+Bugzilla : 15033
+Description: build for x2 fails
+Details : fix compile issue on Cray systems.
+
+Severity : normal
+Bugzilla : 14379
+Description: Properly match for duplicate locks
+Details : Due to different lock order from skiplists code, we need to
+ traverse the entire list for now.
+
+Severity : normal
+Frequency : only on PPC/SLES10
+Bugzilla : 14855
+Description: "BITS_PER_LONG is not 32 or 64" in linux/idr.h
+Details : On SLES10/PPC, fs.h includes idr.h which requires BITS_PER_LONG to
+ be defined. Add a hack in mkfs_lustre.c to work around this compile
+ issue.
+
+Severity : normal
+Bugzilla : 14257
+Description: LASSERT on MDS when client holding flock lock dies
+Details : ldlm pool logic depends on the number of granted locks being equal to
+ the number of released locks, which is not true for flock locks, so
+ just exclude such locks from consideration.
+
+Severity : normal
+Bugzilla : 15188
+Description: MDS deadlock with many ll_sync_lov threads and I/O stalled
+Details : Use fsfilt_sync() for both the whole filesystem sync and
+ individual file sync to eliminate dangerous inode locking
+ with I_LOCK that can lead to a deadlock.
+
+Severity : normal
+Bugzilla : 14410
+Description: performance in 1.6.3
+Details : Force q->max_phys_segments to MAX_PHYS_SEGMENTS on SLES10 to be
+ sure that 1MB requests are not fragmented by the block layer.
+
+Severity : enhancement
+Bugzilla : 11089
+Description: organize the server-side client stats on per-nid basis
+Details : Change the structure of stats under obdfilter and mds to
+ New structure:
+ +- exports
+    +- nid#1
+    |  + stats
+    |  + uuids
+    +- nid#2...
+    +- clear
+ The "uuids" file would list the uuids of _active_ exports.
+ And the clear entry is to clear all stats and stale nids.
+
+Severity : enhancement
+Bugzilla : 11270
+Description: eliminate client locks in face of contention
+Details : file contention detection and lockless i/o implementation
+ for contended files.
+
+Severity : normal
+Bugzilla : 15212
+Description: Reinitialize optind to 0 so that interactive lfs works in all cases
+
+Severity : critical
+Frequency : very rare, if additional xattrs are used on kernels >= 2.6.12
+Bugzilla : 15777
+Description: MDS may lose file striping (and hence file data) in some cases
+Details : If there are additional extended attributes stored on the MDS,
+ in particular ACLs, SELinux, or user attributes (if user_xattr
+ is specified for the client mount options) then there is a risk
+ of attribute loss. Additionally, the Lustre file striping
+ needs to be larger than default (e.g. striped over all OSTs),
+ and an additional attribute must be stored initially in the
+ inode and then increase in size enough to be moved to the
+ external attribute block (e.g. ACL growing in size) for file
+ data to be lost.
+
+Severity : normal
+Bugzilla : 15346
+Description: skiplist implementation simplification
+Details : skiplists are used to group compatible locks on the granted list.
+ This was implemented by tracking the first and last lock of each lock group;
+ the patch changes that to use doubly linked lists.
+
 --------------------------------------------------------------------------------

 2007-08-10 Cluster File Systems, Inc.