X-Git-Url: https://git.whamcloud.com/?p=fs%2Flustre-release.git;a=blobdiff_plain;f=lustre%2FChangeLog;h=6daa824652c013899e29d4e3e0cfc6d299b05edf;hp=4b82bf75a14ca706fc4a617a578f31dfa1f15d88;hb=42a660527279112258b35ec840b59d3f4ad9420b;hpb=c6a162fe2726bd78b446648b19203d0195b61edb

diff --git a/lustre/ChangeLog b/lustre/ChangeLog
index 4b82bf7..6daa824 100644
--- a/lustre/ChangeLog
+++ b/lustre/ChangeLog
@@ -1,17 +1,171 @@
 tbd Sun Microsystems, Inc.
        * version 1.8.0
        * Support for kernels:
-        2.6.5-7.287.3 (SLES 9),
-        2.6.9-67.EL (RHEL 4),
-        2.6.16.54-0.2.3 (SLES 10),
-        2.6.18-53.1.4.el5 (RHEL 5).
-        2.6.18.8 vanilla (kernel.org)
+        2.6.9-67.0.4.EL (RHEL 4),
+        2.6.16.54-0.2.5 (SLES 10),
+        2.6.18-53.1.14.el5 (RHEL 5),
+        2.6.22.14 vanilla (kernel.org).
        * Client support for unpatched kernels:
         (see http://wiki.lustre.org/index.php?title=Patchless_Client)
         2.6.9-42.0.10.EL (RHEL 4),
         2.6.16 - 2.6.21 vanilla (kernel.org)
-       * Recommended e2fsprogs version: 1.40.2-cfs5
+       * Recommended e2fsprogs version: 1.40.7-sun1
        * Note that reiserfs quotas are disabled on SLES 10 in this kernel.
+       * RHEL 4 and RHEL 5/SLES 10 clients behave differently on 'cd' to a
+         removed cwd "./" (refer to Bugzilla 14399).
+
+Severity   : normal
+Bugzilla   : 12652
+Description: Add FMODE_EXEC file flag for SLES10 SP1 kernel.
+
+Severity   : enhancement
+Bugzilla   : 13397
+Description: Update to support 2.6.22.14 vanilla kernel.
+
+Severity   : normal
+Bugzilla   : 14533
+Frequency  : rare, on recovery
+Description: Reading procfs can produce a deadlock in some situations
+Details    : Holding the lprocfs lock while sending an RPC can block the
+             destruction of obd objects, and this also blocks reconnect with
+             -EALREADY. This does not fix all lprocfs bugs, but makes them
+             rare.
+
+Severity   : enhancement
+Bugzilla   : 15152
+Description: Update kernel to RHEL5 2.6.18-53.1.14.el5.
+
+Severity   : major
+Frequency  : frequent on X2 node
+Bugzilla   : 15010
+Description: mdc_set_open_replay_data LBUG
+Details    : Set replay data for requests that are eligible for replay.
+
+Severity   : normal
+Bugzilla   : 14321
+Description: lustre_mgs: operation 101 on unconnected MGS
+Details    : When MGC is disconnected from MGS long enough, MGS will evict the
+             MGC, and later on MGC cannot successfully connect to MGS and a
+             lot of error messages complain that MGS is not connected.
+
+Severity   : major
+Frequency  : on start mds
+Bugzilla   : 14884
+Description: Implement get_info(last_id) in obdfilter.
+
+Severity   : normal
+Frequency  : occasional
+Bugzilla   : 13537
+Description: Correctly check stale fid, do not start epoch if ost does not
+             support SOM
+Details    : open with flag O_CREATE needs to set the old fid in op_fid3
+             because op_fid2 is overwritten with the newly generated fid, but
+             mds can answer with either of these two fids and neither is
+             stale. setattr incorrectly starts an epoch and assumes
+             done_writing will be called, but without SOM done_writing is
+             never called.
+
+Severity   : major
+Frequency  : rare, depends on device drivers and load
+Bugzilla   : 14529
+Description: MDS or OSS nodes crash due to stack overflow
+Details    : Code changes in 1.8.0 increased the stack usage of some functions.
+             In some cases, in conjunction with device drivers that use a lot
+             of stack the MDS (or possibly OSS) service threads could overflow
+             the stack. One change which was identified to consume additional
+             stack has been reworked to avoid the extra stack usage.
+
+Severity   : normal
+Frequency  : occasional
+Bugzilla   : 13730
+Description: Do not fail import if osc_interpret_create gets -EAGAIN
+Details    : If osc_interpret_create gets -EAGAIN it immediately exits and
+             wakes up oscc_waitq. After wakeup, oscc_wait_for_objects calls
+             oscc_has_objects, sees that the OSC has no objects, and calls
+             oscc_internal_create to resend the create request.
+
+Severity   : enhancement
+Bugzilla   : 14858
+Description: Update to SLES10 SP1 latest kernel-2.6.16.54-0.2.5.
+
+Severity   : enhancement
+Bugzilla   : 14876
+Description: Update to RHEL5 latest kernel-2.6.18-53.1.13.el5.
+
+Severity   : normal
+Frequency  : very rare
+Bugzilla   : 3462
+Description: Fix replay if there is an un-replied request and open
+Details    : In some cases, an older replay request will revert the
+             mcd->mcd_last_xid on MDS which is used to record the client's
+             latest sent request.
+
+Severity   : enhancement
+Bugzilla   : 14720
+Description: Update to RHEL5 latest kernel-2.6.18-53.1.6.el5.
+
+Severity   : enhancement
+Bugzilla   : 14482
+Description: Add rhel5 support to HEAD.
+
+Severity   : enhancement
+Bugzilla   : 14793
+Description: Update RHEL4 kernel to 2.6.9-67.0.4.
+
+Severity   : minor
+Frequency  : rare
+Bugzilla   : 13196
+Description: Don't allow skipping OSTs if index has been specified.
+Details    : Don't allow skipping OSTs if index has been specified, make locking
+             in internal create lots better.
+
+Severity   : normal
+Bugzilla   : 12228
+Description: LBUG in ptlrpc_check_set() bad phase ebc0de00
+Details    : Access to a bitfield in a structure is always rounded to long,
+             and this produces problems because changing any bit is not atomic.
+
+Severity   : normal
+Bugzilla   : 13647
+Description: Lustre make rpms failed.
+Details    : Remove ldiskfs spec file to avoid confusing rpmbuild when
+             building Lustre rpms from tarball.
+
+Severity   : normal
+Frequency  : rare on shutdown ost
+Bugzilla   : 14608
+Description: If llog cancel was not sent before clean_exports phase, this can
+             produce deadlock in llog code.
+Details    : If llog thread has the last reference to obd and calls
+             class_import_put, this produces a deadlock because
+             llog_cleanup_commit_master waits until the last llog_commit_thread
+             has exited, but this never succeeds because it was called from
+             llog_commit_thread.
+
+Severity   : normal
+Bugzilla   : 9977
+Description: allow userland application to know when one of the stripes is lost.
+Details    : fill lvb_blocks with error code on ost and return it to
+             application if error flag found.
+
+Severity   : normal
+Bugzilla   : 14607
+Description: NULL lov_tgts causing MDS oops
+Details    : safer checks for NULL lov_tgts to avoid oops.
+
+Severity   : enhancement
+Bugzilla   : 14531
+Description: Update to RHEL4 latest kernel-2.6.9-67.0.1.EL.
+
+Severity   : normal
+Bugzilla   : 13375
+Description: make lov_create() not get stuck in obd_statfs_rqset()
+Details    : If an OST is down the MDS will hang indefinitely in
+             obd_statfs_rqset() waiting for the statfs data. For MDS QOS
+             usage of statfs, it should not get stuck waiting.
+
+Severity   : enhancement
+Bugzilla   : 11842
+Description: remote_acl support
+Details    : Support ACL-based permission check for remote user.
+             Support setfacl/getfacl for remote user with the utils
+             "lfs {l,r}{s,g}etfacl" which follow the same parameter format as
+             the system "{s,g}etfacl" utils.

 Severity   : enhancement
 Bugzilla   : 14288
@@ -39,8 +193,8 @@ Bugzilla : 14260
 Frequency  : rare, at shutdown
 Description: access already free / zero obd_namespace.
 Details    : if client_disconnect_export was called without force flag set,
-             and exist connect request in flight, this can produce access to
-             NULL pointer (or already free pointer) when connect_interpret
+             and exist connect request in flight, this can produce access to
+             NULL pointer (or already free pointer) when connect_interpret
              store ocd flags in obd_namespace.

 Severity   : minor
@@ -48,7 +202,7 @@ Bugzilla : 14418
 Frequency  : only at startup
 Description: not alloc memory with spinlock held.
 Details    : allocation memory with GFP_KERNEL can produce sleep deadlock,
-             if any spinlock held.
+             if any spinlock held.

 Severity   : enhancement
 Bugzilla   : 12211
@@ -58,8 +212,8 @@ Details : Make lustre randomly failed allocating memory for testing purpose.
 Severity   : enhancement
 Bugzilla   : 12702
 Description: lost problems with lov objid file
-Details    : Fixes some scalability and access to not inited memory problems
-             in work with lov objid file.
+Details    : Fixes some scalability and access to not inited memory problems
+             in work with lov objid file.

 Severity   : major
 Frequency  : always
@@ -71,7 +225,7 @@ Severity : normal
 Bugzilla   : 11791
 Description: Inconsistent usage of lustre_pack_reply()
 Details    : Standardize the usage of lustre_pack_reply() such that it
-             always generate a CERROR on failure.
+             always generate a CERROR on failure.

 Severity   : major
 Frequency  : occasional
@@ -100,18 +254,18 @@ Description: Update to RHEL4 latest kernel.
 Severity   : enhancement
 Bugzilla   : 13690
 Description: Build SLES10 patchless client fails
-Details    : The configure was broken by run ./configure with
+Details    : The configure was broken by run ./configure with
              --with-linux-obj=.... argument for patchless client. When the
              configure use --with-linux-obj, the LINUXINCLUDE= -Iinclude
-             can't search header adequately. Use absolute path such as
-             -I($LINUX)/include instead.
+             can't search header adequately. Use absolute path such as
+             -I($LINUX)/include instead.

 Severity   : normal
 Bugzilla   : 13888
 Description: interrupt oig_wait produce panic on resend.
 Details    : brw_redo_request can be used for resend requests from ptlrpcd and
              private set, and this produce situation when rq_ptlrpcd_data not
-             copied to new allocated request and triggered LBUG on assert
+             copied to new allocated request and triggered LBUG on assert
              req->rq_ptlrpcd_data != NULL. But this member used only for wakeup
              ptlrpcd set if request is changed and can be safely changed to
              use rq_set directly.
@@ -136,10 +290,10 @@ Details : This causes SLES 10 clients to behave as patchless clients
 Severity   : enhancement
 Bugzilla   : 2262
 Description: self-adjustable client's lru lists
-Details    : use adaptive algorithm for managing client cached locks lru
+Details    : use adaptive algorithm for managing client cached locks lru
              lists according to current server load, other client's work
-             pattern, memory activities, etc. Both, server and client
-             side namespaces provide number of proc tunables for controlling
+             pattern, memory activities, etc. Both, server and client
+             side namespaces provide number of proc tunables for controlling
              things

 Severity   : enhancement
@@ -261,7 +415,7 @@ Details : set obd_health_check_timeout as 1.5x of obd_timeout
 Severity   : normal
 Bugzilla   : 12192
 Description: llapi_file_create() does not allow some changes
-Details    : add llapi_file_open() that allows specifying the mode and
+Details    : add llapi_file_open() that allows specifying the mode and
              open flags, and also returns an open file handle.

 Severity   : normal
@@ -272,9 +426,9 @@ Details : Remove mnt_lustre_list in vfs_intent-2.6-rhel4.patch.
 Severity   : normal
 Bugzilla   : 10657
 Description: Add journal checksum support.(Kernel part)
-Details    : The journal checksum feature adds two new flags i.e
-             JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT and
-             JBD2_FEATURE_COMPAT_CHECKSUM. JBD2_FEATURE_CHECKSUM flag
+Details    : The journal checksum feature adds two new flags i.e
+             JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT and
+             JBD2_FEATURE_COMPAT_CHECKSUM. JBD2_FEATURE_CHECKSUM flag
              indicates that the commit block contains the checksum for
              the blocks described by the descriptor blocks.
              Now commit record can be sent to disk without waiting for descriptor
@@ -294,7 +448,7 @@ Details : execute lfs setstripe on client
 Severity   : major
 Bugzilla   : 12223
 Description: mds_obd_create error creating tmp object
-Details    : When the user sets quota on root, llog will be affected and can't
+Details    : When the user sets quota on root, llog will be affected and can't
              create files and write files.

 Severity   : normal
@@ -302,7 +456,7 @@ Frequency : Always on ia64 patchless client, and possibly others.
 Bugzilla   : 12826
 Description: Add EXPORT_SYMBOL check for node_to_cpumask symbol.
 Details    : This allows the patchless client to be loaded on architectures
-             without this export.
+             without this export.

 Severity   : normal
 Bugzilla   : 13039
@@ -380,7 +534,7 @@ Details : When generating the bio request for lustre file writes the

 Severity   : normal
 Bugzilla   : 11230
-Description: Tune the kernel for good SCSI performance.
+Description: Tune the kernel for good SCSI performance.
 Details    : Set the value of /sys/block/{dev}/queue/max_sectors_kb
              to the value of /sys/block/{dev}/queue/max_hw_sectors_kb
              in mount_lustre.
@@ -422,8 +576,8 @@ Frequency : only on ppc
 Bugzilla   : 12234
 Description: /proc/fs/lustre/devices broken on ppc
 Details    : The patch as applied to 1.6.2 doesn't look correct for all arches.
-             We should make sure the type of 'index' is loff_t and then cast
-             explicitly as needed below. Do not assign an explicitly cast
+             We should make sure the type of 'index' is loff_t and then cast
+             explicitly as needed below. Do not assign an explicitly cast
              loff_t to an int.

 Severity   : normal
@@ -442,14 +596,14 @@ Severity : normal
 Bugzilla   : 13304
 Frequency  : Always, for kernels after 2.6.16
 Description: Fix warning idr_remove called for id=.. which is not allocated.
-Details    : Last kernels save old s_dev before kill super and not allow
+Details    : Last kernels save old s_dev before kill super and not allow
              to restore from callback - restore it before call kill_anon_super.

 Severity   : minor
 Bugzilla   : 12948
 Description: buffer overruns could theoretically occur
 Details    : llapi_semantic_traverse() modifies the "path" argument by
-             appending values to the end of the origin string, and a
+             appending values to the end of the origin string, and a
              overrun may occur. Adding buffer overrun check in liblustreapi.

 Severity   : normal
@@ -482,12 +636,12 @@ Severity : critical
 Bugzilla   : 13751
 Description: Kernel patches update for RHEL5 2.6.18-8.1.14.el5.
 Details    : Modify target file & which_patch.
-             A flaw was found in the IA32 system call emulation provided
-             on AMD64 and Intel 64 platforms. An improperly validated 64-bit
-             value could be stored in the %RAX register, which could trigger an
-             out-of-bounds system call table access. An untrusted local user
-             could exploit this flaw to run code in the kernel
-             (ie a root privilege escalation). (CVE-2007-4573).
+             A flaw was found in the IA32 system call emulation provided
+             on AMD64 and Intel 64 platforms. An improperly validated 64-bit
+             value could be stored in the %RAX register, which could trigger an
+             out-of-bounds system call table access. An untrusted local user
+             could exploit this flaw to run code in the kernel
+             (ie a root privilege escalation). (CVE-2007-4573).

 Severity   : major
 Bugzilla   : 13093
@@ -501,7 +655,7 @@ Bugzilla : 13454
 Description: Add jbd statistics patch for RHEL5 and 2.6.18-vanilla

 Severity   : minor
-Bugzilla   : 13732
+Bugzilla   : 13732
 Description: change order of libsysio includes
 Details    : '#include sysio.h' should always come before '#include xtio.h'

@@ -551,13 +705,13 @@ Frequency : always
 Bugzilla   : 13976
 Description: touch file failed when fs is not full
 Details    : OST in recovery should not be discarded by MDS in alloc_qos(),
-             otherwise we can get ENOSPC while fs is not full.
+             otherwise we can get ENOSPC while fs is not full.

 Severity   : normal
 Bugzilla   : 11301
 Description: parallel lock callbacks
 Details    : Instead of sending blocking and completion callbacks as separated
-             requests, adding them to a set and sending in parallel.
+             requests, adding them to a set and sending in parallel.

 Severity   : normal
 Frequency  : only for Cray XT3
@@ -574,6 +728,149 @@ Bugzilla : 14398
 Description: Allow masking D_WARNING, D_ERROR messages from console
 Details    : Console messages can now be disabled via lnet.printk.

+Severity   : normal
+Bugzilla   : 14614
+Description: User code with malformed file open parameter crashes client node
+Details    : Before packing join_file req, all the related reference should be
+             checked carefully in case some malformed flags cause fake
+             join_file req on client.
+
+Severity   : normal
+Bugzilla   : 14225
+Description: LDLM_ENQUEUE races with LDLM_CP_CALLBACK
+Details    : ldlm_completion_ast() assumes that a lock is granted when the req
+             mode is equal to the granted mode. However, it should also check
+             that LDLM_FL_CP_REQD is not set.
+
+Severity   : normal
+Bugzilla   : 14360
+Description: Heavy nfs access might result in deadlocks
+Details    : After ELC code landed, it is now improper to enqueue any mds
+             locks under och_sem, because enqueue might want to decide to
+             cancel open locks for same inode we are holding och_sem for.
+
+Severity   : normal
+Bugzilla   : 13843
+Description: Client eviction while running blogbench
+Details    : A lot of unlink operations with concurrent I/O can lead to a
+             deadlock causing evictions. To address the problem, the number of
+             outstanding OST_DESTROY requests is now throttled to
+             max_rpcs_in_flight per OSC and LDLM_FL_DISCARD_DATA blocking
+             callbacks are processed in priority.
+
+Severity   : normal
+Bugzilla   : 13829
+Description: enable ACLs on MDS by default
+Details    : ACLs must be enabled on MDS by default.
+
+Severity   : normal
+Frequency  : PPC/PPC64 only
+Bugzilla   : 14845
+Description: conflicts between asm-ppc64/types.h and lustre_types.h
+Details    : fix duplicated definitions between asm-ppc64/types.h and
+             lustre_types.h on PPC.
+
+Severity   : normal
+Frequency  : PPC/PPC64 only
+Bugzilla   : 14844
+Description: asm-ppc/segment.h does not exist
+Details    : fix compile issue on PPC.
+
+Severity   : normal
+Bugzilla   : 14864
+Description: better handle error messages in extents code
+
+Severity   : normal
+Frequency  : RHEL4 only
+Bugzilla   : 14618
+Description: mkfs is very slow on IA64/RHEL4
+Details    : A performance regression has been discovered in the MPT Fusion
+             driver between versions 3.02.73rh and 3.02.99.00rh. As a
+             consequence, we have downgraded the MPT Fusion driver in the RHEL4
+             kernel from 3.02.99.00 to 3.02.73 until this problem is fixed.
+
+Severity   : enhancement
+Bugzilla   : 14729
+Description: SNMP support enhancement
+Details    : Adding total number of sampled requests for an MDS node in snmp
+             support.
+
+Severity   : enhancement
+Bugzilla   : 14748
+Description: Optimize ldlm waiting list processing for PR extent locks
+Details    : When processing waiting list for read extent lock and meeting read
+             lock that is same or wider to it that is not contended, skip
+             processing rest of the list and immediately return current
+             status of conflictness, since we are guaranteed there are no
+             conflicting locks in the rest of the list.
+
+Severity   : normal
+Bugzilla   : 14774
+Description: Time out and refuse to reconnect
+Details    : When the failover node is the primary node, it is possible
+             to have two identical connections in imp_conn_list. We must
+             compare not conn's pointers but NIDs, otherwise we can defeat
+             connection throttling.
+
+Severity   : normal
+Bugzilla   : 13821
+Description: port llog fixes from b1_6 into HEAD
+Details    : Port llog reference counting and some llog cleanups from b1_6
+             (bug 10800) into HEAD, to protect against panics and access to
+             already freed llog structures.
+
+Severity   : normal
+Bugzilla   : 14483
+Description: Detect stride IO mode in read-ahead
+Details    : When a client does stride read, read-ahead should detect that and
+             read-ahead pages according to the detected stride pattern.
+
+Severity   : normal
+Bugzilla   : 13805
+Description: data checksumming impacts single node performance
+Details    : add support for several checksum algorithms. Currently, only CRC32
+             and Adler-32 are supported. The checksum type can be changed on
+             the fly via /proc/fs/lustre/osc/*/checksum_type.
+
+Severity   : normal
+Bugzilla   : 14648
+Description: use adler32 for page checksums
+Details    : when available, use the Adler-32 algorithm instead of CRC32 for
+             page checksums.
+
+Severity   : normal
+Bugzilla   : 15033
+Description: build for x2 fails
+Details    : fix compile issue on Cray systems.
+
+Severity   : normal
+Bugzilla   : 14379
+Description: Properly match for duplicate locks
+Details    : Due to different lock order from skiplists code, we need to
+             traverse entire list for now
+
+Severity   : normal
+Frequency  : only on PPC/SLES10
+Bugzilla   : 14855
+Description: "BITS_PER_LONG is not 32 or 64" in linux/idr.h
+Details    : On SLES10/PPC, fs.h includes idr.h which requires BITS_PER_LONG to
+             be defined. Add a hack in mkfs_lustre.c to work around this compile
+             issue.
+
+Severity   : normal
+Bugzilla   : 14257
+Description: LASSERT on MDS when client holding flock lock dies
+Details    : ldlm pool logic depends on number of granted locks equal to
+             number of released locks which is not true for flock locks, so
+             just exclude such locks from consideration.
+
+Severity   : normal
+Bugzilla   : 15188
+Description: MDS deadlock with many ll_sync_lov threads and I/O stalled
+Details    : Use fsfilt_sync() for both the whole filesystem sync and
+             individual file sync to eliminate dangerous inode locking
+             with I_LOCK that can lead to a deadlock.
+
 --------------------------------------------------------------------------------
 2007-08-10 Cluster File Systems, Inc.
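
Illustration for the checksum entries above (Bugzilla 13805 and 14648), which name CRC32 and Adler-32 as the two supported page checksum algorithms. The following is only a rough userspace sketch, not Lustre code: it uses Python's zlib module, and the names page_checksum and PAGE_SIZE, the "crc32"/"adler" labels, and the 4096-byte page are assumptions made for the example.

```python
import zlib

PAGE_SIZE = 4096  # assumed page size, for illustration only


def page_checksum(page: bytes, kind: str) -> int:
    """Return a 32-bit checksum of one page using the named algorithm.

    `kind` is a hypothetical label chosen for this sketch; it is not
    claimed to match the values accepted by the checksum_type proc file.
    """
    if kind == "crc32":
        return zlib.crc32(page) & 0xFFFFFFFF
    if kind == "adler":
        return zlib.adler32(page) & 0xFFFFFFFF
    raise ValueError(f"unsupported checksum type: {kind}")


if __name__ == "__main__":
    # Build one page of sample data and checksum it both ways.
    page = bytes(range(256)) * (PAGE_SIZE // 256)
    for kind in ("crc32", "adler"):
        print(kind, hex(page_checksum(page, kind)))
```

Both functions return a 32-bit value, so switching between them only changes how the value is computed. Adler-32 is generally cheaper to compute than CRC32, which is consistent with the performance motivation given in the entries.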