X-Git-Url: https://git.whamcloud.com/?p=fs%2Flustre-release.git;a=blobdiff_plain;f=lustre%2FChangeLog;h=d281ffde65a5e484b97081ac47d7c5a4830389e8;hp=e9a30134da2aa771ddb896e549b9f88e26facb5d;hb=99a7a2f7f42477bc03eaca0b0faaa4e9a303afd2;hpb=5621b9f70e62da7b98d556de8cae39acd07f9e97
diff --git a/lustre/ChangeLog b/lustre/ChangeLog
index e9a3013..d281ffd 100644
--- a/lustre/ChangeLog
+++ b/lustre/ChangeLog
@@ -1,17 +1,239 @@
 tbd Sun Microsystems, Inc.
-       * version 1.8.0
+       * version 2.0.0
        * Support for kernels:
-        2.6.9-67.0.4.EL (RHEL 4), 2.6.16.54-0.2.5 (SLES 10),
-        2.6.18-53.1.13.el5 (RHEL 5).
+        2.6.18-53.1.21.el5 (RHEL 5),
+        2.6.22.14 vanilla (kernel.org).
        * Client support for unpatched kernels:
          (see http://wiki.lustre.org/index.php?title=Patchless_Client)
          2.6.16 - 2.6.21 vanilla (kernel.org)
-       * Recommended e2fsprogs version: 1.40.4-cfs1
+       * Recommended e2fsprogs version: 1.40.7-sun3
        * Note that reiserfs quotas are disabled on SLES 10 in this kernel.
        * RHEL 4 and RHEL 5/SLES 10 clients behaves differently on 'cd' to a
          removed cwd "./" (refer to Bugzilla 14399).
 
+Severity   : normal
+Bugzilla   : 12653
+Description: sanity test 65a fails if stripecount of -1 is set
+Details    : handle -1 striping on filesystem in ll_dirstripe_verify
+
+Severity   : normal
+Bugzilla   : 14742
+Frequency  : rare
+Description: ASSERTION(CheckWriteback(page,cmd)) failed
+Details    : incorrectly clearing the PG_Writeback bit in ll_ap_completion can
+             produce a false positive assertion.
+
+Severity   : enhancement
+Bugzilla   : 15865
+Description: Update to RHEL5 kernel-2.6.18-53.1.21.el5.
+
+Severity   : major
+Bugzilla   : 15924
+Description: do not process already freed flock
+Details    : a flock can be freed by another thread before it reaches
+             ldlm_flock_completion_ast.
+
+Severity   : normal
+Bugzilla   : 14480
+Description: LBUG during stress test
+Details    : properly lock accesses to the flock deadlock detection list.
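Bugzilla 15924 and 14480 concern flocks that other threads can free, and a deadlock detection list that was walked without proper locking. A minimal sketch of the idea, assuming a hypothetical global waits-for list guarded by a pthread mutex; struct flock_wait, flock_add_wait() and flock_would_deadlock() are illustrative names, not Lustre's ldlm API:

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical waits-for record: owner `waiter` is blocked on a lock
 * held by owner `holder`. Owner 0 is reserved to mean "nobody". */
struct flock_wait {
    unsigned long waiter;
    unsigned long holder;
    struct flock_wait *next;
};

static struct flock_wait *wait_list;   /* shared waits-for list */
static pthread_mutex_t wait_list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Return the owner that `owner` waits on, or 0. Caller holds wait_list_lock. */
static unsigned long blocked_on(unsigned long owner)
{
    for (const struct flock_wait *w = wait_list; w != NULL; w = w->next)
        if (w->waiter == owner)
            return w->holder;
    return 0;
}

/* Publish that `waiter` now blocks on `holder`. */
void flock_add_wait(struct flock_wait *w, unsigned long waiter,
                    unsigned long holder)
{
    w->waiter = waiter;
    w->holder = holder;
    pthread_mutex_lock(&wait_list_lock);
    w->next = wait_list;
    wait_list = w;
    pthread_mutex_unlock(&wait_list_lock);
}

/* Would making `req_owner` wait on `holder` close a cycle? The whole
 * chain walk runs under wait_list_lock, so no other thread can unlink
 * or free an entry while we follow it. */
bool flock_would_deadlock(unsigned long req_owner, unsigned long holder)
{
    bool deadlock = false;
    unsigned long cur = holder;
    unsigned int steps = 0;

    pthread_mutex_lock(&wait_list_lock);
    while (cur != 0 && steps++ < 1024) {   /* bound guards unrelated cycles */
        if (cur == req_owner) {
            deadlock = true;
            break;
        }
        cur = blocked_on(cur);
    }
    pthread_mutex_unlock(&wait_list_lock);
    return deadlock;
}
```

With owner 1 waiting on 2 and owner 2 waiting on 3, a request by owner 3 to wait on owner 1 is reported as a deadlock. Insertion and the walk share one mutex, which is the property the 14480 fix is about.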
+
+Severity   : minor
+Bugzilla   : 15837
+Description: oops in page fault handler
+Details    : the kernel page fault handler can return two special 'pages' in the
+             error case; don't try to dereference NOPAGE_SIGBUS and NOPAGE_OOM.
+
+Severity   : minor
+Bugzilla   : 15716
+Description: timeout with invalidate import.
+Details    : ptlrpcd_check calls obd_zombie_impexp_cull and waits for a request
+             which should be handled by ptlrpcd. This produces a long wait,
+             -ETIMEDOUT in ptlrpc_invalidate_import and, as a result, an LASSERT.
+
+Severity   : enhancement
+Bugzilla   : 15741
+Description: Update to RHEL5 kernel-2.6.18-53.1.19.el5.
+
+Severity   : major
+Bugzilla   : 14134
+Description: enable MGS and MDT services to start separately
+Details    : add a 'nomgs' option to mount.lustre to allow starting an MDT with
+             a co-located MGS without starting the MGS, as a complement
+             to the 'nosvc' mount option.
+
+Severity   : normal
+Bugzilla   : 14835
+Frequency  : after recovery
+Description: too many objects precreated after deleting orphans.
+Details    : deleting orphans sets oscc last_id == next_id, and this triggers
+             growth in the count of precreated objects. Set the LOW flag to skip
+             increasing the count of precreated objects.
+
+Severity   : normal
+Bugzilla   : 15139
+Frequency  : rare, on clear nid stats
+Description: ASSERTION(client_stat->nid_exp_ref_count == 0)
+Details    : when clearing nid stats, we sometimes try to destroy a live entry,
+             and this produces a panic in free.
+
+Severity   : major
+Bugzilla   : 15575
+Description: Stack overflow during MDS log replay
+             ease stack pressure by using a separate thread for llog_process.
+
+Severity   : normal
+Bugzilla   : 15443
+Description: wait until IO is finished before starting new IO on lock cancel.
+Details    : the VM protocol wants old IO finished before new IO starts; in this
+             case we need to wait until PG_writeback is cleared before checking
+             the dirty flag and calling writepages in the lock cancel callback.
+
+Severity   : enhancement
+Bugzilla   : 14929
+Description: use a special macro for printing time, and clean up includes.
+
+Severity   : normal
+Bugzilla   : 12888
+Description: mds_mfd_close() ASSERTION(rc == 0)
+Details    : In mds_mfd_close(), we need to protect the inode's writecount change
+             with its orphan write semaphore to prevent possible races.
+
+Severity   : minor
+Bugzilla   : 14929
+Description: Remove the obsolete CURRENT_SECONDS and use cfs_time_current_sec() instead.
+
+Severity   : minor
+Bugzilla   : 14645
+Frequency  : rare, on shutdown ost
+Description: don't hit a livelock when unmounting an OST.
+Details    : shrink_dcache_parent can loop for a long time destroying dentries;
+             use shrink_dcache_sb instead.
+
+Severity   : minor
+Bugzilla   : 14949
+Description: don't panic when using the echo client
+Details    : the echo client passes NULL as the client nid pointer, and this
+             produces a NULL pointer dereference.
+
+Severity   : normal
+Bugzilla   : 15278
+Description: fix build on ppc32
+Details    : compiling code with the -m64 flag produces a wrong object file for ppc32.
+
+Severity   : normal
+Bugzilla   : 12191
+Description: add message levels for liblustreapi
+
+Severity   : normal
+Bugzilla   : 13380
+Description: fix for occasional failure case of -ENOSPC in recovery-small tests
+Details    : Move the 'good_osts' check before the 'total_bavail' check. This
+             will result in an -EAGAIN and in the exit call path we call
+             alloc_rr() which will with increasing aggressiveness attempt to
+             acquire precreated objects on the minimum number of required OSCs.
+
+Severity   : major
+Bugzilla   : 14326
+Description: Use old size assignment to avoid deadlock
+Details    : This reverts the changes in bugs 2369 and 14138 that introduced
+             scheduling while holding a spinlock. We do not need locking
+             for size in ll_update_inode() because size is only updated from
+             the MDS for directories or files without objects, so there is no
+             other place to do the update, and concurrent access to such inodes
+             is protected by the inode lock.
+
+Severity   : normal
+Bugzilla   : 14746
+Description: resolve "_IOWR redefined" build error on SLES10
+
+Severity   : normal
+Bugzilla   : 14763
+Description: dump the memory debugging output after all modules are unloaded to
+             suppress a false negative in conf_sanity test 39
+
+Severity   : enhancement
+Bugzilla   : 15316
+Description: build kernel-ib packages for OFED 1.3 in our release cycle
+
+Severity   : minor
+Bugzilla   : 13969
+Frequency  : always
+Description: fix SLES kernel versioning
+Details    : the kernel version for our SLES 10 kernel did not include a "-"
+             before the "smp" at the end. While this was not a problem in
+             general, it did mean that software trying to use the kernel
+             version to detect a vendor-specific kernel would fail.
+             This was most evident in the OFED build scripts.
+
+Severity   : normal
+Bugzilla   : 14803
+Description: Don't update lov_desc members until making sure they are valid
+Details    : When updating lov_desc members via procfs, we need to check their
+             validity before doing the real update.
+
+Severity   : normal
+Bugzilla   : 15069
+Description: don't put a request on the delay list while an invalidate is in flight.
+Details    : ptlrpc_delay_request sometimes put requests on the delay list while
+             an import invalidate was in flight. This produced a timeout for the
+             invalidate and could sometimes cause stale data.
+
+Severity   : minor
+Bugzilla   : 14856
+Frequency  : on ppc only
+Description: don't convert OST objects for a directory, because none exist.
+Details    : ll_dir_getstripe assumed a directory has OST objects, but this is wrong.
+
+Severity   : normal
+Bugzilla   : 12652
+Description: Add FMODE_EXEC file flag for SLES10 SP1 kernel.
+
+Severity   : enhancement
+Bugzilla   : 13397
+Description: Update to support 2.6.22.14 vanilla kernel.
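Bugzilla 14803 validates new lov_desc values written through procfs before applying them. A sketch of that validate-then-commit pattern, assuming a hypothetical descriptor with only a stripe-size field and an assumed rule that sizes are non-zero multiples of 64 KiB; struct stripe_desc and set_stripe_size() are illustrative, not the actual Lustre code:

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the relevant part of a striping descriptor. */
struct stripe_desc {
    uint64_t stripe_size;
};

/* Assumed rule for illustration: sizes must be non-zero multiples of 64 KiB. */
#define STRIPE_SIZE_UNIT 65536ULL

/* Parse a procfs-style write and update `desc` only if the value passes
 * every check; on any error `desc` is left exactly as it was. */
int set_stripe_size(struct stripe_desc *desc, const char *buf)
{
    char *end;
    unsigned long long val;

    errno = 0;
    val = strtoull(buf, &end, 0);
    if (errno != 0 || end == buf || (*end != '\0' && *end != '\n'))
        return -EINVAL;                /* not a clean number */
    if (val == 0 || val % STRIPE_SIZE_UNIT != 0)
        return -EINVAL;                /* fails the validity rule */

    desc->stripe_size = val;           /* commit only after validation */
    return 0;
}
```

Writing "4194304\n" succeeds, while "1000" or "abc" return -EINVAL and leave the previous value in place, so a bad write can never leave the descriptor half-updated.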
+
+Severity   : normal
+Bugzilla   : 14533
+Frequency  : rare, on recovery
+Description: reading procfs can produce a deadlock in some situations
+Details    : Holding the lprocfs lock while sending an RPC can block destroying
+             obd objects, and this also blocks reconnect with -EALREADY. This does
+             not fix all lprocfs bugs, but makes them rare.
+
+Severity   : enhancement
+Bugzilla   : 15152
+Description: Update kernel to RHEL5 2.6.18-53.1.14.el5.
+
+Severity   : major
+Frequency  : frequent on X2 node
+Bugzilla   : 15010
+Description: mdc_set_open_replay_data LBUG
+Details    : Set replay data for requests that are eligible for replay.
+
+Severity   : normal
+Bugzilla   : 14321
+Description: lustre_mgs: operation 101 on unconnected MGS
+Details    : When the MGC is disconnected from the MGS long enough, the MGS will
+             evict the MGC; later the MGC cannot successfully reconnect to the MGS,
+             and a lot of error messages complain that the MGS is not connected.
+
+Severity   : major
+Bugzilla   : 15027
+Frequency  : on network error
+Description: panic with double-freed request on network error
+Details    : mdc_finish_enqueue finishes the request if any network error occurs,
+             but this is correct only for synchronous enqueue; for async enqueue
+             (via ptlrpcd) it is incorrect, because ptlrpcd wants to finish the
+             request itself.
+
+Severity   : enhancement
+Bugzilla   : 11401
+Description: client-side metadata stat-ahead during readdir (directory readahead)
+Details    : perform client-side metadata stat-ahead when the client detects
+             readdir and sequential stat of dir entries therein
+
 Severity   : major
 Frequency  : on start mds
 Bugzilla   : 14884
@@ -22,10 +244,10 @@ Frequency  : occasional
 Bugzilla   : 13537
 Description: Correctly check stale fid, not start epoch if ost not support SOM
 Details    : open with flag O_CREATE need set old fid in op_fid3 because op_fid2
-             overwrited with new generated fid, but mds can anwer with one of these
-             two fids and both is not stale. setattr incorectly start epoch and
-             assume will be called done_writeting, but without SOM done_writing
-             never called.
+             is overwritten with the newly generated fid, but the MDS can answer with
+             either of these two fids and neither is stale. setattr incorrectly starts
+             an epoch and assumes done_writing will be called, but without SOM
+             done_writing is never called.
 
 Severity   : major
 Frequency  : rare, depends on device drivers and load
@@ -41,9 +263,9 @@ Severity   : normal
 Frequency  : occasional
 Bugzilla   : 13730
 Description: Do not fail import if osc_interpret_create gets -EAGAIN
-Details    : If osc_interpret_create got -EAGAIN it immediately exits and
-             wakeup oscc_waitq. After wakeup oscc_wait_for_objects call
-             oscc_has_objects and see OSC has no objests and call
+Details    : If osc_interpret_create gets -EAGAIN it immediately exits and
+             wakes up oscc_waitq. After wakeup, oscc_wait_for_objects calls
+             oscc_has_objects, sees that the OSC has no objects and calls
              oscc_internal_create to resend create request.
 
 Severity   : enhancement
@@ -84,7 +306,7 @@ Details    : Don't allow skipping OSTs if index has been specified, make locking
 Severity   : normal
 Bugzilla   : 12228
 Description: LBUG in ptlrpc_check_set() bad phase ebc0de00
-Details    : access to bitfield in structure is always rounded to long
+Details    : access to a bitfield in a structure is always rounded to a long,
              and this produce problem with not atomic change any bit.
 
 Severity   : normal
@@ -100,7 +322,7 @@ Description: If llog cancel was not send before clean_exports phase, this
              can produce deadlock in llog code.
 Details    : If llog thread has last reference to obd and call class_import_put
              this produce deadlock because llog_cleanup_commit_master wait when
-             last llog_commit_thread exited, but this never success because was
+             the last llog_commit_thread exits, but this never succeeds because it was
              called from llog_commit_thread.
 
 Severity   : normal
@@ -160,7 +382,7 @@ Frequency  : rare, at shutdown
 Description: access already free / zero obd_namespace.
 Details    : if client_disconnect_export was called without force flag set,
              and exist connect request in flight, this can produce access to
-             NULL pointer (or already free pointer) when connect_interpret
+             a NULL pointer (or an already freed pointer) when connect_interpret
              store ocd flags in obd_namespace.
 
 Severity   : minor
@@ -178,7 +400,7 @@ Details    : Make lustre randomly failed allocating memory for testing purpose.
 Severity   : enhancement
 Bugzilla   : 12702
 Description: lost problems with lov objid file
-Details    : Fixes some scability and access to not inited memory problems
+Details    : Fixes some scalability and uninitialized-memory access problems
              in work with lov objdid file.
 
 Severity   : major
@@ -220,18 +442,18 @@ Description: Update to RHEL4 latest kernel.
 Severity   : enhancement
 Bugzilla   : 13690
 Description: Build SLES10 patchless client fails
-Details    : The configure was broken by run ./configure with
+Details    : The configure was broken by running ./configure with
              --with-linux-obj=.... argument for patchless client. When the
              configure use --with-linux-obj, the LINUXINCLUDE= -Iinclude
-             can't search header adequately. Use absolute path such as
-             -I($LINUX)/include instead.
+             can't find headers reliably. Use an absolute path such as
+             -I($LINUX)/include instead.
 
 Severity   : normal
 Bugzilla   : 13888
 Description: interrupt oig_wait produce painc on resend.
 Details    : brw_redo_request can be used for resend requests from ptlrpcd and
              private set, and this produce situation when rq_ptlrpcd_data not
-             copyed to new allocated request and triggered LBUG on assert
+             copied to the newly allocated request and triggered an LBUG on assert
              req->rq_ptlrpcd_data != NULL. But this member used only for wakeup
              ptlrpcd set if request is changed and can be safety changed to use
              rq_set directly.
@@ -256,10 +478,10 @@ Details    : This causes SLES 10 clients to behave as patchless clients
 Severity   : enhancement
 Bugzilla   : 2262
 Description: self-adjustable client's lru lists
-Details    : use adaptive algorithm for managing client cached locks lru
+Details    : use an adaptive algorithm for managing the client's cached-lock LRU
              lists according to current server load, other client's work
-             pattern, memory activities, etc. Both, server and client
-             side namespaces provide number of proc tunables for controlling
+             pattern, memory activity, etc. Both server and client
+             side namespaces provide a number of proc tunables for controlling
              things
 
 Severity   : enhancement
@@ -381,7 +603,7 @@ Details    : set obd_health_check_timeout as 1.5x of obd_timeout
 Severity   : normal
 Bugzilla   : 12192
 Description: llapi_file_create() does not allow some changes
-Details    : add llapi_file_open() that allows specifying the mode and
+Details    : add llapi_file_open() that allows specifying the mode and
              open flags, and also returns an open file handle.
 
 Severity   : normal
@@ -392,9 +614,9 @@ Details    : Remove mnt_lustre_list in vfs_intent-2.6-rhel4.patch.
 Severity   : normal
 Bugzilla   : 10657
 Description: Add journal checksum support.(Kernel part)
-Details    : The journal checksum feature adds two new flags i.e
-             JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT and
-             JBD2_FEATURE_COMPAT_CHECKSUM. JBD2_FEATURE_CHECKSUM flag
+Details    : The journal checksum feature adds two new flags, i.e.
+             JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT and
+             JBD2_FEATURE_COMPAT_CHECKSUM. The JBD2_FEATURE_CHECKSUM flag
              indicates that the commit block contains the checksum for the
              blocks described by the descriptor blocks.
              Now commit record can be sent to disk without waiting for descriptor
@@ -414,7 +636,7 @@ Details    : execute lfs setstripe on client
 Severity   : major
 Bugzilla   : 12223
 Description: mds_obd_create error creating tmp object
-Details    : When the user sets quota on root, llog will be affected and can't
+Details    : When the user sets quota on root, llog will be affected and can't
              create files and write files.
 
 Severity   : normal
@@ -422,7 +644,7 @@ Frequency  : Always on ia64 patchless client, and possibly others.
 Bugzilla   : 12826
 Description: Add EXPORT_SYMBOL check for node_to_cpumask symbol.
 Details    : This allows the patchless client to be loaded on architectures
-             without this export.
+             without this export.
 
 Severity   : normal
 Bugzilla   : 13039
@@ -482,10 +704,10 @@ Description: when mds and osts use different quota unit(32bit and 64bit),
 Details    : void sending multiple quota reqs to mds, which will keep the status
              between the reqs.
 
-Severity   : normal
+Severity   : normal
 Bugzilla   : 13125
 Description: osts not allocated evenly to files
-Details    : change the condition to increase offset_idx
+Details    : change the condition to increase offset_idx
 
 Severity   : critical
 Frequency  : Always for filesystems larger than 2TB on 32-bit systems.
@@ -500,7 +722,7 @@ Details    : When generating the bio request for lustre file writes the
 
 Severity   : normal
 Bugzilla   : 11230
-Description: Tune the kernel for good SCSI performance.
+Description: Tune the kernel for good SCSI performance.
 Details    : Set the value of /sys/block/{dev}/queue/max_sectors_kb
              to the value of /sys/block/{dev}/queue/max_hw_sectors_kb
              in mount_lustre.
@@ -542,8 +764,8 @@ Frequency  : only on ppc
 Bugzilla   : 12234
 Description: /proc/fs/lustre/devices broken on ppc
 Details    : The patch as applied to 1.6.2 doesn't look correct for all arches.
-             We should make sure the type of 'index' is loff_t and then cast
-             explicitly as needed below. Do not assign an explicitly cast
+             We should make sure the type of 'index' is loff_t and then cast
+             explicitly as needed below. Do not assign an explicitly cast
              loff_t to an int.
 
 Severity   : normal
@@ -562,14 +784,14 @@ Severity   : normal
 Bugzilla   : 13304
 Frequency  : Always, for kernels after 2.6.16
 Description: Fix warning idr_remove called for id=.. which is not allocated.
-Details    : Last kernels save old s_dev before kill super and not allow
+Details    : Recent kernels save the old s_dev before killing the superblock and do not allow it
              to restore from callback - restore it before call kill_anon_super.
 
 Severity   : minor
 Bugzilla   : 12948
 Description: buffer overruns could theoretically occur
 Details    : llapi_semantic_traverse() modifies the "path" argument by
-             appending values to the end of the origin string, and a
+             appending values to the end of the original string, and an
              overrun may occur. Adding buffer overrun check in liblustreapi.
 
 Severity   : normal
@@ -602,12 +824,12 @@ Severity   : critical
 Bugzilla   : 13751
 Description: Kernel patches update for RHEL5 2.6.18-8.1.14.el5.
 Details    : Modify target file & which_patch.
-             A flaw was found in the IA32 system call emulation provided
-             on AMD64 and Intel 64 platforms. An improperly validated 64-bit
-             value could be stored in the %RAX register, which could trigger an
-             out-of-bounds system call table access. An untrusted local user
-             could exploit this flaw to run code in the kernel
-             (ie a root privilege escalation). (CVE-2007-4573).
+             A flaw was found in the IA32 system call emulation provided
+             on AMD64 and Intel 64 platforms. An improperly validated 64-bit
+             value could be stored in the %RAX register, which could trigger an
+             out-of-bounds system call table access. An untrusted local user
+             could exploit this flaw to run code in the kernel
+             (i.e. a root privilege escalation). (CVE-2007-4573).
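Bugzilla 12948 adds an overrun check where llapi_semantic_traverse() appends names to the end of its "path" buffer. A sketch of a bounds-checked append, assuming the caller knows the buffer size; path_append() and path_truncate() are hypothetical helpers, not liblustreapi functions:

```c
#include <errno.h>
#include <string.h>

/* Hypothetical helper: append "/name" to the NUL-terminated string in
 * `path`, whose buffer is `size` bytes. Returns 0, or -ENAMETOOLONG and
 * leaves `path` unchanged if the result (with its NUL) would not fit. */
int path_append(char *path, size_t size, const char *name)
{
    size_t len = strlen(path);
    size_t nlen = strlen(name);

    /* room needed: existing text + '/' + name + terminating NUL */
    if (len + 1 + nlen + 1 > size)
        return -ENAMETOOLONG;

    path[len] = '/';
    memcpy(path + len + 1, name, nlen + 1);   /* copies the NUL as well */
    return 0;
}

/* Hypothetical helper: cut `path` back to `len` characters, e.g. when a
 * recursive traversal returns from a subdirectory. */
void path_truncate(char *path, size_t len)
{
    path[len] = '\0';
}
```

With a 16-byte buffer holding "/mnt/lustre", appending "dir" fills it exactly, and a further append of "x" is refused instead of writing past the end.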
 
 Severity   : major
 Bugzilla   : 13093
@@ -621,7 +843,7 @@ Bugzilla   : 13454
 Description: Add jbd statistics patch for RHEL5 and 2.6.18-vanilla
 
 Severity   : minor
-Bugzilla   : 13732
+Bugzilla   : 13732
 Description: change order of libsysio includes
 Details    : '#include sysio.h' should always come before '#include xtio.h'
 
@@ -759,18 +981,180 @@ Severity   : enhancement
 Bugzilla   : 14729
 Description: SNMP support enhancement
 Details    : Adding total number of sampled request for an MDS node in snmp
-             support.
+             support.
 
 Severity   : enhancement
 Bugzilla   : 14748
 Description: Optimize ldlm waiting list processing for PR extent locks
-Details    : When processing waiting list for read extent lock and meeting
-read
-             lock that is same or wider to it that is not contended, skip
+Details    : When processing the waiting list for a read extent lock and meeting a read
+             lock that is the same as or wider than it and is not contended, skip
              processing rest of the list and immediatelly return current status
              of conflictness, since we are guaranteed there are no conflicting
              locks in the rest of the list.
 
+Severity   : normal
+Bugzilla   : 14774
+Description: Time out and refuse to reconnect
+Details    : When the failover node is the primary node, it is possible
+             to have two identical connections in imp_conn_list. We must
+             compare NIDs rather than conn pointers; otherwise we can defeat
+             connection throttling.
+
+Severity   : normal
+Bugzilla   : 13821
+Description: port llog fixes from b1_6 into HEAD
+Details    : Port llog reference counting and some llog cleanups from b1_6
+             (bug 10800) into HEAD, to protect against panics and access to
+             already freed llog structures.
+
+Severity   : normal
+Bugzilla   : 14483
+Description: Detect stride IO mode in read-ahead
+Details    : When a client does strided reads, read-ahead should detect that and
+             read ahead pages according to the detected stride pattern.
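Bugzilla 14483 makes read-ahead detect strided reads. A minimal sketch of the detection idea, assuming a hypothetical per-file state that only tracks the gap between successive read start offsets; the real llite read-ahead also accounts for read length and page boundaries, and this sketch does not distinguish plain sequential reads from strided ones:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-file stride detector, not Lustre's read-ahead state. */
struct stride_state {
    uint64_t last_start;   /* start offset of the previous read */
    uint64_t stride;       /* distance between the last two read starts */
    unsigned hits;         /* consecutive gaps equal to `stride` */
    bool have_last;
};

/* Feed one read starting at `start`. Returns true once at least
 * `min_hits` consecutive gaps have had the same non-zero size. */
bool stride_observe(struct stride_state *s, uint64_t start, unsigned min_hits)
{
    if (s->have_last && start > s->last_start) {
        uint64_t gap = start - s->last_start;
        if (gap == s->stride) {
            s->hits++;
        } else {
            s->stride = gap;   /* new candidate stride */
            s->hits = 1;
        }
    } else {
        s->stride = 0;         /* first read or backward seek: reset */
        s->hits = 0;
    }
    s->last_start = start;
    s->have_last = true;
    return s->hits >= min_hits;
}

/* Offset the detector expects next; read-ahead would prefetch there. */
uint64_t stride_next(const struct stride_state *s)
{
    return s->last_start + s->stride;
}
```

Reads starting at 0, 100 and 200 report a pattern on the third call with min_hits = 2, and stride_next() then predicts 300; a backward seek resets the state.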
+
+Severity   : normal
+Bugzilla   : 13805
+Description: data checksumming impacts single node performance
+Details    : add support for several checksum algorithms. Currently, only CRC32
+             and Adler-32 are supported. The checksum type can be changed on
+             the fly via /proc/fs/lustre/osc/*/checksum_type.
+
+Severity   : normal
+Bugzilla   : 14648
+Description: use adler32 for page checksums
+Details    : when available, use the Adler-32 algorithm instead of CRC32 for
+             page checksums.
+
+Severity   : normal
+Bugzilla   : 15033
+Description: build for x2 fails
+Details    : fix compile issue on Cray systems.
+
+Severity   : normal
+Bugzilla   : 14379
+Description: Properly match for duplicate locks
+Details    : Due to a different lock order in the skiplists code, we need to
+             traverse the entire list for now.
+
+Severity   : normal
+Frequency  : only on PPC/SLES10
+Bugzilla   : 14855
+Description: "BITS_PER_LONG is not 32 or 64" in linux/idr.h
+Details    : On SLES10/PPC, fs.h includes idr.h which requires BITS_PER_LONG to
+             be defined. Add a hack in mkfs_lustre.c to work around this compile
+             issue.
+
+Severity   : normal
+Bugzilla   : 14257
+Description: LASSERT on MDS when client holding flock lock dies
+Details    : ldlm pool logic depends on the number of granted locks being equal
+             to the number of released locks, which is not true for flock locks,
+             so just exclude such locks from consideration.
+
+Severity   : normal
+Bugzilla   : 15188
+Description: MDS deadlock with many ll_sync_lov threads and I/O stalled
+Details    : Use fsfilt_sync() for both the whole filesystem sync and
+             individual file sync to eliminate dangerous inode locking
+             with I_LOCK that can lead to a deadlock.
+
+Severity   : normal
+Bugzilla   : 14410
+Description: performance in 1.6.3
+Details    : Force q->max_phys_segments to MAX_PHYS_SEGMENTS on SLES10 to be
+             sure that 1MB requests are not fragmented by the block layer.
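Bugzilla 13805 and 14648 add Adler-32 as a page checksum alongside CRC32. For reference, a plain C version of Adler-32 as defined in RFC 1950; this is a sketch, not Lustre's or the kernel's implementation, and optimized versions defer the modulo across many bytes instead of reducing after every byte:

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD 65521u   /* largest prime below 2^16 */

/* Update a running Adler-32 value with `len` bytes from `buf`.
 * Start a fresh checksum with adler = 1. */
uint32_t adler32_update(uint32_t adler, const unsigned char *buf, size_t len)
{
    uint32_t a = adler & 0xffff;          /* sum of bytes, plus 1 */
    uint32_t b = (adler >> 16) & 0xffff;  /* sum of the running `a` values */

    for (size_t i = 0; i < len; i++) {
        a = (a + buf[i]) % ADLER_MOD;
        b = (b + a) % ADLER_MOD;
    }
    return (b << 16) | a;
}
```

For example, adler32_update(1, (const unsigned char *)"Wikipedia", 9) yields 0x11E60398, and feeding the same bytes in two calls gives the same value because the running result carries both sums. The algorithm needs only additions and a modulo per byte, which is why it is cheaper than a table-free CRC32 on the page checksum path.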
+
+Severity   : enhancement
+Bugzilla   : 11089
+Description: organize the server-side client stats on a per-nid basis
+Details    : Change the structure of stats under obdfilter and mds to
+             the following layout:
+             +- exports
+                +- nid#1
+                |  + stats
+                |  + uuids
+                +- nid#2...
+                +- clear
+             The "uuids" file lists the uuids of _active_ exports,
+             and the "clear" entry clears all stats and stale nids.
+
+Severity   : enhancement
+Bugzilla   : 11270
+Description: eliminate client locks in the face of contention
+Details    : file contention detection and lockless i/o implementation
+             for contended files.
+
+Severity   : normal
+Bugzilla   : 15212
+Description: Reinitialize optind to 0 so that interactive lfs works in all cases
+
+Severity   : critical
+Frequency  : very rare, if additional xattrs are used on kernels >= 2.6.12
+Bugzilla   : 15777
+Description: MDS may lose file striping (and hence file data) in some cases
+Details    : If there are additional extended attributes stored on the MDS,
+             in particular ACLs, SELinux, or user attributes (if user_xattr
+             is specified for the client mount options) then there is a risk
+             of attribute loss. Additionally, the Lustre file striping
+             needs to be larger than default (e.g. striped over all OSTs),
+             and an additional attribute must be stored initially in the
+             inode and then increase in size enough to be moved to the
+             external attribute block (e.g. ACL growing in size) for file
+             data to be lost.
+
+Severity   : normal
+Bugzilla   : 15346
+Description: skiplist implementation simplification
+Details    : skiplists are used to group compatible locks on the granted list;
+             they were implemented by tracking the first and last lock of each
+             lock group. The patch changes that to use doubly linked lists.
+
+Severity   : normal
+Bugzilla   : 15574
+Description: MDS LBUG: ASSERTION(!IS_ERR(dchild))
+Details    : Change LASSERTs to client eviction (i.e. abort the client's recovery)
+             because an LASSERT on data supplied by a client, or on data
+             on disk, is dangerous and incorrect.
+
+Severity   : enhancement
+Bugzilla   : 10718
+Description: Slow truncate/writes to huge files at high offsets.
+Details    : Directly associate cached pages with the lock that protects them;
+             this allows us to quickly find which pages to write and remove
+             once a lock callback is received.
+
+Severity   : normal
+Bugzilla   : 15953
+Description: more ldlm soft lockups
+Details    : In ldlm_resource_add_lock(), the call to ldlm_resource_dump()
+             starves other threads of the resource lock for a long time in
+             case of a long waiting queue, so change the debug level from
+             D_OTHER to the less frequently used D_INFO.
+
+Severity   : enhancement
+Bugzilla   : 13128
+Description: add -gid, -group, -uid, -user options to lfs find
+
+Severity   : normal
+Bugzilla   : 15950
+Description: Hung threads in invalidate_inode_pages2_range
+Details    : The direct IO path doesn't call check_rpcs to submit a new RPC once
+             one is completed. As a result, some RPCs are stuck in the queue
+             and are never sent.
+
+Severity   : normal
+Bugzilla   : 14629
+Description: filter threads hang waiting for journal commit
+Details    : Clean up the filter group llog code, so that the filter group llog
+             is only created in the MDS/OST syncing process.
+
+Severity   : normal
+Bugzilla   : 15684
+Description: Procfs and llog threads sometimes access a destroyed import.
+Details    : Synchronize import destruction with the procfs and llog threads
+             using the import refcount and semaphore.
+
 --------------------------------------------------------------------------------
 
 2007-08-10 Cluster File Systems, Inc.