*** empty log message ***

diff --git a/lustre/ChangeLog b/lustre/ChangeLog

index 420e776..a1c3742 100644
--- a/lustre/ChangeLog
+++ b/lustre/ChangeLog
@@ -1,24 +1,264 @@
  tbd  Sun Microsystems, Inc.
-       * version 1.8.0
+       * version 2.0.0
         * Support for kernels:
-        2.6.9-67.0.4.EL (RHEL 4),
-        2.6.16.54-0.2.5 (SLES 10),
-        2.6.18-53.1.14.el5 (RHEL 5).
+        2.6.16.60-0.23 (SLES 10),
+        2.6.18-92.1.6.el5 (RHEL 5),
+        2.6.22.14 vanilla (kernel.org).
         * Client support for unpatched kernels:
          (see http://wiki.lustre.org/index.php?title=Patchless_Client)
          2.6.16 - 2.6.21 vanilla (kernel.org)
-       * Recommended e2fsprogs version: 1.40.7-sun1
+       * Recommended e2fsprogs version: 1.40.11-sun1
         * Note that reiserfs quotas are disabled on SLES 10 in this kernel.
         * RHEL 4 and RHEL 5/SLES 10 clients behaves differently on 'cd' to a
          removed cwd "./" (refer to Bugzilla 14399).
  
+Severity   : enhancement
+Bugzilla   : 16091
+Description: configure's --enable-quota should check the 
+           : kernel .config for CONFIG_QUOTA
+Details    : configure is terminated if --enable-quota is passed but
+           : no quota support is in kernel
+
+Severity   : normal
+Bugzilla   : 13139
+Description: Remove portals compatibility
+Details    : Remove portals compatibility, not interoperable with releases
+            before 1.4.6
+
+Severity   : normal
+Bugzilla   : 15576
+Description: Resolve device initialization race
+Details    : Prevent the proc handler from accessing devices that were added
+            to the obd_devs array but are not yet initialized.
+
+Severity   : enhancement
+Bugzilla   : 15308
+Description: Update to SLES10 SP2 kernel-2.6.16.60-0.23.
+
+Severity   : enhancement
+Bugzilla   : 16190
+Description: Update to RHEL5 kernel-2.6.18-92.1.6.el5.
+
+Severity   : normal
+Bugzilla   : 12975
+Frequency  : rare
+Description: Using wrong pointer in osc_brw_prep_request
+Details    : Access to array[-1] can cause a panic if the kernel is compiled
+            with CONFIG_PAGE_ALLOC enabled
+
+Severity   : normal
+Bugzilla   : 16037
+Description: Client runs out of low memory
+Details    : Consider only lowmem when counting initial number of llap pages
+
+Severity   : normal
+Bugzilla   : 15625
+Description: *optional* service tags registration
+Details    : If the "service tags" package is installed on a Lustre node,
+            a local-node service tag will be created when the filesystem
+            is mounted.  See http://inventory.sun.com/ for more information
+            about the Service Tags asset management system.
+
+Severity   : normal
+Bugzilla   : 15825
+Description: Kernel BUG tries to release flock
+Details    : Lustre does not destroy the flock lock before the last reference
+            goes away, so always drop flock locks when the client is evicted
+            and perform the unlock regardless of whether the MDS was reached.
+
+Severity   : normal
+Bugzilla   : 15210
+Description: add refcount protection for osc callbacks to avoid panic on shutdown
+
+Severity   : normal
+Bugzilla   : 12653
+Description: sanity test 65a fails if stripecount of -1 is set
+Details    : handle -1 striping on filesystem in ll_dirstripe_verify
+
+Severity   : normal
+Bugzilla   : 14742
+Frequency  : rare
+Description: ASSERTION(CheckWriteback(page,cmd)) failed
+Details    : incorrectly clearing the PG_Writeback bit in ll_ap_completion can
+            produce a false positive assertion.
+
+Severity   : enhancement
+Bugzilla   : 15865
+Description: Update to RHEL5 kernel-2.6.18-53.1.21.el5.
+
+Severity   : major
+Bugzilla   : 15924
+Description: do not process already freed flock
+Details    : flock can possibly be freed by another thread before it reaches
+            ldlm_flock_completion_ast.
+
+Severity   : normal
+Bugzilla   : 14480
+Description: LBUG during stress test
+Details    : Properly lock accesses to the flock deadlock detection list.
+
+Severity   : minor
+Bugzilla   : 15837
+Description: oops in page fault handler
+Details    : the kernel page fault handler can return two special 'pages' in the
+            error case; don't try to dereference NOPAGE_SIGBUS and NOPAGE_OOM.
+
+Severity   : minor
+Bugzilla   : 15716
+Description: timeout with invalidate import.
+Details    : ptlrpcd_check calls obd_zombie_impexp_cull and waits for a request
+            that should be handled by ptlrpcd itself. This produces a long wait
+            and -ETIMEOUT in ptlrpc_invalidate_import, and as a result an LASSERT.
+
+Severity   : enhancement
+Bugzilla   : 15741
+Description: Update to RHEL5 kernel-2.6.18-53.1.19.el5.
+
+Severity   : major
+Bugzilla   : 14134
+Description: enable MGS and MDT services start separately
+Details    : add a 'nomgs' option in mount.lustre to allow starting an MDT with
+            a co-located MGS without starting the MGS, which complements
+            the 'nosvc' mount option.
+
+Severity   : normal
+Bugzilla   : 14835
+Frequency  : after recovery
+Description: precreate too many objects after del orphan.
+Details    : del orphan sets last_id == next_id in oscc, and this triggers growth
+            in the count of precreated objects. Set flag LOW to skip increasing
+            the count of precreated objects.
+
+Severity   : normal
+Bugzilla   : 15139
+Frequency  : rare, on clearing nid stats
+Description: ASSERTION(client_stat->nid_exp_ref_count == 0)
+Details    : clearing nid stats sometimes tries to destroy a live entry,
+            and this produces a panic in free.
+
+Severity   : major
+Bugzilla   : 15575
+Description: Stack overflow during MDS log replay
+Details    : ease stack pressure by using a thread to handle llog_process.
+
+Severity   : normal
+Bugzilla   : 15443
+Description: wait until IO has finished before starting new IO on lock cancel.
+Details    : The VM protocol wants old IO finished before new IO starts; in this
+            case we need to wait until PG_writeback is cleared before checking
+            the dirty flag and calling writepages in the lock cancel callback.
+
+Severity   : enhancement
+Bugzilla   : 14929
+Description: use a special macro for printing time and clean up includes.
+
+Severity   : normal
+Bugzilla   : 12888
+Description: mds_mfd_close() ASSERTION(rc == 0)
+Details    : In mds_mfd_close(), we need to protect the inode's writecount change
+            with its orphan write semaphore to prevent possible races.
+
+Severity   : minor
+Bugzilla   : 14929
+Description: Obsolete CURRENT_SECONDS and use cfs_time_current_sec() instead.
+
+Severity   : minor
+Bugzilla   : 14645
+Frequency  : rare, on OST shutdown
+Description: don't hit a livelock when unmounting an OST.
+Details    : shrink_dcache_parent can loop for a long time destroying dentries;
+            use shrink_dcache_sb instead.
+
+Severity   : minor
+Bugzilla   : 14949
+Description: don't panic when using the echo client
+Details    : the echo client passes NULL as the client nid pointer, and this
+            produces a null pointer dereference.
+
+Severity   : normal
+Bugzilla   : 15278
+Description: fix build on ppc32
+Details    : compiling code with the -m64 flag produces a wrong object file for ppc32.
+
+Severity   : normal
+Bugzilla   : 12191
+Description: add message levels for liblustreapi
+
+Severity   : normal
+Bugzilla   : 13380
+Description: fix for occasional failure case of -ENOSPC in recovery-small tests
+Details    : Move the 'good_osts' check before the 'total_bavail' check.  This
+            will result in an -EAGAIN and in the exit call path we call
+            alloc_rr() which will with increasing aggressiveness attempt to
+            acquire precreated objects on the minimum number of required OSCs.
+
+Severity   : major
+Bugzilla   : 14326
+Description: Use old size assignment to avoid deadlock
+Details    : This reverts the changes in bugs 2369 and 14138 that introduced
+            the scheduling while holding a spinlock.  We do not need locking
+            for size in ll_update_inode() because size is only updated from
+            the MDS for directories or files without objects, so there is no
+            other place to do the update, and concurrent access to such inodes
+            is protected by the inode lock.
+
+Severity   : normal
+Bugzilla   : 14746
+Description: resolve "_IOWR redefined" build error on SLES10
+
+Severity   : normal
+Bugzilla   : 14763
+Description: dump the memory debugging after all modules are unloaded to
+            suppress false negative in conf_sanity test 39
+
+Severity   : enhancement
+Bugzilla   : 15316
+Description: build kernel-ib packages for OFED 1.3 in our release cycle
+
+Severity   : minor
+Bugzilla   : 13969
+Frequency  : always
+Description: fix SLES kernel versioning
+Details    : the kernel version for our SLES 10 kernel did not include a "-"
+            before the "smp" at the end.  While this was not a problem in
+            general, it did mean that software trying to use the kernel
+            version to detect a vendor-specific kernel would fail.
+            This was most evident in the OFED build scripts.
+
+Severity   : normal
+Bugzilla   : 14803
+Description: Don't update lov_desc members until making sure they are valid
+Details    : When updating lov_desc members via proc fs, verify that the new
+            values are valid before doing the real update.
+
+Severity   : normal
+Bugzilla   : 15069
+Description: don't put a request on the delay list while an invalidate is in flight.
+Details    : ptlrpc_delay_request sometimes puts a request on the delay list while
+            an import invalidate is in flight. This produces a timeout for the
+            invalidate and can sometimes cause stale data.
+
+Severity   : minor
+Bugzilla   : 14856
+Frequency  : on ppc only
+Description: don't convert ost objects for a directory because they don't exist.
+Details    : ll_dir_getstripe assumes a directory has ost objects, but this is wrong.
+
+Severity   : normal
+Bugzilla   : 12652
+Description: Add FMODE_EXEC file flag for SLES10 SP1 kernel.
+
+Severity   : enhancement
+Bugzilla   : 13397
+Description: Update to support 2.6.22.14 vanilla kernel.
+
  Severity   : normal
  Bugzilla   : 14533
  Frequency  : rare, on recovery
  Description: read procfs can produce deadlock in some situation
  Details    : Holding lprocfs lock which send rpc can produce block for destroy
-             obd objects and this also block reconnect with -EALREADY. This isn't
-             fix all lprocfs bugs - but make it rare.
+            obd objects and this also block reconnect with -EALREADY. This isn't
+            fix all lprocfs bugs - but make it rare.
  
  Severity   : enhancement
  Bugzilla   : 15152
@@ -34,10 +274,25 @@ Severity   : normal
  Bugzilla   : 14321
  Description: lustre_mgs: operation 101 on unconnected MGS
  Details    : When MGC is disconnected from MGS long enough, MGS will evict the
-             MGC, and late on MGC cannot successfully connect to MGS and a lot
+            MGC, and late on MGC cannot successfully connect to MGS and a lot
              of the error messages complaining that MGS is not connected.
  
  Severity   : major
+Bugzilla   : 15027
+Frequency  : on network error
+Description: panic with double free of request on network error
+Details    : mdc_finish_enqueue finishes the request if any network error occurs,
+            but this is correct only for synchronous enqueue; for async enqueue
+            (via ptlrpcd) it is incorrect, because ptlrpcd wants to finish the
+            request itself.
+
+Severity   : enhancement
+Bugzilla   : 11401
+Description: client-side metadata stat-ahead during readdir (directory readahead)
+Details    : perform client-side metadata stat-ahead when the client detects
+            readdir and sequential stat of dir entries therein
+
+Severity   : major
  Frequency  : on start mds
  Bugzilla   : 14884
  Description: Implement get_info(last_id) in obdfilter.
@@ -376,7 +631,7 @@ Frequency  : rare
  Description: Oops in read and write path when failing to allocate lock.
  Details    : Check if lock allocation failed and return error back.
  
-Severity   : normal 
+Severity   : normal
  Bugzilla   : 11679
  Description: lstripe command fails for valid OST index
  Details    : The stripe offset is compared to lov->desc.ld_tgt_count
@@ -398,10 +653,11 @@ Bugzilla   : 12836
  Description: lfs find on -1 stripe looping in lsm_lmm_verify_common()
  Details    : Avoid lov_verify_lmm_common() on directory with -1 stripe count.
  
-Severity   : major
-Bugzilla   : 12932
-Description: obd_health_check_timeout too short
-Details    : set obd_health_check_timeout as 1.5x of obd_timeout
+Severity   : enhancement
+Bugzilla   : 3055
+Description: Adaptive timeouts
+Details    : RPC timeouts adapt to changing server load and network
+            conditions to reduce resend attempts and improve recovery time.
  
  Severity   : normal
  Bugzilla   : 12192
@@ -507,10 +763,10 @@ Description: when mds and osts use different quota unit(32bit and 64bit),
  Details    : void sending multiple quota reqs to mds, which will keep the
              status between the reqs.
  
-Severity   : normal 
+Severity   : normal
  Bugzilla   : 13125
  Description: osts not allocated evenly to files
-Details    : change the condition to increase offset_idx 
+Details    : change the condition to increase offset_idx
  
  Severity   : critical
  Frequency  : Always for filesystems larger than 2TB on 32-bit systems.
@@ -668,11 +924,11 @@ Severity   : normal
  Bugzilla   : 13570
  Description: To avoid grant space > avaible space when the disk is almost
              full. Without this patch you might see the error "grant XXXX >
-            available" or some LBUG about grant, when the disk is almost 
+            available" or some LBUG about grant, when the disk is almost
              full.
  Details    : In filter_check_grant, for non_grant cache write, we should
              check the left space by  if (*left > ungranted + bytes), instead
-            of (*left > ungranted), because only we are sure the left space 
+            of (*left > ungranted), because only we are sure the left space
              is enough for another "bytes", then the ungrant space should be
              increase. In client, we should update cl_avail_grant only there
              is OBD_MD_FLGRANT in the reply.
@@ -838,7 +1094,187 @@ Severity   : normal
  Bugzilla   : 14379
  Description: Properly match for duplicate locks
  Details    : Due to different lock order from skiplists code, we need to
-             traverse entire list for now
+            traverse entire list for now
+
+Severity   : normal
+Frequency  : only on PPC/SLES10
+Bugzilla   : 14855
+Description: "BITS_PER_LONG is not 32 or 64" in linux/idr.h
+Details    : On SLES10/PPC, fs.h includes idr.h which requires BITS_PER_LONG to
+            be defined. Add a hack in mkfs_lustre.c to work around this compile
+            issue.
+
+Severity   : normal
+Bugzilla   : 14257
+Description: LASSERT on MDS when client holding flock lock dies
+Details    : ldlm pool logic depends on number of granted locks equal to
+            number of released locks which is not true for flock locks, so 
+            just exclude such locks from consideration.
+
+Severity   : normal
+Bugzilla   : 15188
+Description: MDS deadlock with many ll_sync_lov threads and I/O stalled
+Details    : Use fsfilt_sync() for both the whole filesystem sync and
+            individual file sync to eliminate dangerous inode locking
+            with I_LOCK that can lead to a deadlock.
+
+Severity   : normal
+Bugzilla   : 14410
+Description: performance in 1.6.3
+Details    : Force q->max_phys_segments to MAX_PHYS_SEGMENTS on SLES10 to be
+            sure that 1MB requests are not fragmented by the block layer.
+
+Severity   : enhancement
+Bugzilla   : 11089
+Description: organize the server-side client stats on per-nid basis
+Details    : Change the structure of stats under obdfilter and mds to
+             the following new structure:
+                +- exports
+                        +- nid#1
+                        |   + stats
+                        |   + uuids
+                        +- nid#2...
+                        +- clear
+             The "uuids" file lists the uuids of _active_ exports,
+             and the "clear" entry clears all stats and stale nids.
+
+Severity   : enhancement
+Bugzilla   : 11270
+Description: eliminate client locks in face of contention
+Details    : file contention detection and lockless i/o implementation
+            for contended files.
+
+Severity   : normal
+Bugzilla   : 15212
+Description: Reinitialize optind to 0 so that interactive lfs works in all cases
+
+Severity   : critical
+Frequency  : very rare, if additional xattrs are used on kernels >= 2.6.12
+Bugzilla   : 15777
+Description: MDS may lose file striping (and hence file data) in some cases
+Details    : If there are additional extended attributes stored on the MDS,
+            in particular ACLs, SELinux, or user attributes (if user_xattr
+            is specified for the client mount options) then there is a risk
+            of attribute loss.  Additionally, the Lustre file striping
+            needs to be larger than default (e.g. striped over all OSTs),
+            and an additional attribute must be stored initially in the
+            inode and then increase in size enough to be moved to the
+            external attribute block (e.g. ACL growing in size) for file
+            data to be lost.
+
+Severity   : normal
+Bugzilla   : 15346
+Description: skiplist implementation simplification
+Details    : skiplists are used to group compatible locks on the granted list.
+            This was implemented by tracking the first and last lock of each
+            lock group; the patch changes that to use doubly linked lists.
+
+Severity   : normal
+Bugzilla   : 15574
+Description: MDS LBUG: ASSERTION(!IS_ERR(dchild))
+Details    : Change LASSERTs to client eviction (i.e. abort client's recovery)
+            because LASSERT on both the data supplied by a client, and the data 
+            on disk is dangerous and incorrect.
+
+Severity   : enhancement
+Bugzilla   : 10718
+Description: Slow truncate/writes to huge files at high offsets.
+Details    : Directly associate cached pages with the lock that protects them;
+            this allows us to quickly find which pages to write and remove
+            once a lock callback is received.
+
+Severity   : normal
+Bugzilla   : 15953
+Description: more ldlm soft lockups
+Details    : In ldlm_resource_add_lock(), the call to ldlm_resource_dump()
+            starves other threads of the resource lock for a long time in
+            case of a long waiting queue, so change the debug level from
+            D_OTHER to the less frequently used D_INFO.
+
+Severity   : enhancement
+Bugzilla   : 13128
+Description: add -gid, -group, -uid, -user options to lfs find
+
+Severity   : normal
+Bugzilla   : 15950
+Description: Hung threads in invalidate_inode_pages2_range
+Details    : The direct IO path doesn't call check_rpcs to submit a new RPC once
+            one is completed. As a result, some RPCs are stuck in the queue
+            and are never sent.
+
+Severity   : normal
+Bugzilla   : 14629
+Description: filter threads hang waiting for journal commit
+Details    : Clean up the filter group llog code so that the filter group llog
+            is only created in the MDS/OST syncing process.
+
+Severity   : normal
+Bugzilla   : 15684
+Description: Procfs and llog threads sometimes access a destroyed import.
+Details    : Synchronize import destruction with procfs and llog threads using
+            the import refcount and semaphore.
+
+Severity   : enhancement
+Bugzilla   : 14975
+Description: openlock cache of b1_6 port to HEAD
+
+Severity   : major
+Frequency  : rare
+Bugzilla   : 16226
+Description: kernel BUG at ldiskfs2_ext_new_extent_cb
+Details    : If insertion of an extent fails, then discard the inode
+            preallocation and free data blocks else it can lead to duplicate
+            blocks.
+
+Severity   : normal
+Bugzilla   : 16199
+Description: don't always update ctime in ext3_xattr_set_handle()
+Details    : Current xattr code updates the inode ctime in ext3_xattr_set_handle.
+            In some cases the ctime should not be updated, for example for
+            2.0->1.8 compatibility it is necessary to delete an xattr and it
+            should not update the ctime.
+
+Severity   : major
+Frequency  : rare
+Bugzilla   : 15713/16362
+Description: Assertion in iopen_connect_dentry in 1.6.3
+Details    : looking up an inode via iopen with the wrong generation number can
+            populate the dcache with a disconnected dentry while the inode
+            number is in the process of being reallocated. This causes an
+            assertion failure in iopen since the inode's dentry list contains
+            both a connected and disconnected dentry.
+
+Severity   : normal
+Bugzilla   : 16496
+Description: assertion failure in ldlm_handle2lock()
+Details    : fix a race between class_handle_unhash() and class_handle2object()
+            introduced in lustre 1.6.5 by bug 13622.
+
+Severity   : minor
+Frequency  : rare
+Bugzilla   : 12755
+Description: Kernel BUG: sd_iostats_bump: unexpected disk index
+Details    : remove the limit of 256 scsi disks in the sd_iostat patch
+
+Severity   : minor
+Frequency  : rare
+Bugzilla   : 16494
+Description: oops in sd_iostats_seq_show()
+Details    : unloading/reloading the scsi low level driver triggers a kernel
+            bug when trying to access the sd iostat file.
+
+Severity   : major
+Frequency  : rare
+Bugzilla   : 16404
+Description: Kernel panics during QLogic driver reload
+Details    : REQ_BLOCK_PC requests are not handled properly in the sd iostat
+            patch, causing memory corruption.
+
+Severity   : minor
+Frequency  : rare
+Bugzilla   : 16140
+Description: journal_dev option does not work in b1_6
+Details    : pass mount option during pre-mount.
  
  --------------------------------------------------------------------------------