From 553ee377a9180b50847540537ae02e634d32b37b Mon Sep 17 00:00:00 2001 From: johann Date: Wed, 29 Jul 2009 17:43:13 +0000 Subject: [PATCH] Branch b1_8 sync up changelog between b_release_1_8_1 & b1_8. --- lustre/ChangeLog | 346 +++++++++++++++++++++++++++++++------------------------ 1 file changed, 193 insertions(+), 153 deletions(-) diff --git a/lustre/ChangeLog b/lustre/ChangeLog index 219a7fe..737ee54 100644 --- a/lustre/ChangeLog +++ b/lustre/ChangeLog @@ -14,17 +14,6 @@ tbd Sun Microsystems, Inc. of Lustre filesystem with 4K stack may cause a stack overflow. For more information, please refer to bugzilla 17630. -Severity : enhancement -Bugzilla : 19847 -Description: Update kernel to SLES10 SP2 2.6.16.60-0.39.3. - -Severity : normal -Frequency : with 1.8 server and 1.6 clients -Bugzilla : 20020 -Descriptoin: correctly shrink reply for avoid send too big message to client. -Details : 1.8 mds is allocate to big buffer to LOV EA data and this produce - some problems with sending this reply to 1.6 client. - Severity : normal Bugzilla : 19529 Description: Avoid deadlock for local client writes @@ -43,20 +32,12 @@ Descriptoin: lock ordering violation between &cli->cl_sem and _lprocfs_lock Details : move ldlm namespace creation in setup phase to avoid grab _lprocfs_lock with cli_sem held. -Severity : enhancement -Bugzilla : 19846 -Description: Update kernel to RHEL5 2.6.18-128.1.14.el5. - -Severity : enhancement -Bugzilla : 19848 -Description: Update kernel to SLES11 2.6.27.23-0.1. - -Severity : normal -Bugzilla : 18624 +Severity : normal +Bugzilla : 18624 Description: Unable to run several mkfs.lustre on loop devices at the same time. Details : mkfs.lustre returns error 256 on the concurrent loop devices - formatting. The solution is to proper handle the error. + formatting. The solution is to proper handle the error. Severity : enhancement Bugzilla : 18357 @@ -95,97 +76,73 @@ Details : lov_update_create_set() uses set->set_success as index for created objects, so if some requests failed, they will have hole at end of array and we can use qos_shrink_lsm for allocate correct lsm. -Severity : major -Frequency : rare -Bugzilla : 19495 -Description: fix lqs' reference which won't be put in some situations -Details : This patch fixes: - 1. In quota_check_common(), this function will check quota - for user and group, but only send one return via "pending". - In most cases, the pendings should be same. But that is not - always the case. - 2. If quotaoff runs between lquota_chkquota() and - lquota_pending_commit(), the same thing will happen too. - That is why it comes: - - if (!ll_sb_any_quota_active(qctxt->lqc_sb)) - - RETURN(0); - -Severity : low -Bugzilla : 15010 -Description: Rare Client crash on resend if the file was deleted. -Details : When file is opened, but open reply is lost and file is - subsequently deleted before resend, resend processing logic - breaks trying to open the file again, should not try to open. - -Severity : low -Bugzilla : 19756 -Description: Rare dentry leakage on MDS on forceful shutdown and - replay-resends. -Details : Sometimes on transaction errors mds_mfd_close did not release - mfd and dentry pointed to it. reconstruct_open did not release - parent dentry if no matching mfd was found and child lookup was - used. - -Severity : high -Bugzilla : 17569 -Description: add check for >8TB ldiskfs filesystems -Details : ext3-based ldiskfs does not support greater than 8TB LUNs. - Don't allow >8TB ldiskfs filesystems to be mounted without - force_over_8tb mount option - ------------------------------------------------------------------------------ -tbd Sun Microsystems, Inc. +2009-07-31 Sun Microsystems, Inc. * version 1.8.1 * Support for kernels: - 2.6.16.60-0.37 (SLES 10), - 2.6.27.21-0.1 (SLES11), - 2.6.18-128.1.6.el5 (RHEL 5) + 2.6.16.60-0.39.3 (SLES 10), + 2.6.27.23-0.1 (SLES11, i686 & x84_64 only), + 2.6.18-128.1.14.el5 (RHEL 5), * Client support for unpatched kernels: (see http://wiki.lustre.org/index.php?title=Patchless_Client) 2.6.16 - 2.6.27 vanilla (kernel.org) - * Recommended e2fsprogs version: 1.41.5-sun2 + * Recommended e2fsprogs version: 1.41.6.sun1 * File join has been disabled in this release, refer to Bugzilla 16929. - * A new Lustre ADIO driver is available for MPICH2-1.0.7. * NFS export disabled when stack size < 8192. Since the NFSv4 export of Lustre filesystem with 4K stack may cause a stack overflow. For more information, please refer to bugzilla 17630. + * ext4 support for RHEL5 is experimental and thus should not be + used in production. + +Severity : enhancement +Bugzilla : 19847 +Description: Update kernel to SLES10 SP2 2.6.16.60-0.39.3. Severity : normal -Bugzilla : 19528 -Description: resolve race between obd_disconnect and class_disconnect_exports -Details : if obd_disconnect will be called to already disconnected export he - forget release one reference and osc module can't unloaded. +Frequency : with 1.8 server and 1.6 clients +Bugzilla : 20020 +Description: correctly shrink reply for avoid send too big message to client. +Details : 1.8 mds is allocate to big buffer to LOV EA data and this produce + some problems with sending this reply to 1.6 client. + +Severity : normal +Bugzilla : 19917 +Description: Repeated atomic allocation failures. +Details : Use GFP_HIGHUSER | __GFP_NOMEMALLOC flags for memory allocations + to generate memory pressure and allow reclaiming of inactive pages. + At the same time, do not allow to exhaust emergency pools. + For local clients the use of GFP_NOFS will be introduced in 1.8.2 Severity : enhancement -Bugzilla : 19293 -Description: move AT tunable parameters for more consistent usage -Details : add AT tunables under /proc/sys/lustre, add to conf_param parsing +Bugzilla : 19846, 18289 +Description: Update kernel to RHEL5 2.6.18-128.1.14.el5. Severity : enhancement -Bugzilla : 19024 -Description: Update kernel to RHEL5.3 2.6.18-128.1.6.el5. +Bugzilla : 19625, 16893, 18668, 19848 +Description: Add support for SLES11 2.6.27.23-0.1. + +Severity : enhancement +Bugzilla : 14250 +Description: Update client support to vanila kernels up to 2.6.27. Severity : enhancement Bugzilla : 19212 Description: Update kernel to SLES10 SP2 2.6.16.60-0.37. Severity : enhancement -Bugzilla : 12182 -Description: Caching OSS -Details : introduce data caching on the OSS. The OSS now relies on the linux - kernel page cache to keep recently accessed data in memory. - It is worth noting that all write requests are still flushed - synchronously as in lustre 1.6. +Bugzilla : 15981 +Description: Compile with -Werror by default for i686 and x86_64. -Severity : enhancement -Bugzilla : 10609 -Description: version based recovery -Details : introduce finer grained recovery able to detect transaction - dependencies and can deal with transaction gaps. +Severity : normal +Bugzilla : 19528 +Description: resolve race between obd_disconnect and class_disconnect_exports +Details : if obd_disconnect will be called to already disconnected export he + forget release one reference and osc module can't unloaded. Severity : enhancement -Bugzilla : 3055 -Description: Enable adaptive timeouts by default +Bugzilla : 19293 +Description: move AT tunable parameters for more consistent usage +Details : add AT tunables under /proc/sys/lustre, add to conf_param parsing Severity : normal Bugzilla : 19223 @@ -193,12 +150,6 @@ Descriptoin: correctly skip time estimate if in recovery Details : rq_send_state insn't bitmask so using bitwise ops is forbid. Severity : normal -Bugzilla : 18192 -Descriptoin: fix goal inodes -Details : Allocate inodes for llog in last inode group for avoid broken - recovery. - -Severity : normal Bugzilla : 18399 Descriptoin: OSS DeadLock Details : Use trylock to prevent deadlock when shrink icache. @@ -236,26 +187,18 @@ Severity : enhancement Bugzilla : 17536 Description: MDS create should not wait for statfs RPC while holding DLM lock. -Severity : enhancement -Bugzilla : 14250 -Description: Update client support to vanilla kernels up to 2.6.27. - Severity : normal Frequency : rare, connect and disconnect target at same time Bugzilla : 17310 Descriptoin: ASSERTION(atomic_read(&imp->imp_inflight) == 0 -Details : Don't call obd_disconnect under lov_lock. this long time +Details : don't call obd_disconnect under lov_lock. this long time operation and can block ptlrpcd which answer to connect request. -Severity : enhancement -Bugzilla : 18062 -Description: Update to sles9 kernel-2.6.5-7.315. - Severity : normal Frequency : start MDS on uncleanly shutdowned MDS device Bugzilla : 16839 Descriptoin: ll_sync thread stay in waiting mds<>ost recovery finished -Details : Stay in waiting mds<>ost recovery finished produce random bugs +Details : stay in waiting mds<>ost recovery finished produce random bugs due race between two ll_sync thread for one lov target. send ACTIVATE event only if connect realy finished and import have FULL state. @@ -264,17 +207,7 @@ Severity : normal Frequency : start MDS on uncleanly shutdowned MDS device Bugzilla : 18049 Descriptoin: aborting recovery hang on MDS -Details : Don't throttle destroy RPCs for the MDT. - -Severity : enhancement -Bugzilla : 16919 -Descriptoin: Don't sync journal after every i/o -Details : Implement write RPC replay to allow server replies for write RPCs - before data is on disk. However, this feature is disabled by - default since some issues leading to data corruptions have been - found during recovery (e.g. bug 19128). This feature can be - enabled by running the following command on the OSSs: - lctl set_param obdfilter.*.sync_journal=0 +Details : don't throttle destroy RPCs for the MDT. Severity : low Bugzilla : 18016 @@ -308,7 +241,7 @@ Details : Do not put cancelled locks into replay list, hold references on Severity : normal Bugzilla : 18577 -Description: 1.6.5 mdsrate performance is slower than 1.4.11 (MDS not cpu bound) +Description: 1.6.5 mdsrate performance is slower than 1.4.11/12 (MDS is not cpu bound!) Details : create_count always drops to the min value (=32) because grow_count is being changed before the precreate RPC completes. @@ -325,6 +258,20 @@ Description: MMP check in ext3_remount() fails without displaying any error Details : When multiple mount protection fails during remount, proper error should be returned +Severity : Low +Bugzilla : 15010 +Description: Rare Client crash on resend if the file was deleted. +Details : When file is opened, but open reply is lost and file is + subsequently deleted before resend, resend processing logic + breaks trying to open the file again, should not try to open. + +Severity : high +Bugzilla : 17569 +Description: add check for >8TB ldiskfs filesystems +Details : ext3-based ldiskfs does not support greater than 8TB LUNs. + Don't allow >8TB ldiskfs filesystems to be mounted without + force_over_8tb mount option + Severity : normal Bugzilla : 20011 Description: Client locked up when running multiple instances of an app. on @@ -338,7 +285,66 @@ Description: Cannot acces an NFS-mounted Lustre filesystem Details : An NFS client cannot access the Lustre filesystem NFS-mounted from a Lustre-client exporting the Lustre filesystem via NFS. +Severity : normal +Bugzilla : 20139 +Description: panic in ll_statahead_thread +Details : grab dentry reference in parent process. + ------------------------------------------------------------------------------- + +tbd Sun Microsystems, Inc. + * version 1.8.0.1 + * Support for kernels: + 2.6.16.60-0.31 (SLES 10), + 2.6.18-128.1.6.el5 (RHEL 5), + 2.6.22.14 vanilla (kernel.org) + * Client support for unpatched kernels: + (see http://wiki.lustre.org/index.php?title=Patchless_Client) + 2.6.16 - 2.6.22 vanilla (kernel.org) + * Recommended e2fsprogs version: 1.40.11-sun1 + * File join has been disabled, refer to Bugzilla 16929. + * A new Lustre ADIO driver is available for MPICH2-1.0.7. + * NFS export disabled when stack size < 8192. Since the NFSv4 export of + Lustre filesystem with 4K stack may cause a stack overflow. For more + information, please refer to bugzilla 17630. + +Severity : enhancement +Bugzilla : 19024 +Description: Update to RHEL5.3 kernel-2.6.18-128.1.6.el5. + +Severity : enhancement +Bugzilla : 17671 +Description: Update OFED release to 1.4.1 RC4 + +Severity : major, only with big OST +Bugzilla : 18518 +Description: Very poor metadata performance on Infiniband lustre configuration +Details : OST object precreation becomes very slow on big OSTs. This is due + to the ialloc patch spending too much time scanning groups. + +Severity : normal +Frequency : during recovery +Bugzilla : 18192 +Description: don't mix llog inodes with normal. +Details : allocate inodes for log in last inode group + +Severity : major +Frequency : rare +Bugzilla : 19495 +Description: fix lqs' reference which won't be put in some situations +Details : This patch fixes: + 1. In quota_check_common(), this function will check quota + for user and group, but only send one return via "pending". + In most cases, the pendings should be same. But that is not + always the case. + 2. If quotaoff runs between lquota_chkquota() and + lquota_pending_commit(), the same thing will happen too. + That is why it comes: + - if (!ll_sb_any_quota_active(qctxt->lqc_sb)) + - RETURN(0); + +------------------------------------------------------------------------------- + 2008-12-31 Sun Microsystems, Inc. * version 1.8.0 * Support for kernels: @@ -364,7 +370,7 @@ Details : An NFS client cannot access the Lustre filesystem NFS-mounted the following command on the MDS: 'tunefs.lustre --param="mdt.quota_type=ug1" $MDTDEV'. For more information, please refer to bugzilla 13904. - * A new quota file format was introduced in 1.6.6/1.8.0 (kernel 2.6.16+) + * A new quota file format was introduced in 1.6.6/1.8.0 (kernels 2.6.16+). The format conversion from prior releases is handled transparently, but releases older than 1.6.6/1.8.0 don't understand this new format. The automatic format conversion can be avoided by running @@ -385,45 +391,74 @@ Details : An NFS client cannot access the Lustre filesystem NFS-mounted information, please refer to bugzilla 17630. Severity : enhancement +Bugzilla : 12182 +Description: Caching OSS +Details : introduce data caching on the OSS. The OSS now relies on the linux + kernel page cache to keep recently accessed data in memory. + It is worth noting that all write requests are still flushed + synchronously as in lustre 1.6. + +Severity : enhancement +Bugzilla : 10609 +Description: Version based recovery +Details : introduce finer grained recovery able to detect transaction + dependencies and can deal with transaction gaps caused by clients + failing at the same time as the server. + +Severity : enhancement +Bugzilla : 3055 +Description: Enable adaptive timeouts by default +Details : The Lustre timeout value in /proc/sys/lustre/timeout is now + managed dynamically based on server load and should not need + to be tuned manually based on cluster size. This allows Lustre + to work under a wider variety of system sizes and loads, without + unnecessarily causing lengthy recovery times. + +Severity : enhancement +Bugzilla : 15899 +Description: Add OST Pools support +Details : File striping can now be set to use an arbitrary pool of OSTs + +Severity : enhancement Bugzilla : 17974 Description: add lazystatfs mount option to allow statfs(2) to skip down OSTs -Details : Allow skip disconnected ost for send statfs request and hide error - in this case. +Details : allow skip disconnected ost for send statfs request and hide error + in this case. Severity : normal Frequency : rare, on llog test 6 Bugzilla : 16839 Descriptoin: don't allow connect to already connected import -Details : Connect to already connected import hides connection problem. +Details : allowing connect to already connected import is hide connecting problem. Severity : normal Frequency : rare, on failed llog setup Bugzilla : 18896 Descriptoin: don't leak obd reference on failed llog setup -Details : For failed llog setup - mgc forget call class_destroy_import for - client import, move destroy import to more generic place. +Details : for failed llog setup - mgc forget call class_destroy_import for + client import, move destroy import to more generic place. Severity : normal Frequency : rare Bugzilla : 18902 Descriptoin: allow kill process which wait statahead result -Details : For some reasons 'ls' can stick in waiting result from statahead, - in this case need way for kill this process. +Details : for some reasons 'ls' can stick in waiting result from statahead, + in this case need way for kill this process. Severity : normal Frequency : rare, at shutdown Bugzilla : 18773 Descriptoin: panic at umount Details : llap_shrinker can be raced with killing super block from list and - this produce panic with access to already freeded pointer + this produce panic with access to already freeded pointer Severity : normal Frequency : rare Bugzilla : 18154 Descriptoin: don't lose wakeup for imp_recovery_waitq -Details : Recover_import_no_retry or invalidate_import and import_close can - both sleep on imp_recovery_waitq, but we was send only one wakeup - to sleep queue. +Details : recover_import_no_retry or invalidate_import and import_close can + both sleep on imp_recovery_waitq, but we was send only one wakeup + to sleep queue. Severity : normal Frequency : rare @@ -436,7 +471,7 @@ Frequency : rare Bugzilla : 17972 Descriptoin: stuck in cache_remove_extent() or panic with accessing to already freed look. -Details : Release lock refernce only after add page to pages list. +Details : release lock refernce only after add page to pages list. Severity : normal Frequency : always with long access acl @@ -509,7 +544,7 @@ Severity : major Frequency : rare Bugzilla : 16492 Description: mds is deadlocked -Details : In rare cases, inode in catalog can have i_no less than have parent +Details : in rare cases, inode in catalog can have i_no less than have parent i_no, this produce wrong order for locking during open, and parallel unlink can be lock open. this need teach mds_open to grab locks in resource id order, not at parent -> child order. @@ -574,14 +609,14 @@ Details : Only when device name contains ":/" will mount treat it as Severity : normal Bugzilla : 15927 Frequency : rare -Description: Replace ptlrpcd with the statahead thread to interpret the async +Description: replace ptlrpcd with the statahead thread to interpret the async statahead RPC callback Severity : normal Bugzilla : 16611 Frequency : on recovery Description: I/O failures after umount during fail back -Details : If client reconnected to restarted server we need join to recovery +Details : if client reconnected to restarted server we need join to recovery instead of find server handler is changed and process self eviction with cancel all locks. @@ -627,23 +662,19 @@ Severity : normal Bugzilla : 15139 Frequency : rare Description: avoid ASSERTION(client_stat->nid_exp_ref_count == 0) failed -Details : Release reference to stats when client disconnected, not +Details : release reference to stats when client disconnected, not when export destroyed for avoid races when client destroyed after main ost export. Severity : normal Bugzilla : 16679 Description: more cleanup in mds_lov -Details : Add workaround for get valid ost count for avoid warnings about +Details : add workaround for get valid ost count for avoid warnings about drop too big messages, not init llog cat under semphore which can be blocked on reconnect and break normal replay, fix access to wrong pointer. Severity : enhancement -Bugzilla : 15899 -Description: File striping can now be set to use an arbitrary pool of OSTs. - -Severity : enhancement Bugzilla : 16573 Description: Export bytes_read/bytes_write count on OSC/OST. @@ -656,7 +687,7 @@ Details : Apply the MGS_CONNECT_SUPPORTED mask at reconnect time so Severity : normal Bugzilla : 16006 Description: Properly propagate oinfo flags from lov to osc for statfs -Details : Restore missing copy oi_flags to lov requests. +Details : restore missing copy oi_flags to lov requests. Severity : normal Bugzilla : 16317 @@ -667,7 +698,7 @@ Severity : enhancement Bugzilla : 16581 Description: Add man pages for llobdstat(8), llstat(8), plot-llstat(8), : l_getgroups(8), lst(8), routerstat(8) -Details : Included man pages for llobdstat(8), llstat(8), +Details : included man pages for llobdstat(8), llstat(8), : plot-llstat(8), l_getgroups(8), lst(8), routerstat(8) Severity : enhancement @@ -685,12 +716,12 @@ Description: Update to RHEL4 kernel-2.6.9-67.0.22.EL. Severity : normal Bugzilla : 16317 Description: exports in /proc are broken -Details : Recreate /proc entries for clients when they reconnect. +Details : recreate /proc entries for clients when they reconnect. Severity : normal Bugzilla : 16080 Description: don't fail open with -ERANGE -Details : If client connected until mds will be know about real ost count +Details : if client connected until mds will be know about real ost count get LOV EA can be fail because mds not allocate enougth buffer for LOV EA. @@ -703,9 +734,9 @@ Details : Prevent proc handler from accessing devices added to the Severity : enhancement Bugzilla : 16091 Description: configure's --enable-quota should check the - kernel .config for CONFIG_QUOTA -Details : Configure is terminated if --enable-quota is passed but - no quota support is in kernel + : kernel .config for CONFIG_QUOTA +Details : configure is terminated if --enable-quota is passed but + : no quota support is in kernel Severity : enhancement Bugzilla : 15308 @@ -720,12 +751,12 @@ Bugzilla : 16318 Frequency : rare, on PPC clients Description: don't swab ost objects in response about directory, because this not exist. -Details : Bug similar bug 14856, but in different function. +Details : bug similar bug 14856, but in different function. Severity : enhancement Bugzilla : 15754 Description: lfs quota tool enhancement -Details : Added units specifiers support for setquota, default to +Details : added units specifiers support for setquota, default to current uid/gid for quota report, short quota stats by default, nonpositional parameters for setquota, added llapi_quotactl manual page. @@ -733,7 +764,7 @@ Details : Added units specifiers support for setquota, default to Severity : enhancement Bugzilla : 15625 Description: *optional* service tags registration -Details : If the "service tags" package is installed on a Lustre node +Details : if the "service tags" package is installed on a Lustre node When the filesystem is mounted, a local-node service tag will be created. See http://inventory.sun.com/ for more information about the Service Tags asset management system. @@ -762,7 +793,7 @@ Severity : normal Frequency : testing only Bugzilla : 12653 Description: sanity test 65a fails if stripecount of -1 is set -Details : Handle -1 striping on filesystem in ll_dirstripe_verify +Details : handle -1 striping on filesystem in ll_dirstripe_verify Severity : normal Frequency : only in unusual configurations @@ -778,7 +809,7 @@ Severity : major Frequency : rarely, if filesystem is mounted with -o flock Bugzilla : 15924 Description: do not process already freed flock -Details : Flock can possibly be freed by another thread before it reaches +Details : flock can possibly be freed by another thread before it reaches to ldlm_flock_completion_ast. Severity : normal @@ -791,7 +822,7 @@ Severity : minor Frequency : rarely, if binaries are being run from Lustre Bugzilla : 15837 Description: oops in page fault handler -Details : Kernel page fault handler can return two special 'pages' in +Details : kernel page fault handler can return two special 'pages' in error case, don't try dereference NOPAGE_SIGBUS and NOPAGE_OMM. Severity : minor @@ -807,14 +838,14 @@ Frequency : rarely Bugzilla : 14742 Frequency : rare Description: ASSERTION(CheckWriteback(page,cmd)) failed -Details : Badly clear PG_Writeback bit in ll_ap_completion can produce false +Details : badly clear PG_Writeback bit in ll_ap_completion can produce false positive assertion. Severity : normal Frequency : only with broken builds/installations Bugzilla : 15779 Description: no LBUG if lquota.ko and fsfilt_ldiskfs.ko are different versions -Details : Just return an error to a user, put a console error message. +Details : just return an error to a user, put a console error message Severity : enhancement Bugzilla : 15741 @@ -1170,6 +1201,15 @@ Details : With AT enabled, the recovery window can be excessively long (6000+ INITIAL_CONNECT_TIMEOUT now) and clients report the old service time via pb_service_time. +Bugzilla : 16919 +Descriptoin: Don't sync journal after every i/o +Details : Implement write RPC replay to allow server replies for write RPCs + before data is on disk. However, this feature is disabled by + default since some issues leading to data corruptions have been + found during recovery (e.g. bug 19128). This feature can be enabled + by running the following command on the OSSs: + lctl set_param obdfilter.*.sync_journal=0 + Severity : normal Bugzilla : 16522 Description: Watchdog triggered on MDS failover -- 1.8.3.1