From e5f2b5bcacf90cd6c4d777cef4ab255923fb3d02 Mon Sep 17 00:00:00 2001
From: adilger
Date: Tue, 14 Apr 2009 07:17:51 +0000
Subject: [PATCH] Branch b_release_1_8_0

Description: fix racy locking of mballoc block bitmaps causing BUG
Details    : The locking of the mballoc buddy bitmap and the in-memory
             block bitmap was using two different spin locks in some
             cases. This made it possible to incorrectly access the
             mballoc bitmap while another process was modifying it,
             causing a sanity assertion to fail. While no on-disk corruption
             was reported, there was some risk of this happening.
b=18810
i=alex

Update ChangeLog for missing entries.
Update ldiskfs build version to 3.0.8 for 1.8.0 release.
---
 ldiskfs/ChangeLog                                  | 98 +++++++++++++++++++---
 ldiskfs/configure.ac                               |  2 +-
 .../patches/ext3-mballoc3-core.patch               | 66 ++++++++-------
 .../patches/ext3-uninit-2.6-sles10.patch           |  2 +-
 .../patches/ext3-uninit-2.6-suse.patch             |  2 +-
 .../patches/ext3-uninit-2.6.18.patch               |  2 +-
 .../patches/ext3-uninit-2.6.22-vanilla.patch       |  4 +-
 .../kernel_patches/patches/ext3-uninit-2.6.9.patch |  2 +-
 8 files changed, 132 insertions(+), 46 deletions(-)

diff --git a/ldiskfs/ChangeLog b/ldiskfs/ChangeLog
index decf9bc..aa8bbec 100644
--- a/ldiskfs/ChangeLog
+++ b/ldiskfs/ChangeLog
@@ -1,11 +1,75 @@
-2009-03-20 Sun Microsystems, Inc.
+2009-04-20 Sun Microsystems, Inc.
         * version 3.0.8
 
-Severity   : enhancement
+Severity   : minor
 Bugzilla   : 16114
-Description: return EXT_UNSET_BLOCK from ext3_ext_next_leaf_block()
-Details    : With 16TB-1 filesystem, 0xffffffff is valid block number so
-             EXT_UNSET_BLOCK was introduced for ext3_ext_next_leaf_block()
+Description: minor fixes and cleanups
+Details    : use EXT_UNSET_BLOCK to avoid confusion with EXT_MAX_BLOCK.
+             Initialize 'ix' variable in extents patch to stop compiler warning.
+
+Severity   : feature
+Bugzilla   : 17942
+Description: update FIEMAP ioctl to match upstream kernel version
+Details    : the FIEMAP block-mapping ioctl had a prototype version in
+             ldiskfs 3.0.7 but this release updates it to match the
+             interface in the upstream kernel, with a new ioctl number.
+
+Severity   : normal
+Frequency  : only if MMP is active and detects filesystem is in use
+Bugzilla   : 18173
+Description: if MMP startup fails, an oops is triggered
+Details    : if ldiskfs mounting doesn't succeed the error handling doesn't
+             clean up the MMP data correctly, causing an oops.
+
+-------------------------------------------------------------------------------
+
+2009-04-06 Sun Microsystems, Inc.
+        * version 3.0.7.1
+
+Severity   : major
+Frequency  : rare
+Bugzilla   : 18810
+Description: fix racy locking of mballoc block bitmaps causing BUG
+Details    : The locking of the mballoc buddy bitmap and the in-memory
+             block bitmap was using two different spin locks in some
+             cases. This made it possible to incorrectly access the
+             mballoc bitmap while another process was modifying it,
+             causing a sanity assertion to fail. While no on-disk corruption
+             was reported, there was some risk of this happening.
+
+-------------------------------------------------------------------------------
+
+2009-02-07 Sun Microsystems, Inc.
+        * version 3.0.7
+
+Severity   : enhancement
+Bugzilla   : 16498
+Description: Get RAID stripe size from superblock
+Details    : RAID striping parameters are now saved in the superblock itself,
+             so we should use these parameters instead of having to specify
+             a mount option each time.
+
+Severity   : major
+Frequency  : only if server is running on unsupported big-endian machine
+Bugzilla   : 16438
+Description: Disable big-endian ldiskfs server support.
+Details    : The ldiskfs code is not tested on big-endian machines, and
+             there are known compatibility problems in the extents code
+             when running for most of the kernels. Print an error message
+             and refuse to mount, in case anyone tests this. For existing
+             filesystems that might have been created in this way it is
+             possible to mount with the "bigendian_extents" option to
+             force the mount.
+
+-------------------------------------------------------------------------------
+
+2008-08-31 Sun Microsystems, Inc.
+        * version 3.0.6
+
+Severity   : enhancement
+Bugzilla   : 11826
+Description: Interoperability at server side (Disk interoperability)
+Details    : Exported some ldiskfs functions which are required for iop
 
 Severity   : normal
 Bugzilla   : 15320
@@ -25,12 +89,20 @@ Description: ldiskfs error: XXX blocks in bitmap, YYY in gd
 Details    : If blocks per group is less than blocksize*8, set rest of
              the bitmap to 1.
 
+Severity   : normal
+Frequency  : only for filesystems larger than 8TB
+Bugzilla   : 16101
+Description: ldiskfs BUG ldiskfs_mb_use_best_found()
+Details    : The ldiskfs mballoc3 code was using a __u16 to store the group
+             number, but with 8TB+ filesystems there are more than 65536
+             groups, causing an oops.
+
 Severity   : enhancement
 Bugzilla   : 10555
 Description: Add a FIEMAP(FIle Extent MAP) ioctl for ldiskfs
 Details    : FIEMAP ioctl will allow an application to efficiently fetch the
-             extent information of a file. It can be used to map logical blocks
-             in a file to physical blocks in the block device.
+             extent information of a file. It can be used to map logical blocks
+             in a file to physical blocks in the block device.
 
 Severity   : normal
 Bugzilla   : 16498
@@ -41,13 +113,13 @@ Details    : RAID striping parameters are now saved in the superblock itself,
 
 Severity   : normal
 Bugzilla   : 17490
-Description: mkfs.lustre: Unable to mount /dev/cciss/c0d1: Cannot allocate memory
+Description: mkfs.lustre: Unable to mount /dev/cciss/c0d1:Cannot allocate memory
 Details    : correctly handle device paths using a subdirectory in /dev when
              creating the per-device procfs directory under /proc/fs/ldiskfs.
 
 -------------------------------------------------------------------------------
 
-04-26-2008 Sun Microsystems, Inc.
+2008-04-26 Sun Microsystems, Inc.
         * version 3.0.5
 
 Severity   : normal
@@ -81,10 +153,16 @@ Details    : If there are multiple extended attributes stored on the inode,
              moved to the external attribute block (e.g. ACL growing in size)
              for the attribute to be lost.
 
+Severity   : normal
+Bugzilla   : 15604
+Description: inode version not being initialized on new inodes
+Details    : The inode i_version field was not being initialized on disk.
+             This field is currently unused but will be needed for VBR.
+
 --------------------------------------------------------------------------------
 
 2008-01-11 Sun Microsystems, Inc.
-        * version 3.0.4
+        * version 3.0.4
 
 Severity   : normal
 Bugzilla   : 13397
diff --git a/ldiskfs/configure.ac b/ldiskfs/configure.ac
index a673712..e7b19c1 100644
--- a/ldiskfs/configure.ac
+++ b/ldiskfs/configure.ac
@@ -1,6 +1,6 @@
 # Process this file with autoconf to produce a configure script.
 
-AC_INIT([Lustre ldiskfs], 3.0.6, [https://bugzilla.lustre.org/])
+AC_INIT([Lustre ldiskfs], 3.0.8, [https://bugzilla.lustre.org/])
 AC_CONFIG_SRCDIR([lustre-ldiskfs.spec.in])
 
 # Don't look for install-sh, etc. in ..
diff --git a/ldiskfs/kernel_patches/patches/ext3-mballoc3-core.patch b/ldiskfs/kernel_patches/patches/ext3-mballoc3-core.patch
index fa8b4ae..197e8cc 100644
--- a/ldiskfs/kernel_patches/patches/ext3-mballoc3-core.patch
+++ b/ldiskfs/kernel_patches/patches/ext3-mballoc3-core.patch
@@ -288,10 +288,10 @@ Index: linux-2.6.22.19/fs/ext3/mballoc.c
 ===================================================================
 --- /dev/null
 +++ linux-2.6.22.19/fs/ext3/mballoc.c
-@@ -0,0 +1,4475 @@
+@@ -0,0 +1,4483 @@
 +/*
-+ * Copyright 2008 Sun Microsystems, Inc.
-+ * Written by Alex Tomas
++ * Copyright 2009 Sun Microsystems, Inc.
++ * Written by Alex Zhuravlev
 + *
 + * This program is free software; you can redistribute it and/or modify
 + * it under the terms of the GNU General Public License version 2 as
@@ -1456,7 +1456,10 @@ Index: linux-2.6.22.19/fs/ext3/mballoc.c
 +			cur += 32;
 +			continue;
 +		}
-+		mb_clear_bit_atomic(lock, cur, bm);
++		if (lock)
++			mb_clear_bit_atomic(lock, cur, bm);
++		else
++			mb_clear_bit(cur, bm);
 +		cur++;
 +	}
 +}
@@ -1474,7 +1477,10 @@ Index: linux-2.6.22.19/fs/ext3/mballoc.c
 +			cur += 32;
 +			continue;
 +		}
-+		mb_set_bit_atomic(lock, cur, bm);
++		if (lock)
++			mb_set_bit_atomic(lock, cur, bm);
++		else
++			mb_set_bit(cur, bm);
 +		cur++;
 +	}
 +}
@@ -1628,6 +1634,7 @@ Index: linux-2.6.22.19/fs/ext3/mballoc.c
 +	BUG_ON(start + len > (e3b->bd_sb->s_blocksize << 3));
 +	BUG_ON(e3b->bd_group != ex->fe_group);
 +	BUG_ON(!ext3_is_group_locked(e3b->bd_sb, e3b->bd_group));
++	spin_lock(sb_bgl_lock(EXT3_SB(e3b->bd_sb), ex->fe_group));
 +	mb_check_buddy(e3b);
 +	mb_mark_used_double(e3b, start, len);
 +
@@ -1681,9 +1688,9 @@ Index: linux-2.6.22.19/fs/ext3/mballoc.c
 +		e3b->bd_info->bb_counters[ord]++;
 +	}
 +
-+	mb_set_bits(sb_bgl_lock(EXT3_SB(e3b->bd_sb), ex->fe_group),
-+		    EXT3_MB_BITMAP(e3b), ex->fe_start, len0);
++	mb_set_bits(NULL, EXT3_MB_BITMAP(e3b), ex->fe_start, len0);
 +	mb_check_buddy(e3b);
++	spin_unlock(sb_bgl_lock(EXT3_SB(e3b->bd_sb), ex->fe_group));
 +
 +	return ret;
 +}
 +
@@ -3244,6 +3251,8 @@ Index: linux-2.6.22.19/fs/ext3/mballoc.c
 +		ext3_error(sb, __FUNCTION__,
 +			   "Allocating block in system zone - block = %lu",
 +			   (unsigned long) block);
++	ext3_lock_group(sb, ac->ac_b_ex.fe_group);
++	spin_lock(sb_bgl_lock(sbi, ac->ac_b_ex.fe_group));
 +#ifdef AGGRESSIVE_CHECK
 +	{
 +		int i;
@@ -3253,15 +3262,15 @@ Index: linux-2.6.22.19/fs/ext3/mballoc.c
 +		}
 +	}
 +#endif
-+	mb_set_bits(sb_bgl_lock(sbi, ac->ac_b_ex.fe_group), bitmap_bh->b_data,
++	mb_set_bits(NULL, bitmap_bh->b_data,
 +		    ac->ac_b_ex.fe_start, ac->ac_b_ex.fe_len);
 +
-+	spin_lock(sb_bgl_lock(sbi, ac->ac_b_ex.fe_group));
 +	gdp->bg_free_blocks_count =
 +		cpu_to_le16(le16_to_cpu(gdp->bg_free_blocks_count)
 +				- ac->ac_b_ex.fe_len);
 +	spin_unlock(sb_bgl_lock(sbi, ac->ac_b_ex.fe_group));
 +	percpu_counter_mod(&sbi->s_freeblocks_counter, - ac->ac_b_ex.fe_len);
++	ext3_unlock_group(sb, ac->ac_b_ex.fe_group);
 +
 +	err = ext3_journal_dirty_metadata(handle, bitmap_bh);
 +	if (err)
@@ -3613,6 +3622,7 @@ Index: linux-2.6.22.19/fs/ext3/mballoc.c
 +	unsigned short max = EXT3_BLOCKS_PER_GROUP(sb);
 +	unsigned short i, first, free = 0;
 +
++	spin_lock(sb_bgl_lock(EXT3_SB(sb), group));
 +	i = mb_find_next_zero_bit(bitmap, max, 0);
 +
 +	while (i < max) {
@@ -3626,11 +3636,13 @@ Index: linux-2.6.22.19/fs/ext3/mballoc.c
 +	}
 +
 +	if (free != le16_to_cpu(gdp->bg_free_blocks_count)) {
++		spin_unlock(sb_bgl_lock(EXT3_SB(sb), group));
 +		ext3_error(sb, __FUNCTION__, "on-disk bitmap for group %d"
 +			"corrupted: %u blocks free in bitmap, %u - in gd\n",
 +			group, free, le16_to_cpu(gdp->bg_free_blocks_count));
 +		return -EIO;
 +	}
++	spin_unlock(sb_bgl_lock(EXT3_SB(sb), group));
 +	return 0;
 +}
 +
@@ -4566,7 +4578,6 @@ Index: linux-2.6.22.19/fs/ext3/mballoc.c
 +	BUG_ON(e3b->bd_bitmap_page == NULL);
 +	BUG_ON(e3b->bd_buddy_page == NULL);
 +
-+	ext3_lock_group(sb, group);
 +	for (i = 0; i < count; i++) {
 +		md = db->bb_md_cur;
 +		if (md && db->bb_tid != handle->h_transaction->t_tid) {
@@ -4611,7 +4622,6 @@ Index: linux-2.6.22.19/fs/ext3/mballoc.c
 +			db->bb_md_cur = NULL;
 +		}
 +	}
-+	ext3_unlock_group(sb, group);
 +	return 0;
 +}
 +
@@ -4704,6 +4714,8 @@ Index: linux-2.6.22.19/fs/ext3/mballoc.c
 +	if (err)
 +		goto error_return;
 +
++	ext3_lock_group(sb, block_group);
++	spin_lock(sb_bgl_lock(sbi, block_group));
 +#ifdef AGGRESSIVE_CHECK
 +	{
 +		int i;
@@ -4711,35 +4723,31 @@ Index: linux-2.6.22.19/fs/ext3/mballoc.c
 +			BUG_ON(!mb_test_bit(bit + i, bitmap_bh->b_data));
 +	}
 +#endif
-+	mb_clear_bits(sb_bgl_lock(sbi, block_group), bitmap_bh->b_data, bit,
-+		      count);
-+
-+	/* We dirtied the bitmap block */
-+	BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
-+	err = ext3_journal_dirty_metadata(handle, bitmap_bh);
-+
-+	ac.ac_b_ex.fe_group = block_group;
-+	ac.ac_b_ex.fe_start = bit;
-+	ac.ac_b_ex.fe_len = count;
-+	ext3_mb_store_history(&ac);
++	mb_clear_bits(NULL, bitmap_bh->b_data, bit, count);
++	gdp->bg_free_blocks_count =
++		cpu_to_le16(le16_to_cpu(gdp->bg_free_blocks_count) + count);
++	spin_unlock(sb_bgl_lock(sbi, block_group));
++	percpu_counter_mod(&sbi->s_freeblocks_counter, count);
 +
 +	if (metadata) {
 +		/* blocks being freed are metadata. these blocks shouldn't
 +		 * be used until this transaction is committed */
 +		ext3_mb_free_metadata(handle, &e3b, block_group, bit, count);
 +	} else {
-+		ext3_lock_group(sb, block_group);
 +		err = mb_free_blocks(inode, &e3b, bit, count);
 +		ext3_mb_return_to_preallocation(inode, &e3b, block, count);
-+		ext3_unlock_group(sb, block_group);
 +		BUG_ON(err != 0);
 +	}
++	ext3_unlock_group(sb, block_group);
 +
-+	spin_lock(sb_bgl_lock(sbi, block_group));
-+	gdp->bg_free_blocks_count =
-+		cpu_to_le16(le16_to_cpu(gdp->bg_free_blocks_count) + count);
-+	spin_unlock(sb_bgl_lock(sbi, block_group));
-+	percpu_counter_mod(&sbi->s_freeblocks_counter, count);
++	ac.ac_b_ex.fe_group = block_group;
++	ac.ac_b_ex.fe_start = bit;
++	ac.ac_b_ex.fe_len = count;
++	ext3_mb_store_history(&ac);
++
++	/* We dirtied the bitmap block */
++	BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
++	err = ext3_journal_dirty_metadata(handle, bitmap_bh);
 +
 +	ext3_mb_release_desc(&e3b);
 +
diff --git a/ldiskfs/kernel_patches/patches/ext3-uninit-2.6-sles10.patch b/ldiskfs/kernel_patches/patches/ext3-uninit-2.6-sles10.patch
index 11f1ac0..d001841 100644
--- a/ldiskfs/kernel_patches/patches/ext3-uninit-2.6-sles10.patch
+++ b/ldiskfs/kernel_patches/patches/ext3-uninit-2.6-sles10.patch
@@ -535,9 +535,9 @@ Index: linux-2.6.16.60-0.27/fs/ext3/mballoc.c
  			&meta_group_info[j]->bb_state);
 @@ -2945,9 +2957,17 @@ int ext3_mb_mark_diskspace_used(struct e
+ 	mb_set_bits(NULL, bitmap_bh->b_data,
  		    ac->ac_b_ex.fe_start, ac->ac_b_ex.fe_len);
  
 -	spin_lock(sb_bgl_lock(sbi, ac->ac_b_ex.fe_group));
 +	if (gdp->bg_flags & cpu_to_le16(EXT3_BG_BLOCK_UNINIT)) {
 +		gdp->bg_flags &= cpu_to_le16(~EXT3_BG_BLOCK_UNINIT);
 +		gdp->bg_free_blocks_count =
diff --git a/ldiskfs/kernel_patches/patches/ext3-uninit-2.6-suse.patch b/ldiskfs/kernel_patches/patches/ext3-uninit-2.6-suse.patch
index 9d15162..6bef8a5 100644
--- a/ldiskfs/kernel_patches/patches/ext3-uninit-2.6-suse.patch
+++ b/ldiskfs/kernel_patches/patches/ext3-uninit-2.6-suse.patch
@@ -498,9 +498,9 @@ Index: linux-2.6.5-7.311/fs/ext3/mballoc.c
  			&meta_group_info[j]->bb_state);
 @@ -2945,9 +2957,17 @@ int ext3_mb_mark_diskspace_used(struct e
+ 	mb_set_bits(NULL, bitmap_bh->b_data,
  		    ac->ac_b_ex.fe_start, ac->ac_b_ex.fe_len);
  
 -	spin_lock(sb_bgl_lock(sbi, ac->ac_b_ex.fe_group));
 +	if (gdp->bg_flags & cpu_to_le16(EXT3_BG_BLOCK_UNINIT)) {
 +		gdp->bg_flags &= cpu_to_le16(~EXT3_BG_BLOCK_UNINIT);
 +		gdp->bg_free_blocks_count =
diff --git a/ldiskfs/kernel_patches/patches/ext3-uninit-2.6.18.patch b/ldiskfs/kernel_patches/patches/ext3-uninit-2.6.18.patch
index ab150a1..e184d14 100644
--- a/ldiskfs/kernel_patches/patches/ext3-uninit-2.6.18.patch
+++ b/ldiskfs/kernel_patches/patches/ext3-uninit-2.6.18.patch
@@ -535,9 +535,9 @@ Index: linux-2.6.18-53.1.14/fs/ext3/mballoc.c
  			&meta_group_info[j]->bb_state);
 @@ -2943,9 +2955,17 @@ int ext3_mb_mark_diskspace_used(struct e
+ 	mb_set_bits(NULL, bitmap_bh->b_data,
  		    ac->ac_b_ex.fe_start, ac->ac_b_ex.fe_len);
  
 -	spin_lock(sb_bgl_lock(sbi, ac->ac_b_ex.fe_group));
 +	if (gdp->bg_flags & cpu_to_le16(EXT3_BG_BLOCK_UNINIT)) {
 +		gdp->bg_flags &= cpu_to_le16(~EXT3_BG_BLOCK_UNINIT);
 +		gdp->bg_free_blocks_count =
diff --git a/ldiskfs/kernel_patches/patches/ext3-uninit-2.6.22-vanilla.patch b/ldiskfs/kernel_patches/patches/ext3-uninit-2.6.22-vanilla.patch
index c4fa59e..069f1ba 100644
--- a/ldiskfs/kernel_patches/patches/ext3-uninit-2.6.22-vanilla.patch
+++ b/ldiskfs/kernel_patches/patches/ext3-uninit-2.6.22-vanilla.patch
@@ -535,9 +535,9 @@ Index: linux-2.6.22.14/fs/ext3/mballoc.c
  			&meta_group_info[j]->bb_state);
 @@ -2945,9 +2957,17 @@ int ext3_mb_mark_diskspace_used(struct e
- 		    ac->ac_b_ex.fe_start, ac->ac_b_ex.fe_len);
+ 	mb_set_bits(NULL, bitmap_bh->b_data,
+ 		    ac->ac_b_ex.fe_start, ac->ac_b_ex.fe_len);
  
 -	spin_lock(sb_bgl_lock(sbi, ac->ac_b_ex.fe_group));
 +	if (gdp->bg_flags & cpu_to_le16(EXT3_BG_BLOCK_UNINIT)) {
 +		gdp->bg_flags &= cpu_to_le16(~EXT3_BG_BLOCK_UNINIT);
 +		gdp->bg_free_blocks_count =
diff --git a/ldiskfs/kernel_patches/patches/ext3-uninit-2.6.9.patch b/ldiskfs/kernel_patches/patches/ext3-uninit-2.6.9.patch
index aa7effc..2759377 100644
--- a/ldiskfs/kernel_patches/patches/ext3-uninit-2.6.9.patch
+++ b/ldiskfs/kernel_patches/patches/ext3-uninit-2.6.9.patch
@@ -535,9 +535,9 @@ Index: linux-2.6.9-67.0.15/fs/ext3/mballoc.c
  			&meta_group_info[j]->bb_state);
 @@ -2945,9 +2957,17 @@ int ext3_mb_mark_diskspace_used(struct e
+ 	mb_set_bits(NULL, bitmap_bh->b_data,
  		    ac->ac_b_ex.fe_start, ac->ac_b_ex.fe_len);
  
 -	spin_lock(sb_bgl_lock(sbi, ac->ac_b_ex.fe_group));
 +	if (gdp->bg_flags & cpu_to_le16(EXT3_BG_BLOCK_UNINIT)) {
 +		gdp->bg_flags &= cpu_to_le16(~EXT3_BG_BLOCK_UNINIT);
 +		gdp->bg_free_blocks_count =
-- 
1.8.3.1
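
The locking rule described in the commit message (bitmap bits and the free-block count are changed under one lock, and the bit helpers accept a NULL lock when the caller already holds it) can be sketched in user space. This is only an illustration under stated assumptions, not the ldiskfs code: `struct group`, `set_bits()`, `mark_used()` and `check_group()` are hypothetical names, and a pthread mutex stands in for the per-group spin lock returned by `sb_bgl_lock()`.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define GROUP_BITS 1024

/* Hypothetical stand-in for one block group. The bitmap and the free
 * counter are both protected by the same lock. */
struct group {
	pthread_mutex_t lock;		/* plays the role of sb_bgl_lock() */
	unsigned char bitmap[GROUP_BITS / 8];
	unsigned int free_blocks;
};

static void set_bit_plain(unsigned char *bm, int bit)
{
	bm[bit >> 3] |= (unsigned char)(1u << (bit & 7));
}

static int test_bit(const unsigned char *bm, int bit)
{
	return (bm[bit >> 3] >> (bit & 7)) & 1;
}

/* lock == NULL means the caller already holds the lock, mirroring the
 * "if (lock) ... else ..." branches added by the patch. */
static void set_bits(pthread_mutex_t *lock, unsigned char *bm, int cur, int len)
{
	for (len += cur; cur < len; cur++) {
		if (lock) {
			pthread_mutex_lock(lock);
			set_bit_plain(bm, cur);
			pthread_mutex_unlock(lock);
		} else {
			set_bit_plain(bm, cur);
		}
	}
}

/* Mark a range as used: bits and counter change inside one critical
 * section, so a checker never sees one without the other. */
static void mark_used(struct group *g, int start, int len)
{
	assert(start >= 0 && len >= 0 && start + len <= GROUP_BITS);
	pthread_mutex_lock(&g->lock);
	set_bits(NULL, g->bitmap, start, len);
	g->free_blocks -= (unsigned int)len;
	pthread_mutex_unlock(&g->lock);
}

/* Count zero bits under the same lock and compare with the counter.
 * Returns 0 when consistent, -1 otherwise. */
static int check_group(struct group *g)
{
	unsigned int free = 0;
	int i, ok;

	pthread_mutex_lock(&g->lock);
	for (i = 0; i < GROUP_BITS; i++)
		if (!test_bit(g->bitmap, i))
			free++;
	ok = (free == g->free_blocks);
	pthread_mutex_unlock(&g->lock);
	return ok ? 0 : -1;
}
```

If the bits were set under one lock and the counter updated under another, `check_group()` could run between the two updates and report a mismatch, which is the kind of sanity failure the commit message describes.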