From f91552d85cd086b37f78524113cadf799c045220 Mon Sep 17 00:00:00 2001
From: Artem Blagodarenko
Date: Tue, 28 May 2019 19:51:21 +0300
Subject: [PATCH] LU-12345 ldiskfs: optimize nodelalloc mode
MIME-Version: 1.0
Content-Type: text/plain; charset=utf8
Content-Transfer-Encoding: 8bit

We found a performance regression when using bigalloc with "nodelalloc"
(1MB cluster size):

1. mke2fs -C 1048576 -O ^has_journal,bigalloc /dev/sda
2. mount -o nodelalloc /dev/sda /test/
3. time dd if=/dev/zero of=/test/io bs=1048576 count=1024

The "dd" takes about 2 seconds to finish, but if we run mke2fs without
"bigalloc", "dd" takes less than 1 second.

The reason is that, when using ext4 with "nodelalloc",
ext4_find_delalloc_cluster() is called nearly every time
ext4_ext_map_blocks() is called, and ext4_find_delalloc_range() scans
all pages in the cluster because no buffer is "delayed". A cluster has
256 pages (1MB cluster), so it scans 256 * 256k pages when creating
a 1G file. That severely hurts performance.

Therefore, we return immediately from ext4_find_delalloc_range() in
nodelalloc mode, since by definition there can't be any delalloc
pages.

The same optimization is also added to the ldiskfs_find_delayed_extent()
function, which improves performance dramatically.

Here are the results of testing on a two-node system.
Without the patch:
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00   56.30    0.06    0.00   43.63

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sds               0.00     0.00    0.00 1174.00     0.00     4.59     8.00     0.84    0.71    0.00    0.71   0.01   1.20

With patch:
08/29/2018 01:13:22 AM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    4.13   30.37    0.00   65.50

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sds               0.00     0.00    0.00 54117.82     0.00   211.43     8.00   152.59    2.82    0.00    2.82   0.02  99.01

Lustre-change: https://review.whamcloud.com/34982
Lustre-commit: af48ae8bff289b2bc083a888efeafa3c48df91e2

Cray-bug-id: LUS-5835
Signed-off-by: Artem Blagodarenko
Change-Id: Ie33410d4481778ee4f76a054ab8cfc11cc19a0ed
Reviewed-by: Andreas Dilger
Reviewed-by: Alex Zhuravlev
Reviewed-by: Li Dongyang
Signed-off-by: Minh Diep
Reviewed-on: https://review.whamcloud.com/37538
Tested-by: jenkins
Tested-by: Maloo
---
 ...ze-ext4_find_delalloc_range-in-nodelalloc.patch | 54 ++++++++++++++++++++++
 .../series/ldiskfs-3.10-rhel7.2.series             |  1 +
 .../series/ldiskfs-3.10-rhel7.3.series             |  1 +
 .../series/ldiskfs-3.10-rhel7.4.series             |  1 +
 .../series/ldiskfs-3.10-rhel7.5.series             |  1 +
 .../series/ldiskfs-3.10-rhel7.6.series             |  1 +
 .../series/ldiskfs-3.10-rhel7.7.series             |  1 +
 .../series/ldiskfs-3.10-rhel7.series               |  1 +
 .../series/ldiskfs-3.12-sles12.series              |  1 +
 .../series/ldiskfs-3.12-sles12sp1.series           |  1 +
 .../series/ldiskfs-4.4-sles12sp2.series            |  1 +
 .../series/ldiskfs-4.4-sles12sp3.series            |  1 +
 .../series/ldiskfs-4.4.0-45-ubuntu14+16.series     |  1 +
 .../series/ldiskfs-4.4.0-49-ubuntu14+16.series     |  1 +
 .../series/ldiskfs-4.4.0-62-ubuntu14+16.series     |  1 +
 .../series/ldiskfs-4.4.0-73-ubuntu14+16.series     |  1 +
 16 files changed, 69 insertions(+)
 create mode 100644 ldiskfs/kernel_patches/patches/rhel7/ext4-optimize-ext4_find_delalloc_range-in-nodelalloc.patch

diff --git a/ldiskfs/kernel_patches/patches/rhel7/ext4-optimize-ext4_find_delalloc_range-in-nodelalloc.patch b/ldiskfs/kernel_patches/patches/rhel7/ext4-optimize-ext4_find_delalloc_range-in-nodelalloc.patch
new file mode 100644
index 0000000..08b857f
--- /dev/null
+++ b/ldiskfs/kernel_patches/patches/rhel7/ext4-optimize-ext4_find_delalloc_range-in-nodelalloc.patch
@@ -0,0 +1,54 @@
+From 8c48f7e88e293b9dd422bd8884842aea85d30b22
+Subject: [PATCH] ext4: optimize ext4_find_delalloc_range() in nodelalloc mode
+
+We found performance regression when using bigalloc with "nodelalloc"
+(1MB cluster size):
+
+1. mke2fs -C 1048576 -O ^has_journal,bigalloc /dev/sda
+2. mount -o nodelalloc /dev/sda /test/
+3. time dd if=/dev/zero of=/test/io bs=1048576 count=1024
+
+The "dd" will cost about 2 seconds to finish, but if we mke2fs without
+"bigalloc", "dd" will only cost less than 1 second.
+
+The reason is: when using ext4 with "nodelalloc", it will call
+ext4_find_delalloc_cluster() nearly everytime it call
+ext4_ext_map_blocks(), and ext4_find_delalloc_range() will also scan
+all pages in cluster because no buffer is "delayed". A cluster has
+256 pages (1MB cluster), so it will scan 256 * 256k pags when creating
+a 1G file. That severely hurts the performance.
+
+Therefore, we return immediately from ext4_find_delalloc_range() in
+nodelalloc mode, since by definition there can't be any delalloc
+pages.
+
+Signed-off-by: Robin Dong
+Signed-off-by: "Theodore Ts'o"
+---
+ fs/ext4/extents.c | 3 +++
+ 1 file changed, 3 insertions(+)
+
+Index: linux-stage/fs/ext4/extents.c
+===================================================================
+--- linux-stage.orig/fs/ext4/extents.c
++++ linux-stage/fs/ext4/extents.c
+@@ -3909,6 +3909,9 @@ int ext4_find_delalloc_range(struct inod
+ {
+ 	struct extent_status es;
+ 
++	if (!test_opt(inode->i_sb, DELALLOC))
++		return 0;
++
+ 	ext4_es_find_delayed_extent_range(inode, lblk_start, lblk_end, &es);
+ 	if (es.es_len == 0)
+ 		return 0; /* there is no delay extent in this tree */
+@@ -5115,6 +5118,9 @@ static int ext4_find_delayed_extent(stru
+ 	struct extent_status es;
+ 	ext4_lblk_t block, next_del;
+ 
++	if (!test_opt(inode->i_sb, DELALLOC))
++		return 0;
++
+ 	if (newes->es_pblk == 0) {
+ 		ext4_es_find_delayed_extent_range(inode, newes->es_lblk,
+ 				newes->es_lblk + newes->es_len - 1, &es);
diff --git a/ldiskfs/kernel_patches/series/ldiskfs-3.10-rhel7.2.series b/ldiskfs/kernel_patches/series/ldiskfs-3.10-rhel7.2.series
index 530cb3a..ac83b2f 100644
--- a/ldiskfs/kernel_patches/series/ldiskfs-3.10-rhel7.2.series
+++ b/ldiskfs/kernel_patches/series/ldiskfs-3.10-rhel7.2.series
@@ -37,3 +37,4 @@ rhel7/ext4-use-GFP_NOFS-in-ext4_inode_attach_jinode.patch
 rhel7/ext4-export-orphan-add.patch
 rhel7/ext4-mmp-dont-mark-bh-dirty.patch
 rhel7/ext4-include-terminating-u32-in-size-of-xattr-entries-when-expanding-inodes.patch
+rhel7/ext4-optimize-ext4_find_delalloc_range-in-nodelalloc.patch
diff --git a/ldiskfs/kernel_patches/series/ldiskfs-3.10-rhel7.3.series b/ldiskfs/kernel_patches/series/ldiskfs-3.10-rhel7.3.series
index 53a3078..3dfecb7 100644
--- a/ldiskfs/kernel_patches/series/ldiskfs-3.10-rhel7.3.series
+++ b/ldiskfs/kernel_patches/series/ldiskfs-3.10-rhel7.3.series
@@ -37,3 +37,4 @@ rhel7/ext4-use-GFP_NOFS-in-ext4_inode_attach_jinode.patch
 rhel7/ext4-export-orphan-add.patch
 rhel7/ext4-mmp-dont-mark-bh-dirty.patch
 rhel7/ext4-include-terminating-u32-in-size-of-xattr-entries-when-expanding-inodes.patch
+rhel7/ext4-optimize-ext4_find_delalloc_range-in-nodelalloc.patch
diff --git a/ldiskfs/kernel_patches/series/ldiskfs-3.10-rhel7.4.series b/ldiskfs/kernel_patches/series/ldiskfs-3.10-rhel7.4.series
index b78efbc..66035ef 100644
--- a/ldiskfs/kernel_patches/series/ldiskfs-3.10-rhel7.4.series
+++ b/ldiskfs/kernel_patches/series/ldiskfs-3.10-rhel7.4.series
@@ -37,3 +37,4 @@ rhel7/ext4-use-GFP_NOFS-in-ext4_inode_attach_jinode.patch
 rhel7/ext4-export-orphan-add.patch
 rhel7/ext4-mmp-dont-mark-bh-dirty.patch
 rhel7/ext4-include-terminating-u32-in-size-of-xattr-entries-when-expanding-inodes.patch
+rhel7/ext4-optimize-ext4_find_delalloc_range-in-nodelalloc.patch
diff --git a/ldiskfs/kernel_patches/series/ldiskfs-3.10-rhel7.5.series b/ldiskfs/kernel_patches/series/ldiskfs-3.10-rhel7.5.series
index ad3e2bd..6c4e7dd 100644
--- a/ldiskfs/kernel_patches/series/ldiskfs-3.10-rhel7.5.series
+++ b/ldiskfs/kernel_patches/series/ldiskfs-3.10-rhel7.5.series
@@ -37,3 +37,4 @@ rhel7/ext4-export-orphan-add.patch
 rhel7/ext4-mmp-dont-mark-bh-dirty.patch
 rhel7/ext4-include-terminating-u32-in-size-of-xattr-entries-when-expanding-inodes.patch
 rhel7.2/ext4-export-mb-stream-allocator-variables.patch
+rhel7/ext4-optimize-ext4_find_delalloc_range-in-nodelalloc.patch
diff --git a/ldiskfs/kernel_patches/series/ldiskfs-3.10-rhel7.6.series b/ldiskfs/kernel_patches/series/ldiskfs-3.10-rhel7.6.series
index 42f4bc6..d10b17f 100644
--- a/ldiskfs/kernel_patches/series/ldiskfs-3.10-rhel7.6.series
+++ b/ldiskfs/kernel_patches/series/ldiskfs-3.10-rhel7.6.series
@@ -37,4 +37,5 @@ rhel7/ext4-export-orphan-add.patch
 rhel7/ext4-mmp-dont-mark-bh-dirty.patch
 rhel7/ext4-include-terminating-u32-in-size-of-xattr-entries-when-expanding-inodes.patch
 rhel7.2/ext4-export-mb-stream-allocator-variables.patch
+rhel7/ext4-optimize-ext4_find_delalloc_range-in-nodelalloc.patch
 rhel7.2/ext4-simple-blockalloc.patch
diff --git a/ldiskfs/kernel_patches/series/ldiskfs-3.10-rhel7.7.series b/ldiskfs/kernel_patches/series/ldiskfs-3.10-rhel7.7.series
index be6bc14..ca67087 100644
--- a/ldiskfs/kernel_patches/series/ldiskfs-3.10-rhel7.7.series
+++ b/ldiskfs/kernel_patches/series/ldiskfs-3.10-rhel7.7.series
@@ -37,4 +37,5 @@ rhel7/ext4-export-orphan-add.patch
 rhel7/ext4-include-terminating-u32-in-size-of-xattr-entries-when-expanding-inodes.patch
 rhel7.2/ext4-export-mb-stream-allocator-variables.patch
 rhel7.7/ext4-fix-project-with-unpatched-kernel.patch
+rhel7/ext4-optimize-ext4_find_delalloc_range-in-nodelalloc.patch
 rhel7.2/ext4-simple-blockalloc.patch
diff --git a/ldiskfs/kernel_patches/series/ldiskfs-3.10-rhel7.series b/ldiskfs/kernel_patches/series/ldiskfs-3.10-rhel7.series
index 5066e5a..2f67631 100644
--- a/ldiskfs/kernel_patches/series/ldiskfs-3.10-rhel7.series
+++ b/ldiskfs/kernel_patches/series/ldiskfs-3.10-rhel7.series
@@ -32,3 +32,4 @@ rhel7/ext4-reduce-lock-contention-in-__ext4_new_inode.patch
 rhel7/ext4-export-orphan-add.patch
 rhel7/ext4-mmp-dont-mark-bh-dirty.patch
 rhel7/ext4-include-terminating-u32-in-size-of-xattr-entries-when-expanding-inodes.patch
+rhel7/ext4-optimize-ext4_find_delalloc_range-in-nodelalloc.patch
diff --git a/ldiskfs/kernel_patches/series/ldiskfs-3.12-sles12.series b/ldiskfs/kernel_patches/series/ldiskfs-3.12-sles12.series
index b35c55c..6424192 100644
--- a/ldiskfs/kernel_patches/series/ldiskfs-3.12-sles12.series
+++ b/ldiskfs/kernel_patches/series/ldiskfs-3.12-sles12.series
@@ -23,3 +23,4 @@ rhel7/ext4-jcb-optimization.patch
 rhel7/ext4-export-orphan-add.patch
 rhel7/ext4-mmp-dont-mark-bh-dirty.patch
 rhel7/ext4-include-terminating-u32-in-size-of-xattr-entries-when-expanding-inodes.patch
+rhel7/ext4-optimize-ext4_find_delalloc_range-in-nodelalloc.patch
diff --git a/ldiskfs/kernel_patches/series/ldiskfs-3.12-sles12sp1.series b/ldiskfs/kernel_patches/series/ldiskfs-3.12-sles12sp1.series
index ae72316..ceacea6 100644
--- a/ldiskfs/kernel_patches/series/ldiskfs-3.12-sles12sp1.series
+++ b/ldiskfs/kernel_patches/series/ldiskfs-3.12-sles12sp1.series
@@ -24,3 +24,4 @@ sles12sp1/ext4-attach-jinode-in-writepages.patch
 rhel7/ext4-export-orphan-add.patch
 rhel7/ext4-mmp-dont-mark-bh-dirty.patch
 rhel7/ext4-include-terminating-u32-in-size-of-xattr-entries-when-expanding-inodes.patch
+rhel7/ext4-optimize-ext4_find_delalloc_range-in-nodelalloc.patch
diff --git a/ldiskfs/kernel_patches/series/ldiskfs-4.4-sles12sp2.series b/ldiskfs/kernel_patches/series/ldiskfs-4.4-sles12sp2.series
index 5aff249..6b6e2d5 100644
--- a/ldiskfs/kernel_patches/series/ldiskfs-4.4-sles12sp2.series
+++ b/ldiskfs/kernel_patches/series/ldiskfs-4.4-sles12sp2.series
@@ -29,3 +29,4 @@ rhel7/ext4-export-orphan-add.patch
 rhel7/ext4-mmp-dont-mark-bh-dirty.patch
 rhel7/ext4-include-terminating-u32-in-size-of-xattr-entries-when-expanding-inodes.patch
 sles12sp2/ext4-export-mb-stream-allocator-variables.patch
+rhel7/ext4-optimize-ext4_find_delalloc_range-in-nodelalloc.patch
diff --git a/ldiskfs/kernel_patches/series/ldiskfs-4.4-sles12sp3.series b/ldiskfs/kernel_patches/series/ldiskfs-4.4-sles12sp3.series
index 9f32a73..e247d1a 100644
--- a/ldiskfs/kernel_patches/series/ldiskfs-4.4-sles12sp3.series
+++ b/ldiskfs/kernel_patches/series/ldiskfs-4.4-sles12sp3.series
@@ -28,3 +28,4 @@ rhel7/ext4-use-GFP_NOFS-in-ext4_inode_attach_jinode.patch
 rhel7/ext4-export-orphan-add.patch
 rhel7/ext4-include-terminating-u32-in-size-of-xattr-entries-when-expanding-inodes.patch
 sles12sp2/ext4-export-mb-stream-allocator-variables.patch
+rhel7/ext4-optimize-ext4_find_delalloc_range-in-nodelalloc.patch
diff --git a/ldiskfs/kernel_patches/series/ldiskfs-4.4.0-45-ubuntu14+16.series b/ldiskfs/kernel_patches/series/ldiskfs-4.4.0-45-ubuntu14+16.series
index 2617d3f..4b801ec 100644
--- a/ldiskfs/kernel_patches/series/ldiskfs-4.4.0-45-ubuntu14+16.series
+++ b/ldiskfs/kernel_patches/series/ldiskfs-4.4.0-45-ubuntu14+16.series
@@ -25,3 +25,4 @@ rhel7/ext4-use-GFP_NOFS-in-ext4_inode_attach_jinode.patch
 rhel7/ext4-export-orphan-add.patch
 rhel7/ext4-mmp-dont-mark-bh-dirty.patch
 rhel7/ext4-include-terminating-u32-in-size-of-xattr-entries-when-expanding-inodes.patch
+rhel7/ext4-optimize-ext4_find_delalloc_range-in-nodelalloc.patch
diff --git a/ldiskfs/kernel_patches/series/ldiskfs-4.4.0-49-ubuntu14+16.series b/ldiskfs/kernel_patches/series/ldiskfs-4.4.0-49-ubuntu14+16.series
index f7ab192..f0cd928 100644
--- a/ldiskfs/kernel_patches/series/ldiskfs-4.4.0-49-ubuntu14+16.series
+++ b/ldiskfs/kernel_patches/series/ldiskfs-4.4.0-49-ubuntu14+16.series
@@ -25,3 +25,4 @@ rhel7/ext4-use-GFP_NOFS-in-ext4_inode_attach_jinode.patch
 rhel7/ext4-export-orphan-add.patch
 rhel7/ext4-mmp-dont-mark-bh-dirty.patch
 rhel7/ext4-include-terminating-u32-in-size-of-xattr-entries-when-expanding-inodes.patch
+rhel7/ext4-optimize-ext4_find_delalloc_range-in-nodelalloc.patch
diff --git a/ldiskfs/kernel_patches/series/ldiskfs-4.4.0-62-ubuntu14+16.series b/ldiskfs/kernel_patches/series/ldiskfs-4.4.0-62-ubuntu14+16.series
index 072fe21..e9dcd34 100644
--- a/ldiskfs/kernel_patches/series/ldiskfs-4.4.0-62-ubuntu14+16.series
+++ b/ldiskfs/kernel_patches/series/ldiskfs-4.4.0-62-ubuntu14+16.series
@@ -25,3 +25,4 @@ rhel7/ext4-use-GFP_NOFS-in-ext4_inode_attach_jinode.patch
 rhel7/ext4-export-orphan-add.patch
 rhel7/ext4-mmp-dont-mark-bh-dirty.patch
 rhel7/ext4-include-terminating-u32-in-size-of-xattr-entries-when-expanding-inodes.patch
+rhel7/ext4-optimize-ext4_find_delalloc_range-in-nodelalloc.patch
diff --git a/ldiskfs/kernel_patches/series/ldiskfs-4.4.0-73-ubuntu14+16.series b/ldiskfs/kernel_patches/series/ldiskfs-4.4.0-73-ubuntu14+16.series
index 934c97e..a688394 100644
--- a/ldiskfs/kernel_patches/series/ldiskfs-4.4.0-73-ubuntu14+16.series
+++ b/ldiskfs/kernel_patches/series/ldiskfs-4.4.0-73-ubuntu14+16.series
@@ -26,3 +26,4 @@ rhel7/ext4-export-orphan-add.patch
 rhel7/ext4-mmp-dont-mark-bh-dirty.patch
 rhel7/ext4-include-terminating-u32-in-size-of-xattr-entries-when-expanding-inodes.patch
 sles12sp2/ext4-export-mb-stream-allocator-variables.patch
+rhel7/ext4-optimize-ext4_find_delalloc_range-in-nodelalloc.patch
-- 
1.8.3.1
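
Note on the "256 * 256k" estimate in the commit message: the figure can be reproduced with a small stand-alone model. This is only a sketch, not kernel code. The helper name pages_scanned() and its parameters are hypothetical, and it assumes 4 KiB pages, one block-mapping call per page written, and a full cluster scan on every call when the early return is absent.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model, not kernel code: counts page lookups for a
 * sequential write, assuming one block-mapping call per page and a
 * full cluster scan per call when the early return is absent.
 */
static uint64_t pages_scanned(uint64_t file_bytes, uint64_t cluster_bytes,
                              uint64_t page_bytes, int early_return)
{
    assert(page_bytes != 0 && cluster_bytes >= page_bytes);

    /* With the early return, nodelalloc mode never scans the cluster. */
    if (early_return)
        return 0;

    uint64_t map_calls = file_bytes / page_bytes;            /* 256k for 1 GiB */
    uint64_t pages_per_cluster = cluster_bytes / page_bytes; /* 256 for 1 MiB */

    return map_calls * pages_per_cluster;
}
```

Under these assumptions, a 1 GiB file with 1 MiB clusters and 4 KiB pages gives 262144 * 256 = 67108864 page lookups without the early return, and none with it.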