LU-12345 ldiskfs: optimize nodelalloc mode
author Artem Blagodarenko <artem.blagodarenko@seagate.com>
Tue, 28 May 2019 16:51:21 +0000 (19:51 +0300)
committer Oleg Drokin <green@whamcloud.com>
Sat, 1 Jun 2019 03:58:52 +0000 (03:58 +0000)
We found a performance regression when using bigalloc (1MB cluster
size) with "nodelalloc":

1. mke2fs -C 1048576 -O ^has_journal,bigalloc /dev/sda
2. mount -o nodelalloc /dev/sda /test/
3. time dd if=/dev/zero of=/test/io bs=1048576 count=1024

The "dd" takes about 2 seconds to finish, but on a filesystem created
without "bigalloc" it takes less than 1 second.

The reason is that with "nodelalloc", ext4 calls
ext4_find_delalloc_cluster() nearly every time it calls
ext4_ext_map_blocks(), and ext4_find_delalloc_range() then scans
all pages in the cluster because no buffer is "delayed".  A cluster has
256 pages (1MB cluster), so creating a 1GB file scans 256 * 256k pages.
That severely hurts performance.
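The scan count above can be checked with a small userspace sketch. It is
not kernel code: the helper names are hypothetical, and it assumes a 4KB
page size and one block-mapping call per page written, which is what the
"256 * 256k" figure implies.

```c
#include <assert.h>
#include <stdint.h>

/* Number of pages needed to cover `bytes` (assumes bytes is a multiple of page_size). */
static uint64_t pages_in(uint64_t bytes, uint64_t page_size)
{
	assert(page_size != 0);
	return bytes / page_size;
}

/*
 * Hypothetical cost model: every block-mapping call rescans all pages
 * of the enclosing cluster, and there is one mapping call per page.
 */
static uint64_t delalloc_scan_cost(uint64_t file_bytes, uint64_t cluster_bytes,
				   uint64_t page_size)
{
	uint64_t map_calls = pages_in(file_bytes, page_size);         /* 256k for 1GB */
	uint64_t pages_per_cluster = pages_in(cluster_bytes, page_size); /* 256 for 1MB */

	return map_calls * pages_per_cluster;
}
```

With a 1GB file, a 1MB cluster and 4KB pages, delalloc_scan_cost() returns
256 * 262144 page lookups, all of which find nothing in nodelalloc mode.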

Therefore, we return immediately from ext4_find_delalloc_range() in
nodelalloc mode, since by definition there can't be any delalloc
pages.
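The effect of the early return can be illustrated with a userspace model.
This is only a sketch: struct mount_model, scan_range() and
find_delalloc_range() are hypothetical stand-ins, not the kernel
structures or functions touched by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the mount state; not a kernel structure. */
struct mount_model {
	bool delalloc;       /* false models "mount -o nodelalloc" */
	unsigned long scans; /* pages examined by the slow path */
};

/* Models the slow path: examine every page in [first, last]. */
static int scan_range(struct mount_model *m, const bool *delayed,
		      size_t first, size_t last)
{
	for (size_t i = first; i <= last; i++) {
		m->scans++;
		if (delayed[i])
			return 1; /* found a delayed page */
	}
	return 0;
}

/* Models the patched lookup: skip the scan when delalloc is off. */
static int find_delalloc_range(struct mount_model *m, const bool *delayed,
			       size_t first, size_t last)
{
	assert(first <= last);
	if (!m->delalloc)
		return 0; /* nodelalloc: nothing can be delayed */
	return scan_range(m, delayed, first, last);
}
```

In this model a lookup over a 256-page cluster costs 256 page checks with
delalloc enabled and no delayed pages, and zero checks with nodelalloc.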

The same optimization is also added to the ldiskfs_find_delayed_extent()
function, which improves performance dramatically.

Here are the results of testing on a two-node system.
Without the patch:
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00   56.30    0.06    0.00   43.63

Device:  rrqm/s  wrqm/s   r/s       w/s  rMB/s   wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sds        0.00    0.00  0.00   1174.00   0.00    4.59      8.00      0.84   0.71     0.00     0.71   0.01   1.20

With patch:
08/29/2018 01:13:22 AM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    4.13   30.37    0.00   65.50

Device:  rrqm/s  wrqm/s   r/s       w/s  rMB/s   wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
sds        0.00    0.00  0.00  54117.82   0.00  211.43      8.00    152.59   2.82     0.00     2.82   0.02  99.01
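
For a rough comparison of the two iostat samples, the ratio of the
"with patch" value to the "without patch" value can be computed as below.
The speedup() helper is a hypothetical name used only for this sketch; the
inputs are the w/s and wMB/s columns reported above.

```c
#include <assert.h>

/* Ratio of a metric after the patch to the same metric before it. */
static double speedup(double before, double after)
{
	assert(before > 0.0);
	return after / before;
}
```

Both speedup(1174.00, 54117.82) for w/s and speedup(4.59, 211.43) for
wMB/s come out at roughly 46x, while %system drops from 56.30 to 4.13.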

Cray-bug-id: LUS-5835
Signed-off-by: Artem Blagodarenko <c17828@cray.com>
Change-Id: Ie33410d4481778ee4f76a054ab8cfc11cc19a0ed
Reviewed-on: https://review.whamcloud.com/34982
Tested-by: Jenkins
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
15 files changed:
ldiskfs/kernel_patches/patches/rhel7/ext4-optimize-ext4_find_delalloc_range-in-nodelalloc.patch [new file with mode: 0644]
ldiskfs/kernel_patches/series/ldiskfs-3.10-rhel7.2.series
ldiskfs/kernel_patches/series/ldiskfs-3.10-rhel7.3.series
ldiskfs/kernel_patches/series/ldiskfs-3.10-rhel7.4.series
ldiskfs/kernel_patches/series/ldiskfs-3.10-rhel7.5.series
ldiskfs/kernel_patches/series/ldiskfs-3.10-rhel7.6.series
ldiskfs/kernel_patches/series/ldiskfs-3.10-rhel7.series
ldiskfs/kernel_patches/series/ldiskfs-3.12-sles12.series
ldiskfs/kernel_patches/series/ldiskfs-3.12-sles12sp1.series
ldiskfs/kernel_patches/series/ldiskfs-4.4-sles12sp2.series
ldiskfs/kernel_patches/series/ldiskfs-4.4-sles12sp3.series
ldiskfs/kernel_patches/series/ldiskfs-4.4.0-45-ubuntu14+16.series
ldiskfs/kernel_patches/series/ldiskfs-4.4.0-49-ubuntu14+16.series
ldiskfs/kernel_patches/series/ldiskfs-4.4.0-62-ubuntu14+16.series
ldiskfs/kernel_patches/series/ldiskfs-4.4.0-73-ubuntu14+16.series

diff --git a/ldiskfs/kernel_patches/patches/rhel7/ext4-optimize-ext4_find_delalloc_range-in-nodelalloc.patch b/ldiskfs/kernel_patches/patches/rhel7/ext4-optimize-ext4_find_delalloc_range-in-nodelalloc.patch
new file mode 100644
index 0000000..08b857f
--- /dev/null
@@ -0,0 +1,54 @@
+From 8c48f7e88e293b9dd422bd8884842aea85d30b22 
+Subject: [PATCH] ext4: optimize ext4_find_delalloc_range() in nodelalloc mode
+
+We found performance regression when using bigalloc with "nodelalloc"
+(1MB cluster size):
+
+1. mke2fs -C 1048576 -O ^has_journal,bigalloc /dev/sda
+2. mount -o nodelalloc /dev/sda /test/
+3. time dd if=/dev/zero of=/test/io bs=1048576 count=1024
+
+The "dd" will cost about 2 seconds to finish, but if we mke2fs without
+"bigalloc", "dd" will only cost less than 1 second.
+
+The reason is: when using ext4 with "nodelalloc", it will call
+ext4_find_delalloc_cluster() nearly everytime it call
+ext4_ext_map_blocks(), and ext4_find_delalloc_range() will also scan
+all pages in cluster because no buffer is "delayed".  A cluster has
+256 pages (1MB cluster), so it will scan 256 * 256k pags when creating
+a 1G file. That severely hurts the performance.
+
+Therefore, we return immediately from ext4_find_delalloc_range() in
+nodelalloc mode, since by definition there can't be any delalloc
+pages.
+
+Signed-off-by: Robin Dong <sanbai@taobao.com>
+Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
+---
+ fs/ext4/extents.c |    3 +++
+ 1 file changed, 3 insertions(+)
+
+Index: linux-stage/fs/ext4/extents.c
+===================================================================
+--- linux-stage.orig/fs/ext4/extents.c
++++ linux-stage/fs/ext4/extents.c
+@@ -3909,6 +3909,9 @@ int ext4_find_delalloc_range(struct inod
+ {
+       struct extent_status es;
++      if (!test_opt(inode->i_sb, DELALLOC))
++              return 0;
++
+       ext4_es_find_delayed_extent_range(inode, lblk_start, lblk_end, &es);
+       if (es.es_len == 0)
+               return 0; /* there is no delay extent in this tree */
+@@ -5115,6 +5118,9 @@ static int ext4_find_delayed_extent(stru
+       struct extent_status es;
+       ext4_lblk_t block, next_del;
++      if (!test_opt(inode->i_sb, DELALLOC))
++              return 0;
++
+       if (newes->es_pblk == 0) {
+               ext4_es_find_delayed_extent_range(inode, newes->es_lblk,
+                               newes->es_lblk + newes->es_len - 1, &es);
index 530cb3a..ac83b2f 100644
@@ -37,3 +37,4 @@ rhel7/ext4-use-GFP_NOFS-in-ext4_inode_attach_jinode.patch
 rhel7/ext4-export-orphan-add.patch
 rhel7/ext4-mmp-dont-mark-bh-dirty.patch
 rhel7/ext4-include-terminating-u32-in-size-of-xattr-entries-when-expanding-inodes.patch
+rhel7/ext4-optimize-ext4_find_delalloc_range-in-nodelalloc.patch
index 53a3078..3dfecb7 100644
@@ -37,3 +37,4 @@ rhel7/ext4-use-GFP_NOFS-in-ext4_inode_attach_jinode.patch
 rhel7/ext4-export-orphan-add.patch
 rhel7/ext4-mmp-dont-mark-bh-dirty.patch
 rhel7/ext4-include-terminating-u32-in-size-of-xattr-entries-when-expanding-inodes.patch
+rhel7/ext4-optimize-ext4_find_delalloc_range-in-nodelalloc.patch
index b78efbc..66035ef 100644
@@ -37,3 +37,4 @@ rhel7/ext4-use-GFP_NOFS-in-ext4_inode_attach_jinode.patch
 rhel7/ext4-export-orphan-add.patch
 rhel7/ext4-mmp-dont-mark-bh-dirty.patch
 rhel7/ext4-include-terminating-u32-in-size-of-xattr-entries-when-expanding-inodes.patch
+rhel7/ext4-optimize-ext4_find_delalloc_range-in-nodelalloc.patch
index ad3e2bd..6c4e7dd 100644
@@ -37,3 +37,4 @@ rhel7/ext4-export-orphan-add.patch
 rhel7/ext4-mmp-dont-mark-bh-dirty.patch
 rhel7/ext4-include-terminating-u32-in-size-of-xattr-entries-when-expanding-inodes.patch
 rhel7.2/ext4-export-mb-stream-allocator-variables.patch
+rhel7/ext4-optimize-ext4_find_delalloc_range-in-nodelalloc.patch
index 4452882..c89262a 100644
@@ -37,3 +37,4 @@ rhel7/ext4-export-orphan-add.patch
 rhel7/ext4-mmp-dont-mark-bh-dirty.patch
 rhel7/ext4-include-terminating-u32-in-size-of-xattr-entries-when-expanding-inodes.patch
 rhel7.2/ext4-export-mb-stream-allocator-variables.patch
+rhel7/ext4-optimize-ext4_find_delalloc_range-in-nodelalloc.patch
index 5066e5a..2f67631 100644
@@ -32,3 +32,4 @@ rhel7/ext4-reduce-lock-contention-in-__ext4_new_inode.patch
 rhel7/ext4-export-orphan-add.patch
 rhel7/ext4-mmp-dont-mark-bh-dirty.patch
 rhel7/ext4-include-terminating-u32-in-size-of-xattr-entries-when-expanding-inodes.patch
+rhel7/ext4-optimize-ext4_find_delalloc_range-in-nodelalloc.patch
index b35c55c..6424192 100644
@@ -23,3 +23,4 @@ rhel7/ext4-jcb-optimization.patch
 rhel7/ext4-export-orphan-add.patch
 rhel7/ext4-mmp-dont-mark-bh-dirty.patch
 rhel7/ext4-include-terminating-u32-in-size-of-xattr-entries-when-expanding-inodes.patch
+rhel7/ext4-optimize-ext4_find_delalloc_range-in-nodelalloc.patch
index ae72316..ceacea6 100644
@@ -24,3 +24,4 @@ sles12sp1/ext4-attach-jinode-in-writepages.patch
 rhel7/ext4-export-orphan-add.patch
 rhel7/ext4-mmp-dont-mark-bh-dirty.patch
 rhel7/ext4-include-terminating-u32-in-size-of-xattr-entries-when-expanding-inodes.patch
+rhel7/ext4-optimize-ext4_find_delalloc_range-in-nodelalloc.patch
index 5aff249..6b6e2d5 100644
@@ -29,3 +29,4 @@ rhel7/ext4-export-orphan-add.patch
 rhel7/ext4-mmp-dont-mark-bh-dirty.patch
 rhel7/ext4-include-terminating-u32-in-size-of-xattr-entries-when-expanding-inodes.patch
 sles12sp2/ext4-export-mb-stream-allocator-variables.patch
+rhel7/ext4-optimize-ext4_find_delalloc_range-in-nodelalloc.patch
index 9f32a73..e247d1a 100644
@@ -28,3 +28,4 @@ rhel7/ext4-use-GFP_NOFS-in-ext4_inode_attach_jinode.patch
 rhel7/ext4-export-orphan-add.patch
 rhel7/ext4-include-terminating-u32-in-size-of-xattr-entries-when-expanding-inodes.patch
 sles12sp2/ext4-export-mb-stream-allocator-variables.patch
+rhel7/ext4-optimize-ext4_find_delalloc_range-in-nodelalloc.patch
index 2617d3f..4b801ec 100644
@@ -25,3 +25,4 @@ rhel7/ext4-use-GFP_NOFS-in-ext4_inode_attach_jinode.patch
 rhel7/ext4-export-orphan-add.patch
 rhel7/ext4-mmp-dont-mark-bh-dirty.patch
 rhel7/ext4-include-terminating-u32-in-size-of-xattr-entries-when-expanding-inodes.patch
+rhel7/ext4-optimize-ext4_find_delalloc_range-in-nodelalloc.patch
index f7ab192..f0cd928 100644
@@ -25,3 +25,4 @@ rhel7/ext4-use-GFP_NOFS-in-ext4_inode_attach_jinode.patch
 rhel7/ext4-export-orphan-add.patch
 rhel7/ext4-mmp-dont-mark-bh-dirty.patch
 rhel7/ext4-include-terminating-u32-in-size-of-xattr-entries-when-expanding-inodes.patch
+rhel7/ext4-optimize-ext4_find_delalloc_range-in-nodelalloc.patch
index 072fe21..e9dcd34 100644
@@ -25,3 +25,4 @@ rhel7/ext4-use-GFP_NOFS-in-ext4_inode_attach_jinode.patch
 rhel7/ext4-export-orphan-add.patch
 rhel7/ext4-mmp-dont-mark-bh-dirty.patch
 rhel7/ext4-include-terminating-u32-in-size-of-xattr-entries-when-expanding-inodes.patch
+rhel7/ext4-optimize-ext4_find_delalloc_range-in-nodelalloc.patch
index 934c97e..a688394 100644
@@ -26,3 +26,4 @@ rhel7/ext4-export-orphan-add.patch
 rhel7/ext4-mmp-dont-mark-bh-dirty.patch
 rhel7/ext4-include-terminating-u32-in-size-of-xattr-entries-when-expanding-inodes.patch
 sles12sp2/ext4-export-mb-stream-allocator-variables.patch
+rhel7/ext4-optimize-ext4_find_delalloc_range-in-nodelalloc.patch