LU-9796 ldiskfs: improve inode allocation performance
author    Wang Shilong <wshilong@ddn.com>
          Thu, 26 Oct 2017 07:03:52 +0000 (15:03 +0800)
committer John L. Hammond <john.hammond@intel.com>
          Mon, 11 Jun 2018 20:04:25 +0000 (20:04 +0000)
commit    f27584430fc8b1379a4f6f064b9b201da8deec92
tree      a23bb29f967c8456378d5736cce7551f448657e7
parent    08876238836886bc6ad9b90c5ff50bdde87e2849
LU-9796 ldiskfs: improve inode allocation performance

Backport of the following upstream patches:

------
ext4: cleanup goto next group

avoid duplicated code; we also need to go to the
next group in case we find a reserved inode (see
the sketch below).
------
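
A minimal sketch of the cleaned-up scan loop in
__ext4_new_inode() (simplified, not the literal diff): the
reserved-inode case now reuses the existing next_group label
instead of open-coding the advance to the next group.

        for (i = 0; i < ngroups; i++, ino = 0) {
                /* ... read group descriptor and inode bitmap ... */
        repeat_in_this_group:
                ino = ext4_find_next_zero_bit((unsigned long *)
                                inode_bitmap_bh->b_data,
                                EXT4_INODES_PER_GROUP(sb), ino);
                if (ino >= EXT4_INODES_PER_GROUP(sb))
                        goto next_group;
                if (group == 0 && (ino + 1) < EXT4_FIRST_INO(sb)) {
                        ext4_error(sb, "reserved inode found cleared - "
                                   "inode=%lu", ino + 1);
                        goto next_group;  /* was duplicated inline */
                }
                /* ... claim ino: on success goto got, on a lost
                 * race goto repeat_in_this_group ... */
        next_group:
                if (++group == ngroups)
                        group = 0;
        }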

ext4: reduce lock contention in __ext4_new_inode

While running many file-creating threads concurrently,
we found heavy lock contention on the group spinlock:

FUNC                           TOTAL_TIME(us)       COUNT        AVG(us)
ext4_create                    1707443399           1440000      1185.72
_raw_spin_lock                 1317641501           180899929    7.28
jbd2__journal_start            287821030            1453950      197.96
jbd2_journal_get_write_access  33441470             73077185     0.46
ext4_add_nondir                29435963             1440000      20.44
ext4_add_entry                 26015166             1440049      18.07
ext4_dx_add_entry              25729337             1432814      17.96
ext4_mark_inode_dirty          12302433             5774407      2.13

Most of the CPU time is spent in _raw_spin_lock. Here are
test numbers with and without the patch.

Test environment:
Server : SuperMicro Server (2 x E5-2690 v3@2.60GHz, 128GB 2133MHz
         DDR4 Memory, 8GbFC)
Storage : 2 x RAID1 (DDN SFA7700X, 4 x Toshiba PX02SMU020 200GB
          Read Intensive SSD)

format command:
        mkfs.ext4 -J size=4096

test command:
        mpirun -np 48 mdtest -n 30000 -d /ext4/mdtest.out -F -C \
                -r -i 1 -v -p 10 -u #first run to load inode

        mpirun -np 48 mdtest -n 30000 -d /ext4/mdtest.out -F -C \
                -r -i 3 -v -p 10 -u

Kernel version: 4.13.0-rc3

Test: 1,440,000 files in 48 directories, created by 48 processes:

Without patch:

File creation   File removal (ops per second)
79,033          289,569
81,463          285,359
79,875          288,475

With patch:

File creation   File removal (ops per second)
810,669         301,694
812,805         302,711
813,965         297,670

Creation performance is improved more than 10x with a large
journal size. The main problem is that we test the bitmap
and do checks and journal operations, which may sleep,
without the lock, and only then test-and-set the bit with
the lock held. This is racy, and the inode can be stolen by
another process in the meantime.

However, after the first try we know the handle has been
started and the inode bitmap has been journaled, so we can
search for and set a free bit directly with the lock held;
the second try is then almost guaranteed to succeed. A
sketch of the resulting logic follows.
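
Below is a minimal sketch of the claim-and-retry path in
__ext4_new_inode() after the patch; it is simplified, error
handling is dropped, and find_inode_bit() is the free-bit
search helper factored out by the patch.

        /* The sleepable steps are done once, before taking
         * the group lock. */
        if (!handle)
                handle = __ext4_journal_start_sb(dir->i_sb, line_no,
                                                 handle_type, nblocks, 0);
        err = ext4_journal_get_write_access(handle, inode_bitmap_bh);

        ext4_lock_group(sb, group);
        ret2 = ext4_test_and_set_bit(ino, inode_bitmap_bh->b_data);
        if (ret2) {
                /* Lost the race while the journal calls above
                 * slept.  The handle is started and the bitmap
                 * buffer is journaled already, so search again
                 * with the lock held and claim a bit directly;
                 * nothing can steal it now. */
                ret2 = find_inode_bit(sb, group, inode_bitmap_bh, &ino);
                if (ret2) {
                        ext4_set_bit(ino, inode_bitmap_bh->b_data);
                        ret2 = 0;       /* claimed on retry */
                } else {
                        ret2 = 1;       /* group exhausted */
                }
        }
        ext4_unlock_group(sb, group);
        if (ret2)
                goto next_group;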

Lustre-commit: 3f0a7241c434d9556308299eea069628715816c2
Lustre-change: https://review.whamcloud.com/29032

Signed-off-by: Wang Shilong <wshilong@ddn.com>
Signed-off-by: Bob Glossman <bob.glossman@intel.com>
Change-Id: I234ff3027c8d96155d374c56b12aab7c4dc0dafd
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Gu Zheng <gzheng@ddn.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
Reviewed-on: https://review.whamcloud.com/32295
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: John L. Hammond <john.hammond@intel.com>
ldiskfs/kernel_patches/patches/rhel7/ext4-cleanup-goto-next-group.patch [new file with mode: 0644]
ldiskfs/kernel_patches/patches/rhel7/ext4-reduce-lock-contention-in-__ext4_new_inode.patch [new file with mode: 0644]
ldiskfs/kernel_patches/series/ldiskfs-3.10-rhel7.2.series
ldiskfs/kernel_patches/series/ldiskfs-3.10-rhel7.3.series
ldiskfs/kernel_patches/series/ldiskfs-3.10-rhel7.4.series
ldiskfs/kernel_patches/series/ldiskfs-3.10-rhel7.series