git://git.whamcloud.com - fs/lustre-release.git/commit

LU-1669 vvp: Use lockless __generic_file_aio_write

Testing multi-threaded single shard file write performance has shown
the inode mutex to be a limiting factor when using the
generic_file_aio_write function. To work around this bottle neck, this
change replaces the locked version of that call with the lock less
version, specifically, __generic_file_aio_write.

In order to maintain posix consistency, Lustre must now employ it's
own locking mechanism in the higher layers. Currently writes are
protected using the lli_write_mutex in the ll_inode_info structure.
To protect against simultaneous write and truncate operations, since
we no longer take the inode mutex during writes, we must down the
lli_trunc_sem semaphore.

Unfortunately, this change by itself does not garner any performance
benefits. Using FIO on a single machine with 32 GB of RAM, write
performance tests were ran with and without this change applied; the
results are below:

    +---------+-----------+---------+--------+--------+
    |     fio v2.0.13     |   Write Bandwidth (KB/s)  |
    +---------+-----------+---------+--------+--------+
    | # Tasks | GB / Task | Test 1  | Test 2 | Test 3 |
    +---------+-----------+---------+--------+--------+
    |    1    |    64     |  452446 | 454623 | 457653 |
    |    2    |    32     |  850318 | 565373 | 602498 |
    |    4    |    16     | 1058900 | 463546 | 529107 |
    |    8    |     8     | 1026300 | 468190 | 576451 |
    |   16    |     4     | 1065500 | 503160 | 462902 |
    |   32    |     2     | 1068600 | 462228 | 466963 |
    |   64    |     1     |  991830 | 556618 | 557863 |
    +---------+-----------+---------+--------+--------+

* Test 1: Lustre client running 04ec54f. File per process write
           workload. This test was used as a baseline for what we
           _could_ achieve in the single shared file tests if the
           bottle necks were removed.

* Test 2: Lustre client running 04ec54f. Single shared file
           workload, each task writing to a unique region.

* Test 3: Lustre client running 04ec54f + this patch. Single shared
           file workload, each task writing to a unique region.

In order to garner any real performance benefits out of a single
shared file workload, the lli_write_mutex needs to be broken up into a
range lock. That would allow write operations to unique regions of a
file to be executed concurrently. This work is left to be done in a
follow up patch.

Signed-off-by: Prakash Surya <surya1@llnl.gov>
Change-Id: I0023132b5d941b3304f39f015f95106542998072
Reviewed-on: http://review.whamcloud.com/6672
Tested-by: Jenkins
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Lai Siyao <lai.siyao@intel.com>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Reviewed-by: Jinshan Xiong <jinshan.xiong@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>

author	Prakash Surya <surya1@llnl.gov>
	Thu, 3 Oct 2013 00:16:51 +0000 (17:16 -0700)
committer	Oleg Drokin <oleg.drokin@intel.com>
	Fri, 12 Sep 2014 17:38:37 +0000 (17:38 +0000)
commit	ed5ebb87bfc2b684958daac90c4369f395482a16
tree	34bd9f2d0fd63718eea88855139c2e8eb2e92b7a	tree \| snapshot
parent	10d07da8eeaa50d8b97751f99797705c73b211fb	commit \| diff

lustre/llite/rw26.c		diff \| blob \| history
lustre/llite/vvp_io.c		diff \| blob \| history
lustre/llite/vvp_page.c		diff \| blob \| history
lustre/obdclass/cl_page.c		diff \| blob \| history