LU-1757 brw: add short io osc/ost transfer. 67/27767/12
author Patrick Farrell <paf@cray.com>
Mon, 16 Oct 2017 10:22:22 +0000 (05:22 -0500)
committer Oleg Drokin <oleg.drokin@intel.com>
Thu, 9 Nov 2017 20:06:50 +0000 (20:06 +0000)
commit 70f092a0587866662735e1a6eaf27701a576370d
tree 79a16f07f623e5d5a222f99b6ee908dd3bd1efce
parent a046e879fcadd601c9a19fd906f82ecbd2d4efd5
LU-1757 brw: add short io osc/ost transfer.

There's no need to do target bulk i/o for a small amount of
data, because bulk i/o requires extra network operations.

For this case we add short i/o.  When the i/o size is less
than or equal to some number of pages (default 3), we
encapsulate the data in the ptlrpc request.
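
The size check described above can be sketched as follows. This is
only an illustration, not the code from this patch (which is C, in
the files listed below): the names use_short_io, pages_needed and
DEFAULT_SHORT_IO_PAGES are hypothetical, a 4096-byte page size is
assumed, and the real check may apply further conditions.

```python
PAGE_SIZE = 4096            # assumption: 4 KiB pages
DEFAULT_SHORT_IO_PAGES = 3  # default limit quoted in the commit message


def pages_needed(nbytes: int, page_size: int = PAGE_SIZE) -> int:
    """Whole pages needed to hold nbytes (offset alignment is ignored)."""
    return (nbytes + page_size - 1) // page_size


def use_short_io(nbytes: int,
                 max_pages: int = DEFAULT_SHORT_IO_PAGES,
                 page_size: int = PAGE_SIZE) -> bool:
    """True if the data would travel inside the request itself,
    False if a separate bulk transfer would be used instead."""
    if nbytes <= 0 or max_pages <= 0:
        return False  # nothing to send, or short i/o disabled
    return pages_needed(nbytes, page_size) <= max_pages


if __name__ == "__main__":
    for size in (4096, 8192, 12288, 12289, 1 << 20):
        path = "short i/o" if use_short_io(size) else "bulk i/o"
        print(f"{size:>8} bytes -> {path}")
```

With the default of 3 pages, the 4k and 8k transfers measured below
would take the short i/o path, while a 1 MiB transfer would still
use bulk i/o.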

With this patch, 4k direct i/o read latency on a Cray Aries
network (data is on flash on another node on the Aries)
drops from ~280 microseconds to ~200 microseconds.  Write
latency drops from ~370 microseconds to ~350 microseconds
(much more of write latency is waiting for write commit).
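
For reference, the relative latency reductions implied by the
approximate figures above can be worked out as below. The helper
name percent_reduction is illustrative only, and the inputs are the
rounded values quoted in this message.

```python
def percent_reduction(before_us: float, after_us: float) -> float:
    """Relative drop in latency, as a percentage of the original value."""
    return (before_us - after_us) / before_us * 100.0


read_drop = percent_reduction(280, 200)   # 4k direct i/o read
write_drop = percent_reduction(370, 350)  # 4k direct i/o write

print(f"read latency reduction:  {read_drop:.1f}%")   # 28.6%
print(f"write latency reduction: {write_drop:.1f}%")  # 5.4%
```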

This translates to about a 25-30% performance improvement
on 4k direct i/o reads and 4k random reads.  (Write
performance improvement was small to non-existent.)

Improvement was similar with 8k i/o.

Buffered sequential i/o sees no improvement, because it
does not perform small i/os.

Performance data:
        access             = file-per-process
        pattern            = segmented (1 segment)
        ordering in a file = random offsets
        ordering inter file= no tasks offsets
        xfersize           = 4096 bytes
        blocksize          = 100 MiB

nprocs  xfsize  shortio dio     random  Read (MB/s)
1       4k      no      yes     no      15.0
8       4k      no      yes     no      73.4
16      4k      no      yes     no      81.1
1       4k      yes     yes     no      16.5
8       4k      yes     yes     no      95.2
16      4k      yes     yes     no      107.3
1       4k      no      no      yes     15.5
8       4k      no      no      yes     73.4
16      4k      no      no      yes     81.2
1       4k      yes     no      yes     16.8
8       4k      yes     no      yes     95.0
16      4k      yes     no      yes     106.5
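
The per-row gains can be derived from the table above with a short
script such as this one. The numbers are copied from the table; the
"dio" and "random" labels and the helper percent_gain are naming
choices made here for illustration.

```python
# (nprocs, mode) -> (MB/s without short i/o, MB/s with short i/o)
RESULTS = {
    (1, "dio"): (15.0, 16.5),
    (8, "dio"): (73.4, 95.2),
    (16, "dio"): (81.1, 107.3),
    (1, "random"): (15.5, 16.8),
    (8, "random"): (73.4, 95.0),
    (16, "random"): (81.2, 106.5),
}


def percent_gain(base: float, new: float) -> float:
    """Relative throughput gain over the baseline, as a percentage."""
    return (new - base) / base * 100.0


gains = {key: percent_gain(*pair) for key, pair in RESULTS.items()}

for (nprocs, mode), gain in gains.items():
    print(f"{nprocs:>2} procs, {mode:<6}: +{gain:.1f}%")
```

On these numbers the gain is roughly 8-10% with a single process and
roughly 29-32% with 8 or 16 processes.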

Note that even when individual i/o performance is not improved,
this change reduces the number of network operations required
for small i/o, which can help on large systems.

Signed-off-by: Patrick Farrell <paf@cray.com>
Change-Id: I70050935eaa0a5e98ca437e18e730be4aa0e4700
Reviewed-on: https://review.whamcloud.com/27767
Tested-by: Jenkins
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
14 files changed:
lustre/include/lprocfs_status.h
lustre/include/lustre_export.h
lustre/include/lustre_net.h
lustre/include/lustre_osc.h
lustre/include/lustre_req_layout.h
lustre/include/obd.h
lustre/ldlm/ldlm_lib.c
lustre/llite/llite_lib.c
lustre/obdclass/lprocfs_status.c
lustre/osc/lproc_osc.c
lustre/osc/osc_page.c
lustre/osc/osc_request.c
lustre/ptlrpc/layout.c
lustre/target/tgt_handler.c