LU-1757 brw: add short io osc/ost transfer.
There's no need to do target bulk i/o for small amounts of
data, since it requires extra network operations.
For this case we add short i/o: when the i/o size is less
than or equal to some number of pages (default 3), we
encapsulate the data in the ptlrpc request itself.
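
The size check described above can be sketched roughly as
follows. This is an illustration only, not the code from the
patch: the names use_short_io, SHORT_IO_MAX_PAGES and PAGE_SIZE
are hypothetical, a 4 KiB page size is assumed, and the page
count is derived from the i/o size alone, ignoring alignment.

```python
PAGE_SIZE = 4096          # assumed page size in bytes (4 KiB)
SHORT_IO_MAX_PAGES = 3    # default page limit quoted in the commit message


def use_short_io(io_bytes, max_pages=SHORT_IO_MAX_PAGES, page_size=PAGE_SIZE):
    """Return True if an i/o of io_bytes fits within the short i/o limit.

    Hypothetical helper: it only models the "size <= N pages" rule from
    the commit message and does not account for unaligned offsets.
    """
    if io_bytes <= 0:
        return False
    pages = -(-io_bytes // page_size)  # ceiling division
    return pages <= max_pages


if __name__ == "__main__":
    for size in (4096, 8192, 3 * 4096, 3 * 4096 + 1, 1 << 20):
        path = "short i/o" if use_short_io(size) else "bulk i/o"
        print(f"{size:>8} bytes -> {path}")
```

Under these assumptions the 4k and 8k transfers measured below
would take the short i/o path, while a 1 MiB transfer would
still use bulk i/o.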
With this patch, 4k direct i/o read latency on a Cray Aries
network (data is on flash on another node on the Aries)
drops from ~280 microseconds to ~200 microseconds. Write
latency drops from ~370 microseconds to ~350 microseconds
(much more of write latency is waiting for write commit).
This translates to about a 25-30% performance improvement
on 4k direct i/o reads and 4k random reads. (Write
performance improvement was small to non-existent.)
Improvement was similar with 8k i/o.
Buffered sequential i/o sees no improvement, because it
does not perform small i/os.
Performance data:
access              = file-per-process
pattern             = segmented (1 segment)
ordering in a file  = random offsets
ordering inter file = no tasks offsets
xfersize            = 4096 bytes
blocksize           = 100 MiB

nprocs  xfsize  shortio  dio  random  Read (MB/s)
1       4k      no       yes  no      15.0
8       4k      no       yes  no      73.4
16      4k      no       yes  no      81.1
1       4k      yes      yes  no      16.5
8       4k      yes      yes  no      95.2
16      4k      yes      yes  no      107.3
1       4k      no       no   yes     15.5
8       4k      no       no   yes     73.4
16      4k      no       no   yes     81.2
1       4k      yes      no   yes     16.8
8       4k      yes      no   yes     95.0
16      4k      yes      no   yes     106.5
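
For reference, the relative read improvement per row can be
worked out from the table above with a few lines of Python.
The bandwidth figures are copied from the table; the helper
name improvement_pct is only illustrative.

```python
# Read bandwidth in MB/s from the table above, keyed by nprocs:
# (short i/o off, short i/o on).
DIO = {1: (15.0, 16.5), 8: (73.4, 95.2), 16: (81.1, 107.3)}
RANDOM = {1: (15.5, 16.8), 8: (73.4, 95.0), 16: (81.2, 106.5)}


def improvement_pct(before, after):
    """Relative bandwidth gain of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


for name, rows in (("dio", DIO), ("random", RANDOM)):
    for nprocs, (off, on) in sorted(rows.items()):
        gain = improvement_pct(off, on)
        print(f"{name:6s} nprocs={nprocs:2d}  {off:6.1f} -> {on:6.1f} MB/s  {gain:5.1f}%")
```

With these numbers a single process gains roughly 8-10%, while
the 8 and 16 process runs gain roughly 29-32%.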
Note that even when individual i/o performance is not
improved, this change reduces the number of network
operations required for small i/o, which can help on large
systems.
Signed-off-by: Patrick Farrell <paf@cray.com>
Change-Id: I70050935eaa0a5e98ca437e18e730be4aa0e4700
Reviewed-on: https://review.whamcloud.com/27767
Tested-by: Jenkins
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-by: Alexandr Boyko <c17825@cray.com>
Tested-by: Maloo <hpdd-maloo@intel.com>
Reviewed-by: Oleg Drokin <oleg.drokin@intel.com>
14 files changed: