EX-773 lnet: add LNet GPU Direct Support
This patch exports registration/unregistration functions
which are called by the NVFS module to let the LND know
that it can call into the NVFS module to do RDMA mapping
of GPU shadow pages.
GPU priority is considered during NI selection.
Writes smaller than 4K are always RDMAed if the RDMA source is
the GPU device.
The DMA mapping function provided by the GPU Direct driver
returns < 0 on failure, unlike the kernel-provided mapping
function, which returns 0 on failure.
The code is changed slightly to handle the non-standard return code.
Also properly handle mapping errors in the standard code path.
If ib_dma_map_sg() returns 0, there is no need to go through
the rest of the rd processing; just return an error.
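The two failure conventions described above can be sketched as
follows. This is an illustration only, not code from the patch:
the helper name dma_map_failed() and its gpu_path flag are
hypothetical, and the handling of a 0 return on the GPU path is
left out because the text does not state it.

```c
#include <stdbool.h>

/*
 * Hypothetical helper (not from the patch): decide whether a DMA
 * mapping call failed, given which mapping function produced rc.
 */
static bool dma_map_failed(int rc, bool gpu_path)
{
	/* GPU Direct driver, per the text above: < 0 means failure. */
	if (gpu_path)
		return rc < 0;

	/* Kernel path, e.g. ib_dma_map_sg(): 0 means failure. */
	return rc == 0;
}
```

A caller would check this once, right after mapping, and return
an error before touching the rest of the rd.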
When an RDMA mapping failure occurs, mark the failure with a
unique errno, EHWPOISON. Record that error in the message
event. When the message is finalized and the event is
propagated to the ptlrpc layer, flag the request not to be
resent if the mapping error has occurred. This avoids cases
where Lustre enters an RPC resend loop with no way to
terminate it.
RDMA mapping errors are assumed to be fatal and therefore
there is no point in retrying the request on the same memory.
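A minimal sketch of the error flow described above, under stated
assumptions: struct sketch_event, struct sketch_request, and both
functions are hypothetical stand-ins, not the LNet or ptlrpc
structures touched by the patch. Only the use of EHWPOISON as the
marker comes from the text.

```c
#include <errno.h>
#include <stdbool.h>

#ifndef EHWPOISON
#define EHWPOISON 133	/* generic Linux value, used only if libc lacks it */
#endif

/* Hypothetical stand-in for the message event. */
struct sketch_event {
	int status;		/* 0 on success, negative errno on failure */
};

/* Hypothetical stand-in for the ptlrpc request. */
struct sketch_request {
	bool no_resend;		/* set when a resend cannot succeed */
};

/* Record the mapping failure in the event with the unique errno. */
static void sketch_record_map_error(struct sketch_event *ev)
{
	ev->status = -EHWPOISON;
}

/* On event delivery, stop resends only for the mapping error. */
static void sketch_handle_event(const struct sketch_event *ev,
				struct sketch_request *req)
{
	if (ev->status == -EHWPOISON)
		req->no_resend = true;
}
```

Using a dedicated errno lets the upper layer tell a fatal mapping
failure apart from transient errors that are still worth a resend.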
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I2bfdbdd5fe3b8536e616ab442d18deace6756d57
Reviewed-on: https://review.whamcloud.com/37368
Reviewed-by: Wang Shilong <wshilong@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-on: https://review.whamcloud.com/42001
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>