LU-15550 ptlrpc: retry mechanism for overflowed batched RPCs 40/46540/9
author Qian Yingjin <qian@ddn.com>
Thu, 17 Feb 2022 03:42:14 +0000 (22:42 -0500)
committer Oleg Drokin <green@whamcloud.com>
Mon, 1 May 2023 04:08:07 +0000 (04:08 +0000)
commit 668f48f87bec3999892ce1daad24b6dba9ae362b
tree cdb5cd7ddfb4f58b6a0434aaf68b3cf1049096aa
parent 7278b74abf2eb2233262973cea07b90c8c98537f
LU-15550 ptlrpc: retry mechanism for overflowed batched RPCs

Before sending a batched RPC, the client does not know the actual
reply buffer size. The reply buffer prepared by the client may be
smaller than the reply buffer size that is needed. An earlier patch
already grows the reply buffer properly in most cases.

However, when the reply buffer size grows beyond BUT_MAXREPSIZE
(1000 * 1024), the server returns the -EOVERFLOW error code. At
that point the server has executed only part of the sub requests
in the batched RPC. The overflowed sub requests have not been
handled.

This patch adds a retry mechanism for overflowed batched RPCs.
When the client finds that the reply buffer overflowed, it
rebuilds the batched RPC from the unhandled sub requests and uses
the work queue mechanism to resend the new batched RPC to the
server so that they are executed.
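
The retry flow described above can be sketched roughly as follows.
This is a simplified, self-contained illustration and not the code
in lustre/ptlrpc/batch.c: every sketch_* name and the struct are
hypothetical, the server is simulated in-process, and a plain loop
stands in for the work queue that the patch uses to resend. Only
the cap value (1000 * 1024) and the -EOVERFLOW return code come
from the description above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Cap quoted in the commit message for BUT_MAXREPSIZE. */
#define SKETCH_MAXREPSIZE (1000 * 1024)

/* Hypothetical sub request: only the size of its reply matters here. */
struct sketch_subreq {
	size_t reply_len;
	int handled;
};

/*
 * Hypothetical server side: execute sub requests in order until the
 * next reply would push the packed reply past the cap. Returns 0 if
 * all were handled, -EOVERFLOW otherwise. *done is set to the number
 * of sub requests that were handled.
 */
static int sketch_server_handle(struct sketch_subreq *reqs, size_t count,
				size_t *done)
{
	size_t used = 0, i;

	assert(done != NULL);
	for (i = 0; i < count; i++) {
		if (used + reqs[i].reply_len > SKETCH_MAXREPSIZE)
			break;
		used += reqs[i].reply_len;
		reqs[i].handled = 1;
	}
	*done = i;
	return i == count ? 0 : -EOVERFLOW;
}

/*
 * Hypothetical client side: on -EOVERFLOW, rebuild the batch from the
 * unhandled tail and resend it. Returns the number of RPCs sent, or
 * -EOVERFLOW if a single reply can never fit under the cap.
 */
static int sketch_client_send(struct sketch_subreq *reqs, size_t count)
{
	size_t off = 0;
	int rpcs = 0;

	while (off < count) {
		size_t done = 0;
		int rc = sketch_server_handle(reqs + off, count - off, &done);

		rpcs++;
		if (rc == -EOVERFLOW && done == 0)
			return -EOVERFLOW; /* no progress: give up */
		off += done; /* the next batch starts at the unhandled tail */
	}
	return rpcs;
}
```

In this sketch, five sub requests with 400 KiB replies each are
completed in three RPCs, because only two such replies fit under
the cap per RPC. The no-progress check is an assumption of the
sketch to keep the loop from spinning on a reply that can never fit.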

Add the sanity test case test_123f to verify the retry mechanism
for large LOV stripes with overstriping.

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: If84fad32f2026bd34ffb47b3e163f84a9d950dbb
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/46540
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
lustre/ptlrpc/batch.c
lustre/tests/sanity.sh