LU-17700 lnet: properly calculate ping buffer size
author James Simmons <jsimmons@infradead.org>
Fri, 5 Apr 2024 00:01:32 +0000 (20:01 -0400)
committer Oleg Drokin <green@whamcloud.com>
Mon, 15 Apr 2024 16:51:37 +0000 (16:51 +0000)
Originally lnet_ping() sized the ping buffer using
lnet_ping_sts_size(). The limitation of that approach is that if
the NID passed into lnet_ping_sts_size() is a smaller NID, such as
IPv4, the buffer can be too small. Say n_ids is 4 and 3 of the
returned NIDs are IPv4 but one is IPv6; the reply can then overflow
the buffer. The solution is to allocate for the maximum possible
NID size. That can be done with LNET_ANY_NID, which fills in all
the fields. lnet_ping_sts_size() has to handle the size properly
when it is given LNET_ANY_NID. If struct lnet_nid ever increases
in size in the future, this code should still work.

Also cap the maximum size of the ping buffer to avoid o2iblnd
failures caused by using RDMA to send data, which doesn't support
large NIDs.

Fixes: d137e9823ca ("LU-10003 lnet: use Netlink to support LNet ping commands")
Test-Parameters: trivial testlist=sanity-lnet
Change-Id: I5b61add2b3701cad12074515f45773bbc9fbc583
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54673
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Stephane Thiell <sthiell@stanford.edu>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
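
The sizing problem described in the commit message can be sketched in isolation. The following is a minimal illustration, not the LNet code: SMALL_ENTRY and LARGE_ENTRY are hypothetical per-entry sizes chosen only to make the arithmetic visible, and the three helper functions are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-entry sizes, for illustration only; they are not
 * taken from the LNet structure definitions.
 */
enum { SMALL_ENTRY = 16, LARGE_ENTRY = 24 };

/* Bytes a reply needs for a mix of small and large entries. */
static size_t bytes_needed(int n_small, int n_large)
{
	assert(n_small >= 0 && n_large >= 0);
	return (size_t)n_small * SMALL_ENTRY + (size_t)n_large * LARGE_ENTRY;
}

/* Sizing from one sample entry size, as when a single NID decides it. */
static size_t bytes_by_sample(int n_ids, size_t sample_entry)
{
	assert(n_ids >= 0);
	return (size_t)n_ids * sample_entry;
}

/* Sizing from the largest possible entry; this cannot undersize. */
static size_t bytes_worst_case(int n_ids)
{
	assert(n_ids >= 0);
	return (size_t)n_ids * LARGE_ENTRY;
}
```

With n_ids = 4 and a small sample NID, bytes_by_sample(4, SMALL_ENTRY) reserves less than bytes_needed(3, 1) requires under these assumed sizes, while bytes_worst_case(4) always covers it.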
lnet/include/lnet/lib-types.h
lnet/lnet/api-ni.c

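diff --git a/lnet/include/lnet/lib-types.h b/lnet/include/lnet/lib-types.h
index a02eab2..fadaead 100644
--- a/lnet/include/lnet/lib-types.h
+++ b/lnet/include/lnet/lib-types.h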
@@ -1225,6 +1225,10 @@ lnet_ping_sts_size(const struct lnet_nid *nid)
 {
        int size;
 
+       /* for deciding the size of the ping buffer */
+       if (unlikely(LNET_NID_IS_ANY(nid)))
+               return sizeof(struct lnet_ni_large_status);
+
        if (nid_is_nid4(nid))
                return sizeof(struct lnet_ni_status);
 
diff --git a/lnet/lnet/api-ni.c b/lnet/lnet/api-ni.c
index 9cfe617..973b615 100644
--- a/lnet/lnet/api-ni.c
+++ b/lnet/lnet/api-ni.c
@@ -9054,6 +9054,9 @@ lnet_ping_event_handler(struct lnet_event *event)
                complete(&pd->completion);
 }
 
+/* Max buffer we allow to be sent. Larger values will cause IB failures */
+#define LNET_PING_BUFFER_MAX   3960
+
 static int lnet_ping(struct lnet_processid *id, struct lnet_nid *src_nid,
                     signed long timeout, struct lnet_genl_ping_list *plist,
                     int n_ids)
@@ -9085,7 +9088,11 @@ static int lnet_ping(struct lnet_processid *id, struct lnet_nid *src_nid,
        if (id->pid == LNET_PID_ANY)
                id->pid = LNET_PID_LUSTRE;
 
-       id_bytes += n_ids * sizeof(struct lnet_nid);
+       /* Allocate maximum possible NID size */
+       id_bytes += lnet_ping_sts_size(&LNET_ANY_NID) * n_ids;
+       if (id_bytes > LNET_PING_BUFFER_MAX)
+               id_bytes = LNET_PING_BUFFER_MAX;
+
        pbuf = lnet_ping_buffer_alloc(id_bytes, GFP_NOFS);
        if (!pbuf)
                return -ENOMEM;
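
A side effect of the clamp in the last hunk is that, for a large n_ids, the buffer holds fewer entries than were requested. The standalone sketch below mirrors that clamp outside the kernel to show the arithmetic. PING_BUFFER_MAX reuses the value 3960 from the patch; the helpers clamp_id_bytes() and entries_that_fit(), and the base and entry_size parameters, are hypothetical and stand in for values the hunk does not show.

```c
#include <stddef.h>

/* Same value as LNET_PING_BUFFER_MAX in the patch. */
#define PING_BUFFER_MAX 3960

/* Hypothetical helper: grow by the worst-case size per entry, then
 * cap the total, as the patched lnet_ping() does for id_bytes.
 * 'base' stands for whatever id_bytes held before the addition.
 */
static size_t clamp_id_bytes(size_t base, size_t entry_size, int n_ids)
{
	size_t id_bytes;

	if (n_ids < 0)
		n_ids = 0;

	id_bytes = base + entry_size * (size_t)n_ids;
	if (id_bytes > PING_BUFFER_MAX)
		id_bytes = PING_BUFFER_MAX;

	return id_bytes;
}

/* Hypothetical helper: whole entries that fit after 'base' bytes. */
static size_t entries_that_fit(size_t id_bytes, size_t base,
			       size_t entry_size)
{
	if (entry_size == 0 || id_bytes < base)
		return 0;

	return (id_bytes - base) / entry_size;
}
```

Assuming, purely for illustration, base = 16 and entry_size = 24, a request for 1000 entries is capped at 3960 bytes, which leaves room for 164 whole entries.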