LU-13035 lnet: fix remote peer ni selection
author Amir Shehata <ashehata@whamcloud.com>
Fri, 29 Nov 2019 05:22:30 +0000 (21:22 -0800)
committer Oleg Drokin <green@whamcloud.com>
Tue, 24 Mar 2020 05:17:02 +0000 (05:17 +0000)
When selecting the NI of a remote peer, we can only round robin
over the NIs, because remote peer NIs have no credits and no health.
However, we'd like to still spread the traffic over the remote
peer NIs as much as we can.

The sequence number of the selected remote peer NI was not being
incremented, so the round robin algorithm wasn't working.
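
To illustrate why a missing increment defeats the round robin, here is a
minimal standalone sketch. It assumes the selection picks the NI with the
lowest sequence number; struct peer_ni, select_ni() and
select_and_advance() are hypothetical names for illustration only and are
not the LNet code.

```c
#include <stddef.h>

/* Hypothetical stand-in for a remote peer NI (not struct lnet_peer_ni). */
struct peer_ni {
	int id;
	unsigned int seq;	/* bumped each time this NI is selected */
};

/*
 * Assumed selection rule: return the NI with the lowest sequence
 * number; the first one wins on ties. Returns NULL if count is 0.
 */
static struct peer_ni *select_ni(struct peer_ni *nis, size_t count)
{
	struct peer_ni *best = NULL;
	size_t i;

	for (i = 0; i < count; i++) {
		if (!best || nis[i].seq < best->seq)
			best = &nis[i];
	}
	return best;
}

/* Select an NI and increment its sequence number so the next call moves on. */
static struct peer_ni *select_and_advance(struct peer_ni *nis, size_t count)
{
	struct peer_ni *best = select_ni(nis, count);

	if (best)
		best->seq++;
	return best;
}
```

Calling select_ni() repeatedly without the increment keeps returning the
same NI, which matches the symptom described above; select_and_advance()
cycles through all of them.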

Test-parameters: trivial

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I83dba3597987844b7bddc980c53e8c6d90bcd3b2
Reviewed-on: https://review.whamcloud.com/36915
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
lnet/lnet/lib-move.c

index d74cfcc..4c6c382 100644
@@ -2061,9 +2061,24 @@ lnet_handle_find_routed_path(struct lnet_send_data *sd,
                        return -EHOSTUNREACH;
                }
 
+               /*
+                * We're attempting to round robin over the remote peer
+                * NI's so update the final destination we selected
+                */
+               sd->sd_final_dst_lpni = sd->sd_best_lpni;
+
+               /*
+                * find the best route. Restrict the selection on the net of the
+                * local NI if we've already picked the local NI to send from.
+                * Otherwise, let's pick any route we can find and then find
+                * a local NI we can reach the route's gateway on. Any route we select
+                * will be reachable by virtue of the restriction we have when
+                * adding a route.
+                */
                best_route = lnet_find_route_locked(best_rnet,
                                                    LNET_NIDNET(src_nid),
                                                    &last_route, &gwni);
+
                if (!best_route) {
                        CERROR("no route to %s from %s\n",
                               libcfs_nid2str(dst_nid),
@@ -2082,6 +2097,12 @@ lnet_handle_find_routed_path(struct lnet_send_data *sd,
                LASSERT(gw == gwni->lpni_peer_net->lpn_peer);
                local_lnet = best_route->lr_lnet;
 
+               /*
+                * Increment the sequence number of the remote lpni so we
+                * can round robin over the different interfaces of the
+                * remote lpni
+                */
+               sd->sd_best_lpni->lpni_seq++;
        }
 
        /*