From: Serguei Smirnov
Date: Mon, 16 Aug 2021 23:37:30 +0000 (-0700)
Subject: LU-14945 lnet: don't use hops to determine the route state
X-Git-Tag: 2.14.55~15
X-Git-Url: https://git.whamcloud.com/?p=fs%2Flustre-release.git;a=commitdiff_plain;h=3f2844dc9333c86452c37bd7b4519729b1351371

LU-14945 lnet: don't use hops to determine the route state

NodeA <-tcp1-> GW1 <-tcp2-> GW2 <-tcp3-> NodeB

Assuming GW1 knows how to reach tcp3 network and GW2 knows
how to reach tcp1 network, it should be possible to add routes
without specifying hop=2 on nodes A and B to reach tcp3 and
tcp1 respectively and then be able to lnetctl ping between them.

Changes introduced by LU-13785 interpret default hops
to be equivalent to hop=1 set explicitly for the purpose of
determining route aliveness, which results in the routes created
as described above to be considered "down".

Fix it so that default hop setting doesn't prevent the multi-hop
scenario from working.

Test-Parameters: trivial
Fixes: 2e07619477 ("LU-13785 lnet: Use lr_hops for avoid_asym_router_failure")
Signed-off-by: Serguei Smirnov
Change-Id: I341ccdfe156434b0cb306359acc91a9193b44f7b
Reviewed-on: https://review.whamcloud.com/44674
Tested-by: jenkins
Tested-by: Maloo
Reviewed-by: Amir Shehata
Reviewed-by: Chris Horn
Reviewed-by: Oleg Drokin
---

diff --git a/lnet/lnet/router.c b/lnet/lnet/router.c
index c297959..f46627e 100644
--- a/lnet/lnet/router.c
+++ b/lnet/lnet/router.c
@@ -329,7 +329,7 @@ bool lnet_is_route_alive(struct lnet_route *route)
 	 * routes the next-hop will not have the remote net.
 	 */
 	if (avoid_asym_router_failure &&
-	    (route->lr_hops == 1 || route->lr_hops == LNET_UNDEFINED_HOPS)) {
+	    (route->lr_hops == 1 || route->lr_single_hop)) {
 		rlpn = lnet_peer_get_net_locked(gw, route->lr_net);
 		if (!rlpn)
 			return false;
@@ -483,8 +483,7 @@ lnet_router_discovery_ping_reply(struct lnet_peer *lp)
 
 		route->lr_single_hop = single_hop;
 		if (avoid_asym_router_failure &&
-		    (route->lr_hops == 1 ||
-		     route->lr_hops == LNET_UNDEFINED_HOPS))
+		    (route->lr_hops == 1 || route->lr_single_hop))
 			lnet_set_route_aliveness(route, net_up);
 		else
 			lnet_set_route_aliveness(route, true);
@@ -790,6 +789,14 @@ lnet_add_route(__u32 net, __u32 hops, struct lnet_nid *gateway,
 	lnet_peer_ni_decref_locked(lpni);
 	lnet_net_unlock(LNET_LOCK_EX);
 
+	/* If avoid_asym_router_failure is enabled and hop count is not
+	 * set to 1 for a route that is actually single-hop, then the
+	 * feature will fail to prevent the router from being selected
+	 * if it is missing a NI on the remote network due to misconfiguration.
+	 */
+	if (avoid_asym_router_failure && hops == LNET_UNDEFINED_HOPS)
+		CWARN("Use hops = 1 for a single-hop route when avoid_asym_router_failure feature is enabled");
+
 	rc = 0;
 
 	if (!add_route) {