<screen><expansion> :== "[" <entry> { "," <entry> } "]"
<entry> :== <numeric range> | <non-numeric item>
<numeric range> :== <number> [ "-" <number> [ "/" <number> ] ]</screen>
- <para>The expansion is a list enclosed in square brackets. Numeric items in the list may be a single number, a contiguous range of numbers, or a strided range of numbers. For example, <literal>routes="elan 192.168.1.[22-24]@tcp"</literal> says that network <literal>elan0</literal> is adjacent (hopcount defaults to 1); and is accessible via 3 routers on the <literal>tcp0</literal> network (<literal>192.168.1.22@tcp</literal>, <literal>192.168.1.23@tcp</literal> and <literal>192.168.1.24@tcp</literal>).</para>
+ <para>The expansion is a list enclosed in square brackets. Numeric
+ items in the list may be a single number, a contiguous range of numbers,
+ or a strided range of numbers. For example, <literal>routes="elan
+ 192.168.1.[22-24]@tcp"</literal> says that network
+ <literal>elan0</literal> may be adjacent or may be reached through
+ further networks (the hopcount is undefined), and is accessible via 3 routers on the
+ <literal>tcp0</literal> network (<literal>192.168.1.22@tcp</literal>,
+ <literal>192.168.1.23@tcp</literal> and
+ <literal>192.168.1.24@tcp</literal>).</para>
<para><literal>routes="[tcp,o2ib] 2 [8-14/2]@elan"</literal>
says that 2 networks (<literal>tcp0</literal> and <literal>o2ib0</literal>) are accessible through 4 routers (<literal>8@elan</literal>, <literal>10@elan</literal>, <literal>12@elan</literal> and <literal>14@elan</literal>). The hopcount of 2 means that traffic to both of these networks will traverse 2 routers: first one of the routers specified in this entry, then one more.</para>
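The expansion grammar and the two examples above can be checked with a small Python model. This is a minimal sketch of the grammar as written in this section, not the parser LNet itself uses. The function names `expand` and `expand_entry` are hypothetical, and the sketch neither normalizes network names (for example, `elan` to `elan0`) nor validates entries.

```python
import itertools
import re
from typing import List

# One bracketed expansion, e.g. "[22-24]" or "[tcp,o2ib]".
_EXPANSION = re.compile(r"\[([^\]]*)\]")
# <numeric range> :== <number> [ "-" <number> [ "/" <number> ] ]
_NUMERIC_RANGE = re.compile(r"^(\d+)(?:-(\d+)(?:/(\d+))?)?$")


def expand_entry(entry: str) -> List[str]:
    """Expand one <entry>: a number, a range lo-hi, a strided range
    lo-hi/step, or a non-numeric item that is returned unchanged."""
    m = _NUMERIC_RANGE.match(entry)
    if m is None:
        return [entry]                      # <non-numeric item>, e.g. "tcp"
    lo = int(m.group(1))
    hi = int(m.group(2)) if m.group(2) else lo
    step = int(m.group(3)) if m.group(3) else 1
    return [str(n) for n in range(lo, hi + 1, step)]


def expand(pattern: str) -> List[str]:
    """Expand every [...] in pattern and return all combinations in order."""
    # With one capture group, split() alternates literal text (even indices)
    # and the contents of each bracket (odd indices).
    pieces = _EXPANSION.split(pattern)
    choices = []
    for i, piece in enumerate(pieces):
        if i % 2 == 0:
            choices.append([piece])
        else:
            entries = [e.strip() for e in piece.split(",")]
            choices.append([x for e in entries for x in expand_entry(e)])
    return ["".join(combo) for combo in itertools.product(*choices)]


print(expand("192.168.1.[22-24]@tcp"))
# ['192.168.1.22@tcp', '192.168.1.23@tcp', '192.168.1.24@tcp']
print(expand("[8-14/2]@elan"))
# ['8@elan', '10@elan', '12@elan', '14@elan']
print(expand("[tcp,o2ib]"))
# ['tcp', 'o2ib']
```

A pattern with several bracketed expansions yields their Cartesian product, which is why `itertools.product` is used rather than expanding each bracket separately.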
<para>Duplicate entries, entries that route to a local network, and entries that specify routers on a non-local network are ignored.</para>
- <para>Prior to release 2.5, a conflict between equivalent entries was resolved in favor of the route with the shorter hopcount. The hopcount, if omitted, defaults to 1 (the remote network is adjacent)..</para>
- <para condition='l25'>Since 2.5, equivalent entries are resolved in favor of the route with the lowest priority number or shorter hopcount if the priorities are equal. The priority, if omitted, defaults to 0. The hopcount, if omitted, defaults to 1 (the remote network is adjacent).</para>
- <para>It is an error to specify routes to the same destination with routers on different local networks.</para>
- <para>If the target network string contains no expansions, then the hopcount defaults to 1 and may be omitted (that is, the remote network is adjacent). In practice, this is true for most multi-network configurations. It is an error to specify an inconsistent hop count for a given target network. This is why an explicit hopcount is required if the target network string specifies more than one network.</para>
+ <para>Prior to release 2.5, a conflict between equivalent entries was
+ resolved in favor of the route with the shorter hopcount. The hopcount,
+ if omitted, is undefined, but it is treated as 1 when compared with
+ other routes during route selection (as if the remote network were
+ adjacent).</para>
+ <para condition='l25'>Since 2.5, equivalent entries are resolved in
+ favor of the route with the lowest priority number or, if the priorities
+ are equal, the shorter hopcount. The priority, if omitted, defaults to 0.
+ The hopcount, if omitted, is undefined, but it is treated as 1 when
+ compared with other routes during route selection (as if the remote
+ network were adjacent).</para>
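The 2.5 selection rule can be expressed as a sort key. The following Python sketch is a hypothetical model (the `Route` class, the `selection_key` and `preferred` functions, and the example NIDs are not part of LNet) that ranks routes by priority and then by hop count, treating an undefined hop count as 1. On a tie it simply returns the first route; this section does not describe how LNet uses several equally ranked routes.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Route:
    gateway: str                 # hypothetical gateway NID
    priority: int = 0            # an omitted priority defaults to 0
    hops: Optional[int] = None   # an omitted hop count is undefined (None)


def selection_key(route: Route) -> Tuple[int, int]:
    """Rank a route: lower priority number first, then shorter hop count.

    An undefined hop count is compared as if it were 1.
    """
    effective_hops = route.hops if route.hops is not None else 1
    return (route.priority, effective_hops)


def preferred(routes: List[Route]) -> Route:
    """Return the winning route under the 2.5+ rules (the first one on a tie)."""
    return min(routes, key=selection_key)


# An undefined hop count (treated as 1) beats an explicit hop count of 2.
print(preferred([Route("10.0.0.1@tcp", hops=2),
                 Route("10.0.0.2@tcp")]).gateway)          # 10.0.0.2@tcp
# Priority is compared before the hop count.
print(preferred([Route("10.0.0.3@tcp", priority=1, hops=1),
                 Route("10.0.0.4@tcp", hops=3)]).gateway)  # 10.0.0.4@tcp
```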
+ <para>It is an error to specify routes to the same destination with
+ routers on different local networks.</para>
+ <para>If a route string contains no hop count, the hop count is
+ undefined. If the remote network is adjacent and
+ <literal>avoid_asym_router_failure</literal> is enabled, explicitly
+ setting the hop count to 1 is recommended so that the feature operates
+ correctly.</para>
</section>
<section remap="h4">
<title><indexterm><primary>configuring</primary>
networks. The peers using the gateway can reach it on one or
more of its interfaces. Multi-Rail routing takes care of managing which
interface to use.</para>
- <screen>lnetctl route add --net <remote network> --gateway <NID for the gateway>
- --hops <number of hops> --priority <route priority></screen>
+ <screen>lnetctl route add --net <remote network> \
+   --gateway <NID for the gateway> \
+   --hop <number of hops> --priority <route priority></screen>
</section>
<section xml:id="mrrouting.health_config.modparams">
<title>Configuring Module Parameters</title>
</entry>
<entry>
<para>Defaults to <literal>1</literal>. If set to
- <literal>1</literal> a route will be considered up if and only
- if there exists at least one healthy interface on the local and
- remote interfaces of the gateway.</para>
+ <literal>1</literal>, single-hop routes must meet an additional
+ requirement to be considered up: the gateway of the route must have
+ at least one healthy network interface connected directly to the
+ remote network of the route. In this context, single-hop routes are
+ routes that were explicitly created with <literal>hop=1</literal>,
+ or routes that LNet can infer have only one hop. For all other
+ routes, this parameter has no effect.</para>
</entry>
</row>
<row>
<orderedlist>
<listitem><para>The gateway can be reached on the local net via at least
one path.</para></listitem>
- <listitem><para>If <literal>avoid_asym_router_failure</literal> is
+ <listitem><para>For a single-hop route, if
+ <literal>avoid_asym_router_failure</literal> is
enabled, the gateway must have at least one healthy interface on the
remote network defined in the route.</para></listitem>
</orderedlist>
</listitem>
<listitem>
<para>
- <literal>avoid_asym_router_failure</literal>– When set to 1, the
- router checker running on the client or a server periodically pings
- all the routers corresponding to the NIDs identified in the routes
- parameter setting on the node to determine the status of each router
- interface. The default setting is 1. (For more information about the
- LNet routes parameter, see
+ <literal>avoid_asym_router_failure</literal>– When set to 1,
+ this parameter adds a requirement for a route to be considered up:
+ the gateway of the route must have at least one NI up on the remote
+ network of the route. This requirement applies only to single-hop
+ routes, that is, routes whose hop value is explicitly set to 1 or
+ that can be inferred to be single-hop. The default setting is
+ 1.</para>
+ <para>Single-hop routes are inferred as follows. If the router checker
+ is running on a node, the node periodically pings all of its gateways,
+ that is, the routers listed in the node's routes, which share a local
+ network with the node. Each gateway's response includes the status of
+ all of its network interfaces (NIs).
+ If a node <literal>A</literal> has a route <literal>R</literal>
+ through gateway <literal>B</literal> to network <literal>C</literal>,
+ and node <literal>A</literal> sees in a ping response from
+ <literal>B</literal> that router <literal>B</literal> has at least one
+ NI that connects directly to network <literal>C</literal>, then node
+ <literal>A</literal> infers that <literal>R</literal> is a single-hop
+ route.
+ This inference is independent of the hop value configured for the
+ route. For more information about the LNet routes parameter, see
<xref xmlns:xlink="http://www.w3.org/1999/xlink"
linkend="lnet_module_routes" /></para>
- <para>A router is considered down if any of its NIDs are down. For
- example, router X has three NIDs:
- <literal>Xnid1</literal>,
- <literal>Xnid2</literal>, and
- <literal>Xnid3</literal>. A client is connected to the router via
- <literal>Xnid1</literal>. The client has router checker enabled. The
- router checker periodically sends a ping to the router via
- <literal>Xnid1</literal>. The router responds to the ping with the
- status of each of its NIDs. In this case, it responds with
- <literal>Xnid1=up</literal>,
- <literal>Xnid2=up</literal>,
- <literal>Xnid3=down</literal>. If
- <literal>avoid_asym_router_failure==1</literal>, the router is
- considered down if any of its NIDs are down, so router X is
- considered down and will not be used for routing messages. If
- <literal>avoid_asym_router_failure==0</literal>, router X will
- continue to be used for routing messages.</para>
+ <para>When this feature is enabled, it is recommended to specify
+ <literal>hop=1</literal> when creating a single-hop route. Even when a
+ route truly has only one hop, setting <literal>hop=1</literal>
+ explicitly is still recommended, because the single-hop inference fails
+ if the gateway's NIs on the remote network never come up at all. For
+ the inference to work, those NIs must come up at least once; otherwise,
+ the gateway does not mention them in its ping responses, the node sees
+ no gateway NI on the remote network of the route, and it mistakenly
+ considers the route to be multi-hop. In that case,
+ <literal>avoid_asym_router_failure</literal> has no effect on the route
+ unless <literal>hop=1</literal> was set explicitly when the route was
+ created.</para>
+ <para>In the following examples, nodes running LNet are circles,
+ networks are squares, and NIs are lines labeled by their NIDs.
+ There is a client <literal>C</literal> and a router
+ <literal>X</literal>. <literal>C</literal>
+ has routes to networks <literal>o2ib0</literal> and
+ <literal>o2ib1</literal> with <literal>X</literal> as the gateway.
+ If an NI is red with its name crossed out, <literal>C</literal>
+ considers it to be down; otherwise, <literal>C</literal> considers it
+ to be up.</para>
+ <figure xml:id="avoid_asym_router_failure.fig.one_o2ib0_down">
+ <title>One of Two Connections to o2ib0 Down</title>
+ <mediaobject>
+ <imageobject>
+ <imagedata scalefit="1" width="45%"
+ fileref="figures/Tuning_one_o2ib0_down.png" />
+ </imageobject>
+ <textobject>
+ <phrase>One of Two Connections to o2ib0 Down</phrase>
+ </textobject>
+ </mediaobject>
+ </figure>
+ <para>In the above figure, one of two NIs that connect to
+ <literal>o2ib0</literal> is up,
+ so the route to <literal>o2ib0</literal> is considered up.</para>
+ <figure xml:id="avoid_asym_router_failure.fig.both_o2ib0_down">
+ <title>Both Connections to o2ib0 Down</title>
+ <mediaobject>
+ <imageobject>
+ <imagedata scalefit="1" width="45%"
+ fileref="figures/Tuning_both_o2ib0_down.png" />
+ </imageobject>
+ <textobject>
+ <phrase>Both Connections to o2ib0 Down</phrase>
+ </textobject>
+ </mediaobject>
+ </figure>
+ <para>In the above figure, neither of the NIs that connect to
+ <literal>o2ib0</literal> is up,
+ so the route to <literal>o2ib0</literal> is considered down.</para>
+ <figure xml:id="avoid_asym_router_failure.fig.o2ib1_down">
+ <title>Connection to o2ib1 Down</title>
+ <mediaobject>
+ <imageobject>
+ <imagedata scalefit="1" width="45%"
+ fileref="figures/Tuning_o2ib1_down.png" />
+ </imageobject>
+ <textobject>
+ <phrase>Connection to o2ib1 Down</phrase>
+ </textobject>
+ </mediaobject>
+ </figure>
+ <para>In the above figure, no NI that connects to
+ <literal>o2ib1</literal> is up,
+ so the route to <literal>o2ib1</literal> is considered down.</para>
+ <figure xml:id="avoid_asym_router_failure.fig.o2ib1_missing">
+ <title>Connection to o2ib1 Never Came Up</title>
+ <mediaobject>
+ <imageobject>
+ <imagedata scalefit="1" width="65%"
+ fileref="figures/Tuning_o2ib1_missing.png" />
+ </imageobject>
+ <textobject>
+ <phrase>Connection to o2ib1 Never Came Up</phrase>
+ </textobject>
+ </mediaobject>
+ </figure>
+ <para>Compare Figures 34.3 and 34.4. In 34.4,
+ <literal>X4@o2ib1</literal> never came up
+ (rather than coming up and then going down). Consequently,
+ <literal>X</literal> did not list <literal>X4@o2ib1</literal> in its
+ ping response, so <literal>C</literal> cannot infer that
+ <literal>X</literal> should be directly connected to
+ <literal>o2ib1</literal>. If <literal>C</literal> has a route to
+ <literal>o2ib1</literal> through <literal>X</literal>, and the hop
+ count is not set to 1 by the sysadmin, LNet assumes that
+ <literal>X</literal> has a route to <literal>o2ib1</literal> through
+ some remote router node, such as <literal>Y</literal>. The gray part of
+ Figure 34.4 shows the sort of configuration that LNet incorrectly
+ assumes in this situation. Therefore, <literal>C</literal> will try
+ to send messages for <literal>o2ib1</literal> through
+ <literal>X</literal>, where they will be dropped.
+ If the sysadmin explicitly sets <literal>hop=1</literal> for the route
+ to <literal>o2ib1</literal> (on <literal>C</literal>), LNet knows
+ that if <literal>X</literal> does not report an NI on
+ <literal>o2ib1</literal>, the route should be marked as down.
+ </para>
</listitem>
</itemizedlist></para>
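The route-up conditions and the single-hop inference described for <literal>avoid_asym_router_failure</literal> can be modeled in a few lines of Python. This is a simplified sketch, not LNet's implementation. NI health is reduced to "up"/"down" strings, and a gateway's ping response is modeled as a dictionary from NID to state that omits NIs which never came up. The local network tcp0 and every NID other than X4@o2ib1 are hypothetical stand-ins for the ones drawn in the figures, and the function names are likewise hypothetical.

```python
from typing import Dict, Optional


def nid_net(nid: str) -> str:
    """Return the network part of a NID, e.g. 'o2ib1' for 'X4@o2ib1'."""
    return nid.split("@", 1)[1]


def is_single_hop(remote_net: str, hops: Optional[int],
                  ping_status: Dict[str, str]) -> bool:
    """Single-hop if hop=1 was set explicitly, or if the gateway's ping
    response lists at least one NI directly on the remote network.
    NIs that never came up are absent from ping_status entirely."""
    if hops == 1:
        return True
    return any(nid_net(nid) == remote_net for nid in ping_status)


def route_is_up(remote_net: str, hops: Optional[int],
                gateway_reachable: bool, ping_status: Dict[str, str],
                avoid_asym_router_failure: bool = True) -> bool:
    # Condition 1: the gateway can be reached on the local net.
    if not gateway_reachable:
        return False
    # The extra requirement applies only when the parameter is enabled
    # and the route is single-hop (explicitly or by inference).
    if not avoid_asym_router_failure or not is_single_hop(
            remote_net, hops, ping_status):
        return True
    # Condition 2: at least one gateway NI on the remote network is up.
    return any(nid_net(nid) == remote_net and state == "up"
               for nid, state in ping_status.items())


# Hypothetical ping responses from router X, as seen by client C.
one_down = {"X1@tcp0": "up", "X2@o2ib0": "up", "X3@o2ib0": "down",
            "X4@o2ib1": "up"}
both_down = {**one_down, "X2@o2ib0": "down"}
o2ib1_down = {**one_down, "X4@o2ib1": "down"}
never_up = {"X1@tcp0": "up", "X2@o2ib0": "up", "X3@o2ib0": "down"}

print(route_is_up("o2ib0", None, True, one_down))    # True
print(route_is_up("o2ib0", None, True, both_down))   # False
print(route_is_up("o2ib1", None, True, o2ib1_down))  # False
print(route_is_up("o2ib1", None, True, never_up))    # True (wrongly up)
print(route_is_up("o2ib1", 1, True, never_up))       # False
```

The last two calls mirror Figure 34.4: without an explicit hop count, the route to o2ib1 is treated as multi-hop and stays up, while setting hop=1 lets the missing NI mark the route as down.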
<para>The following router checker parameters must be set to the maximum