1 <?xml version='1.0' encoding='UTF-8'?>
2 <chapter xmlns="http://docbook.org/ns/docbook"
3 xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US"
4 xml:id="lnetmr" condition='l2A'>
5 <title xml:id="lnetmr.title">LNet Software Multi-Rail</title>
6 <para>This chapter describes LNet Software Multi-Rail configuration and
10 <para><xref linkend="dbdoclet.mroverview"/></para>
11 <para><xref linkend="dbdoclet.mrconfiguring"/></para>
12 <para><xref linkend="dbdoclet.mrrouting"/></para>
13 <para><xref linkend="mrrouting.health"/></para>
14 <para><xref linkend="dbdoclet.mrhealth"/></para>
17 <section xml:id="dbdoclet.mroverview">
18 <title><indexterm><primary>MR</primary><secondary>overview</secondary>
19 </indexterm>Multi-Rail Overview</title>
<para>In computer networking, multi-rail is an arrangement in which two or
more network interfaces to a single network on a computer node are employed
to achieve increased throughput. Multi-rail can also describe a node that has
one or more interfaces to multiple, even different kinds of, networks, such
as Ethernet, InfiniBand, and Intel® Omni-Path. For Lustre clients,
multi-rail generally presents the combined network capabilities as a single
LNet network. Peer nodes that are multi-rail capable are established during
configuration, as are user-defined interface-selection policies.</para>
28 <para>The following link contains a detailed high-level design for the
30 <link xl:href="http://wiki.lustre.org/images/b/bb/Multi-Rail_High-Level_Design_20150119.pdf">
31 Multi-Rail High-Level Design</link></para>
33 <section xml:id="dbdoclet.mrconfiguring">
34 <title><indexterm><primary>MR</primary><secondary>configuring</secondary>
35 </indexterm>Configuring Multi-Rail</title>
36 <para>Every node using multi-rail networking needs to be properly
37 configured. Multi-rail uses <literal>lnetctl</literal> and the LNet
38 Configuration Library for configuration. Configuring multi-rail for a
39 given node involves two tasks:</para>
41 <listitem><para>Configuring multiple network interfaces present on the
42 local node.</para></listitem>
<listitem><para>Adding remote peers that are multi-rail capable (that is,
peers connected to one or more common networks with at least two interfaces).
47 <para>This section is a supplement to
48 <xref linkend="lnet_config.lnetaddshowdelete" /> and contains further
49 examples for Multi-Rail configurations.</para>
50 <para>For information on the dynamic peer discovery feature added in
51 Lustre Release 2.11.0, see
52 <xref linkend="lnet_config.dynamic_discovery" />.</para>
53 <section xml:id="dbdoclet.addinterfaces">
54 <title><indexterm><primary>MR</primary>
55 <secondary>multipleinterfaces</secondary>
56 </indexterm>Configure Multiple Interfaces on the Local Node</title>
<para>Example <literal>lnetctl net add</literal> command with multiple
interfaces in a Multi-Rail configuration:</para>
59 <screen>lnetctl net add --net tcp --if eth0,eth1</screen>
60 <para>Example of YAML net show:</para>
61 <screen>lnetctl net show -v
74 peer_buffer_credits: 0
82 - nid: 192.168.122.10@tcp
93 peer_buffer_credits: 0
99 - nid: 192.168.122.11@tcp
110 peer_buffer_credits: 0
117 <section xml:id="dbdoclet.deleteinterfaces">
118 <title><indexterm><primary>MR</primary>
119 <secondary>deleteinterfaces</secondary>
120 </indexterm>Deleting Network Interfaces</title>
121 <para>Example delete with <literal>lnetctl net del</literal>:</para>
<para>Assuming the network configuration shown by
<literal>lnetctl net show -v</literal> in the previous section, we can
delete a network interface with the following command:</para>
125 <screen>lnetctl net del --net tcp --if eth0</screen>
126 <para>The resultant net information would look like:</para>
127 <screen>lnetctl net show -v
140 peer_buffer_credits: 0
145 CPT: "[0,1,2,3]"</screen>
146 <para>The syntax of a YAML file to perform a delete would be:</para>
147 <screen>- net type: tcp
149 - nid: 192.168.122.10@tcp
153 <section xml:id="dbdoclet.addremotepeers">
154 <title><indexterm><primary>MR</primary>
155 <secondary>addremotepeers</secondary>
156 </indexterm>Adding Remote Peers that are Multi-Rail Capable</title>
157 <para>The following example <literal>lnetctl peer add</literal>
158 command adds a peer with 2 nids, with
159 <literal>192.168.122.30@tcp</literal> being the primary nid:</para>
160 <screen>lnetctl peer add --prim_nid 192.168.122.30@tcp --nid 192.168.122.30@tcp,192.168.122.31@tcp
162 <para>The resulting <literal>lnetctl peer show</literal> would be:
163 <screen>lnetctl peer show -v
165 - primary nid: 192.168.122.30@tcp
168 - nid: 192.168.122.30@tcp
171 available_tx_credits: 8
174 available_rtr_credits: 8
181 - nid: 192.168.122.31@tcp
184 available_tx_credits: 8
187 available_rtr_credits: 8
193 drop_count: 0</screen>
195 <para>The following is an example YAML file for adding a peer:</para>
198 - primary nid: 192.168.122.30@tcp
201 - nid: 192.168.122.31@tcp</screen>
203 <section xml:id="dbdoclet.deleteremotepeers">
204 <title><indexterm><primary>MR</primary>
205 <secondary>deleteremotepeers</secondary>
206 </indexterm>Deleting Remote Peers</title>
207 <para>Example of deleting a single nid of a peer (192.168.122.31@tcp):
209 <screen>lnetctl peer del --prim_nid 192.168.122.30@tcp --nid 192.168.122.31@tcp</screen>
210 <para>Example of deleting the entire peer:</para>
211 <screen>lnetctl peer del --prim_nid 192.168.122.30@tcp</screen>
212 <para>Example of deleting a peer via YAML:</para>
213 <screen>Assuming the following peer configuration:
215 - primary nid: 192.168.122.30@tcp
218 - nid: 192.168.122.30@tcp
220 - nid: 192.168.122.31@tcp
222 - nid: 192.168.122.32@tcp
225 You can delete 192.168.122.32@tcp as follows:
229 - primary nid: 192.168.122.30@tcp
232 - nid: 192.168.122.32@tcp
234 % lnetctl import --del < delPeer.yaml</screen>
237 <section xml:id="dbdoclet.mrrouting">
238 <title><indexterm><primary>MR</primary>
239 <secondary>mrrouting</secondary>
240 </indexterm>Notes on routing with Multi-Rail</title>
<para>This section details how to configure Multi-Rail with the routing
feature as it worked before the <xref linkend="mrrouting.health" /> feature
landed in Lustre 2.13. Routing code has always monitored the state of each
route in order to avoid using unavailable routes.</para>
245 <para>This section describes how you can configure multiple interfaces on
246 the same gateway node but as different routes. This uses the existing route
247 monitoring algorithm to guard against interfaces going down. With the
248 <xref linkend="mrrouting.health" /> feature introduced in Lustre 2.13, the
249 new algorithm uses the <xref linkend="dbdoclet.mrhealth" /> feature to
250 monitor the different interfaces of the gateway and always ensures that the
251 healthiest interface is used. Therefore, the configuration described in this
section applies to releases prior to Lustre 2.13. It still works in
2.13, but it is not required there for the reason mentioned above.
255 <section xml:id="dbdoclet.mrroutingex">
256 <title><indexterm><primary>MR</primary>
257 <secondary>mrrouting</secondary>
258 <tertiary>routingex</tertiary>
259 </indexterm>Multi-Rail Cluster Example</title>
<para>The example below outlines a simple system where all the Lustre
nodes are MR capable. Each node in the cluster has two interfaces.</para>
262 <figure xml:id="lnetmultirail.fig.routingdiagram">
263 <title>Routing Configuration with Multi-Rail</title>
266 <imagedata scalefit="1" width="100%"
267 fileref="./figures/MR_RoutingConfig.png" />
270 <phrase>Routing Configuration with Multi-Rail</phrase>
274 <para>The routers can aggregate the interfaces on each side of the network
275 by configuring them on the appropriate network.</para>
276 <para>An example configuration:</para>
278 lnetctl net add --net o2ib0 --if ib0,ib1
279 lnetctl net add --net o2ib1 --if ib2,ib3
280 lnetctl peer add --nid <peer1-nidA>@o2ib,<peer1-nidB>@o2ib,...
lnetctl peer add --nid <peer2-nidA>@o2ib1,<peer2-nidB>@o2ib1,...
282 lnetctl set routing 1
285 lnetctl net add --net o2ib0 --if ib0,ib1
286 lnetctl route add --net o2ib1 --gateway <rtrX-nidA>@o2ib
287 lnetctl peer add --nid <rtrX-nidA>@o2ib,<rtrX-nidB>@o2ib
290 lnetctl net add --net o2ib1 --if ib0,ib1
291 lnetctl route add --net o2ib0 --gateway <rtrX-nidA>@o2ib1
292 lnetctl peer add --nid <rtrX-nidA>@o2ib1,<rtrX-nidB>@o2ib1</screen>
<para>In the above configuration, the clients and the servers are
configured with only one route entry per router. This works because the
routers are MR capable. Because the routers are added as peers with multiple
interfaces on the clients and the servers, the MR algorithm ensures that
both interfaces of a router are used when sending to it.
<para>However, as of the Lustre 2.10 release, LNet Resiliency is still
under development, and a single interface failure will still cause the
entire router to go down.</para>
303 <section xml:id="dbdoclet.mrroutingresiliency">
304 <title><indexterm><primary>MR</primary>
305 <secondary>mrrouting</secondary>
306 <tertiary>routingresiliency</tertiary>
307 </indexterm>Utilizing Router Resiliency</title>
<para>Currently, LNet provides a mechanism to monitor each route entry.
LNet pings each gateway identified in the route entry at a regular,
configurable interval to ensure that it is alive. If sending over a
specific route fails or if the router pinger determines that the gateway
is down, then the route is marked as down and is not used. It is
subsequently pinged at regular, configurable intervals to determine when
it becomes alive again.</para>
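<para>The monitoring described above can be sketched as a small state model.
This is a minimal, hypothetical Python illustration, not LNet code: the
<literal>Route</literal> class, the helper functions and the example NIDs are
invented, and a single boolean stands in for the real per-route state. It
shows the two ways a route goes down (a failed send or a failed ping) and how
a later successful ping brings it back.</para>

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Route:
    """Hypothetical model of one route entry: remote net plus gateway NID."""
    net: str
    gateway: str
    alive: bool = True


def record_send_result(route: Route, ok: bool) -> None:
    # A failed send over the route marks it down; it stays unused
    # until a later ping finds the gateway alive again.
    if not ok:
        route.alive = False


def ping_round(routes: List[Route], gateway_alive: Callable[[str], bool]) -> None:
    # One pass of the router pinger, run every configurable interval:
    # each route takes on the result of pinging its gateway.
    for route in routes:
        route.alive = gateway_alive(route.gateway)


def usable_routes(routes: List[Route], net: str) -> List[Route]:
    # Only routes currently marked alive are candidates for sending to `net`.
    return [r for r in routes if r.net == net and r.alive]


if __name__ == "__main__":
    # Two routes to o2ib1, one per interface of the same router.
    routes = [Route("o2ib1", "10.0.0.1@o2ib"), Route("o2ib1", "10.0.0.2@o2ib")]
    ping_round(routes, lambda gw: gw != "10.0.0.1@o2ib")  # first interface down
    print([r.gateway for r in usable_routes(routes, "o2ib1")])
```

<para>Because each route names a single gateway NID, losing one router
interface in this model only takes down the route that uses it, which is the
property the configuration that follows relies on.</para>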
315 <para>This mechanism can be combined with the MR feature in Lustre 2.10 to
316 add this router resiliency feature to the configuration.</para>
318 lnetctl net add --net o2ib0 --if ib0,ib1
319 lnetctl net add --net o2ib1 --if ib2,ib3
320 lnetctl peer add --nid <peer1-nidA>@o2ib,<peer1-nidB>@o2ib,...
321 lnetctl peer add --nid <peer2-nidA>@o2ib1,<peer2-nidB>@o2ib1,...
322 lnetctl set routing 1
325 lnetctl net add --net o2ib0 --if ib0,ib1
326 lnetctl route add --net o2ib1 --gateway <rtrX-nidA>@o2ib
327 lnetctl route add --net o2ib1 --gateway <rtrX-nidB>@o2ib
330 lnetctl net add --net o2ib1 --if ib0,ib1
331 lnetctl route add --net o2ib0 --gateway <rtrX-nidA>@o2ib1
332 lnetctl route add --net o2ib0 --gateway <rtrX-nidB>@o2ib1</screen>
333 <para>There are a few things to note in the above configuration:</para>
<para>The clients and the servers are now configured with two
routes; each route's gateway is one of the interfaces of the
router. The clients and servers will view each interface of the
same router as a separate gateway and will monitor them as
described above.</para>
<para>The clients and the servers are not configured to view the
routers as MR capable. This is important because we want to deal
with each interface as a separate peer and not as different
interfaces of the same peer.</para>
<para>The routers are configured to view the peers as MR capable.
This is an oddity in the configuration, but is currently required
in order to allow the routers to balance the traffic load evenly
across their interfaces.</para>
356 <section xml:id="dbdoclet.mrroutingmixed">
357 <title><indexterm><primary>MR</primary>
358 <secondary>mrrouting</secondary>
359 <tertiary>routingmixed</tertiary>
360 </indexterm>Mixed Multi-Rail/Non-Multi-Rail Cluster</title>
<para>The above principles can be applied to a mixed MR/non-MR cluster.
For example, the same configuration shown above can be applied if the
clients and the servers are non-MR while the routers are MR capable.
This appears to be a common cluster upgrade scenario.</para>
367 <section xml:id="mrrouting.health" condition="l2D">
368 <title><indexterm><primary>MR</primary>
369 <secondary>mrroutinghealth</secondary>
370 </indexterm>Multi-Rail Routing with LNet Health</title>
371 <para>This section details how routing and pertinent module parameters can
372 be configured beginning with Lustre 2.13.</para>
373 <para>Multi-Rail with Dynamic Discovery allows LNet to discover and use all
configured interfaces of a node. It references a node via its primary NID.
375 Multi-Rail routing carries forward this concept to the routing
376 infrastructure. The following changes are brought in with the Lustre 2.13
379 <listitem><para>Configuring a different route per gateway interface is no
380 longer needed. One route per gateway should be configured. Gateway
381 interfaces are used according to the Multi-Rail selection criteria.</para>
383 <listitem><para>Routing now relies on <xref linkend="dbdoclet.mrhealth" />
384 to keep track of the route aliveness.</para></listitem>
385 <listitem><para>Router interfaces are monitored via LNet Health.
If an interface fails, other interfaces will be used.</para></listitem>
<listitem><para>Routing uses LNet discovery to discover gateways at
regular intervals.</para></listitem>
389 <listitem><para>A gateway pushes its list of interfaces upon the discovery
390 of any changes in its interfaces' state.</para></listitem>
392 <section xml:id="mrrouting.health_config">
393 <title><indexterm><primary>MR</primary>
394 <secondary>mrrouting</secondary>
395 <tertiary>routinghealth_config</tertiary>
396 </indexterm>Configuration</title>
397 <section xml:id="mrrouting.health_config.routes">
398 <title>Configuring Routes</title>
399 <para>A gateway can have multiple interfaces on the same or different
400 networks. The peers using the gateway can reach it on one or
401 more of its interfaces. Multi-Rail routing takes care of managing which
402 interface to use.</para>
403 <screen>lnetctl route add --net <remote network> --gateway <NID for the gateway>
404 --hops <number of hops> --priority <route priority></screen>
406 <section xml:id="mrrouting.health_config.modparams">
407 <title>Configuring Module Parameters</title>
408 <table frame="all" xml:id="mrrouting.health_config.tab1">
409 <title>Configuring Module Parameters</title>
411 <colspec colname="c1" colwidth="1*" />
412 <colspec colname="c2" colwidth="2*" />
417 <emphasis role="bold">Module Parameter</emphasis>
422 <emphasis role="bold">Usage</emphasis>
430 <para><literal>check_routers_before_use</literal></para>
433 <para>Defaults to <literal>0</literal>. If set to
434 <literal>1</literal> all routers must be up before the system
440 <para><literal>avoid_asym_router_failure</literal></para>
<para>Defaults to <literal>1</literal>. If set to
<literal>1</literal>, a route will be considered up if and only
if the gateway has at least one healthy interface on both the local
and the remote network.</para>
451 <para><literal>alive_router_check_interval</literal></para>
454 <para>Defaults to <literal>60</literal> seconds. The gateways
will be discovered every
456 <literal>alive_router_check_interval</literal>. If the gateway
457 can be reached on multiple networks, the interval per network is
458 <literal>alive_router_check_interval</literal> / number of
464 <para><literal>router_ping_timeout</literal></para>
467 <para>Defaults to <literal>50</literal> seconds. A gateway sets
468 its interface down if it has not received any traffic for
469 <literal>router_ping_timeout + alive_router_check_interval
476 <para><literal>router_sensitivity_percentage</literal></para>
479 <para>Defaults to <literal>100</literal>. This parameter defines
480 how sensitive a gateway interface is to failure. If set to 100
481 then any gateway interface failure will contribute to all routes
using it going down. The lower the value, the more tolerant of
failures the system becomes.</para>
491 <section xml:id="mrrouting.health_routerhealth">
492 <title><indexterm><primary>MR</primary>
493 <secondary>mrrouting</secondary>
494 <tertiary>routinghealth_routerhealth</tertiary>
495 </indexterm>Router Health</title>
496 <para>The routing infrastructure now relies on LNet Health to keep track
497 of interface health. Each gateway interface has a health value
associated with it. If a send to one of these interfaces fails, the
interface's health value is decremented and the interface is placed on a recovery queue.
500 The unhealthy interface is then pinged every
501 <literal>lnet_recovery_interval</literal>. This value defaults to
502 <literal>1</literal> second.</para>
503 <para>If the peer receives a message from the gateway, then it immediately
504 assumes that the gateway's interface is up and resets its health value to
505 maximum. This is needed to ensure we start using the gateways immediately
506 instead of holding off until the interface is back to full health.</para>
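<para>The difference between the slow recovery path and the immediate reset
can be illustrated with a minimal, hypothetical Python model of one gateway
interface. The class and method names are invented. The decrement amount
stands in for <literal>lnet_health_sensitivity</literal> and the +1 per
successful recovery ping follows the description of
<literal>lnet_recovery_interval</literal>, both in
<xref linkend="dbdoclet.mrhealth" />; leaving the recovery queue once the
interface is back at full health is an assumption of this sketch.</para>

```python
LNET_MAX_HEALTH_VALUE = 1000  # maximum health value given in this chapter


class GatewayInterface:
    """Hypothetical model of one gateway interface as seen by a peer."""

    def __init__(self, nid: str, sensitivity: int = 100) -> None:
        self.nid = nid
        self.sensitivity = sensitivity  # stands in for lnet_health_sensitivity
        self.health = LNET_MAX_HEALTH_VALUE
        self.in_recovery = False

    def send_failed(self) -> None:
        # Decrement health and queue the interface for recovery pings,
        # which run every lnet_recovery_interval (1 second by default).
        self.health = max(0, self.health - self.sensitivity)
        self.in_recovery = True

    def recovery_ping_ok(self) -> None:
        # A successful recovery ping raises health by only 1.
        self.health = min(LNET_MAX_HEALTH_VALUE, self.health + 1)
        if self.health == LNET_MAX_HEALTH_VALUE:
            self.in_recovery = False  # assumption: leave queue at full health

    def message_received(self) -> None:
        # Any message from the gateway shows the interface is up, so
        # health jumps straight back to the maximum.
        self.health = LNET_MAX_HEALTH_VALUE
        self.in_recovery = False


if __name__ == "__main__":
    gw = GatewayInterface("10.0.0.1@o2ib")
    gw.send_failed()
    pings = 0
    while gw.in_recovery:
        gw.recovery_ping_ok()
        pings += 1
    print(pings)  # 100 pings to climb from 900 back to 1000
```

<para>Under these assumptions, an interface knocked down to 900 needs 100
successful recovery pings, roughly 100 seconds at the default 1-second
interval, to climb back to full health on its own; the reset on receive
removes that wait.</para>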
508 <section xml:id="mrrouting.health_discovery">
509 <title><indexterm><primary>MR</primary>
510 <secondary>mrrouting</secondary>
511 <tertiary>routinghealth_discovery</tertiary>
512 </indexterm>Discovery</title>
513 <para>LNet Discovery is used in place of pinging the peers. This serves
516 <listitem><para>The discovery communication infrastructure does not need
517 to be duplicated for the routing feature.</para></listitem>
518 <listitem><para>It allows propagation of the gateway's interface state
519 changes to the peers using the gateway.</para></listitem>
521 <para>For (2), if an interface changes state from <literal>UP</literal> to
522 <literal>DOWN</literal> or vice versa, then a discovery
523 <literal>PUSH</literal> is sent to all the peers which can be reached.
This allows peers to adapt to changes more quickly.</para>
<para>Discovery is designed to be backwards compatible. The discovery
protocol is composed of a <literal>GET</literal> and a
<literal>PUT</literal>. The <literal>GET</literal> requests interface
information from the peer; this is a basic LNet ping. The peer responds
with its interface information and a feature bit. If the peer is
multi-rail capable and discovery is turned on, then the node will
<literal>PUSH</literal> its interface information using the
<literal>PUT</literal>. As a result, both peers will be aware of each
other's interfaces.</para>
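<para>The <literal>GET</literal>/<literal>PUSH</literal> exchange can be
modeled in a few lines. The sketch below is a hypothetical Python
illustration: the <literal>Node</literal> class, its methods and the NIDs are
invented, and real discovery messages carry more than a NID list and a single
feature bit.</para>

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """Hypothetical LNet node: primary NID, all NIDs, and what it has learned."""
    primary_nid: str
    nids: List[str]
    multi_rail: bool = True
    discovery: bool = True
    known_peers: Dict[str, List[str]] = field(default_factory=dict)

    def handle_get(self) -> Tuple[List[str], bool]:
        # GET is a basic LNet ping: answer with our interfaces and a
        # feature bit saying whether we are Multi-Rail capable.
        return list(self.nids), self.multi_rail

    def handle_push(self, sender_primary: str, sender_nids: List[str]) -> None:
        # Remember the interfaces the sender pushed to us.
        self.known_peers[sender_primary] = list(sender_nids)


def discover(node: Node, peer: Node) -> None:
    """Run one discovery exchange initiated by `node` toward `peer`."""
    peer_nids, peer_is_mr = peer.handle_get()
    node.known_peers[peer.primary_nid] = peer_nids
    # PUSH only to a Multi-Rail capable peer, and only with discovery on;
    # a legacy peer just sees an ordinary ping.
    if peer_is_mr and node.discovery:
        peer.handle_push(node.primary_nid, node.nids)
```

<para>After <literal>discover(a, b)</literal> both nodes hold each other's
interface lists; running it against a node created with
<literal>multi_rail=False</literal> fills in only the initiator's view, which
mirrors the backwards-compatible case.</para>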
533 <para>This information is then used by the peers to decide, based on the
534 interface state provided by the gateway, whether the route is alive or
537 <section xml:id="mrrouting.health_aliveness">
538 <title><indexterm><primary>MR</primary>
539 <secondary>mrrouting</secondary>
540 <tertiary>routinghealth_aliveness</tertiary>
541 </indexterm>Route Aliveness Criteria</title>
542 <para>A route is considered alive if the following conditions hold:</para>
544 <listitem><para>The gateway can be reached on the local net via at least
545 one path.</para></listitem>
546 <listitem><para>If <literal>avoid_asym_router_failure</literal> is
547 enabled then the remote network defined in the route must have at least
548 one healthy interface on the gateway.</para></listitem>
552 <section xml:id="dbdoclet.mrhealth" condition="l2C">
553 <title><indexterm><primary>MR</primary><secondary>health</secondary>
554 </indexterm>LNet Health</title>
555 <para>LNet Multi-Rail has implemented the ability for multiple interfaces
556 to be used on the same LNet network or across multiple LNet networks. The
557 LNet Health feature adds the ability to maintain a health value for each
558 local and remote interface. This allows the Multi-Rail algorithm to
559 consider the health of the interface before selecting it for sending.
560 The feature also adds the ability to resend messages across different
561 interfaces when interface or network failures are detected. This allows
562 LNet to mitigate communication failures before passing the failures to
563 upper layers for further error handling. To accomplish this, LNet Health
564 monitors the status of the send and receive operations and uses this
565 status to increment the interface's health value in case of success and
566 decrement it in case of failure.</para>
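<para>A minimal Python sketch of the select, fail, and resend loop described
above, under simplifying assumptions: every failure is treated as a
resendable one (the <literal>localresend</literal> and
<literal>remoteresend</literal> kinds in
<xref linkend="dbdoclet.mrhealthfailuretypes" />), a success adds 1 and a
failure subtracts the sensitivity value, ties go to the first interface
listed, and the attempt budget is one initial send plus
<literal>retry_count</literal> resends. The function name is hypothetical, and
the real Multi-Rail selection weighs more than health alone.</para>

```python
from typing import Callable, Dict

LNET_MAX_HEALTH_VALUE = 1000  # initial and maximum health per interface


def send_with_health(health: Dict[str, int],
                     try_send: Callable[[str], bool],
                     retry_count: int,
                     sensitivity: int) -> str:
    """Send via the healthiest interface, resending on failure.

    Returns the NID that succeeded, or raises RuntimeError once the
    initial attempt plus `retry_count` resends have all failed.
    """
    for _ in range(retry_count + 1):
        nid = max(health, key=health.get)  # healthiest interface wins
        if try_send(nid):
            health[nid] = min(LNET_MAX_HEALTH_VALUE, health[nid] + 1)
            return nid
        # Failure: degrade this interface so the next pick avoids it.
        health[nid] = max(0, health[nid] - sensitivity)
    raise RuntimeError("all attempts failed; report to the upper layer")
```

<para>With two equally healthy local interfaces and a first interface that
keeps failing, the first attempt lowers its health to 900 and the resend goes
out on the second interface, so the upper layer never sees the failure.</para>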
567 <section xml:id="dbdoclet.mrhealthvalue">
568 <title><indexterm><primary>MR</primary>
569 <secondary>mrhealth</secondary>
570 <tertiary>value</tertiary>
571 </indexterm>Health Value</title>
572 <para>The initial health value of a local or remote interface is set to
573 <literal>LNET_MAX_HEALTH_VALUE</literal>, currently set to be
574 <literal>1000</literal>. The value itself is arbitrary and is meant to
575 allow for health granularity, as opposed to having a simple boolean state.
576 The granularity allows the Multi-Rail algorithm to select the interface
577 that has the highest likelihood of sending or receiving a message.</para>
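<para>The figures quoted for <literal>lnet_health_sensitivity</literal> in
<xref linkend="dbdoclet.mrhealthinterface" /> can be checked with a small
simulation. This Python sketch assumes that a failure subtracts the
sensitivity value, that each successful message adds 1, and that the value is
clamped between 0 and <literal>LNET_MAX_HEALTH_VALUE</literal>; the function
names are hypothetical.</para>

```python
from typing import Iterable, List

LNET_MAX_HEALTH_VALUE = 1000


def final_health(outcomes: Iterable[bool], sensitivity: int) -> int:
    """Apply a stream of send results (True = success) to one interface."""
    health = LNET_MAX_HEALTH_VALUE
    for ok in outcomes:
        if ok:
            health = min(LNET_MAX_HEALTH_VALUE, health + 1)
        else:
            health = max(0, health - sensitivity)
    return health


def one_failure_every(n_messages: int, period: int) -> List[bool]:
    """Outcome pattern with exactly one failure in each `period` messages."""
    return [i % period != period - 1 for i in range(n_messages)]


if __name__ == "__main__":
    print(final_health([False] * 10, 100))                    # -> 0
    print(final_health(one_failure_every(10_000, 50), 100))   # 2%   -> 0
    print(final_health(one_failure_every(10_000, 200), 100))  # 0.5% -> 900
```

<para>With a sensitivity of 100, ten consecutive failures drain the value
from 1000 to 0, and the break-even point is one failure per 101 messages
(100 successes repay one failure). That matches the roughly 1% threshold
given for <literal>lnet_health_sensitivity</literal>: below it the interface
recovers between failures, above it the value drains to 0.</para>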
579 <section xml:id="dbdoclet.mrhealthfailuretypes">
580 <title><indexterm><primary>MR</primary>
581 <secondary>mrhealth</secondary>
582 <tertiary>failuretypes</tertiary>
583 </indexterm>Failure Types and Behavior</title>
584 <para>LNet health behavior depends on the type of failure detected:</para>
585 <informaltable frame="all">
587 <colspec colname="c1" colwidth="50*"/>
588 <colspec colname="c2" colwidth="50*"/>
592 <para><emphasis role="bold">Failure Type</emphasis></para>
595 <para><emphasis role="bold">Behavior</emphasis></para>
602 <para><literal>localresend</literal></para>
605 <para>A local failure has occurred, such as no route found or an
address resolution error. These failures could be temporary, so
LNet will attempt to resend the message. LNet will
608 decrement the health value of the local interface and will
609 select it less often if there are multiple available interfaces.
615 <para><literal>localno-resend</literal></para>
<para>A local non-recoverable error occurred in the system, such
as an out-of-memory error. In these cases LNet will not attempt to
620 resend the message. LNet will decrement the health value of the
621 local interface and will select it less often if there are
622 multiple available interfaces.
628 <para><literal>remoteno-resend</literal></para>
631 <para>If LNet successfully sends a message, but the message does
632 not complete or an expected reply is not received, then it is
633 classified as a remote error. LNet will not attempt to resend the
634 message to avoid duplicate messages on the remote end. LNet will
635 decrement the health value of the remote interface and will
636 select it less often if there are multiple available interfaces.
642 <para><literal>remoteresend</literal></para>
<para>There is a set of failures for which we can be reasonably sure
646 that the message was dropped before getting to the remote end. In
647 this case, LNet will attempt to resend the message. LNet will
648 decrement the health value of the remote interface and will
649 select it less often if there are multiple available interfaces.
656 <section xml:id="dbdoclet.mrhealthinterface">
657 <title><indexterm><primary>MR</primary>
658 <secondary>mrhealth</secondary>
659 <tertiary>interface</tertiary>
660 </indexterm>User Interface</title>
661 <para>LNet Health is turned off by default. There are multiple module
662 parameters available to control the LNet Health feature.</para>
<para>All the module parameters are implemented in sysfs and are located
in <literal>/sys/module/lnet/parameters/</literal>. They can be set either by
echoing a value into the corresponding file or with <literal>lnetctl</literal>.</para>
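<para>For scripting, the sysfs files can be read and written directly. This
is a minimal Python sketch, assuming each parameter file holds a single value
and accepts a newline-terminated integer, as <literal>echo</literal> would
write it. The helper names are hypothetical, the base directory is a
parameter only so the helpers can be exercised away from a Lustre node, and
writing to the real files requires root.</para>

```python
from pathlib import Path

LNET_PARAM_DIR = Path("/sys/module/lnet/parameters")


def read_lnet_param(name: str, base: Path = LNET_PARAM_DIR) -> str:
    """Return the current value of an LNet module parameter as text."""
    return (base / name).read_text().strip()


def write_lnet_param(name: str, value: int, base: Path = LNET_PARAM_DIR) -> None:
    """Equivalent of: echo VALUE > /sys/module/lnet/parameters/NAME"""
    (base / name).write_text(f"{value}\n")


if __name__ == "__main__" and LNET_PARAM_DIR.is_dir():
    # Only meaningful on a node with the lnet module loaded.
    print("lnet_health_sensitivity =", read_lnet_param("lnet_health_sensitivity"))
```

<para>For example, <literal>write_lnet_param("lnet_health_sensitivity", 100)</literal>
has the same intent as the <literal>lnetctl set health_sensitivity</literal>
command listed in the table.</para>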
666 <informaltable frame="all">
668 <colspec colname="c1" colwidth="50*"/>
669 <colspec colname="c2" colwidth="50*"/>
673 <para><emphasis role="bold">Parameter</emphasis></para>
676 <para><emphasis role="bold">Description</emphasis></para>
683 <para><literal>lnet_health_sensitivity</literal></para>
686 <para>When LNet detects a failure on a particular interface it
687 will decrement its Health Value by
688 <literal>lnet_health_sensitivity</literal>. The greater the value,
689 the longer it takes for that interface to become healthy again.
690 The default value of <literal>lnet_health_sensitivity</literal>
691 is set to 0, which means the health value will not be decremented.
In essence, the health feature is turned off.</para>
693 <para>The sensitivity value can be set greater than 0. A
694 <literal>lnet_health_sensitivity</literal> of 100 would mean that
695 10 consecutive message failures or a steady-state failure rate
696 over 1% would degrade the interface Health Value until it is
697 disabled, while a lower failure rate would steer traffic away from
698 the interface but it would continue to be available. When a
699 failure occurs on an interface then its Health Value is
700 decremented and the interface is flagged for recovery.</para>
701 <screen>lnetctl set health_sensitivity: sensitivity to failure
702 0 - turn off health evaluation
703 >0 - sensitivity value not more than 1000</screen>
708 <para><literal>lnet_recovery_interval</literal></para>
711 <para>When LNet detects a failure on a local or remote interface
712 it will place that interface on a recovery queue. There is a
713 recovery queue for local interfaces and another for remote
714 interfaces. The interfaces on the recovery queues will be LNet
715 PINGed every <literal>lnet_recovery_interval</literal>. This value
716 defaults to <literal>1</literal> second. On every successful PING
717 the health value of the interface pinged will be incremented by
718 <literal>1</literal>.</para>
719 <para>Having this value configurable allows system administrators
720 to control the amount of control traffic on the network.</para>
721 <screen>lnetctl set recovery_interval: interval to ping unhealthy interfaces
722 >0 - timeout in seconds</screen>
727 <para><literal>lnet_transaction_timeout</literal></para>
730 <para>This timeout is somewhat of an overloaded value. It carries
731 the following functionality:</para>
<para>A message is abandoned if it has not been sent successfully
when <literal>lnet_transaction_timeout</literal> expires, even if
<literal>retry_count</literal> has not been reached.</para>
<para>A GET or a PUT which expects an ACK expires if a REPLY
or an ACK, respectively, is not received within the
741 <literal>lnet_transaction_timeout</literal>.</para>
744 <para>This value defaults to 30 seconds.</para>
745 <screen>lnetctl set transaction_timeout: Message/Response timeout
746 >0 - timeout in seconds</screen>
747 <note><para>The LND timeout will now be a fraction of the
748 <literal>lnet_transaction_timeout</literal> as described in the
750 <para>This means that in networks where very large delays are
751 expected then it will be necessary to increase this value
752 accordingly.</para></note>
757 <para><literal>lnet_retry_count</literal></para>
<para>When LNet detects a failure for which it deems resending a
message appropriate, it checks whether the message has already reached the
maximum <literal>retry_count</literal>. Once that limit is reached, if the
message still has not been sent successfully, a failure event is passed up
to the layer that initiated the send.</para>
765 <para>Since the message retry interval
766 (<literal>lnet_lnd_timeout</literal>) is computed from
767 <literal>lnet_transaction_timeout / lnet_retry_count</literal>,
768 the <literal>lnet_retry_count</literal> should be kept low enough
769 that the retry interval is not shorter than the round-trip message
delay in the network. For example, a <literal>lnet_retry_count</literal>
of 5 is reasonable for a <literal>lnet_transaction_timeout</literal> of 50
seconds, giving a 10-second retry interval.</para>
773 <screen>lnetctl set retry_count: number of retries
775 >0 - number of retries, cannot be more than <literal>lnet_transaction_timeout</literal></screen>
780 <para><literal>lnet_lnd_timeout</literal></para>
<para>This is not a configurable parameter; it is derived from
two configurable parameters:
785 <literal>lnet_transaction_timeout</literal> and
786 <literal>retry_count</literal>.</para>
787 <screen>lnet_lnd_timeout = lnet_transaction_timeout / retry_count
789 <para>As such there is a restriction that
790 <literal>lnet_transaction_timeout >= retry_count</literal>
792 <para>The core assumption here is that in a healthy network,
793 sending and receiving LNet messages should not have large delays.
794 There could be large delays with RPC messages and their responses,
but those are handled at the PtlRPC layer.</para>
802 <section xml:id="dbdoclet.mrhealthdisplay">
803 <title><indexterm><primary>MR</primary>
804 <secondary>mrhealth</secondary>
805 <tertiary>display</tertiary>
806 </indexterm>Displaying Information</title>
807 <section xml:id="dbdoclet.mrhealthdisplayhealth">
808 <title>Showing LNet Health Configuration Settings</title>
809 <para><literal>lnetctl</literal> can be used to show all the LNet health
810 configuration settings using the <literal>lnetctl global show</literal>
812 <screen>#> lnetctl global show
818 transaction_timeout: 10
819 health_sensitivity: 100
820 recovery_interval: 1</screen>
822 <section xml:id="dbdoclet.mrhealthdisplaystats">
823 <title>Showing LNet Health Statistics</title>
<para>LNet Health statistics are shown at a higher verbosity
setting. To show the local interface health statistics:</para>
826 <screen>lnetctl net show -v 3</screen>
827 <para>To show the remote interface health statistics:</para>
828 <screen>lnetctl peer show -v 3</screen>
829 <para>Sample output:</para>
830 <screen>#> lnetctl net show -v 3
834 - nid: 192.168.122.108@tcp
871 peer_buffer_credits: 0
876 CPT: "[0]"</screen>
877 <para>There is a new YAML block, <literal>health stats</literal>, which
878 displays the health statistics for each local or remote network
880 <para>Global statistics also dump the global health statistics as shown
882 <screen>#> lnetctl stats show
890 response_timeout_count: 0
891 local_interrupt_count: 0
892 local_dropped_count: 10
893 local_aborted_count: 0
894 local_no_route_count: 0
895 local_timeout_count: 0
897 remote_dropped_count: 0
898 remote_error_count: 0
899 remote_timeout_count: 0
900 network_timeout_count: 0
904 send_length: 425791628
907 drop_length: 0</screen>
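<para>A monitoring script might only care about the health counters that
have moved. This hypothetical Python helper scans
<literal>lnetctl stats show</literal> output for the
<literal>local_*</literal>, <literal>remote_*</literal>,
<literal>network_*</literal> and <literal>response_*</literal> counters shown
above and returns the non-zero ones. It treats the output as flat
<literal>key: value</literal> lines rather than parsing it as YAML, so it
assumes those counters are not nested.</para>

```python
import shutil
import subprocess
from typing import Dict

HEALTH_PREFIXES = ("local_", "remote_", "network_", "response_")


def nonzero_health_counters(stats_text: str) -> Dict[str, int]:
    """Return health counters from `lnetctl stats show` output that are > 0."""
    counters: Dict[str, int] = {}
    for line in stats_text.splitlines():
        key, sep, value = line.strip().partition(":")
        value = value.strip()
        if (sep and key.startswith(HEALTH_PREFIXES)
                and key.endswith("_count") and value.isdigit()):
            if int(value) > 0:
                counters[key] = int(value)
    return counters


def read_stats() -> str:
    # Requires lnetctl on PATH and a configured LNet.
    return subprocess.run(["lnetctl", "stats", "show"], capture_output=True,
                          text=True, check=True).stdout


if __name__ == "__main__" and shutil.which("lnetctl"):
    print(nonzero_health_counters(read_stats()))
```

<para>Applied to the sample output above, the helper would report only
<literal>local_dropped_count</literal> with a value of 10.</para>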
910 <section xml:id="dbdoclet.mrhealthinitialsetup">
911 <title><indexterm><primary>MR</primary>
912 <secondary>mrhealth</secondary>
913 <tertiary>initialsetup</tertiary>
914 </indexterm>Initial Settings Recommendations</title>
915 <para>LNet Health is off by default. This means that
916 <literal>lnet_health_sensitivity</literal> and
917 <literal>lnet_retry_count</literal> are set to <literal>0</literal>.
919 <para>Setting <literal>lnet_health_sensitivity</literal> to
920 <literal>0</literal> will not decrement the health of the interface on
921 failure and will not change the interface selection behavior. Furthermore,
the failed interfaces will not be placed on the recovery queues. In
essence, this turns off the LNet Health feature.</para>
924 <para>The LNet Health settings will need to be tuned for each cluster.
925 However, the base configuration would be as follows:</para>
926 <screen>#> lnetctl global show
932 transaction_timeout: 10
933 health_sensitivity: 100
934 recovery_interval: 1</screen>
<para>This setting will allow a maximum of two retries for failed messages
within the 10-second transaction timeout.</para>
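<para>The arithmetic behind this recommendation can be sketched in Python.
<literal>parse_global_show</literal> reads the flat
<literal>key: value</literal> lines of <literal>lnetctl global show</literal>,
and <literal>lnd_timeout</literal> applies the
<literal>lnet_lnd_timeout = lnet_transaction_timeout / retry_count</literal>
formula from <xref linkend="dbdoclet.mrhealthinterface" />, assuming
whole-second integer division. Both function names are hypothetical, and the
<literal>retry_count</literal> of 2 used in the example is an assumption
chosen for illustration.</para>

```python
from typing import Dict


def parse_global_show(text: str) -> Dict[str, int]:
    """Collect the flat 'name: integer' lines of `lnetctl global show`."""
    settings: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        value = value.strip()
        if sep and value.isdigit():
            settings[key] = int(value)
    return settings


def lnd_timeout(transaction_timeout: int, retry_count: int) -> int:
    """lnet_lnd_timeout = lnet_transaction_timeout / retry_count.

    Whole-second integer division is an assumption of this sketch.
    """
    if retry_count >= 1 and transaction_timeout >= retry_count:
        return transaction_timeout // retry_count
    raise ValueError("need retry_count >= 1 and transaction_timeout >= retry_count")


if __name__ == "__main__":
    sample = """global:
    retry_count: 2
    transaction_timeout: 10
    health_sensitivity: 100
    recovery_interval: 1"""
    cfg = parse_global_show(sample)  # retry_count: 2 is hypothetical
    print(lnd_timeout(cfg["transaction_timeout"], cfg["retry_count"]))  # 5
```

<para>With a 10-second transaction timeout and an assumed
<literal>retry_count</literal> of 2, each attempt gets 5 seconds. As the
<literal>lnet_retry_count</literal> description notes, that per-attempt
interval should stay above the round-trip message delay of the
network.</para>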
<para>If there is a failure on the interface, the health value will be
decremented by 100 (the <literal>health_sensitivity</literal> value) and the interface will be LNet PINGed every 1 second.
944 vim:expandtab:shiftwidth=2:tabstop=8: