<?xml version='1.0' encoding='UTF-8'?>
2 <!-- This document was created with Syntext Serna Free. --><chapter xmlns="http://docbook.org/ns/docbook" xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US" xml:id="configuringlnet">
3 <title xml:id="configuringlnet.title">Configuring Lustre Networking (LNet)</title>
4 <para>This chapter describes how to configure Lustre Networking (LNet). It
5 includes the following sections:</para>
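<itemizedlist>
<listitem><para><xref linkend="lnet_config"/></para></listitem>
<listitem><para><xref linkend="lnet_module_params"/></para></listitem>
<listitem><para><xref linkend="lnet_module_network_params"/></para></listitem>
<listitem><para><xref linkend="lnet_ip2nets"/></para></listitem>
<listitem><para><xref linkend="lnet_module_routes"/></para></listitem>
<listitem><para><xref linkend="lnet_config_testing"/></para></listitem>
<listitem><para><xref linkend="lnet_router_checker"/></para></listitem>
<listitem><para><xref linkend="lnet_best_practices"/></para></listitem>
</itemizedlist>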
41 <para>Configuring LNet is optional.</para>
<para>LNet will use the first TCP/IP interface it discovers on a
system (<literal>eth0</literal>) if it is loaded using the
<literal>lctl network up</literal> command. If this network configuration is
sufficient, you do not need to configure LNet. LNet configuration is
required if you are using InfiniBand or multiple Ethernet
interfaces.</para>
48 <para condition='l27'>The <literal>lnetctl</literal> utility can be used
49 to initialize LNet without bringing up any network interfaces. Network
50 interfaces can be added after configuring LNet via
51 <literal>lnetctl</literal>. <literal>lnetctl</literal> can also be used to
manage an operational LNet. However, if it was not initialized by
<literal>lnetctl</literal>, then <literal>lnetctl lnet configure</literal>
must be invoked before <literal>lnetctl</literal> can be used to manage
LNet.</para>
<para condition='l27'>Dynamic LNet Configuration (DLC) also introduces a C-API to enable
configuring LNet programmatically. See <xref
linkend="lnetconfigurationapi"/>.</para>
60 <section xml:id="lnet_config" condition='l27'>
<title><indexterm>
<primary>LNet</primary>
<secondary>Configuring LNet</secondary>
</indexterm>Configuring LNet via <literal>lnetctl</literal></title>
<para>The <literal>lnetctl</literal> utility can be used to initialize
and configure the LNet kernel module after it has been loaded via
<literal>modprobe</literal>. In general, the lnetctl command format is as
follows:</para>
69 <screen>lnetctl cmd subcmd [options]</screen>
70 <para>The following configuration items are managed by the tool:</para>
74 <para>Configuring/unconfiguring LNet</para>
77 <para>Adding/removing/showing Networks</para>
80 <para>Adding/removing/showing Routes</para>
83 <para>Enabling/Disabling routing</para>
86 <para>Configuring Router Buffer Pools</para>
90 <section xml:id="lnet_config.cli_overview">
<title><indexterm>
<primary>LNet</primary>
<secondary>cli</secondary>
</indexterm>Configuring LNet</title>
<para>After LNet has been loaded via <literal>modprobe</literal>, the
<literal>lnetctl</literal> utility can be used to configure LNet
without bringing up the networks that are specified in the module
parameters. It can also be used to configure the network interfaces
specified in the module parameters by providing the
<literal>--all</literal> option.</para>
101 <screen>lnetctl lnet configure [--all]
102 # --all: load NI configuration from module parameters</screen>
103 <para>The <literal>lnetctl</literal> utility can also be used to
104 unconfigure LNet.</para>
105 <screen>lnetctl lnet unconfigure</screen>
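<para>As an illustration only, the two commands above can be wrapped in
small bash helpers. The function names and the <literal>LNETCTL</literal>
override variable are assumptions made for this sketch and are not part of
<literal>lnetctl</literal>; the sketch also assumes the
<literal>lnet</literal> module has already been loaded.</para>
<screen># Sketch only. LNETCTL is a hypothetical override used to preview
# commands, for example LNETCTL="echo lnetctl".

# Configure LNet. Pass --all to also bring up the network
# interfaces named in the module parameters.
lnet_start() {
    if [ "${1:-}" = "--all" ]; then
        ${LNETCTL:-lnetctl} lnet configure --all
    else
        ${LNETCTL:-lnetctl} lnet configure
    fi
}

# Unconfigure LNet again.
lnet_stop() {
    ${LNETCTL:-lnetctl} lnet unconfigure
}</screen>
<para>With <literal>LNETCTL="echo lnetctl"</literal> set, calling
<literal>lnet_start --all</literal> only prints the command that would be
run, which is a convenient way to check a script before using it on a live
node.</para>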
107 <section xml:id="lnet_config.show_global_settings">
<title><indexterm>
<primary>LNet</primary>
<secondary>cli</secondary>
</indexterm>Displaying Global Settings</title>
112 <para>The active LNet global settings can be displayed using the
113 <literal>lnetctl</literal> command shown below:</para>
114 <screen>lnetctl global show</screen>
115 <para>For example:</para>
116 <screen># lnetctl global show
120 discovery: 1</screen>
122 <section xml:id="lnet_config.lnetaddshowdelete">
<title><indexterm><primary>LNet</primary>
<secondary>cli</secondary></indexterm>Adding, Deleting and Showing
Networks</title>
126 <para>Networks can be added, deleted, or shown after the LNet kernel
127 module is loaded.</para>
128 <para>The <emphasis role="bold"><literal>lnetctl net add</literal>
129 </emphasis> command is used to add networks:</para>
130 <screen>lnetctl net add: add a network
131 --net: net name (ex tcp0)
132 --if: physical interface (ex eth0)
133 --peer_timeout: time to wait before declaring a peer dead
134 --peer_credits: defines the max number of inflight messages
135 --peer_buffer_credits: the number of buffer credits per peer
136 --credits: Network Interface credits
137 --cpts: CPU Partitions configured net uses
138 --help: display this help text
141 lnetctl net add --net tcp2 --if eth0
142 --peer_timeout 180 --peer_credits 8</screen>
143 <note condition='l2A'><para>With the addition of Software based Multi-Rail
144 in Lustre 2.10, the following should be noted:</para>
<para><itemizedlist>
<listitem><para><literal>--net</literal>: no longer needs to be unique, since multiple
interfaces can be added to the same network.</para></listitem>
<listitem><para><literal>--if</literal>: The same interface can be added to a network
only once; however, more than one interface can now be specified
(separated by commas) for a node. For example: eth0,eth1,eth2.
</para></listitem>
</itemizedlist></para>
<para>For examples of adding multiple interfaces via
<literal>lnetctl net add</literal> and/or YAML, please see
<xref linkend="dbdoclet.mrconfiguring" />.</para>
</note>
<para>Networks can be deleted with the
<emphasis role="bold"><literal>lnetctl net del</literal></emphasis>
command:</para>
161 <screen>net del: delete a network
162 --net: net name (ex tcp0)
        --if: physical interface (e.g. eth0)
166 lnetctl net del --net tcp2</screen>
<note condition='l2A'><para>In a Software Multi-Rail configuration,
specifying only the <literal>--net</literal> argument will delete the
entire network and all interfaces under it. The new
<literal>--if</literal> switch should also be used in conjunction with
<literal>--net</literal> to specify deletion of a specific interface.
</para></note>
173 <para>All or a subset of the configured networks can be shown with the
174 <emphasis role="bold"><literal>lnetctl net show</literal></emphasis>
175 command. The output can be non-verbose or verbose.</para>
176 <screen>net show: show networks
177 --net: net name (ex tcp0) to filter on
178 --verbose: display detailed output per network
182 lnetctl net show --verbose
183 lnetctl net show --net tcp2 --verbose</screen>
<para>Below are examples of non-detailed and detailed network
configuration output.</para>
186 <screen># non-detailed show
187 > lnetctl net show --net tcp2
189 - nid: 192.168.205.130@tcp2
195 > lnetctl net show --net tcp2 --verbose
197 - nid: 192.168.205.130@tcp2
204 peer_buffer_credits: 0
205 credits: 256</screen>
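<para>The add, show and delete commands described in this section can be
combined as in the following bash sketch. The helper names and the
<literal>LNETCTL</literal> override variable are hypothetical and exist only
for this illustration; the <literal>lnetctl</literal> options used are the
ones listed above.</para>
<screen># Sketch only. LNETCTL is a hypothetical override used to preview
# commands, for example LNETCTL="echo lnetctl".

# Add a network on one or more comma-separated interfaces,
# then show the result in verbose form.
net_up() {
    local net=$1 ifaces=$2
    ${LNETCTL:-lnetctl} net add --net "$net" --if "$ifaces" || return 1
    ${LNETCTL:-lnetctl} net show --net "$net" --verbose
}

# Delete the whole network, including every interface under it.
net_down() {
    ${LNETCTL:-lnetctl} net del --net "$1"
}</screen>
<para>For example, <literal>net_up tcp2 eth0</literal> adds
<literal>tcp2</literal> on <literal>eth0</literal> and displays it, and
<literal>net_down tcp2</literal> removes it again.</para>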
207 <section condition='l2A' xml:id="lnet_config.manual_addshowdelete">
<title><indexterm>
<primary>LNet</primary>
<secondary>cli</secondary>
</indexterm>Manually Adding, Deleting and Showing Peers</title>
212 <para>The <emphasis role="bold"><literal>lnetctl peer add</literal>
213 </emphasis> command is used to manually add a remote peer to a software
214 multi-rail configuration. For the dynamic peer discovery capability
215 introduced in Lustre Release 2.11.0, please see
216 <xref linkend="lnet_config.dynamic_discovery" />.</para>
<para>When configuring peers, use the <literal>--prim_nid</literal>
option to specify the key or primary NID of the peer node. Then
follow that with the <literal>--nid</literal> option to specify a
set of comma-separated NIDs.</para>
221 <screen>peer add: add a peer
222 --prim_nid: primary NID of the peer
223 --nid: comma separated list of peer nids (e.g. 10.1.1.2@tcp0)
        --non_mr: if specified this interface is created as a non multi-rail
                  capable peer. Only one NID can be specified in this case.</screen>
226 <para>For example:</para>
<screen>lnetctl peer add --prim_nid 10.10.10.2@tcp --nid 10.10.3.3@tcp1,10.4.4.5@tcp2</screen>
<para>The <literal>--prim_nid</literal> option (the primary NID of the peer
node) can be omitted. In this case, the first NID listed in the
<literal>--nid</literal> option becomes the primary NID of the peer.
</para>
<screen>lnetctl peer add --nid 10.10.10.2@tcp,10.10.3.3@tcp1,10.4.4.5@tcp2</screen>
236 <para>YAML can also be used to configure peers:</para>
238 - primary nid: <key or primary nid>
243 - nid: <nid n></screen>
244 <para>As with all other commands, the result of the
245 <literal>lnetctl peer show</literal> command can be used to gather
246 information to aid in configuring or deleting a peer:</para>
247 <screen>lnetctl peer show -v</screen>
248 <para>Example output from the <literal>lnetctl peer show</literal>
251 - primary nid: 192.168.122.218@tcp
254 - nid: 192.168.122.218@tcp
257 available_tx_credits: 8
258 available_rtr_credits: 8
265 - nid: 192.168.122.78@tcp
268 available_tx_credits: 8
269 available_rtr_credits: 8
276 - nid: 192.168.122.96@tcp
279 available_tx_credits: 8
280 available_rtr_credits: 8
<para>Use the following <literal>lnetctl</literal> command to delete a
peer:</para>
289 <screen>peer del: delete a peer
290 --prim_nid: Primary NID of the peer
291 --nid: comma separated list of peer nids (e.g. 10.1.1.2@tcp0)</screen>
292 <para><literal>prim_nid</literal> should always be specified. The
293 <literal>prim_nid</literal> identifies the peer. If the
294 <literal>prim_nid</literal> is the only one specified, then the
295 entire peer is deleted.</para>
<para>Example of deleting a single NID of a peer (10.10.10.3@tcp):
</para>
298 <screen>lnetctl peer del --prim_nid 10.10.10.2@tcp --nid 10.10.10.3@tcp</screen>
299 <para>Example of deleting the entire peer:</para>
300 <screen>lnetctl peer del --prim_nid 10.10.10.2@tcp</screen>
302 <section condition='l2B' xml:id="lnet_config.dynamic_discovery">
<title><indexterm>
<primary>LNet</primary>
<secondary>cli</secondary>
<tertiary>dynamic discovery</tertiary>
</indexterm>Dynamic Peer Discovery</title>
308 <section xml:id="lnet_config.dynamic_discovery.overview">
309 <title>Overview</title>
310 <para>Dynamic Discovery (DD) is a feature that allows nodes to
311 dynamically discover a peer's interfaces without having to explicitly
configure them. This is very useful for Multi-Rail (MR)
configurations. In large clusters, there can be hundreds of nodes,
and configuring MR peers manually on each node becomes error-prone.
315 Dynamic Discovery is enabled by default and uses a new protocol based
316 on LNet pings to discover the interfaces of the remote peers on first
319 <section xml:id="lnet_config.dynamic_discovery.protocol">
320 <title>Protocol</title>
321 <para>When LNet on a node is requested to send a message to a peer it
322 first attempts to ping the peer. The reply to the ping contains the
323 peer's NIDs as well as a feature bit outlining what the peer supports.
324 Dynamic Discovery adds a Multi-Rail feature bit. If the peer is
325 Multi-Rail capable, it sets the MR bit in the ping reply. When the
326 node receives the reply it checks the MR bit, and if it is set it then
327 pushes its own list of NIDs to the peer using a new PUT message,
328 referred to as a "push ping". After this brief protocol, both the peer
329 and the node will have each other's list of interfaces. The MR
330 algorithm can then proceed to use the list of interfaces of the
331 corresponding peer.</para>
332 <para>If the peer is not MR capable, it will not set the MR feature
333 bit in the ping reply. The node will understand that the peer is
334 not MR capable and will only use the interface provided by upper
335 layers for sending messages.</para>
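<para>The decision a node makes on receiving a ping reply can be modelled
with the following bash sketch. It is a simplified illustration of the
description above, not LNet code: the function name, its arguments and the
printed text are all assumptions made for this example.</para>
<screen># Model of the decision taken after a ping reply arrives.
# Arguments (hypothetical, chosen for this sketch):
#   $1  MR feature bit from the ping reply (1 = set, 0 = clear)
#   $2  comma-separated NIDs reported in the ping reply
#   $3  NID that the upper layer asked to send to
on_ping_reply() {
    local mr_bit=$1 peer_nids=$2 requested_nid=$3
    if [ "$mr_bit" = "1" ]; then
        # MR-capable peer: push our own NID list to it; afterwards
        # the MR algorithm may use any of the peer's interfaces.
        echo "push-ping: yes"
        echo "usable peer NIDs: $peer_nids"
    else
        # Non-MR peer: no push, and only the NID supplied by the
        # upper layers is used for sending.
        echo "push-ping: no"
        echo "usable peer NIDs: $requested_nid"
    fi
}</screen>
<para>For instance, <literal>on_ping_reply 0 10.10.10.2@tcp 10.10.10.2@tcp</literal>
reports that no push is sent and that only the requested NID is used.</para>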
337 <section xml:id="lnet_config.dynamic_discovery.userspace_config">
338 <title>Dynamic Discovery and User-space Configuration</title>
339 <para>It is possible to configure the peer manually while Dynamic
340 Discovery is running. Manual peer configuration always takes precedence
341 over Dynamic Discovery. If there is a discrepancy between the manual
342 configuration and the dynamically discovered information, a warning is
345 <section xml:id="lnet_config.dynamic_discovery.config">
346 <title>Configuration</title>
<para>Dynamic Discovery requires very little configuration: it can
only be turned on or off. To turn the feature on or off, use the
following command:</para>
350 <screen>lnetctl set discovery [0 | 1]</screen>
351 <para>To check the current <literal>discovery</literal> setting, the
352 <literal>lnetctl global show</literal> command can be used as shown in
353 <xref linkend="lnet_config.show_global_settings"/>.</para>
355 <section xml:id="lnet_config.dynamic_discovery.ondemand">
356 <title>Initiating Dynamic Discovery on Demand</title>
357 <para>It is possible to initiate the Dynamic Discovery protocol on demand
358 without having to wait for a message to be sent to the peer. This can
359 be done with the following command:</para>
360 <screen>lnetctl discover <peer_nid> [<peer_nid> ...]</screen>
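<para>A minimal bash sketch that enables discovery, displays the global
settings to confirm the change, and then discovers a list of peers is shown
below. The function name and the <literal>LNETCTL</literal> override variable
are hypothetical; only the <literal>lnetctl</literal> commands documented in
this section are used.</para>
<screen># Sketch only. LNETCTL is a hypothetical override used to preview
# commands, for example LNETCTL="echo lnetctl".

# Enable discovery, show the global settings, then run discovery
# for every peer NID passed as an argument.
discover_peers() {
    ${LNETCTL:-lnetctl} set discovery 1 || return 1
    ${LNETCTL:-lnetctl} global show || return 1
    ${LNETCTL:-lnetctl} discover "$@"
}</screen>
<para>For example, <literal>discover_peers 10.10.10.2@tcp 10.10.3.3@tcp1</literal>
passes both NIDs to a single <literal>lnetctl discover</literal>
invocation.</para>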
<title><indexterm>
<primary>LNet</primary>
<secondary>cli</secondary>
</indexterm>Adding, Deleting and Showing Routes</title>
368 <para>A set of routes can be added to identify how LNet messages are
370 <screen>lnetctl route add: add a route
        --net: net name (ex tcp0) LNet message is destined to.
               This cannot be a local network.
        --gateway: gateway node nid (ex 10.1.1.2@tcp) to route
                   all LNet messages destined for the identified
376 --hop: number of hops to final destination
377 (1 < hops < 255)
378 --priority: priority of route (0 - highest prio)
lnetctl route add --net tcp2 --gateway 192.168.205.130@tcp1 --hop 2 --priority 1</screen>
<para>Routes can be deleted via the following <literal>lnetctl</literal>
command.</para>
384 <screen>lnetctl route del: delete a route
385 --net: net name (ex tcp0)
386 --gateway: gateway nid (ex 10.1.1.2@tcp)
389 lnetctl route del --net tcp2 --gateway 192.168.205.130@tcp1</screen>
390 <para>Configured routes can be shown via the following
391 <literal>lnetctl</literal> command.</para>
392 <screen>lnetctl route show: show routes
393 --net: net name (ex tcp0) to filter on
394 --gateway: gateway nid (ex 10.1.1.2@tcp) to filter on
395 --hop: number of hops to final destination
396 (1 < hops < 255) to filter on
397 --priority: priority of route (0 - highest prio)
399 --verbose: display detailed output per route
406 lnetctl route show --verbose</screen>
<para>When showing routes, the <literal>--verbose</literal> option
outputs more detailed information. All show and error output is in
YAML format. Below are examples of both non-detailed and detailed
route show output.</para>
411 <screen>#Non-detailed output
415 gateway: 192.168.205.130@tcp1
418 > lnetctl route show --verbose
421 gateway: 192.168.205.130@tcp1
<title><indexterm>
<primary>LNet</primary>
<secondary>cli</secondary>
</indexterm>Enabling and Disabling Routing</title>
431 <para>When an LNet node is configured as a router it will route LNet
432 messages not destined to itself. This feature can be enabled or
433 disabled as follows.</para>
434 <screen>lnetctl set routing [0 | 1]
435 # 0 - disable routing feature
436 # 1 - enable routing feature</screen>
<title><indexterm>
<primary>LNet</primary>
<secondary>cli</secondary>
</indexterm>Showing Routing Information</title>
443 <para>When routing is enabled on a node, the tiny, small and large
444 routing buffers are allocated. See <xref
445 linkend="dbdoclet.50438272_73839"/> for more details on router
446 buffers. This information can be shown as follows:</para>
447 <screen>lnetctl routing show: show routing information
450 lnetctl routing show</screen>
451 <para>An example of the show output:</para>
452 <screen>> lnetctl routing show
<title><indexterm>
<primary>LNet</primary>
<secondary>cli</secondary>
</indexterm>Configuring Routing Buffers</title>
<para>The configured routing buffer values specify the number of
buffers in each of the tiny, small and large groups.</para>
<para>It is often desirable to configure the tiny, small and large
routing buffers to values other than the defaults. These values
are global: when set, they are used by all configured CPU
partitions. If routing is enabled, the values take effect
immediately. If a larger number of buffers is specified,
buffers are allocated to satisfy the configuration change. If fewer
buffers are configured, the excess buffers are freed as they
become unused. If routing is not enabled, the values are not changed.
487 The buffer values are reset to default if routing is turned off and
489 <para>The <literal>lnetctl</literal> 'set' command can be
490 used to set these buffer values. A VALUE greater than 0
491 will set the number of buffers accordingly. A VALUE of 0
492 will reset the number of buffers to system defaults.</para>
493 <screen>set tiny_buffers:
494 set tiny routing buffers
495 VALUE must be greater than or equal to 0
497 set small_buffers: set small routing buffers
498 VALUE must be greater than or equal to 0
500 set large_buffers: set large routing buffers
501 VALUE must be greater than or equal to 0</screen>
502 <para>Usage examples:</para>
503 <screen>> lnetctl set tiny_buffers 4096
504 > lnetctl set small_buffers 8192
505 > lnetctl set large_buffers 2048</screen>
506 <para>The buffers can be set back to the default values as follows:</para>
507 <screen>> lnetctl set tiny_buffers 0
508 > lnetctl set small_buffers 0
509 > lnetctl set large_buffers 0</screen>
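<para>Enabling routing and sizing the three buffer pools can be scripted
as in the following bash sketch. The function name and the
<literal>LNETCTL</literal> override variable are assumptions made for this
illustration; the buffer counts in the usage example are the ones used
earlier in this section, not recommendations.</para>
<screen># Sketch only. LNETCTL is a hypothetical override used to preview
# commands, for example LNETCTL="echo lnetctl".

# Enable routing, set the tiny, small and large buffer counts,
# then show the resulting routing information.
# An omitted count defaults to 0, which resets that pool to the
# system default.
router_setup() {
    local tiny=${1:-0} small=${2:-0} large=${3:-0}
    ${LNETCTL:-lnetctl} set routing 1 || return 1
    ${LNETCTL:-lnetctl} set tiny_buffers "$tiny" || return 1
    ${LNETCTL:-lnetctl} set small_buffers "$small" || return 1
    ${LNETCTL:-lnetctl} set large_buffers "$large" || return 1
    ${LNETCTL:-lnetctl} routing show
}</screen>
<para>For example, <literal>router_setup 4096 8192 2048</literal> applies the
values from the usage examples above. Routing is enabled first because the
buffer values are not changed while routing is off.</para>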
<title><indexterm>
<primary>LNet</primary>
<secondary>cli</secondary>
</indexterm>Importing YAML Configuration File</title>
<para>Configuration can be described in YAML format and can be fed
into the <literal>lnetctl</literal> utility. The
<literal>lnetctl</literal> utility parses the YAML file and performs
the specified operation on all entities described therein. If no
operation is defined in the command as shown below, the default
operation is 'add'. The YAML syntax is described in a later
section.</para>
<screen>lnetctl import FILE.yaml
523 lnetctl import < FILE.yaml</screen>
524 <para>The '<literal>lnetctl</literal> import' command provides three
525 optional parameters to define the operation to be performed on the
526 configuration items described in the YAML file.</para>
527 <screen># if no options are given to the command the "add" command is assumed
529 lnetctl import --add FILE.yaml
530 lnetctl import --add < FILE.yaml
532 # to delete all items described in the YAML file
533 lnetctl import --del FILE.yaml
534 lnetctl import --del < FILE.yaml
536 # to show all items described in the YAML file
537 lnetctl import --show FILE.yaml
538 lnetctl import --show < FILE.yaml</screen>
<title><indexterm>
<primary>LNet</primary>
<secondary>cli</secondary>
</indexterm>Exporting Configuration in YAML Format</title>
<para>The <literal>lnetctl</literal> utility provides the 'export'
command to dump the current LNet configuration in YAML format:</para>
547 <screen>lnetctl export FILE.yaml
548 lnetctl export > FILE.yaml</screen>
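<para>Export and import can be paired to save a running configuration and
re-apply it later, as in this bash sketch. The function names and the
<literal>LNETCTL</literal> override variable are hypothetical, and the file
name is left to the caller.</para>
<screen># Sketch only. LNETCTL is a hypothetical override used to preview
# commands, for example LNETCTL="echo lnetctl".

# Dump the current LNet configuration to the given YAML file.
lnet_save() {
    ${LNETCTL:-lnetctl} export "$1"
}

# Re-apply a saved configuration ("add" is the default operation,
# it is spelled out here for clarity).
lnet_restore() {
    ${LNETCTL:-lnetctl} import --add "$1"
}

# Remove every item described in a saved configuration.
lnet_remove() {
    ${LNETCTL:-lnetctl} import --del "$1"
}</screen>
<para>A typical sequence is <literal>lnet_save FILE.yaml</literal> on a
correctly configured node, followed by <literal>lnet_restore FILE.yaml</literal>
after LNet has been configured again.</para>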
<title><indexterm>
<primary>LNet</primary>
<secondary>cli</secondary>
</indexterm>Showing LNet Traffic Statistics</title>
<para>The <literal>lnetctl</literal> utility can dump the LNet traffic
statistics as follows:</para>
557 <screen>lnetctl stats show</screen>
<title><indexterm>
<primary>LNet</primary>
<secondary>yaml syntax</secondary>
</indexterm>YAML Syntax</title>
<para>The <literal>lnetctl</literal> utility can take in a YAML file
describing the configuration items that need to be operated on and
perform one of the following operations: add, delete or show on the
items described therein.</para>
<para>Net, routing and route YAML blocks are all defined as a YAML
sequence, as shown in the following sections. The stats YAML block
is a YAML object. Each sequence item can take a seq_no field. This
seq_no field is returned in the error block, which allows the caller
to associate the error with the item that caused it. The
<literal>lnetctl</literal> utility makes a best effort to configure the
items defined in the YAML file. It does not stop processing the file
at the first error.</para>
<para>Below is the YAML syntax describing the various
configuration elements which can be operated on via DLC. Not all
YAML elements are required for all operations (add/delete/show).
The system ignores elements which are not pertinent to the requested
operation.</para>
<title><indexterm>
<primary>LNet</primary>
<secondary>network yaml syntax</secondary>
</indexterm>Network Configuration</title>
588 - net: <network. Ex: tcp or o2ib>
590 0: <physical interface>
591 detail: <This is only applicable for show command. 1 - output detailed info. 0 - basic output>
peer_timeout: <Integer. Timeout before considering a peer dead>
594 peer_credits: <Integer. Transmit credits for a peer>
595 peer_buffer_credits: <Integer. Credits available for receiving messages>
596 credits: <Integer. Network Interface credits>
SMP: <An array of integers of the form: "[x,y,...]", where each
integer represents the CPT to associate the network interface
with>
seq_no: <integer. Optional. User generated, and is
passed back in the YAML error block></screen>
<para>Neither the seq_no field nor the detail field appears in the show output.
</para>
<title><indexterm>
<primary>LNet</primary>
<secondary>buffer yaml syntax</secondary>
</indexterm>Enable Routing and Adjust Router Buffer Configuration
</title>
612 - tiny: <Integer. Tiny buffers>
613 small: <Integer. Small buffers>
614 large: <Integer. Large buffers>
615 enable: <0 - disable routing. 1 - enable routing>
616 seq_no: <Integer. Optional. User generated, and is passed back in the YAML error block></screen>
<para>The seq_no field does not appear in the show output.</para>
<title><indexterm>
<primary>LNet</primary>
<secondary>statistics yaml syntax</secondary>
</indexterm>Show Statistics</title>
626 seq_no: <Integer. Optional. User generated, and is passed back in the YAML error block></screen>
<para>The seq_no field does not appear in the show output.</para>
<title><indexterm>
<primary>LNet</primary>
<secondary>router yaml syntax</secondary>
</indexterm>Route Configuration</title>
636 - net: <network. Ex: tcp or o2ib>
637 gateway: <nid of the gateway in the form <ip>@<net>: Ex: 192.168.29.1@tcp>
638 hop: <an integer between 1 and 255. Optional>
detail: <This is only applicable for show commands. 1 - output detailed info. 0 - basic output>
640 seq_no: <integer. Optional. User generated, and is passed back in the YAML error block></screen>
<para>Neither the seq_no field nor the detail field appears in the show output.
</para>
646 <section xml:id="lnet_module_params">
647 <title><indexterm><primary>LNet</primary></indexterm>
648 Overview of LNet Module Parameters</title>
<para>LNet kernel module (lnet) parameters specify how LNet is to be
configured to work with Lustre, including which NICs will be
configured to work with Lustre and the routing to be used with
Lustre.</para>
653 <para>Parameters for LNet can be specified in the
654 <literal>/etc/modprobe.d/lustre.conf</literal> file. In some cases
655 the parameters may have been stored in
656 <literal>/etc/modprobe.conf</literal>, but this has been deprecated
657 since before RHEL5 and SLES10, and having a separate
658 <literal>/etc/modprobe.d/lustre.conf</literal> file simplifies
659 administration and distribution of the Lustre networking
660 configuration. This file contains one or more entries with the
662 <screen>options lnet <replaceable>parameter</replaceable>=<replaceable>value</replaceable></screen>
663 <para>To specify the network interfaces that are to be used for
664 Lustre, set either the <literal>networks</literal> parameter or the
665 <literal>ip2nets</literal> parameter (only one of these parameters can
666 be used at a time):</para>
669 <para><literal>networks</literal> - Specifies the networks to be used.
673 <para><literal>ip2nets</literal> - Lists globally-available
674 networks, each with a range of IP addresses. LNet then identifies
675 locally-available networks through address list-matching
679 <para>See <xref linkend="lnet_module_network_params"/> and
680 <xref linkend="lnet_ip2nets"/> for more details.</para>
681 <para>To set up routing between networks, use:</para>
684 <para><literal>routes</literal> - Lists networks and the NIDs of
685 routers that forward to them.</para>
688 <para>See <xref linkend="lnet_module_routes"/> for more details.</para>
689 <para>A <literal>router</literal> checker can be configured to enable
690 Lustre nodes to detect router health status, avoid routers that appear
691 dead, and reuse those that restore service after failures. See <xref
692 linkend="lnet_router_checker"/> for more details.</para>
693 <para>For a complete reference to the LNet module parameters, see
694 <emphasis><xref linkend="configurationfilesmoduleparameters"/>LNet
695 Options</emphasis>.</para>
697 <para>We recommend that you use 'dotted-quad' notation for
698 IP addresses rather than host names to make it easier to read debug
699 logs and debug configurations with multiple interfaces.</para>
702 <title><indexterm><primary>LNet</primary><secondary>using
703 NID</secondary></indexterm>Using a Lustre Network Identifier (NID)
704 to Identify a Node</title>
705 <para>A Lustre network identifier (NID) is used to uniquely identify
706 a Lustre network endpoint by node ID and network type. The format of
708 <screen><replaceable>network_id</replaceable>@<replaceable>network_type</replaceable></screen>
709 <para>Examples are:</para>
710 <screen>10.67.73.200@tcp0
711 10.67.75.100@o2ib</screen>
712 <para>The first entry above identifies a TCP/IP node, while the
713 second entry identifies an InfiniBand node.</para>
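<para>Because a NID is simply two fields joined by <literal>@</literal>,
it can be split with ordinary shell parameter expansion. The following bash
sketch is illustrative only; the function names are hypothetical and the
validity check is deliberately loose (it only requires non-empty text on both
sides of the <literal>@</literal>).</para>
<screen># Print the part of a NID before the "@" (for example the IP address).
nid_address() {
    printf '%s\n' "${1%@*}"
}

# Print the part of a NID after the "@" (for example tcp0 or o2ib).
nid_network() {
    printf '%s\n' "${1##*@}"
}

# Succeed only if the argument has the form something@something.
nid_valid() {
    case $1 in
        ?*@?*) return 0 ;;
        *) return 1 ;;
    esac
}</screen>
<para>For example, <literal>nid_network 10.67.75.100@o2ib</literal> prints
<literal>o2ib</literal>, and <literal>nid_address 10.67.73.200@tcp0</literal>
prints <literal>10.67.73.200</literal>.</para>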
714 <para>When a mount command is run on a client, the client uses the
715 NID of the MDS to retrieve configuration information. If an MDS has
716 more than one NID, the client should use the appropriate NID for its
717 local network.</para>
<para>To determine the appropriate NID to specify in
the mount command, use the <literal>lctl</literal> command. To
display MDS NIDs, run on the MDS:</para>
721 <screen>lctl list_nids</screen>
722 <para>To determine if a client can reach the MDS using a particular NID,
723 run on the client:</para>
724 <screen>lctl which_nid <replaceable>MDS_NID</replaceable></screen>
727 <section xml:id="lnet_module_network_params">
728 <title><indexterm><primary>LNet</primary>
729 <secondary>module parameters</secondary>
730 </indexterm>Setting the LNet Module networks Parameter</title>
731 <para>If a node has more than one network interface, you'll
732 typically want to dedicate a specific interface to Lustre. You can do
733 this by including an entry in the <literal>lustre.conf</literal> file
734 on the node that sets the LNet module <literal>networks</literal>
736 <screen>options lnet networks=<replaceable>comma-separated list of
737 networks</replaceable></screen>
738 <para>This example specifies that a Lustre node will use a TCP/IP
739 interface and an InfiniBand interface:</para>
740 <screen>options lnet networks=tcp0(eth0),o2ib(ib0)</screen>
741 <para>This example specifies that the Lustre node will use the TCP/IP
742 interface <literal>eth1</literal>:</para>
743 <screen>options lnet networks=tcp0(eth1)</screen>
744 <para>Depending on the network design, it may be necessary to specify
745 explicit interfaces. To explicitly specify that interface
746 <literal>eth2</literal> be used for network <literal>tcp0</literal>
and <literal>eth3</literal> be used for <literal>tcp1</literal>, use
749 <screen>options lnet networks=tcp0(eth2),tcp1(eth3)</screen>
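<para>When the same entry has to be produced for many nodes, the
<literal>options</literal> line can be generated rather than typed by hand.
The bash sketch below is one possible approach; the function name and its
<literal>net:interface</literal> argument format are assumptions made for
this example and are not a Lustre convention.</para>
<screen># Build an "options lnet networks=..." line.
# Each argument is either "net:interface" or a bare net name;
# a bare name is emitted without parentheses.
networks_line() {
    local list="" pair net entry
    for pair in "$@"; do
        net=${pair%%:*}
        case $pair in
            *:*) entry="$net(${pair#*:})" ;;
            *)   entry=$net ;;
        esac
        list+="${list:+,}$entry"
    done
    printf 'options lnet networks=%s\n' "$list"
}</screen>
<para>For example, <literal>networks_line tcp0:eth2 tcp1:eth3</literal>
prints the entry shown above, and the output can be written to the
<literal>lustre.conf</literal> file on the node.</para>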
750 <para>When more than one interface is available during the network
751 setup, Lustre chooses the best route based on the hop count. Once the
752 network connection is established, Lustre expects the network to stay
753 connected. In a Lustre network, connections do not fail over to
754 another interface, even if multiple interfaces are available on the
757 <para>LNet lines in <literal>lustre.conf</literal> are only used by
758 the local node to determine what to call its interfaces. They are
759 not used for routing decisions.</para>
<title><indexterm><primary>configuring</primary>
<secondary>multihome</secondary></indexterm>Multihome Server Example
</title>
<para>If a server with multiple IP addresses (multihome server) is
connected to a Lustre network, certain configuration settings are
required. An example illustrating these settings consists of a
network with the following nodes:</para>
771 <para> Server svr1 with three TCP NICs (<literal>eth0</literal>,
772 <literal>eth1</literal>, and <literal>eth2</literal>) and an
773 InfiniBand NIC.</para>
776 <para> Server svr2 with three TCP NICs (<literal>eth0</literal>,
777 <literal>eth1</literal>, and <literal>eth2</literal>) and an
778 InfiniBand NIC. Interface eth2 will not be used for Lustre
782 <para> TCP clients, each with a single TCP interface.</para>
<para> InfiniBand clients, each with a single InfiniBand
interface and a TCP/IP interface for administration.</para>
789 <para>To set the <literal>networks</literal> option for this example:
793 <para> On each server, <literal>svr1</literal> and
794 <literal>svr2</literal>, include the following line in the
795 <literal>lustre.conf</literal> file:</para>
798 <screen>options lnet networks=tcp0(eth0),tcp1(eth1),o2ib</screen>
801 <para> For TCP-only clients, the first available non-loopback IP
802 interface is used for <literal>tcp0</literal>. Thus, TCP clients
803 with only one interface do not need to have options defined in
804 the <literal>lustre.conf</literal> file.</para>
807 <para> On the InfiniBand clients, include the following line in
808 the <literal>lustre.conf</literal> file:</para>
811 <screen>options lnet networks=o2ib</screen>
813 <para>By default, Lustre ignores the loopback
814 (<literal>lo0</literal>) interface. Lustre does not ignore IP
815 addresses aliased to the loopback. If you alias IP addresses to
816 the loopback interface, you must specify all Lustre networks using
817 the LNet networks parameter.</para>
820 <para>If the server has multiple interfaces on the same subnet,
821 the Linux kernel will send all traffic using the first configured
822 interface. This is a limitation of Linux, not Lustre. In this
823 case, network interface bonding should be used. For more
824 information about network interface bonding, see <xref
825 linkend="settingupbonding"/>.</para>
829 <section xml:id="lnet_ip2nets">
830 <title><indexterm><primary>LNet</primary>
831 <secondary>ip2nets</secondary>
832 </indexterm>Setting the LNet Module ip2nets Parameter</title>
833 <para>The <literal>ip2nets</literal> option is typically used when a
834 single, universal <literal>lustre.conf</literal> file is run on all
835 servers and clients. Each node identifies the locally available
836 networks based on the listed IP address patterns that match the
837 node's local IP addresses.</para>
838 <para>Note that the IP address patterns listed in the
839 <literal>ip2nets</literal> option are <emphasis>only</emphasis> used
840 to identify the networks that an individual node should instantiate.
841 They are <emphasis>not</emphasis> used by LNet for any other
842 communications purpose.</para>
843 <para>For the example below, the nodes in the network have these IP
847 <para> Server svr1: <literal>eth0</literal> IP address
<literal>192.168.0.2</literal>, IP over InfiniBand
849 (<literal>o2ib</literal>) address
850 <literal>132.6.1.2</literal>.</para>
853 <para> Server svr2: <literal>eth0</literal> IP address
<literal>192.168.0.4</literal>, IP over InfiniBand
855 (<literal>o2ib</literal>) address
856 <literal>132.6.1.4</literal>.</para>
<para> TCP clients have IP addresses
<literal>192.168.0.5-255</literal>.</para>
<para> InfiniBand clients have IP over InfiniBand
864 (<literal>o2ib</literal>) addresses <literal>132.6.[2-3].2, .4,
865 .6, .8</literal>.</para>
868 <para>The following entry is placed in the
869 <literal>lustre.conf</literal> file on each server and client:</para>
<screen>options lnet 'ip2nets="tcp0(eth0) 192.168.0.[2,4]; \
tcp0 192.168.0.*; o2ib0 132.6.[1-3].[2-8/2]"'</screen>
872 <para>Each entry in <literal>ip2nets</literal> is referred to as a
873 'rule'.</para>
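<para>As a rough illustration, and assuming only the addresses listed
above, the rules in this entry should resolve to the same networks as the
following per-node <literal>networks</literal> settings. These lines are a
sketch for comparison only; they are not part of the
<literal>ip2nets</literal> configuration and would not be used together
with it:</para>
<screen># svr1 and svr2: the first rule gives tcp0 on eth0, the third rule gives o2ib0
options lnet networks="tcp0(eth0),o2ib0"
# TCP clients at 192.168.0.5-255: only the second rule matches
options lnet networks="tcp0"
# InfiniBand clients at 132.6.[2-3].[2,4,6,8]: only the third rule matches
options lnet networks="o2ib0"</screen>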
874 <para>The order of LNet entries is important when configuring servers.
875 If a server node can be reached using more than one network, the first
876 network specified in <literal>lustre.conf</literal> will be
878 <para>Because <literal>svr1</literal> and <literal>svr2</literal>
879 match the first rule, LNet uses <literal>eth0</literal> for
880 <literal>tcp0</literal> on those machines. (Although
881 <literal>svr1</literal> and <literal>svr2</literal> also match the
882 second rule, the first matching rule for a particular network is
<para>The <literal>[2-8/2]</literal> format indicates a range of 2-8
stepped by 2; that is, 2, 4, 6, 8. Thus, a client at
<literal>132.6.3.5</literal> will not find a matching o2ib
888 <note condition='l2A'>
<para>Multi-rail deprecates the kernel parsing of
<literal>ip2nets</literal>. Instead, <literal>ip2nets</literal> patterns
are matched in user space and translated into network interfaces to be
added to the system.</para>
892 <para>The first interface that matches the IP pattern will be used when
893 adding a network interface.</para>
<para>If an interface is explicitly specified as well as a pattern, the
interface matched using the IP pattern is checked against the
explicitly defined interface.</para>
<para>For example, suppose the rule is
<literal>tcp(eth0) 192.168.*.3</literal> and the system has
<literal>eth0 == 192.158.19.3</literal> and
<literal>eth1 == 192.168.3.3</literal>. The configuration will
fail, because the pattern matches <literal>eth1</literal>, which
contradicts the interface specified.
902 <para>A clear warning will be displayed if inconsistent configuration is
<para>Use the following command to configure ip2nets:</para>
<screen>lnetctl import &lt; ip2nets.yaml</screen>
906 <para>For example:</para>
919 0: 192.168.*.*</screen>
922 <section xml:id="lnet_module_routes">
923 <title><indexterm><primary>LNet</primary>
924 <secondary>routes</secondary></indexterm>Setting the LNet Module routes
<para>The LNet module <literal>routes</literal> parameter is used to
identify routers in a Lustre configuration. This parameter is set in
<literal>modprobe.conf</literal> on each Lustre node.</para>
929 <para>Routes are typically set to connect to segregated subnetworks
930 or to cross connect two different types of networks such as tcp and
<para>The LNet <literal>routes</literal> parameter specifies a
semicolon-separated list of route definitions. Each route is defined as a
network, followed by a list of router NIDs:</para>
935 <screen>routes=<replaceable>net_type router_NID(s)</replaceable></screen>
936 <para>This example specifies bi-directional routing in which TCP
937 clients can reach Lustre resources on the IB networks and IB servers
938 can access the TCP networks:</para>
<screen>options lnet 'ip2nets="tcp0 192.168.0.*; \
o2ib0(ib0) 132.6.1.[1-128]"' 'routes="tcp0 132.6.1.[1-8]@o2ib0; \
o2ib0 192.168.0.[1-8]@tcp0"'</screen>
942 <para>All LNet routers that bridge two networks are equivalent. They
943 are not configured as primary or secondary, and the load is balanced
944 across all available routers.</para>
945 <para>The number of LNet routers is not limited. Enough routers should
946 be used to handle the required file serving bandwidth plus a 25
947 percent margin for headroom.</para>
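<para>As a rough sizing sketch, the 25 percent rule can be worked through
as follows. Both input figures are purely hypothetical and are used only
to show the arithmetic; actual per-router throughput should be measured on
the hardware in use:</para>
<screen>required file serving bandwidth = 40 GB/s   (hypothetical)
with 25 percent headroom        = 40 GB/s x 1.25 = 50 GB/s
throughput of one router        = 6 GB/s    (hypothetical)
routers needed                  = 50 / 6 = 8.33, rounded up to 9</screen>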
949 <title><indexterm><primary>LNet</primary><secondary>routing
950 example</secondary></indexterm>Routing Example</title>
<para>On the clients, place the following entry in the
<literal>lustre.conf</literal> file:</para>
<screen>options lnet networks="tcp" routes="o2ib0 192.168.0.[1-8]@tcp0"</screen>
954 <para>On the router nodes, use:</para>
<screen>options lnet networks="tcp,o2ib" forwarding=enabled</screen>
956 <para>On the MDS, use the reverse as shown below:</para>
<screen>options lnet networks="o2ib0" routes="tcp0 132.6.1.[1-8]@o2ib0"</screen>
958 <para>To start the routers, run:</para>
<screen>modprobe lnet
lctl network configure</screen>
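<para>Once LNet is up, a quick sanity check might look like the following
sketch. It assumes the <literal>lctl</literal> subcommands
<literal>list_nids</literal>, <literal>route_list</literal>, and
<literal>ping</literal> are available in the installed release;
<replaceable>server_NID</replaceable> is a placeholder for the NID of a
node on the far side of the routers, not a value taken from this
example:</para>
<screen>lctl list_nids
lctl route_list
lctl ping <replaceable>server_NID</replaceable></screen>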
963 <section xml:id="lnet_config_testing">
964 <title><indexterm><primary>LNet</primary>
965 <secondary>testing</secondary></indexterm>Testing the LNet
966 Configuration</title>
967 <para>After configuring Lustre Networking, it is highly recommended
968 that you test your LNet configuration using the LNet Self-Test
969 provided with the Lustre software. For more information about using
970 LNet Self-Test, see <xref linkend="lnetselftest"/>.</para>
972 <section xml:id="lnet_router_checker">
973 <title><indexterm><primary>LNet</primary>
974 <secondary>route checker</secondary>
975 </indexterm>Configuring the Router Checker</title>
976 <para>In a Lustre configuration in which different types of networks,
977 such as a TCP/IP network and an Infiniband network, are connected by
978 routers, a router checker can be run on the clients and servers in the
979 routed configuration to monitor the status of the routers. In a
980 multi-hop routing configuration, router checkers can be configured on
981 routers to monitor the health of their next-hop routers.</para>
982 <para>A router checker is configured by setting LNet parameters in
983 <literal>lustre.conf</literal> by including an entry in this
986 <replaceable>router_checker_parameter</replaceable>=<replaceable>value</replaceable></screen>
987 <para>The router checker parameters are:</para>
990 <para><literal>live_router_check_interval</literal> - Specifies a
991 time interval in seconds after which the router checker will ping
992 the live routers. The default value is 0, meaning no checking is
993 done. To set the value to 60, enter:</para>
994 <screen>options lnet live_router_check_interval=60</screen>
997 <para><literal>dead_router_check_interval</literal> - Specifies a
998 time interval in seconds after which the router checker will check
999 for dead routers. The default value is 0, meaning no checking is
1000 done. To set the value to 60, enter:</para>
1001 <screen>options lnet dead_router_check_interval=60</screen>
<para><literal>auto_down</literal> - Enables/disables (1/0) the automatic marking of
1005 router state as up or down. The default value is 1. To disable
1006 router marking, enter:</para>
1007 <screen>options lnet auto_down=0</screen>
<para><literal>router_ping_timeout</literal> - Specifies a
timeout for the router checker when it checks live or dead
routers. The router checker sends a ping message to each dead or
live router once every <literal>dead_router_check_interval</literal> or
<literal>live_router_check_interval</literal>, respectively. The default
value is 50 seconds. To set the value to 60, enter:</para>
1016 <screen>options lnet router_ping_timeout=60</screen>
1018 <para>The <literal>router_ping_timeout</literal> is consistent
1019 with the default LND timeouts. You may have to increase it on very
1020 large clusters if the LND timeout is also increased. For larger
1021 clusters, we suggest increasing the check interval.</para>
<para><literal>check_routers_before_use</literal> - Specifies
that routers are to be checked before use. The default is off.
If this parameter is set to on, the
<literal>dead_router_check_interval</literal> parameter must be given a
positive integer value.</para>
1030 <screen>options lnet check_routers_before_use=on</screen>
1033 <para>The router checker obtains the following information from each router:
1037 <para> Time the router was disabled</para>
1040 <para> Elapsed disable time</para>
<para>If the router checker does not get a reply message from the
router within <literal>router_ping_timeout</literal> seconds, it considers the router to
1046 <para>If a router is marked 'up' and responds to a ping, the
1047 timeout is reset.</para>
1048 <para>If 100 packets have been sent successfully through a router, the
1049 sent-packets counter for that router will have a value of 100.</para>
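<para>As a minimal sketch, the individual router checker settings
described in this section can be combined in a single
<literal>lustre.conf</literal> entry. The values simply reuse the
illustrative value of 60 from the examples in this section; they are not
tuning recommendations:</para>
<screen>options lnet live_router_check_interval=60 dead_router_check_interval=60 router_ping_timeout=60</screen>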
1051 <section xml:id="lnet_best_practices">
1052 <title><indexterm><primary>LNet</primary>
1053 <secondary>best practice</secondary>
1054 </indexterm>Best Practices for LNet Options</title>
1055 <para>For the <literal>networks</literal>, <literal>ip2nets</literal>,
1056 and <literal>routes</literal> options, follow these best practices to
1057 avoid configuration errors.</para>
1058 <section xml:id="lnet_best_practices.escape_comments">
1059 <title><indexterm><primary>LNet</primary>
1060 <secondary>escaping commas with quotes</secondary>
1061 </indexterm>Escaping commas with quotes</title>
1062 <para>Depending on the Linux distribution, commas may need to be
1063 escaped using single or double quotes. In the extreme case, the
1064 <literal>options</literal> entry would look like this:</para>
<para><screen>options lnet 'networks="tcp0,elan0"' 'routes="tcp [2,10]@elan0"'</screen></para>
1068 <para>Added quotes may confuse some distributions. Messages such as
1069 the following may indicate an issue related to added quotes:</para>
1070 <para><screen>lnet: Unknown parameter 'networks'</screen></para>
1071 <para>A <literal>'Refusing connection - no matching
1072 NID'</literal> message generally points to an error in the LNet
1073 module configuration.</para>
1075 <section xml:id="lnet_best_practices.comments">
1076 <title><indexterm><primary>LNet</primary>
1077 <secondary>comments</secondary></indexterm>Including comments</title>
1078 <para><emphasis>Place the semicolon terminating a comment
1079 immediately after the comment.</emphasis> LNet silently ignores
1080 everything between the <literal>#</literal> character at the
1081 beginning of the comment and the next semicolon.</para>
1082 <para>In this <emphasis>incorrect</emphasis> example, LNet silently
1083 ignores <literal>pt11 192.168.0.[92,96]</literal>, resulting in
1084 these nodes not being properly initialized. No error message is
<screen>options lnet ip2nets="pt10 192.168.0.[89,93]; \
# comment with semicolon BEFORE comment \
pt11 192.168.0.[92,96];</screen>
1088 <para>This <emphasis role="italic">correct</emphasis> example shows
1089 the required syntax: </para>
<para><screen>options lnet ip2nets="pt10 192.168.0.[89,93] \
# comment with semicolon AFTER comment; \
pt11 192.168.0.[92,96] # comment</screen></para>
<para><emphasis role="italic">Do not add an excessive number of
comments.</emphasis> The Linux kernel limits the length of character
strings used in module options (usually to 1 KB, but this may differ
between vendor kernels). If you exceed this limit, errors result and
the specified configuration may not be processed correctly.</para>