1 <?xml version='1.0' encoding='UTF-8'?>
2 <!-- This document was created with Syntext Serna Free. --><chapter xmlns="http://docbook.org/ns/docbook" xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US" xml:id="configuringlnet">
3 <title xml:id="configuringlnet.title">Configuring Lustre Networking (LNet)</title>
4 <para>This chapter describes how to configure Lustre Networking (LNet). It
5 includes the following sections:</para>
8 <para><xref linkend="lnet_config"/>
12 <para><xref linkend="lnet_module_params"/>
16 <para><xref linkend="lnet_module_network_params"/>
20 <para><xref linkend="lnet_ip2nets"/>
24 <para><xref linkend="lnet_module_routes"/>
28 <para><xref linkend="lnet_config_testing"/>
32 <para><xref linkend="lnet_router_checker"/>
36 <para><xref linkend="lnet_best_practices"/>
41 <para>Configuring LNet is optional.</para>
<para> LNet will use the first TCP/IP interface it discovers on a
system (<literal>eth0</literal>) if it is loaded using the
<literal>lctl network up</literal> command. If this network configuration is
sufficient, you do not need to configure LNet. LNet configuration is
required if you are using InfiniBand or multiple Ethernet
48 <para condition='l27'>The <literal>lnetctl</literal> utility can be used
49 to initialize LNet without bringing up any network interfaces. Network
50 interfaces can be added after configuring LNet via
51 <literal>lnetctl</literal>. <literal>lnetctl</literal> can also be used to
manage an operational LNet. However, if LNet was not initialized by
53 <literal>lnetctl</literal> then <literal>lnetctl lnet configure</literal>
54 must be invoked before <literal>lnetctl</literal> can be used to manage
<para condition='l27'>DLC also introduces a C-API to enable
configuring LNet programmatically. See <xref
linkend="lnetconfigurationapi"/>.</para>
60 <section xml:id="lnet_config" condition='l27'>
62 <primary>LNet</primary>
63 <secondary>Configuring LNet</secondary>
64 </indexterm>Configuring LNet via <literal>lnetctl</literal></title>
65 <para>The <literal>lnetctl</literal> utility can be used to initialize
66 and configure the LNet kernel module after it has been loaded via
<literal>modprobe</literal>. In general, the <literal>lnetctl</literal> command format is as
69 <screen>lnetctl cmd subcmd [options]</screen>
70 <para>The following configuration items are managed by the tool:</para>
74 <para>Configuring/unconfiguring LNet</para>
77 <para>Adding/removing/showing Networks</para>
80 <para>Adding/removing/showing Routes</para>
83 <para>Enabling/Disabling routing</para>
86 <para>Configuring Router Buffer Pools</para>
90 <section xml:id="lnet_config.cli_overview">
92 <primary>LNet</primary>
93 <secondary>cli</secondary>
94 </indexterm>Configuring LNet</title>
<para>After LNet has been loaded via <literal>modprobe</literal>, the
<literal>lnetctl</literal> utility can be used to configure LNet
without bringing up the networks specified in the module
parameters. It can also be used to configure the network interfaces
specified in the module parameters by providing the
<literal>--all</literal> option.</para>
101 <screen>lnetctl lnet configure [--all]
102 # --all: load NI configuration from module parameters</screen>
103 <para>The <literal>lnetctl</literal> utility can also be used to
104 unconfigure LNet.</para>
105 <screen>lnetctl lnet unconfigure</screen>
107 <section xml:id="lnet_config.show_global_settings">
109 <primary>LNet</primary>
110 <secondary>cli</secondary>
111 </indexterm>Displaying Global Settings</title>
112 <para>The active LNet global settings can be displayed using the
113 <literal>lnetctl</literal> command shown below:</para>
114 <screen>lnetctl global show</screen>
115 <para>For example:</para>
116 <screen># lnetctl global show
121 drop_asym_route: 0</screen>
123 <section xml:id="lnet_config.lnetaddshowdelete">
124 <title><indexterm><primary>LNet</primary>
125 <secondary>cli</secondary></indexterm>Adding, Deleting and Showing
127 <para>Networks can be added, deleted, or shown after the LNet kernel
128 module is loaded.</para>
129 <para>The <emphasis role="bold"><literal>lnetctl net add</literal>
130 </emphasis> command is used to add networks:</para>
131 <screen>lnetctl net add: add a network
132 --net: net name (ex tcp0)
133 --if: physical interface (ex eth0)
134 --peer_timeout: time to wait before declaring a peer dead
135 --peer_credits: defines the max number of inflight messages
136 --peer_buffer_credits: the number of buffer credits per peer
137 --credits: Network Interface credits
138 --cpts: CPU Partitions configured net uses
139 --help: display this help text
142 lnetctl net add --net tcp2 --if eth0
143 --peer_timeout 180 --peer_credits 8</screen>
144 <note condition='l2A'><para>With the addition of Software based Multi-Rail
145 in Lustre 2.10, the following should be noted:</para>
147 <listitem><para>--net: no longer needs to be unique since multiple
148 interfaces can be added to the same network.</para></listitem>
149 <listitem><para>--if: The same interface per network can be added
150 only once, however, more than one interface can now be specified
151 (separated by a comma) for a node. For example: eth0,eth1,eth2.
153 </itemizedlist></para>
<para>For examples of adding multiple interfaces via
155 <literal>lnetctl net add</literal> and/or YAML, please see
156 <xref linkend="dbdoclet.mrconfiguring" />
159 <para>Networks can be deleted with the
160 <emphasis role="bold"><literal>lnetctl net del</literal></emphasis>
162 <screen>net del: delete a network
163 --net: net name (ex tcp0)
--if: physical interface (e.g. eth0)
167 lnetctl net del --net tcp2</screen>
168 <note condition='l2A'><para>In a Software Multi-Rail configuration,
169 specifying only the <literal>--net</literal> argument will delete the
entire network and all interfaces under it. To delete a specific
interface, use the <literal>--if</literal> switch together with
<literal>--net</literal>.
174 <para>All or a subset of the configured networks can be shown with the
175 <emphasis role="bold"><literal>lnetctl net show</literal></emphasis>
176 command. The output can be non-verbose or verbose.</para>
177 <screen>net show: show networks
178 --net: net name (ex tcp0) to filter on
179 --verbose: display detailed output per network
183 lnetctl net show --verbose
184 lnetctl net show --net tcp2 --verbose</screen>
<para>Below are examples of non-detailed and detailed output from
<literal>lnetctl net show</literal>.</para>
187 <screen># non-detailed show
188 > lnetctl net show --net tcp2
190 - nid: 192.168.205.130@tcp2
196 > lnetctl net show --net tcp2 --verbose
198 - nid: 192.168.205.130@tcp2
205 peer_buffer_credits: 0
206 credits: 256</screen>
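<para>As a minimal sketch, the commands described above can be combined
into the following sequence, which initializes LNet, adds a network,
inspects it, and removes it again. The network name
<literal>tcp2</literal> and the interface <literal>eth0</literal> are
placeholders taken from the examples in this section; substitute the
values for your node. The commands are assumed to be run as root on a
node with the Lustre modules installed.</para>
<screen># load the LNet module and initialize it without bringing up networks
modprobe lnet
lnetctl lnet configure

# add a network on eth0 (placeholder interface name)
lnetctl net add --net tcp2 --if eth0 --peer_timeout 180 --peer_credits 8

# confirm the network and its tunables
lnetctl net show --net tcp2 --verbose

# remove the network and unconfigure LNet when no longer needed
lnetctl net del --net tcp2
lnetctl lnet unconfigure</screen>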
208 <section condition='l2A' xml:id="lnet_config.manual_addshowdelete">
210 <primary>LNet</primary>
211 <secondary>cli</secondary>
212 </indexterm>Manual Adding, Deleting and Showing Peers</title>
213 <para>The <emphasis role="bold"><literal>lnetctl peer add</literal>
214 </emphasis> command is used to manually add a remote peer to a software
215 multi-rail configuration. For the dynamic peer discovery capability
216 introduced in Lustre Release 2.11.0, please see
217 <xref linkend="lnet_config.dynamic_discovery" />.</para>
<para>When configuring peers, use the <literal>--prim_nid</literal>
option to specify the key or primary NID of the peer node. Then
follow that with the <literal>--nid</literal> option to specify a
set of comma-separated NIDs.</para>
222 <screen>peer add: add a peer
223 --prim_nid: primary NID of the peer
224 --nid: comma separated list of peer nids (e.g. 10.1.1.2@tcp0)
--non_mr: if specified this interface is created as a non multi-rail
226 capable peer. Only one NID can be specified in this case.</screen>
227 <para>For example:</para>
229 lnetctl peer add --prim_nid 10.10.10.2@tcp --nid 10.10.3.3@tcp1,10.4.4.5@tcp2
<para>The <literal>--prim_nid</literal> (primary NID for the peer
node) can go unspecified. In this case, the first listed NID in the
<literal>--nid</literal> option becomes the primary NID of the peer.
lnetctl peer add --nid 10.10.10.2@tcp,10.10.3.3@tcp1,10.4.4.5@tcp2</screen>
237 <para>YAML can also be used to configure peers:</para>
239 - primary nid: <key or primary nid>
244 - nid: <nid n></screen>
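<para>For illustration, a filled-in version of the template above might
look as follows. The NIDs are the placeholder addresses used in the
<literal>lnetctl peer add</literal> example. Listing the primary NID
again under <literal>peer ni</literal> mirrors the layout of the
<literal>lnetctl peer show</literal> output and is an assumption of this
sketch rather than a stated requirement; verify the accepted layout
against your Lustre release before relying on it.</para>
<screen>peer:
    - primary nid: 10.10.10.2@tcp
      Multi-Rail: True
      peer ni:
        # assumption: primary NID repeated here, as in "peer show" output
        - nid: 10.10.10.2@tcp
        - nid: 10.10.3.3@tcp1
        - nid: 10.4.4.5@tcp2</screen>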
245 <para>As with all other commands, the result of the
246 <literal>lnetctl peer show</literal> command can be used to gather
247 information to aid in configuring or deleting a peer:</para>
248 <screen>lnetctl peer show -v</screen>
249 <para>Example output from the <literal>lnetctl peer show</literal>
252 - primary nid: 192.168.122.218@tcp
255 - nid: 192.168.122.218@tcp
258 available_tx_credits: 8
259 available_rtr_credits: 8
266 - nid: 192.168.122.78@tcp
269 available_tx_credits: 8
270 available_rtr_credits: 8
277 - nid: 192.168.122.96@tcp
280 available_tx_credits: 8
281 available_rtr_credits: 8
288 <para>Use the following <literal>lnetctl</literal> command to delete a
290 <screen>peer del: delete a peer
291 --prim_nid: Primary NID of the peer
292 --nid: comma separated list of peer nids (e.g. 10.1.1.2@tcp0)</screen>
293 <para><literal>prim_nid</literal> should always be specified. The
294 <literal>prim_nid</literal> identifies the peer. If the
295 <literal>prim_nid</literal> is the only one specified, then the
296 entire peer is deleted.</para>
297 <para>Example of deleting a single nid of a peer (10.10.10.3@tcp):
299 <screen>lnetctl peer del --prim_nid 10.10.10.2@tcp --nid 10.10.10.3@tcp</screen>
300 <para>Example of deleting the entire peer:</para>
301 <screen>lnetctl peer del --prim_nid 10.10.10.2@tcp</screen>
303 <section condition='l2B' xml:id="lnet_config.dynamic_discovery">
305 <primary>LNet</primary>
306 <secondary>cli</secondary>
307 <tertiary>dynamic discovery</tertiary>
308 </indexterm>Dynamic Peer Discovery</title>
309 <section xml:id="lnet_config.dynamic_discovery.overview">
310 <title>Overview</title>
311 <para>Dynamic Discovery (DD) is a feature that allows nodes to
312 dynamically discover a peer's interfaces without having to explicitly
configure them. This is very useful for Multi-Rail (MR)
configurations. In large clusters, there could be hundreds of nodes,
and having to configure MR peers on each node becomes error-prone.
316 Dynamic Discovery is enabled by default and uses a new protocol based
317 on LNet pings to discover the interfaces of the remote peers on first
320 <section xml:id="lnet_config.dynamic_discovery.protocol">
321 <title>Protocol</title>
322 <para>When LNet on a node is requested to send a message to a peer it
323 first attempts to ping the peer. The reply to the ping contains the
324 peer's NIDs as well as a feature bit outlining what the peer supports.
325 Dynamic Discovery adds a Multi-Rail feature bit. If the peer is
326 Multi-Rail capable, it sets the MR bit in the ping reply. When the
327 node receives the reply it checks the MR bit, and if it is set it then
328 pushes its own list of NIDs to the peer using a new PUT message,
329 referred to as a "push ping". After this brief protocol, both the peer
330 and the node will have each other's list of interfaces. The MR
331 algorithm can then proceed to use the list of interfaces of the
332 corresponding peer.</para>
333 <para>If the peer is not MR capable, it will not set the MR feature
334 bit in the ping reply. The node will understand that the peer is
335 not MR capable and will only use the interface provided by upper
336 layers for sending messages.</para>
338 <section xml:id="lnet_config.dynamic_discovery.userspace_config">
339 <title>Dynamic Discovery and User-space Configuration</title>
340 <para>It is possible to configure the peer manually while Dynamic
341 Discovery is running. Manual peer configuration always takes precedence
342 over Dynamic Discovery. If there is a discrepancy between the manual
343 configuration and the dynamically discovered information, a warning is
346 <section xml:id="lnet_config.dynamic_discovery.config">
347 <title>Configuration</title>
<para>Dynamic Discovery requires very little configuration: it can
only be turned on or off. To turn the feature on or off, the
following command is used:</para>
351 <screen>lnetctl set discovery [0 | 1]</screen>
352 <para>To check the current <literal>discovery</literal> setting, the
353 <literal>lnetctl global show</literal> command can be used as shown in
354 <xref linkend="lnet_config.show_global_settings"/>.</para>
356 <section xml:id="lnet_config.dynamic_discovery.ondemand">
357 <title>Initiating Dynamic Discovery on Demand</title>
358 <para>It is possible to initiate the Dynamic Discovery protocol on demand
359 without having to wait for a message to be sent to the peer. This can
360 be done with the following command:</para>
361 <screen>lnetctl discover <peer_nid> [<peer_nid> ...]</screen>
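<para>One possible workflow, sketched here with the placeholder NID
<literal>10.10.10.2@tcp</literal>, is to enable discovery, confirm the
global setting, trigger discovery for a peer, and then inspect what was
learned about it. Only commands described in this chapter are used; the
NID is hypothetical and must be replaced with a real peer NID.</para>
<screen># turn Dynamic Discovery on and confirm the global setting
lnetctl set discovery 1
lnetctl global show

# discover the peer without waiting for traffic (placeholder NID)
lnetctl discover 10.10.10.2@tcp

# inspect the interfaces recorded for known peers
lnetctl peer show -v</screen>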
366 <primary>LNet</primary>
367 <secondary>cli</secondary>
368 </indexterm>Adding, Deleting and Showing routes</title>
369 <para>A set of routes can be added to identify how LNet messages are
371 <screen>lnetctl route add: add a route
372 --net: net name (ex tcp0) LNet message is destined to.
This cannot be a local network.
374 --gateway: gateway node nid (ex 10.1.1.2@tcp) to route
all LNet messages destined for the identified
377 --hop: number of hops to final destination
378 (1 < hops < 255)
379 --priority: priority of route (0 - highest prio)
lnetctl route add --net tcp2 --gateway 192.168.205.130@tcp1 --hop 2 --priority 1</screen>
383 <para>Routes can be deleted via the following <literal>lnetctl</literal>
385 <screen>lnetctl route del: delete a route
386 --net: net name (ex tcp0)
387 --gateway: gateway nid (ex 10.1.1.2@tcp)
390 lnetctl route del --net tcp2 --gateway 192.168.205.130@tcp1</screen>
391 <para>Configured routes can be shown via the following
392 <literal>lnetctl</literal> command.</para>
393 <screen>lnetctl route show: show routes
394 --net: net name (ex tcp0) to filter on
395 --gateway: gateway nid (ex 10.1.1.2@tcp) to filter on
396 --hop: number of hops to final destination
397 (1 < hops < 255) to filter on
398 --priority: priority of route (0 - highest prio)
400 --verbose: display detailed output per route
407 lnetctl route show --verbose</screen>
<para>When showing routes, the <literal>--verbose</literal> option
outputs more detailed information. All show and error output is in
YAML format. Below are examples of both non-detailed and detailed
route show output.</para>
412 <screen>#Non-detailed output
416 gateway: 192.168.205.130@tcp1
419 > lnetctl route show --verbose
422 gateway: 192.168.205.130@tcp1
429 <primary>LNet</primary>
430 <secondary>cli</secondary>
431 </indexterm>Enabling and Disabling Routing</title>
432 <para>When an LNet node is configured as a router it will route LNet
433 messages not destined to itself. This feature can be enabled or
434 disabled as follows.</para>
435 <screen>lnetctl set routing [0 | 1]
436 # 0 - disable routing feature
437 # 1 - enable routing feature</screen>
441 <primary>LNet</primary>
442 <secondary>cli</secondary>
443 </indexterm>Showing routing information</title>
444 <para>When routing is enabled on a node, the tiny, small and large
445 routing buffers are allocated. See <xref
446 linkend="dbdoclet.50438272_73839"/> for more details on router
447 buffers. This information can be shown as follows:</para>
448 <screen>lnetctl routing show: show routing information
451 lnetctl routing show</screen>
452 <para>An example of the show output:</para>
453 <screen>> lnetctl routing show
475 <primary>LNet</primary>
476 <secondary>cli</secondary>
477 </indexterm>Configuring Routing Buffers</title>
<para>The configured routing buffer values specify the number of
buffers in each of the tiny, small and large groups.</para>
<para>It is often desirable to configure the tiny, small and large
routing buffers to values other than the defaults. These values
are global: when set, they are used by all configured CPU
partitions. If routing is enabled, the values take effect
immediately. If a larger number of buffers is specified,
buffers are allocated to satisfy the configuration change. If fewer
buffers are configured, the excess buffers are freed as they
become unused. If routing is not enabled, the values are not changed.
488 The buffer values are reset to default if routing is turned off and
490 <para>The <literal>lnetctl</literal> 'set' command can be
491 used to set these buffer values. A VALUE greater than 0
492 will set the number of buffers accordingly. A VALUE of 0
493 will reset the number of buffers to system defaults.</para>
<screen>set tiny_buffers: set tiny routing buffers
496 VALUE must be greater than or equal to 0
498 set small_buffers: set small routing buffers
499 VALUE must be greater than or equal to 0
501 set large_buffers: set large routing buffers
502 VALUE must be greater than or equal to 0</screen>
503 <para>Usage examples:</para>
504 <screen>> lnetctl set tiny_buffers 4096
505 > lnetctl set small_buffers 8192
506 > lnetctl set large_buffers 2048</screen>
507 <para>The buffers can be set back to the default values as follows:</para>
508 <screen>> lnetctl set tiny_buffers 0
509 > lnetctl set small_buffers 0
510 > lnetctl set large_buffers 0</screen>
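<para>Because the buffer values are not changed while routing is
disabled, the order of operations matters. The following sketch enables
routing first, then sets the buffer counts, and finally checks the
result. The buffer counts are the illustrative values from the usage
examples above, not tuning recommendations.</para>
<screen># enable routing so that buffer settings take effect
lnetctl set routing 1

# set the buffer counts (illustrative values)
lnetctl set tiny_buffers 4096
lnetctl set small_buffers 8192
lnetctl set large_buffers 2048

# confirm the routing state and buffer pools
lnetctl routing show</screen>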
512 <section condition='l2D' xml:id="lnet_config.asym_route">
514 <primary>LNet</primary>
515 <secondary>cli</secondary>
516 <tertiary>asymmetrical route</tertiary>
517 </indexterm>Asymmetrical Routes</title>
518 <section xml:id="lnet_config.asym_route.overview">
519 <title>Overview</title>
<para>A route is asymmetrical when a message from a remote peer
arrives through a router that this node does not itself use
to reach that remote peer.</para>
<para>Asymmetrical routes can be an issue when debugging the network, and
allowing them also opens the door to attacks where hostile clients
inject data to the servers.</para>
<para>LNet can therefore be configured to detect any message that
arrives over an asymmetrical route and drop it.</para>
529 <section xml:id="lnet_config.dynamic_discovery.configuration">
530 <title>Configuration</title>
531 <para>In order to switch asymmetric route detection on or off, the
532 following command is used:</para>
533 <screen>lnetctl set drop_asym_route [0 | 1]</screen>
534 <para>This command works on a per-node basis. This means each node in a
535 Lustre cluster can decide whether it accepts asymmetrical route
537 <para>To check the current <literal>drop_asym_route</literal> setting, the
538 <literal>lnetctl global show</literal> command can be used as shown in
539 <xref linkend="lnet_config.show_global_settings"/>.</para>
540 <para>By default, asymmetric route detection is off.</para>
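<para>As a short sketch, the check can be switched on for the local
node and then verified. The second command should report the new
<literal>drop_asym_route</literal> value among the global settings.</para>
<screen># drop messages that arrive over an asymmetrical route on this node
lnetctl set drop_asym_route 1

# verify the setting
lnetctl global show</screen>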
545 <primary>LNet</primary>
546 <secondary>cli</secondary>
547 </indexterm>Importing YAML Configuration File</title>
<para>Configuration can be described in YAML format and fed
into the <literal>lnetctl</literal> utility. The
<literal>lnetctl</literal> utility parses the YAML file and performs
the specified operation on all entities described therein. If no
operation is defined in the command as shown below, the default
operation is 'add'. The YAML syntax is described in a later
section.</para>
<screen>lnetctl import FILE.yaml
lnetctl import < FILE.yaml</screen>
<para>The <literal>lnetctl import</literal> command provides three
optional parameters to define the operation to be performed on the
configuration items described in the YAML file.</para>
559 <screen># if no options are given to the command the "add" command is assumed
561 lnetctl import --add FILE.yaml
562 lnetctl import --add < FILE.yaml
564 # to delete all items described in the YAML file
565 lnetctl import --del FILE.yaml
566 lnetctl import --del < FILE.yaml
568 # to show all items described in the YAML file
569 lnetctl import --show FILE.yaml
570 lnetctl import --show < FILE.yaml</screen>
574 <primary>LNet</primary>
575 <secondary>cli</secondary>
576 </indexterm>Exporting Configuration in YAML format</title>
<para>The <literal>lnetctl</literal> utility provides the
<literal>export</literal> command to dump the current LNet
configuration in YAML format.</para>
579 <screen>lnetctl export FILE.yaml
580 lnetctl export > FILE.yaml</screen>
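<para>Export and import can be paired to save a working configuration
and restore it later. The following is a sketch only; the file name
<literal>/tmp/lnet-saved.yaml</literal> is a hypothetical path chosen
for illustration.</para>
<screen># save the running configuration (hypothetical file name)
lnetctl export /tmp/lnet-saved.yaml

# later, after the lnet module has been reloaded:
lnetctl lnet configure
lnetctl import /tmp/lnet-saved.yaml

# check that the networks were restored
lnetctl net show</screen>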
584 <primary>LNet</primary>
585 <secondary>cli</secondary>
586 </indexterm>Showing LNet Traffic Statistics</title>
<para>The <literal>lnetctl</literal> utility can dump the LNet traffic
statistics as follows:</para>
589 <screen>lnetctl stats show</screen>
593 <primary>LNet</primary>
594 <secondary>yaml syntax</secondary>
595 </indexterm>YAML Syntax</title>
<para>The <literal>lnetctl</literal> utility can take in a YAML file
describing the configuration items that need to be operated on and
perform one of the following operations on the items described
therein: add, delete or show.</para>
<para>Net, routing and route YAML blocks are all defined as a YAML
sequence, as shown in the following sections. The stats YAML block
is a YAML object. Each sequence item can take a seq_no field. This
seq_no field is returned in the error block, which allows the caller
to associate the error with the item that caused it. The
<literal>lnetctl</literal> utility makes a best effort to configure the
items defined in the YAML file. It does not stop processing the file
at the first error.</para>
608 <para>Below is the YAML syntax describing the various
609 configuration elements which can be operated on via DLC. Not all
610 YAML elements are required for all operations (add/delete/show).
611 The system ignores elements which are not pertinent to the requested
615 <primary>LNet</primary>
616 <secondary>network yaml syntax</secondary>
617 </indexterm>Network Configuration</title>
620 - net: <network. Ex: tcp or o2ib>
622 0: <physical interface>
623 detail: <This is only applicable for show command. 1 - output detailed info. 0 - basic output>
peer_timeout: <Integer. Timeout before considering a peer dead>
626 peer_credits: <Integer. Transmit credits for a peer>
627 peer_buffer_credits: <Integer. Credits available for receiving messages>
628 credits: <Integer. Network Interface credits>
SMP: <An array of integers of the form: "[x,y,...]", where each
integer represents the CPT to associate the network interface
with>
seq_no: <integer. Optional. User generated, and is
passed back in the YAML error block></screen>
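<para>A filled-in network block, given here only as a sketch, might look
like the following. The network name, interface, and tunable values are
the illustrative ones used in the <literal>lnetctl net add</literal> and
<literal>lnetctl net show</literal> examples earlier in this chapter.
The key names <literal>net</literal>, <literal>interfaces</literal>, and
<literal>tunables</literal> are assumed to match the template above;
check them against the output of <literal>lnetctl export</literal> on
your release.</para>
<screen>net:
    - net: tcp2
      interfaces:
          0: eth0              # placeholder interface name
      tunables:
          peer_timeout: 180
          peer_credits: 8
          peer_buffer_credits: 0
          credits: 256
      seq_no: 1                # optional, echoed back in any error block</screen>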
<para>Neither the seq_no nor the detail field appears in the show output.
638 <primary>LNet</primary>
639 <secondary>buffer yaml syntax</secondary>
640 </indexterm>Enable Routing and Adjust Router Buffer Configuration
644 - tiny: <Integer. Tiny buffers>
645 small: <Integer. Small buffers>
646 large: <Integer. Large buffers>
647 enable: <0 - disable routing. 1 - enable routing>
648 seq_no: <Integer. Optional. User generated, and is passed back in the YAML error block></screen>
<para>The seq_no field does not appear in the show output.</para>
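<para>As an illustrative sketch, a block that enables routing and sets
the three buffer pools could be written as follows. The buffer counts
repeat the values from the <literal>lnetctl set</literal> usage examples
and are not recommendations; the top-level key
<literal>routing</literal> is assumed to match the template above.</para>
<screen>routing:
    - tiny: 4096
      small: 8192
      large: 2048
      enable: 1                # 1 - enable routing
      seq_no: 2                # optional, echoed back in any error block</screen>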
653 <primary>LNet</primary>
654 <secondary>statistics yaml syntax</secondary>
655 </indexterm>Show Statistics</title>
658 seq_no: <Integer. Optional. User generated, and is passed back in the YAML error block></screen>
<para>The seq_no field does not appear in the show output.</para>
663 <primary>LNet</primary>
664 <secondary>router yaml syntax</secondary>
665 </indexterm>Route Configuration</title>
668 - net: <network. Ex: tcp or o2ib>
669 gateway: <nid of the gateway in the form <ip>@<net>: Ex: 192.168.29.1@tcp>
670 hop: <an integer between 1 and 255. Optional>
detail: <This is only applicable for show commands. 1 - output detailed info. 0 - basic output>
672 seq_no: <integer. Optional. User generated, and is passed back in the YAML error block></screen>
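<para>For illustration, the route added with
<literal>lnetctl route add</literal> earlier in this chapter could be
expressed in YAML roughly as follows. The network and gateway NID are
the same placeholder values; the top-level key <literal>route</literal>
is assumed to match the template above.</para>
<screen>route:
    - net: tcp2
      gateway: 192.168.205.130@tcp1
      hop: 2
      seq_no: 3                # optional, echoed back in any error block</screen>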
<para>Neither the seq_no nor the detail field appears in the show output.
678 <section xml:id="lnet_module_params">
679 <title><indexterm><primary>LNet</primary></indexterm>
680 Overview of LNet Module Parameters</title>
681 <para>LNet kernel module (lnet) parameters specify how LNet is to be
682 configured to work with Lustre, including which NICs will be
683 configured to work with Lustre and the routing to be used with
685 <para>Parameters for LNet can be specified in the
686 <literal>/etc/modprobe.d/lustre.conf</literal> file. In some cases
687 the parameters may have been stored in
688 <literal>/etc/modprobe.conf</literal>, but this has been deprecated
689 since before RHEL5 and SLES10, and having a separate
690 <literal>/etc/modprobe.d/lustre.conf</literal> file simplifies
691 administration and distribution of the Lustre networking
692 configuration. This file contains one or more entries with the
694 <screen>options lnet <replaceable>parameter</replaceable>=<replaceable>value</replaceable></screen>
695 <para>To specify the network interfaces that are to be used for
696 Lustre, set either the <literal>networks</literal> parameter or the
697 <literal>ip2nets</literal> parameter (only one of these parameters can
698 be used at a time):</para>
701 <para><literal>networks</literal> - Specifies the networks to be used.
705 <para><literal>ip2nets</literal> - Lists globally-available
706 networks, each with a range of IP addresses. LNet then identifies
707 locally-available networks through address list-matching
711 <para>See <xref linkend="lnet_module_network_params"/> and
712 <xref linkend="lnet_ip2nets"/> for more details.</para>
713 <para>To set up routing between networks, use:</para>
716 <para><literal>routes</literal> - Lists networks and the NIDs of
717 routers that forward to them.</para>
720 <para>See <xref linkend="lnet_module_routes"/> for more details.</para>
721 <para>A <literal>router</literal> checker can be configured to enable
722 Lustre nodes to detect router health status, avoid routers that appear
723 dead, and reuse those that restore service after failures. See <xref
724 linkend="lnet_router_checker"/> for more details.</para>
725 <para>For a complete reference to the LNet module parameters, see
726 <emphasis><xref linkend="configurationfilesmoduleparameters"/>LNet
727 Options</emphasis>.</para>
729 <para>We recommend that you use 'dotted-quad' notation for
730 IP addresses rather than host names to make it easier to read debug
731 logs and debug configurations with multiple interfaces.</para>
734 <title><indexterm><primary>LNet</primary><secondary>using
735 NID</secondary></indexterm>Using a Lustre Network Identifier (NID)
736 to Identify a Node</title>
737 <para>A Lustre network identifier (NID) is used to uniquely identify
738 a Lustre network endpoint by node ID and network type. The format of
740 <screen><replaceable>network_id</replaceable>@<replaceable>network_type</replaceable></screen>
741 <para>Examples are:</para>
742 <screen>10.67.73.200@tcp0
743 10.67.75.100@o2ib</screen>
744 <para>The first entry above identifies a TCP/IP node, while the
745 second entry identifies an InfiniBand node.</para>
746 <para>When a mount command is run on a client, the client uses the
747 NID of the MDS to retrieve configuration information. If an MDS has
748 more than one NID, the client should use the appropriate NID for its
749 local network.</para>
<para>To determine the appropriate NID to specify in
the mount command, use the <literal>lctl</literal> command. To
display MDS NIDs, run on the MDS:</para>
753 <screen>lctl list_nids</screen>
754 <para>To determine if a client can reach the MDS using a particular NID,
755 run on the client:</para>
756 <screen>lctl which_nid <replaceable>MDS_NID</replaceable></screen>
759 <section xml:id="lnet_module_network_params">
760 <title><indexterm><primary>LNet</primary>
761 <secondary>module parameters</secondary>
762 </indexterm>Setting the LNet Module networks Parameter</title>
763 <para>If a node has more than one network interface, you'll
764 typically want to dedicate a specific interface to Lustre. You can do
765 this by including an entry in the <literal>lustre.conf</literal> file
766 on the node that sets the LNet module <literal>networks</literal>
768 <screen>options lnet networks=<replaceable>comma-separated list of
769 networks</replaceable></screen>
770 <para>This example specifies that a Lustre node will use a TCP/IP
771 interface and an InfiniBand interface:</para>
772 <screen>options lnet networks=tcp0(eth0),o2ib(ib0)</screen>
773 <para>This example specifies that the Lustre node will use the TCP/IP
774 interface <literal>eth1</literal>:</para>
775 <screen>options lnet networks=tcp0(eth1)</screen>
776 <para>Depending on the network design, it may be necessary to specify
777 explicit interfaces. To explicitly specify that interface
778 <literal>eth2</literal> be used for network <literal>tcp0</literal>
and <literal>eth3</literal> be used for <literal>tcp1</literal>, use
781 <screen>options lnet networks=tcp0(eth2),tcp1(eth3)</screen>
782 <para>When more than one interface is available during the network
783 setup, Lustre chooses the best route based on the hop count. Once the
784 network connection is established, Lustre expects the network to stay
785 connected. In a Lustre network, connections do not fail over to
786 another interface, even if multiple interfaces are available on the
789 <para>LNet lines in <literal>lustre.conf</literal> are only used by
790 the local node to determine what to call its interfaces. They are
791 not used for routing decisions.</para>
794 <title><indexterm><primary>configuring</primary>
795 <secondary>multihome</secondary></indexterm>Multihome Server Example
<para>If a server with multiple IP addresses (multihome server) is
connected to a Lustre network, certain configuration settings are
required. An example illustrating these settings consists of a
network with the following nodes:</para>
803 <para> Server svr1 with three TCP NICs (<literal>eth0</literal>,
804 <literal>eth1</literal>, and <literal>eth2</literal>) and an
805 InfiniBand NIC.</para>
808 <para> Server svr2 with three TCP NICs (<literal>eth0</literal>,
809 <literal>eth1</literal>, and <literal>eth2</literal>) and an
810 InfiniBand NIC. Interface eth2 will not be used for Lustre
814 <para> TCP clients, each with a single TCP interface.</para>
<para> InfiniBand clients, each with a single InfiniBand
interface and a TCP/IP interface for administration.</para>
821 <para>To set the <literal>networks</literal> option for this example:
825 <para> On each server, <literal>svr1</literal> and
826 <literal>svr2</literal>, include the following line in the
827 <literal>lustre.conf</literal> file:</para>
830 <screen>options lnet networks=tcp0(eth0),tcp1(eth1),o2ib</screen>
833 <para> For TCP-only clients, the first available non-loopback IP
834 interface is used for <literal>tcp0</literal>. Thus, TCP clients
835 with only one interface do not need to have options defined in
836 the <literal>lustre.conf</literal> file.</para>
839 <para> On the InfiniBand clients, include the following line in
840 the <literal>lustre.conf</literal> file:</para>
843 <screen>options lnet networks=o2ib</screen>
845 <para>By default, Lustre ignores the loopback
846 (<literal>lo0</literal>) interface. Lustre does not ignore IP
847 addresses aliased to the loopback. If you alias IP addresses to
848 the loopback interface, you must specify all Lustre networks using
849 the LNet networks parameter.</para>
852 <para>If the server has multiple interfaces on the same subnet,
853 the Linux kernel will send all traffic using the first configured
854 interface. This is a limitation of Linux, not Lustre. In this
855 case, network interface bonding should be used. For more
856 information about network interface bonding, see <xref
857 linkend="settingupbonding"/>.</para>
861 <section xml:id="lnet_ip2nets">
862 <title><indexterm><primary>LNet</primary>
863 <secondary>ip2nets</secondary>
864 </indexterm>Setting the LNet Module ip2nets Parameter</title>
865 <para>The <literal>ip2nets</literal> option is typically used when a
866 single, universal <literal>lustre.conf</literal> file is deployed on all
867 servers and clients. Each node identifies its locally available
868 networks by matching the listed IP address patterns against the
869 node's local IP addresses.</para>
870 <para>Note that the IP address patterns listed in the
871 <literal>ip2nets</literal> option are <emphasis>only</emphasis> used
872 to identify the networks that an individual node should instantiate.
873 They are <emphasis>not</emphasis> used by LNet for any other
874 communications purpose.</para>
875 <para>For the example below, the nodes in the network have these IP
879 <para> Server svr1: <literal>eth0</literal> IP address
880 <literal>192.168.0.2</literal>, IP over InfiniBand
881 (<literal>o2ib</literal>) address
882 <literal>132.6.1.2</literal>.</para>
885 <para> Server svr2: <literal>eth0</literal> IP address
886 <literal>192.168.0.4</literal>, IP over InfiniBand
887 (<literal>o2ib</literal>) address
888 <literal>132.6.1.4</literal>.</para>
891 <para> TCP clients have IP addresses
892 <literal>192.168.0.5-255</literal>.</para>
895 <para> InfiniBand clients have IP over InfiniBand
896 (<literal>o2ib</literal>) addresses <literal>132.6.[2-3].2, .4,
897 .6, .8</literal>.</para>
900 <para>The following entry is placed in the
901 <literal>lustre.conf</literal> file on each server and client:</para>
902 <screen>options lnet 'ip2nets="tcp0(eth0) 192.168.0.[2,4]; \
903 tcp0 192.168.0.*; o2ib0 132.6.[1-3].[2-8/2]"'</screen>
904 <para>Each entry in <literal>ip2nets</literal> is referred to as a
905 'rule'.</para>
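<para>As a worked illustration of how these rules resolve, the entry
above should be roughly equivalent to the following per-node
<literal>networks</literal> settings. This is only a sketch derived from
the example addresses, assuming each InfiniBand node uses its default
InfiniBand interface; it is not additional configuration that needs to be
added.</para>
<screen># svr1 and svr2 (192.168.0.2, 192.168.0.4 and 132.6.1.2, 132.6.1.4)
options lnet networks=tcp0(eth0),o2ib0

# TCP clients (192.168.0.5-255)
options lnet networks=tcp0

# InfiniBand clients (132.6.[2-3].2, .4, .6, .8)
options lnet networks=o2ib0</screen>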
906 <para>The order of LNet entries is important when configuring servers.
907 If a server node can be reached using more than one network, the first
908 network specified in <literal>lustre.conf</literal> will be
910 <para>Because <literal>svr1</literal> and <literal>svr2</literal>
911 match the first rule, LNet uses <literal>eth0</literal> for
912 <literal>tcp0</literal> on those machines. (Although
913 <literal>svr1</literal> and <literal>svr2</literal> also match the
914 second rule, the first matching rule for a particular network is
916 <para>The <literal>[2-8/2]</literal> format indicates a range of 2-8
917 stepped by 2; that is 2,4,6,8. Thus, the clients at
918 <literal>132.6.3.5</literal> will not find a matching o2ib
920 <note condition='l2A'>
921 <para>Multi-rail deprecates the kernel parsing of <literal>ip2nets</literal>. <literal>ip2nets</literal>
922 patterns are matched in user space and translated into network
923 interfaces to be added to the system.</para>
924 <para>The first interface that matches the IP pattern will be used when
925 adding a network interface.</para>
926 <para>If an interface is explicitly specified as well as a pattern, the
927 interface matched using the IP pattern will be sanitized against the
928 explicitly-defined interface.</para>
929 <para>For example, if the rule is <literal>tcp(eth0) 192.168.*.3</literal>
930 and the system has <literal>eth0 == 192.158.19.3</literal> and
931 <literal>eth1 == 192.168.3.3</literal>, then the configuration will
932 fail, because the pattern matches <literal>eth1</literal>, which contradicts the interface specified.
934 <para>A clear warning will be displayed if inconsistent configuration is
936 <para>You can use the following command to configure ip2nets:</para>
937 <screen>lnetctl import &lt; ip2nets.yaml</screen>
938 <para>For example:</para>
951 0: 192.168.*.*</screen>
954 <section xml:id="lnet_module_routes">
955 <title><indexterm><primary>LNet</primary>
956 <secondary>routes</secondary></indexterm>Setting the LNet Module routes
958 <para>The LNet module <literal>routes</literal> parameter is used to identify routers in
959 a Lustre configuration. This parameter is set in
960 <literal>modprobe.conf</literal> on each Lustre node.</para>
961 <para>Routes are typically set to connect to segregated subnetworks
962 or to cross connect two different types of networks such as tcp and
964 <para>The LNet routes parameter specifies a semicolon-separated list of
965 router definitions. Each route is defined as a network,
966 followed by a list of router NIDs:</para>
967 <screen>routes=<replaceable>net_type router_NID(s)</replaceable></screen>
968 <para>This example specifies bi-directional routing in which TCP
969 clients can reach Lustre resources on the IB networks and IB servers
970 can access the TCP networks:</para>
971 <screen>options lnet 'ip2nets="tcp0 192.168.0.*; \
972 o2ib0(ib0) 132.6.1.[1-128]"' 'routes="tcp0 132.6.1.[1-8]@o2ib0; \
973 o2ib0 192.168.0.[1-8]@tcp0"'</screen>
974 <para>All LNet routers that bridge two networks are equivalent. They
975 are not configured as primary or secondary, and the load is balanced
976 across all available routers.</para>
977 <para>The number of LNet routers is not limited. Enough routers should
978 be used to handle the required file serving bandwidth plus a 25
979 percent margin for headroom.</para>
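<para>As a worked example of this sizing rule, the following arithmetic
uses purely hypothetical figures (a 20 GB/s file serving requirement and
routers that each sustain 4 GB/s); substitute measured values for a real
deployment.</para>
<screen>required bandwidth        = 20 GB/s
with 25 percent headroom  = 20 GB/s x 1.25 = 25 GB/s
per-router bandwidth      = 4 GB/s
routers needed            = 25 / 4 = 6.25, rounded up to 7</screen>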
981 <title><indexterm><primary>LNet</primary><secondary>routing
982 example</secondary></indexterm>Routing Example</title>
983 <para>On the clients, place the following entry in the
984 <literal>lustre.conf</literal> file:</para>
985 <screen>options lnet networks="tcp" routes="o2ib0 192.168.0.[1-8]@tcp0"</screen>
986 <para>On the router nodes, use:</para>
987 <screen>options lnet networks="tcp,o2ib" forwarding=enabled</screen>
988 <para>On the MDS, use the reverse as shown below:</para>
989 <screen>options lnet networks="o2ib0" routes="tcp0 132.6.1.[1-8]@o2ib0"</screen>
990 <para>To start the routers, run:</para>
991 <screen>modprobe lnet
992 lctl network configure</screen>
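<para>Once LNet is running on a client or server, the routing table that
was actually loaded can be inspected as a quick sanity check. The command
below is a minimal sketch; on a client configured as in this example, the
output would be expected to list the <literal>o2ib0</literal> network with
the router NIDs as gateways.</para>
<screen>lctl route_list</screen>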
995 <section xml:id="lnet_config_testing">
996 <title><indexterm><primary>LNet</primary>
997 <secondary>testing</secondary></indexterm>Testing the LNet
998 Configuration</title>
999 <para>After configuring Lustre Networking, it is highly recommended
1000 that you test your LNet configuration using the LNet Self-Test
1001 provided with the Lustre software. For more information about using
1002 LNet Self-Test, see <xref linkend="lnetselftest"/>.</para>
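<para>Before running LNet Self-Test, a basic connectivity check can catch
obvious configuration mistakes. The following is a minimal sketch in which
<replaceable>peer_NID</replaceable> is a placeholder for the NID of another
node, for example a server NID as reported by
<literal>lctl list_nids</literal> on that server.</para>
<screen># Show the NIDs that LNet instantiated on the local node
lctl list_nids

# Check that a remote node answers an LNet ping
lctl ping <replaceable>peer_NID</replaceable></screen>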
1004 <section xml:id="lnet_router_checker">
1005 <title><indexterm><primary>LNet</primary>
1006 <secondary>route checker</secondary>
1007 </indexterm>Configuring the Router Checker</title>
1008 <para>In a Lustre configuration in which different types of networks,
1009 such as a TCP/IP network and an InfiniBand network, are connected by
1010 routers, a router checker can be run on the clients and servers in the
1011 routed configuration to monitor the status of the routers. In a
1012 multi-hop routing configuration, router checkers can be configured on
1013 routers to monitor the health of their next-hop routers.</para>
1014 <para>A router checker is configured by setting LNet parameters in
1015 <literal>lustre.conf</literal> by including an entry in this
1017 <screen>options lnet
1018 <replaceable>router_checker_parameter</replaceable>=<replaceable>value</replaceable></screen>
1019 <para>The router checker parameters are:</para>
1022 <para><literal>live_router_check_interval</literal> - Specifies a
1023 time interval in seconds after which the router checker will ping
1024 the live routers. The default value is 0, meaning no checking is
1025 done. To set the value to 60, enter:</para>
1026 <screen>options lnet live_router_check_interval=60</screen>
1029 <para><literal>dead_router_check_interval</literal> - Specifies a
1030 time interval in seconds after which the router checker will check
1031 for dead routers. The default value is 0, meaning no checking is
1032 done. To set the value to 60, enter:</para>
1033 <screen>options lnet dead_router_check_interval=60</screen>
1036 <para><literal>auto_down</literal> - Enables/disables (1/0) the automatic marking of
1037 router state as up or down. The default value is 1. To disable
1038 router marking, enter:</para>
1039 <screen>options lnet auto_down=0</screen>
1042 <para><literal>router_ping_timeout</literal> - Specifies a
1043 timeout, in seconds, for the router checker when it checks live or dead
1044 routers. The router checker sends a ping message to each dead or
1045 live router once every <literal>dead_router_check_interval</literal> or
1046 <literal>live_router_check_interval</literal>, respectively. The default value is 50.
1047 To set the value to 60, enter:</para>
1048 <screen>options lnet router_ping_timeout=60</screen>
1050 <para>The <literal>router_ping_timeout</literal> is consistent
1051 with the default LND timeouts. You may have to increase it on very
1052 large clusters if the LND timeout is also increased. For larger
1053 clusters, we suggest increasing the check interval.</para>
1057 <para><literal>check_routers_before_use</literal> - Specifies
1058 that routers are to be checked before use. This parameter is off by
1059 default. If it is set to on, the
1060 <literal>dead_router_check_interval</literal> parameter must be given a positive
1061 integer value.</para>
1062 <screen>options lnet check_routers_before_use=on</screen>
1065 <para>The router checker obtains the following information from each router:
1069 <para> Time the router was disabled</para>
1072 <para> Elapsed disable time</para>
1075 <para>If the router checker does not get a reply message from the
1076 router within <literal>router_ping_timeout</literal> seconds, it considers the router to
1078 <para>If a router is marked 'up' and responds to a ping, the
1079 timeout is reset.</para>
1080 <para>If 100 packets have been sent successfully through a router, the
1081 sent-packets counter for that router will have a value of 100.</para>
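<para>The router checker parameters described above can be combined in a
single <literal>options</literal> entry. The following is a sketch only;
the values are the illustrative ones used in the individual examples, not
tuned recommendations.</para>
<screen>options lnet live_router_check_interval=60 dead_router_check_interval=60 router_ping_timeout=60</screen>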
1083 <section xml:id="lnet_best_practices">
1084 <title><indexterm><primary>LNet</primary>
1085 <secondary>best practice</secondary>
1086 </indexterm>Best Practices for LNet Options</title>
1087 <para>For the <literal>networks</literal>, <literal>ip2nets</literal>,
1088 and <literal>routes</literal> options, follow these best practices to
1089 avoid configuration errors.</para>
1090 <section xml:id="lnet_best_practices.escape_comments">
1091 <title><indexterm><primary>LNet</primary>
1092 <secondary>escaping commas with quotes</secondary>
1093 </indexterm>Escaping commas with quotes</title>
1094 <para>Depending on the Linux distribution, commas may need to be
1095 escaped using single or double quotes. In the extreme case, the
1096 <literal>options</literal> entry would look like this:</para>
1097 <para><screen>options lnet \
1098 'networks="tcp0,elan0"' \
1099 'routes="tcp [2,10]@elan0"'</screen></para>
1100 <para>Added quotes may confuse some distributions. Messages such as
1101 the following may indicate an issue related to added quotes:</para>
1102 <para><screen>lnet: Unknown parameter 'networks'</screen></para>
1103 <para>A <literal>'Refusing connection - no matching
1104 NID'</literal> message generally points to an error in the LNet
1105 module configuration.</para>
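<para>When quoting problems are suspected, it may help to look at the
<literal>lnet</literal> options as <literal>modprobe</literal> itself
reads them. The following is a minimal sketch that assumes a
<literal>modprobe</literal> supporting the <literal>-c</literal>
(show configuration) option; compare the printed line with what was
intended in <literal>lustre.conf</literal>.</para>
<screen>modprobe -c | grep 'options lnet'</screen>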
1107 <section xml:id="lnet_best_practices.comments">
1108 <title><indexterm><primary>LNet</primary>
1109 <secondary>comments</secondary></indexterm>Including comments</title>
1110 <para><emphasis>Place the semicolon terminating a comment
1111 immediately after the comment.</emphasis> LNet silently ignores
1112 everything between the <literal>#</literal> character at the
1113 beginning of the comment and the next semicolon.</para>
1114 <para>In this <emphasis>incorrect</emphasis> example, LNet silently
1115 ignores <literal>pt11 192.168.0.[92,96]</literal>, resulting in
1116 these nodes not being properly initialized. No error message is
1118 <screen>options lnet ip2nets="pt10 192.168.0.[89,93]; # comment with semicolon BEFORE comment \
1119 pt11 192.168.0.[92,96];</screen>
1120 <para>This <emphasis role="italic">correct</emphasis> example shows
1121 the required syntax: </para>
1122 <para><screen>options lnet ip2nets="pt10 192.168.0.[89,93] \
1123 # comment with semicolon AFTER comment; \
1124 pt11 192.168.0.[92,96] # comment</screen></para>
1125 <para><emphasis role="italic">Do not add an excessive number of
1126 comments.</emphasis> The Linux kernel limits the length of character
1127 strings used in module options (usually to 1 KB, but this may differ
1128 between vendor kernels). If you exceed this limit, errors result and
1129 the specified configuration may not be processed correctly.</para>