1 <?xml version='1.0' encoding='UTF-8'?>
2 <!-- This document was created with Syntext Serna Free. -->
3 <chapter xmlns="http://docbook.org/ns/docbook"
4 xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US"
5 xml:id="configuringlnet">
6 <title xml:id="configuringlnet.title">Configuring Lustre Networking (LNet)</title>
7 <para>This chapter describes how to configure Lustre Networking (LNet). It
8 includes the following sections:</para>
11 <para><xref linkend="lnet_config"/>
15 <para><xref linkend="lnet_module_params"/>
19 <para><xref linkend="lnet_module_network_params"/>
23 <para><xref linkend="lnet_ip2nets"/>
27 <para><xref linkend="lnet_module_routes"/>
31 <para><xref linkend="lnet_config_testing"/>
35 <para><xref linkend="lnet_router_checker"/>
39 <para><xref linkend="lnet_best_practices"/>
44 <para>Configuring LNet is optional.</para>
<para>LNet will use the first TCP/IP interface it discovers on a
system (<literal>eth0</literal>) if it is brought up with the
<literal>lctl network up</literal> command. If this network configuration is
sufficient, you do not need to configure LNet. LNet configuration is
required if you are using InfiniBand or multiple Ethernet
51 <para condition='l27'>The <literal>lnetctl</literal> utility can be used
52 to initialize LNet without bringing up any network interfaces. Network
53 interfaces can be added after configuring LNet via
54 <literal>lnetctl</literal>. <literal>lnetctl</literal> can also be used to
55 manage an operational LNet. However, if it wasn't initialized by
56 <literal>lnetctl</literal> then <literal>lnetctl lnet configure</literal>
57 must be invoked before <literal>lnetctl</literal> can be used to manage
<para condition='l27'>Dynamic LNet Configuration (DLC) also introduces a C-API to enable
configuring LNet programmatically. See <xref
linkend="lnetconfigurationapi"/>.</para>
63 <section xml:id="lnet_config" condition='l27'>
65 <primary>LNet</primary>
66 <secondary>Configuring LNet</secondary>
67 </indexterm>Configuring LNet via <literal>lnetctl</literal></title>
<para>The <literal>lnetctl</literal> utility can be used to initialize
and configure the LNet kernel module after it has been loaded via
<literal>modprobe</literal>. In general, the <literal>lnetctl</literal> command format is as
72 <screen>lnetctl cmd subcmd [options]</screen>
73 <para>The following configuration items are managed by the tool:</para>
77 <para>Configuring/unconfiguring LNet</para>
80 <para>Adding/removing/showing Networks</para>
83 <para>Adding/removing/showing Routes</para>
86 <para>Enabling/Disabling routing</para>
89 <para>Configuring Router Buffer Pools</para>
93 <section xml:id="lnet_config.cli_overview">
95 <primary>LNet</primary>
96 <secondary>cli</secondary>
97 </indexterm>Configuring LNet</title>
<para>After LNet has been loaded via <literal>modprobe</literal>, the
<literal>lnetctl</literal> utility can be used to configure LNet
without bringing up the networks that are specified in the module
parameters. It can also be used to configure the network interfaces
specified in the module parameters by providing the
<literal>--all</literal> option.</para>
104 <screen>lnetctl lnet configure [--all]
105 # --all: load NI configuration from module parameters</screen>
106 <para>The <literal>lnetctl</literal> utility can also be used to
107 unconfigure LNet.</para>
108 <screen>lnetctl lnet unconfigure</screen>
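<para>As a rough sketch, a session on a node with no LNet module
parameters set might look like the following. It uses only commands described
in this chapter; the network name <literal>tcp0</literal> and the interface
<literal>eth0</literal> are illustrative assumptions, not required values.</para>
<screen># load the LNet kernel module
modprobe lnet
# initialize LNet without bringing up any network interfaces
lnetctl lnet configure
# add a network on a chosen interface, then confirm it is present
lnetctl net add --net tcp0 --if eth0
lnetctl net show
# tear LNet down again when it is no longer needed
lnetctl lnet unconfigure</screen>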
110 <section xml:id="lnet_config.show_global_settings">
112 <primary>LNet</primary>
113 <secondary>cli</secondary>
114 </indexterm>Displaying Global Settings</title>
115 <para>The active LNet global settings can be displayed using the
116 <literal>lnetctl</literal> command shown below:</para>
117 <screen>lnetctl global show</screen>
118 <para>For example:</para>
119 <screen># lnetctl global show
124 drop_asym_route: 0</screen>
126 <section xml:id="lnet_config.lnetaddshowdelete">
127 <title><indexterm><primary>LNet</primary>
128 <secondary>cli</secondary></indexterm>Adding, Deleting and Showing
130 <para>Networks can be added, deleted, or shown after the LNet kernel
131 module is loaded.</para>
132 <para>The <emphasis role="bold"><literal>lnetctl net add</literal>
133 </emphasis> command is used to add networks:</para>
134 <screen>lnetctl net add: add a network
135 --net: net name (ex tcp0)
136 --if: physical interface (ex eth0)
137 --peer_timeout: time to wait before declaring a peer dead
138 --peer_credits: defines the max number of inflight messages
139 --peer_buffer_credits: the number of buffer credits per peer
140 --credits: Network Interface credits
141 --cpts: CPU Partitions configured net uses
142 --help: display this help text
145 lnetctl net add --net tcp2 --if eth0
146 --peer_timeout 180 --peer_credits 8</screen>
147 <note condition='l2A'><para>With the addition of Software based Multi-Rail
148 in Lustre 2.10, the following should be noted:</para>
150 <listitem><para>--net: no longer needs to be unique since multiple
151 interfaces can be added to the same network.</para></listitem>
<listitem><para>--if: The same interface can be added to a given network
only once; however, more than one interface can now be specified
(separated by commas) for a node. For example: eth0,eth1,eth2.
156 </itemizedlist></para>
157 <para>For examples on adding multiple interfaces via
158 <literal>lnetctl net add</literal> and/or YAML, please see
159 <xref linkend="dbdoclet.mrconfiguring" />
162 <para>Networks can be deleted with the
163 <emphasis role="bold"><literal>lnetctl net del</literal></emphasis>
165 <screen>net del: delete a network
166 --net: net name (ex tcp0)
--if: physical interface (e.g. eth0)
170 lnetctl net del --net tcp2</screen>
171 <note condition='l2A'><para>In a Software Multi-Rail configuration,
172 specifying only the <literal>--net</literal> argument will delete the
173 entire network and all interfaces under it. The new
174 <literal>--if</literal> switch should also be used in conjunction with
175 <literal>--net</literal> to specify deletion of a specific interface.
177 <para>All or a subset of the configured networks can be shown with the
178 <emphasis role="bold"><literal>lnetctl net show</literal></emphasis>
179 command. The output can be non-verbose or verbose.</para>
180 <screen>net show: show networks
181 --net: net name (ex tcp0) to filter on
182 --verbose: display detailed output per network
186 lnetctl net show --verbose
187 lnetctl net show --net tcp2 --verbose</screen>
<para>Below are examples of non-detailed and detailed network
configuration output.</para>
190 <screen># non-detailed show
191 > lnetctl net show --net tcp2
193 - nid: 192.168.205.130@tcp2
199 > lnetctl net show --net tcp2 --verbose
201 - nid: 192.168.205.130@tcp2
208 peer_buffer_credits: 0
209 credits: 256</screen>
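<para condition='l2A'>The Multi-Rail notes above can be illustrated with a
short sequence. This is a sketch only: the interface names
<literal>eth0</literal> and <literal>eth1</literal> and the network
<literal>tcp0</literal> are assumed for illustration.</para>
<screen condition='l2A'># add two interfaces to the same network in one command
lnetctl net add --net tcp0 --if eth0,eth1
# verify that both interfaces are listed under tcp0
lnetctl net show --net tcp0 --verbose
# remove only eth1, leaving tcp0 configured on eth0
lnetctl net del --net tcp0 --if eth1
# remove the whole network and any remaining interfaces
lnetctl net del --net tcp0</screen>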
211 <section condition='l2A' xml:id="lnet_config.manual_addshowdelete">
213 <primary>LNet</primary>
214 <secondary>cli</secondary>
215 </indexterm>Manual Adding, Deleting and Showing Peers</title>
216 <para>The <emphasis role="bold"><literal>lnetctl peer add</literal>
217 </emphasis> command is used to manually add a remote peer to a software
218 multi-rail configuration. For the dynamic peer discovery capability
219 introduced in Lustre Release 2.11.0, please see
220 <xref linkend="lnet_config.dynamic_discovery" />.</para>
<para>When configuring peers, use the <literal>--prim_nid</literal>
option to specify the key or primary NID of the peer node. Then
follow that with the <literal>--nid</literal> option to specify a
set of comma-separated NIDs.</para>
225 <screen>peer add: add a peer
226 --prim_nid: primary NID of the peer
227 --nid: comma separated list of peer nids (e.g. 10.1.1.2@tcp0)
--non_mr: if specified this interface is created as a non multi-rail
capable peer. Only one NID can be specified in this case.</screen>
230 <para>For example:</para>
232 lnetctl peer add --prim_nid 10.10.10.2@tcp --nid 10.10.3.3@tcp1,10.4.4.5@tcp2
<para>The <literal>--prim_nid</literal> (primary NID for the peer
node) can go unspecified. In this case, the first listed NID in the
<literal>--nid</literal> option becomes the primary NID of the peer.
lnetctl peer add --nid 10.10.10.2@tcp,10.10.3.3@tcp1,10.4.4.5@tcp2</screen>
240 <para>YAML can also be used to configure peers:</para>
    - primary nid: &lt;key or primary nid&gt;
          - nid: &lt;nid n&gt;</screen>
248 <para>As with all other commands, the result of the
249 <literal>lnetctl peer show</literal> command can be used to gather
250 information to aid in configuring or deleting a peer:</para>
251 <screen>lnetctl peer show -v</screen>
252 <para>Example output from the <literal>lnetctl peer show</literal>
255 - primary nid: 192.168.122.218@tcp
258 - nid: 192.168.122.218@tcp
261 available_tx_credits: 8
262 available_rtr_credits: 8
269 - nid: 192.168.122.78@tcp
272 available_tx_credits: 8
273 available_rtr_credits: 8
280 - nid: 192.168.122.96@tcp
283 available_tx_credits: 8
284 available_rtr_credits: 8
291 <para>Use the following <literal>lnetctl</literal> command to delete a
293 <screen>peer del: delete a peer
294 --prim_nid: Primary NID of the peer
295 --nid: comma separated list of peer nids (e.g. 10.1.1.2@tcp0)</screen>
296 <para><literal>prim_nid</literal> should always be specified. The
297 <literal>prim_nid</literal> identifies the peer. If the
298 <literal>prim_nid</literal> is the only one specified, then the
299 entire peer is deleted.</para>
300 <para>Example of deleting a single nid of a peer (10.10.10.3@tcp):
302 <screen>lnetctl peer del --prim_nid 10.10.10.2@tcp --nid 10.10.10.3@tcp</screen>
303 <para>Example of deleting the entire peer:</para>
304 <screen>lnetctl peer del --prim_nid 10.10.10.2@tcp</screen>
306 <section condition='l2B' xml:id="lnet_config.dynamic_discovery">
308 <primary>LNet</primary>
309 <secondary>cli</secondary>
310 <tertiary>dynamic discovery</tertiary>
311 </indexterm>Dynamic Peer Discovery</title>
312 <section xml:id="lnet_config.dynamic_discovery.overview">
313 <title>Overview</title>
314 <para>Dynamic Discovery (DD) is a feature that allows nodes to
315 dynamically discover a peer's interfaces without having to explicitly
316 configure them. This is very useful for Multi-Rail (MR)
317 configurations. In large clusters, there could be hundreds of nodes
318 and having to configure MR peers on each node becomes error prone.
319 Dynamic Discovery is enabled by default and uses a new protocol based
320 on LNet pings to discover the interfaces of the remote peers on first
323 <section xml:id="lnet_config.dynamic_discovery.protocol">
324 <title>Protocol</title>
325 <para>When LNet on a node is requested to send a message to a peer it
326 first attempts to ping the peer. The reply to the ping contains the
327 peer's NIDs as well as a feature bit outlining what the peer supports.
328 Dynamic Discovery adds a Multi-Rail feature bit. If the peer is
329 Multi-Rail capable, it sets the MR bit in the ping reply. When the
330 node receives the reply it checks the MR bit, and if it is set it then
331 pushes its own list of NIDs to the peer using a new PUT message,
332 referred to as a "push ping". After this brief protocol, both the peer
333 and the node will have each other's list of interfaces. The MR
334 algorithm can then proceed to use the list of interfaces of the
335 corresponding peer.</para>
336 <para>If the peer is not MR capable, it will not set the MR feature
337 bit in the ping reply. The node will understand that the peer is
338 not MR capable and will only use the interface provided by upper
339 layers for sending messages.</para>
341 <section xml:id="lnet_config.dynamic_discovery.userspace_config">
342 <title>Dynamic Discovery and User-space Configuration</title>
343 <para>It is possible to configure the peer manually while Dynamic
344 Discovery is running. Manual peer configuration always takes precedence
345 over Dynamic Discovery. If there is a discrepancy between the manual
346 configuration and the dynamically discovered information, a warning is
349 <section xml:id="lnet_config.dynamic_discovery.config">
350 <title>Configuration</title>
<para>Dynamic Discovery has a single configuration setting: it can
only be turned on or off. To turn the feature on or off, the
following command is used:</para>
354 <screen>lnetctl set discovery [0 | 1]</screen>
355 <para>To check the current <literal>discovery</literal> setting, the
356 <literal>lnetctl global show</literal> command can be used as shown in
357 <xref linkend="lnet_config.show_global_settings"/>.</para>
359 <section xml:id="lnet_config.dynamic_discovery.ondemand">
360 <title>Initiating Dynamic Discovery on Demand</title>
361 <para>It is possible to initiate the Dynamic Discovery protocol on demand
362 without having to wait for a message to be sent to the peer. This can
363 be done with the following command:</para>
<screen>lnetctl discover &lt;peer_nid&gt; [&lt;peer_nid&gt; ...]</screen>
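<para>For instance, discovery could be triggered for a single peer and the
result inspected with the <literal>lnetctl peer show</literal> command
described earlier. The NID below is an assumed example value.</para>
<screen># run the discovery protocol against one peer
lnetctl discover 10.10.10.2@tcp
# list the peer's NIDs as now known to this node
lnetctl peer show -v</screen>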
369 <primary>LNet</primary>
370 <secondary>cli</secondary>
371 </indexterm>Adding, Deleting and Showing routes</title>
372 <para>A set of routes can be added to identify how LNet messages are
374 <screen>lnetctl route add: add a route
--net: net name (ex tcp0) LNet message is destined to.
This cannot be a local network.
--gateway: gateway node nid (ex 10.1.1.2@tcp) to route
all LNet messages destined for the identified
--hop: number of hops to final destination
(1 &lt; hops &lt; 255)
382 --priority: priority of route (0 - highest prio)
385 lnetctl route add --net tcp2 --gateway 192.168.205.130@tcp1 --hop 2 --prio 1</screen>
386 <para>Routes can be deleted via the following <literal>lnetctl</literal>
388 <screen>lnetctl route del: delete a route
389 --net: net name (ex tcp0)
390 --gateway: gateway nid (ex 10.1.1.2@tcp)
393 lnetctl route del --net tcp2 --gateway 192.168.205.130@tcp1</screen>
394 <para>Configured routes can be shown via the following
395 <literal>lnetctl</literal> command.</para>
396 <screen>lnetctl route show: show routes
397 --net: net name (ex tcp0) to filter on
398 --gateway: gateway nid (ex 10.1.1.2@tcp) to filter on
399 --hop: number of hops to final destination
(1 &lt; hops &lt; 255) to filter on
401 --priority: priority of route (0 - highest prio)
403 --verbose: display detailed output per route
410 lnetctl route show --verbose</screen>
<para>When showing routes, the <literal>--verbose</literal> option
outputs more detailed information. All show and error output is in
YAML format. Below are examples of both non-detailed and detailed
route show output.</para>
415 <screen>#Non-detailed output
419 gateway: 192.168.205.130@tcp1
422 > lnetctl route show --verbose
425 gateway: 192.168.205.130@tcp1
432 <primary>LNet</primary>
433 <secondary>cli</secondary>
434 </indexterm>Enabling and Disabling Routing</title>
435 <para>When an LNet node is configured as a router it will route LNet
436 messages not destined to itself. This feature can be enabled or
437 disabled as follows.</para>
438 <screen>lnetctl set routing [0 | 1]
439 # 0 - disable routing feature
440 # 1 - enable routing feature</screen>
444 <primary>LNet</primary>
445 <secondary>cli</secondary>
446 </indexterm>Showing routing information</title>
447 <para>When routing is enabled on a node, the tiny, small and large
448 routing buffers are allocated. See <xref
449 linkend="dbdoclet.tuning_lnet_params"/> for more details on router
450 buffers. This information can be shown as follows:</para>
451 <screen>lnetctl routing show: show routing information
454 lnetctl routing show</screen>
455 <para>An example of the show output:</para>
456 <screen>> lnetctl routing show
478 <primary>LNet</primary>
479 <secondary>cli</secondary>
480 </indexterm>Configuring Routing Buffers</title>
<para>The configured routing buffer values specify the number of
buffers in each of the tiny, small and large groups.</para>
<para>It is often desirable to configure the tiny, small and large
routing buffers to values other than the defaults. These values
are global; when set, they are used by all configured CPU
partitions. If routing is enabled, the values take effect
immediately. If a larger number of buffers is specified, then
buffers are allocated to satisfy the configuration change. If fewer
buffers are configured, the excess buffers are freed as they
become unused. If routing is not enabled, the values are not changed.
The buffer values are reset to default if routing is turned off and
493 <para>The <literal>lnetctl</literal> 'set' command can be
494 used to set these buffer values. A VALUE greater than 0
495 will set the number of buffers accordingly. A VALUE of 0
496 will reset the number of buffers to system defaults.</para>
497 <screen>set tiny_buffers:
498 set tiny routing buffers
499 VALUE must be greater than or equal to 0
501 set small_buffers: set small routing buffers
502 VALUE must be greater than or equal to 0
504 set large_buffers: set large routing buffers
505 VALUE must be greater than or equal to 0</screen>
506 <para>Usage examples:</para>
507 <screen>> lnetctl set tiny_buffers 4096
508 > lnetctl set small_buffers 8192
509 > lnetctl set large_buffers 2048</screen>
510 <para>The buffers can be set back to the default values as follows:</para>
511 <screen>> lnetctl set tiny_buffers 0
512 > lnetctl set small_buffers 0
513 > lnetctl set large_buffers 0</screen>
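<para>Putting the routing commands together, a router node might be set up
roughly as follows. The buffer counts are the illustrative values used above,
not tuning recommendations.</para>
<screen># enable the routing feature so that router buffers are allocated
lnetctl set routing 1
# raise the tiny and small pools; large is left at its default
lnetctl set tiny_buffers 4096
lnetctl set small_buffers 8192
# confirm the resulting buffer configuration
lnetctl routing show</screen>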
515 <section condition='l2D' xml:id="lnet_config.asym_route">
517 <primary>LNet</primary>
518 <secondary>cli</secondary>
519 <tertiary>asymmetrical route</tertiary>
520 </indexterm>Asymmetrical Routes</title>
521 <section xml:id="lnet_config.asym_route.overview">
522 <title>Overview</title>
<para>A route is asymmetrical when a message from a remote peer
arrives through a router that is not among the routers this node
knows for reaching that remote peer.</para>
<para>Asymmetrical routes can complicate network debugging, and
allowing them also opens the door to attacks where hostile clients
inject data to the servers.</para>
<para>For this reason, LNet provides a check that detects
any message arriving over an asymmetrical route and drops it.</para>
532 <section xml:id="lnet_config.dynamic_discovery.configuration">
533 <title>Configuration</title>
534 <para>In order to switch asymmetric route detection on or off, the
535 following command is used:</para>
536 <screen>lnetctl set drop_asym_route [0 | 1]</screen>
537 <para>This command works on a per-node basis. This means each node in a
538 Lustre cluster can decide whether it accepts asymmetrical route
540 <para>To check the current <literal>drop_asym_route</literal> setting, the
541 <literal>lnetctl global show</literal> command can be used as shown in
542 <xref linkend="lnet_config.show_global_settings"/>.</para>
543 <para>By default, asymmetric route detection is off.</para>
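<para>A minimal example of enabling the check on one node and confirming the
setting, using only the commands described above:</para>
<screen># drop messages that arrive over an asymmetrical route
lnetctl set drop_asym_route 1
# the drop_asym_route field in the output should reflect the new value
lnetctl global show</screen>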
548 <primary>LNet</primary>
549 <secondary>cli</secondary>
550 </indexterm>Importing YAML Configuration File</title>
<para>Configuration can be described in YAML format and can be fed
into the <literal>lnetctl</literal> utility. The
<literal>lnetctl</literal> utility parses the YAML file and performs
the specified operation on all entities described therein. If no
operation is specified in the command, as shown below, the default
operation is 'add'. The YAML syntax is described in a later
section.</para>
<screen>lnetctl import FILE.yaml
lnetctl import &lt; FILE.yaml</screen>
559 <para>The '<literal>lnetctl</literal> import' command provides three
560 optional parameters to define the operation to be performed on the
561 configuration items described in the YAML file.</para>
<screen># if no options are given to the command the "add" command is assumed
lnetctl import --add FILE.yaml
lnetctl import --add &lt; FILE.yaml

# to delete all items described in the YAML file
lnetctl import --del FILE.yaml
lnetctl import --del &lt; FILE.yaml

# to show all items described in the YAML file
lnetctl import --show FILE.yaml
lnetctl import --show &lt; FILE.yaml</screen>
577 <primary>LNet</primary>
578 <secondary>cli</secondary>
579 </indexterm>Exporting Configuration in YAML format</title>
<para>The <literal>lnetctl</literal> utility provides the 'export'
command to dump the current LNet configuration in YAML format:</para>
582 <screen>lnetctl export FILE.yaml
583 lnetctl export > FILE.yaml</screen>
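<para>Export and import can be combined to save a working configuration and
re-apply it later. The following is a sketch under the assumption that the
exported file is accepted unchanged by the import command; the file name
<literal>lnet-saved.yaml</literal> is hypothetical.</para>
<screen># save the running configuration
lnetctl export lnet-saved.yaml
# later, after the lnet module has been loaded again
lnetctl lnet configure
lnetctl import lnet-saved.yaml</screen>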
587 <primary>LNet</primary>
588 <secondary>cli</secondary>
589 </indexterm>Showing LNet Traffic Statistics</title>
<para>The <literal>lnetctl</literal> utility can dump the LNet traffic
statistics as follows:</para>
592 <screen>lnetctl stats show</screen>
596 <primary>LNet</primary>
597 <secondary>yaml syntax</secondary>
598 </indexterm>YAML Syntax</title>
<para>The <literal>lnetctl</literal> utility can take in a YAML file
describing the configuration items that need to be operated on and
perform one of the following operations on the items described
therein: add, delete or show.</para>
<para>Net, routing and route YAML blocks are all defined as a YAML
sequence, as shown in the following sections. The stats YAML block
is a YAML object. Each sequence item can take a seq_no field. This
seq_no field is returned in the error block, which allows the caller
to associate the error with the item that caused it. The
<literal>lnetctl</literal> utility makes a best effort to configure the
items defined in the YAML file. It does not stop processing the file
at the first error.</para>
611 <para>Below is the YAML syntax describing the various
612 configuration elements which can be operated on via DLC. Not all
613 YAML elements are required for all operations (add/delete/show).
614 The system ignores elements which are not pertinent to the requested
618 <primary>LNet</primary>
619 <secondary>network yaml syntax</secondary>
620 </indexterm>Network Configuration</title>
   - net: &lt;network. Ex: tcp or o2ib&gt;
         0: &lt;physical interface&gt;
     detail: &lt;This is only applicable for show command. 1 - output detailed info. 0 - basic output&gt;
        peer_timeout: &lt;Integer. Timeout before considering a peer dead&gt;
        peer_credits: &lt;Integer. Transmit credits for a peer&gt;
        peer_buffer_credits: &lt;Integer. Credits available for receiving messages&gt;
        credits: &lt;Integer. Network Interface credits&gt;
     SMP: &lt;An array of integers of the form: "[x,y,...]", where each
     integer represents the CPT to associate the network interface
     with&gt;
     seq_no: &lt;Integer. Optional. User generated, and is
     passed back in the YAML error block&gt;</screen>
<para>Neither the seq_no field nor the detail field appears in the show output.
641 <primary>LNet</primary>
642 <secondary>buffer yaml syntax</secondary>
643 </indexterm>Enable Routing and Adjust Router Buffer Configuration
    - tiny: &lt;Integer. Tiny buffers&gt;
      small: &lt;Integer. Small buffers&gt;
      large: &lt;Integer. Large buffers&gt;
      enable: &lt;0 - disable routing. 1 - enable routing&gt;
      seq_no: &lt;Integer. Optional. User generated, and is passed back in the YAML error block&gt;</screen>
<para>The seq_no field does not appear in the show output.</para>
656 <primary>LNet</primary>
657 <secondary>statistics yaml syntax</secondary>
658 </indexterm>Show Statistics</title>
    seq_no: &lt;Integer. Optional. User generated, and is passed back in the YAML error block&gt;</screen>
<para>The seq_no field does not appear in the show output.</para>
666 <primary>LNet</primary>
667 <secondary>router yaml syntax</secondary>
668 </indexterm>Route Configuration</title>
  - net: &lt;network. Ex: tcp or o2ib&gt;
    gateway: &lt;nid of the gateway in the form &lt;ip&gt;@&lt;net&gt;: Ex: 192.168.29.1@tcp&gt;
    hop: &lt;an integer between 1 and 255. Optional&gt;
    detail: &lt;This is only applicable for show commands. 1 - output detailed info. 0 - basic output&gt;
    seq_no: &lt;Integer. Optional. User generated, and is passed back in the YAML error block&gt;</screen>
<para>Neither the seq_no field nor the detail field appears in the show output.
681 <section xml:id="lnet_module_params">
682 <title><indexterm><primary>LNet</primary></indexterm>
683 Overview of LNet Module Parameters</title>
684 <para>LNet kernel module (lnet) parameters specify how LNet is to be
685 configured to work with Lustre, including which NICs will be
686 configured to work with Lustre and the routing to be used with
688 <para>Parameters for LNet can be specified in the
689 <literal>/etc/modprobe.d/lustre.conf</literal> file. In some cases
690 the parameters may have been stored in
691 <literal>/etc/modprobe.conf</literal>, but this has been deprecated
692 since before RHEL5 and SLES10, and having a separate
693 <literal>/etc/modprobe.d/lustre.conf</literal> file simplifies
694 administration and distribution of the Lustre networking
695 configuration. This file contains one or more entries with the
697 <screen>options lnet <replaceable>parameter</replaceable>=<replaceable>value</replaceable></screen>
698 <para>To specify the network interfaces that are to be used for
699 Lustre, set either the <literal>networks</literal> parameter or the
700 <literal>ip2nets</literal> parameter (only one of these parameters can
701 be used at a time):</para>
704 <para><literal>networks</literal> - Specifies the networks to be used.
708 <para><literal>ip2nets</literal> - Lists globally-available
709 networks, each with a range of IP addresses. LNet then identifies
710 locally-available networks through address list-matching
714 <para>See <xref linkend="lnet_module_network_params"/> and
715 <xref linkend="lnet_ip2nets"/> for more details.</para>
716 <para>To set up routing between networks, use:</para>
719 <para><literal>routes</literal> - Lists networks and the NIDs of
720 routers that forward to them.</para>
723 <para>See <xref linkend="lnet_module_routes"/> for more details.</para>
724 <para>A <literal>router</literal> checker can be configured to enable
725 Lustre nodes to detect router health status, avoid routers that appear
726 dead, and reuse those that restore service after failures. See <xref
727 linkend="lnet_router_checker"/> for more details.</para>
728 <para>For a complete reference to the LNet module parameters, see
729 <emphasis><xref linkend="configurationfilesmoduleparameters"/>LNet
730 Options</emphasis>.</para>
732 <para>We recommend that you use 'dotted-quad' notation for
733 IP addresses rather than host names to make it easier to read debug
734 logs and debug configurations with multiple interfaces.</para>
737 <title><indexterm><primary>LNet</primary><secondary>using
738 NID</secondary></indexterm>Using a Lustre Network Identifier (NID)
739 to Identify a Node</title>
740 <para>A Lustre network identifier (NID) is used to uniquely identify
741 a Lustre network endpoint by node ID and network type. The format of
743 <screen><replaceable>network_id</replaceable>@<replaceable>network_type</replaceable></screen>
744 <para>Examples are:</para>
745 <screen>10.67.73.200@tcp0
746 10.67.75.100@o2ib</screen>
747 <para>The first entry above identifies a TCP/IP node, while the
748 second entry identifies an InfiniBand node.</para>
749 <para>When a mount command is run on a client, the client uses the
750 NID of the MDS to retrieve configuration information. If an MDS has
751 more than one NID, the client should use the appropriate NID for its
752 local network.</para>
<para>To determine the appropriate NID to specify in
the mount command, use the <literal>lctl</literal> command. To
display MDS NIDs, run on the MDS:</para>
756 <screen>lctl list_nids</screen>
757 <para>To determine if a client can reach the MDS using a particular NID,
758 run on the client:</para>
759 <screen>lctl which_nid <replaceable>MDS_NID</replaceable></screen>
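<para>As an illustration, suppose the MDS reports the two example NIDs shown
earlier in this section. A client could then check which of them it can reach.
The NID values are assumptions carried over from that example, and passing
several NIDs in one invocation is assumed to be supported by
<literal>lctl which_nid</literal>.</para>
<screen># on the MDS: list its NIDs
lctl list_nids
# on the client: ask which of the MDS NIDs is reachable from here
lctl which_nid 10.67.73.200@tcp0 10.67.75.100@o2ib</screen>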
762 <section xml:id="lnet_module_network_params">
763 <title><indexterm><primary>LNet</primary>
764 <secondary>module parameters</secondary>
765 </indexterm>Setting the LNet Module networks Parameter</title>
766 <para>If a node has more than one network interface, you'll
767 typically want to dedicate a specific interface to Lustre. You can do
768 this by including an entry in the <literal>lustre.conf</literal> file
769 on the node that sets the LNet module <literal>networks</literal>
771 <screen>options lnet networks=<replaceable>comma-separated list of
772 networks</replaceable></screen>
773 <para>This example specifies that a Lustre node will use a TCP/IP
774 interface and an InfiniBand interface:</para>
775 <screen>options lnet networks=tcp0(eth0),o2ib(ib0)</screen>
776 <para>This example specifies that the Lustre node will use the TCP/IP
777 interface <literal>eth1</literal>:</para>
778 <screen>options lnet networks=tcp0(eth1)</screen>
779 <para>Depending on the network design, it may be necessary to specify
780 explicit interfaces. To explicitly specify that interface
781 <literal>eth2</literal> be used for network <literal>tcp0</literal>
and <literal>eth3</literal> be used for <literal>tcp1</literal>, use
784 <screen>options lnet networks=tcp0(eth2),tcp1(eth3)</screen>
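<para>One way to confirm that such an entry took effect is to load LNet and
list the resulting NIDs. This is a sketch; it assumes the entry above has
been saved in <literal>/etc/modprobe.d/lustre.conf</literal> and that the
<literal>lnet</literal> module is not already loaded.</para>
<screen># load LNet with the options from lustre.conf and bring the networks up
modprobe lnet
lctl network up
# expect one NID for each configured network
lctl list_nids</screen>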
785 <para>When more than one interface is available during the network
786 setup, Lustre chooses the best route based on the hop count. Once the
787 network connection is established, Lustre expects the network to stay
788 connected. In a Lustre network, connections do not fail over to
789 another interface, even if multiple interfaces are available on the
792 <para>LNet lines in <literal>lustre.conf</literal> are only used by
793 the local node to determine what to call its interfaces. They are
794 not used for routing decisions.</para>
797 <title><indexterm><primary>configuring</primary>
798 <secondary>multihome</secondary></indexterm>Multihome Server Example
<para>If a server with multiple IP addresses (multihome server) is
connected to a Lustre network, certain configuration settings are
required. An example illustrating these settings consists of a
network with the following nodes:</para>
806 <para> Server svr1 with three TCP NICs (<literal>eth0</literal>,
807 <literal>eth1</literal>, and <literal>eth2</literal>) and an
808 InfiniBand NIC.</para>
811 <para> Server svr2 with three TCP NICs (<literal>eth0</literal>,
812 <literal>eth1</literal>, and <literal>eth2</literal>) and an
813 InfiniBand NIC. Interface eth2 will not be used for Lustre
817 <para> TCP clients, each with a single TCP interface.</para>
<para> InfiniBand clients, each with a single InfiniBand
interface and a TCP/IP interface for administration.</para>
824 <para>To set the <literal>networks</literal> option for this example:
828 <para> On each server, <literal>svr1</literal> and
829 <literal>svr2</literal>, include the following line in the
830 <literal>lustre.conf</literal> file:</para>
833 <screen>options lnet networks=tcp0(eth0),tcp1(eth1),o2ib</screen>
836 <para> For TCP-only clients, the first available non-loopback IP
837 interface is used for <literal>tcp0</literal>. Thus, TCP clients
838 with only one interface do not need to have options defined in
839 the <literal>lustre.conf</literal> file.</para>
842 <para> On the InfiniBand clients, include the following line in
843 the <literal>lustre.conf</literal> file:</para>
846 <screen>options lnet networks=o2ib</screen>
848 <para>By default, Lustre ignores the loopback
849 (<literal>lo0</literal>) interface. Lustre does not ignore IP
850 addresses aliased to the loopback. If you alias IP addresses to
851 the loopback interface, you must specify all Lustre networks using
852 the LNet networks parameter.</para>
855 <para>If the server has multiple interfaces on the same subnet,
856 the Linux kernel will send all traffic using the first configured
857 interface. This is a limitation of Linux, not Lustre. In this
858 case, network interface bonding should be used. For more
859 information about network interface bonding, see <xref
860 linkend="settingupbonding"/>.</para>
864 <section xml:id="lnet_ip2nets">
865 <title><indexterm><primary>LNet</primary>
866 <secondary>ip2nets</secondary>
867 </indexterm>Setting the LNet Module ip2nets Parameter</title>
<para>The <literal>ip2nets</literal> option is typically used when a
single, universal <literal>lustre.conf</literal> file is deployed on all
servers and clients. Each node identifies its locally available
networks by matching the listed IP address patterns against the
node's local IP addresses.</para>
873 <para>Note that the IP address patterns listed in the
874 <literal>ip2nets</literal> option are <emphasis>only</emphasis> used
875 to identify the networks that an individual node should instantiate.
876 They are <emphasis>not</emphasis> used by LNet for any other
877 communications purpose.</para>
878 <para>For the example below, the nodes in the network have these IP
<para> Server svr1: <literal>eth0</literal> IP address
<literal>192.168.0.2</literal>, IP over InfiniBand
(<literal>o2ib</literal>) address
<literal>132.6.1.2</literal>.</para>
<para> Server svr2: <literal>eth0</literal> IP address
<literal>192.168.0.4</literal>, IP over InfiniBand
(<literal>o2ib</literal>) address
<literal>132.6.1.4</literal>.</para>
<para> TCP clients have IP addresses
<literal>192.168.0.5-255</literal>.</para>
<para> InfiniBand clients have IP over InfiniBand
(<literal>o2ib</literal>) addresses <literal>132.6.[2-3].2, .4,
.6, .8</literal>.</para>
903 <para>The following entry is placed in the
904 <literal>lustre.conf</literal> file on each server and client:</para>
905 <screen>options lnet 'ip2nets="tcp0(eth0) 192.168.0.[2,4]; \
906 tcp0 192.168.0.*; o2ib0 132.6.[1-3].[2-8/2]"'</screen>
907 <para>Each entry in <literal>ip2nets</literal> is referred to as a
908 'rule'.</para>
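<para>As an illustration, and assuming each node has only the addresses
listed above, these rules should resolve to the equivalent of the
following <literal>networks</literal> settings. This is a sketch of the
expected result worked out by hand, not output produced by LNet:</para>
<screen># svr1 and svr2: the first rule gives tcp0 on eth0, the third rule gives o2ib0
options lnet networks=tcp0(eth0),o2ib0
# TCP clients (192.168.0.5-255): only the second rule matches
options lnet networks=tcp0
# InfiniBand clients (132.6.[2-3].2, .4, .6, .8): only the third rule matches
options lnet networks=o2ib0</screen>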
909 <para>The order of LNet entries is important when configuring servers.
910 If a server node can be reached using more than one network, the first
911 network specified in <literal>lustre.conf</literal> will be
913 <para>Because <literal>svr1</literal> and <literal>svr2</literal>
914 match the first rule, LNet uses <literal>eth0</literal> for
915 <literal>tcp0</literal> on those machines. (Although
916 <literal>svr1</literal> and <literal>svr2</literal> also match the
917 second rule, the first matching rule for a particular network is
<para>The <literal>[2-8/2]</literal> format indicates a range of 2-8
stepped by 2; that is, 2, 4, 6, 8. Thus, a client at
<literal>132.6.3.5</literal> will not find a matching o2ib
923 <note condition='l2A'>
<para>Multi-rail deprecates the kernel parsing of
<literal>ip2nets</literal>. <literal>ip2nets</literal>
patterns are matched in user space and translated into network
interfaces to be added to the system.</para>
927 <para>The first interface that matches the IP pattern will be used when
928 adding a network interface.</para>
929 <para>If an interface is explicitly specified as well as a pattern, the
930 interface matched using the IP pattern will be sanitized against the
931 explicitly-defined interface.</para>
<para>For example, given the rule <literal>tcp(eth0) 192.168.*.3</literal>
on a system where <literal>eth0 == 192.158.19.3</literal> and
<literal>eth1 == 192.168.3.3</literal>, the configuration will
fail, because the pattern matches <literal>eth1</literal> and so contradicts the interface specified.
937 <para>A clear warning will be displayed if inconsistent configuration is
<para>You can use the following command to configure ip2nets:</para>
<screen>lnetctl import &lt; ip2nets.yaml</screen>
941 <para>For example:</para>
954 0: 192.168.*.*</screen>
957 <section xml:id="lnet_module_routes">
958 <title><indexterm><primary>LNet</primary>
959 <secondary>routes</secondary></indexterm>Setting the LNet Module routes
<para>The LNet module <literal>routes</literal> parameter is used to
identify routers in a Lustre configuration. This parameter is set in
<literal>lustre.conf</literal> on each Lustre node.</para>
964 <para>Routes are typically set to connect to segregated subnetworks
965 or to cross connect two different types of networks such as tcp and
<para>The LNet routes parameter specifies a semicolon-separated list of
route definitions. Each route names a network, followed by the NIDs of
the routers through which that network is reached:</para>
970 <screen>routes=<replaceable>net_type router_NID(s)</replaceable></screen>
971 <para>This example specifies bi-directional routing in which TCP
972 clients can reach Lustre resources on the IB networks and IB servers
973 can access the TCP networks:</para>
<screen>options lnet 'ip2nets="tcp0 192.168.0.*; \
  o2ib0(ib0) 132.6.1.[1-128]"' 'routes="tcp0 132.6.1.[1-8]@o2ib0; \
  o2ib0 192.168.0.[1-8]@tcp0"'</screen>
977 <para>All LNet routers that bridge two networks are equivalent. They
978 are not configured as primary or secondary, and the load is balanced
979 across all available routers.</para>
980 <para>The number of LNet routers is not limited. Enough routers should
981 be used to handle the required file serving bandwidth plus a 25
982 percent margin for headroom.</para>
984 <title><indexterm><primary>LNet</primary><secondary>routing
985 example</secondary></indexterm>Routing Example</title>
<para>On the clients, place the following entry in the
<literal>lustre.conf</literal> file:</para>
<screen>options lnet networks="tcp" routes="o2ib0 192.168.0.[1-8]@tcp0"</screen>
989 <para>On the router nodes, use:</para>
<screen>options lnet networks="tcp,o2ib" forwarding=enabled</screen>
991 <para>On the MDS, use the reverse as shown below:</para>
<screen>options lnet networks="o2ib0" routes="tcp0 132.6.1.[1-8]@o2ib0"</screen>
993 <para>To start the routers, run:</para>
994 <screen>modprobe lnet
995 lctl network configure</screen>
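<para>Once the routers are up, the routed configuration can be
spot-checked from a client. The following is a minimal sketch; the router
NID <literal>192.168.0.1@tcp0</literal> is an assumed value taken from the
range used in this example, so substitute a NID from your own site:</para>
<screen># Show the NIDs configured on the local node
lctl list_nids
# Show the routes known to the local node
lctl route_list
# Check that one router answers an LNet ping (assumed NID)
lctl ping 192.168.0.1@tcp0</screen>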
998 <section xml:id="lnet_config_testing">
999 <title><indexterm><primary>LNet</primary>
1000 <secondary>testing</secondary></indexterm>Testing the LNet
1001 Configuration</title>
1002 <para>After configuring Lustre Networking, it is highly recommended
1003 that you test your LNet configuration using the LNet Self-Test
1004 provided with the Lustre software. For more information about using
1005 LNet Self-Test, see <xref linkend="lnetselftest"/>.</para>
1007 <section xml:id="lnet_router_checker">
1008 <title><indexterm><primary>LNet</primary>
<secondary>router checker</secondary>
1010 </indexterm>Configuring the Router Checker</title>
<para>In a Lustre configuration in which different types of networks,
such as a TCP/IP network and an InfiniBand network, are connected by
routers, a router checker can be run on the clients and servers in the
routed configuration to monitor the status of the routers. In a
multi-hop routing configuration, router checkers can be configured on
routers to monitor the health of their next-hop routers.</para>
1017 <para>A router checker is configured by setting LNet parameters in
1018 <literal>lustre.conf</literal> by including an entry in this
1020 <screen>options lnet
1021 <replaceable>router_checker_parameter</replaceable>=<replaceable>value</replaceable></screen>
1022 <para>The router checker parameters are:</para>
1025 <para><literal>live_router_check_interval</literal> - Specifies a
1026 time interval in seconds after which the router checker will ping
1027 the live routers. The default value is 0, meaning no checking is
1028 done. To set the value to 60, enter:</para>
1029 <screen>options lnet live_router_check_interval=60</screen>
1032 <para><literal>dead_router_check_interval</literal> - Specifies a
1033 time interval in seconds after which the router checker will check
1034 for dead routers. The default value is 0, meaning no checking is
1035 done. To set the value to 60, enter:</para>
1036 <screen>options lnet dead_router_check_interval=60</screen>
<para><literal>auto_down</literal> - Enables/disables (1/0) the
automatic marking of router state as up or down. The default value is 1.
To disable router marking, enter:</para>
1042 <screen>options lnet auto_down=0</screen>
<para><literal>router_ping_timeout</literal> - Specifies a
timeout, in seconds, for the router checker when it checks live or dead
routers. The router checker sends a ping message to each dead or
live router once every <literal>dead_router_check_interval</literal> or
<literal>live_router_check_interval</literal> seconds, respectively. The default value is 50.
To set the value to 60, enter:</para>
1051 <screen>options lnet router_ping_timeout=60</screen>
1053 <para>The <literal>router_ping_timeout</literal> is consistent
1054 with the default LND timeouts. You may have to increase it on very
1055 large clusters if the LND timeout is also increased. For larger
1056 clusters, we suggest increasing the check interval.</para>
<para><literal>check_routers_before_use</literal> - Specifies
that routers are to be checked before use. Set to off by
default. If this parameter is set to on, the
<literal>dead_router_check_interval</literal> parameter must be given a
positive integer value.</para>
1065 <screen>options lnet check_routers_before_use=on</screen>
1068 <para>The router checker obtains the following information from each router:
1072 <para> Time the router was disabled</para>
1075 <para> Elapsed disable time</para>
1078 <para>If the router checker does not get a reply message from the
1079 router within router_ping_timeout seconds, it considers the router to
1081 <para>If a router is marked 'up' and responds to a ping, the
1082 timeout is reset.</para>
1083 <para>If 100 packets have been sent successfully through a router, the
1084 sent-packets counter for that router will have a value of 100.</para>
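<para>The router checker settings described above can also be combined on
a single <literal>options</literal> line. The following sketch uses
illustrative values only; suitable intervals and timeouts depend on the
size of the cluster and on the LND timeouts in use:</para>
<screen>options lnet live_router_check_interval=60 dead_router_check_interval=60 router_ping_timeout=60</screen>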
1086 <section xml:id="lnet_best_practices">
1087 <title><indexterm><primary>LNet</primary>
1088 <secondary>best practice</secondary>
1089 </indexterm>Best Practices for LNet Options</title>
1090 <para>For the <literal>networks</literal>, <literal>ip2nets</literal>,
1091 and <literal>routes</literal> options, follow these best practices to
1092 avoid configuration errors.</para>
1093 <section xml:id="lnet_best_practices.escape_comments">
1094 <title><indexterm><primary>LNet</primary>
1095 <secondary>escaping commas with quotes</secondary>
1096 </indexterm>Escaping commas with quotes</title>
1097 <para>Depending on the Linux distribution, commas may need to be
1098 escaped using single or double quotes. In the extreme case, the
1099 <literal>options</literal> entry would look like this:</para>
<para><screen>options lnet 'networks="tcp0,elan0"' 'routes="tcp [2,10]@elan0"'</screen></para>
1103 <para>Added quotes may confuse some distributions. Messages such as
1104 the following may indicate an issue related to added quotes:</para>
1105 <para><screen>lnet: Unknown parameter 'networks'</screen></para>
1106 <para>A <literal>'Refusing connection - no matching
1107 NID'</literal> message generally points to an error in the LNet
1108 module configuration.</para>
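<para>When quoting problems are suspected, one way to narrow them down is
to compare what <literal>modprobe</literal> read from its configuration
with what LNet actually configured. This is a sketch of one possible
check, not a required procedure:</para>
<screen># Print the lnet options as modprobe parsed them
modprobe -c | grep 'options lnet'
# After the module is loaded and LNet is up, list the configured NIDs
lctl list_nids</screen>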
1110 <section xml:id="lnet_best_practices.comments">
1111 <title><indexterm><primary>LNet</primary>
1112 <secondary>comments</secondary></indexterm>Including comments</title>
1113 <para><emphasis>Place the semicolon terminating a comment
1114 immediately after the comment.</emphasis> LNet silently ignores
1115 everything between the <literal>#</literal> character at the
1116 beginning of the comment and the next semicolon.</para>
1117 <para>In this <emphasis>incorrect</emphasis> example, LNet silently
1118 ignores <literal>pt11 192.168.0.[92,96]</literal>, resulting in
1119 these nodes not being properly initialized. No error message is
<screen>options lnet ip2nets="pt10 192.168.0.[89,93]; # comment with semicolon BEFORE comment \
pt11 192.168.0.[92,96];</screen>
1123 <para>This <emphasis role="italic">correct</emphasis> example shows
1124 the required syntax: </para>
1125 <para><screen>options lnet ip2nets="pt10 192.168.0.[89,93] \
1126 # comment with semicolon AFTER comment; \
1127 pt11 192.168.0.[92,96] # comment</screen></para>
<para><emphasis role="italic">Do not add an excessive number of
comments.</emphasis> The Linux kernel limits the length of character
strings used in module options (usually to 1 KB, but this may differ
between vendor kernels). If you exceed this limit, errors result and
the specified configuration may not be processed correctly.</para>
1137 vim:expandtab:shiftwidth=2:tabstop=8: