1 <?xml version='1.0' encoding='UTF-8'?>
2 <chapter xmlns="http://docbook.org/ns/docbook"
3 xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US"
4 xml:id="configurationfilesmoduleparameters">
5 <title xml:id="configurationfilesmoduleparameters.title">Configuration Files and Module Parameters</title>
<para>This chapter describes configuration files and module parameters. It includes the following sections:</para>
9 <para><xref linkend="tuning_lnet_mod_params"/></para>
12 <para><xref linkend="module_options"/></para>
15 <section xml:id="tuning_lnet_mod_params">
17 <indexterm><primary>configuring</primary></indexterm>
18 <indexterm><primary>LNet</primary><see>configuring</see></indexterm>
<para>LNet network hardware and routing are configured via module parameters. Parameters should be specified in the <literal>/etc/modprobe.d/lustre.conf</literal> file, for example:</para>
23 <screen>options lnet networks=tcp0(eth2)</screen>
24 <para>The above option specifies that this node should use the TCP protocol on the eth2 network interface.</para>
25 <para>Module parameters are read when the module is first loaded. Type-specific LND modules (for instance, <literal>ksocklnd</literal>) are loaded automatically by the LNet module when LNet starts (typically upon <literal>modprobe ptlrpc</literal>).</para>
26 <para>LNet configuration parameters can be viewed under <literal>/sys/module/lnet/parameters/</literal>, and LND-specific parameters under the name of the corresponding LND, for example <literal>/sys/module/ksocklnd/parameters/</literal> for the socklnd (TCP) LND.</para>
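<para>As a minimal sketch, assuming the <literal>lnet</literal> and <literal>ksocklnd</literal> modules are already loaded, the current settings can be inspected from the shell. The <literal>networks</literal> file is used here only as an example; which files are present may depend on the installed release:</para>
<screen># List the LNet module parameters, then show the configured networks
ls /sys/module/lnet/parameters/
cat /sys/module/lnet/parameters/networks
# List the parameters of the socklnd (TCP) LND
ls /sys/module/ksocklnd/parameters/</screen>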
<para>For the following parameters, default option settings are shown in parentheses. Changes to parameters marked with a <literal>W</literal> affect running systems. Unmarked parameters can only be set when LNet loads for the first time. Changes to parameters marked with <literal>Wc</literal> only take effect when connections are established (existing connections are not affected by these changes).</para>
29 <section xml:id="module_options">
31 <indexterm><primary>configuring</primary><secondary>module options</secondary></indexterm>
33 Module Options</title>
<para>With routed or other multi-network configurations, use <literal>ip2nets</literal> rather than <literal>networks</literal>, so all nodes can use the same configuration.</para>
39 <para>For a routed network, use the same 'routes' configuration everywhere. Nodes specified as routers automatically enable forwarding and any routes that are not relevant to a particular node are ignored. Keep a common configuration to guarantee that all nodes have consistent routing tables.</para>
42 <para>A separate <literal>lustre.conf</literal> file makes distributing the configuration much easier.</para>
45 <para>If you set <literal>config_on_load=1</literal>, LNet starts at
46 <literal>modprobe</literal> time rather than waiting for the Lustre file system to
47 start. This ensures routers start working at module load time.</para>
51 # lctl> net down</screen>
54 <para>Remember the <literal>lctl ping {nid}</literal> command - it is a handy way to check your LNet configuration.</para>
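<para>The tips above can be combined as in the following sketch. The address ranges, the interface name, and the NID passed to <literal>lctl ping</literal> are hypothetical values chosen for illustration:</para>
<screen># /etc/modprobe.d/lustre.conf, identical on every node
options lnet ip2nets="o2ib0 192.168.0.*; tcp0(eth2) 10.10.1.*" config_on_load=1

# After the modules are loaded, check connectivity to a peer
lctl ping 10.10.1.5@tcp0</screen>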
58 <title><indexterm><primary>configuring</primary><secondary>LNet options</secondary></indexterm>
60 <para>This section describes LNet options.</para>
62 <title><indexterm><primary>configuring</primary><secondary>network topology</secondary></indexterm>
63 Network Topology</title>
64 <para>Network topology module parameters determine which networks a node should join, whether it should route between these networks, and how it communicates with non-local networks.</para>
65 <para>Here is a list of various networks and the supported software stacks:</para>
66 <informaltable frame="all">
68 <colspec colname="c1" colwidth="50*"/>
69 <colspec colname="c2" colwidth="50*"/>
73 <para><emphasis role="bold">Network</emphasis></para>
76 <para><emphasis role="bold">Software Stack</emphasis></para>
86 <para> OFED Version 2</para>
93 <para>The Lustre software ignores the loopback interface (<literal>lo0</literal>), but the
94 Lustre file system uses any IP addresses aliased to the loopback (by default). When in
95 doubt, explicitly specify networks.</para>
99 <title><indexterm><primary>configuring</primary>
100 <secondary>network</secondary><tertiary>ip2nets</tertiary></indexterm>
101 ip2nets ("tcp")</title>
102 <para><literal>ip2nets</literal> is a string that lists globally
103 available networks, each with a set of IP address ranges. LNet
104 determines the locally-available networks from this list by matching
105 the IP address ranges with the local IPs of a node. Its purpose is
106 to allow the same <literal>modules.conf</literal> file to be used
107 across a variety of nodes on different networks. The string has the
108 following syntax.</para>
<screen>&lt;ip2nets&gt; :== &lt;net-match&gt; [ &lt;comment&gt; ] { &lt;net-sep&gt; &lt;net-match&gt; }
&lt;net-match&gt; :== [ &lt;w&gt; ] &lt;net-spec&gt; &lt;w&gt; &lt;ip-range&gt; { &lt;w&gt; &lt;ip-range&gt; }
&lt;net-spec&gt; :== &lt;network&gt; [ "(" &lt;iface-list&gt; ")" ]
&lt;network&gt; :== &lt;nettype&gt; [ &lt;number&gt; ]
&lt;nettype&gt; :== "tcp" | "elan" | "o2ib" | ...
&lt;iface-list&gt; :== &lt;interface&gt; [ "," &lt;iface-list&gt; ]
&lt;ip-range&gt; :== &lt;r-expr&gt; "." &lt;r-expr&gt; "." &lt;r-expr&gt; "." &lt;r-expr&gt;
&lt;r-expr&gt; :== &lt;number&gt; | "*" | "[" &lt;r-list&gt; "]"
&lt;r-list&gt; :== &lt;range&gt; [ "," &lt;r-list&gt; ]
&lt;range&gt; :== &lt;number&gt; [ "-" &lt;number&gt; [ "/" &lt;number&gt; ] ]
&lt;comment&gt; :== "#" { &lt;non-net-sep-chars&gt; }
&lt;net-sep&gt; :== ";" | "\n"
&lt;w&gt; :== &lt;whitespace-chars&gt; { &lt;whitespace-chars&gt; }</screen>
<para><literal>&lt;net-spec&gt;</literal> contains enough information to
uniquely identify the network and load an appropriate LND. The LND
determines the missing "address-within-network" part of the
NID based on the interfaces it can use.</para>
<para><literal>&lt;iface-list&gt;</literal> specifies which hardware
interfaces the network can use. If omitted, all interfaces are used. LNDs
that do not support the <literal>&lt;iface-list&gt;</literal> syntax
cannot be configured to use particular interfaces and use whatever
interfaces are available. Only a single instance of these LNDs can exist on
a node at any time, and <literal>&lt;iface-list&gt;</literal> must be omitted.</para>
<para><literal>&lt;net-match&gt;</literal> entries are scanned in the
order declared to see if one of the node's IP addresses matches one
of the <literal>&lt;ip-range&gt;</literal> expressions. If there is a
match, <literal>&lt;net-spec&gt;</literal> specifies the network to
instantiate. Note that it is the first match for a particular network
that counts. This can be used to simplify the match expression for the
general case by placing it after the special cases. For example:</para>
141 <screen>ip2nets="tcp(eth1,eth2) 134.32.1.[4-10/2]; tcp(eth1) *.*.*.*"</screen>
<para>Four nodes on the 134.32.1.* network have two interfaces
(134.32.1.{4,6,8,10}); all other nodes have one.</para>
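<para>To illustrate how the first match applies, the following sketch lists what three hypothetical nodes would instantiate from an <literal>ip2nets</literal> string of <literal>tcp(eth1,eth2) 134.32.1.[4-10/2]; tcp(eth1) *.*.*.*</literal>. The node addresses are assumptions chosen for illustration:</para>
<screen># Node address   First matching entry                Equivalent 'networks' setting
# 134.32.1.6     tcp(eth1,eth2) 134.32.1.[4-10/2]    networks="tcp(eth1,eth2)"
# 134.32.1.7     tcp(eth1) *.*.*.*                   networks="tcp(eth1)"
# 10.0.0.12      tcp(eth1) *.*.*.*                   networks="tcp(eth1)"</screen>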
144 <screen>ip2nets="<emphasis role="bold">o2ib</emphasis> 192.168.0.*; tcp(eth2) 192.168.0.[1,7,4,12]" </screen>
145 <para>This describes an IB cluster on 192.168.0.*. Four of these nodes
146 also have IP interfaces; these four could be used as routers.</para>
<para>Note that match-all expressions (for instance,
<literal>*.*.*.*</literal>) effectively mask all other
<literal>&lt;net-match&gt;</literal> entries specified after
them. They should be used with caution.</para>
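<para>As a sketch of this pitfall, compare the two orderings below. The addresses and interface names are illustrative, and only one of the two lines would appear in a real configuration file:</para>
<screen># Problematic: the match-all entry comes first, so every node matches it
# and the two-interface special case for tcp is never reached
options lnet ip2nets="tcp(eth1) *.*.*.*; tcp(eth1,eth2) 134.32.1.[4-10/2]"

# Intended: special case first, match-all entry last
options lnet ip2nets="tcp(eth1,eth2) 134.32.1.[4-10/2]; tcp(eth1) *.*.*.*"</screen>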
<para>Here is a more complicated situation (the <literal>routes</literal>
parameter is explained below). We have:</para>
155 <para>Two TCP subnets</para>
158 <para>One Elan subnet</para>
161 <para>One machine set up as a router, with both TCP and Elan
165 <para>IP over Elan configured, but only IP will be used to label the
<screen>options lnet ip2nets="tcp 198.129.135.* 192.128.88.98; \
         elan 198.128.88.98 198.129.135.3" \
         routes='tcp 1022@elan # Elan NID of router; \
         elan 198.128.88.98@tcp # TCP NID of router'</screen>
175 <title><indexterm><primary>configuring</primary>
176 <secondary>network</secondary><tertiary>tcp</tertiary></indexterm>
177 networks ("tcp")</title>
<para>This is an alternative to "<literal>ip2nets</literal>"
which can be used to specify the networks to be instantiated explicitly.
The syntax is a simple comma-separated list of
<literal>&lt;net-spec&gt;</literal>s (see above). The default is only
182 used if neither 'ip2nets' nor 'networks' is
186 <title><indexterm><primary>configuring</primary><secondary>network</secondary><tertiary>routes</tertiary></indexterm>
187 routes ("")</title>
188 <para>This is a string that lists networks and the NIDs of routers that forward to them.</para>
<para>It has the following syntax (<literal>&lt;w&gt;</literal> is one or more whitespace characters):</para>
<screen>&lt;routes&gt; :== &lt;route&gt;{ ; &lt;route&gt; }
&lt;route&gt; :== &lt;net&gt;[&lt;w&gt;&lt;hopcount&gt;]&lt;w&gt;&lt;nid&gt;[:&lt;priority&gt;]{&lt;w&gt;&lt;nid&gt;[:&lt;priority&gt;]}</screen>
192 <para>Note: the priority parameter was added in release 2.5.</para>
<para>For example, a node on the network <literal>tcp1</literal> that needs to go through a router to reach the Elan network could be configured as follows:</para>
<screen>options lnet networks=tcp1 routes="elan 1 192.168.2.2@tcp1"</screen>
<para>The hopcount and priority numbers are used to help choose the best path in multiply-routed configurations.</para>
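<para>As a hedged sketch of the priority field, assuming the <literal>&lt;nid&gt;[:&lt;priority&gt;]</literal> syntax given above and a release that supports it, two routers to the same remote network might be ranked as follows. The router addresses are hypothetical; the entry with the lower priority number would be preferred:</para>
<screen># Two routes to elan with the same hopcount; 192.168.2.2 (priority 0)
# is preferred over 192.168.2.3 (priority 1)
options lnet networks=tcp1 routes="elan 1 192.168.2.2@tcp1:0; elan 1 192.168.2.3@tcp1:1"</screen>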
196 <para>A simple but powerful expansion syntax is provided, both for target networks and router NIDs as follows.</para>
<screen>&lt;expansion&gt; :== "[" &lt;entry&gt; { "," &lt;entry&gt; } "]"
&lt;entry&gt; :== &lt;numeric range&gt; | &lt;non-numeric item&gt;
&lt;numeric range&gt; :== &lt;number&gt; [ "-" &lt;number&gt; [ "/" &lt;number&gt; ] ]</screen>
<para>The expansion is a list enclosed in square brackets. Numeric
items in the list may be a single number, a contiguous range of numbers,
or a strided range of numbers. For example,
<literal>routes="elan 192.168.1.[22-24]@tcp"</literal> says that network
<literal>elan0</literal> may be adjacent or behind another network
(the hopcount is undefined) and is accessible via 3 routers on the
<literal>tcp0</literal> network (<literal>192.168.1.22@tcp</literal>,
<literal>192.168.1.23@tcp</literal> and
<literal>192.168.1.24@tcp</literal>).</para>
<para><literal>routes="[tcp,o2ib] 2 [8-14/2]@elan"</literal>
says that 2 networks (<literal>tcp0</literal> and <literal>o2ib0</literal>) are accessible through 4 routers (<literal>8@elan</literal>, <literal>10@elan</literal>, <literal>12@elan</literal> and <literal>14@elan</literal>). The hopcount of 2 means that traffic to both these networks will traverse 2 routers: first one of the routers specified in this entry, then one more.</para>
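<para>Written out without the expansion syntax, the two route strings discussed above would look roughly like the following sketch. Each <literal>options</literal> line is a separate illustration, and the expanded forms are derived by hand from the grammar rather than taken from LNet output:</para>
<screen># routes="elan 192.168.1.[22-24]@tcp" written out in full
options lnet routes="elan 192.168.1.22@tcp 192.168.1.23@tcp 192.168.1.24@tcp"

# routes="[tcp,o2ib] 2 [8-14/2]@elan" written out in full
options lnet routes="tcp 2 8@elan 10@elan 12@elan 14@elan; \
                     o2ib 2 8@elan 10@elan 12@elan 14@elan"</screen>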
211 <para>Duplicate entries, entries that route to a local network, and entries that specify routers on a non-local network are ignored.</para>
212 <para>Prior to release 2.5, a conflict between equivalent entries was
213 resolved in favor of the route with the shorter hopcount. The hopcount,
214 if omitted, is undefined, but is treated as 1 when being compared to
215 other routes during selection (as if the remote network is adjacent).
217 <para condition='l25'>Since 2.5, equivalent entries are resolved in
218 favor of the route with the lowest priority number or shorter hopcount
219 if the priorities are equal. The priority, if omitted, defaults to 0.
220 The hopcount, if omitted, is undefined, but is treated as 1 when being
221 compared to other routes during selection (as if the remote network is
223 <para>It is an error to specify routes to the same destination with
224 routers on different local networks.</para>
225 <para>If a route string contains no hop count, then the hop count is
226 undefined. Explicitly setting the hop count to 1 is recommended if the
227 remote network is adjacent and
228 <literal>avoid_asym_router_failure</literal> is enabled
229 to ensure proper operation of the feature.</para>
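<para>A minimal sketch of this recommendation, assuming a node on <literal>tcp1</literal> whose router (hypothetical address <literal>192.168.2.2</literal>) is directly attached to the remote Elan network:</para>
<screen># The hop count is set explicitly to 1 because the remote network is adjacent
options lnet networks=tcp1 routes="elan 1 192.168.2.2@tcp1" avoid_asym_router_failure=1</screen>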
232 <title><indexterm><primary>configuring</primary>
233 <secondary>network</secondary>
234 <tertiary>forwarding</tertiary></indexterm>
235 forwarding ("")</title>
236 <para>This is a string that can be set either to
237 "<literal>enabled</literal>" or
238 "<literal>disabled</literal>" for explicit control of whether
239 this node should act as a router, forwarding communications between all
240 local networks.</para>
241 <para>A standalone router can be started by simply starting LNet
242 ('<literal>modprobe ptlrpc</literal>') with appropriate
243 network topology options.</para>
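<para>For illustration, a standalone router joining a TCP network and an InfiniBand network might use a configuration along the following lines. The interface names <literal>eth0</literal> and <literal>ib0</literal> are assumptions about the router's hardware:</para>
<screen># /etc/modprobe.d/lustre.conf on the router node
options lnet networks="tcp0(eth0),o2ib0(ib0)" forwarding=enabled</screen>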
246 <title><indexterm><primary>configuring</primary>
247 <secondary>network</secondary>
248 <tertiary>accept</tertiary></indexterm>accept (secure)</title>
249 <para>The acceptor is a TCP/IP service that some LNDs use to
250 establish communications. If a local network requires it and it has
251 not been disabled, the acceptor listens on a single port for
252 connection requests that it redirects to the appropriate local
253 network. The acceptor is part of the LNet module and configured
254 by the following options:</para>
255 <informaltable frame="all">
257 <colspec colname="c1" colwidth="50*"/>
258 <colspec colname="c2" colwidth="50*"/>
262 <para><emphasis role="bold">Variable</emphasis></para>
265 <para><emphasis role="bold">Description</emphasis></para>
272 <para><literal>accept</literal></para>
273 <para><literal>(secure)</literal></para>
276 <para>The type of connections that the acceptor will allow
277 from remote nodes.</para>
<para><literal>secure</literal> - Accept connections
only from reserved TCP ports (below 1024). This is the
default, and prevents userspace processes from trying
to connect to the server.</para>
286 <para><literal>all</literal> - Accept connections from
287 any TCP port. This may be needed to allow connections
288 on non-privileged ports, for example from a client in a
289 virtual machine running in userspace.</para>
292 <para><literal>none</literal> - Do not run the acceptor.
293 This may prevent the client from receiving server RPCs
294 if the TCP connection is lost and the server needs to
295 contact the client for some reason (e.g. LDLM lock
296 callback or size glimpse).
304 <para> <literal>accept_port</literal></para>
305 <para> <literal>(988)</literal></para>
308 <para>Port number on which the acceptor should listen for
309 connection requests. All nodes in a site configuration that
310 require an acceptor must use the same port.</para>
315 <para> <literal>accept_backlog</literal></para>
316 <para> <literal>(127)</literal></para>
319 <para>Maximum length that the queue of pending connections may
320 grow to (see listen(2)).</para>
325 <para> <literal>accept_timeout</literal></para>
326 <para> <literal>(5, W)</literal></para>
329 <para>Maximum time in seconds the acceptor is allowed to block
330 while communicating with a peer.</para>
335 <para> <literal>accept_proto_version</literal></para>
338 <para>Version of the acceptor protocol that should be used by
339 outgoing connection requests. It defaults to the most recent
340 acceptor protocol version, but it may be set to the previous
341 version to allow the node to initiate connections with nodes
342 that only understand that version of the acceptor protocol.
343 The acceptor can, with some restrictions, handle either
344 version (that is, it can accept connections from both
345 'old' and 'new' peers). For the current
346 version of the acceptor protocol (version 1), the acceptor
347 is compatible with old peers if it is only required by a
348 single local network.</para>
356 <title><indexterm><primary>configuring</primary><secondary>network</secondary><tertiary>rnet_htable_size</tertiary></indexterm>
357 <literal>rnet_htable_size</literal></title>
<para><literal>rnet_htable_size</literal> is an integer that indicates how many remote networks the internal LNet hash table is configured to handle. <literal>rnet_htable_size</literal> is used for optimizing the hash table size and does not put a limit on how many remote networks you can have. The default hash table size, used when this parameter is not specified, is 128.</para>
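<para>As a sketch, a site that routes to several hundred remote networks might raise the value as shown below. The figure 512 is an arbitrary illustration, not a recommendation:</para>
<screen>options lnet rnet_htable_size=512</screen>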
361 <section remap="h3" xml:id="section_ngq_qhy_zl">
363 <primary>configuring</primary>
364 <secondary>network</secondary>
365 <tertiary>SOCKLND</tertiary>
367 <literal>SOCKLND</literal> Kernel TCP/IP LND</title>
368 <para>The <literal>SOCKLND</literal> kernel TCP/IP LND (<literal>socklnd</literal>) is
369 connection-based and uses the acceptor to establish communications via sockets with its
371 <para>It supports multiple instances and load balances dynamically over multiple interfaces.
If no interfaces are specified by the <literal>ip2nets</literal> or <literal>networks</literal> module
373 parameter, all non-loopback IP interfaces are used. The address-within-network is determined
374 by the address of the first IP interface an instance of the <literal>socklnd</literal>
376 <para>Consider a node on the 'edge' of an InfiniBand network,
377 with a low-bandwidth management Ethernet (<literal>eth0</literal>), IP
378 over IB configured (<literal>ipoib0</literal>), and a pair of GigE NICs
379 (<literal>eth1</literal>,<literal>eth2</literal>) providing off-cluster
380 connectivity. This node should be configured with '
381 <literal>networks=o2ib,tcp(eth1,eth2)</literal>' to ensure that the
382 <literal>socklnd</literal> ignores the management Ethernet and IPoIB.
384 <informaltable frame="all">
386 <colspec colname="c1" colwidth="50*"/>
387 <colspec colname="c2" colwidth="50*"/>
391 <para><emphasis role="bold">Variable</emphasis></para>
394 <para><emphasis role="bold">Description</emphasis></para>
402 <literal>timeout</literal></para>
404 <literal>(50,W)</literal></para>
407 <para>Time (in seconds) that communications may be stalled before the LND completes
408 them with failure.</para>
414 <literal>nconnds</literal></para>
416 <literal>(4)</literal></para>
419 <para>Sets the number of connection daemons.</para>
425 <literal>min_reconnectms</literal></para>
427 <literal>(1000,W)</literal></para>
<para>Minimum connection retry interval (in milliseconds). After a failed connection
attempt, this is the time that must elapse before the first retry. As connection
attempts fail, this time is doubled on each successive retry up to a maximum of
'<literal>max_reconnectms</literal>'.</para>
439 <literal>max_reconnectms</literal></para>
441 <literal>(6000,W)</literal></para>
444 <para>Maximum connection retry interval (in milliseconds).</para>
450 <literal>eager_ack</literal></para>
452 <literal>(0 on linux,</literal></para>
454 <literal>1 on darwin,W)</literal></para>
457 <para>Boolean that determines whether the <literal>socklnd</literal> should attempt
458 to flush sends on message boundaries.</para>
464 <literal>typed_conns</literal></para>
466 <literal>(1,Wc)</literal></para>
469 <para>Boolean that determines whether the <literal>socklnd</literal> should use
470 different sockets for different types of messages. When clear, all communication
471 with a particular peer takes place on the same socket. Otherwise, separate sockets
472 are used for bulk sends, bulk receives and everything else.</para>
478 <literal>min_bulk</literal></para>
480 <literal>(1024,W)</literal></para>
483 <para>Determines when a message is considered "bulk".</para>
489 <literal>tx_buffer_size, rx_buffer_size</literal></para>
491 <literal>(8388608,Wc)</literal></para>
<para>Socket buffer sizes. Setting this option to zero (0) allows the system to
auto-tune buffer sizes.</para>
497 <para>Be very careful changing this value as improper sizing can harm
505 <literal>nagle</literal></para>
507 <literal>(0,Wc)</literal></para>
510 <para>Boolean that determines if <literal>nagle</literal> should be enabled. It
511 should never be set in production systems.</para>
517 <literal>keepalive_idle</literal></para>
519 <literal>(30,Wc)</literal></para>
522 <para>Time (in seconds) that a socket can remain idle before a keepalive probe is
523 sent. Setting this value to zero (0) disables keepalives.</para>
529 <literal>keepalive_intvl</literal></para>
531 <literal>(2,Wc)</literal></para>
534 <para>Time (in seconds) to repeat unanswered keepalive probes. Setting this value to
535 zero (0) disables keepalives.</para>
541 <literal>keepalive_count</literal></para>
543 <literal>(10,Wc)</literal></para>
546 <para>Number of unanswered keepalive probes before pronouncing socket (hence peer)
553 <literal>enable_irq_affinity</literal></para>
555 <literal>(0,Wc)</literal></para>
558 <para>Boolean that determines whether to enable IRQ affinity. The default is zero
<para>When set, <literal>socklnd</literal> attempts to maximize performance by
handling device interrupts and data movement for particular (hardware) interfaces
on particular CPUs. This option is not available on all platforms. It requires an
SMP system and produces the best performance with multiple NICs. Systems with
multiple CPUs and a single NIC may see an increase in performance with this
parameter disabled.</para>
571 <literal>zc_min_frag</literal></para>
573 <literal>(2048,W)</literal></para>
<para>Determines the minimum message fragment that should be considered for
zero-copy sends. Increasing it above the platform's <literal>PAGE_SIZE</literal>
disables all zero-copy sends. This option is not available on all