1 <?xml version='1.0' encoding='UTF-8'?><chapter xmlns="http://docbook.org/ns/docbook" xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US" xml:id="lnetmr" condition='l210'>
2 <title xml:id="lnetmr.title">LNet Software Multi-Rail</title>
3 <para>This chapter describes LNet Software Multi-Rail configuration and
7 <para><xref linkend="dbdoclet.mroverview"/></para>
8 <para><xref linkend="dbdoclet.mrconfiguring"/></para>
9 <para><xref linkend="dbdoclet.mrrouting"/></para>
12 <section xml:id="dbdoclet.mroverview">
13 <title><indexterm><primary>MR</primary><secondary>overview</secondary>
14 </indexterm>Multi-Rail Overview</title>
15 <para>In computer networking, multi-rail is an arrangement in which a node
16 uses two or more network interfaces to a single network to achieve
17 increased throughput. Multi-rail can also describe a node that has one or
18 more interfaces to multiple networks, possibly of different kinds, such
19 as Ethernet, InfiniBand, and Intel® Omni-Path. For Lustre clients,
20 multi-rail generally presents the combined network capabilities as a single
21 LNet network. Multi-rail-capable peer nodes are established during
22 configuration, as are user-defined interface-selection policies.</para>
23 <para>The following link contains a detailed high-level design for the
25 <link xl:href="http://wiki.lustre.org/images/b/bb/Multi-Rail_High-Level_Design_20150119.pdf">
26 Multi-Rail High-Level Design</link></para>
28 <section xml:id="dbdoclet.mrconfiguring">
29 <title><indexterm><primary>MR</primary><secondary>configuring</secondary>
30 </indexterm>Configuring Multi-Rail</title>
31 <para>Every node using multi-rail networking needs to be properly
32 configured. Multi-rail uses <literal>lnetctl</literal> and the LNet
33 Configuration Library for configuration. Configuring multi-rail for a
34 given node involves two tasks:</para>
36 <listitem><para>Configuring multiple network interfaces present on the
37 local node.</para></listitem>
38 <listitem><para>Adding remote peers that are multi-rail capable (that is,
39 peers connected to one or more common networks with at least two interfaces).
42 <para>This section is a supplement to
43 <xref linkend="dbdoclet.lnetaddshowdelete" /> and contains further
44 examples for Multi-Rail configurations.</para>
45 <section xml:id="dbdoclet.addinterfaces">
46 <title><indexterm><primary>MR</primary>
47 <secondary>multipleinterfaces</secondary>
48 </indexterm>Configure Multiple Interfaces on the Local Node</title>
49 <para>Example <literal>lnetctl net add</literal> command with multiple
50 interfaces in a Multi-Rail configuration:</para>
51 <screen>lnetctl net add --net tcp --if eth0,eth1</screen>
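<para>When the same configuration has to be applied to many nodes, it can help to wrap this command in a small script. The following is a minimal Bash sketch and is not part of the Lustre tools. The names <literal>run</literal> and <literal>mr_net_add</literal> and the <literal>DRY_RUN</literal> variable are hypothetical. The script joins the interface names with commas, as <literal>--if</literal> expects, and only prints the command when <literal>DRY_RUN=1</literal>.</para>

```shell
#!/bin/bash
# Minimal sketch: configure one LNet network on several local interfaces.
# run, mr_net_add and DRY_RUN are hypothetical names used only here.

# Execute a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# mr_net_add NET IFACE [IFACE...]
# Example: mr_net_add tcp eth0 eth1
mr_net_add() {
    [ "$#" -ge 2 ] || return 2        # need a net and at least one interface
    local net=$1 ifaces
    shift
    ifaces=$(IFS=,; echo "$*")        # "eth0 eth1" becomes "eth0,eth1"
    run lnetctl net add --net "$net" --if "$ifaces"
}
```

<para>For example, <literal>DRY_RUN=1 mr_net_add tcp eth0 eth1</literal> prints the <literal>lnetctl net add --net tcp --if eth0,eth1</literal> command shown above. Without <literal>DRY_RUN</literal>, the script executes the command.</para>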
52 <para>Example YAML output of <literal>lnetctl net show -v</literal>:</para>
53 <screen>lnetctl net show -v
66 peer_buffer_credits: 0
74 - nid: 192.168.122.10@tcp
85 peer_buffer_credits: 0
91 - nid: 192.168.122.11@tcp
102 peer_buffer_credits: 0
109 <section xml:id="dbdoclet.deleteinterfaces">
110 <title><indexterm><primary>MR</primary>
111 <secondary>deleteinterfaces</secondary>
112 </indexterm>Deleting Network Interfaces</title>
113 <para>Example delete with <literal>lnetctl net del</literal>:</para>
114 <para>Assuming the network configuration is as shown above in the
115 <literal>lnetctl net show -v</literal> output of the previous section, the
116 <literal>eth0</literal> interface can be removed from the <literal>tcp</literal> net with the following command:</para>
117 <screen>lnetctl net del --net tcp --if eth0</screen>
118 <para>The resulting net information would look like this:</para>
119 <screen>lnetctl net show -v
132 peer_buffer_credits: 0
137 CPT: "[0,1,2,3]"</screen>
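<para>The same deletion can also be driven by a YAML file passed to <literal>lnetctl import --del</literal>. As a minimal sketch, the hypothetical Bash function below prints such a document for one local NI. It follows the layout of the <literal>lnetctl net show</literal> output above. The top-level <literal>net:</literal> key is an assumption, so compare the result with <literal>lnetctl export</literal> on your system before relying on it.</para>

```shell
#!/bin/bash
# Minimal sketch: emit a YAML document that deletes one local NI.
# net_del_yaml is a hypothetical helper; the top-level "net:" key is an
# assumption based on the layout of "lnetctl net show" output.

# net_del_yaml NET NID IFACE
# Example: net_del_yaml tcp 192.168.122.10@tcp eth0
net_del_yaml() {
    [ "$#" -eq 3 ] || return 2        # exactly: net, NID, interface
    printf 'net:\n'
    printf '    - net type: %s\n' "$1"
    printf '      local NI(s):\n'
    printf '        - nid: %s\n' "$2"
    printf '          interfaces:\n'
    printf '              0: %s\n' "$3"
}
```

<para>For example, <literal>net_del_yaml tcp 192.168.122.10@tcp eth0 | lnetctl import --del</literal> passes the generated document to <literal>lnetctl</literal> on standard input. This is the same way the <literal>lnetctl import --del</literal> example for peers, later in this chapter, reads its file.</para>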
138 <para>The syntax of a YAML file to perform a delete would be:</para>
139 <screen>- net type: tcp
141 - nid: 192.168.122.10@tcp
145 <section xml:id="dbdoclet.addremotepeers">
146 <title><indexterm><primary>MR</primary>
147 <secondary>addremotepeers</secondary>
148 </indexterm>Adding Remote Peers that are Multi-Rail Capable</title>
149 <para>The following example <literal>lnetctl peer add</literal>
150 command adds a peer with two NIDs, with
151 <literal>192.168.122.30@tcp</literal> as the primary NID:</para>
152 <screen>lnetctl peer add --prim_nid 192.168.122.30@tcp --nid 192.168.122.30@tcp,192.168.122.31@tcp
154 <para>The resulting <literal>lnetctl peer show</literal> output would be:
155 <screen>lnetctl peer show -v
157 - primary nid: 192.168.122.30@tcp
160 - nid: 192.168.122.30@tcp
163 available_tx_credits: 8
166 available_rtr_credits: 8
173 - nid: 192.168.122.31@tcp
176 available_tx_credits: 8
179 available_rtr_credits: 8
185 drop_count: 0</screen>
187 <para>The following is an example YAML file for adding a peer:</para>
190 - primary nid: 192.168.122.30@tcp
193 - nid: 192.168.122.31@tcp</screen>
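<para>A YAML file like the one above can also be generated. The hypothetical Bash function below is a minimal sketch that prints a peer document. It uses the same keys as the <literal>lnetctl peer show</literal> output shown earlier (<literal>primary nid</literal>, <literal>Multi-Rail</literal>, <literal>peer ni</literal>). Verify the layout against your <literal>lnetctl</literal> version.</para>

```shell
#!/bin/bash
# Minimal sketch: print a YAML peer document for "lnetctl import".
# peer_yaml is a hypothetical helper; the keys mirror "lnetctl peer show".

# peer_yaml PRIMARY_NID NID [NID...]
# Example: peer_yaml 192.168.122.30@tcp 192.168.122.31@tcp
peer_yaml() {
    [ "$#" -ge 2 ] || return 2        # primary NID plus at least one peer NI
    local prim=$1 nid
    shift
    printf 'peer:\n'
    printf '    - primary nid: %s\n' "$prim"
    printf '      Multi-Rail: True\n'
    printf '      peer ni:\n'
    for nid in "$@"; do
        printf '        - nid: %s\n' "$nid"
    done
}
```

<para>For example, <literal>peer_yaml 192.168.122.30@tcp 192.168.122.31@tcp | lnetctl import</literal> would add the peer. Piping the same output to <literal>lnetctl import --del</literal> would remove the listed NIDs instead, following the <literal>--del</literal> usage shown in the next section.</para>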
195 <section xml:id="dbdoclet.deleteremotepeers">
196 <title><indexterm><primary>MR</primary>
197 <secondary>deleteremotepeers</secondary>
198 </indexterm>Deleting Remote Peers</title>
199 <para>Example of deleting a single NID of a peer (192.168.122.31@tcp):
201 <screen>lnetctl peer del --prim_nid 192.168.122.30@tcp --nid 192.168.122.31@tcp</screen>
202 <para>Example of deleting the entire peer:</para>
203 <screen>lnetctl peer del --prim_nid 192.168.122.30@tcp</screen>
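<para>When several NIDs have to be removed, the two forms above can be scripted. This minimal Bash sketch uses hypothetical names (<literal>run</literal>, <literal>mr_peer_del</literal>, <literal>DRY_RUN</literal>) and only the command syntax shown above. With no NIDs, it deletes the whole peer. Otherwise, it issues one <literal>lnetctl peer del</literal> per NID.</para>

```shell
#!/bin/bash
# Minimal sketch: delete a whole MR peer, or only some of its NIDs.
# run, mr_peer_del and DRY_RUN are hypothetical names used only here.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"                     # dry run: print instead of execute
    else
        "$@"
    fi
}

# mr_peer_del PRIMARY_NID [NID...]
# Example: mr_peer_del 192.168.122.30@tcp 192.168.122.31@tcp
mr_peer_del() {
    [ "$#" -ge 1 ] || return 2
    local prim=$1 nid
    shift
    if [ "$#" -eq 0 ]; then
        # No NIDs given: remove the entire peer.
        run lnetctl peer del --prim_nid "$prim"
        return
    fi
    for nid in "$@"; do
        # One call per NID, matching the single-NID form shown above.
        run lnetctl peer del --prim_nid "$prim" --nid "$nid" || return
    done
}
```

<para>Deleting one NID per call keeps to the single-NID form documented above, rather than assuming that <literal>--nid</literal> also accepts a list when deleting.</para>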
204 <para>Example of deleting a peer via YAML:</para>
205 <screen>Assuming the following peer configuration:
207 - primary nid: 192.168.122.30@tcp
210 - nid: 192.168.122.30@tcp
212 - nid: 192.168.122.31@tcp
214 - nid: 192.168.122.32@tcp
217 You can delete 192.168.122.32@tcp as follows:
221 - primary nid: 192.168.122.30@tcp
224 - nid: 192.168.122.32@tcp
226 % lnetctl import --del &lt; delPeer.yaml</screen>
229 <section xml:id="dbdoclet.mrrouting">
230 <title><indexterm><primary>MR</primary>
231 <secondary>mrrouting</secondary>
232 </indexterm>Notes on routing with Multi-Rail</title>
233 <para>Multi-Rail configuration can be applied on the router to aggregate
234 the performance of its interfaces.</para>
235 <section xml:id="dbdoclet.mrroutingex">
236 <title><indexterm><primary>MR</primary>
237 <secondary>mrrouting</secondary>
238 <tertiary>routingex</tertiary>
239 </indexterm>Multi-Rail Cluster Example</title>
240 <para>The example below outlines a simple system in which all the Lustre
241 nodes are MR-capable. Each node in the cluster has two interfaces.</para>
242 <figure xml:id="lnetmultirail.fig.routingdiagram">
243 <title>Routing Configuration with Multi-Rail</title>
246 <imagedata scalefit="1" width="100%"
247 fileref="./figures/MR_RoutingConfig.png" />
250 <phrase>Routing Configuration with Multi-Rail</phrase>
254 <para>The routers can aggregate the interfaces on each side of the network
255 by configuring them on the appropriate network.</para>
256 <para>An example configuration:</para>
258 lnetctl net add --net o2ib0 --if ib0,ib1
259 lnetctl net add --net o2ib1 --if ib2,ib3
260 lnetctl peer add --nid &lt;peer1-nidA&gt;@o2ib,&lt;peer1-nidB&gt;@o2ib,...
261 lnetctl peer add --nid &lt;peer2-nidA&gt;@o2ib1,&lt;peer2-nidB&gt;@o2ib1,...
262 lnetctl set routing 1
265 lnetctl net add --net o2ib0 --if ib0,ib1
266 lnetctl route add --net o2ib1 --gateway &lt;rtrX-nidA&gt;@o2ib
267 lnetctl peer add --nid &lt;rtrX-nidA&gt;@o2ib,&lt;rtrX-nidB&gt;@o2ib
270 lnetctl net add --net o2ib1 --if ib0,ib1
271 lnetctl route add --net o2ib0 --gateway &lt;rtrX-nidA&gt;@o2ib1
272 lnetctl peer add --nid &lt;rtrX-nidA&gt;@o2ib1,&lt;rtrX-nidB&gt;@o2ib1</screen>
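<para>The per-role commands above lend themselves to a small script. The following minimal Bash sketch uses hypothetical function names and the same hypothetical <literal>DRY_RUN</literal> convention. <literal>mr_router_nets</literal> mirrors the network and routing lines of the router block. <literal>mr_add_router</literal> mirrors the route and peer lines of the client and server blocks: it uses the router's first NID as the gateway and registers all of the router's NIDs as a single peer.</para>

```shell
#!/bin/bash
# Minimal sketch of the routed Multi-Rail example. All function names and
# DRY_RUN are hypothetical; the lnetctl commands mirror the example above.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"                     # dry run: print instead of execute
    else
        "$@"
    fi
}

# Router: one MR network on each side, then enable forwarding.
# mr_router_nets NET_A IFACES_A NET_B IFACES_B
# Example: mr_router_nets o2ib0 ib0,ib1 o2ib1 ib2,ib3
mr_router_nets() {
    [ "$#" -eq 4 ] || return 2
    run lnetctl net add --net "$1" --if "$2" || return
    run lnetctl net add --net "$3" --if "$4" || return
    run lnetctl set routing 1
}

# Client or server: a single route through one router, plus that router
# registered as one MR peer so that all of its interfaces can be used.
# mr_add_router REMOTE_NET GATEWAY_NID [OTHER_ROUTER_NID...]
mr_add_router() {
    [ "$#" -ge 2 ] || return 2
    local remote=$1
    shift
    run lnetctl route add --net "$remote" --gateway "$1" || return
    run lnetctl peer add --nid "$(IFS=,; echo "$*")"
}
```

<para>On a client, for example, one would run <literal>lnetctl net add --net o2ib0 --if ib0,ib1</literal> once and then call <literal>mr_add_router o2ib1 10.0.0.1@o2ib 10.0.0.2@o2ib</literal> once per router. The two NIDs are hypothetical addresses of one router's interfaces. The router-side <literal>lnetctl peer add</literal> lines for the clients and servers are left out of the sketch.</para>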
273 <para>In the above configuration, the clients and the servers are
274 configured with only one route entry per router. This works because the
275 routers are MR-capable. Because the routers are added as peers with multiple
276 interfaces on the clients and the servers, the MR algorithm ensures that
277 both interfaces of a router are used when sending to that router.
279 <para>However, as of the Lustre 2.10 release, LNet Resiliency is still
280 under development, and a single interface failure will still cause the
281 entire router to go down.</para>
283 <section xml:id="dbdoclet.mrroutingresiliency">
284 <title><indexterm><primary>MR</primary>
285 <secondary>mrrouting</secondary>
286 <tertiary>routingresiliency</tertiary>
287 </indexterm>Utilizing Router Resiliency</title>
288 <para>Currently, LNet provides a mechanism to monitor each route entry.
289 LNet pings each gateway identified in the route entry at a regular,
290 configurable interval to ensure that it is alive. If sending over a
291 specific route fails, or if the router pinger determines that the gateway
292 is down, the route is marked as down and is not used. The gateway is
293 subsequently pinged at regular, configurable intervals to determine when
294 it becomes alive again.</para>
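<para>In Lustre 2.10, these intervals are set through <literal>lnet</literal> module parameters such as <literal>live_router_check_interval</literal>, <literal>dead_router_check_interval</literal> and <literal>router_ping_timeout</literal>, with values in seconds. As a minimal sketch, the hypothetical Bash function below writes them to a modprobe configuration file. The file path and the values in the example are illustrative assumptions, not recommended settings.</para>

```shell
#!/bin/bash
# Minimal sketch: persist the router pinger intervals as lnet module options.
# write_router_check_conf is hypothetical; the parameter names are lnet
# module parameters, and all values are in seconds.

# write_router_check_conf FILE LIVE_SECS DEAD_SECS PING_TIMEOUT_SECS
write_router_check_conf() {
    [ "$#" -eq 4 ] || return 2
    local v
    for v in "$2" "$3" "$4"; do
        case $v in
            ''|*[!0-9]*) return 2 ;;  # reject empty or non-numeric values
        esac
    done
    printf 'options lnet live_router_check_interval=%s dead_router_check_interval=%s router_ping_timeout=%s\n' \
        "$2" "$3" "$4" > "$1"
}
```

<para>For example, <literal>write_router_check_conf /etc/modprobe.d/lnet-router.conf 60 60 50</literal> writes a single <literal>options lnet</literal> line to a hypothetical file. Module options such as these are read only when the <literal>lnet</literal> module is loaded.</para>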
295 <para>This mechanism can be combined with the MR feature in Lustre 2.10 to
296 add router resiliency to the configuration.</para>
298 lnetctl net add --net o2ib0 --if ib0,ib1
299 lnetctl net add --net o2ib1 --if ib2,ib3
300 lnetctl peer add --nid &lt;peer1-nidA&gt;@o2ib,&lt;peer1-nidB&gt;@o2ib,...
301 lnetctl peer add --nid &lt;peer2-nidA&gt;@o2ib1,&lt;peer2-nidB&gt;@o2ib1,...
302 lnetctl set routing 1
305 lnetctl net add --net o2ib0 --if ib0,ib1
306 lnetctl route add --net o2ib1 --gateway &lt;rtrX-nidA&gt;@o2ib
307 lnetctl route add --net o2ib1 --gateway &lt;rtrX-nidB&gt;@o2ib
310 lnetctl net add --net o2ib1 --if ib0,ib1
311 lnetctl route add --net o2ib0 --gateway &lt;rtrX-nidA&gt;@o2ib1
312 lnetctl route add --net o2ib0 --gateway &lt;rtrX-nidB&gt;@o2ib1</screen>
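<para>On the clients and servers, the resilient variant differs from the earlier example in two ways: it adds one route per router interface, and it adds no multi-interface peer entry for the router. The following is a minimal Bash sketch, again with hypothetical names and the hypothetical <literal>DRY_RUN</literal> convention.</para>

```shell
#!/bin/bash
# Minimal sketch: one route per router interface, so that each interface is
# pinged, and marked down, as a separate gateway. No "lnetctl peer add" is
# issued for the router on purpose: it must not be seen as one MR peer.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"                     # dry run: print instead of execute
    else
        "$@"
    fi
}

# mr_add_resilient_routes REMOTE_NET GATEWAY_NID [GATEWAY_NID...]
# Example (hypothetical NIDs): mr_add_resilient_routes o2ib1 10.0.0.1@o2ib 10.0.0.2@o2ib
mr_add_resilient_routes() {
    [ "$#" -ge 2 ] || return 2
    local remote=$1 gw
    shift
    for gw in "$@"; do
        run lnetctl route add --net "$remote" --gateway "$gw" || return
    done
}
```

<para>The router side is unchanged. It still uses <literal>lnetctl net add</literal> for both networks, <literal>lnetctl set routing 1</literal>, and MR peer entries for the clients and servers.</para>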
313 <para>There are a few things to note in the above configuration:</para>
316 <para>The clients and the servers are now configured with two
317 routes; each route's gateway is one of the interfaces of the
318 router. The clients and servers view each interface of the
319 same router as a separate gateway and monitor each one as
320 described above.</para>
323 <para>The clients and the servers are not configured to view the
324 routers as MR-capable. This is important because we want to treat
325 each interface as a separate peer rather than as different
326 interfaces of the same peer.</para>
329 <para>The routers are configured to view the peers as MR-capable.
330 This is an oddity in the configuration, but it is currently required
331 to allow the routers to load-balance traffic evenly across their
332 interfaces.</para>
336 <section xml:id="dbdoclet.mrroutingmixed">
337 <title><indexterm><primary>MR</primary>
338 <secondary>mrrouting</secondary>
339 <tertiary>routingmixed</tertiary>
340 </indexterm>Mixed Multi-Rail/Non-Multi-Rail Cluster</title>
341 <para>The above principles can be applied to a mixed MR/non-MR cluster.
342 For example, the same configuration shown above can be applied if the
343 clients and the servers are non-MR while the routers are MR-capable.
344 This appears to be a common cluster upgrade scenario.</para>