From: Joseph Gmitter Date: Fri, 16 Jun 2017 15:23:04 +0000 (-0400) Subject: LUDOC-367 lnet: Initial MR Documentation X-Git-Tag: 2.10.0~1 X-Git-Url: https://git.whamcloud.com/?a=commitdiff_plain;h=ef8bb7d5d8377af1d74e2db0e33ed5c9984b3f6f;p=doc%2Fmanual.git LUDOC-367 lnet: Initial MR Documentation Initial documentation for the LNet multi-rail feature added in LU-7734. lnetctl commands have been updated in section 9 where appropriate to reflect MR changes and syntax. A new section has also been created for current and future multi-rail specific examples and content. Signed-off-by: Joseph Gmitter Change-Id: Ie1d5a174f3fd02e39e011d349828654dfa8d540c Reviewed-on: https://review.whamcloud.com/27687 Tested-by: Jenkins Reviewed-by: Sonia Sharma Reviewed-by: Amir Shehata --- diff --git a/ConfiguringLNet.xml b/ConfiguringLNet.xml index fa461e1..da67cbb 100755 --- a/ConfiguringLNet.xml +++ b/ConfiguringLNet.xml @@ -99,13 +99,14 @@ unconfigure LNet. lnetctl lnet unconfigure -
- <indexterm> - <primary>LNet</primary> - <secondary>cli</secondary> - </indexterm>Adding, Deleting and Showing networks - Networks can be added and deleted after the LNet kernel module - is loaded. +
+ <indexterm><primary>LNet</primary> + <secondary>cli</secondary></indexterm>Adding, Deleting and Showing + Networks + Networks can be added, deleted, or shown after the LNet kernel + module is loaded. + The lnetctl net add + command is used to add networks: lnetctl net add: add a network --net: net name (ex tcp0) --if: physical interface (ex eth0) @@ -119,15 +120,39 @@ Example: lnetctl net add --net tcp2 --if eth0 --peer_timeout 180 --peer_credits 8 - Networks can be deleted as shown below: + With the addition of Software based Multi-Rail + in Lustre 2.10, the following should be noted: + + --net: no longer needs to be unique since multiple + interfaces can be added to the same network. + --if: The same interface per network can be added + only once; however, more than one interface can now be specified + (separated by a comma) for a node. For example: eth0,eth1,eth2. + + + For examples on adding multiple interfaces via + lnetctl net add and/or YAML, please see + + + + Networks can be deleted with the + lnetctl net del + command: net del: delete a network --net: net name (ex tcp0) + --if: physical interface (e.g. eth0) Example: lnetctl net del --net tcp2 - All or a subset of the configured networks can be shown. The - output can be non-verbose - or verbose. + In a Software Multi-Rail configuration, + specifying only the --net argument will delete the + entire network and all interfaces under it. The new + --if switch should be used in conjunction with + --net to specify deletion of a specific interface. + + All or a subset of the configured networks can be shown with the + lnetctl net show + command. The output can be non-verbose or verbose. net show: show networks --net: net name (ex tcp0) to filter on --verbose: display detailed output per network @@ -159,6 +184,100 @@ net: peer_buffer_credits: 0 credits: 256
+
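The net add options above can also be driven from a YAML file. The sketch below is illustrative only: the file name is hypothetical, and the schema is inferred from the lnetctl net show output and the delete-file syntax shown elsewhere in this manual, so the key names should be checked against lnetctl export on the target release before use.

```yaml
# Hypothetical file net.yaml, applied with: lnetctl import < net.yaml
# Mirrors the CLI example above:
#   lnetctl net add --net tcp2 --if eth0 --peer_timeout 180 --peer_credits 8
# Key names are inferred from "lnetctl net show -v" output; verify them
# with "lnetctl export" on your release.
net:
    - net type: tcp2
      local NI(s):
        - interfaces:
              0: eth0
          tunables:
              peer_timeout: 180
              peer_credits: 8
```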
+ <indexterm> + <primary>LNet</primary> + <secondary>cli</secondary> + </indexterm>Adding, Deleting and Showing Peers + The lnetctl peer add + command is used to add a remote peer to a software + multi-rail configuration. + When configuring peers, use the --prim_nid + option to specify the key or primary nid of the peer node. Then + follow that with the --nid option to specify a set + of comma separated NIDs. + peer add: add a peer + --prim_nid: primary NID of the peer + --nid: comma separated list of peer nids (e.g. 10.1.1.2@tcp0) + --non_mr: if specified this interface is created as a non multi-rail + capable peer. Only one NID can be specified in this case. + For example: + + lnetctl peer add --prim_nid 10.10.10.2@tcp --nid 10.10.3.3@tcp1,10.4.4.5@tcp2 + + The --prim_nid (primary nid for the peer + node) can go unspecified. In this case, the first listed NID in the + --nid option becomes the primary nid of the peer. + For example: + + lnetctl peer add --nid 10.10.10.2@tcp,10.10.3.3@tcp1,10.4.4.5@tcp2 + YAML can also be used to configure peers: + peer: + - primary nid: <key or primary nid> + Multi-Rail: True + peer ni: + - nid: <nid 1> + - nid: <nid 2> + - nid: <nid n> + As with all other commands, the result of the + lnetctl peer show command can be used to gather + information to aid in configuring or deleting a peer: + lnetctl peer show -v + Example output from the lnetctl peer show + command: + peer: + - primary nid: 192.168.122.218@tcp + Multi-Rail: True + peer ni: + - nid: 192.168.122.218@tcp + state: NA + max_ni_tx_credits: 8 + available_tx_credits: 8 + available_rtr_credits: 8 + min_rtr_credits: -1 + tx_q_num_of_buf: 0 + send_count: 6819 + recv_count: 6264 + drop_count: 0 + refcount: 1 + - nid: 192.168.122.78@tcp + state: NA + max_ni_tx_credits: 8 + available_tx_credits: 8 + available_rtr_credits: 8 + min_rtr_credits: -1 + tx_q_num_of_buf: 0 + send_count: 7061 + recv_count: 6273 + drop_count: 0 + refcount: 1 + - nid: 192.168.122.96@tcp + state: NA +
max_ni_tx_credits: 8 + available_tx_credits: 8 + available_rtr_credits: 8 + min_rtr_credits: -1 + tx_q_num_of_buf: 0 + send_count: 6939 + recv_count: 6286 + drop_count: 0 + refcount: 1 + Use the following lnetctl command to delete a + peer: + peer del: delete a peer + --prim_nid: Primary NID of the peer + --nid: comma separated list of peer nids (e.g. 10.1.1.2@tcp0) + prim_nid should always be specified. The + prim_nid identifies the peer. If the + prim_nid is the only one specified, then the entire + peer is deleted. + Example of deleting a single nid of a peer (10.10.10.3@tcp): + + lnetctl peer del --prim_nid 10.10.10.2@tcp --nid 10.10.10.3@tcp + Example of deleting the entire peer: + lnetctl peer del --prim_nid 10.10.10.2@tcp + +
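To make the generic peer schema above concrete, here is a sketch of a peer file using the example NIDs from the lnetctl peer add command in this section. The file name is hypothetical; as with the other YAML fragments in this manual, it would be applied with lnetctl import.

```yaml
# Hypothetical file peers.yaml, applied with: lnetctl import < peers.yaml
# Instantiates the generic peer schema shown above with the NIDs used in
# the "lnetctl peer add" example.
peer:
    - primary nid: 10.10.10.2@tcp
      Multi-Rail: True
      peer ni:
        - nid: 10.10.3.3@tcp1
        - nid: 10.4.4.5@tcp2
```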
<indexterm> <primary>LNet</primary> <secondary>cli</secondary> </indexterm> @@ -668,6 +787,38 @@ tcp0 192.168.0.*; o2ib0 132.6.[1-3].[2-8/2]"'</screen> stepped by 2; that is 2,4,6,8. Thus, the clients at <literal>132.6.3.5</literal> will not find a matching o2ib network.</para> + <note condition='l2A'> + <para>Multi-rail deprecates the kernel parsing of ip2nets. ip2nets + patterns are matched in user space and translated into network interfaces + to be added to the system.</para> + <para>The first interface that matches the IP pattern will be used when + adding a network interface.</para> + <para>If an interface is explicitly specified as well as a pattern, the + interface matched using the IP pattern will be sanitized against the + explicitly-defined interface.</para> + <para>For example, given <literal>tcp(eth0) 192.168.*.3</literal>, where the + system has <literal>eth0 == 192.158.19.3</literal> and + <literal>eth1 == 192.168.3.3</literal>, the configuration will fail, + because the IP pattern matches <literal>eth1</literal> while the + configuration explicitly specifies <literal>eth0</literal>.</para> + <para>A clear warning will be displayed if an inconsistent configuration is + encountered.</para> + <para>You could use the following command to configure ip2nets:</para> + <screen>lnetctl import < ip2nets.yaml</screen> + <para>For example:</para> + <screen>ip2nets: + - net-spec: tcp1 + interfaces: + 0: eth0 + 1: eth1 + ip-range: + 0: 192.168.*.19 + 1: 192.168.100.105 + - net-spec: tcp2 + interfaces: + 0: eth2 + ip-range: + 0: 192.168.*.*</screen> + </note> </section> <section xml:id="dbdoclet.50438216_71227"> <title><indexterm><primary>LNet</primary><secondary>routes</secondary></indexterm>Setting diff --git a/III_LustreAdministration.xml b/III_LustreAdministration.xml index 996780c..27f44df 100644 --- a/III_LustreAdministration.xml +++ b/III_LustreAdministration.xml @@ -86,6 +86,7 @@ <xi:include href="LustreOperations.xml" xmlns:xi="http://www.w3.org/2001/XInclude" /> <xi:include href="LustreMaintenance.xml" xmlns:xi="http://www.w3.org/2001/XInclude" />
<xi:include href="ManagingLNet.xml" xmlns:xi="http://www.w3.org/2001/XInclude" /> + <xi:include href="LNetMultiRail.xml" xmlns:xi="http://www.w3.org/2001/XInclude" /> <xi:include href="UpgradingLustre.xml" xmlns:xi="http://www.w3.org/2001/XInclude" /> <xi:include href="BackupAndRestore.xml" xmlns:xi="http://www.w3.org/2001/XInclude" /> <xi:include href="ManagingStripingFreeSpace.xml" xmlns:xi="http://www.w3.org/2001/XInclude" /> diff --git a/LNetMultiRail.xml b/LNetMultiRail.xml new file mode 100644 index 0000000..f79e791 --- /dev/null +++ b/LNetMultiRail.xml @@ -0,0 +1,347 @@ +<?xml version='1.0' encoding='UTF-8'?><chapter xmlns="http://docbook.org/ns/docbook" xmlns:xl="http://www.w3.org/1999/xlink" version="5.0" xml:lang="en-US" xml:id="lnetmr" condition='l210'> + <title xml:id="lnetmr.title">LNet Software Multi-Rail + This chapter describes LNet Software Multi-Rail configuration and + administration. + + + + + + + +
+ <indexterm><primary>MR</primary><secondary>overview</secondary> + </indexterm>Multi-Rail Overview + In computer networking, multi-rail is an arrangement in which two or + more network interfaces to a single network on a computer node are employed + to achieve increased throughput. Multi-rail can also refer to a node with + one or more interfaces to multiple, even different kinds of networks, such + as Ethernet, Infiniband, and Intel® Omni-Path. For Lustre clients, + multi-rail generally presents the combined network capabilities as a single + LNet network. Peer nodes that are multi-rail capable are established during + configuration, as are user-defined interface selection policies. + The following link contains a detailed high-level design for the + feature: + + Multi-Rail High-Level Design
+
+ <indexterm><primary>MR</primary><secondary>configuring</secondary> + </indexterm>Configuring Multi-Rail + Every node using multi-rail networking needs to be properly + configured. Multi-rail uses lnetctl and the LNet + Configuration Library for configuration. Configuring multi-rail for a + given node involves two tasks: + + Configuring multiple network interfaces present on the + local node. + Adding remote peers that are multi-rail capable (that is, + connected to one or more common networks with at least two interfaces). + + + This section is a supplement to + and contains further + examples for Multi-Rail configurations.
+ <indexterm><primary>MR</primary> + <secondary>multipleinterfaces</secondary> + </indexterm>Configure Multiple Interfaces on the Local Node + Example lnetctl net add command with multiple + interfaces in a Multi-Rail configuration: + lnetctl net add --net tcp --if eth0,eth1 + Example lnetctl net show -v output (YAML): + lnetctl net show -v +net: + - net type: lo + local NI(s): + - nid: 0@lo + status: up + statistics: + send_count: 0 + recv_count: 0 + drop_count: 0 + tunables: + peer_timeout: 0 + peer_credits: 0 + peer_buffer_credits: 0 + credits: 0 + lnd tunables: + tcp bonding: 0 + dev cpt: 0 + CPT: "[0]" + - net type: tcp + local NI(s): + - nid: 192.168.122.10@tcp + status: up + interfaces: + 0: eth0 + statistics: + send_count: 0 + recv_count: 0 + drop_count: 0 + tunables: + peer_timeout: 180 + peer_credits: 8 + peer_buffer_credits: 0 + credits: 256 + lnd tunables: + tcp bonding: 0 + dev cpt: -1 + CPT: "[0]" + - nid: 192.168.122.11@tcp + status: up + interfaces: + 0: eth1 + statistics: + send_count: 0 + recv_count: 0 + drop_count: 0 + tunables: + peer_timeout: 180 + peer_credits: 8 + peer_buffer_credits: 0 + credits: 256 + lnd tunables: + tcp bonding: 0 + dev cpt: -1 + CPT: "[0]"
+
+ <indexterm><primary>MR</primary> + <secondary>deleteinterfaces</secondary> + </indexterm>Deleting Network Interfaces + Example delete with lnetctl net del: + Assuming the network configuration shown by the + lnetctl net show -v output in the previous section, we can + delete a net with the following command: + lnetctl net del --net tcp --if eth0 + The resultant net information would look like: + lnetctl net show -v +net: + - net type: lo + local NI(s): + - nid: 0@lo + status: up + statistics: + send_count: 0 + recv_count: 0 + drop_count: 0 + tunables: + peer_timeout: 0 + peer_credits: 0 + peer_buffer_credits: 0 + credits: 0 + lnd tunables: + tcp bonding: 0 + dev cpt: 0 + CPT: "[0,1,2,3]" + The syntax of a YAML file to perform a delete would be: + - net type: tcp + local NI(s): + - nid: 192.168.122.10@tcp + interfaces: + 0: eth0
+
+ <indexterm><primary>MR</primary> + <secondary>addremotepeers</secondary> + </indexterm>Adding Remote Peers that are Multi-Rail Capable + The following example lnetctl peer add + command adds a peer with 2 nids, with + 192.168.122.30@tcp being the primary nid: + lnetctl peer add --prim_nid 192.168.122.30@tcp --nid 192.168.122.30@tcp,192.168.122.31@tcp + + The resulting lnetctl peer show would be: + lnetctl peer show -v +peer: + - primary nid: 192.168.122.30@tcp + Multi-Rail: True + peer ni: + - nid: 192.168.122.30@tcp + state: NA + max_ni_tx_credits: 8 + available_tx_credits: 8 + min_tx_credits: 7 + tx_q_num_of_buf: 0 + available_rtr_credits: 8 + min_rtr_credits: 8 + refcount: 1 + statistics: + send_count: 2 + recv_count: 2 + drop_count: 0 + - nid: 192.168.122.31@tcp + state: NA + max_ni_tx_credits: 8 + available_tx_credits: 8 + min_tx_credits: 7 + tx_q_num_of_buf: 0 + available_rtr_credits: 8 + min_rtr_credits: 8 + refcount: 1 + statistics: + send_count: 1 + recv_count: 1 + drop_count: 0 + + The following is an example YAML file for adding a peer: + addPeer.yaml +peer: + - primary nid: 192.168.122.30@tcp + Multi-Rail: True + peer ni: + - nid: 192.168.122.31@tcp +
+
+ <indexterm><primary>MR</primary> + <secondary>deleteremotepeers</secondary> + </indexterm>Deleting Remote Peers + Example of deleting a single nid of a peer (192.168.122.31@tcp): + + lnetctl peer del --prim_nid 192.168.122.30@tcp --nid 192.168.122.31@tcp + Example of deleting the entire peer: + lnetctl peer del --prim_nid 192.168.122.30@tcp + Example of deleting a peer via YAML: + Assuming the following peer configuration: +peer: + - primary nid: 192.168.122.30@tcp + Multi-Rail: True + peer ni: + - nid: 192.168.122.30@tcp + state: NA + - nid: 192.168.122.31@tcp + state: NA + - nid: 192.168.122.32@tcp + state: NA + +You can delete 192.168.122.32@tcp as follows: + +delPeer.yaml +peer: + - primary nid: 192.168.122.30@tcp + Multi-Rail: True + peer ni: + - nid: 192.168.122.32@tcp + +% lnetctl import --del < delPeer.yaml +
+
+
+ <indexterm><primary>MR</primary> + <secondary>mrrouting</secondary> + </indexterm>Notes on routing with Multi-Rail + Multi-Rail configuration can be applied on the router to + aggregate the performance of its interfaces.
+ <indexterm><primary>MR</primary> + <secondary>mrrouting</secondary> + <tertiary>routingex</tertiary> + </indexterm>Multi-Rail Cluster Example + The example below outlines a simple system where all the Lustre + nodes are MR capable. Each node in the cluster has two interfaces.
+ Routing Configuration with Multi-Rail + + + + + + Routing Configuration with Multi-Rail + + +
+ The routers can aggregate the interfaces on each side of the network + by configuring them on the appropriate network. + An example configuration: + Routers +lnetctl net add --net o2ib0 --if ib0,ib1 +lnetctl net add --net o2ib1 --if ib2,ib3 +lnetctl peer add --nid <peer1-nidA>@o2ib,<peer1-nidB>@o2ib,... +lnetctl peer add --nid <peer2-nidA>@o2ib1,<peer2-nidB>@o2ib1,... +lnetctl set routing 1 + +Clients +lnetctl net add --net o2ib0 --if ib0,ib1 +lnetctl route add --net o2ib1 --gateway <rtrX-nidA>@o2ib +lnetctl peer add --nid <rtrX-nidA>@o2ib,<rtrX-nidB>@o2ib + +Servers +lnetctl net add --net o2ib1 --if ib0,ib1 +lnetctl route add --net o2ib0 --gateway <rtrX-nidA>@o2ib1 +lnetctl peer add --nid <rtrX-nidA>@o2ib1,<rtrX-nidB>@o2ib1 + In the above configuration the clients and the servers are + configured with only one route entry per router. This works because the + routers are MR capable. Because the routers are added to the clients and + the servers as peers with multiple interfaces, the MR algorithm will + ensure that both interfaces of the routers are used when sending to them. + + However, as of the Lustre 2.10 release LNet Resiliency is still + under development and a single interface failure will still cause the entire + router to go down.
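For completeness, the client-side portion of the example above could presumably also be expressed as a single YAML file for lnetctl import. The net and peer stanzas follow the schemas shown earlier in this chapter, but the route key names are an assumption based on typical lnetctl export output and are not documented here, so verify them on your release before use.

```yaml
# Hypothetical file client.yaml combining the three client commands above.
# <rtrX-nidA>/<rtrX-nidB> are the same placeholders used in the example;
# the "route" key names are assumptions to check against "lnetctl export".
net:
    - net type: o2ib0
      local NI(s):
        - interfaces:
              0: ib0
              1: ib1
route:
    - net: o2ib1
      gateway: <rtrX-nidA>@o2ib
peer:
    - primary nid: <rtrX-nidA>@o2ib
      Multi-Rail: True
      peer ni:
        - nid: <rtrX-nidA>@o2ib
        - nid: <rtrX-nidB>@o2ib
```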
+
+ <indexterm><primary>MR</primary> + <secondary>mrrouting</secondary> + <tertiary>routingresiliency</tertiary> + </indexterm>Utilizing Router Resiliency + Currently, LNet provides a mechanism to monitor each route entry. + LNet pings each gateway identified in the route entry at regular, + configurable intervals to ensure that it is alive. If sending over a + specific route fails, or if the router pinger determines that the gateway + is down, then the route is marked as down and is not used. It is + subsequently pinged at regular, configurable intervals to determine when + it becomes alive again. + This mechanism can be combined with the MR feature in Lustre 2.10 to + add this router resiliency feature to the configuration. + Routers +lnetctl net add --net o2ib0 --if ib0,ib1 +lnetctl net add --net o2ib1 --if ib2,ib3 +lnetctl peer add --nid <peer1-nidA>@o2ib,<peer1-nidB>@o2ib,... +lnetctl peer add --nid <peer2-nidA>@o2ib1,<peer2-nidB>@o2ib1,... +lnetctl set routing 1 + +Clients +lnetctl net add --net o2ib0 --if ib0,ib1 +lnetctl route add --net o2ib1 --gateway <rtrX-nidA>@o2ib +lnetctl route add --net o2ib1 --gateway <rtrX-nidB>@o2ib + +Servers +lnetctl net add --net o2ib1 --if ib0,ib1 +lnetctl route add --net o2ib0 --gateway <rtrX-nidA>@o2ib1 +lnetctl route add --net o2ib0 --gateway <rtrX-nidB>@o2ib1 + There are a few things to note in the above configuration: + + + The clients and the servers are now configured with two + routes; each route's gateway is one of the interfaces of the + router. The clients and servers will view each interface of the + same router as a separate gateway and will monitor them as + described above. + + + The clients and the servers are not configured to view the + routers as MR capable. This is important because we want to deal + with each interface as a separate peer and not as different + interfaces of the same peer. + + + The routers are configured to view the peers as MR capable.
+ This is an oddity in the configuration, but it is currently required + in order to allow the routers to load balance the traffic + across their interfaces evenly. + +
+
+ <indexterm><primary>MR</primary> + <secondary>mrrouting</secondary> + <tertiary>routingmixed</tertiary> + </indexterm>Mixed Multi-Rail/Non-Multi-Rail Cluster + The above principles can be applied to a mixed MR/Non-MR cluster. + For example, the same configuration shown above can be applied if the + clients and the servers are non-MR while the routers are MR capable. + This appears to be a common cluster upgrade scenario.
+
+ diff --git a/ManagingLNet.xml b/ManagingLNet.xml index ea3fc20..cb581da 100644 --- a/ManagingLNet.xml +++ b/ManagingLNet.xml @@ -93,29 +93,40 @@ To remove all Lustre modules, run:
- <indexterm><primary>LNet</primary><secondary>multi-rail configuration</secondary></indexterm>Multi-Rail Configurations with LNet - To aggregate bandwidth across both rails of a dual-rail IB cluster (o2iblnd) - Multi-rail configurations are only supported by o2iblnd; other IB LNDs do not support multiple interfaces. + <indexterm><primary>LNet</primary><secondary>hardware multi-rail + configuration</secondary></indexterm>Hardware Based Multi-Rail + Configurations with LNet + To aggregate bandwidth across both rails of a dual-rail IB cluster + (o2iblnd) + Hardware multi-rail configurations are only supported by o2iblnd; + other IB LNDs do not support multiple interfaces. using LNet, consider these points: - LNet can work with multiple rails, however, it does not load balance across them. The actual rail used for any communication is determined by the peer NID. + LNet can work with multiple rails, however, it does not load + balance across them. The actual rail used for any communication is + determined by the peer NID. - Multi-rail LNet configurations do not provide an additional level of network fault - tolerance. The configurations described below are for bandwidth aggregation only. + Hardware multi-rail LNet configurations do not provide an + additional level of network fault tolerance. The configurations + described below are for bandwidth aggregation only. - A Lustre node always uses the same local NID to communicate with a given peer NID. The criteria used to determine the local NID are: + A Lustre node always uses the same local NID to communicate with a + given peer NID. The criteria used to determine the local NID are: - - Lowest route priority number (lower number, higher priority). - + + Lowest route priority number (lower number, + higher priority). 
+ Fewest hops (to minimize routing), and - Appears first in the "networks" or "ip2nets" LNet configuration strings + Appears first in the "networks" + or "ip2nets" LNet configuration strings + diff --git a/figures/MR_RoutingConfig.png b/figures/MR_RoutingConfig.png new file mode 100644 index 0000000..9bdf6d2 Binary files /dev/null and b/figures/MR_RoutingConfig.png differ