LU-17578 lnet: fix &the_lnet.ln_mt_peerNIRecovq race To avoid a race, &the_lnet.ln_mt_peerNIRecovq must always be accessed under lnet_net_lock(0) protection. Test-Parameters: trivial Fixes: da23037 ("LU-16563 lnet: use discovered ni status to set initial health") Change-Id: Ic5e0194020200afdecba4cbf5afed274b14da388 Signed-off-by: Bruno Faccini <bfaccini@nvidia.com> Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54163 Reviewed-by: Chris Horn <chris.horn@hpe.com> Reviewed-by: Frank Sehr <fsehr@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Tested-by: jenkins <devops@whamcloud.com>
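A minimal userspace sketch of the locking rule in this commit, assuming a C11 atomic_flag spinlock as a stand-in for lnet_net_lock(0)/lnet_net_unlock(0) and a singly linked list as a stand-in for ln_mt_peerNIRecovq. All names here (net_lock, recovq_add, struct peer_ni) are hypothetical, not the LNet implementation. The point it illustrates: the "already queued?" check and the insert must both happen under the same lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for lnet_net_lock(0)/lnet_net_unlock(0). */
static atomic_flag net_lock0 = ATOMIC_FLAG_INIT;

static void net_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&net_lock0, memory_order_acquire))
		;	/* spin until the holder releases the lock */
}

static void net_unlock(void)
{
	atomic_flag_clear_explicit(&net_lock0, memory_order_release);
}

/* Minimal list standing in for the peer NI recovery queue. */
struct peer_ni {
	struct peer_ni *next;
	int id;
	int queued;	/* non-zero while on the recovery queue */
};

static struct peer_ni *recovq_head;

/* Check and insert under one lock hold, so no other thread can queue
 * the same peer NI (or walk the list) between the two steps. */
static int recovq_add(struct peer_ni *lpni)
{
	int added = 0;

	assert(lpni != NULL);
	net_lock();
	if (!lpni->queued) {
		lpni->next = recovq_head;
		recovq_head = lpni;
		lpni->queued = 1;
		added = 1;
	}
	net_unlock();
	return added;
}

/* Readers take the same lock before walking the queue. */
static size_t recovq_len(void)
{
	size_t n = 0;
	struct peer_ni *p;

	net_lock();
	for (p = recovq_head; p; p = p->next)
		n++;
	net_unlock();
	return n;
}
```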
LU-17545 lnet: use unsafe_memcpy() when flexible array To avoid false-positive messages/errors such as <memcpy: detected field-spanning write (size 64) of single field "&lp->lp_data->pb_info" at .../lnet/lnet/peer.c:2456 (size 16)>. Signed-off-by: Bruno Faccini <bfaccini@nvidia.com> Change-Id: I4e2fc58e31f60b434a9050393cd65b89c54f0798 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54069 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com> Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-17379 lnet: add LNetPeerDiscovered to LNet API LNetPeerDiscovered is added to allow Lustre to check whether the peer has been successfully discovered by LNet before attempting to open a connection to it. For example, given a mount command with a list of NIDs, Lustre can use LNetAddPeer API to initiate discovery on every candidate first, and later use LNetPeerDiscovered to select a reachable peer to connect to. Test-Parameters: trivial testlist=sanity-lnet Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com> Change-Id: I7c9964148a5a2a24d7889b8b4c2e488a433ca258 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53926 Reviewed-by: Frank Sehr <fsehr@whamcloud.com> Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com> Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com>
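A minimal sketch of the two-step "start discovery on every candidate, then pick a discovered one" flow described above. add_peer_start_discovery() and peer_discovered() are hypothetical stand-ins for LNetAddPeer() and LNetPeerDiscovered(), whose real signatures are not reproduced here; background discovery completing is simulated by setting a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a candidate NID from the mount command. */
struct candidate {
	unsigned long long nid;
	bool discovery_started;
	bool discovered;	/* set when background discovery succeeds */
};

/* Stand-in for LNetAddPeer(): only records that discovery was started. */
static void add_peer_start_discovery(struct candidate *c)
{
	c->discovery_started = true;
}

/* Stand-in for LNetPeerDiscovered(). */
static bool peer_discovered(const struct candidate *c)
{
	return c->discovery_started && c->discovered;
}

/* Step 1: initiate discovery on every candidate up front. */
static void start_all(struct candidate *cands, size_t n)
{
	for (size_t i = 0; i < n; i++)
		add_peer_start_discovery(&cands[i]);
}

/* Step 2 (later): return the index of the first discovered candidate,
 * or -1 if none is reachable yet. */
static int pick_reachable(const struct candidate *cands, size_t n)
{
	assert(cands != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (peer_discovered(&cands[i]))
			return (int)i;
	return -1;
}
```

Starting discovery on all candidates first lets the probes overlap, so the later selection step only has to query state instead of waiting on each NID in turn.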
LU-17000 lnet: don't assign unused return codes In lnet_peer_discovery(), the return values of lnet_peer_ping_failed() and lnet_peer_push_failed() are unused, and the return value of the former is overwritten before it is used. Remove the rc assignments and cast the function calls to void to make it clear the return code can be ignored. Test-Parameters: trivial CoverityID: 412758 ("Unused Value") CoverityID: 412759 ("Unused Value") Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com> Change-Id: I02d5e883fc02814d5dbe307b78f028703023db52 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53608 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Chris Horn <chris.horn@hpe.com> Reviewed-by: Timothy Day <timday@amazon.com> Reviewed-by: Oleg Drokin <green@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
LU-14810 lnet: Cancel discovery ping/push on shutdown Discovery shutdown can race with ping and push events. In some cases this can result in failing to unlink ping/push MDs on shutdown. Protect against this by checking for PING/PUSH_FAILED state on peers on the request queue. Test-Parameters: trivial Test-Parameters: testlist=sanity-lnet env=ONLY=500,ONLY_REPEAT=50 Signed-off-by: Chris Horn <chris.horn@hpe.com> Change-Id: I84a1f5beb6508651bc62e1dd93271f9e72f5081c Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53356 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Frank Sehr <fsehr@whamcloud.com> Reviewed-by: James Simmons <jsimmons@infradead.org> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-17103 lnet: use workqueue for lnd ping buffer updates Introduce workqueue for handling lnd-initiated ping buffer update requests. This is done to avoid the possibility of the monitor thread locking up while waiting for the "old" ping buffer refcount to be decremented during the update, when the message that triggers the decrement is still on the monitor thread's own queue waiting to be processed. Test-Parameters: trivial Test-Parameters: testlist=sanity-lnet env=ONLY="207 500",ONLY_REPEAT=50 Fixes: 7ac399c5 ("LU-16949 lnet: get monitor thread to update ping buffer") Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com> Change-Id: I5176581703e52f4adbfff417040bebcc2489b79e Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52522 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com> Reviewed-by: James Simmons <jsimmons@infradead.org> Reviewed-by: Cyril Bordage <cbordage@whamcloud.com> Reviewed-by: Chris Horn <chris.horn@hpe.com> Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
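A single-threaded model of the self-deadlock described above and why deferring to a workqueue avoids it. Everything here is hypothetical (struct model, monitor_update_sync, worker_run); a real worker would sleep rather than be polled. The old buffer holds one base reference plus one per queued message, and the swap may only proceed once just the base reference remains.

```c
#include <assert.h>
#include <stdbool.h>

struct model {
	int old_buf_refs;	/* base ref + refs held by queued messages */
	int monitor_queue;	/* messages waiting on the monitor thread */
	bool update_pending;	/* update request handed to the workqueue */
	bool updated;
};

/* Synchronous update on the monitor thread: while it waits it cannot
 * drain its own queue, so a queued message pinning the buffer means the
 * wait never ends. Returns false to report "would deadlock". */
static bool monitor_update_sync(struct model *m)
{
	if (m->old_buf_refs > 1)
		return false;
	m->updated = true;
	return true;
}

/* Workqueue variant: the monitor thread only queues the request... */
static void monitor_request_update(struct model *m)
{
	m->update_pending = true;
}

/* ...and keeps processing messages, each dropping its buffer ref. */
static void monitor_process_one(struct model *m)
{
	assert(m->monitor_queue > 0 && m->old_buf_refs > 1);
	m->monitor_queue--;
	m->old_buf_refs--;
}

/* Worker context: performs the update once the buffer is unpinned.
 * Returns true if the update ran. */
static bool worker_run(struct model *m)
{
	if (!m->update_pending || m->old_buf_refs > 1)
		return false;
	m->update_pending = false;
	m->updated = true;
	return true;
}
```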
LU-17062 lnet: Update lnet_peer_*_decref_locked usage Move the decrefs so they occur after the last reference, to prevent use-after-free. HPE-bug-id: LUS-11799 Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com> Change-Id: I2382ece560039383f644b6aee73a9481d6bb5673 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52184 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Chris Horn <chris.horn@hpe.com> Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com> Reviewed-by: James Simmons <jsimmons@infradead.org> Reviewed-by: Oleg Drokin <green@whamcloud.com>
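A minimal refcounting sketch of the ordering rule in this commit, using hypothetical names (struct peer, peer_decref) rather than the lnet_peer_*_decref_locked() helpers: finish every access to the object first, and make the decref, which may free it, the last operation.

```c
#include <assert.h>
#include <stdlib.h>

struct peer {
	int refcount;
	int stats;
};

static int freed_count;	/* counts frees, for illustration only */

/* Drops one reference; frees the object when the last one goes. */
static void peer_decref(struct peer *p)
{
	assert(p->refcount > 0);
	if (--p->refcount == 0) {
		free(p);
		freed_count++;
	}
}

/* Correct ordering: read/update the object, then drop the reference.
 * Swapping the two steps would read p->stats after a possible free. */
static int peer_record_and_put(struct peer *p)
{
	int seen;

	p->stats++;
	seen = p->stats;	/* last access to *p */
	peer_decref(p);		/* may free p; p must not be used after this */
	return seen;
}
```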
LU-10391 lnet: migrate peer NI control to Netlink Move peer creation and deletion to the Netlink API. This change enables the creation of peers with large NID addresses. Test-Parameters: trivial testlist=sanity-lnet Change-Id: I7f2f75e73e3f39856751f65e240f2172f703d0bc Signed-off-by: James Simmons <jsimmons@infradead.org> Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49574 Reviewed-by: Frank Sehr <fsehr@whamcloud.com> Reviewed-by: Chris Horn <chris.horn@hpe.com> Reviewed-by: Cyril Bordage <cbordage@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com> Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com>
LU-9680 lnet: collect data about peer_ni by using Netlink Migrate the LNet peer NI API to use the Netlink API for the case of collecting data on peers. This change also allows large NID support for IPv6. Since this doesn't cover creation and deletion of peers, we can't set up large NID peers just yet. Test-Parameters: trivial testlist=sanity-lnet Change-Id: Iefa3f566255e768047b0f9ff21d64bc74634f284 Signed-off-by: James Simmons <jsimmons@infradead.org> Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49516 Reviewed-by: Chris Horn <chris.horn@hpe.com> Reviewed-by: Frank Sehr <fsehr@whamcloud.com> Reviewed-by: Cyril Bordage <cbordage@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com> Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com>
LU-8191 lnet: remove unused, fix non-static functions lnet_selftest_structure_assertion() and lnet_net_is_pref_rtr_locked() are never called. This patch removes both functions. Static analysis shows that a number of functions could be made static. This patch also declares several functions in lnet static. Test-Parameters: trivial Signed-off-by: Timothy Day <timday@amazon.com> Change-Id: Ie1b49c5652553715cd9f96b56090d33a95e3b438 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51436 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Neil Brown <neilb@suse.de> Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com> Reviewed-by: jsimmons <jsimmons@infradead.org> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-16709 lnet: fix locking multiple NIDs of the MR peer If Lustre identifies the same peer with multiple NIDs, then as a result of peer discovery it is possible that the discovered peer is found to contain a NID which is locked as primary by a different existing peer record. In this case it is safe to merge the peer records, but the NID which got locked the earliest should be kept as primary. This allows the first of the two locked NIDs to stay primary, as intended for the purpose of communicating with Lustre, even if peer discovery succeeded using a different NID of the MR peer. Fixes: aacb16191a ("LU-14668 lnet: Lock primary NID logic") Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com> Change-Id: Iec9f8b70053fe24cddee552358500dfad0234b7f Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50530 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Cyril Bordage <cbordage@whamcloud.com> Reviewed-by: Frank Sehr <fsehr@whamcloud.com> Reviewed-by: Chris Horn <chris.horn@hpe.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-10391 lnet: change LNetAddPeer() to take struct lnet_nid Rather than an array of lnet_nid_t, LNetAddPeer now takes an array of struct lnet_nid. The array passed is *always* from struct uuid_nid_data, so that data structure is changed to store struct lnet_nid. Signed-off-by: Mr NeilBrown <neilb@suse.de> Change-Id: I0931c1dbbe50fcd7970bba6b68464eea14b1d25e Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50085 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: jsimmons <jsimmons@infradead.org> Reviewed-by: Oleg Drokin <green@whamcloud.com> Reviewed-by: Frank Sehr <fsehr@whamcloud.com> Reviewed-by: Chris Horn <chris.horn@hpe.com>
LU-14668 lnet: add 'lock_prim_nid' lnet module parameter Add 'lock_prim_nid' lnet module parameter to allow control of how Lustre peer primary NID is selected. If set to 1 (default), the NID specified by Lustre when calling LNet API is designated as primary for the peer, allowing for non-blocking discovery in the background. If set to 0, peer discovery blocks until complete and the NID listed first in the discovery response is designated as primary. Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com> Change-Id: I6ed1cb0c637f4aa7a7340a6f01819ba9a85858f4 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50159 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Chris Horn <chris.horn@hpe.com> Reviewed-by: Frank Sehr <fsehr@whamcloud.com> Reviewed-by: Cyril Bordage <cbordage@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
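A minimal model of the two lock_prim_nid settings described above. choose_primary() and struct prim_choice are hypothetical; the sketch only encodes which NID becomes primary and whether the caller waits for discovery, not how LNet implements either.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct prim_choice {
	uint64_t prim_nid;
	bool block_on_discovery;
};

static struct prim_choice choose_primary(int lock_prim_nid, uint64_t lustre_nid,
					 const uint64_t *disc_nids, size_t ndisc)
{
	struct prim_choice c;

	if (lock_prim_nid) {
		/* 1 (default): the NID Lustre passed in is primary and
		 * discovery proceeds in the background. */
		c.prim_nid = lustre_nid;
		c.block_on_discovery = false;
	} else {
		/* 0: wait for discovery; the first NID in the response
		 * becomes primary. */
		assert(disc_nids != NULL && ndisc > 0);
		c.prim_nid = disc_nids[0];
		c.block_on_discovery = true;
	}
	return c;
}
```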
LU-16563 lnet: use discovered ni status to set initial health If not routing, track local NI status in the ping buffer such that locally recognized "down" state, for example, due to a downed network interface/link, is available to any discovering peer. If NI 'fatal' status is changed, push update to peers. On the active side of discovery, check peer NI status so if NI is down, decrement its health score and queue for recovery. Test-Parameters: trivial testlist=sanity-lnet Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com> Change-Id: I513c7942099c0da9088fa6d4460f76386ea91d3b Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50027 Reviewed-by: Chris Horn <chris.horn@hpe.com> Reviewed-by: Cyril Bordage <cbordage@whamcloud.com> Reviewed-by: Frank Sehr <fsehr@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com> Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com>
LU-14668 lnet: add 'force' option to lnetctl peer del Add --force option to the 'lnetctl peer del' command. If the peer has its primary NID locked, this option allows the peer to be deleted manually: lnetctl peer del --prim_nid <nid> --force Add --prim_lock option to the 'lnetctl peer add' command. If specified, the primary NID of the peer is locked such that it will be the NID used to identify the peer in communications with the Lustre layer. Test-Parameters: trivial Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com> Change-Id: Ia6001856cfbce7b0c3288cff9b244b569d259647 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50149 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-14668 lnet: don't delete peer created by Lustre Peers created by Lustre have their primary NIDs locked. If that peer is deleted, it'll confuse Lustre. So when manually deleting a peer using: lnetctl peer del --prim_nid ... We must continue to preserve the primary NID. Therefore we delete all the constituent NIDs, but keep the primary NID. We then flag the peer for rediscovery. Signed-off-by: Amir Shehata <ashehata@whamcloud.com> Change-Id: I34eef9b0049435a01fde87dc8263dd50f631c551 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/43565 Tested-by: Maloo <maloo@whamcloud.com> Tested-by: jenkins <devops@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com> Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com> Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
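A minimal model of the deletion rule above, with a hypothetical struct peer whose nids[0] is the primary NID. A peer without a locked primary is removed outright; a peer with a locked primary loses its other NIDs, keeps the primary, and is flagged for rediscovery. This is an illustration of the described behaviour, not the LNet data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NIDS 8

struct peer {
	uint64_t nids[MAX_NIDS];	/* nids[0] is the primary NID */
	size_t nnids;
	bool prim_locked;
	bool needs_rediscovery;
	bool deleted;
};

static void peer_del(struct peer *p)
{
	assert(p != NULL && p->nnids > 0);
	if (!p->prim_locked) {
		/* Ordinary peer: remove it entirely. */
		p->nnids = 0;
		p->deleted = true;
		return;
	}
	/* Created by Lustre: drop constituent NIDs, keep the primary,
	 * and let discovery repopulate the peer. */
	p->nnids = 1;
	p->needs_rediscovery = true;
}
```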
LU-14668 lnet: Peers added via kernel API should be permanent The LNetAddPeer() API allows Lustre to predefine the Peer for LNet. Originally these peers would be temporary and potentially re-created via discovery. Instead, let's make these peers permanent. This allows Lustre to dictate the primary NID of the peer. LNet makes sure this primary NID is not changed afterwards. Test-Parameters: trivial Signed-off-by: Amir Shehata <ashehata@whamcloud.com> Signed-off-by: Chris Horn <chris.horn@hpe.com> Change-Id: I3f54c04719c9e0374176682af08183f0c93ef737 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/43788 Tested-by: Maloo <maloo@whamcloud.com> Tested-by: jenkins <devops@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com> Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com> Reviewed-by: Cyril Bordage <cbordage@whamcloud.com> Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
LU-14668 lnet: Lock primary NID logic If a peer is created by Lustre make sure to lock that peer's primary NID. This peer can be discovered in the background. There is no need to block until discovery is complete, as Lustre can continue on with the primary NID it provided. Discovery will populate the peer with other interfaces the peer has but will not change the peer's primary NID. It can also delete NIDs of the peer which Lustre told it about (but not the primary NID). If a peer has been manually discovered via the lnetctl discover <nid> command, then make sure to delete the manually discovered peer and recreate it with the NID information Lustre provided. Signed-off-by: Amir Shehata <ashehata@whamcloud.com> Change-Id: I8fc8a69caccca047e3085bb33d026a3f09fb359b Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50106 Tested-by: Maloo <maloo@whamcloud.com> Tested-by: jenkins <devops@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com> Reviewed-by: Cyril Bordage <cbordage@whamcloud.com> Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
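A minimal model of how a discovery result could be applied under the locked-primary rule described above, using a hypothetical struct peer where nids[0] is the primary NID. With the primary locked, it stays first and unchanged, the remaining NIDs are replaced by what discovery reported (so a non-primary NID Lustre supplied but discovery did not report is dropped), and without a lock the first discovered NID becomes primary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NIDS 8

struct peer {
	uint64_t nids[MAX_NIDS];	/* nids[0] is the primary NID */
	size_t nnids;
	bool prim_locked;
};

static void apply_discovery(struct peer *p, const uint64_t *disc, size_t ndisc)
{
	uint64_t out[MAX_NIDS];
	size_t i, n = 0;

	assert(disc != NULL && ndisc > 0 && ndisc <= MAX_NIDS);
	assert(!p->prim_locked || p->nnids > 0);
	if (p->prim_locked) {
		out[n++] = p->nids[0];	/* locked primary never changes */
		for (i = 0; i < ndisc && n < MAX_NIDS; i++)
			if (disc[i] != p->nids[0])
				out[n++] = disc[i];
	} else {
		for (i = 0; i < ndisc; i++)
			out[n++] = disc[i];	/* first discovered NID is primary */
	}
	for (i = 0; i < n; i++)
		p->nids[i] = out[i];
	p->nnids = n;
}
```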
LU-16509 lnet: quash memcpy WARN_ONCE false positive Linux v6.1-rc1-4-g6f7630b1b5bc fortify: Capture __bos() results in const temp var In lnet_peer_push_event() the memcpy triggers a WARN_ONCE due to the flexible array at the end of struct lnet_ping_info contained in struct lnet_ping_buffer. Use unsafe_memcpy() to avoid this false positive warning. Test-Parameters: trivial HPE-bug-id: LUS-11455 Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com> Change-Id: I4aa8f38678cd1522004d98b58a3f440d8a38589c Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49801 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com> Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
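A userspace sketch of why a single copy of a struct ending in a flexible array member "spans" past the declared struct. struct ping_info and struct ni_status below are hypothetical and only loosely modeled on lnet_ping_info. Here dst is a plain heap pointer, so no fortify check applies; in the kernel the destination was a struct member (pb_info) whose compile-time size excludes the trailing array, which is what made FORTIFY_SOURCE flag an in-bounds copy and led to unsafe_memcpy().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct ni_status {
	uint64_t nid;
	uint32_t status;
};

struct ping_info {
	uint32_t magic;
	uint32_t nnis;
	struct ni_status ni[];	/* flexible array member */
};

/* Bytes needed for the header plus nnis trailing entries. */
static size_t ping_info_size(uint32_t nnis)
{
	return sizeof(struct ping_info) + nnis * sizeof(struct ni_status);
}

/* One memcpy copies header and entries; the length is larger than
 * sizeof(struct ping_info) but within the allocation of both buffers. */
static struct ping_info *ping_info_dup(const struct ping_info *src)
{
	size_t len;
	struct ping_info *dst;

	assert(src != NULL);
	len = ping_info_size(src->nnis);
	dst = malloc(len);
	if (!dst)
		return NULL;
	memcpy(dst, src, len);
	return dst;
}
```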
LU-10003 lnet: use Netlink to support LNet ping commands Completely replace the old pre-MR ping command ioctl with Netlink, which will also handle large NIDs. We do update IOC_LIBCFS_PING_PEER, which supports only small NIDs, so older tools will keep working. Test-Parameters: trivial testlist=sanity-lnet Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests Test-Parameters: clientversion=2.12 testlist=runtests Change-Id: Ic82a18dc38e4bd4e78bf61da766f7a847da509a8 Signed-off-by: James Simmons <jsimmons@infradead.org> Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49360 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Cyril Bordage <cbordage@whamcloud.com> Reviewed-by: Chris Horn <chris.horn@hpe.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>