LU-17062 lnet: Update lnet_peer_*_decref_locked usage Move decref's to occur after last reference to prevent use after free. HPE-bug-id: LUS-11799 Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com> Change-Id: I2382ece560039383f644b6aee73a9481d6bb5673 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52184 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Chris Horn <chris.horn@hpe.com> Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com> Reviewed-by: James Simmons <jsimmons@infradead.org> Reviewed-by: Oleg Drokin <green@whamcloud.com>
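The ordering rule behind the LU-17062 change can be illustrated with a minimal userspace sketch. It is not the LNet code: `struct peer`, `peer_alloc()`, `peer_decref()` and `peer_health_then_put()` are hypothetical names chosen for illustration, and the locking that the real `*_decref_locked` helpers rely on is left out. The only point shown is that every read of the object happens before the reference is dropped.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted peer object. */
struct peer {
	int refcount;
	int health;
};

static int peers_freed;	/* counts frees, for illustration only */

static struct peer *peer_alloc(int health)
{
	struct peer *p = malloc(sizeof(*p));

	if (!p)
		return NULL;
	p->refcount = 1;
	p->health = health;
	return p;
}

/* Drop one reference; the object is freed when the count reaches zero. */
static void peer_decref(struct peer *p)
{
	assert(p->refcount > 0);
	if (--p->refcount == 0) {
		free(p);
		peers_freed++;
	}
}

/* Copy out everything needed first, then drop the reference. */
static int peer_health_then_put(struct peer *p)
{
	int health = p->health;	/* last use of p */

	peer_decref(p);		/* p may be freed here; do not touch it again */
	return health;
}
```

Swapping the two statements in `peer_health_then_put()` would read `p->health` after a possible `free()`, which is the class of bug the commit describes.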
LU-16594 build: get_random_u32_below, get_acl with dentry Linux commit v6.1-13825-g3c202d14a9d7 prandom: remove prandom_u32_max() Use get_random_u32_below() and provide a replacement when get_random_u32_below is not available. Linux commit v6.1-rc1-2-g138060ba92b3 fs: pass dentry to set acl method Linux commit v6.1-rc1-4-g7420332a6ff4 fs: add new get acl method get_acl() and set_acl() have new signatures Test-Parameters: trivial HPE-bug-id: LUS-11556 Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com> Change-Id: I1de02f86fd2719fc75de4f014f51d73736d83c33 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50193 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Jian Yu <yujian@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
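A compatibility fallback of the kind described in LU-16594 might look roughly like the sketch below. This is a userspace model under stated assumptions: the guard macro `HAVE_GET_RANDOM_U32_BELOW` and the helper `pick_index()` are hypothetical, and `prandom_u32_max()` is re-implemented here with `rand()` purely so the example compiles outside the kernel.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Userspace stand-in for the older kernel helper: returns a value in
 * [0, ceil). The modulo introduces a slight bias, which is acceptable
 * for a sketch but not how the kernel implements it.
 */
static uint32_t prandom_u32_max(uint32_t ceil)
{
	return ceil ? (uint32_t)rand() % ceil : 0;
}

/* Hypothetical guard: a configure check would define it on newer kernels. */
#ifndef HAVE_GET_RANDOM_U32_BELOW
#define get_random_u32_below(ceil) prandom_u32_max(ceil)
#endif

/* Callers use only the new name, regardless of which kernel provides it. */
static uint32_t pick_index(uint32_t count)
{
	assert(count > 0);
	return get_random_u32_below(count);
}
```

Routing every caller through the new name keeps the version check in one place instead of at each call site.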
LU-14555 lnet: asym route inconsistency warning Remove LNET_UNDEFINED_HOPS from lnet_check_route_inconsistency(), where it is being treated as equivalent to 1 for the value of lr_hops. Due to the changes made in commit 3f2844dc9 "LU-14945 lnet: don't use hops to determine the route state", LNET_UNDEFINED_HOPS is no longer considered equivalent to 1 for lr_hops in all cases, and it is valid to leave hops undefined for multi-hop routes. Therefore, a multi-hop route whose hops value is LNET_UNDEFINED_HOPS is no longer inconsistent. Fixes: 6ab060e58e ("LU-14555 lnet: asym route inconsistency warning") Test-Parameters: trivial Signed-off-by: Gian-Carlo DeFazio <defazio1@llnl.gov> Change-Id: Iab8597f59c5f8d27b16dbeda79b41e9ec4777f52 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49352 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com> Reviewed-by: Chris Horn <chris.horn@hpe.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-10391 lnet: router_discover - handle large addrs in ping lnet_router_discover_ping_reply() now considers the large nids in the ping message. Test-Parameters: trivial testlist=sanity-lnet Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests Test-Parameters: clientversion=2.12 testlist=runtests Signed-off-by: Mr NeilBrown <neilb@suse.de> Change-Id: Ia67bcf2b09c976d9e4bf49a409e0d7bffe778ba4 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/44631 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Frank Sehr <fsehr@whamcloud.com> Reviewed-by: James Simmons <jsimmons@infradead.org> Reviewed-by: Chris Horn <chris.horn@hpe.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-10391 lnet: change lnet_notify() to take struct lnet_nid lnet_notify() now takes a 'struct lnet_nid *' instead of a lnet_nid_t. Test-Parameters: trivial testlist=sanity-lnet Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests Test-Parameters: clientversion=2.12 testlist=runtests Signed-off-by: Mr NeilBrown <neilb@suse.de> Change-Id: I4c3ab0eea5202028ee881eee04bdd1014f7f150d Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/44633 Reviewed-by: Chris Horn <chris.horn@hpe.com> Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com> Reviewed-by: Frank Sehr <fsehr@whamcloud.com> Reviewed-by: James Simmons <jsimmons@infradead.org> Reviewed-by: Oleg Drokin <green@whamcloud.com> Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com>
LU-15595 tests: Router test interop check and aarch fix setup_router_test() executes load_lnet() on remote nodes, but this function was only added in 2.15. Add a version check for it. Enabling routing may fail on nodes with small amount of memory (like aarch config). Define small number of router buffers to work around this issue. Modify the functions which calculate the number of buffers to allow small sizes to be specified via parameters. Test-Parameters: trivial testlist=sanity-lnet serverversion=2.12.9 Test-Parameters: testgroup=review-ldiskfs-arm testlist=sanity-lnet Signed-off-by: Chris Horn <chris.horn@hpe.com> Change-Id: If0b76747fe09e883546f18da9f3322c72263e29d Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48578 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Frank Sehr <fsehr@whamcloud.com> Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-15595 lnet: Always use ping reply to set route lr_alive We currently process discovery ping replies in different ways depending on whether the gateway has discovery enabled or disabled (or the local peer doing the processing has discovery enabled or disabled). When DD is disabled we process the ping reply to set the lr_alive field of lnet_route because the peer objects for non-MR routers do not contain all the information needed to calculate the route aliveness when a message is being sent. When DD is enabled we don't do any special processing of the ping reply. We simply let discovery update the NI status for the GW's peer NIs and then we calculate the route aliveness on every send. We issue discovery pings to routers every alive_router_check_interval seconds (default 60), but we calculate route aliveness on every send to a remote network (1000s of times per second). Thus, it is better to slightly duplicate the effort expended when we receive a discovery reply so that we can avoid calculating route aliveness on every send. Since both lr_alive and hop type are being set on each ping reply, for both DD enabled and disabled cases, we can remove the code for updating lr_alive and hop type from lnet_router_discovery_complete(). If discovery encounters a fatal error, we still set the status of each peer NI, as well as all routes, to down in lnet_router_discovery_complete(). Test-Parameters: trivial testlist=sanity-lnet Signed-off-by: Chris Horn <chris.horn@hpe.com> Change-Id: If4838c269a89885ba3763f62847e294804edf62e Reviewed-on: https://review.whamcloud.com/46624 Reviewed-by: Cyril Bordage <cbordage@whamcloud.com> Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-15930 lnet: Remove duplicate checks for peer sensitivity Callers of lnet_inc_lpni_healthv_locked() and lnet_dec_healthv_locked() currently check whether the parent peer has a peer specific sensitivity defined. To remove this code duplication, this logic is rolled into lnet_inc_lpni_healthv_locked() and lnet_dec_lpni_healthv_locked(). The latter is a new wrapper around lnet_dec_healthv_locked(). lnet_dec_healthv_locked() is changed to return a bool indicating whether the health value was actually modified so that the peer net health is only updated when the peer NI health actually changes. Test-Parameters: trivial testlist=sanity-lnet HPE-bug-id: LUS-11018 Signed-off-by: Chris Horn <chris.horn@hpe.com> Change-Id: I624561167392ad625ea7478689e9c5975cec3f2e Reviewed-on: https://review.whamcloud.com/46626 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Cyril Bordage <cbordage@whamcloud.com> Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
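The shape of the LU-15930 refactoring can be sketched as follows. This is a simplified model, not the LNet implementation: the struct layouts, the clamping at zero, the `updates` counter and the function names `dec_healthv()` and `dec_lpni_healthv()` are all assumptions made for illustration. It shows the two ideas from the commit message: the wrapper chooses between the peer-specific and the global sensitivity, and the inner helper returns a bool so dependent state is refreshed only when the value actually changed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the peer NI and peer net. */
struct peer_ni { int healthv; };
struct peer_net { int updates; };	/* times net health was recomputed */

/* Lower *healthv by sensitivity, clamped at 0; report whether it changed. */
static bool dec_healthv(int *healthv, int sensitivity)
{
	int old = *healthv;

	assert(sensitivity >= 0);
	*healthv = old > sensitivity ? old - sensitivity : 0;
	return *healthv != old;
}

/*
 * Wrapper: prefer the peer-specific sensitivity when one is set (non-zero),
 * otherwise fall back to the global value. The peer net is only touched
 * when the peer NI health value was actually modified.
 */
static void dec_lpni_healthv(struct peer_ni *lpni, struct peer_net *lpn,
			     int peer_sens, int global_sens)
{
	int sens = peer_sens ? peer_sens : global_sens;

	if (dec_healthv(&lpni->healthv, sens))
		lpn->updates++;
}
```

With the selection logic inside the wrapper, callers no longer need their own check for a peer-specific sensitivity.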
LU-10391 lnet: change ni_status in lnet_ni to u32* struct lnet_ni.ni_status points to a 'struct lnet_ni_status', but only the ns_status field of that structure is ever accessed. Change ni_status to point directly to just the ns_status field. This will provide flexibility for introducing a variant for 'struct lnet_ni_status' which holds a large-address nid. Test-Parameters: trivial testlist=sanity-lnet Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests Test-Parameters: clientversion=2.12 testlist=runtests Signed-off-by: Mr NeilBrown <neilb@suse.de> Change-Id: I5570608e98bc2aa1156b8d885df2a56f8ae7b6f7 Reviewed-on: https://review.whamcloud.com/44626 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: James Simmons <jsimmons@infradead.org> Tested-by: James Simmons <jsimmons@infradead.org> Reviewed-by: Chris Horn <chris.horn@hpe.com> Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-14555 lnet: asym route inconsistency warning lnet_check_route_inconsistency() checks for inconsistency between the lr_hops and lr_single_hop values of a route. A warning is currently emitted if the route is not single hop and the hop count is either 1 or LNET_UNDEFINED_HOPS. To emit the warning, add the requirement that avoid_asym_router_failure is enabled. Test-Parameters: trivial Signed-off-by: Gian-Carlo DeFazio <defazio1@llnl.gov> Change-Id: Iaa26d25492e49b569ae5e81da9f00f162be3da59 Reviewed-on: https://review.whamcloud.com/46918 Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov> Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-6142 lnet: use list_first_entry() in lnet/lnet subdirectory. Convert list_entry(foo->next, ...) to list_first_entry(foo, ...) in 'lnet/lnet'. In several cases the call is combined with a list_empty() test, and list_first_entry_or_null() is used instead. Test-Parameters: trivial testlist=sanity-lnet Change-Id: I45e1bdfe41854c88af98ebf24797f72a68b11dc3 Signed-off-by: Mr NeilBrown <neilb@suse.de> Reviewed-on: https://review.whamcloud.com/47488 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Chris Horn <chris.horn@hpe.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
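The LU-6142 conversion can be illustrated in userspace with a cut-down copy of the kernel list macros. The macro bodies below are simplified (the kernel versions add type checking and READ_ONCE()), and `struct route`, `list_init()` and `first_route()` are hypothetical names used only for this sketch.

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

/* Simplified versions of the kernel macros, for illustration only. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define list_entry(ptr, type, member) container_of(ptr, type, member)
#define list_first_entry(head, type, member) \
	list_entry((head)->next, type, member)
#define list_first_entry_or_null(head, type, member) \
	((head)->next != (head) ? list_first_entry(head, type, member) : NULL)

static void list_init(struct list_head *h)
{
	h->next = h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	assert(n && h);
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Hypothetical list element. */
struct route {
	int id;
	struct list_head link;
};

/*
 * Before: if (list_empty(routes)) return NULL;
 *         return list_entry(routes->next, struct route, link);
 * After:  one call that folds the emptiness test into the lookup.
 */
static struct route *first_route(struct list_head *routes)
{
	return list_first_entry_or_null(routes, struct route, link);
}
```

Note that plain `list_first_entry()` on an empty list yields a bogus pointer derived from the head itself, which is why the `_or_null` variant replaces the explicit `list_empty()` test rather than simply dropping it.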
LU-10391 lnet: discard some peer_ni lookup functions lnet_nid2peerni_locked(), lnet_peer_get_ni_locked(), lnet_find_peer4(), and lnet_find_peer_ni_locked() each have few users left, and those can all be changed to use alternate versions which take 'struct lnet_nid' rather than 'lnet_nid_t'. So convert all those callers over, and discard the older functions. Test-Parameters: trivial testlist=sanity-lnet Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests Test-Parameters: clientversion=2.12 testlist=runtests Signed-off-by: Mr NeilBrown <neilb@suse.de> Change-Id: I9f0ebd0631c2e4160c3198aa37f16b45027bce3d Reviewed-on: https://review.whamcloud.com/44624 Reviewed-by: James Simmons <jsimmons@infradead.org> Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Frank Sehr <fsehr@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-12756 lnet: Avoid redundant peer NI lookups Each caller of lnet_peer_ni_traffic_add() performs a subsequent call to lnet_peer_ni_find_locked(). We can avoid the extra lookup by having lnet_peer_ni_traffic_add() return a peer NI pointer (or ERR_PTR as appropriate). lnet_peer_ni_traffic_add() now takes a ref on the peer NI to mimic the behavior of lnet_peer_ni_find_locked(). lnet_nid2peerni_ex() only has a single caller that always passes LNET_LOCK_EX for the cpt argument, so this function argument is removed. Some duplicate code dealing with ln_state handling is removed from lnet_peerni_by_nid_locked() Test-Parameters: trivial testlist=sanity-lnet Signed-off-by: Chris Horn <chris.horn@hpe.com> Change-Id: I8e9e2449ef2b958b53abd59cd2c122e5492fbb34 Reviewed-on: https://review.whamcloud.com/36623 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com> Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-13714 lnet: only update gateway NI status on discovery Move the NI status from DOWN to UP only when receiving a discovery PING. The discovery PING should be the only message which updates the NI status, since it is used as the gateway NI keep-alive mechanism. This is done to avoid the following scenario: The gateway itself can push its updates to the peers which have removed it from their routing tables. The peers would respond to the PUSH with an ACK, and the ACK would bring the gateway's NI status to up. Therefore other peers which have avoid_asym_router_failure=1 will have their route status remain up even though the symmetrical route is gone. Note: there is no way for the gateway to differentiate between a keep-alive discovery and a manually triggered discovery or ping. However, this is a narrow case which will not be handled. net_last_alive converted to use ktime_get_seconds() instead of ktime_get_real_seconds() since the NTP adjustment is not needed. Test-Parameters: trivial Signed-off-by: Amir Shehata <ashehata@whamcloud.com> Signed-off-by: Chris Horn <chris.horn@hpe.com> Change-Id: Ifd5b06d4cf783b68b36413ada63f0a1d0095fb5b Reviewed-on: https://review.whamcloud.com/39176 Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com> Reviewed-by: Cyril Bordage <cbordage@whamcloud.com> Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-10391 lnet: change lnet_del_route() to take lnet_nid The gateway NID passed to lnet_del_route is now a struct lnet_nid. Instead of passing LNET_NID_ANY as a wildcard, we pass a NULL pointer. Test-Parameters: trivial Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests Test-Parameters: clientversion=2.12 testlist=runtests Signed-off-by: Mr NeilBrown <neilb@suse.de> Change-Id: I1243be20d9f40e4ac3ebc6ec5dd9bbcbae6653c3 Reviewed-on: https://review.whamcloud.com/43615 Reviewed-by: James Simmons <jsimmons@infradead.org> Reviewed-by: Cyril Bordage <cbordage@whamcloud.com> Reviewed-by: Chris Horn <chris.horn@hpe.com> Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-10391 lnet: alter lnd_notify_peer_down() to take lnet_nid The lnd_notify_peer_down() interface now takes a large nid. Test-Parameters: trivial Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests Test-Parameters: clientversion=2.12 testlist=runtests Signed-off-by: Mr NeilBrown <neilb@suse.de> Change-Id: I9926caf0508ff257e9e64d5537597addbce657d7 Reviewed-on: https://review.whamcloud.com/43608 Reviewed-by: James Simmons <jsimmons@infradead.org> Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com> Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-15149 lnet: Missing newline in lnet_add_route CWARN string is missing a newline character. Test-Parameters: trivial Fixes: 3f2844dc93 ("LU-14945 lnet: don't use hops to determine the route state") Signed-off-by: Chris Horn <chris.horn@hpe.com> Change-Id: I06370c36e9d88b7e02e000bfb573297ff281aef1 Reviewed-on: https://review.whamcloud.com/45340 Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com> Reviewed-by: James Simmons <jsimmons@infradead.org> Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-2084 lnet: don't retry allocating router buffers Don't loop indefinitely trying to allocate router buffer pools if the number of requested buffers is too large for the system. Test-Parameters: trivial Signed-off-by: Andreas Dilger <adilger@whamcloud.com> Change-Id: Ic0f2ccf0f7b38dfa254e46e268b27092342efdb5 Reviewed-on: https://review.whamcloud.com/45174 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com> Reviewed-by: Chris Horn <chris.horn@hpe.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-14945 lnet: don't use hops to determine the route state NodeA <-tcp1-> GW1 <-tcp2-> GW2 <-tcp3-> NodeB Assuming GW1 knows how to reach tcp3 network and GW2 knows how to reach tcp1 network, it should be possible to add routes without specifying hop=2 on nodes A and B to reach tcp3 and tcp1 respectively and then be able to lnetctl ping between them. Changes introduced by LU-13785 interpret default hops to be equivalent to hop=1 set explicitly for the purpose of determining route aliveness, which results in the routes created as described above to be considered "down". Fix it so that default hop setting doesn't prevent the multi-hop scenario from working. Test-Parameters: trivial Fixes: 2e07619477 ("LU-13785 lnet: Use lr_hops for avoid_asym_router_failure") Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com> Change-Id: I341ccdfe156434b0cb306359acc91a9193b44f7b Reviewed-on: https://review.whamcloud.com/44674 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Amir Shehata <ashehata@whamcloud.com> Reviewed-by: Chris Horn <chris.horn@hpe.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-10391 lnet: change lr_nid to struct lnet_nid The nid in 'struct lnet_route' is now a 'struct lnet_nid'. Test-Parameters: trivial Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests Test-Parameters: clientversion=2.12 testlist=runtests Signed-off-by: Mr NeilBrown <neilb@suse.de> Change-Id: I2e2f2e9c8d2cbdbc87b408ee4589952f2df02880 Reviewed-on: https://review.whamcloud.com/43593 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: James Simmons <jsimmons@infradead.org> Reviewed-by: Chris Horn <chris.horn@hpe.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>