LU-10391 lnet: support updating LNet local NI settings The LNet API allows updating specific settings instead of a full new configuration for NIs. We can accomplish this using NLM_F_REPLACE with the LNET_CMD_NETS command. For the user land tools, the main change is that large NID addresses can now be used. The intf_name field also grows from IFNAMSIZ to LNET_MAX_STR_LEN, which requires increasing the err_str handling, because struct lnet_dlc_intf_descr is used to store network addresses and/or network interfaces. Test-Parameters: trivial testlist=sanity-lnet Change-Id: Id334ed3a73ac6ec7a342d4616e32dcfef46907a7 Signed-off-by: James Simmons <jsimmons@infradead.org> Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53560 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Chris Horn <chris.horn@hpe.com> Reviewed-by: Frank Sehr <fsehr@whamcloud.com> Reviewed-by: Cyril Bordage <cbordage@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-17258 socklnd: stop connecting on too many retries If peer repeatedly rejects connection requests with EALREADY, assume that it doesn't support as many connections as we're trying to create. Make sure to stop connecting to the peer altogether and either continue with already created connections if there's at least one of each type, or fail. This helps avoid the assertion: "ASSERTION( (wanted & ((((1UL))) << (3))) != 0 ) failed" Test-Parameters: trivial testlist=sanity-lnet Fixes: 5afe3b053 ("LU-17258 socklnd: ensure connection type established upon race") Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com> Change-Id: I6072e91cc36544fc2f56c91cd78f6637cf82ecbc Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53955 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Frank Sehr <fsehr@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Cyril Bordage <cbordage@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
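The stop-connecting decision described in the commit above can be sketched as a small function. This is a minimal illustration under stated assumptions, not the socklnd code: sketch_next_step(), SKETCH_MAX_RETRIES, and the bitmask arguments are hypothetical names, and the real retry threshold is not given in the commit message.

```c
/* Hypothetical retry limit; the real threshold is not stated in the commit. */
#define SKETCH_MAX_RETRIES 3

enum sketch_step {
	SKETCH_RETRY,    /* keep trying to create the missing connection */
	SKETCH_CONTINUE, /* stop connecting, use the connections we have */
	SKETCH_FAIL,     /* stop connecting, no usable set of connections */
};

/*
 * Called after the peer rejected a connection request with EALREADY.
 *
 * retries:   number of EALREADY rejections seen so far
 * connected: bitmask of connection types already established
 * required:  bitmask with one bit per connection type that must exist
 */
enum sketch_step sketch_next_step(int retries, unsigned long connected,
				  unsigned long required)
{
	/* Below the limit the rejection is treated as a transient race. */
	if (retries < SKETCH_MAX_RETRIES)
		return SKETCH_RETRY;

	/* Too many rejections: proceed only if every type is covered. */
	if ((connected & required) == required)
		return SKETCH_CONTINUE;

	return SKETCH_FAIL;
}
```

With a required mask of 0x7, a peer that has been rejected three times and holds all three connection types continues with what it has; if one type is missing, the attempt fails instead of retrying indefinitely.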
LU-17505 socklnd: return NETWORK_TIMEOUT to LNet on ETIMEDOUT Returning LNET_MSG_STATUS_LOCAL_TIMEOUT to LNet on ETIMEDOUT causes LNet to only decrement the local NI health score, while the issue may actually be with the remote NI. Changing this to return LNET_MSG_STATUS_NETWORK_TIMEOUT causes LNet to decrement both local NI and peer NI health. If local NI is ok, it will recover its health score quickly, but the affected peer NI health is lowered until peer NI is recovered. This helps LNet select healthy NIs of the same peer in the meantime. Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com> Change-Id: I916772477d1fd63571447262880a33830746f002 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53930 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Frank Sehr <fsehr@whamcloud.com> Reviewed-by: Chris Horn <chris.horn@hpe.com> Reviewed-by: Cyril Bordage <cbordage@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
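The effect of the status change in the commit above can be illustrated with a small mapping function. This is a sketch only: the enum and sketch_errno_to_hstatus() are hypothetical stand-ins, and only the LOCAL_TIMEOUT and NETWORK_TIMEOUT names correspond to statuses mentioned in the commit message.

```c
#include <errno.h>

/* Hypothetical stand-ins for LNet health statuses; only the two timeout
 * names mirror the commit message, the others are assumed for the sketch. */
enum sketch_hstatus {
	SKETCH_STATUS_OK,
	SKETCH_STATUS_LOCAL_TIMEOUT,   /* lowers local NI health only */
	SKETCH_STATUS_NETWORK_TIMEOUT, /* lowers local NI and peer NI health */
	SKETCH_STATUS_LOCAL_ERROR,
};

/* Map a negative errno from a send attempt to a health status. */
enum sketch_hstatus sketch_errno_to_hstatus(int rc)
{
	switch (rc) {
	case 0:
		return SKETCH_STATUS_OK;
	case -ETIMEDOUT:
		/* The timeout may be caused by the remote NI, so report a
		 * status that penalizes both ends rather than only ours. */
		return SKETCH_STATUS_NETWORK_TIMEOUT;
	default:
		return SKETCH_STATUS_LOCAL_ERROR;
	}
}
```

The design point is that a healthy local NI recovers quickly from the extra penalty, while the peer NI stays deprioritized until it recovers.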
LU-17081 build: compatibility for 6.5 kernels Linux commit v6.4-rc2-29-gc6585011bc1d splice: Remove generic_file_splice_read() Prefer filemap_splice_read and provide alternates for older kernels. Linux commit v6.4-rc2-30-g3fc40265ae2b iov_iter: Kill ITER_PIPE ITER_PIPE and iov_iter_is_pipe() are removed, provide a replacement for iov_iter_is_pipe Linux commit v6.4-rc4-53-g54d020692b34 mm/gup: remove unused vmas parameter from get_user_pages() Use vma_lookup() to acquire the vma following get_user_pages() Linux commit v6.4-rc7-1884-gdc97391e6610 sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) Use sendmsg when MSG_SPLICE_PAGES is defined. Provide a wrapper using sendpage() for older kernels. HPE-bug-id: LUS-11811 Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com> Change-Id: I95a0954a602c8db08d30b38a50dcd50107c8f268 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52258 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Jian Yu <yujian@whamcloud.com> Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com> Reviewed-by: xinliang <xinliang.liu@linaro.org> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-17271 kfilnd: Allocate tn_mr_key before kfilnd_peer A race exists between kfilnd_peer and tn_mr_key allocation that could result in RKEY re-use and data corruption. Thread 1: Posts tagged receive with RKEY based on peerA::kp_local_session_key X and tn_mr_key Y Thread 2: Fetches peerA with kp_local_session_key X Thread 1: Cancels tagged receive, marks peerA for removal, and releases tn_mr_key Y Thread 2: allocates tn_mr_key Y At this point, thread 2 has the same RKEY used by thread 1. The fix is to always allocate the tn_mr_key before looking up the peer, and always mark peers for removal before releasing tn_mr_key. This commit modifies the TN allocation to ensure the tn_mr_key is allocated before looking up the target peer. HPE-bug-id: LUS-11972 Test-Parameters: trivial Signed-off-by: Chris Horn <chris.horn@hpe.com> Change-Id: I2e0948ae4fe7c5dfb86e297a3437213f193bf67c Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53029 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com> Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-17271 kfilnd: Protect RKEY for bulk Put/Get The initiator of a bulk Put/Get generates an RKEY based on the values of the struct kfilnd_tn::tn_mr_key and struct kfilnd_peer::kp_local_session_key. kp_local_session_key is assigned at peer creation, and tn_mr_key is assigned when the kfilnd_tn is allocated. A bulk Put/Get can fail in various ways such that the target of the operation may have a reference to the RKEY, but the originator cannot know the state of the operation at the target. In these cases, the initiator must ensure that the RKEY is not re-used. To accomplish this, we need to delete the target peer from the originator's peer cache to ensure that subsequent bulk Put/Get operations will use a new kp_local_session_key, and thus avoid re-using any old RKEY values. HPE-bug-id: LUS-11972 Test-Parameters: trivial Signed-off-by: Chris Horn <chris.horn@hpe.com> Change-Id: If270a2df745ee88c35addc8194cdb160cb373c3e Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53028 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com> Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
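Why deleting the peer prevents RKEY re-use can be shown with a toy model. This is a sketch under loud assumptions: the bit layout in sketch_rkey(), the 16-bit key widths, and the counter-based session key are all hypothetical, since the commit message only says the RKEY is based on kp_local_session_key and tn_mr_key.

```c
#include <stdint.h>

/* Hypothetical layout: session key in the high bits, MR key in the low 16.
 * The actual kfilnd encoding is not described in the commit message. */
uint32_t sketch_rkey(uint16_t session_key, uint16_t mr_key)
{
	return ((uint32_t)session_key << 16) | mr_key;
}

struct sketch_peer {
	uint16_t local_session_key; /* assigned once, at peer creation */
};

static uint16_t sketch_next_session_key = 1;

/* Stand-in for peer creation: every new peer gets a fresh session key. */
struct sketch_peer sketch_peer_create(void)
{
	struct sketch_peer peer = { sketch_next_session_key++ };

	return peer;
}
```

In this model, dropping the cached peer and creating a new one changes the session key, so a later transaction that happens to receive the same MR key still produces a different RKEY.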
LU-17270 kfilnd: Check status of TAG_RX_OK in WAIT_COMP When the target of a bulk Get/Put drops the message it sends ENODATA back to the initiator via immediate data. This status needs to be accounted for while the transaction is in the TN_STATE_WAIT_COMP state, otherwise it can be lost if the TN_EVENT_TAG_RX_OK event arrives before the TN_EVENT_TX_OK event. HPE-bug-id: LUS-11971 Test-Parameters: trivial Signed-off-by: Chris Horn <chris.horn@hpe.com> Change-Id: I52d6ea52746cbc14a86478fcccb32b25badd3b0a Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53027 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com> Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-16967 build: Separate lnet LND deb packaging Enable separate packaging of lnet lnd kernel modules into separate packages with build profile multiple-lnds: lustre-lnet-module-socklnd for socklnd.ko lustre-lnet-module-gnilnd for kgnilnd.ko, profile gnilnd lustre-lnet-module-kfilnd for kkfilnd.ko, profile kfilnd lustre-lnet-module-o2iblnd for o2iblnd.ko, profile ext_o2ib lustre-lnet-module-in-kernel-o2iblnd for ko2iblnd.ko, profile int_o2ib Test-Parameters: trivial HPE-bug-id: LUS-11711 Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com> Change-Id: I3a5ca03fa410238f66083289db0899c8b4bfab5c Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52397 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com> Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com> Reviewed-by: Frank Sehr <fsehr@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-16314 lnet: Migrate LASSERTF %p to %px This change covers libcfs and lnet and converts LASSERTF statements to explicitly use %px. Use %px to explicitly report the non-hashed pointer value in messages printed when a kernel panic is imminent. When analyzing a crash dump the associated kernel address can be used to determine the system state that led to the system crash. Because crash dumps can be, and are, provided by customers from production systems, using the kernel command line parameter no_hash_pointers is not always possible. Ref: Documentation/core-api/printk-formats.rst Test-Parameters: trivial HPE-bug-id: LUS-10945 Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com> Change-Id: I4d0c956e1b914cea9517b632d46f1714bcd43a85 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51231 Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com> Reviewed-by: James Simmons <jsimmons@infradead.org> Reviewed-by: Oleg Drokin <green@whamcloud.com> Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com>
LU-17189 o2ib: assign tx_gpu properly tx_gpu is not assigned or initialized properly. Test-Parameters: trivial Fixes: f792297212 ("LU-16211 o2iblnd: Avoid NULL md deref") Signed-off-by: Jinshan Xiong <jinshan.xiong@gmail.com> Change-Id: I5e14d66f41f6194203fec7832493efd432b54c36 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52702 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-17000 lnet: Fix dereference after NULL under ksocknal_recv_hello This patch fixes 'conn->ksnc_proto', which was dereferenced in ksocknal_recv_hello() even though it could be NULL. This patch also replaces the early 'return' statements in the function with 'goto', allowing exit from a single place. CoverityID: 410244 ("Dereference after null check") Test-Parameters: trivial testlist=sanity-lnet Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com> Fixes: cb5f92c0e (LU-10391 ksocklnd: use ksocknal_protocol v4 for IPv6) Change-Id: I95196d481b537281ab8643f1ee6162db450bef20 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53305 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: James Simmons <jsimmons@infradead.org> Reviewed-by: Chris Horn <chris.horn@hpe.com> Reviewed-by: Frank Sehr <fsehr@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
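The pattern used by the fix above (check the pointer before dereferencing it, and leave through one exit label) looks roughly like the following. This is a generic sketch, not the ksocknal_recv_hello() code: the struct names, fields, and error codes are hypothetical.

```c
#include <errno.h>
#include <stddef.h>

struct sketch_proto {
	int version;
};

struct sketch_conn {
	const struct sketch_proto *proto; /* may legitimately be NULL */
	int cleanups;                     /* counts passes through the exit path */
};

/* Validate a hello against the connection's protocol, single exit point. */
int sketch_recv_hello(struct sketch_conn *conn, int hello_version)
{
	int rc = 0;

	/* Check before dereferencing: the protocol may not be set yet. */
	if (conn->proto == NULL) {
		rc = -EINVAL;
		goto out;
	}

	if (conn->proto->version != hello_version) {
		rc = -EPROTO;
		goto out;
	}

out:
	/* Shared cleanup runs on every path, success or failure. */
	conn->cleanups++;
	return rc;
}
```

Routing every path through the same label means cleanup added later cannot be skipped by an early return.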
LU-17325 o2iblnd: CM_EVENT_UNREACHABLE on established conn There were examples in the field with RoCE setups which demonstrate that CM_EVENT_UNREACHABLE may be received when the connection is already in ESTABLISHED state. This causes an assert in kiblnd_cm_callback to fail. Handle this in a more graceful manner: report the event as unexpected and allow the flow to continue. If there are indeed issues on the connection, it is expected to report transaction errors later and get cleaned up without crashing the whole system. Test-Parameters: trivial testlist=sanity-lnet Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com> Change-Id: If32166fe9fc59e025609c2035cb1c03d3bed22f2 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53298 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Chris Horn <chris.horn@hpe.com> Reviewed-by: Frank Sehr <fsehr@whamcloud.com> Reviewed-by: Cyril Bordage <cbordage@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-16967 build: Add in-kernel-ko2iblnd driver Add in-kernel-ko2iblnd.ko for users of in-kernel OFED and only build ko2iblnd.ko if an external OFED is available. This allows for building and packaging both an external (MOFED or OPA) o2ib driver and an in-kernel o2ib driver. Packaging rules will be written so that only one of the o2iblnd drivers can be installed. In the case of the in-kernel-ko2iblnd.ko driver a symlink named ko2iblnd.ko will be created to point to the in-kernel based o2ib driver which allows for a reasonable migration path for the majority of users. It is useful for dist build and test to be able to build both in-kernel IB and external OFED in the same build. This also means there would be some install/configure adjustments that ought to have some discussion. Test-Parameters: trivial HPE-bug-id: LUS-11711 Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com> Change-Id: I8105fad0b20c36705d7e14e3ae976bf3d81e9f1b Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51915 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Frank Sehr <fsehr@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-17230 socklnd: treat UNKNOWN netif operstate as UP "UNKNOWN" (IF_OPER_UNKNOWN) operational state doesn't necessarily mean that the interface can't be used and may be the result of particular network driver not providing UP/DOWN states, so it may be incorrect for socklnd to initiate setting of a "fatal error" flag on a NI using an interface in "UNKNOWN" operstate. Test-Parameters: trivial testlist=sanity-lnet Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com> Change-Id: I39dfa01f3758809440d50cf8b6b11555889ef366 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52842 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Chris Horn <chris.horn@hpe.com> Reviewed-by: Oleg Drokin <green@whamcloud.com> Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
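The operstate policy described in the commit above amounts to a one-line predicate. The sketch below is illustrative: the enum redefines the RFC 2863 operational states (which the Linux IF_OPER_* constants mirror) so that it is self-contained, and sketch_link_usable() is a hypothetical name rather than a socklnd function.

```c
#include <stdbool.h>

/* Operational states in RFC 2863 order, matching the Linux IF_OPER_*
 * values; redefined locally so the sketch needs no kernel headers. */
enum sketch_operstate {
	SKETCH_OPER_UNKNOWN = 0,
	SKETCH_OPER_NOTPRESENT,
	SKETCH_OPER_DOWN,
	SKETCH_OPER_LOWERLAYERDOWN,
	SKETCH_OPER_TESTING,
	SKETCH_OPER_DORMANT,
	SKETCH_OPER_UP,
};

/* UNKNOWN is treated like UP: some drivers never report UP/DOWN at all,
 * so UNKNOWN alone is not evidence that the interface is unusable. */
bool sketch_link_usable(enum sketch_operstate state)
{
	return state == SKETCH_OPER_UP || state == SKETCH_OPER_UNKNOWN;
}
```

Only states that positively indicate a problem, such as DOWN, would then lead to the fatal error flag being set on the NI.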
LU-10391 socklnd: handle IPv6 for zero copy messages When messages exceed a certain size, zero copy messages are created. To support zero copy messages we need to add KSOCK_PROTO_V4 support. This resolves the error: LNetError: 5978:0:(socklnd_cb.c:1237:ksocknal_process_receive()) 12345-2601:8c1:c180:2000::36b6@tcp: Unknown ZC-ACK cookie: 0, 272 Test-Parameters: trivial testlist=sanity-lnet Change-Id: I4bc3d03cc5157a0f6ddb1e36ddeac225ed5d0984 Signed-off-by: James Simmons <jsimmons@infradead.org> Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53150 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com> Reviewed-by: Chris Horn <chris.horn@hpe.com> Reviewed-by: Neil Brown <neilb@suse.de> Reviewed-by: Nathaniel Clark <nclark@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-9859 libcfs: refactor libcfs initialization. Many lustre modules depend on libcfs having initialized properly, but do not explicitly check that it did. When lustre is built as discrete modules, this does not cause a problem because if the libcfs module fails initialization, the other modules don't even get loaded. When lustre is compiled into the kernel, all module_init() routines get run, so they need to check the required initialization succeeded. This patch splits out the initialization of libcfs into a new libcfs_setup(), and has all modules call that. The misc_register() call is kept separate as it does not allocate any resources and if it fails, it fails hard - no point in retrying. Other set-up allocates resources and so is best delayed until they are needed, and can be worth retrying. Ideally, the initialization would happen at mount time (or similar) rather than at load time. Doing this requires each module to check dependencies when they are activated rather than when they are loaded. Achieving that is a much larger job that would have to progress in stages. For now, this change ensures that if some initialization in libcfs fails, other modules will fail-safe. Linux-commit: 64bf0b1a079d61e9e059b9dc7a58e064c7d994ae Change-Id: I6b5ecdba0defc6e033f78d8fc2b9be9e26c7f720 Signed-off-by: Mr. NeilBrown <neilb@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52700 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Timothy Day <timday@amazon.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
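The setup split described in the commit above can be sketched as an idempotent, retryable setup function that each dependent module calls and checks. This is a simplified userspace illustration, not the libcfs code: all sketch_* names are hypothetical, the failure hook exists only to exercise the retry path, and the locking a real kernel implementation would need is omitted.

```c
#include <errno.h>

static int sketch_setup_done; /* set once setup has succeeded */
int sketch_fail_next_alloc;   /* hook for the example: force one failure */

/* Stand-in for the resource allocation done during setup. */
static int sketch_alloc_resources(void)
{
	if (sketch_fail_next_alloc) {
		sketch_fail_next_alloc = 0;
		return -ENOMEM;
	}
	return 0;
}

/* Idempotent: returns at once if already done, may be retried on failure. */
int sketch_libcfs_setup(void)
{
	int rc;

	if (sketch_setup_done)
		return 0;

	rc = sketch_alloc_resources();
	if (rc < 0)
		return rc; /* state unchanged, so a later call can retry */

	sketch_setup_done = 1;
	return 0;
}

/* A dependent module checks the result instead of assuming success. */
int sketch_dependent_module_init(void)
{
	int rc = sketch_libcfs_setup();

	if (rc < 0)
		return rc;

	/* module-specific initialization would follow here */
	return 0;
}
```

Because a failed setup leaves no state behind, a module built into the kernel fails safely on the first attempt and can succeed on a later one.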
LU-10391 lnet: support setting LND timeouts The patch that added support for NI setup with Netlink was developed before individual LND timeout settings support was merged. Add these missing settings. For ksocklnd we already supported conns_per_peer so rearrange the code into a switch statement. Test-Parameters: trivial testlist=sanity-lnet Fixes: 8f8f6e2f36e ("LU-10003 lnet: use Netlink to support old and new NI APIs.") Change-Id: Iba955da7f5fa78b8a624bab6af66b577c75917e0 Signed-off-by: James Simmons <jsimmons@infradead.org> Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53013 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Chris Horn <chris.horn@hpe.com> Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-17259 lnet: kgnilnd_nl_get should return 0 on success Fix build failure error: control reaches end of non-void function [-Werror=return-type] Test-Parameters: trivial Fixes: d15bfca078 ("LU-10391 lnet: migrate full LNet NI information collection") Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com> Change-Id: I09dd76c46620107d6c3f89cf59b9d9190578ef60 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52972 Reviewed-by: Oleg Drokin <green@whamcloud.com> Reviewed-by: James Simmons <jsimmons@infradead.org> Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com> Reviewed-by: Chris Horn <chris.horn@hpe.com> Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com>
LU-17258 socklnd: ensure connection type established upon race When a connection race is hit between two peers, only increment the retry count if a connection of the specific type has already been established; otherwise, this can lead to an unexpected value set in ksnr_connected and some of the assertions being triggered in ksocknal_connect(): "ASSERTION( (wanted & ((((1UL))) << (3))) != 0 ) failed" Fixes: da893c6c97 ("LU-16191 socklnd: limit retries on conns_per_peer mismatch") HPE-bug-id: LUS-11922 Signed-off-by: Chris Horn <chris.horn@hpe.com> Signed-off-by: Nikitas Angelinas <nikitas.angelinas@hpe.com> Change-Id: I6e8abb39ad3c0bcd7fbc8f8c5478c903029df908 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52957 Reviewed-by: James Simmons <jsimmons@infradead.org> Reviewed-by: Oleg Drokin <green@whamcloud.com> Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Tested-by: jenkins <devops@whamcloud.com>
LU-17242 debug: remove CFS_CHECK_STACK CFS_CHECK_STACK is primitive, doesn't work on x86_64, and only dumps a stack in kernel log when we are fairly close to passing the stack limit anyway. Admins and developers can grab the same info from debug/tracing/stack_trace and debug/tracing/stack_max_size on a live system. And the kernel will dump a stack if it 'Oops' from going over the stack limit. We don't need an additional Lustre specific stack checking mechanism. Test-Parameters: trivial Signed-off-by: Timothy Day <timday@amazon.com> Change-Id: Icc7c82f6a0dcd727de6ce2c2d40ba071ee349c0c Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52883 Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: James Simmons <jsimmons@infradead.org> Reviewed-by: Oleg Drokin <green@whamcloud.com> Reviewed-by: Neil Brown <neilb@suse.de> Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com>