LU-9859 lnet: move CPT handling to LNet The CPT work is used for LNet and ptlrpc which is the Lustre LNet interface. Move this work there and merge the lib-mem.c code as well since they both work closely together. Move cpt debugfs handling from libcfs to lnet. Now all remaining debugfs in libcfs is for debugging. Test-Parameters: trivial Change-Id: I016a90520bd7c6428b45bafff8618bc864e9112b Signed-off-by: James Simmons <jsimmons@infradead.org> Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52923 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Timothy Day <timday@amazon.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-9680 lnet: add NLM_F_DUMP_FILTERED support In addition to different API levels for the netlink packets we can also filter the data sent back when user land sends the NLM_F_DUMP_FILTERED flag. Support this across the various netlink dumpit functions. This work is needed for proper support of the lnetctl export command. Update the export to work with the Netlink API. This results in proper IPv6 support for the export command. Test-Parameters: trivial testlist=sanity-lnet Change-Id: I0e8993b1f9a08199f282965601781aa6fd0e4844 Signed-off-by: James Simmons <jsimmons@infradead.org> Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53004 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Frank Sehr <fsehr@whamcloud.com> Reviewed-by: Chris Horn <chris.horn@hpe.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-14518 libcfs: print CFS_FAIL_CHECK() location Print the file/function/line where cfs_fail_loc is hit. This allows better debugging of issues with this code. This adds the CDEBUG_LIMIT_LOC() macro to allow printing the location passed to the caller instead of the function, file, and line number where the macro is located. Signed-off-by: Andreas Dilger <adilger@whamcloud.com> Change-Id: Ieadace61b014d3576c0535f181256c728c7ec6f8 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53451 Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com> Reviewed-by: Timothy Day <timday@amazon.com> Reviewed-by: Oleg Drokin <green@whamcloud.com> Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com>
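The idea of reporting the caller's location instead of the helper's can be sketched in plain user-space C. This is only an illustration of the pattern, not the libcfs code: `struct call_loc`, `HERE()`, `report_at()`, `fail_check_at()` and `FAIL_CHECK()` are hypothetical names, and the real CDEBUG_LIMIT_LOC() macro may be structured differently.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record of a call site; not a libcfs structure. */
struct call_loc {
	const char *file;
	const char *func;
	int line;
};

/* Capture the location of whoever expands this macro. */
#define HERE() ((struct call_loc){ __FILE__, __func__, __LINE__ })

/*
 * Format a message tagged with the caller-supplied location rather than
 * the location of this helper. Returns what snprintf() returns.
 */
static int report_at(char *buf, size_t len, struct call_loc loc,
		     const char *msg)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%s:%d:%s(): %s",
			loc.file, loc.line, loc.func, msg);
}

/* Hypothetical fail check: report the caller's location when id matches. */
static int fail_check_at(char *buf, size_t len, struct call_loc loc,
			 unsigned int fail_loc, unsigned int id)
{
	if (fail_loc != id)
		return 0;
	report_at(buf, len, loc, "fail_loc hit");
	return 1;
}

/* The macro is what records the location, so it must wrap the helper. */
#define FAIL_CHECK(buf, len, fail_loc, id) \
	fail_check_at(buf, len, HERE(), fail_loc, id)
```

Because `HERE()` is expanded inside `FAIL_CHECK()`, the file, function, and line in the message belong to the code that wrote `FAIL_CHECK(...)`, not to `fail_check_at()`.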
LU-17081 build: compatibility for 6.5 kernels Linux commit v6.4-rc2-29-gc6585011bc1d splice: Remove generic_file_splice_read() Prefer filemap_splice_read and provide alternates for older kernels. Linux commit v6.4-rc2-30-g3fc40265ae2b iov_iter: Kill ITER_PIPE ITER_PIPE and iov_iter_is_pipe() are removed, provide a replacement for iov_iter_is_pipe Linux commit v6.4-rc4-53-g54d020692b34 mm/gup: remove unused vmas parameter from get_user_pages() Use vma_lookup() to acquire the vma following get_user_pages() Linux commit v6.4-rc7-1884-gdc97391e6610 sock: Remove ->sendpage*() in favour of sendmsg(MSG_SPLICE_PAGES) Use sendmsg when MSG_SPLICE_PAGES is defined. Provide a wrapper using sendpage() for older kernels. HPE-bug-id: LUS-11811 Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com> Change-Id: I95a0954a602c8db08d30b38a50dcd50107c8f268 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52258 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Jian Yu <yujian@whamcloud.com> Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com> Reviewed-by: xinliang <xinliang.liu@linaro.org> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-17418 libcfs: support debug setup for libcfs modules Work was landed to make Lustre ensure key libcfs components were initialized for both a module build and a build directly into the kernel. This change resulted in a defect that allows you to crash a node when you only load libcfs.ko and run a user land tool to set a debugfs setting of libcfs. The debug handling is critical to load before anything. Update Lustre to handle both a module and builtin setup for Lustre. When lustre is built into the kernel we can't control if libcfs_init() is called first so have libcfs_setup() handle setting up the debug handling. When built as a module have libcfs_init() set up the debug handling instead. For both cases libcfs_debug_init() is always called so make sure we initialize it only once. Add a test to validate this fix. Fixes: f3494a6e9 ("LU-9859 libcfs: refactor libcfs initialization.") Test-Parameters: trivial testlist=conf-sanity env=ONLY="5j" Change-Id: If4a229e43b9e06a723546c03eb2b787ba0b16f5a Signed-off-by: James Simmons <jsimmons@infradead.org> Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53825 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Neil Brown <neilb@suse.de> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
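The "initialize only once, whichever entry point runs first" requirement can be sketched in user-space C as below. This is a minimal illustration, not the libcfs implementation: `debug_init_once()`, `do_debug_setup()` and the counters are hypothetical names, and a pthread mutex stands in for whatever locking the kernel code uses.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t debug_init_lock = PTHREAD_MUTEX_INITIALIZER;
static int debug_initialized;	/* nonzero once setup has succeeded */
static int debug_setup_count;	/* successful setups, kept for illustration */

/* Hypothetical stand-in for the real debug setup work. */
static int do_debug_setup(void)
{
	return 0;
}

/*
 * Safe to call from several entry points (module init or a later setup
 * hook): only the first successful call does the work, and a failed
 * attempt may be retried by a later caller.
 */
int debug_init_once(void)
{
	int rc = 0;

	pthread_mutex_lock(&debug_init_lock);
	if (!debug_initialized) {
		rc = do_debug_setup();
		if (rc == 0) {
			debug_initialized = 1;
			debug_setup_count++;
		}
	}
	/* The guard guarantees at most one successful setup. */
	assert(debug_setup_count <= 1);
	pthread_mutex_unlock(&debug_init_lock);
	return rc;
}
```

Calling `debug_init_once()` from both the init path and the setup path is then harmless: the second call returns 0 without repeating the setup.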
LU-17495 build: cleanup configure messages Convert some remaining configure checks to use LB2_MSG_LINUX_TEST_RESULT Also drop the undefined macro LC_CONFIG_HEALTH_CHECK_WRITE Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com> Change-Id: If0ae4f7549d5e1a46d6a5ce99d40ebcbd76c5e85 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53874 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: James Simmons <jsimmons@infradead.org> Reviewed-by: Timothy Day <timday@amazon.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-16314 debug: Enable optional unhashed pointers This patch takes a page out of the kernel trace debug playbook to rewrite format strings and change %p -> %px on-the-fly when: libcfs_debug_raw_pointers is enabled. The module parameter can be viewed and modified by root via lctl: lctl get_param debug_raw_pointers lctl set_param debug_raw_pointers=1 Since nothing uses the return value from libcfs_debug_msg change it to void. Use percpu pre-allocated buffers for holding modified format strings to avoid kmalloc/kfree as well as avoid bloating stack usage. HPE-bug-id: LUS-10945 Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com> Change-Id: I63d90d614ce4435b07f5e84991a12ae7351ac2bb Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51877 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
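The format-string rewrite described above can be sketched in user-space C. This is an assumption-laden illustration, not the libcfs code: `rewrite_raw_pointers()` is a hypothetical name, the caller supplies the output buffer (the patch uses per-CPU preallocated buffers), and the rule used here is that only a plain `%p` is rewritten while extended specifiers such as `%pK` are left alone.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy fmt into out, turning each plain "%p" into "%px".
 * Extended specifiers such as "%pK" or "%pI4" and the literal "%%"
 * are left untouched. Returns 0 on success, -1 if out is too small.
 */
int rewrite_raw_pointers(const char *fmt, char *out, size_t outlen)
{
	size_t o = 0;
	int in_spec = 0;	/* nonzero while between '%' and its conversion */

	assert(fmt != NULL && out != NULL);

	for (; *fmt; fmt++) {
		char c = *fmt;

		/* Each step writes at most two chars; keep room for the NUL. */
		if (o + 3 > outlen)
			return -1;

		if (!in_spec) {
			out[o++] = c;
			if (c == '%') {
				if (fmt[1] == '%')
					out[o++] = *++fmt;	/* literal "%%" */
				else
					in_spec = 1;
			}
			continue;
		}

		/* Flags, width and precision are copied through unchanged. */
		if (strchr("-+ #0123456789.*", c)) {
			out[o++] = c;
			continue;
		}

		in_spec = 0;
		out[o++] = c;
		/* A 'p' followed by a letter or digit is an extended form. */
		if (c == 'p' && !isalnum((unsigned char)fmt[1]))
			out[o++] = 'x';
	}
	out[o] = '\0';
	return 0;
}
```

With this sketch, `"obj %p rc %d"` becomes `"obj %px rc %d"`, while `"%pK"` and `"100%%"` pass through unchanged.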
LU-16314 lnet: Migrate LASSERTF %p to %px This change covers libcfs and lnet and converts LASSERTF statements to explicitly use %px. Use %px to explicitly report the non-hashed pointer value in messages printed when a kernel panic is imminent. When analyzing a crash dump the associated kernel address can be used to determine the system state that led to the system crash. As crash dumps can be and are provided by customers from production systems, the use of the kernel command line parameter no_hash_pointers is not always possible. Ref: Documentation/core-api/printk-formats.rst Test-Parameters: trivial HPE-bug-id: LUS-10945 Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com> Change-Id: I4d0c956e1b914cea9517b632d46f1714bcd43a85 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51231 Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com> Reviewed-by: James Simmons <jsimmons@infradead.org> Reviewed-by: Oleg Drokin <green@whamcloud.com> Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com>
LU-17242 debug: use dump_stack() where possible In some cases, libcfs_debug_dumpstack() can fail to output a stack trace - either because the needed symbols are not exported or those symbols can't be resolved at runtime. This seems to occur more often with newer kernels. The message appears only as: Lustre: ldlm_cb01_002: service thread pid 57876 was inactive for 40.494 seconds. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Pid: 57876, comm: ldlm_cb01_002 6.1.70 #1 SMP PREEMPT_DYNAMIC Thu Jan 4 18:52:41 UTC 2024 Call Trace TBD: with no stack trace (seen on CentOS 8.5 with ml 6.1.70). For reference, the runtime symbol lookup was added and updated in: b49ce7a ("LU-12400 libcfs: save_stack_trace_tsk if ARCH_STACKWALK") 58ac9d3 ("LU-14099 build: Fix for unconfigured arch_stackwalk") First, add a message when the symbol can't be resolved correctly. This makes it much easier to understand why the stack trace is missing. Second, replace libcfs_debug_dumpstack(NULL) with dump_stack(). When the task_struct is NULL, libcfs uses the current task_struct. This replicates the functionality of dump_stack(). Using dump_stack() is more reliable, more in line with kernel style, and not likely to be un-exported in the future. Finally, in lustre/osc/osc_object.c the stack isn't dumped since there is already an LBUG(). There only remains one user of libcfs_debug_dumpstack() which uses a task_struct other than current. This can be cleaned up in a future patch. Test-Parameters: trivial Signed-off-by: Timothy Day <timday@amazon.com> Change-Id: I196c1da7e39b1a694c0cb67ecfaab58ab3e4662c Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53625 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: James Simmons <jsimmons@infradead.org> Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-16796 ldlm: Change struct ldlm_resource to use refcount_t This patch changes struct ldlm_resource and struct nrs_tbf_client to use refcount_t instead of atomic_t This patch also only changes spaces to tabs which were close to lines of code being changed. Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com> Change-Id: Ic15f27bc6281725f00bddc465668f81291aad6ec Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53416 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Timothy Day <timday@amazon.com> Reviewed-by: James Simmons <jsimmons@infradead.org> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-17173 gss: user keys go to user keyring Keys for root, that are used for Lustre internal processing, are stored in the session keyring. That way they can be found by all Lustre processes in userspace and in the kernel. For end user keys, it is better to store them in the user keyring. This simplifies key management, makes them shared across all user sessions, and avoids an unfortunate key leak if lfs flushctx is not called at user logout. Test-Parameters: kerberos=true testlist=sanity-krb5 Signed-off-by: Sebastien Buisson <sbuisson@ddn.com> Change-Id: Ibb3d326e89dcacc89e77eca76cdb773861d3a8a7 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52771 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Aurelien Degremont <adegremont@nvidia.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-17394 libcfs: print cfs_fail_val when fail_loc hit Add some more information to the console message when fail_loc is hit. Signed-off-by: Andreas Dilger <adilger@whamcloud.com> Change-Id: I99fe4524f3764b068c96965c0b86bd4d7b341707 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53585 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: James Simmons <jsimmons@infradead.org> Reviewed-by: Oleg Drokin <green@whamcloud.com> Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
LU-17186 utils: replace gethostby*() with get*info() This patch replaces the deprecated gethostbyname() and gethostbyaddr() functions with getaddrinfo() and getnameinfo() functions respectively. The getaddrinfo() function combines the functionality provided by the gethostbyname() and getservbyname() functions into a single interface, but unlike the latter functions, getaddrinfo() is reentrant and allows programs to eliminate IPv4-versus-IPv6 dependencies. The getnameinfo() function is the inverse of getaddrinfo(): it converts a socket address to a corresponding host and service, in a protocol-independent manner. It combines the functionality of gethostbyaddr() and getservbyport(), but unlike those functions, getnameinfo() is reentrant and allows programs to eliminate IPv4-versus-IPv6 dependencies. Test-Parameters: kerberos=true testlist=sanity-krb5 Test-Parameters: testgroup=review-dne-selinux-ssk-part-2 Signed-off-by: Jian Yu <yujian@whamcloud.com> Signed-off-by: Sebastien Buisson <sbuisson@ddn.com> Change-Id: Iacb5583826cd2f7329455bc6cbb4477f9087f15a Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52632 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: James Simmons <jsimmons@infradead.org> Reviewed-by: Oleg Drokin <green@whamcloud.com>
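A minimal sketch of the replacement pattern, using only the standard getaddrinfo(), getnameinfo() and freeaddrinfo() calls. The helper name `first_address()` is hypothetical and is not taken from the Lustre utilities; it simply shows one protocol-independent lookup followed by the inverse conversion.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L	/* expose getaddrinfo() under strict -std= */
#endif

#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Resolve host (a name or a numeric address, IPv4 or IPv6) and write the
 * numeric form of the first result into buf.
 * Returns 0 on success or an EAI_* error code from the resolver calls.
 */
int first_address(const char *host, char *buf, size_t buflen)
{
	struct addrinfo hints, *res = NULL;
	int rc;

	assert(host != NULL && buf != NULL);

	memset(&hints, 0, sizeof(hints));
	hints.ai_family = AF_UNSPEC;		/* accept IPv4 or IPv6 */
	hints.ai_socktype = SOCK_STREAM;

	rc = getaddrinfo(host, NULL, &hints, &res);
	if (rc != 0)
		return rc;

	/* Inverse step: socket address back to a printable host string. */
	rc = getnameinfo(res->ai_addr, res->ai_addrlen, buf, buflen,
			 NULL, 0, NI_NUMERICHOST);
	freeaddrinfo(res);
	return rc;
}
```

Unlike gethostbyname(), the result list is allocated per call and released with freeaddrinfo(), so the helper keeps no static state; errors can be turned into text with gai_strerror().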
LU-16502 lutf: add proper config option and fix bugs LUTF did not have a proper configuration option. Since no message was printed at configure time, this made it hard to debug why LUTF was not being built. Fix a few minor bugs in headers that prevented shared libraries from being `import`ed by python. Fix a small Clang error in liblutf_agent.c. Test-Parameters: @lnet Signed-off-by: Timothy Day <timday@amazon.com> Change-Id: I6680b203bef08b7afa326a1cbe30c96b5c29e95c Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53200 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com> Reviewed-by: Frank Sehr <fsehr@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-10973 lutf: Fix order of linking for python modules LUTF normally works but in some test cases at startup we got: ImportError: lustre/test/lutrf/src/_lnetconfig.so: undefined symbol: lustre_lnet_del_ni If you check, the symbol is there. The issue is the linking order. We need to put the generated module name before all its dependencies. Also, remove cfs_expr_list_match from string.h, move the definition to nidstrings.c, and make it static. Test-Parameters: @lnet Change-Id: Ia57fbd9d5795d845ea14bc1416f968383afcba2b Signed-off-by: James Simmons <jsimmons@infradead.org> Signed-off-by: Timothy Day <timday@amazon.com> Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/46478 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com> Reviewed-by: Frank Sehr <fsehr@whamcloud.com> Reviewed-by: Cyril Bordage <cbordage@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-17174 misc: fix hash functions 1) The LU-16518 landing introduced a bug that is visible with a debug kernel: UBSAN: Undefined behaviour in include/linux/hash.h:81:31 shift exponent 64 is too large for 64-bit type 'long long unsigned int' Call Trace: dump_stack+0x8e/0xd0 ubsan_epilogue+0x5/0x21 ldlm_export_lock_hash+0x49/0x4d [ptlrpc] cfs_hash_bd_from_key+0x88/0x2e0 [libcfs] 2) Use the high bits instead of the low bits, as this is more accurate. HPE-bug-id: LUS-11925 Fixes: 239e8268 (LU-16518 misc: use fixed hash code) Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com> Change-Id: Ie1c531ad220f44e55fbf80674a49472fb6024252 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52611 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: James Simmons <jsimmons@infradead.org> Reviewed-by: Oleg Drokin <green@whamcloud.com> Reviewed-by: Timothy Day <timday@amazon.com>
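The UBSAN report is consistent with a multiplicative hash that shifts right by (64 - bits) being called with bits equal to 0. The sketch below shows that shape in user-space C with a guard for the zero case. It assumes this is the failing path; the commit message does not say how the actual patch avoids the shift, and `hash64_high()` is a hypothetical name. The multiplier is the 64-bit golden ratio constant used by the kernel's hash.h.

```c
#include <assert.h>
#include <stdint.h>

#define GOLDEN_RATIO_64 0x61C8864680B583EBull

/*
 * Multiplicative hash that keeps the top `bits` bits of the product.
 * Shifting a 64-bit value by 64 is undefined behaviour in C, so the
 * bits == 0 case is handled before the shift.
 */
static inline uint64_t hash64_high(uint64_t val, unsigned int bits)
{
	assert(bits <= 64);
	if (bits == 0)
		return 0;
	return (val * GOLDEN_RATIO_64) >> (64 - bits);
}
```

Taking the high bits of the product, rather than masking the low bits, is the usual choice for this kind of hash because the upper bits depend on every bit of the input.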
LU-9859 libcfs: refactor libcfs initialization. Many lustre modules depend on libcfs having initialized properly, but do not explicitly check that it did. When lustre is built as discrete modules, this does not cause a problem because if the libcfs module fails initialization, the other modules don't even get loaded. When lustre is compiled into the kernel, all module_init() routines get run, so they need to check the required initialization succeeded. This patch splits out the initialization of libcfs into a new libcfs_setup(), and has all modules call that. The misc_register() call is kept separate as it does not allocate any resources and if it fails, it fails hard - no point in retrying. Other set-up allocates resources and so is best delayed until they are needed, and can be worth retrying. Ideally, the initialization would happen at mount time (or similar) rather than at load time. Doing this requires each module to check dependencies when they are activated rather than when they are loaded. Achieving that is a much larger job that would have to progress in stages. For now, this change ensures that if some initialization in libcfs fails, other modules will fail-safe. Linux-commit: 64bf0b1a079d61e9e059b9dc7a58e064c7d994ae Change-Id: I6b5ecdba0defc6e033f78d8fc2b9be9e26c7f720 Signed-off-by: Mr. NeilBrown <neilb@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52700 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Timothy Day <timday@amazon.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-17242 debug: remove CFS_CHECK_STACK CFS_CHECK_STACK is primitive, doesn't work on x86_64, and only dumps a stack in kernel log when we are fairly close to passing the stack limit anyway. Admins and developers can grab the same info from debug/tracing/stack_trace and debug/tracing/stack_max_size on a live system. And the kernel will dump a stack if it 'Oops' from going over the stack limit. We don't need an additional Lustre specific stack checking mechanism. Test-Parameters: trivial Signed-off-by: Timothy Day <timday@amazon.com> Change-Id: Icc7c82f6a0dcd727de6ce2c2d40ba071ee349c0c Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52883 Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: James Simmons <jsimmons@infradead.org> Reviewed-by: Oleg Drokin <green@whamcloud.com> Reviewed-by: Neil Brown <neilb@suse.de> Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com>
LU-17203 libcfs: ignore remaining items Remove the assertion that checks the libcfs hashtable for emptiness in cfs_hash_for_each_empty(). The only user of this hashtable is the per-export set of ldlm locks. In this case it is legal for some locks to remain in the hashtable because they are still in the process of being enqueued. The hashtable is destroyed from the export destroy function, which in turn is called only when all RPCs on this export are done (exp_rpc_count==0). Fixes: 306a9b666e ("LU-16272 libcfs: cfs_hash_for_each_empty optimization") Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com> Change-Id: I2b853b017bb7247a0c60cc8f464c2e08d649f0eb Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52726 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com> Reviewed-by: Oleg Drokin <green@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
LU-9859 libcfs: migrate libcfs_mem.c to lnet/lib-mem.c Move the libcfs_mem.c code to the LNet core. The prototypes are declared in libcfs_cpu.h but we don't move them yet since the CPT code depends on the libcfs_mem.c work. This can end up in a modular cyclic dependency if we move the CPT work right away so limit what is changed at this point. Test-Parameters: trivial Change-Id: I6bf5cd9f20033f988dde1989f0fc5f89ea74b5a2 Signed-off-by: James Simmons <jsimmons@infradead.org> Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52701 Reviewed-by: Timothy Day <timday@amazon.com> Reviewed-by: Oleg Drokin <green@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com>