LU-17022 obdclass: convert obd_conn_inprogress to atomic_t Using atomic_t for obd_conn_inprogress means we don't need to take a spinlock. Also send wakeup when value reaches zero, and wait for the wakeup instead of using a yield() loop. Change-Id: I9af29e068203cde951e592c408906d121702fa18 Signed-off-by: Mr NeilBrown <neilb@suse.de> Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51906 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Timothy Day <timday@amazon.com> Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com> Reviewed-by: James Simmons <jsimmons@infradead.org> Reviewed-by: Oleg Drokin <green@whamcloud.com>
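The counting pattern described in LU-17022 can be sketched in user space as follows. This is only an illustration, assuming C11 atomics and a pthread condition variable in place of the kernel primitives; struct conn_tracker and its helper functions are hypothetical names, not code from the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical user-space stand-in for a device with in-flight connects. */
struct conn_tracker {
	atomic_int      inprogress; /* number of connects in flight */
	pthread_mutex_t lock;       /* protects only the wait/wake handshake */
	pthread_cond_t  idle;       /* signalled when inprogress reaches zero */
};

static void conn_tracker_init(struct conn_tracker *t)
{
	atomic_init(&t->inprogress, 0);
	pthread_mutex_init(&t->lock, NULL);
	pthread_cond_init(&t->idle, NULL);
}

/* Fast path: no lock is taken to bump the counter. */
static void conn_begin(struct conn_tracker *t)
{
	atomic_fetch_add(&t->inprogress, 1);
}

/* Only the caller that drops the counter to zero sends the wakeup. */
static void conn_end(struct conn_tracker *t)
{
	int prev = atomic_fetch_sub(&t->inprogress, 1); /* returns old value */

	assert(prev > 0);
	if (prev == 1) {
		pthread_mutex_lock(&t->lock);
		pthread_cond_broadcast(&t->idle);
		pthread_mutex_unlock(&t->lock);
	}
}

/* Sleep until the counter is zero instead of polling in a yield() loop. */
static void conn_wait_idle(struct conn_tracker *t)
{
	pthread_mutex_lock(&t->lock);
	while (atomic_load(&t->inprogress) != 0)
		pthread_cond_wait(&t->idle, &t->lock);
	pthread_mutex_unlock(&t->lock);
}
```

The mutex here exists only so that a waiter cannot miss the wakeup between checking the counter and going to sleep; the increment and decrement themselves stay lock-free, which is the point of the conversion.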
LU-17484 gss: reply error for SEC_CTX_INIT on wrong node When a server receives a SEC_CTX_INIT request for a target that is not available (either stopping, or not set up yet, or moved to a failover node), the request gets dropped. This makes the client-side RPC time out, increasing the time it takes to establish a proper gss context with the target, because it slows down the HA mechanism that tries alternate failover NIDs. Instead of dropping the request reply for SEC_CTX_INIT, the server needs to send back a proper error reply. The client will then be able to immediately try alternate failover NIDs, speeding mount/reconnect process up, and avoiding potential eviction. Test-Parameters: trivial Test-Parameters: kerberos=true testlist=sanity-krb5 Test-Parameters: testgroup=review-dne-selinux-ssk-part-2 Signed-off-by: Sebastien Buisson <sbuisson@ddn.com> Change-Id: Id2cefaa7d54729a63c7be13b65d7ace579bcaa78 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53970 Reviewed-by: Aurelien Degremont <adegremont@nvidia.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com> Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com>
LU-17415 ldlm: lock conversion to skip cancelled locks ldlm_cli_inodebits_convert() should re-check that the lock is not being cancelled, and skip such locks, to avoid an assertion: LustreError: 15208:0:(ldlm_lock.c:1095:ldlm_grant_lock_with_skiplist()) ASSERTION( ldlm_is_granted(lock) ) failed: Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com> Change-Id: If212931d8fa6a2d8f56c44714de830d5fb4a9a6b Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53645 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-6142 ldlm: Fix style issues for ldlm folder This patch fixes issues reported by checkpatch for files under folder lustre/ldlm/ Test-Parameters: trivial Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com> Change-Id: I3c15c6a6e3d21bce9c8609e60ec481b484f00480 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54003 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Timothy Day <timday@amazon.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-6142 ldlm: Fix style issues for ldlm_lock.c This patch fixes issues reported by checkpatch for file lustre/ldlm/ldlm_lock.c Test-Parameters: trivial Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com> Change-Id: I492eacb0bf8033a78f1001a350c9fe4258729693 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54002 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Timothy Day <timday@amazon.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-17276 ldlm: add interval in flock Add necessary changes for using interval tree in flock. Signed-off-by: Yang Sheng <ys@whamcloud.com> Change-Id: I94c416b4215b863b54eccfe7025f2976fe40181a Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53447 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-16314 llite: Migrate LASSERTF %p to %px This change covers lustre/ec through lustre/mgs and converts LASSERTF statements to explicitly use %px. Use %px to explicitly report the non-hashed pointer value in messages printed when a kernel panic is imminent. When analyzing a crash dump the associated kernel address can be used to determine the system state that led to the system crash. As crash dumps can be, and are, provided by customers from production systems, the use of the kernel command line parameter no_hash_pointers is not always possible. Ref: Documentation/core-api/printk-formats.rst Test-Parameters: trivial HPE-bug-id: LUS-10945 Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com> Change-Id: I708d9ef60c63f5b4006c7986599a2f39fc9e5fdf Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51213 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-17242 debug: use dump_stack() where possible In some cases, libcfs_debug_dumpstack() can fail to output a stack trace, either because the needed symbols are not exported or because those symbols can't be resolved at runtime. This seems to occur more often with newer kernels. The message appears only as: Lustre: ldlm_cb01_002: service thread pid 57876 was inactive for 40.494 seconds. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes: Pid: 57876, comm: ldlm_cb01_002 6.1.70 #1 SMP PREEMPT_DYNAMIC Thu Jan 4 18:52:41 UTC 2024 Call Trace TBD: with no stack trace (seen on CentOS 8.5 with ml 6.1.70). For reference, the runtime symbol lookup was added and updated in: b49ce7a ("LU-12400 libcfs: save_stack_trace_tsk if ARCH_STACKWALK") 58ac9d3 ("LU-14099 build: Fix for unconfigured arch_stackwalk") First, add a message when the symbol can't be resolved correctly. This makes it much easier to understand why the stack trace is missing. Second, replace libcfs_debug_dumpstack(NULL) with dump_stack(). When the task_struct is NULL, libcfs uses the current task_struct. This replicates the functionality of dump_stack(). Using dump_stack() is more reliable, more in line with kernel style, and not likely to be un-exported in the future. Finally, in lustre/osc/osc_object.c the stack isn't dumped since there is already an LBUG(). There remains only one user of libcfs_debug_dumpstack(), which uses a task_struct other than current. This can be cleaned up in a future patch.
Test-Parameters: trivial Signed-off-by: Timothy Day <timday@amazon.com> Change-Id: I196c1da7e39b1a694c0cb67ecfaab58ab3e4662c Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53625 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: James Simmons <jsimmons@infradead.org> Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-16796 ldlm: Change struct ldlm_resource to use refcount_t This patch changes struct ldlm_resource and struct nrs_tbf_client to use refcount_t instead of atomic_t. This patch also changes spaces to tabs, but only close to the lines of code being changed. Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com> Change-Id: Ic15f27bc6281725f00bddc465668f81291aad6ec Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53416 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Timothy Day <timday@amazon.com> Reviewed-by: James Simmons <jsimmons@infradead.org> Reviewed-by: Oleg Drokin <green@whamcloud.com>
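For readers unfamiliar with why a dedicated reference-count type is preferred over a plain atomic counter, the following user-space sketch shows the two safety properties such a type typically adds: refusing to take a reference on an object whose count already reached zero, and saturating instead of wrapping around. It assumes C11 atomics; struct ref and its helpers are hypothetical and are not the kernel's refcount_t implementation.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical saturating reference count (illustration only). */
struct ref {
	atomic_uint count;
};

static void ref_set(struct ref *r, unsigned int n)
{
	atomic_store(&r->count, n);
}

/* Take a reference unless the object is already dead (count == 0). */
static bool ref_inc_not_zero(struct ref *r)
{
	unsigned int old = atomic_load(&r->count);

	do {
		if (old == 0)
			return false;   /* would resurrect a freed object */
		if (old == UINT_MAX)
			return true;    /* saturated: leak rather than wrap */
	} while (!atomic_compare_exchange_weak(&r->count, &old, old + 1));
	return true;
}

/* Drop a reference; returns true when the caller dropped the last one. */
static bool ref_dec_and_test(struct ref *r)
{
	unsigned int old = atomic_load(&r->count);

	do {
		if (old == 0 || old == UINT_MAX)
			return false;   /* underflow or saturated: do nothing */
	} while (!atomic_compare_exchange_weak(&r->count, &old, old - 1));
	return old == 1;
}
```

A plain atomic counter would happily go from 0 back to 1 or wrap past its maximum, both of which can turn a counting bug into a use-after-free; the checks above turn those cases into a refusal or a leak instead.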
LU-17078 ldlm: do not spin up thread for local cancels When doing lockless IO on the client, the server is responsible for taking LDLM locks for each IO. Currently, the server sends these locks to a separate thread for cancellation. This behavior is necessary on the client where a lock may protect a large number of cached pages, so cancelling it in a user thread may introduce unacceptable delays. But the server doesn't have cached pages, so it makes more sense for the server to do the cancellation in the same thread. We do this by not spinning up an ldlm_bl thread for cancellations of local (server side only) locks. This improves 4K DIO random read performance by about 9%. Without patch, maximum server IOPs on 4K reads: 2864k IOPS With patch: 3118k IOPS This is the maximum performance achieved with many clients and client threads doing 4K random AIO reads from different files. Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com> Change-Id: Ia996732780d278c5d0bc290c5484e3bc325a347a Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52192 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
LU-17278 ldlm: don't grant failed lock Lock convert can re-grant a lock if it loses some bits. This procedure can race with the import's invalidation, so the lock can become invalid (l_granted_mode=LCK_MINMODE): LustreError: 8637:0:(ldlm_lock.c:1095:ldlm_grant_lock_with_skiplist()) ASSERTION( ldlm_is_granted(lock) ) Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com> Change-Id: I7bb20d62948224647d7632f2822fba44d39a7713 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53051 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-17174 misc: fix hash functions 1) The LU-16518 landing caused a bug which is visible with a debug kernel: UBSAN: Undefined behaviour in include/linux/hash.h:81:31 shift exponent 64 is too large for 64-bit type 'long long unsigned int' Call Trace: dump_stack+0x8e/0xd0 ubsan_epilogue+0x5/0x21 ldlm_export_lock_hash+0x49/0x4d [ptlrpc] cfs_hash_bd_from_key+0x88/0x2e0 [libcfs] 2) Use the high bits instead of the low bits, as this is more accurate. HPe-bug-id: LUS-11925 Fixes: 239e8268 (LU-16518 misc: use fixed hash code) Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com> Change-Id: Ie1c531ad220f44e55fbf80674a49472fb6024252 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52611 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: James Simmons <jsimmons@infradead.org> Reviewed-by: Oleg Drokin <green@whamcloud.com> Reviewed-by: Timothy Day <timday@amazon.com>
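One way a "shift exponent 64" report like the one in LU-17174 can arise is a multiplicative hash that shifts a 64-bit product right by (64 - bits) when bits is 0. The sketch below shows a guarded variant that keeps the high bits of the product. It is an illustration under that assumption, not the code from the patch; hash64_high_bits() is a hypothetical name, and the multiplier is the 64-bit golden-ratio constant used by the kernel's include/linux/hash.h.

```c
#include <stdint.h>

/* 64-bit golden ratio multiplier, as in the kernel's include/linux/hash.h */
#define GOLDEN_RATIO_64 0x61C8864680B583EBull

/*
 * Hypothetical helper: hash val down to 'bits' bits by taking the HIGH
 * bits of the product, which are better mixed than the low bits.
 * Shifting a 64-bit value by 64 is undefined behaviour in C, so
 * bits == 0 is handled explicitly and bits is clamped to 64.
 */
static inline uint64_t hash64_high_bits(uint64_t val, unsigned int bits)
{
	if (bits == 0)
		return 0;       /* a zero-bit hash has exactly one bucket */
	if (bits > 64)
		bits = 64;
	return (val * GOLDEN_RATIO_64) >> (64 - bits);
}
```

With this guard every shift count is in the range 0 to 63, so the function is well defined for any bits value a caller might pass.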
LU-13805 llite: Implement unaligned DIO connect flag Unupgraded ZFS servers may crash if they received unaligned DIO, so we need a compat flag and a test to recognize those servers. This patch implements that logic. Fixes: 7194eb6431 ("LU-13805 clio: bounce buffer for unaligned DIO") Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com> Change-Id: I5d6ee3fa5dca989c671417f35a981767ee55d6e2 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51126 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Sebastien Buisson <sbuisson@ddn.com> Reviewed-by: Oleg Drokin <green@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
LU-17188 mdt: remove \n from LDLM_DEBUG LDLM_DEBUG() doesn't need \n in an extra message Test-Parameters: trivial Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com> Change-Id: I5a62cccb0a17b3f878206e8bbec6c1fbe07c4753 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52673 Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com> Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com> Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com>
LU-8802 obd: remove MAX_OBD_DEVICES Remove this arbitrary limit by reimplementing the array as an Xarray. An Xarray can grow and shrink dynamically, hence saving memory and allowing for many more OBD devices. There is still technically a limit, OBD_MAX_INDEX, which is xa_limit_31b.max or around 2 billion. This is far more than is practically useful. This patch also adds various iterators for OBD devices, which are used to simplify code in various places. Remove class_obd_list() since it is unused. Rename class_dev_by_str() to class_str2obd() to keep the pattern. Several class_* functions have been refactored to improve locking. The larger issue of OBD device locking will be addressed separately. Update the OBD device lifecycle test to try loading more devices (about 24,000 for now). Currently, adding an additional OBD device is an O(n^2) operation due to the class_name2dev calls in class_register_device(). This will be addressed in a future patch adding a hash table for OBD device name lookups. Further, OBD life cycle management could likely be simplified by using Xarray marks. Right now, it is handled by a bit field in the obd_device struct. Since the scope of the changes needed to simplify this seems large, this will also be addressed separately. Test-Parameters: testlist=sanity env=ONLY=55,ONLY_REPEAT=10 Signed-off-by: Timothy Day <timday@amazon.com> Change-Id: Icb2cd94a5529e79f5d3ebd0de5e0f225cf212075 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51040 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: James Simmons <jsimmons@infradead.org> Reviewed-by: Neil Brown <neilb@suse.de> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-15246 ptlrpc: per-device adaptive timeout parameters When a client is mounting multiple filesystems with different MGSes setting global parameters at_min, at_max, etc., then the settings from one filesystem's MGS config will also apply to RPCs sent for the OSC, MDC, and MGC devices on the other filesystem(s). Typically the settings of the last filesystem to mount on the client override the earlier values, and there is no way to separate them. Moving the parameters to be per-device values allows them to be set independently for each set of client devices, so that the client can interact properly with each set of servers. This allows e.g. different timeouts for local and remote mounts, or for flash and HDD filesystems that have different load and performance. Add per-device adaptive timeout parameters that can optionally replace global parameters of the same name: at_min -> *.<fsname>*.at_min at_max -> *.<fsname>*.at_max at_history -> *.<fsname>*.at_history ldlm_enqueue_min -> *.<fsname>*.ldlm_enqueue_min These parameters should always be set with fsname in the device name, rather than pure wildcard '*' device names, or the result will be the same as with the global parameters (settings from one MGS will apply to devices on other filesystems). That is a bug in how "lctl set_param -P" works, but will be fixed separately. Signed-off-by: Lei Feng <flei@whamcloud.com> Change-Id: I5b04c9aa53a446fb5a78bfaff372b4f236c9eb8a Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/45598 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-13306 mgc: handle large NID formats For newer versions of Lustre the MGS can send mgs_nidtbl_entry containing NIDs of a larger format. It's also possible an old MGS will send NIDs of the previous size. We need to handle both cases. We reuse the mcb_nm_cur_pass field of struct mgs_config_body, which is only used for nodemap, to send the NID size from the client to the MGS. Pre-IPv6 clients will by default have a zero mcb_nm_cur_pass / mcb_nid_size. When mcb_nid_size is zero the MGS will treat the client as pre-IPv6 and send small NIDs back to the client. This avoids needing to patch older clients. If the MGS is older then small size NIDs will be sent back, which the new MGC layer can handle by converting those lnet_nid_t to struct lnet_nid. To handle this new code the "swab" of the entry is split into two parts. The "header" is "swab"ed as soon as we know the entry is large enough for that to make sense. The content containing NID information is swabbed later once the header has been found to look sane. Test-Parameters: serverversion=2.15 testlist=runtests,sanity,recovery-small Change-Id: I97ebdcecc1ee0fbfe676cbdbdc77edee13e60891 Signed-off-by: James Simmons <jsimmons@infradead.org> Signed-off-by: Mr NeilBrown <neilb@suse.de> Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50750 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Sebastien Buisson <sbuisson@ddn.com> Reviewed-by: Oleg Drokin <green@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
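The backward-compatibility rule in LU-13306 (a zero size field means "old client, send small NIDs") can be sketched as below. This is a simplified illustration, not the MGS code: reply_nid_size() is a hypothetical helper, and the only size assumed is that the small NID format is a 64-bit value.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: the pre-IPv6 NID format is a single 64-bit integer. */
#define SMALL_NID_SIZE sizeof(uint64_t)

/*
 * Hypothetical server-side helper: pick the NID size to use in a reply
 * from the size the client advertised in its request. Older clients
 * never set the field, so it arrives as zero and they keep receiving
 * the small format without needing a patch.
 */
static size_t reply_nid_size(uint32_t client_nid_size)
{
	if (client_nid_size == 0)
		return SMALL_NID_SIZE;  /* pre-IPv6 client */
	return client_nid_size;         /* client asked for a larger format */
}
```

Reusing a field that old clients leave zeroed lets the new capability be negotiated without a protocol version bump, which is the design choice the commit message describes.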
LU-17000 misc: remove Coverity annotations These Coverity function annotations were added around 10 years ago. Since then, Coverity seems to produce fewer false positives. Out of about 20 annotations, only 3 warnings get suppressed. Thus, the applicability of these annotations should be re-evaluated. Coverity has more advanced tools now for reducing false positives. Various Lustre functions and macros could be modeled rather than using function annotations. But first, we need to get a good idea of what kinds of false positives are being generated. https://scan.coverity.com/tune Test-Parameters: trivial Signed-off-by: Timothy Day <timday@amazon.com> Change-Id: Ibcb9cf55574675e20b13a4f7a1b9142a3b75e262 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51793 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com> Reviewed-by: James Simmons <jsimmons@infradead.org> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-17003 dne: remove REP-ACK support in DNE system DNE system doesn't need to support REP-ACK. In the old implementation, write locks are kept in PW|EX mode after transaction stop, and will be downgraded to TXN mode at REP-ACK, and then not released until transaction commit. In the period between transaction stop and REP-ACK, any read lock request is put on hold until the downgrade; with this change, such a read lock request will succeed immediately. During this period, any write lock request may involve an extra commit: since mdt_blocking_ast() does not know whether the transaction has stopped, it needs to trigger commit-on-sharing immediately and also set the 'sync' flag in the lock. If the transaction is not stopped yet, then later, when it is stopped, it will trigger another commit-on-sharing since the 'sync' flag is set. With this change, mdt_blocking_ast() only needs to set the 'sync' flag if its mode is PW|EX, and trigger commit-on-sharing once upon unlock. This reduces the number of transaction commits and may improve performance in some corner cases. Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com> Change-Id: I159a0ad619afd10e97be3dc175a6b4ed77b31142 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51851 Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com> Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com>
LU-16796 libcfs: Remove reference to LASSERT_ATOMIC_ZERO This patch removes all references to the LASSERT_ATOMIC_ZERO macro. Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com> Change-Id: I73259599d1dee6277fadf66181699f1282274a80 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51004 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: James Simmons <jsimmons@infradead.org> Reviewed-by: Timothy Day <timday@amazon.com> Reviewed-by: Neil Brown <neilb@suse.de> Reviewed-by: Oleg Drokin <green@whamcloud.com>