LU-17022 obdclass: convert obd_conn_inprogress to atomic_t Using atomic_t for obd_conn_inprogress means we don't need to take a spinlock. Also send wakeup when value reaches zero, and wait for the wakeup instead of using a yield() loop. Change-Id: I9af29e068203cde951e592c408906d121702fa18 Signed-off-by: Mr NeilBrown <neilb@suse.de> Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51906 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Timothy Day <timday@amazon.com> Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com> Reviewed-by: James Simmons <jsimmons@infradead.org> Reviewed-by: Oleg Drokin <green@whamcloud.com>
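The counting pattern described in LU-17022 can be sketched in user-space C with C11 atomics. This is only an illustration under stated assumptions: `struct demo_obd`, `demo_conn_start()` and `demo_conn_finish()` are hypothetical names, and the `wakeups_sent` counter stands in for whatever wait queue or wakeup primitive the real patch uses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant part of struct obd_device. */
struct demo_obd {
	atomic_int conn_inprogress;	/* connections currently in progress */
	int wakeups_sent;		/* stands in for a real wait queue */
};

static void demo_obd_init(struct demo_obd *obd)
{
	atomic_init(&obd->conn_inprogress, 0);
	obd->wakeups_sent = 0;
}

/* No spinlock needed: the increment is a single atomic operation. */
static void demo_conn_start(struct demo_obd *obd)
{
	atomic_fetch_add(&obd->conn_inprogress, 1);
}

/*
 * Drop one in-progress connection. Only the caller that moves the
 * counter from 1 to 0 sends the wakeup, so a waiter can sleep until
 * then instead of polling in a yield() loop.
 * Returns true if this call sent the wakeup.
 */
static bool demo_conn_finish(struct demo_obd *obd)
{
	int prev = atomic_fetch_sub(&obd->conn_inprogress, 1);

	assert(prev > 0);
	if (prev == 1) {
		obd->wakeups_sent++;
		return true;
	}
	return false;
}
```

In this sketch exactly one wakeup is sent per transition to zero, however many connections were in flight.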
LU-17484 gss: reply error for SEC_CTX_INIT on wrong node When a server receives a SEC_CTX_INIT request for a target that is not available (either stopping, or not set up yet, or moved to a failover node), the request gets dropped. This makes the client-side RPC time out, increasing the time it takes to establish a proper gss context with the target, because it slows down the HA mechanism that tries alternate failover NIDs. Instead of dropping the SEC_CTX_INIT request, the server needs to send back a proper error reply. The client will then be able to immediately try alternate failover NIDs, speeding up the mount/reconnect process and avoiding potential eviction. Test-Parameters: trivial Test-Parameters: kerberos=true testlist=sanity-krb5 Test-Parameters: testgroup=review-dne-selinux-ssk-part-2 Signed-off-by: Sebastien Buisson <sbuisson@ddn.com> Change-Id: Id2cefaa7d54729a63c7be13b65d7ace579bcaa78 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53970 Reviewed-by: Aurelien Degremont <adegremont@nvidia.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com> Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com>
LU-6142 ldlm: Fix style issues for ldlm folder This patch fixes issues reported by checkpatch for files under folder lustre/ldlm/ Test-Parameters: trivial Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com> Change-Id: I3c15c6a6e3d21bce9c8609e60ec481b484f00480 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54003 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Timothy Day <timday@amazon.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-13805 llite: Implement unaligned DIO connect flag Unupgraded ZFS servers may crash if they receive unaligned DIO, so we need a compat flag and a test to recognize those servers. This patch implements that logic. Fixes: 7194eb6431 ("LU-13805 clio: bounce buffer for unaligned DIO") Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com> Change-Id: I5d6ee3fa5dca989c671417f35a981767ee55d6e2 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51126 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Sebastien Buisson <sbuisson@ddn.com> Reviewed-by: Oleg Drokin <green@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
LU-8802 obd: remove MAX_OBD_DEVICES Remove this arbitrary limit by reimplementing the array as an Xarray. Xarray can grow and shrink dynamically, hence saving memory and allowing for many more OBD devices. There is still technically a limit OBD_MAX_INDEX, which is xa_limit_31b.max or around 2 billion. This is far more than is practically useful. This patch also adds various iterators for OBD devices, which are used to simplify code in various places. Remove class_obd_list() since it is unused. Rename class_dev_by_str() to class_str2obd() to keep the pattern. Several class_* functions have been refactored to improve locking. The larger issue of OBD device locking will be addressed separately. Update the OBD device lifecycle test to try loading more devices (about 24,000 for now). Currently, adding an additional OBD device is an O(n^2) operation due to the class_name2dev calls in class_register_device(). This will be addressed in a future patch adding a hash table for OBD device name lookups. Further, OBD life cycle management could likely be simplified by using Xarray marks. Right now, it is handled by a bit field in the obd_device struct. Since the scope of the changes needed to simplify this seems large, this will also be addressed separately. Test-Parameters: testlist=sanity env=ONLY=55,ONLY_REPEAT=10 Signed-off-by: Timothy Day <timday@amazon.com> Change-Id: Icb2cd94a5529e79f5d3ebd0de5e0f225cf212075 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51040 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: James Simmons <jsimmons@infradead.org> Reviewed-by: Neil Brown <neilb@suse.de> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-15246 ptlrpc: per-device adaptive timeout parameters When a client is mounting multiple filesystems with different MGSes setting global parameters at_min, at_max, etc., then the settings from one filesystem's MGS config will also apply to RPCs sent for the OSC, MDC, and MGC devices on the other filesystem(s). Typically the settings of the last filesystem to mount on the client override the earlier values, and there is no way to separate them. Moving the parameters to be per-device values allows them to be set independently for each set of client devices, so that the client can interact properly with each set of servers. This allows e.g. different timeouts for local and remote mounts, or for flash and HDD filesystems that have different load and performance. Add per-device adaptive timeout parameters that can optionally replace global parameters of the same name: at_min -> *.<fsname>*.at_min, at_max -> *.<fsname>*.at_max, at_history -> *.<fsname>*.at_history, ldlm_enqueue_min -> *.<fsname>*.ldlm_enqueue_min. These parameters should always be set with fsname in the device name, rather than pure wildcard '*' device names, or it will be the same as the global parameters in the end (settings from one MGS will apply to devices on other filesystems). That is a bug in how "lctl set_param -P" works, but will be fixed separately. Signed-off-by: Lei Feng <flei@whamcloud.com> Change-Id: I5b04c9aa53a446fb5a78bfaff372b4f236c9eb8a Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/45598 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
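The "optionally replace" behaviour in LU-15246 can be pictured as a per-device override that falls back to the global value when unset. The sketch below is an assumption about the general shape, not the patch's actual data structures: all `demo_*` names are hypothetical, and the numeric values are illustrative only. A separate `set` flag is used because 0 can be a legitimate setting.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical global tunables shared by every device on the client. */
struct demo_at_globals {
	unsigned int at_min;
	unsigned int at_max;
};

/* A per-device override; 'set' distinguishes "unset" from a real 0. */
struct demo_at_override {
	bool set;
	unsigned int value;
};

/* Hypothetical client device (OSC/MDC/MGC) with its own overrides. */
struct demo_device {
	const char *name;
	struct demo_at_override at_min;
	struct demo_at_override at_max;
};

/* Use the per-device value if one was set, else the global one. */
static unsigned int demo_pick(const struct demo_at_override *o,
			      unsigned int global_value)
{
	return o->set ? o->value : global_value;
}

static unsigned int demo_at_min(const struct demo_device *dev,
				const struct demo_at_globals *g)
{
	assert(dev && g);
	return demo_pick(&dev->at_min, g->at_min);
}

static unsigned int demo_at_max(const struct demo_device *dev,
				const struct demo_at_globals *g)
{
	assert(dev && g);
	return demo_pick(&dev->at_max, g->at_max);
}
```

With this shape, a device on a flash filesystem can carry a shorter at_max while devices on another filesystem keep following the global value.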
LU-13306 mgc: handle large NID formats For newer versions of Lustre the MGS can send mgs_nidtbl_entry containing NIDs of a larger format. It's also possible an old MGS will send NIDs of the previous size. We need to handle both cases. We reused the mcb_nm_cur_pass field of struct mgs_config_body, which is only used for nodemap, to send the NID size from the client to the MGS. Pre-IPv6 clients will by default have a zero mcb_nm_cur_pass / mcb_nid_size. When mcb_nid_size is zero the MGS will treat the client as pre-IPv6 and send small NIDs back to the client. This avoids needing to patch older clients. If the MGS is older, then small size NIDs will be sent back, which the new MGC layer can handle by converting those lnet_nid_t to struct lnet_nid. To handle this new code the "swab" of the entry is split into two parts. The "header" is "swab"ed as soon as we know the entry is large enough for that to make sense. The content containing NID information is swabbed later once the header has been found to look sane. Test-Parameters: serverversion=2.15 testlist=runtests,sanity,recovery-small Change-Id: I97ebdcecc1ee0fbfe676cbdbdc77edee13e60891 Signed-off-by: James Simmons <jsimmons@infradead.org> Signed-off-by: Mr NeilBrown <neilb@suse.de> Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50750 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Sebastien Buisson <sbuisson@ddn.com> Reviewed-by: Oleg Drokin <green@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
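The two-stage swab described in LU-13306 might look roughly like the following user-space sketch. The header layout (`struct demo_entry_hdr`), the 8-byte legacy NID size, and every `demo_*` name are assumptions made for illustration; the real mgs_nidtbl_entry layout differs. Larger NID formats are simply left untouched here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_LEGACY_NID_SIZE 8	/* assumed size of an old 64-bit NID */

/* Hypothetical entry header; NOT the real mgs_nidtbl_entry layout. */
struct demo_entry_hdr {
	uint32_t length;	/* whole entry in bytes, header included */
	uint16_t nid_count;
	uint16_t nid_size;	/* 0 means legacy 8-byte NIDs */
};

static uint16_t demo_swab16(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

static uint32_t demo_swab32(uint32_t v)
{
	return ((v & 0xffu) << 24) | ((v & 0xff00u) << 8) |
	       ((v >> 8) & 0xff00u) | (v >> 24);
}

static uint64_t demo_swab64(uint64_t v)
{
	return ((uint64_t)demo_swab32((uint32_t)v) << 32) |
	       demo_swab32((uint32_t)(v >> 32));
}

/* Swab one opposite-endian entry in place; returns 0 or -1. */
static int demo_swab_entry(unsigned char *buf, size_t buflen)
{
	struct demo_entry_hdr hdr;
	size_t nid_size, i;

	/* Stage 1: swab the header once the buffer can hold one. */
	if (buflen < sizeof(hdr))
		return -1;
	memcpy(&hdr, buf, sizeof(hdr));
	hdr.length = demo_swab32(hdr.length);
	hdr.nid_count = demo_swab16(hdr.nid_count);
	hdr.nid_size = demo_swab16(hdr.nid_size);

	/* Check the header looks sane before trusting it for the payload. */
	nid_size = hdr.nid_size ? hdr.nid_size : DEMO_LEGACY_NID_SIZE;
	if (hdr.length > buflen ||
	    hdr.length != sizeof(hdr) + (size_t)hdr.nid_count * nid_size)
		return -1;
	memcpy(buf, &hdr, sizeof(hdr));

	/* Stage 2: swab the NID payload (legacy 64-bit NIDs only here). */
	if (nid_size == DEMO_LEGACY_NID_SIZE) {
		for (i = 0; i < hdr.nid_count; i++) {
			unsigned char *p = buf + sizeof(hdr) + i * nid_size;
			uint64_t nid;

			memcpy(&nid, p, sizeof(nid));
			nid = demo_swab64(nid);
			memcpy(p, &nid, sizeof(nid));
		}
	}
	return 0;
}
```

Splitting the work this way means a corrupt or truncated entry is rejected before any NID bytes are modified.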
LU-17003 dne: remove REP-ACK support in DNE system DNE system doesn't need to support REP-ACK. In the old implementation, write locks are kept in PW|EX mode after transaction stop, and will be downgraded to TXN mode at REP-ACK, and then not released until transaction commit. In the period between transaction stop and REP-ACK, any read lock request will be on hold until the downgrade; with this change, the read lock request will succeed immediately. During this period, any write lock request may involve an extra commit, since mdt_blocking_ast() does not know whether the transaction has stopped, so it needs to trigger commit-on-sharing immediately, and also set the 'sync' flag in the lock. If the transaction is not stopped yet, later when it's stopped, it will trigger another commit-on-sharing since the 'sync' flag is set. With this change, mdt_blocking_ast() only needs to set the 'sync' flag if its mode is PW|EX, and trigger commit-on-sharing once upon unlock. This reduces the number of transaction commits and may improve performance in some corner cases. Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com> Change-Id: I159a0ad619afd10e97be3dc175a6b4ed77b31142 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51851 Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com> Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com>
LU-16796 libcfs: Remove reference to LASSERT_ATOMIC_ZERO This patch removes all references to the LASSERT_ATOMIC_ZERO macro. Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com> Change-Id: I73259599d1dee6277fadf66181699f1282274a80 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51004 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: James Simmons <jsimmons@infradead.org> Reviewed-by: Timothy Day <timday@amazon.com> Reviewed-by: Neil Brown <neilb@suse.de> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-6142 obd: remove OBP and MDP macros These macros save very little space, make it harder to understand the code (by adding one more thing to remember) and make it impossible to grep for o_* and m_* functions. Luckily, they are only used in a few places. So, remove them and all references. Test-Parameters: trivial Signed-off-by: Timothy Day <timday@amazon.com> Change-Id: I4c23199ca53c906ca190a81ffdf916ff6cff9a0b Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51739 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Neil Brown <neilb@suse.de> Reviewed-by: James Simmons <jsimmons@infradead.org> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-16965 obd: remove unused obd_evict_inprogress Remove the atomic_t struct field obd_evict_inprogress from 'struct obd_device'. This field was only ever incremented in an unused function that was removed in a previous patch. Hence, remove it altogether. This patch also removes the associated wait queue. Test-Parameters: trivial Signed-off-by: Timothy Day <timday@amazon.com> Change-Id: Id151c1e6a0adde8c1aeb6dbc903b9d98d00fd21d Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51681 Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com> Reviewed-by: James Simmons <jsimmons@infradead.org> Reviewed-by: Oleg Drokin <green@whamcloud.com> Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com>
LU-15527 dne: refactor commit-on-sharing for DNE Commit-on-sharing for DNE is different from the original commit-on-sharing: * the original commit-on-sharing is to eliminate dependency between operations from different clients. * while commit-on-sharing for DNE is to eliminate dependency between operations handled by different MDTs, so that upon multiple MDT failures, an operation replay won't fail because its dependent operation is not replayed by another MDT yet. The current CoS implementation for DNE checks dependency in the MDT layer, and it decides by checking whether the current operation is a distributed transaction; if so, it will trigger CoS upon conflicting locks. Actually this may miss some cases that should trigger CoS (even a local transaction should trigger CoS if it depends on a distributed transaction), and on the other hand it may trigger extra CoS, because if two operations are handled by the same MDT, the dependency is ensured since they will always be replayed by transaction number. To avoid mixing the code of two different CoS, the following changes are made: * add new ldlm lock mode LCK_TXN. On DNE system, downgrade PW/EX locks to this mode after transaction stop. * add li_initiator_id in struct ldlm_inodebits, which is the index of the MDT where the lock is enqueued, i.e. where the operation is handled. If another operation handled by a different MDT requests a conflicting PW|EX mode lock against this TXN mode lock, it will trigger commit to ensure the dependent operation is committed to disk (NB, it doesn't trigger commit on all involved MDTs, but only the MDT where the conflict happens, which is enough to allow replay to succeed). * remove LDLM_FL_COS_INCOMPAT and LDLM_FL_COS_ENABLED. * MDT layer doesn't need to check such dependency any more, since the lock itself knows. * updated sanityn 33c, 33d and 33e since fewer CoS are triggered now.
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com> Change-Id: Ib0149fcdc0178afd2c6894d211480f3c6c9284a0 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/46641 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
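The conflict rule from LU-15527 (a PW|EX request from a different MDT against a TXN-mode lock triggers a commit) can be written down as a small predicate. This is a sketch of the rule as stated in the message, not the ldlm code: the `demo_*` types are hypothetical, and only `initiator_id` mirrors a name from the patch (li_initiator_id).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of lock modes relevant to the rule. */
enum demo_mode { DEMO_PR, DEMO_PW, DEMO_EX, DEMO_TXN };

/* Hypothetical lock: mode plus the index of the MDT that enqueued it. */
struct demo_lock {
	enum demo_mode mode;
	uint32_t initiator_id;	/* models li_initiator_id */
};

/*
 * Return true if granting 'req' against 'granted' must first commit
 * the transaction that 'granted' protects.
 */
static bool demo_needs_commit(const struct demo_lock *granted,
			      const struct demo_lock *req)
{
	assert(granted && req);

	/* Only locks downgraded to TXN mode carry the dependency. */
	if (granted->mode != DEMO_TXN)
		return false;
	/* Only conflicting write-mode requests matter. */
	if (req->mode != DEMO_PW && req->mode != DEMO_EX)
		return false;
	/* Same MDT: replay order by transaction number already suffices. */
	return granted->initiator_id != req->initiator_id;
}
```

Because the lock itself records where it was enqueued, the check needs no knowledge of whether the operation was a distributed transaction.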
LU-6142 lustre: use list_first/last_entry() for list heads This patch changes list_entry(foo.next, ...) to list_first_entry(&foo, ...) and list_entry(foo.prev, ...) to list_last_entry(&foo, ...) in cases where 'foo' is a list head - not a list member. Test-Parameters: trivial Signed-off-by: Mr NeilBrown <neilb@suse.de> Change-Id: I22b1278f5b481ce3074db3e59d37d9148016aed5 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50828 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com> Reviewed-by: James Simmons <jsimmons@infradead.org> Reviewed-by: Oleg Drokin <green@whamcloud.com>
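For readers unfamiliar with the helpers named in the list_first/last_entry() commit, here is a user-space sketch of the equivalence it relies on. The macros below are simplified re-creations of the kernel's list.h helpers written for this example; `struct demo_req` and the `demo_*` functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space copies of the kernel list helpers. */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define list_entry(ptr, type, member) \
	container_of(ptr, type, member)
#define list_first_entry(head, type, member) \
	list_entry((head)->next, type, member)
#define list_last_entry(head, type, member) \
	list_entry((head)->prev, type, member)

static void demo_list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert 'item' just before 'head', i.e. at the tail of the list. */
static void demo_list_add_tail(struct list_head *item,
			       struct list_head *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

/* Hypothetical list member type. */
struct demo_req {
	int id;
	struct list_head link;
};
```

Both spellings yield the same pointer; the first/last form states that `foo` is a list head, which is the readability gain the patch is after. Neither form is valid on an empty list.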
LU-16517 build: pass extra configure options to "make debs" While running "make debs", the configure command in debian/rules ignores some user defined configure options. This patch fixes the issue by adding the detection of the extra options into debian/rules. Test-Parameters: trivial clientdistro=ubuntu2204 Change-Id: Ia9db4e05abf33834cb3c853f4f0829dadc8d7400 Signed-off-by: Jian Yu <yujian@whamcloud.com> Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50464 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Tested-by: Shuichi Ihara <sihara@ddn.com> Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com> Reviewed-by: Shuichi Ihara <sihara@ddn.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-16796 libcfs: Remove reference to LASSERT_ATOMIC_POS This patch removes all references to the LASSERT_ATOMIC_POS macro. Once all uses are removed, it will be easier to switch atomic_* API calls to refcount_* calls. Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com> Change-Id: I2051de3707106532259e51ec3e4c890c65836b1a Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50881 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: James Simmons <jsimmons@infradead.org> Reviewed-by: Oleg Drokin <green@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Timothy Day <timday@amazon.com> Reviewed-by: Neil Brown <neilb@suse.de>
LU-12610 ldlm: replace OBD_ -> CFS_ macros Replace OBD macros that are simply redefinitions of CFS macros. Test-Parameters: trivial Signed-off-by: Timothy Day <timday@amazon.com> Signed-off-by: Ben Evans <beevans@whamcloud.com> Change-Id: I4d903f286f138152cff22df5cba411d2c9fcb4a8 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50685 Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com> Reviewed-by: James Simmons <jsimmons@infradead.org> Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com> Reviewed-by: Oleg Drokin <green@whamcloud.com> Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com>
LU-11407 obdclass: init osc.*.rpc_stats start_time Add missing start_time initialization for osc.*.rpc_stats. Test-Parameters: trivial Fixes: ea2cd3af7b ("LU-11407 obdclass: add start time to stats files") Signed-off-by: Andreas Dilger <adilger@whamcloud.com> Change-Id: I998b5337ccebc4d3ec18260d259f39c7893ebbe5 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50734 Reviewed-by: Feng Lei <flei@whamcloud.com> Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com> Reviewed-by: Oleg Drokin <green@whamcloud.com> Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com>
LU-16634 misc: standardize iocontrol param handling Validate uarg and karg early in iocontrol processing where needed. This needs kernel interop for 4.20+ kernels for access_ok(), but this can be checked by #ifdef and does not need an autoconf test. Fix incorrect definition of OBD_IOC_BARRIER to match reality. Signed-off-by: Andreas Dilger <adilger@whamcloud.com> Reported-by: Tao Lyu <tao.lyu@epfl.ch> Change-Id: I1a0d2f839949debf346aa15c65b0f407e9ce7057 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50314 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com> Reviewed-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-10391 lustre: obd_connect and reconnect now use large nid The 'localdata' argument for o_connect and o_reconnect when called server-side is now a 'struct lnet_nid *' rather than an 'lnet_nid_t *'. Signed-off-by: Mr NeilBrown <neilb@suse.de> Change-Id: I1ce72ec11a5d2463fb90ab2686410e2dd96118e2 Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50097 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: James Simmons <jsimmons@infradead.org> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-14139 statahead: add stats for batch RPC requests This patch adds stats for batch PtlRPC requests. It can show statistical information such as how many sub-requests are in a batch RPC. Signed-off-by: Qian Yingjin <qian@ddn.com> Change-Id: I2f71ff5d01ab1070bd8d771a72edd786ad27f03c Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/40943 Tested-by: jenkins <devops@whamcloud.com> Tested-by: Maloo <maloo@whamcloud.com> Reviewed-by: Andreas Dilger <adilger@whamcloud.com> Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com> Reviewed-by: Oleg Drokin <green@whamcloud.com>