Whamcloud - gitweb
fs/lustre-release.git
8 months ago LU-16097 tests: skip quota subtests in interop 09/52009/11
Andreas Dilger [Fri, 18 Aug 2023 21:55:10 +0000 (21:55 +0000)]
LU-16097 tests: skip quota subtests in interop

Skip subtests in sanity-quota.sh to avoid interop test failures,
backdated to check all new tests since 2.14.0 for completeness.

Test-Parameters: trivial testlist=sanity-quota ossversion=2.15.3
Test-Parameters: testlist=sanity-quota mdsversion=2.15.3
Fixes: 513b1cdbca ("LU-16340 quota: notify only global lqe")
Fixes: d4978678b4 ("LU-15694 quota: keep grace time while setting default")
Fixes: 25a70a88c9 ("LU-13952 quota: default OST Pool Quotas")
Fixes: 188112fc80 ("LU-14300 quota: avoid nested lqe lookup")
Fixes: 8c19365416 ("LU-13971 quota: report Pool Quotas for a user")
Fixes: a4fbe7341b ("LU-14739 quota: nodemap squashed root cannot bypass quota")
Fixes: 3ffa5d680f ("LU-14740 llite: avoid project quota overflow")
Fixes: 29e00cecc6 ("LU-14696 llite: check read only mount for setquota")
Fixes: 789038c97a ("LU-15167 quota: fallocate send UID/GID for quota")
Fixes: 5fc934ebbb ("LU-15519 quota: fallocate does not increase projid usage")
Fixes: c9901b68b4 ("LU-13587 quota: protect qpi in proc")
Fixes: 61ec1e0f2c ("LU-15031 quota: reseed glbe in qmt_lvbo_udate")
Fixes: dfe7d2dd2b ("LU-16341 quota: fix panic in qmt_site_recalc_cb")
Fixes: 862f0baa7c ("LU-15097 quota: stop pool_recalc before killing pool")
Fixes: 61481796ac ("LU-15193 quota: expand QUOTA_MAX_TRANSIDS to 12")
Fixes: a2fd4d3aee ("LU-15880 quota: fix insane grant quota")
Fixes: 6c0b4329d0 ("LU-16339 quota: notify OSTs until lge_qunit_nu is set")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ife8bfd83d0f217c534f3b12b4c9d108d370ed6b7
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52009
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months ago LU-13306 mgs: support large NID for mgs_write_log_osc_to_lov 53/52053/5
James Simmons [Wed, 23 Aug 2023 14:46:34 +0000 (10:46 -0400)]
LU-13306 mgs: support large NID for mgs_write_log_osc_to_lov

The various llogs on the MGS needed to be updated to support both
64 bit NID size and the newer large NID format. The function
mgs_write_log_osc_to_lov was missed in this update.

Test-Parameters: trivial testlist=runtests ossversion=2.15.3
Fixes: c0cb747ebe9 ("LU-13306 mgs: use large NIDS in the nid table on the MGS")
Change-Id: If543a0421d1f3cac9827581ce46da911c3456efd
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52053
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
8 months ago LU-16541 tests: Improve test 64f 40/52040/5
Patrick Farrell [Tue, 22 Aug 2023 16:32:52 +0000 (12:32 -0400)]
LU-16541 tests: Improve test 64f

The buffered IO part of test 64f has several timing related
holes and other oddities.  The use of multiop in the
background does not guarantee the RPC will not be sent, AND
the test doesn't kill it correctly.

Clean this up and make a more reliable version of the test.
Hopefully this will resolve the failure issues; if not, a
better version of the test will allow debugging.

Test-Parameters: trivial
Test-Parameters: testlist=sanity envdefinitions=ONLY=64f,ONLY_REPEAT=20
Test-Parameters: testlist=sanity envdefinitions=ONLY=64f,ONLY_REPEAT=20
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I25b825e1d9d516635ef8cbd26dd12809625c34df
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52040
Reviewed-by: xinliang <xinliang.liu@linaro.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
8 months ago LU-17005 obdclass: allow stats header to be disabled 23/51823/2
Andreas Dilger [Mon, 31 Jul 2023 19:34:22 +0000 (13:34 -0600)]
LU-17005 obdclass: allow stats header to be disabled

Add a global "enable_stats_header" tunable parameter that can be
set to enable/disable the "start_time" and "elapsed_time" fields
in the standard lprocfs "stats" files.

Default to enabled, since this landed shortly after v2_14_0.

Test-Parameters: trivial
Fixes: 5efb892396e3 ("LU-11407 obdclass: add start time to stats files")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I460b957447bfb83e6d4fd7395b79ce994f3ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51823
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Feng Lei <flei@whamcloud.com>
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months ago LU-16341 tests: skip sanity-quota/test_14 for old MDS 49/51949/2
Alex Deiter [Tue, 15 Aug 2023 18:47:51 +0000 (22:47 +0400)]
LU-16341 tests: skip sanity-quota/test_14 for old MDS

Skip sanity-quota test_14 for old MDS missing the fix
for LU-16341 kernel NULL in qmt_site_recalc_cb.

Fixes: d965d63415 ("LU-16341 quota: fix panic in qmt_site_recalc_cb")
Test-Parameters: trivial testlist=sanity-quota env=ONLY=14
Signed-off-by: Alex Deiter <adeiter@tintri.com>
Change-Id: I1a23daa06f0cd306c2b034df18617c2650945b28
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51949
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months ago LU-17027 target: include linux/file.h 43/51943/2
Xinliang Liu [Tue, 15 Aug 2023 07:58:14 +0000 (07:58 +0000)]
LU-17027 target: include linux/file.h

In some 4.x kernels like 4.19 we need to include linux/file.h to
have alloc_file_pseudo() defined.

Change-Id: Ieee8d5ac5b080bd3b5c761f54a5ca2f9581ecfe1
Test-Parameters: trivial
Fixes: ac0380dc519a ("LU-137 osd-ldiskfs: pass through resize ioctl")
Signed-off-by: Xinliang Liu <xinliang.liu@linaro.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51943
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months ago LU-16424 tests: Add version check in sanity-lnet 42/51942/4
Wei Liu [Mon, 14 Aug 2023 19:02:24 +0000 (12:02 -0700)]
LU-16424 tests: Add version check in sanity-lnet

Skip sanity-lnet test_205, test_207 and test_209 if
version is older than 2.14.58 since the lnet_if_list
function was added in Fixes:
3166a201e0 ("LU-15398 tests: Use remote peers for health tests")
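
As a rough illustration of this kind of version gate (not the actual
test-framework code, which is shell), the comparison can be sketched in
Python. The helper names version_code and should_skip are hypothetical
here, and the sketch assumes plain dotted numeric version strings:

```python
def version_code(version):
    """Turn a dotted version such as '2.14.58' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def should_skip(running_version, minimum="2.14.58"):
    """Return True when the running version predates the required minimum."""
    return version_code(running_version) < version_code(minimum)


if __name__ == "__main__":
    for candidate in ("2.14.0", "2.14.58", "2.15.3"):
        print(candidate, "skip" if should_skip(candidate) else "run")
```

Comparing integer tuples rather than raw strings keeps 2.9.x ordered
before 2.14.x.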

Test-Parameters: trivial testlist=sanity-lnet \
serverjob=lustre-b2_14 serverbuildno=2 \
serverdistro=el8.3

Signed-off-by: Wei Liu <sarah@whamcloud.com>
Change-Id: I9cd62d91980784e3b33cf4e30426bf74d17f717f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51942
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Colin Faber <cfaber@ddn.com>
8 months ago LU-16796 libcfs: Change struct cfs_hash to use kref 38/51938/3
Arshad Hussain [Fri, 11 Aug 2023 07:32:49 +0000 (13:02 +0530)]
LU-16796 libcfs: Change struct cfs_hash to use kref

This patch changes struct cfs_hash to use
kref(refcount_t) instead of atomic_t
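
For readers unfamiliar with the pattern, the following is a toy Python
analogue of what a kref provides over a bare counter: the count starts
at one, and the final put invokes a release callback. The kernel API
names kref_init(), kref_get() and kref_put() are real; the Python class
and its method names are assumptions made purely for illustration and
do not model the kernel's atomicity or saturation checks:

```python
class Kref:
    """Toy stand-in for a kernel kref: a counter plus release-on-zero."""

    def __init__(self):
        # Mirrors kref_init(), which starts the count at 1.
        self.refcount = 1

    def get(self):
        # Mirrors kref_get(): take an additional reference.
        self.refcount += 1

    def put(self, release):
        # Mirrors kref_put(): drop a reference and call release() on the
        # last one. Returns True only when the object was released.
        self.refcount -= 1
        if self.refcount == 0:
            release(self)
            return True
        return False


if __name__ == "__main__":
    released = []
    ref = Kref()
    ref.get()
    ref.put(released.append)   # one reference still held
    ref.put(released.append)   # last reference dropped, release runs
    print(len(released))
```

Tying the release callback to the final put is what removes the
open-coded "decrement, test for zero, then free" sequences that come
with a plain atomic_t.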

Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I58b5e8311a34b3b128c1440b93958389b0fcdd48
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51938
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months ago LU-16831 tests: add version check to sanity-pfl/0e 30/51930/2
Jian Yu [Fri, 11 Aug 2023 19:58:27 +0000 (12:58 -0700)]
LU-16831 tests: add version check to sanity-pfl/0e

This patch adds MDS version check to sanity-pfl test 0e
to avoid interop test failure.

Test-Parameters: trivial \
serverjob=lustre-b2_15 serverbuildno=67 \
env=ONLY=0e testlist=sanity-pfl

Test-Parameters: trivial env=ONLY=0e testlist=sanity-pfl

Change-Id: I79df1f36f07f6b376525364708eacc687f85a061
Fixes: a250ecb959a9 ("LU-16831 lfs: limit stripe count for component size")
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51930
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months ago LU-16796 target: Change struct top_multiple_thandle to use kref 22/51922/2
Arshad Hussain [Thu, 10 Aug 2023 12:30:46 +0000 (18:00 +0530)]
LU-16796 target: Change struct top_multiple_thandle to use kref

This patch changes struct top_multiple_thandle to use
kref(refcount_t) instead of atomic_t

Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I5892e5ab14ea6570645e6395af6d8a0d2c325398
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51922
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months ago LU-17018 build: add 'linux-image-generic' as Depends 79/51879/5
Raphael Druon [Mon, 7 Aug 2023 07:26:09 +0000 (01:26 -0600)]
LU-17018 build: add 'linux-image-generic' as Depends

Add 'linux-image-generic >= 3.10' as a dependency for Debian dkms
package for Ubuntu support

Test-Parameters: trivial
Fixes: 621e0bc2f9 ("LU-16661 build: improve lustre.spec.in Requires")
Signed-off-by: Raphael Druon <rdruon@ddn.com>
Change-Id: Ie8bacbd55c379632d5554de8d72606c818c1771e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51879
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
8 months ago LU-16796 ptlrpc: Change struct lsi_mounts to use kref 64/51864/4
Arshad Hussain [Mon, 31 Jul 2023 10:21:48 +0000 (15:51 +0530)]
LU-16796 ptlrpc: Change struct lsi_mounts to use kref

This patch changes struct lsi_mounts to use
kref(refcount_t) instead of atomic_t

Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ia185b19123f535f8c54a6ea6b7a0212fbe85ffea
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51864
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months ago LU-17003 dne: remove REP-ACK support in DNE system 51/51851/3
Lai Siyao [Mon, 10 Jul 2023 22:49:50 +0000 (18:49 -0400)]
LU-17003 dne: remove REP-ACK support in DNE system

DNE system doesn't need to support REP-ACK. In the old implementation,
write locks are kept in PW|EX mode after transaction stop, and will
be downgraded to TXN mode till REP-ACK, and then not released until
transaction commit.

While in the period between transaction stop and REP-ACK, any read
lock request will be on hold till downgrade; with this change, this
read lock request will succeed immediately.  During this period, any
write lock request may involve extra commit, since mdt_blocking_ast()
does not know whether transaction has stopped, so it needs to trigger
commit-on-sharing immediately, and also set 'sync' flag in the lock.
If transaction is not stopped yet, later when it's stopped, it will
trigger another commit-on-sharing since the 'sync' flag is set.

With this change, mdt_blocking_ast() only needs to set 'sync' flag if
its mode is PW|EX, and trigger commit-on-sharing once upon unlock.
This reduces the number of transaction commits and may improve
performance in some corner cases.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I159a0ad619afd10e97be3dc175a6b4ed77b31142
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51851
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
8 months ago LU-16796 ptlrpc: Change struct ls_device to use kref 11/51811/6
Arshad Hussain [Mon, 31 Jul 2023 04:27:00 +0000 (09:57 +0530)]
LU-16796 ptlrpc: Change struct ls_device to use kref

This patch changes struct ls_device to use
kref(refcount_t) instead of atomic_t

Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Iba3965ef884ef65ab2d379ed389dfbea4ef8a453
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51811
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
8 months ago LU-16999 lnet: Restore lpni aliveness check 91/51791/2
Chris Horn [Tue, 25 Apr 2023 18:53:46 +0000 (13:53 -0500)]
LU-16999 lnet: Restore lpni aliveness check

This is a revert of the following master change:

Lustre-change: https://review.whamcloud.com/46623/
Lustre-commit: caf6095ade66f70d4bad99ced7a918814a3af092

That patch restored the historic behavior of the LNet router peer
health feature, but it did not account for the fact that the old lnet
router checker behaved differently than the current implementation
that leverages LNet discovery to perform the router checker pings.
Because of this change to use discovery we can no longer guarantee
that each router end point will be ping'd within the peer aliveness
window, and as a result the router may incorrectly determine that some
peer NIs are not alive.

Revert this change until a long term solution can be found.

Test-Parameters: trivial testlist=sanity-lnet
HPE-bug-id: LUS-11604
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I77f4bd64b616693ab2c91c747bf327c6f71689c4
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51791
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months ago LU-16998 lnet: Error when missing discover arg 90/51790/2
Chris Horn [Tue, 9 May 2023 20:05:21 +0000 (14:05 -0600)]
LU-16998 lnet: Error when missing discover arg

Print an error when a user does not supply a NID argument to the
'lnetctl discover' command.

Test-Parameters: trivial
HPE-bug-id: LUS-11487
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I081137db5490547a69248b7d2e7f7986b6d8612e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51790
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months ago LU-15235 tests: skip sanity/56od in interop 80/51580/2
Andreas Dilger [Wed, 5 Jul 2023 20:07:52 +0000 (14:07 -0600)]
LU-15235 tests: skip sanity/56od in interop

Sanity test_56oc and test_56od were using the btime_supported()
function to check if "lfs find" supported file birth time, but
this did not properly check whether the MDS supported this option.

Remove the btime_supported() check and just use the version, since
this has been around a few releases already.

Fixes: 186b97e68abb ("LU-11971 utils: Send file creation time to clients")
Test-Parameters: trivial testlist=sanity serverversion=2.12.9 env=ONLY=56
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I0c85103c843d3b993e3e112bf5d0da976d3ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51580
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months ago LU-16945 tests: skip sanity test_27Cg in interop 79/51579/2
Andreas Dilger [Wed, 5 Jul 2023 19:47:28 +0000 (13:47 -0600)]
LU-16945 tests: skip sanity test_27Cg in interop

Sanity test_27Cg is testing functionality that was broken in older
MDS versions, but does not have a version check, so it causes testing
to timeout 100% of the time when running on older servers.  Skip it.

Fixes: d96b98ee6b63 ("LU-16693 lod: ENODEV on setstripe with wrong OST#")
Test-Parameters: trivial testlist=sanity env=ONLY=27Cg
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I52b3d4e6a78a0db8f48401b128e22372f3d8a9bd
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51579
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months ago LU-16760 utils: support 'lfs find --attrs' and '-printf %La' 62/51562/8
Sebastien Buisson [Tue, 4 Jul 2023 07:28:37 +0000 (09:28 +0200)]
LU-16760 utils: support 'lfs find --attrs' and '-printf %La'

Add support to "lfs find" to filter on file attribute flags, with the
syntax "[!] --attrs=[^]ATTR[,...]".
Add support to "lfs find" to print file attribute flags with
"-printf %La".

Add sanity-sec test_65 for Encrypted and Immutable flags.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I5e5cfe5c8c8cbed8bb79f3ad6d8116347ecfe6ac
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51562
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
8 months ago LU-16360 osc: fix lu_ref usage 22/51522/2
Alexey Lyashkov [Fri, 2 Dec 2022 08:40:05 +0000 (11:40 +0300)]
LU-16360 osc: fix lu_ref usage

LDLM_LOCK_PUT should be used for a lock found by handle, while
LDLM_LOCK_RELEASE should be paired with a get ref. Fix the usage.

HPE-bug-id: LUS-11365
Test-Parameters: trivial
Fixes: 9c2fb0b29cec ("LU-9679 osc: convert oe_refc to kref")
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Change-Id: Ib720b496b585c915ba20e0651a88c4afdde98e99
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51522
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months ago LU-14697 tests: change performance-sanity to use mdtest 14/51414/22
Alex Deiter [Thu, 22 Jun 2023 13:28:48 +0000 (17:28 +0400)]
LU-14697 tests: change performance-sanity to use mdtest

Replace mdsrate by mdtest in performance-sanity.sh

Test-Parameters: trivial
Test-Parameters: testlist=performance-sanity clientdistro=el7.9
Test-Parameters: testlist=performance-sanity clientdistro=el8.8
Test-Parameters: testlist=performance-sanity clientdistro=el9.2
Test-Parameters: testlist=performance-sanity clientdistro=ubuntu2204
Signed-off-by: Alex Deiter <adeiter@tintri.com>
Change-Id: I1a80bab4ccbe085d3ff8d8b332c8e117e14ea9cb
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51414
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
8 months ago LU-13805 obd: Reserve unaligned DIO connect flag 75/51075/7
Patrick Farrell [Wed, 9 Aug 2023 16:16:25 +0000 (12:16 -0400)]
LU-13805 obd: Reserve unaligned DIO connect flag

Unaligned DIO generally requires only client changes, but
an assert must be removed from ZFS servers for it to work
correctly.  This means we need a connect flag to recognize
whether or not a server running ZFS can safely use
unaligned DIO.

All OSTs will present this flag - to keep things simple -
but if the flag is not present, we'll still do unaligned
DIO to ldiskfs OSTs.

Actual implementation will be in another patch, this one
just creates the flag itself.

Test-Parameters: trivial
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I8b149cc54f4fb11e64182c65f2fbb01f8a3d3868
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51075
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months ago LU-9859 lnet: simplifiy cfs_ip_addr_parse() 42/50842/9
Mr NeilBrown [Tue, 24 Nov 2020 23:10:14 +0000 (10:10 +1100)]
LU-9859 lnet: simplifiy cfs_ip_addr_parse()

cfs_ip_addr_parse() is now always passed a string that it is safe to
modify.  So change the parsing to benefit from this and use standard
tools like strsep().

Note that the 'len' argument is now ignored.  It cannot be removed
without a larger change.

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ice492edf109dca2e411132b891514f0caa535d8c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50842
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
8 months ago LU-10391 obdclass: handle large NIDs for mount strings 62/50362/17
James Simmons [Thu, 3 Aug 2023 20:57:02 +0000 (16:57 -0400)]
LU-10391 obdclass: handle large NIDs for mount strings

Mount strings support using ':' as a delimiter but this is also
part of some NID strings, such as IPv6, so rework class_parse_value()
to only look at ':' when it occurs after '@'.
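
The delimiter rule described above can be sketched as follows. This is
a Python illustration only, not the class_parse_value() code itself; the
function name split_first_value is hypothetical, the example NID strings
are made up, and falling back to the first ':' when no '@' is present is
an assumption of this sketch:

```python
def split_first_value(mount_string):
    """Split off the first value, honouring ':' only after an '@'.

    Colons that appear before the '@' (for example inside an IPv6
    address) are treated as part of the NID, not as delimiters.
    """
    at = mount_string.find("@")
    # Assumption: with no '@' at all, fall back to the first ':'.
    search_from = at + 1 if at >= 0 else 0
    colon = mount_string.find(":", search_from)
    if colon < 0:
        return mount_string, ""
    return mount_string[:colon], mount_string[colon + 1:]


if __name__ == "__main__":
    print(split_first_value("2001:db8::1@tcp:/lustre"))
    print(split_first_value("10.0.0.1@tcp:10.0.0.2@tcp:/lustre"))
```

Applying the function repeatedly to the remainder walks through a
colon-separated list of NIDs one value at a time.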

The mount utilities use the function convert_hostnames() to ensure
the mount string containing an NID is valid. This only works for
small size nids so migrate the function to handle large NIDs. This
should allow mounting with IPv6 or other large NID addresses.

In testing, the userland libcfs_ip_str2addr_size() had bugs that
rendered incorrect NID strings. Fix those issues.

Fixes: b6c702df5d4 ("LU-10391 libcfs: add large-nid string conversion functions.")
Change-Id: Ic9b2a368456ba75ceb5911ac7f75ae00d6123870
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50362
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months ago LU-16962 build: parallel header checks 73/51673/4
Shaun Tancheff [Fri, 14 Jul 2023 09:21:45 +0000 (16:21 +0700)]
LU-16962 build: parallel header checks

Add LB2_CHECK_LINUX_HEADER_SRC and LB2_CHECK_LINUX_HEADER_RESULT
macros to use for running header checks in parallel.

Migrate (most) header checks to parallel and run a subset
early as the results of those tests are required by other
configure tests.

Test-Parameters: trivial
HPE-bug-id: LUS-11710
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ia765261179d25e96912e65e31c81824b4507e604
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51673
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
8 months ago LU-16957 build: Improve parallel --config-cache 37/51637/5
Shaun Tancheff [Fri, 14 Jul 2023 08:34:02 +0000 (15:34 +0700)]
LU-16957 build: Improve parallel --config-cache

The parallel build should consider the configure cache before
adding tests to the parallel build pass.

Track the number of compile tests needed, skip the make when
no build tests are needed.
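
The idea of consulting the cache before queueing work can be sketched
in Python as below. The real logic lives in the autoconf macros; the
function names and the test names used here are hypothetical and exist
only to illustrate the counting and the skip decision:

```python
def pending_compile_tests(all_tests, cache):
    """Return the tests whose results are not already in the cache."""
    return [name for name in all_tests if name not in cache]


def needs_make(all_tests, cache):
    """The parallel make pass is only needed if some test is uncached."""
    return len(pending_compile_tests(all_tests, cache)) > 0


if __name__ == "__main__":
    cache = {"header_a": "yes"}
    tests = ["header_a", "header_b"]
    print(pending_compile_tests(tests, cache))
    print(needs_make(["header_a"], cache))
```

With a fully populated cache nothing is queued and the make step is
skipped, which matches the much shorter cached configure time in the
timings below.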

Also unify libcfs, core, and ldiskfs build passes to a single step.

Configure timings vs master

     master       master w/cache  |     patch         patch w/cache
 --------------   --------------- | ---------------  ----------------
 real  1m3.493s   real  0m34.024s | real  1m3.903s    real  0m8.404s
 user 1m34.587s   user  1m16.547s | user  1m37.191s   user  0m4.292s
 sys  0m35.119s   sys   0m22.687s | sys   0m35.297s   sys   0m5.514s

Test-Parameters: trivial
HPE-bug-id: LUS-11706
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I6696b350e8315190a67c1463435b18a87d45813e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51637
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
8 months ago LU-13805 llite: Add copy of iovec to sub-dio 40/49940/28
Patrick Farrell [Wed, 8 Feb 2023 04:09:17 +0000 (23:09 -0500)]
LU-13805 llite: Add copy of iovec to sub-dio

It's very useful to move some of the direct I/O processing
from the main thread to the ptlrpc threads.  This is done
by associating the processing with each sub DIO, or DIO
'chunk'.  This requires a local copy of the iovec in each
chunk, because we:
A) need the chunk-current state of the iovec (as we move
along the iovec as chunks are created)
B) some operations will modify the iovec, and so to do
them from multiple ptlrpc threads, each needs to work on a
separate copy of the iovec.

This will be used by copy_page_to_iter in completing
unaligned DIO reads.

This has been split out from the main unaligned DIO patch
for simplification.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I5645e904b6f9423eafc69cc0f59349cb3dcb9920
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49940
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
8 months ago LU-13805 clio: Add write to sdio 91/49991/30
Patrick Farrell [Tue, 14 Feb 2023 18:33:35 +0000 (13:33 -0500)]
LU-13805 clio: Add write to sdio

Unaligned DIO will need to know if an sdio is a write or
a read, so we add this info to the sub-dio.

Test-Parameters: trivial
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I8ee042ca5a0461db672ba98b7fa6c64b01a8bba2
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49991
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
8 months ago LU-13805 tests: Add racing tests of BIO, DIO 29/50529/27
Patrick Farrell [Tue, 4 Apr 2023 18:53:08 +0000 (14:53 -0400)]
LU-13805 tests: Add racing tests of BIO, DIO

We're a bit short on racing tests for buffered IO and
direct IO.  This patch adds a number of tests.  These
were originally part of the unaligned DIO patch, but
some of them have shown issues without unaligned IO.

So this patch puts in these tests with only aligned IO so
we can see which failures are specific to the unaligned IO
changes and which are not.

This patch should be landable like this.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I861bcaec785936cb9c3752f8148dcab4054f6078
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50529
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
8 months ago LU-7073 tests: Add file migration to racer 69/13669/25
Henri Doreau [Fri, 6 Feb 2015 09:01:36 +0000 (10:01 +0100)]
LU-7073 tests: Add file migration to racer

Make racer run both blocking and non-blocking "lfs migrate" commands.
Implement this within the file_create.sh script, since it is already
selecting among different layout types.

Update Makefile.am to avoid listing every racer filename explicitly
to make it easier to add new types of operations in the future.

Test-Parameters: trivial testlist=racer,racer,racer
Test-Parameters: fstype=zfs testlist=racer,racer,racer
Signed-off-by: Henri Doreau <henri.doreau@cea.fr>
Signed-off-by: Jinshan Xiong <jinshan.xiong@intel.com>
Change-Id: I51b3f19c78029ff47102e96a71ec4a0fc472183a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/13669
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months ago LU-930 misc: update MAINTAINERS addresses 25/52025/3
Andreas Dilger [Sun, 26 Mar 2023 23:40:19 +0000 (17:40 -0600)]
LU-930 misc: update MAINTAINERS addresses

Update email addresses for various maintainers, and remove those
people who are no longer working on Lustre.

Change "M:" to "R:", since "M:" means "Mail to" and not "Maintainer"
as I thought.  Use "R:" for "Reviewer", and remove other tags that
we don't want in this file for Lustre.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I3e73963b08181154fa48f308cb3d1d0a533ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52025
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
8 months ago LU-16997 kfilnd: Correct RX buffer size 89/51789/2
Chris Horn [Thu, 27 Jul 2023 15:47:22 +0000 (09:47 -0600)]
LU-16997 kfilnd: Correct RX buffer size

The immediate receive buffers are large buffers where incoming
messages are byte packed into the buffer until all buffer space is
exhausted. The size of the buffer is kkfilnd module parameter
credits * 4096. The number of buffers is controlled by kkfilnd module
parameter immediate_rx_buf_count.

With the current defaults this results in only 1MiB of buffers per RX
context (i.e. CPT) to sync these messages. While kfilnd tries to
replenish these buffers as fast as possible, it may not be fast enough
and replenishing can be delayed based on CPU availability. Change
default credits to 512 so that we have 8x 2MiB buffers.
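
The sizing arithmetic can be checked with a short Python sketch. The
4096-byte multiplier, the credits value of 512 and the count of 8
buffers come from the text above; the function and constant names are
hypothetical:

```python
RX_BYTES_PER_CREDIT = 4096  # each credit contributes 4096 bytes of buffer


def rx_buffer_bytes(credits):
    """Size of one immediate receive buffer: credits * 4096."""
    return credits * RX_BYTES_PER_CREDIT


def total_rx_bytes(credits, immediate_rx_buf_count):
    """Total immediate receive buffer space for one RX context."""
    return rx_buffer_bytes(credits) * immediate_rx_buf_count


if __name__ == "__main__":
    mib = 1024 * 1024
    print(rx_buffer_bytes(512) // mib, "MiB per buffer")
    print(total_rx_bytes(512, 8) // mib, "MiB per RX context")
```

With credits set to 512, each buffer is 2 MiB, so eight buffers give
16 MiB per RX context.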

Test-Parameters: trivial
HPE-bug-id: LUS-10548
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I83f813ba2e295e6087131dcdfb12fc0feebb4834
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51789
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months ago LU-16996 kfilnd: Wrong traffic class assigned 88/51788/2
Ron Gredvig [Thu, 15 Dec 2022 20:04:20 +0000 (14:04 -0600)]
LU-16996 kfilnd: Wrong traffic class assigned

The wrong traffic class was being assigned to a transmit
context when multiple networks were assigned to the same
interface.

This was discovered by noticing a currently unsupported
traffic class didn't fail when it was used. The traffic
class from the shared domain was being used instead.

This was fixed by explicitly specifying the traffic
class when creating a transmit context for an endpoint.

Test-Parameters: trivial
HPE-bug-id: LUS-11415
Signed-off-by: Ron Gredvig <ron.gredvig@hpe.com>
Change-Id: I21236c01d4bef53b62e2f303c8e24e059ce83c0a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51788
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
8 months agoLU-16995 kfilnd: Handle TAG_RX_OK in TN_STATE_FAIL 87/51787/2
Chris Horn [Thu, 13 Apr 2023 15:36:37 +0000 (09:36 -0600)]
LU-16995 kfilnd: Handle TAG_RX_OK in TN_STATE_FAIL

It is possible for the fabric to delay packets such that the retry
handler cancels the message but it is still delivered to the target.
If the timing is right then the initiator may receive a TAG_RX_OK
event after the transaction has transitioned to TN_STATE_FAIL. This
currently trips an LBUG, but instead we can allow the transaction to
complete normally.

Test-Parameters: trivial
HPE-bug-id: LUS-11572
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I381d64713a7942fed09d41b30f64be602193057f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51787
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-16994 kfilnd: Add work queue parameters 86/51786/2
Ron Gredvig [Thu, 20 Apr 2023 19:12:08 +0000 (19:12 +0000)]
LU-16994 kfilnd: Add work queue parameters

Added kfilnd work queue parameters to allow tuning.

The wq_high_priority parameter enables the work queue to run
as high priority. Default is enabled.

The wq_cpu_intensive parameter enables the work queue to run
as cpu intensive. Default is disabled.

The wq_max_active parameter sets the max in-flight work items
of the work queue. Default is 512.

The prov_cpu_exclusive parameter enables reserving one of a
CPT's CPUs for the exclusive use of the kfabric provider.
Default is disabled.

Test-Parameters: trivial
HPE-bug-id: LUS-11605
Signed-off-by: Ron Gredvig <ron.gredvig@hpe.com>
Change-Id: Ic4db95787a864efca3ea1234953ce3ea828f3594
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51786
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
8 months agoLU-16993 kfilnd: Add count value to debugfs stats 85/51785/2
Ron Gredvig [Fri, 12 May 2023 19:44:48 +0000 (19:44 +0000)]
LU-16993 kfilnd: Add count value to debugfs stats

The debugfs stats for initiator and target include
min, max and average. It is hard to interpret the
average without knowing how many samples were included
to calculate it.

Added the count value for extra context.
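
A minimal sketch of why the count matters, assuming a simple
accumulator. The struct and function names are hypothetical and are
not the kfilnd debugfs code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical duration tracker; field names are illustrative only. */
struct duration_stat {
	uint64_t min;
	uint64_t max;
	uint64_t sum;
	uint64_t count;
};

/* Record one sample, updating min, max, sum and count. */
static inline void duration_stat_add(struct duration_stat *s, uint64_t sample)
{
	assert(s);
	if (s->count == 0 || sample < s->min)
		s->min = sample;
	if (sample > s->max)
		s->max = sample;
	s->sum += sample;
	s->count++;
}

/* Average over all samples; 0 when nothing has been recorded yet. */
static inline uint64_t duration_stat_avg(const struct duration_stat *s)
{
	assert(s);
	return s->count ? s->sum / s->count : 0;
}
```

Reporting count next to the average lets a reader tell an average of
three samples apart from an average of three million.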

Test-Parameters: trivial
HPE-bug-id: LUS-11627
Signed-off-by: Ron Gredvig <ron.gredvig@hpe.com>
Change-Id: I3d840c250653b4f29b40c169254b9c9b4c88f584
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51785
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
8 months agoLU-16992 kfilnd: Expand debugfs to record min/max 84/51784/2
Ian Ziemba [Wed, 22 Feb 2023 23:56:21 +0000 (17:56 -0600)]
LU-16992 kfilnd: Expand debugfs to record min/max

This will be useful in determining how long kfilnd transactions take
when underlying NIC gets congested.

Test-Parameters: trivial
HPE-bug-id: LUS-11497
Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Change-Id: I5506329086e6284e04ec7b609485d582a35ca0b5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51784
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-16991 kfilnd: Init LNet NI data in dev alloc 83/51783/2
Ian Ziemba [Wed, 22 Feb 2023 22:00:43 +0000 (16:00 -0600)]
LU-16991 kfilnd: Init LNet NI data in dev alloc

LNet ni_nid was being set outside of kfilnd_dev_alloc(). This was
causing the incorrect debugfs directories to be generated inside
kfilnd_dev_alloc().

Fix this by setting all LNet NI fields inside kfilnd_dev_alloc().

Test-Parameters: trivial
HPE-bug-id: LUS-11496
Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Change-Id: I4eecfa05966cb7793a01b92b0bc49ffca252976e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51783
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-16990 kfilnd: Use NETWORK_TIMEOUT for TAG_RX_CANCEL 82/51782/2
Chris Horn [Thu, 9 Mar 2023 00:18:41 +0000 (18:18 -0600)]
LU-16990 kfilnd: Use NETWORK_TIMEOUT for TAG_RX_CANCEL

We can get ECANCELED for some tagged receives which results in
transaction failure with TN_EVENT_TAG_RX_CANCEL. This can occur due
to problems with either the source or the target, so we should
use NETWORK_TIMEOUT message status.

Test-Parameters: trivial
HPE-bug-id: LUS-11520
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ic3c1910f8a8c43447cbbc28129e23350e726830d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51782
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-16989 kfilnd: Handle TX_FAIL in WAIT_SEND_COMP 81/51781/3
Chris Horn [Mon, 12 Dec 2022 23:28:54 +0000 (16:28 -0700)]
LU-16989 kfilnd: Handle TX_FAIL in WAIT_SEND_COMP

It is possible for us to get a TN_EVENT_TX_FAIL while transaction is
in TN_STATE_WAIT_SEND_COMP state. We should gracefully handle this
situation rather than LBUG.

Test-Parameters: trivial
HPE-bug-id: LUS-11344
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ib6fc5ed41f12762843fe9f638ffd523699936556
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51781
Tested-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-16393 o2iblnd: add IBLND_REJECT_EARLY reject reason 51/51651/3
Serguei Smirnov [Thu, 13 Jul 2023 00:29:56 +0000 (17:29 -0700)]
LU-16393 o2iblnd: add IBLND_REJECT_EARLY reject reason

Add IBLND_REJECT_EARLY reason for rejecting connection request:
to be used when the device doesn't have any nets added yet or
when there are no active NIs on the net to handle the connection.
These conditions are supposed to occur only when LNI is being
added/initialized, so report at CNETERROR level vs. CERROR.

In lnet, set NI state to ACTIVE only after it has been added
to the list of NIs for the net, so that LND can know that
the NI can be used to accept connections.

Test-parameters: trivial
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I59efb2fdf5d5ceabb6ff23f638ec85da82d57b99
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51651
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-17021 socklnd: fix late ksnr_max_conns set 90/51890/4
Cyril Bordage [Tue, 8 Aug 2023 13:06:23 +0000 (15:06 +0200)]
LU-17021 socklnd: fix late ksnr_max_conns set

ksnr_max_conns was set to the correct value after it was used.

Test-Parameters: trivial
Fixes: a5cbe7883db6 ("LU-12815 socklnd: allow dynamic setting of conns_per_peer")
Signed-off-by: Cyril Bordage <cbordage@whamcloud.com>
Change-Id: I9f2454d915ee1ab27db96f5247028db94965a11f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51890
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
8 months agoLU-17010 lfsck: don't create trans in dryrun mode 49/51849/3
Hongchao Zhang [Sat, 22 Jul 2023 08:29:57 +0000 (16:29 +0800)]
LU-17010 lfsck: don't create trans in dryrun mode

In LFSCK, the LFSCK transaction should not be created in
dryrun mode, which is related to the following patch,

Fixes: 0c1ae1cb9c19 ("LU-13124 scrub: check for multiple linked file")
Change-Id: Id543bc3c0e300c1cc14b670d724ebcacac3bf71b
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51849
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
8 months agoLU-17000 lnet: fix use-after-free in lnet_startup_lndnet 06/51806/2
Timothy Day [Sat, 29 Jul 2023 19:58:47 +0000 (19:58 +0000)]
LU-17000 lnet: fix use-after-free in lnet_startup_lndnet

If the lnd_startup function returns a positive
error code, the ni will get freed. But the code
incorrectly checks only for negative error codes,
leading to a potential use-after-free.
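
The pattern can be sketched in userspace C as follows. All names here
are hypothetical stand-ins and this is not the lnet_startup_lndnet()
code; the point is only that any non-zero return is treated as
failure, so the freed object is never touched again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a network interface object. */
struct ni_sketch {
	int active;
};

/* Hypothetical startup callbacks: success, negative and positive error. */
static inline int startup_ok(struct ni_sketch *ni)      { (void)ni; return 0; }
static inline int startup_neg_err(struct ni_sketch *ni) { (void)ni; return -22; }
static inline int startup_pos_err(struct ni_sketch *ni) { (void)ni; return 22; }

/*
 * Start an interface. On any non-zero return code the object is freed
 * and the caller's pointer is cleared. Checking "rc < 0" here instead
 * would let a positive error fall through to the code below, which
 * dereferences the object.
 */
static inline bool start_ni(struct ni_sketch **nip,
			    int (*startup)(struct ni_sketch *))
{
	int rc;

	assert(nip != NULL && *nip != NULL);
	rc = startup(*nip);
	if (rc != 0) {
		free(*nip);
		*nip = NULL;
		return false;
	}
	(*nip)->active = 1;
	return true;
}
```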

Addresses-Coverity-ID: 397786 ("Use after free")

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I36dd4dbfc0b409de010257e5d9ae9d983fd1817f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51806
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-16974 utils: lfs mirror resync to show progress 50/51750/12
Alex Zhuravlev [Wed, 2 Aug 2023 11:46:28 +0000 (14:46 +0300)]
LU-16974 utils: lfs mirror resync to show progress

lfs mirror resync should be able to:
 - show progress like lfs mirror extend --stats does
 - throttle like lfs mirror extend -W does

use 64MB buffer for mirror resync by default.

Change-Id: Ibe60748542ff4a3731aa6a4a9907be82427a0ae9
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51750
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-8191 lnet: convert functions in utils to static 27/51427/3
Timothy Day [Fri, 23 Jun 2023 20:44:17 +0000 (20:44 +0000)]
LU-8191 lnet: convert functions in utils to static

Static analysis shows that a number of functions
could be made static. This patch declares several
functions in various LNet utils and lnetconfig as
static.

In LNet selftest (lst), one unused function was
removed entirely. Some declarations were moved so
that they could be made static.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Ia4528281b3c87d77e46abb95f47ab0bdc72168c0
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51427
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-16883 ldiskfs: update for ext4-delayed-iput for SUSE 15 52/51252/3
Shaun Tancheff [Thu, 8 Jun 2023 06:32:17 +0000 (13:32 +0700)]
LU-16883 ldiskfs: update for ext4-delayed-iput for SUSE 15

ext4-delayed-iput patch does not apply cleanly to SUSE 15
SP4 and SP5 series 5.14.21 kernel.

Adjust the minor conflict in ext4_put_super()

Test-Parameters: trivial
Fixes: 616fa9b581 ("LU-15404 ldiskfs: use per-filesystem workqueues to avoid deadlocks")
HPE-bug-id: LUS-11661
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Iee424bd6d455853d9f82e6e5b08e4ab44deb432c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51252
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-16862 rpm: set kmod-lustre-tests requires kmod-lustre explicitly 91/51191/6
Xinliang Liu [Thu, 1 Jun 2023 10:00:04 +0000 (10:00 +0000)]
LU-16862 rpm: set kmod-lustre-tests requires kmod-lustre explicitly

Kmod-lustre-tests rpm should be installed along with kmod-lustre rpm
if there is one.

Test-Parameters: trivial
Change-Id: Ib265298381c317a03c4244f8ea380c6d64f0aef5
Signed-off-by: Xinliang Liu <xinliang.liu@linaro.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51191
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-9859 libcfs: remove cfs_size_round() 09/51009/4
James Simmons [Tue, 16 May 2023 13:59:15 +0000 (09:59 -0400)]
LU-9859 libcfs: remove cfs_size_round()

Now that everyone is moved to round_up() we can safely remove the
macro cfs_size_round().
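
For readers unfamiliar with round_up(), a userspace sketch of
power-of-two rounding is shown below. The function name is
hypothetical. The assumption that cfs_size_round() rounded up to a
multiple of 8 is not stated in this message, and the kernel's
round_up() is a macro rather than a function.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Round x up to the next multiple of align, where align must be a
 * power of two. Same result as the kernel's round_up(x, align) for
 * such alignments.
 */
static inline size_t round_up_pow2(size_t x, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}
```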

Test-Parameters: trivial
Change-Id: If8e1aff5e89007eeb38c5810c68282d51e37f019
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51009
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-9859 lnet: simplify cfs_parse_nidlist() 41/50841/11
Mr NeilBrown [Tue, 25 Jul 2023 18:47:27 +0000 (14:47 -0400)]
LU-9859 lnet: simplify cfs_parse_nidlist()

By duplicating the string being parsed, we can use mutating parsing
primitives and simplify the code.
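
The duplicate-then-mutate approach can be sketched in userspace C
with strdup() and strsep(). The function below is hypothetical and
only counts tokens; the single-space delimiter is an assumption and
this is not the cfs_parse_nidlist() implementation.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Count the non-empty, space-separated tokens in list without
 * modifying the caller's string. Returns the count, or -1 if the
 * private copy cannot be allocated.
 */
static inline int count_tokens(const char *list)
{
	char *copy, *cursor, *tok;
	int n = 0;

	assert(list != NULL);
	copy = strdup(list);		/* private copy that strsep() may modify */
	if (!copy)
		return -1;

	cursor = copy;
	while ((tok = strsep(&cursor, " ")) != NULL) {
		if (*tok != '\0')	/* skip empty tokens from repeated spaces */
			n++;
	}

	free(copy);
	return n;
}
```

Because strsep() writes NUL bytes into the copy, each token can be
handled as an ordinary C string with no length bookkeeping.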

Note that the kernel-space cfs_parse_nidlist() is now different from
the user-space version.  As they are both declared in the same header
file, this needs an #ifdef until the headers can be separated.

Change some functions that return 0 on error to match the kernel
convention of 0 on success and -ve on error.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I1d5155a1cee82f798bec0863d80d75af92399cf1
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50841
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-9859 llite: simplify pcc_fname_list_parse() 40/50840/6
Mr NeilBrown [Tue, 25 Jul 2023 18:34:49 +0000 (14:34 -0400)]
LU-9859 llite: simplify pcc_fname_list_parse()

Now that pcc_fname_list_parse() is passed a mutable string, mutable
parsing with standard tools can be used.

Test-Parameters: trivial testlist=sanity-pcc
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I8da6720ae6822c410a339b5fff9fdd4256bb3685
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50840
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-9859 ptlrpc: simplify nrs_tbf_opcode_list_parse() 37/50837/13
Mr NeilBrown [Wed, 19 Jul 2023 18:15:57 +0000 (14:15 -0400)]
LU-9859 ptlrpc: simplify nrs_tbf_opcode_list_parse()

If nrs_tbf_opcode_list_parse() duplicates the string it is passed, it
can use standard mutating functions for parsing the string.

Test-Parameters: trivial
Test-Parameters: testlist=sanityn
Test-Parameters: testlist=sanity
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I4acb57de52abde97fa9c2d133cf10a432b12f604
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50837
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Etienne AUJAMES <eaujames@ddn.com>
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-16673 tests: add checking for user USER0 01/50501/2
Xinliang Liu [Mon, 3 Apr 2023 03:47:23 +0000 (03:47 +0000)]
LU-16673 tests: add checking for user USER0

Add checking for user USER0 in tests 125, 154a, 154b.

Test-Parameters: trivial testlist=sanity env=ONLY="125 154a 154b"

Change-Id: Id42d4b6dca4c05757d02483ddedd65be55df96d6
Signed-off-by: Xinliang Liu <xinliang.liu@linaro.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50501
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-16374 enc: rename O_FILE_ENC to O_CIPHERTEXT 40/51640/4
Sebastien Buisson [Wed, 12 Jul 2023 13:32:26 +0000 (15:32 +0200)]
LU-16374 enc: rename O_FILE_ENC to O_CIPHERTEXT

Rename O_FILE_ENC to O_CIPHERTEXT as per discussion in linux-fscrypt
mailing-list.
Also change the flag combination to be:
O_NOCTTY | O_NDELAY | O_DSYNC
to avoid the risk of accidental issues with tar that already opens
files with the 'O_NOCTTY | O_NDELAY' combination.
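
A sketch of how such a combined flag can be tested, assuming a Linux
userspace build. The macro and function names are hypothetical; only
the three O_* flags come from the text above.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>

/* Hypothetical name for the flag combination described above. */
#define SKETCH_O_CIPHERTEXT (O_NOCTTY | O_NDELAY | O_DSYNC)

/* O_DSYNC must add a bit, otherwise tar's flags would match too. */
static_assert((O_NOCTTY | O_NDELAY) != SKETCH_O_CIPHERTEXT,
	      "O_DSYNC must be distinct from O_NOCTTY | O_NDELAY");

/* True only if every bit of the combination is set in open_flags. */
static inline bool wants_ciphertext(int open_flags)
{
	return (open_flags & SKETCH_O_CIPHERTEXT) == SKETCH_O_CIPHERTEXT;
}
```

An open with only O_NOCTTY | O_NDELAY, as tar does, fails this check
because the O_DSYNC bit is missing.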

O_DSYNC does not make much sense for O_RDONLY files, but will force
writes on encrypted restore to be synchronous. With O_DIRECT and large
enough writes (32MB?) that might be OK, but not ideal for small files.

Fixes: fdbf2ffd41 ("LU-14677 sec: no encryption key migrate/extend/resync/split")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I36fed17a413ee690bc445c3e76674ed5fc337de5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51640
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
8 months agoLU-16851 llite: use right predicate in ll_atomic_open 47/51147/7
Vladimir Saveliev [Fri, 28 Jul 2023 14:44:33 +0000 (10:44 -0400)]
LU-16851 llite: use right predicate in ll_atomic_open

Using d_unhashed() brings a race window with d_add() and d_drop()
leading to dentry hash table corruption. If dentry which is in hash
already is added to hash table, it gets looped to itself via next
pointer:
        dentry 0xffff8fd34cc08840
        ...
                d_hash = {
                       next = 0xffff8fd34cc08848,
                       pprev = 0x0
                },

See for reference:
commit 00699ad8571afd7fb8bc2c61f67c86c2428680ab
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Tue Jul 5 09:44:53 2016 -0400

    Use the right predicate in ->atomic_open() instances

Keep using d_unhashed() if d_in_lookup() is not provided by kernel.

HPE-bug-id: LUS-11560
Signed-off-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Change-Id: I6c27f031d0d7e7d571752d6172a32406ad68e913
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51147
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-16962 build: parallel configure cleanup 70/51670/5
Shaun Tancheff [Fri, 14 Jul 2023 09:09:38 +0000 (16:09 +0700)]
LU-16962 build: parallel configure cleanup

LC_REGISTER_SHRINKER_FORMAT_NAMED macro should use
  register_shrinker_format

LC_HAVE_FILEMAP_GET_FOLIOS_CONTIG macro should use
  filemap_get_folios_contig

LN_SRC_CONFIG_STRSCPY_EXISTS and LN_CONFIG_STRSCPY_EXISTS
  are not defined. Remove the references.

Test-Parameters: trivial
HPE-bug-id: LUS-11709
Fixes: 0006eb3644 ("LU-16328 llite: migrate_folio, vfs_setxattr")
Fixes: ca992899d5 ("LU-16351 llite: Linux 6.1 prandom, folios_contig, vma_iterator")
Fixes: 7fe7f4ca06 ("LU-16520 build: Move strscpy to libcfs common header")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I0cb630d035a23edfa353040f4c0d25c46eb417d8
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51670
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: xinliang <xinliang.liu@linaro.org>
8 months agoLU-8130 nrs: convert NRS ORR/TRR to rhashtable 13/40113/24
James Simmons [Tue, 25 Jul 2023 18:14:03 +0000 (14:14 -0400)]
LU-8130 nrs: convert NRS ORR/TRR to rhashtable

Move away from the cfs hash implementation to rhashtable
for NRS ORR handling. Since rhashtable is lockless it
should also increase performance. For the NRS TRR handling
we can use Xarray since it is keyed on the OST index. OST indices
are sequential, so this will give better performance than a hashtable.
TRR entries are added to the Xarray but not removed until the
Xarray is destroyed.

Test-Parameters: testlist=sanityn env=ONLY=77c+77d,ONLY_REPEAT=100
Change-Id: I5206a7586d4b4c8991b7163fd9253f017e6d3969
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/40113
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Etienne AUJAMES <eaujames@ddn.com>
8 months agoLU-8130 libcfs: don't use radix tree for xarray 40/51840/7
James Simmons [Wed, 2 Aug 2023 18:36:11 +0000 (14:36 -0400)]
LU-8130 libcfs: don't use radix tree for xarray

For newer kernels the radix tree is totally based on Xarray. For Lustre
support for RHEL7 we backported Xarray but it still was using the
radix tree. There is a mismatch between what the radix tree expects
and using a struct xa_node when allocating and freeing memory. Instead
abandon all use of the radix tree with Xarray. We use our own private
kmem cache which is based on radix tree but it uses xa_node.

Test-Parameters: trivial
Fixes: 84e12028be9a ("LU-9859 libcfs: add support for Xarray")
Change-Id: I87607aa0e55a4aca039f2fef5a76fbff0bedd9b3
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51840
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-16973 ptlrpc: flush delayed file desc if idle 05/51805/6
Andreas Dilger [Sat, 29 Jul 2023 08:59:28 +0000 (02:59 -0600)]
LU-16973 ptlrpc: flush delayed file desc if idle

The use of alloc_file_pseudo() allocates a real file descriptor,
so fput() will use a deferred cleanup for the descriptor, either
when the thread "finishes the syscall" (which never happens for
kernel threads), or at unmount time.  This accumulates too many
file descriptors (millions) on a busy system.

Instead of waiting to clean up these file descriptors at unmount
time, call flush_delayed_fput() to clean them up when a ptlrpcd
thread becomes idle before it goes to sleep.

For kernels 3.6 and later when flush_delayed_fput() was first added,
and before kernel 5.4 when it was EXPORT_SYMBOL'd, grab a pointer
to the function with kallsyms_lookup_name() so it can be called.

Delete LN_CONFIG_STRSCPY_EXISTS reference that generates configure
warnings, since this check was renamed and moved to libcfs.

Fixes: eed43b2a427b ("LU-13783 osd-ldiskfs: use alloc_file_pseudo to create fake files")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I24a08f9568d7d636a69672c5c3132ab25b292407
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51805
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
8 months agoLU-16847 ldiskfs: refactor bio submit. 36/51236/9
Alexey Lyashkov [Tue, 30 May 2023 08:30:02 +0000 (11:30 +0300)]
LU-16847 ldiskfs: refactor bio submit.

Extract common code into the osd_submit_bio() function to avoid
code duplication.
Fix the debug output for the last bio, which is currently lost from
the output.

HPE-bug-id: LUS-11645
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Change-Id: I5f957a832b73ad08ccccfba7866393d93d4ae538
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51236
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-13306 mgs: use large NIDS in the nid table on the MGS 96/50896/32
James Simmons [Wed, 19 Jul 2023 18:04:15 +0000 (14:04 -0400)]
LU-13306 mgs: use large NIDS in the nid table on the MGS

On the MGS the NIDs detected are handled using the struct
mgs_target_info which currently only handles lnet_nid_t.
This structure also limits the number of NIDs to 32 entries.
Some sites have reported that 32 NIDs wasn't enough when
they configured virtual LNet networks for isolation.

Update the mgs_target_info to use NID strings instead.
This has the advantage of working even if struct lnet_nid
expands in the future. We place this data at the end of
the mgs_target_info as a flexible array. This requires
updating the ptlrpc packet handling to increase the size
to some new value to contain all the NIDs registered.
Also this gives us the option to use hostnames in the
future. This information is then fed into a
struct mgs_nidtbl_entry which is sent to the mgc on all
the remote nodes. With this patch only large NIDs for
small address spaces are translated to the original
lnet_nid_t format and sent to the various clients.
All the server targets, which are clients of the MGS,
use the large NID format. With this patch we don't
have to patch old clients when the servers are using
the larger NID format.
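
The "NID strings in a flexible array at the end of the structure"
layout can be sketched as below. Everything here is hypothetical: the
struct, its fields, and the 64-byte per-string size are illustrative
assumptions and do not reflect the real mgs_target_info definition or
wire format.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed per-NID string buffer size, for illustration only. */
#define SKETCH_NIDSTR_SIZE 64

/* Hypothetical record ending in a flexible array of NID strings. */
struct target_info_sketch {
	unsigned int nid_count;
	char nidstr[][SKETCH_NIDSTR_SIZE];
};

/* Bytes needed for a record carrying nid_count NID strings. */
static inline size_t target_info_size(unsigned int nid_count)
{
	return sizeof(struct target_info_sketch) +
	       (size_t)nid_count * SKETCH_NIDSTR_SIZE;
}

/* Build a record from an array of strings; NULL on failure. */
static inline struct target_info_sketch *
target_info_alloc(const char *const *nids, unsigned int nid_count)
{
	struct target_info_sketch *ti;
	unsigned int i;

	assert(nids != NULL || nid_count == 0);
	ti = calloc(1, target_info_size(nid_count));
	if (!ti)
		return NULL;

	ti->nid_count = nid_count;
	for (i = 0; i < nid_count; i++) {
		/* Reject strings that do not fit, including the NUL. */
		if (strlen(nids[i]) >= SKETCH_NIDSTR_SIZE) {
			free(ti);
			return NULL;
		}
		strcpy(ti->nidstr[i], nids[i]);
	}
	return ti;
}
```

Since the record size depends on the NID count, the buffer that
carries it has to be sized per message, which is why the packet
handling needs to grow as described above.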

Expand LNetGetId() to return large NID addresses as well.
In the future we will use the ocd_connect_flags to
determine if the MGS supports large NID addresses.

Change-Id: I7083d6ecfc46cf0419a0d4a582e4bf5240f193cd
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50896
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-15978 osp: fix striped directory deletion fails for 64K PAGE_SIZE 12/47812/33
Xinliang Liu [Tue, 28 Jun 2022 08:34:46 +0000 (08:34 +0000)]
LU-15978 osp: fix striped directory deletion fails for 64K PAGE_SIZE

This fixes the rmdir errors below:
rmdir: failed to remove '/mnt/lustre/d1.sanity/d2': Invalid argument
LustreError: 381691:0:(osp_object.c:1998:osp_it_next_page())
lustre-MDT0000-osp-MDT0001: invalid magic (0 != 8a6d6b6c) for page 0/1
while read layout orphan index.

For 64K PAGE_SIZE, when a striped directory is created, e.g. with the
function test_mkdir() defined in test-framework.sh when MDSCOUNT
>= 2, deleting it will fail.

For PAGE_SIZE > LU_PAGE_SIZE, if the last system page is filled with
fewer than LU_PAGE_COUNT lu_idxpages, initialize the headers of the
remaining lu_idxpages so that the clients handle this partial filling
correctly.
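
A userspace sketch of the idea, assuming 4K index pages inside a 64K
system page. The header layout and all names are hypothetical; only
the magic value is taken from the error message quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_LU_PAGE_SIZE  4096u	/* assumed index page size */
#define SK_PAGE_SIZE     65536u	/* 64K system page */
#define SK_LU_PAGE_COUNT (SK_PAGE_SIZE / SK_LU_PAGE_SIZE)
#define SK_IDX_MAGIC     0x8a6d6b6cu	/* from the error message */

/* Hypothetical index page header; not the real on-wire layout. */
struct idxpage_hdr_sketch {
	uint32_t magic;
	uint16_t flags;
	uint16_t nr;	/* number of entries in this index page */
};

/*
 * Write a valid, empty header into every index page after the first
 * "filled" ones. Returns the number of headers initialized.
 */
static inline unsigned int init_remaining_idxpages(unsigned char *page,
						   unsigned int filled)
{
	unsigned int i;

	assert(page != NULL && filled <= SK_LU_PAGE_COUNT);
	for (i = filled; i < SK_LU_PAGE_COUNT; i++) {
		struct idxpage_hdr_sketch hdr = {
			.magic = SK_IDX_MAGIC,
			.nr = 0,
		};

		memcpy(page + (size_t)i * SK_LU_PAGE_SIZE, &hdr, sizeof(hdr));
	}
	return SK_LU_PAGE_COUNT - filled;
}
```

A reader that walks all 16 index pages then finds a valid magic and
zero entries in the unused ones, rather than a magic of 0.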

Also make goto labels meaningful and avoid not freeing pages for
lip_nr == 0 in osp_it_next_page().

This patch also fixes wrong page idx for page kunmap in
dt_index_walk().

This server-side fix is also necessary for the idxpage-reading clients
nodemap_process_idx_pages() and qsd_reint_entries(). So this patch also
includes the fix for LU-15992: nodemap create and check failed on 64K page
size.

Fixes: 77eea1985bb1 ("LU-3336 lfsck: orphan OST-objects iteration")
Change-Id: I75bd9603c31bed8ea15fdba693677d41affaf61c
Signed-off-by: Xinliang Liu <xinliang.liu@linaro.org>
Co-authored-by: Kevin Zhao <kevin.zhao@linaro.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/47812
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
8 months agoLU-15722 osd-ldiskfs: fix write stuck for 64K PAGE_SIZE 63/47563/23
Xinliang Liu [Mon, 6 Jun 2022 08:59:54 +0000 (08:59 +0000)]
LU-15722 osd-ldiskfs: fix write stuck for 64K PAGE_SIZE

This reverts a previous commit for large PAGE_SIZE to fix a stuck IO
issue in another way.

One more ldiskfs_map_blocks() call can't fix the stuck write for PAGE_SIZE
> BLOCK_SIZE. It still gets stuck in some tests like sanity-dom fsx,
because each ldiskfs_map_blocks() lookup only returns one
contiguous range of physical blocks. If a page has multiple contiguous
block ranges, then it needs multiple ldiskfs_map_blocks() lookups to
find all the already mapped blocks.

The idea of the fix is to record the already written blocks of the
start page and skip them at the next write retry.

This also fixes and cleans up osd_mark_page_io_done() when start_blocks
is non-zero.

Fixes: 176ea3a4599e ("LU-15722 osd-ldiskfs: fix IO write gets stuck for 64K PAGE_SIZE")
Change-Id: I9c14d5d0aa23e81837dacb01d050c091e6a79148
Signed-off-by: Xinliang Liu <xinliang.liu@linaro.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/47563
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
8 months agoLU-16872 tests: exercise sanity test_27M more fully 02/51602/4
Andreas Dilger [Fri, 7 Jul 2023 19:57:52 +0000 (13:57 -0600)]
LU-16872 tests: exercise sanity test_27M more fully

Improve the sanity.sh test_27M to precreate a bunch of files with
specific OST striping so that it is more likely to trigger the code
path that accessed the stale OST list when using O_APPEND layout.

Also clean up code style in the rest of this subtest.

Test-Parameters: trivial testlist=sanity env=ONLY=27M,ONLY_REPEAT=200
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ie94e3a32fc48198e4e15f44a55d1f8ccf61c74f5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51602
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Thomas Bertschinger <bertschinger@lanl.gov>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 months agoLU-16872 lod: reset llc_ostlist when using O_APPEND stripes 59/51559/4
Thomas Bertschinger [Fri, 7 Jul 2023 14:57:40 +0000 (10:57 -0400)]
LU-16872 lod: reset llc_ostlist when using O_APPEND stripes

Files created with O_APPEND can have special striping set with the
parameters mdd.*.append_stripe_count and mdd.*.append_pool, and
should not inherit a list of OSTs to use from a parent directory
when these parameters are set. However, if a file is created with
O_APPEND and its create is handled by a kernel thread that has
previously created a file with a default list of OSTs, then those
defaults were erroneously applied to the O_APPEND file. This can
lead to the create returning EINVAL or to a crash.

This commit ensures that llc_ostlist is cleared when a file is
created with special append stripes.

Signed-off-by: Thomas Bertschinger <bertschinger@lanl.gov>
Change-Id: Ib2023e17c9ef31a2e029e09e67b257eb2c77b113
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51559
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-17019 tests: fix 'v' usage in test 185 98/51898/2
Patrick Farrell [Tue, 8 Aug 2023 19:39:17 +0000 (15:39 -0400)]
LU-17019 tests: fix 'v' usage in test 185

Test 185 provides 'v' to multiop when using the
multiop_bg_pause function, which already adds 'v' to the
multiop command.  Using a verbosity level > 2 breaks the
code which checks the multiop output for PAUSING, now that
there is other text in that output.

Fixes: 84376bf674 ("LU-13805 tests: add unaligned io to multiop")
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I7023e47956e32a3f681fec504a1fc9200dbbf24f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51898
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
9 months agoLU-16980 build: fix gcc-12 [-Werror=use-after-free] error 19/51819/2
Jian Yu [Mon, 31 Jul 2023 17:24:02 +0000 (10:24 -0700)]
LU-16980 build: fix gcc-12 [-Werror=use-after-free] error

This patch fixes the following [-Werror=use-after-free] and
[-Werror=stringop-overflow=] errors detected by gcc 12:

libcfs/include/libcfs/util/list.h:481:42: error: pointer 'tmp'
used after 'free' [-Werror=use-after-free]
  481 |              pos = list_entry(pos->member.next, typeof(*pos), member),  \
      |                                          ^
libcfs/include/libcfs/util/list.h:239:28: note: in definition of macro 'list_entry'
  239 |         ((type *)((char *)(ptr)-(char *)(&((type *)0)->member)))
      |                            ^~~
obd.c:5118:9: note: in expansion of macro 'list_for_each_entry'
 5118 |         list_for_each_entry(tmp, head, lpn_list) {
      |         ^~~~~~~~~~~~~~~~~~~
obd.c:5124:17: note: call to 'free' here
 5124 |                 free(tmp);
      |                 ^~~~~~~~~

test_brw.c: In function 'main':
test_brw.c:227:22: error: 'write' specified size between 9223372036854775808
and 18446744073709551615 exceeds maximum object size 9223372036854775807
[-Werror=stringop-overflow=]
  227 |                 rc = write(fd, buf, len);
      |                      ^~~~~~~~~~~~~~~~~~~

Change-Id: Ibe783ab0d13e2ecde1736946323932ab5db53740
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51819
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-17000 lod: fix lod_gen_component_id not wrapping 95/51795/6
Timothy Day [Fri, 28 Jul 2023 05:11:49 +0000 (05:11 +0000)]
LU-17000 lod: fix lod_gen_component_id not wrapping

The end variable is set to SEQ_ID_MAX. The code
for checking whether to try and search the
remaining ids checks LCME_ID_MAX. Thus, it
never gets called. Change the check to look for
SEQ_ID_MAX. Also, change the start and end such
that all ids are checked once.
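
The wrap-around search described above can be sketched as follows. This
is a minimal illustration in Python, not the lod code: next_free_id(),
the "used" set and the 1..id_max id range are assumptions made for the
example.

```python
def next_free_id(used, last_id, id_max):
    """Return the first unused id after last_id, wrapping around once.

    Hypothetical sketch: ids run from 1 to id_max inclusive, and every
    id is examined exactly once before giving up.
    """
    for offset in range(id_max):
        # Step past last_id and wrap from id_max back to 1.
        candidate = (last_id + offset) % id_max + 1
        if candidate not in used:
            return candidate
    return None  # every id in 1..id_max is in use


print(next_free_id({4, 5}, last_id=3, id_max=5))
```

Starting the scan just after last_id and bounding it by id_max iterations
is what guarantees each id is visited once, including those below the
starting point.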

Addresses-Coverity-ID: 397902 ("Logically dead code")

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I1bd06b38314686f3d5a1c9ad42b38ad197f1a4e7
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51795
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-16985 utils: adaptive bufsize for mirroring 73/51773/5
Alex Zhuravlev [Wed, 26 Jul 2023 19:00:29 +0000 (22:00 +0300)]
LU-16985 utils: adaptive bufsize for mirroring

If a bandwidth limit is requested, change the default bufsize so that
I/O is smooth rather than sawtooth-shaped.
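
As a rough illustration of why a large buffer gives bursty I/O under a
bandwidth limit: if the copy loop pauses after each buffer to honour the
limit, the pause grows with the buffer size. The sketch below only does
that arithmetic; the function name and the numbers are hypothetical and
are not taken from lfs.

```python
MIB = 1024 * 1024


def pause_per_buffer(bufsize_bytes, limit_bytes_per_sec):
    """Seconds one buffer's worth of I/O must span to stay within the limit."""
    if limit_bytes_per_sec <= 0:
        raise ValueError("bandwidth limit must be positive")
    return bufsize_bytes / limit_bytes_per_sec


# Compare a large and a small buffer under a hypothetical 10 MiB/s limit.
for bufsize in (64 * MIB, 1 * MIB):
    secs = pause_per_buffer(bufsize, 10 * MIB)
    print(f"{bufsize // MIB:3d} MiB buffer: one burst every {secs:.1f} s")
```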

Fixes: 23224e03dc ("LU-16587 utils: give lfs migrate a larger buffer")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ibc8e7d30ded201a4ff3d699530f5c9f8be5ce7f1
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51773
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
9 months agoLU-16980 build: fix gcc-12 [-Werror=format-truncation=] error 65/51765/3
Jian Yu [Mon, 31 Jul 2023 09:03:46 +0000 (02:03 -0700)]
LU-16980 build: fix gcc-12 [-Werror=format-truncation=] error

This patch fixes the following [-Werror=format-truncation=] errors
detected by gcc 12:

liblnetconfig.c: In function 'open_sysfs_file':
liblnetconfig.c:106:49: error: '%s' directive output may be truncated
writing up to 127 bytes into a region of size between 1 and 128
[-Werror=format-truncation=]
  106 |         snprintf(filename, sizeof(filename), "%s%s",
      |                                                 ^~

lfs_project.c: In function 'lfs_project_handle_dir':
lfs_project.c:324:50: error: '%s' directive output may be truncated
writing up to 255 bytes into a region of size between 1 and 4095
[-Werror=format-truncation=]
  324 |                 snprintf(fullname, PATH_MAX, "%s/%s", pathname,
      |                                                  ^~

statx.c: In function 'do_dir_list':
statx.c:1427:58: error: '%s' directive output may be truncated
writing up to 255 bytes into a region of size between 1 and 4095
[-Werror=format-truncation=]
 1427 |                         snprintf(fullname, PATH_MAX, "%s/%s",
      |                                                          ^~

Change-Id: I514a1022d879f8b7af89f6ded68e9b453cd11408
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51765
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
9 months agoLU-16605 lfs: Add -0 option to fid2path 36/51736/8
Arshad Hussain [Tue, 18 Jul 2023 11:50:47 +0000 (17:20 +0530)]
LU-16605 lfs: Add -0 option to fid2path

Currently fid2path adds '\n' after printing
pathnames. Add '-0' option to fid2path to
add NUL('\0') at the end after printing out
pathnames instead of '\n'. This allows
pathnames that contain newlines('\n') to be
correctly interpreted by binaries like xargs.

Without -0 option:
$ lfs fid2path /mnt/lustre 0x200000401:0x1:0x0
/mnt/lustre/Test
_file
/mnt/lustre/Link_
file

With -0 option:
$ lfs fid2path -0 /mnt/lustre 0x200000401:0x1:0x0 | xargs --null
/mnt/lustre/test
_file /mnt/lustre/link_
file

Test-case sanity/226e added.
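
A consumer other than xargs can split the -0 output on NUL in the same
way. A minimal Python sketch, assuming the output is the sequence of
NUL-terminated pathnames described above; split_nul_paths() is a
hypothetical helper and the sample bytes mirror the example paths rather
than real lfs output.

```python
import os
from typing import List


def split_nul_paths(raw: bytes) -> List[str]:
    """Split NUL-terminated pathnames, keeping embedded newlines intact."""
    # Each pathname ends with b"\0"; the empty piece after the last
    # terminator is dropped.
    return [os.fsdecode(piece) for piece in raw.split(b"\0") if piece]


sample = b"/mnt/lustre/test\n_file\0/mnt/lustre/link_\nfile\0"
for path in split_nul_paths(sample):
    print(repr(path))
```

Splitting on newlines instead would yield four fragments here, none of
which is a valid pathname.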

Test-Parameters: trivial testlist=sanity
Reported-by: Simon Westersund <simon.westersund@csc.fi>
Signed-off-by: Simon Westersund <simon.westersund@csc.fi>
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I9e3e32cde6c6abe83df48afd191ec167c74ac7e6
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51736
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-16949 lnet: get monitor thread to update ping buffer 35/51635/6
Serguei Smirnov [Tue, 11 Jul 2023 22:40:37 +0000 (15:40 -0700)]
LU-16949 lnet: get monitor thread to update ping buffer

Make sure that ping buffer updates requested by o2iblnd and
socklnd are performed by the LNet monitor thread.
Having the LNDs do these updates via an LNet API directly caused a
lock-up due to spinlock acquisition while in an interrupt context
in a CentOS 7.9 environment.
To avoid LNet trying to update the ping buffer for an LNI which is
still initializing, check that o2iblnd net is fully initialized
(IBLND_INIT_ALL) before requesting the ping buffer update.

Fixes: da230373bd ("LU-16563 lnet: use discovered ni status")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I87ff8791937f5a0ead6096ff33e8c0a8087f8ddd
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51635
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-16521 tests: racer should work without DNE 67/51267/4
Andreas Dilger [Sat, 10 Jun 2023 13:36:38 +0000 (07:36 -0600)]
LU-16521 tests: racer should work without DNE

Don't assume there are multiple MDTs for the target directory
when running dir_migrate and dir_remote.  If only one MDT
(or none, if running on a non-Lustre directory for some reason),
just exit from these workloads instead of spewing errors.

Test-parameters: trivial testlist=sanityn
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I47cbe9de2c4f9afd79228b6a505dde023f2540e5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51267
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-13031 jobstats: store jobid in xattr when files are created 82/50982/8
Thomas Bertschinger [Fri, 5 May 2023 21:05:22 +0000 (17:05 -0400)]
LU-13031 jobstats: store jobid in xattr when files are created

This change stores the jobid of the process that creates a file in an
extended attribute in the file's MDT inode, at file creation time.

The name of the xattr is determined by a new sysfs parameter
"mdt.*.job_xattr" so that the admin can choose a name that does
not conflict with other uses they may have for a given xattr.
The default value is "user.job". A value of "NONE" means that
the jobid will not be stored in the inode.

If the name is in the user namespace "user.", then the name portion
can be up to 7 alphanumeric characters long. The admin can choose
the trusted namespace to prevent users from modifying the value,
but only "trusted.job" is allowed in this namespace.
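
The naming rules above can be sketched as a small validator. This is an
illustration in Python, not the mdt code: job_xattr_is_valid() is a
hypothetical name, and rejecting an empty name portion and any namespace
other than "user." and "trusted." are assumptions.

```python
import re

# "user." followed by 1 to 7 alphanumeric characters. The lower bound of 1
# is an assumption; the text above only gives the upper bound.
_USER_XATTR = re.compile(r"user\.[A-Za-z0-9]{1,7}")


def job_xattr_is_valid(value):
    """Return True if value is acceptable under the rules sketched above."""
    if value == "NONE":         # the jobid is not stored in the inode
        return True
    if value == "trusted.job":  # the only name allowed in "trusted."
        return True
    return _USER_XATTR.fullmatch(value) is not None


for name in ("user.job", "trusted.job", "NONE", "user.toolongname", "trusted.x"):
    print(name, job_xattr_is_valid(name))
```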

Allowing users to modify the contents of the xattr is helpful so
that the jobid can be preserved even when files are moved with tools
like `cp` or `rsync`, and when copied from one filesystem to another.

Signed-off-by: Thomas Bertschinger <bertschinger@lanl.gov>
Change-Id: Iad78a5ec6fbc4b761ff481141763bdd0cdcd0128
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50982
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
9 months agoLU-9859 lnet: simplify cfs_expr_list_parse() 43/50843/6
Mr NeilBrown [Wed, 25 Nov 2020 00:04:09 +0000 (11:04 +1100)]
LU-9859 lnet: simplify cfs_expr_list_parse()

If we dup the string first, we can use a mutating approach to parsing,
allowing us to use standard tools like strsep, strcmp, kstrtouint,
etc.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I40a1f7bc65bd122d22cd53cf3c645f4a3730f82e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50843
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-16770 llite: prune object without layout lock first 42/50742/6
Andriy Skulysh [Tue, 26 Jul 2022 11:10:43 +0000 (14:10 +0300)]
LU-16770 llite: prune object without layout lock first

lov_layout_change() calls cl_object_prune() before
changing the layout. This may lead to eviction from the MDT
in case of a slow response from an OST.

To reduce the risk of eviction, call cl_object_prune()
without the layout lock held before calling lov_layout_change().

vvp_prune() attempts to sync and truncate page cache pages.
osc_page_delete() may encounter page cache pages in non-clean state
during truncate because there's a race window between sync and truncate.
Writes may stick into this window and generate dirty or writeback pages.

This window is usually protected with a special truncate semaphore e.g.
when truncate is requested from the truncate syscall.

Let's use this semaphore to avoid write vs truncate race in vvp_prune().

Change-Id: Ie2ee29ea1e792e1b34b6de068ff2b84fd8f52f2a
HPE-bug-id: LUS-9927, LUS-11612
Signed-off-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Vitaly Fertman <c17818@cray.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50742
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-6142 lod: remove custom macros in lod_qos.c 08/51808/2
Timothy Day [Sun, 30 Jul 2023 16:33:06 +0000 (16:33 +0000)]
LU-6142 lod: remove custom macros in lod_qos.c

lod_qos.c redefines some common Lustre macros,
for use only in that file. This is not needed.
Replace these macros with the usual Lustre
macros instead. Remove some unused macros at
the same time.

There was some debug code under a '#if 0'.
Since enabling this requires a change to the
source code anyway, a potential debugger can
just re-add this code themselves if they
really need it.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I1752eab59ce5792f1ca3e9f698bda370d9ac75b1
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51808
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-16982 ldiskfs: Fix crash after "umount -d -f /mnt/..." 60/51760/5
Vitaliy Kuznetsov [Fri, 28 Jul 2023 15:40:14 +0000 (19:40 +0400)]
LU-16982 ldiskfs: Fix crash after "umount -d -f /mnt/..."

This patch adds an extra state check during the unmount process,
since the following problem was seen:
Lustre: DEBUG MARKER: umount -d -f /mnt/lustre-mds4
kernel BUG at fs/jbd2/transaction.c:378!
CPU: 0 PID: 310834 Comm: kworker/0:2 4.18.0-477.15...
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
Workqueue: events flush_stashed_stats_work [ldiskfs]
RIP: 0010:start_this_handle+0x22c/0x520 [jbd2]
Call Trace:
 jbd2__journal_start+0xee/0x1f0 [jbd2]
 jbd2_journal_start+0x19/0x20 [jbd2]
 flush_stashed_stats_work+0x36/0x90 [ldiskfs]
 process_one_work+0x1a7/0x360
 worker_thread+0x30/0x390
 kthread+0x134/0x150

Fixes: e27a7b33d6 ("LU-16298 ldiskfs: Periodically write ldiskfs superblock")
Signed-off-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
Change-Id: I162d3416ca1fe9bd09f1102ccca892db05719016
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51760
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
9 months agoLU-13805 tests: add unaligned io to multiop 90/49990/24
Patrick Farrell [Tue, 14 Feb 2023 18:29:09 +0000 (13:29 -0500)]
LU-13805 tests: add unaligned io to multiop

Add memory unaligned IO support to multiop.

This will be used by tests for unaligned DIO.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I38c049690610d34564a15e57f37c052105ab2066
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49990
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-16954 llite: do not set SB_I_CGROUPWB on super block 01/51701/3
Li Dongyang [Wed, 26 Jul 2023 10:52:24 +0000 (20:52 +1000)]
LU-16954 llite: do not set SB_I_CGROUPWB on super block

On clients with a more recent kernel, e.g. ubuntu2204,
this makes the mount fail sometimes with
sysfs: cannot create duplicate filename '/devices/virtual/bdi/lustre-ffff8dd549f3d000'

Change-Id: Ie15e41eb9d039829545e1d69f97ed9e13f89e53e
Fixes: f5a75ea44d ("LU-16697 llite: Set BDI_CAP_* flags for lustre")
Test-Parameters: clientdistro=ubuntu2204 testlist=sanity,conf-sanity
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51701
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoNew tag 2.15.57 2.15.57 v2_15_57
Oleg Drokin [Tue, 1 Aug 2023 06:17:39 +0000 (02:17 -0400)]
New tag 2.15.57

Change-Id: Ice12bbb65d4d455b2beea14e83a9ab663bda237c

9 months agoLU-16983 mdc: check errcode prior mdc_fill_lvb() call 61/51761/2
Mikhail Pershin [Tue, 25 Jul 2023 22:09:31 +0000 (01:09 +0300)]
LU-16983 mdc: check errcode prior mdc_fill_lvb() call

mdc_enqueue_fini() can be called with a negative
errcode parameter if request processing failed.
In that case mdc_fill_lvb() shouldn't be called.

The issue may occur with DoM files, an old server (<2.14) and
a new client. The problem is in the new client code.

Test-Parameters: testlist=racer serverversion=EXA5.2.8
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I884398beada4286bc07875247e15b41120f73a3e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51761
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-16979 utils: enable throttling mirror extend 58/51758/2
Alex Zhuravlev [Tue, 25 Jul 2023 15:07:19 +0000 (18:07 +0300)]
LU-16979 utils: enable throttling mirror extend

This can be useful in some scenarios, such as massive mirror
creation.

Test-Parameters: trivial
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ia84054f3519cd5cef37aaabb2ae605fb6ea200e0
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51758
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
9 months agoLU-16796 obdclass: Change struct jobid_pid_map to use refcount_t 47/51747/3
Arshad Hussain [Sun, 23 Jul 2023 05:40:04 +0000 (11:10 +0530)]
LU-16796 obdclass: Change struct jobid_pid_map to use refcount_t

This patch changes struct jobid_pid_map to use
refcount_t(kref) instead of atomic_t

Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ia3458d5605a8cff2bb65476495c321fa98cf01dc
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51747
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-16969 build: check that pkg-config is installed 16/51716/3
Timothy Day [Wed, 19 Jul 2023 17:23:24 +0000 (17:23 +0000)]
LU-16969 build: check that pkg-config is installed

The PKG_CHECK_MODULES macro fails in ways that are very annoying to debug.
Often, this will fail with:

 syntax error near unexpected token `LIBNL3,'
 ` PKG_CHECK_MODULES(LIBNL3, libnl-genl-3.0 >= 3.1)'

and provide no indication that the real error is that
pkg-config is not installed. Adding an explicit check
for pkg-config will make the error more self-evident.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Change-Id: Ic2ee8e4c3ec3fa2e03c5ece03e6a9ce335133578
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51716
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
9 months agoLU-16958 llite: migrate vs regular ops deadlock 41/51641/2
Bobi Jam [Wed, 12 Jul 2023 15:05:27 +0000 (23:05 +0800)]
LU-16958 llite: migrate vs regular ops deadlock

When lov_conf_set() needs to lock the inode, it may already hold the
inode's lli_layout_mutex. Unlock the layout mutex before taking the
inode lock to keep the lock order.

Fixes: 51d62f2122f ("LU-16637 llite: call truncate_inode_pages() in inode lock")
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I7ee58039a6d31daefc625ac571a52baf112f8151
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51641
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-16878 tests: use RUNAS_UID / RUNAS_GID for NRS TBF 38/51238/5
James Simmons [Thu, 20 Jul 2023 19:57:36 +0000 (15:57 -0400)]
LU-16878 tests: use RUNAS_UID / RUNAS_GID for NRS TBF

Some of the sanityn NRS TBF tests hardcode the use of uid 500 and gid
500. These are not guaranteed to exist, so use RUNAS_UID and RUNAS_GID
instead.

Test-Parameters: trivial testlist=sanityn env=ONLY=77
Change-Id: Ie987c70e94918c5cddadb632a4a3a3caac12c96f
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51238
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-16796 libcfs: Remove reference to LASSERT_ATOMIC_ZERO 04/51004/6
Arshad Hussain [Tue, 16 May 2023 03:00:49 +0000 (08:30 +0530)]
LU-16796 libcfs: Remove reference to LASSERT_ATOMIC_ZERO

This patch removes all references to the LASSERT_ATOMIC_ZERO macro.

Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I73259599d1dee6277fadf66181699f1282274a80
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51004
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-16430 ptlrpc: racy rq_obsolete bit modification 05/49505/6
Andriy Skulysh [Thu, 24 Nov 2022 13:18:04 +0000 (15:18 +0200)]
LU-16430 ptlrpc: racy rq_obsolete bit modification

Racy bit modification causes assertion failure in
ptlrpc_at_remove_timed():
ASSERTION( !list_empty(&req->rq_srv.sr_timed_list) )

rq_obsolete is a bit field, so its modification
isn't atomic and it should be modified under rq_lock.
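
The lost update can be shown with a deterministic sketch. It is written
in Python and replays the interleaving by hand; FLAG_A, FLAG_B and the
Request class are hypothetical stand-ins for neighbouring bits in one
word and are not ptlrpc code.

```python
import threading

FLAG_A = 0b01  # hypothetical bit set by one code path
FLAG_B = 0b10  # hypothetical neighbouring bit set by another


def interleaved_update():
    """Replay two unlocked read-modify-write sequences on one shared word."""
    word = 0
    seen_by_a = word            # path A loads the word
    seen_by_b = word            # path B loads the same stale value
    word = seen_by_a | FLAG_A   # A stores its bit
    word = seen_by_b | FLAG_B   # B stores, overwriting A's bit
    return word


class Request:
    """Flags guarded by a lock, so each read-modify-write is indivisible."""

    def __init__(self):
        self.lock = threading.Lock()
        self.flags = 0

    def set_flag(self, flag):
        with self.lock:
            self.flags |= flag


print(bin(interleaved_update()))
```

With the lock held across the load and the store, the second path always
sees the first path's bit, so neither update is lost.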

Change-Id: Ib1d3ad189a78b71ecf5b01585478922e984c9568
HPE-bug-id: LUS-11368
Fixes:  23773b32bf ("LU-11444 ptlrpc: resend may corrupt the data")
Signed-off-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49505
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-16053 build: Update zfs configure checks 89/48089/10
Shaun Tancheff [Mon, 22 May 2023 12:36:21 +0000 (07:36 -0500)]
LU-16053 build: Update zfs configure checks

From Brian Behlendorf <behlendorf1@llnl.gov>:

update dmu_*_by_dnode checks
 Provided as a feature since ZFS 0.7.0, convert to a fatal configure
 error when unavailable.

update zap_*_by_dnode checks
 Provided as a feature since ZFS 0.7.0, convert to a fatal configure
 error when unavailable.

update multihost protection check
 Provided as a feature since ZFS 0.7.0, convert to a fatal configure
 error when unavailable.  Drop the compatibility code required to
 support OpenZFS releases older than 0.7.0.

update userobj accounting check
 Provided as a feature since ZFS 0.7.0, convert to a fatal configure
 error when unavailable.

update dmu_prefetch() check
 Provided since at least ZFS 0.7.0, convert to a fatal configure
 error when unavailable.

update dmu_object_alloc_dnsize() check
 Provided since at least ZFS 0.7.0, convert to a fatal configure
 error when unavailable.

update spa_maxblocksize() check
 Provided since at least ZFS 0.7.0, convert to a fatal configure
 error when unavailable.

update dsl_pool_config_enter/exit check
 Convert to a fatal configure error, these functions have
 been provided since at least ZFS 0.7.x.

replace sa_spill_block() check
 The sa_spill_block() function was removed after the ZFS 0.6.x
 release.  Replace the check with one for zio_buf_alloc/free
 which have been available since 0.7.x.

 The dsl_sync_task_do_nowait() function has not been provided
 since the 0.6.x releases.  Furthermore, the results of this
 check are unused by Lustre so let's just remove it.

Test-Parameters: trivial
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I3c1597e56100961178f9001e918ffb9aa3558706
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48089
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-6142 obd: remove OBP and MDP macros 39/51739/2
Timothy Day [Fri, 21 Jul 2023 20:16:03 +0000 (20:16 +0000)]
LU-6142 obd: remove OBP and MDP macros

These macros save very little space, make it harder
to understand the code (by adding one more thing to
remember) and make it impossible to grep for
o_* and m_* functions. Luckily, they are only used in
a few places. So, remove them and all references.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I4c23199ca53c906ca190a81ffdf916ff6cff9a0b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51739
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-6142 obd: fix white space, header 38/51738/2
Timothy Day [Fri, 21 Jul 2023 20:29:32 +0000 (20:29 +0000)]
LU-6142 obd: fix white space, header

Convert all of the remaining spaces to tabs. Also,
add SPDX text to file.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I2d4e71646f7aaa286f7500564c817c76a4b716ed
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51738
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-11036 test: race in sanity-lfsck test_8 20/51720/3
Lai Siyao [Mon, 10 Jul 2023 04:30:28 +0000 (00:30 -0400)]
LU-11036 test: race in sanity-lfsck test_8

In sanity-lfsck test_8, "sleep 1" is run after START_NAMESPACE,
but there is still a chance that the LFSCK status is complete while the
LFSCK thread has not quit yet, so the following START_NAMESPACE may fail
with -EALREADY. Just check that the first LFSCK has started scanning.

And similarly use wait_update to check flags for DELAY3.
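
The difference between a fixed sleep and polling for a condition can be
sketched as below. wait_until() is a hypothetical Python helper written
for this illustration; it is not the test framework's wait_update, whose
arguments are not reproduced here.

```python
import time


def wait_until(check, timeout=5.0, interval=0.1):
    """Poll check() until it returns True or timeout seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Hypothetical condition that becomes true on the third poll.
polls = {"count": 0}


def scan_started():
    polls["count"] += 1
    return polls["count"] >= 3


print(wait_until(scan_started, timeout=1.0, interval=0.01))
```

A fixed "sleep 1" either waits too long or not long enough; polling
returns as soon as the condition holds and fails explicitly on timeout.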

Test-Parameters: trivial MDSCOUNT=2 MDTCOUNT=4 testlist=sanity-lfsck env=ONLY=8,ONLY_REPEAT=10
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ie1f612bebb52c4755e5b4e13d58ab8bf2aeb2832
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51720
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-9680 utils: add updating the key table for Netlink. 15/51715/3
James Simmons [Wed, 19 Jul 2023 16:07:26 +0000 (12:07 -0400)]
LU-9680 utils: add updating the key table for Netlink.

Currently the lnetconfig implementation only sends the key table once
to construct a YAML document. Add the ability to update the key
table at a later time. New keys will be used by the YAML
document.

Test-Parameters: trivial
Change-Id: Ie2201f91eb24d06c7e2a2d4abe3da3805f74e5a7
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51715
Tested-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-9680 contrib: share libyaml C code generator 08/51508/5
James Simmons [Mon, 24 Jul 2023 16:10:17 +0000 (12:10 -0400)]
LU-9680 contrib: share libyaml C code generator

Writing proper libyaml C code is not easy. So I wrote an
application that generates the C code to help the developer not
struggle starting from scratch. It won't be a one-to-one
copy and paste, but it greatly helps. To build the application
just run gcc -lyaml yaml-event-dump.c

Test-Parameters: trivial
Change-Id: I1b570dbfc3ea2e6a7ec77b3743aa4cd80aba2acb
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51508
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months agoLU-16843 ldiskfs: merge extent blocks 96/51096/14
Alex Zhuravlev [Tue, 23 May 2023 13:30:58 +0000 (16:30 +0300)]
LU-16843 ldiskfs: merge extent blocks

There are cases (e.g. file written synchronously with discontiguous
blocks that are later filled in) when a lot of extents are created
initially, then the extents get merged over time, but there is no
way to merge the index blocks.  This can cause a very deep extent
index tree (above 5 levels) and cause problems like:

inode has invalid extent depth: 6

Merge leaf/index blocks (at most one at each level) to the right/left
when extents are removed from the index.

Submitted to the ext4 mailing list:
https://lore.kernel.org/linux-ext4/7A2B8861-96AA-4815-BB58-180F63F62436@whamcloud.com/

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I746c0917e746eb442d3c69a23f591d9cdade76fa
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51096
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months ago LU-16965 obd: remove unused obd_evict_inprogress 81/51681/2
Timothy Day [Fri, 14 Jul 2023 15:42:39 +0000 (15:42 +0000)]
LU-16965 obd: remove unused obd_evict_inprogress

Remove the atomic_t struct field obd_evict_inprogress
from 'struct obd_device'. This field was only ever
incremented in an unused function that was removed in
a previous patch. Hence, remove it altogether. This
patch also removes the associated wait queue.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Id151c1e6a0adde8c1aeb6dbc903b9d98d00fd21d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51681
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
9 months ago LU-15553 test: mkdir_on_mdt0 in recovery-small.sh 69/51669/3
Lai Siyao [Sat, 8 Jul 2023 20:35:43 +0000 (16:35 -0400)]
LU-15553 test: mkdir_on_mdt0 in recovery-small.sh

Many subtests in recovery-small.sh require the test directory to be
created on MDT0, so replace mkdir with mkdir_on_mdt0.

Fixes: b9c4dc3c33 ("LU-14792 llite: enable filesystem-wide default LMV")

Test-Parameters: trivial
Test-Parameters: testlist=recovery-small,recovery-small,recovery-small
Test-Parameters: MDSCOUNT=2 MDTCOUNT=4 testlist=recovery-small,recovery-small,recovery-small
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ibc37b2dd25bcd94794392f5ff8a79df2e7932dcc
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51669
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months ago LU-16951 test: don't call echo in function call 15/51615/2
Hongchao Zhang [Thu, 6 Jul 2023 17:16:09 +0000 (01:16 +0800)]
LU-16951 test: don't call echo in function call

In sanity-quota, the call '$(get_slave_nr expr "foo")' will fail
if there is an "echo" call in "wait_update_facet/wait_update_cond",
since command substitution captures that output as well.
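
The general failure mode can be sketched in plain shell. This is only an
illustration under assumptions: wait_noisy, wait_quiet, get_nr_noisy and
get_nr_quiet are hypothetical stand-ins, not the real test-framework.sh
helpers, and the message text is made up.

```shell
# Hypothetical stand-ins; not the real wait_update_cond/get_slave_nr.

# Progress message goes to stdout, so $(...) captures it too.
wait_noisy() {
	echo "Updated after 1s"
	return 0
}

# Same helper, but the progress message goes to stderr instead.
wait_quiet() {
	echo "Updated after 1s" >&2
	return 0
}

# Both callers are meant to "return" a single number on stdout.
get_nr_noisy() {
	wait_noisy
	echo 3
}

get_nr_quiet() {
	wait_quiet
	echo 3
}

nr_noisy=$(get_nr_noisy)
nr_quiet=$(get_nr_quiet 2>/dev/null)

# The noisy result holds the message plus the number, so it is
# no longer a plain integer and numeric comparisons would break.
case $nr_noisy in
	*[!0-9]*) noisy_ok=no ;;
	*) noisy_ok=yes ;;
esac
```

Here nr_quiet holds only the number, while nr_noisy also contains the
progress message, which is why a helper called inside $(...) should not
write anything else to stdout.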

Test-Parameters: trivial
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: Ib35bf8ccd7eb121a0a2852ba7ed69ad9b01f271a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51615
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
9 months ago LU-16947 tests: On error correctly kill multiop 89/51589/2
Arshad Hussain [Thu, 6 Jul 2023 10:25:04 +0000 (15:55 +0530)]
LU-16947 tests: On error correctly kill multiop

multiop_bg_pause in test-framework starts multiop
in the background and waits for a signal if the "_" option
is provided. In 'verbose' mode the PAUSING string is
printed to the console; multiop_bg_pause checks for it
and reports an error if it is not found.

On error, the running multiop binary must be killed.
Otherwise the test does not exit and eventually
times out.

Currently, on error the multiop_bg_pause function sends
the signal to the wrong PID. This patch fixes this issue.
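
The pattern described above can be sketched as follows. This is a
simplified illustration, not the actual multiop_bg_pause code:
start_and_check, bg_pid and the polling loop are hypothetical, and the
worker command stands in for multiop. The point is that the PID is taken
from $! right after the job is started, and that same PID is the one
killed on error.

```shell
# Hypothetical sketch; not the real multiop_bg_pause.
# Run "$@" in the background and expect its first output line
# to be PAUSING. On mismatch, kill the job that was just started.
start_and_check() {
	out=$(mktemp) || return 2
	"$@" > "$out" 2>&1 &
	bg_pid=$!	# PID of the background job, saved immediately

	# Poll for up to 10 seconds until some output shows up.
	tries=0
	while [ ! -s "$out" ] && [ "$tries" -lt 10 ]; do
		sleep 1
		tries=$((tries + 1))
	done

	first_line=$(head -n 1 "$out")
	rm -f "$out"

	if [ "$first_line" != "PAUSING" ]; then
		echo "Incorrect output: $first_line" >&2
		# Kill the saved PID so the job cannot linger.
		kill -9 "$bg_pid" 2>/dev/null
		wait "$bg_pid" 2>/dev/null
		return 1
	fi
	return 0
}

# A worker that prints the wrong string and would then hang.
if start_and_check sh -c 'echo WRONG; exec sleep 30'; then
	status=ok
else
	status=failed
fi
```

After the failed check, the worker identified by bg_pid is gone, so the
caller is not left waiting on a stray background process.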

Test-Parameters: trivial testlist=replay-single mdscount=2 mdtcount=4
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I3fb505302615512a891725e7339a6f0238c2cdab
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51589
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>