Whamcloud - gitweb
fs/lustre-release.git
5 months ago LU-17358 lprocfs: make job_stats job_id valid yaml 24/53424/2
Nathaniel Clark [Tue, 12 Dec 2023 18:05:22 +0000 (13:05 -0500)]
LU-17358 lprocfs: make job_stats job_id valid yaml

Fix quoting job_id to account for leading '@' being reserved.
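
As a rough illustration of the quoting rule (a Python sketch, not the C code touched by this patch): a value starting with '@' cannot be written as a plain YAML scalar, so it has to be quoted. The function name, the output format, and the set of "safe" characters below are assumptions made for the example.

```python
import string

# Characters assumed safe anywhere in an unquoted YAML scalar (illustrative set).
PLAIN_SAFE = set(string.ascii_letters + string.digits + "._-@")
# Indicators that YAML reserves at the start of a plain scalar.
RESERVED_LEADING = {"@", "`"}


def format_job_id(job_id):
    """Return a 'job_id:' line, quoting the value when YAML requires it."""
    needs_quotes = (
        not job_id
        or job_id[0] in RESERVED_LEADING
        or any(ch not in PLAIN_SAFE for ch in job_id)
    )
    if not needs_quotes:
        return "- job_id: " + job_id
    # Escape backslashes first, then double quotes, for a double-quoted scalar.
    escaped = job_id.replace("\\", "\\\\").replace('"', '\\"')
    return '- job_id: "' + escaped + '"'
```

With this sketch, "cp.500" stays unquoted, while "@lustre.42" is emitted in double quotes.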

Test-Parameters: trivial
Signed-off-by: Nathaniel Clark <nclark@whamcloud.com>
Change-Id: Ifce3edc9b636db2f059ab9960488972a152d2e7a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53424
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Feng Lei <flei@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 months ago LU-17349 tests: sanity-quota_81 decrease timeout 84/53384/3
Sergey Cheremencev [Sun, 3 Dec 2023 04:06:23 +0000 (07:06 +0300)]
LU-17349 tests: sanity-quota_81 decrease timeout

Decrease cfs fail timeout in sanity-quota_81 from 30
to 10 seconds to avoid soft lockup.

Fixes: 862f0baa7c21 ("LU-15097 quota: stop pool_recalc before killing pool")
Test-Parameters: trivial testlist=sanity-quota
Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Change-Id: I8630db7b3948b335fef5d5349f960f79cb877fc3
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53384
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 months ago LU-17347 debs: also move .ddeb files into debs/ 78/53378/2
Aurelien Degremont [Fri, 8 Dec 2023 12:34:09 +0000 (13:34 +0100)]
LU-17347 debs: also move .ddeb files into debs/

When building debian packages, the resulting packages are
moved into a 'debs/' subdir.

Don't miss the debug symbol packages 'dbgsym', which are
suffixed .ddeb.

Also add .buildinfo file.

Test-Parameters: trivial
Change-Id: I52d0bddfaafc67c4a2a2dbc786d7f320c0b979f8
Signed-off-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53378
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-17317 gss: no cache flush for rsi and rsc 77/53377/3
Sebastien Buisson [Tue, 5 Dec 2023 16:02:21 +0000 (17:02 +0100)]
LU-17317 gss: no cache flush for rsi and rsc

RPCSEC init and RPCSEC context caches hold gss-related information
of security contexts established between network peers. These cache
entries are tightly coupled with contexts handled in the sptlrpc layer
so they must not be purged directly. They are inserted into the cache
when sptlrpc security contexts are established, and removed when the
corresponding security contexts are destroyed.

Test-Parameters: trivial
Test-Parameters: kerberos=true testlist=sanity-krb5
Test-Parameters: testgroup=review-dne-selinux-ssk-part-2
Fixes: 8d828762d1 ("LU-17015 gss: support large kerberos token for rpc sec init")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I903f75a4b5229286fcaed3e9d96b5eee7f653f15
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53377
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 months ago LU-17338 kernel: update RHEL 8.9 [4.18.0-513.9.1.el8_9] 57/53357/2
Jian Yu [Thu, 7 Dec 2023 00:52:50 +0000 (16:52 -0800)]
LU-17338 kernel: update RHEL 8.9 [4.18.0-513.9.1.el8_9]

Update RHEL 8.9 kernel to 4.18.0-513.9.1.el8_9.

Test-Parameters: trivial fstype=ldiskfs mdtcount=4 mdscount=2 \
clientdistro=el8.9 serverdistro=el8.8 testlist=sanity

Test-Parameters: trivial fstype=zfs mdtcount=4 mdscount=2 \
clientdistro=el8.9 serverdistro=el8.8 testlist=sanity

Test-Parameters: trivial fstype=ldiskfs mdtcount=4 mdscount=2 \
clientdistro=el8.8 serverdistro=el8.9 testlist=sanity

Test-Parameters: trivial fstype=zfs mdtcount=4 mdscount=2 \
clientdistro=el8.8 serverdistro=el8.9 testlist=sanity

Test-Parameters: optional clientdistro=el8.9 serverdistro=el8.9 \
testgroup=full-part-1

Test-Parameters: optional clientdistro=el8.9 serverdistro=el8.9 \
testgroup=full-part-2

Test-Parameters: optional clientdistro=el8.9 serverdistro=el8.9 \
testgroup=full-part-3

Change-Id: Ied0d2873974a3c8cc6e346373457c8ebc09740d6
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53357
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-17336 gss: fix __user pointer in rsi_upcall_seq_write 42/53342/3
Sebastien Buisson [Wed, 6 Dec 2023 08:15:18 +0000 (09:15 +0100)]
LU-17336 gss: fix __user pointer in rsi_upcall_seq_write

rsi_upcall_seq_write() uses sscanf to get the string passed from
userspace, but this needs to be copied to a kernel buffer first.

Test-Parameters: trivial
Test-Parameters: kerberos=true testlist=sanity-krb5
Test-Parameters: testgroup=review-dne-selinux-ssk-part-2
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I2ec875b7c6c158695857fe912ec1dd9f41ddc25d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53342
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-17000 utils: Check return value of yaml_parser_initialize 31/53331/2
Arshad Hussain [Tue, 5 Dec 2023 08:54:42 +0000 (14:24 +0530)]
LU-17000 utils: Check return value of yaml_parser_initialize

This patch adds return value checks for
yaml_parser_initialize() and fopen() in lustre_cfg.c,
and for cYAML_build_tree() in cyaml.c.

Test-Parameters: trivial
CoverityID: 410239 ("Unchecked return value")
CoverityID: 410238 ("Unchecked return value")
Fixes: 65062463 (LU-14359 hsm: support a flatter HSM archive format)
Fixes: 8961f2d8 (LU-4939 utils: allow configuration through yaml files)
Change-Id: I67a34adee3e4d25f97244487684a613426637a70
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53331
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-16097 quota: fix Null pointer dereference 30/53330/3
Hongchao Zhang [Sun, 26 Nov 2023 11:56:43 +0000 (19:56 +0800)]
LU-16097 quota: fix Null pointer dereference

Check whether "qbody" is NULL before dereferencing it.

CoverityID: 410242 ("Dereference after null check")

Fixes: 57ac32a2 ("LU-16097 quota: release preacquired quota when over limits")
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: Idab61f3ebac24307c6d5db0d42429914858d21cb
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53330
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 months ago LU-17000 lnet: Fix dereference after NULL under ksocknal_recv_hello 05/53305/3
Arshad Hussain [Fri, 1 Dec 2023 04:58:32 +0000 (23:58 -0500)]
LU-17000 lnet: Fix dereference after NULL under ksocknal_recv_hello

This patch fixes 'conn->ksnc_proto', which was
dereferenced in ksocknal_recv_hello() even though
it could be NULL.

This patch also replaces the 'return' statements in
the middle of the function with 'goto', so that the
function exits from a single place.

CoverityID: 410244 ("Dereference after null check")
Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Fixes: cb5f92c0e (LU-10391 ksocklnd: use ksocknal_protocol v4 for IPv6)
Change-Id: I95196d481b537281ab8643f1ee6162db450bef20
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53305
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-7328 misc: no non-ASCII characters in commit messages 04/53304/2
Andreas Dilger [Thu, 30 Nov 2023 23:00:38 +0000 (16:00 -0700)]
LU-7328 misc: no non-ASCII characters in commit messages

Some commit messages have control characters, or fancy quotation
marks, or mdash hyphens or similar, and this messes up the display
of "git log" and other tools depending on the current locale and
character set used in the terminal.

Add a check into commit-msg to reject commit messages that have
non-ASCII characters. This does not apply to characters used in
the Signed-off-by: or similar fields that list people's names.
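
The check described above could look roughly like the following Python sketch. This is an illustration only, not the commit-msg hook itself; the function name and the list of exempt trailer tags are assumptions.

```python
import re

# Trailer tags whose values may carry people's names (assumed list, for illustration).
NAME_TRAILER = re.compile(
    r"^(Signed-off-by|Reviewed-by|Tested-by|Acked-by|Reported-by):", re.I)


def find_non_ascii_lines(message):
    """Return 1-based numbers of lines holding non-ASCII or control characters."""
    bad = []
    # split("\n") rather than splitlines(), so stray control characters are
    # inspected instead of being treated as line separators.
    for number, line in enumerate(message.split("\n"), start=1):
        if NAME_TRAILER.match(line):
            continue  # names in trailer fields are exempt
        if any(ord(ch) > 126 or (ord(ch) < 32 and ch != "\t") for ch in line):
            bad.append(number)
    return bad
```

A hook built on this idea would reject the commit whenever the returned list is non-empty and report the offending line numbers.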

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I99d0954a68f8a5391195553ebf4b69181b6991f2
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53304
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-17334 lov: handle object created on newly added OST 35/53335/8
Andreas Dilger [Tue, 5 Dec 2023 20:45:44 +0000 (12:45 -0800)]
LU-17334 lov: handle object created on newly added OST

When a new OST is added to a filesystem without no_create,
a new object may be created on the OST relatively quickly
after it is added, in particular because the new OST would
be preferred by QOS space balancing due to its large amount
of free space. However, it might take a few seconds for the
addition of the new OST to be propagated to all of the
clients, so there is a risk that the MDS creates file objects
on an OST that a client is not yet aware of, which returns
an error to the application immediately.

This patch fixes the issue by adding a loop in lsme_unpack()
that waits and retries for some number of seconds for the
filesystem layout to be updated if either the
"loi->loi_ost_idx >= lov->desc.ld_tgt_count" or "!ltd"
condition is hit.
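
The wait-and-retry idea can be sketched in Python as follows. This is a simplified model, not the lsme_unpack() code: the target table is a plain list, and the function name, timeout, and poll interval are assumptions chosen for the example.

```python
import time


def wait_for_target(targets, ost_idx, timeout_s=5.0, poll_s=0.1,
                    sleep=time.sleep, clock=time.monotonic):
    """Return targets[ost_idx] once present, or None if still missing at timeout."""
    deadline = clock() + timeout_s
    while True:
        # Mirrors the two conditions named in the commit: an index beyond
        # the known target count, or a slot that is not populated yet.
        if ost_idx < len(targets) and targets[ost_idx] is not None:
            return targets[ost_idx]
        if clock() >= deadline:
            return None  # caller reports the error, as before the patch
        sleep(poll_s)
```

Bounding the wait keeps a genuinely invalid index from blocking the caller forever, while a short delay in configuration propagation no longer turns into an immediate error.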

Change-Id: Idc29b8c66079afaea25428577daf51370fa2b084
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53335
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-14651 build: fix build for el7.9 kernels 03/53503/2
Andrew Perepechko [Mon, 18 Dec 2023 18:19:26 +0000 (11:19 -0700)]
LU-14651 build: fix build for el7.9 kernels

Handle extra setattr_prepare() argument added in Linux 5.12 kernels
when building on older kernels.

HPE-bug-id: LUS-12059
Signed-off-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Change-Id: Ie7fd1c4d51b7a9b086cfca0db941321cbcce7057
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53503
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: Sebastien Buisson <sbuisson@ddn.com>
Tested-by: James Simmons <jsimmons@infradead.org>
5 months ago LU-17325 o2iblnd: CM_EVENT_UNREACHABLE on established conn 98/53298/2
Serguei Smirnov [Thu, 30 Nov 2023 18:55:11 +0000 (10:55 -0800)]
LU-17325 o2iblnd: CM_EVENT_UNREACHABLE on established conn

There were examples in the field with RoCE setups which demonstrate
that CM_EVENT_UNREACHABLE may be received when connection is already
in ESTABLISHED state. This causes an assert in kiblnd_cm_callback to
fail.

Handle this in a more graceful manner: report the event as unexpected
and allow the flow to continue. If there are indeed issues on
the connection, it is expected to report transaction errors later
and get cleaned up without crashing the whole system.

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: If32166fe9fc59e025609c2035cb1c03d3bed22f2
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53298
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-17251 osp: start OST object precreate earlier 45/53245/4
Andreas Dilger [Sun, 26 Nov 2023 05:58:09 +0000 (22:58 -0700)]
LU-17251 osp: start OST object precreate earlier

If the OST object precreate count gets large (usually due to high
MDT file create workload, but sometimes also forced during testing)
then send an OST_CREATE RPC sooner when the number of precreated
objects gets low.

Currently the MDS will wait until 1/2 of the precreated OST objects
are consumed, but if create_count = 10000, then this can put bursty
create workloads on the OST.  Instead, send an OST_CREATE RPC when
the precreate pool is at most 1024 objects below target, so that the
MDS keeps its precreated pool more full and the OST doesn't have to
create so many objects at once (which also locks object directories
for a longer time).
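
To make the arithmetic concrete, here is a small Python sketch of one reading of the old and new trigger conditions. It is not the OSP code; the function names are hypothetical, and capping the allowed deficit at the smaller of half the pool and 1024 is an assumption about how the two rules combine.

```python
def should_precreate_old(available, create_count):
    """Old rule: request more objects once over half of the pool is consumed."""
    deficit = create_count - available
    return deficit > create_count // 2


def should_precreate_new(available, create_count, max_deficit=1024):
    """New rule (as read here): never let the pool fall more than
    max_deficit objects below target, even for a large create_count."""
    deficit = create_count - available
    return deficit > min(create_count // 2, max_deficit)
```

With create_count = 10000 and 8000 objects still available, the old rule waits (the deficit of 2000 is under 5000) while the new rule already sends a request, so each request covers roughly a thousand objects instead of several thousand.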

Don't set opd_force_creation=true when osp.*.create_count is set
larger, and instead rely on the improved precreate check to force
OST object creation to start sooner, as opd_force_creation=true
can cause the OSP precreation to stop completely in some cases.

Test-Parameters: testlist=sanity env=ONLY=1-130,HONOR_EXCEPT=y
Test-Parameters: testlist=sanity env=ONLY=1-130,HONOR_EXCEPT=y
Test-Parameters: testlist=sanity env=ONLY=1-130,HONOR_EXCEPT=y
Test-Parameters: testlist=sanity env=ONLY=1-130,HONOR_EXCEPT=y
Test-Parameters: testlist=parallel-scale env=ONLY=rr_alloc,ONLY_REPEAT=10
Test-Parameters: testlist=parallel-scale env=ONLY=rr_alloc,ONLY_REPEAT=10
Test-Parameters: testlist=parallel-scale env=ONLY=rr_alloc,ONLY_REPEAT=10
Test-Parameters: testlist=parallel-scale env=ONLY=rr_alloc,ONLY_REPEAT=10
Fixes: df5b4c0a8b ("LU-17251 osp: force precreate if create_count grows")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Id2d12636d535485919ca5eec3adb18b1e6ce7057
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53245
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-16502 lutf: remove lutf_start.sh wrapper script 37/53237/2
Timothy Day [Fri, 24 Nov 2023 17:38:10 +0000 (17:38 +0000)]
LU-16502 lutf: remove lutf_start.sh wrapper script

Remove bash wrapper script lutf_start.sh. Set the environment
natively in Python. LUTF currently involves a number of nested
wrapper scripts. Hence, this patch aims to simplify LUTF.

It also makes it simpler to import this script into another
Python script, by providing a reusable function to set the
environment natively.
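
A reusable function of this kind might look like the sketch below. It is not the LUTF code: the function name, the EXAMPLE_LUTF_ROOT variable, and the "python" subdirectory are hypothetical placeholders.

```python
import os


def set_environment(root, environ=None):
    """Set up what a shell wrapper would otherwise export before starting Python."""
    env = os.environ if environ is None else environ
    python_dir = os.path.join(root, "python")  # hypothetical layout
    env["EXAMPLE_LUTF_ROOT"] = root            # hypothetical variable name
    # Prepend our directory to PYTHONPATH once, keeping any existing entries.
    parts = [p for p in env.get("PYTHONPATH", "").split(os.pathsep) if p]
    if python_dir not in parts:
        parts.insert(0, python_dir)
    env["PYTHONPATH"] = os.pathsep.join(parts)
    return env
```

Because the function accepts any mapping, another script can import it and call it on os.environ, or on a private dict when preparing the environment for a child process.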

Test-Parameters: @lnet
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I56d80c12f9e50f3f8de1668ffa04c855a9829601
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53237
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-16502 python: improve support for virtual environments 09/53209/2
Timothy Day [Wed, 22 Nov 2023 17:56:17 +0000 (17:56 +0000)]
LU-16502 python: improve support for virtual environments

Python virtual environments make it easy to install the
correct Python packages isolated from the rest of the
system.

 https://docs.python.org/3/library/venv.html

.venv is added to .gitignore and a simple virtual environment
example has been added to the README.

This patch collects all of the requirements for various
scripts in the Lustre tree and consolidates them in a
top level requirements.txt. lu_object.py spacing was
fixed due to parsing errors.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I69d074e9ba50022817bd243fb82d004366ab6adf
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53209
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-16502 lutf: add proper config option and fix bugs 00/53200/3
Timothy Day [Tue, 21 Nov 2023 19:19:24 +0000 (19:19 +0000)]
LU-16502 lutf: add proper config option and fix bugs

LUTF did not have a proper configuration option. Since
no message was printed at configure time, this made it
hard to debug why LUTF was not being built.

Fix a few minor bugs in headers that prevented shared
libraries from being `import`ed by python.

Fix a small Clang error in liblutf_agent.c.

Test-Parameters: @lnet
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I6680b203bef08b7afa326a1cbe30c96b5c29e95c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53200
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-17000 utils: fix leak in 'lfs find' error handling 82/53182/4
Arshad Hussain [Mon, 20 Nov 2023 07:43:07 +0000 (13:13 +0530)]
LU-17000 utils: fix leak in 'lfs find' error handling

Fix memory leak reported by Coverity in
setup_indexes() in case of errors during
OST UUID initialization.

CoverityID: 397693 ("Resource leak")

Test-Parameters: trivial testlist=sanity,conf-sanity
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Fixes: 05334b90a5d ("LU-16331 utils: fix 'lfs find -O <uuid>' with gaps")
Change-Id: Ibfd10cebaf3198ae2e9bb35686be420e4cd0050b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53182
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Anjus George <georgea@ornl.gov>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Rick Mohr <mohrrf@ornl.gov>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-14361 tests: add aheadmany to .gitignore 73/53173/2
Timothy Day [Fri, 17 Nov 2023 18:14:15 +0000 (18:14 +0000)]
LU-14361 tests: add aheadmany to .gitignore

Add aheadmany to .gitignore.

Fixes: 5317f8a ("LU-14361 statahead: Add test for statahead advise")
Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I1003200b7ed34e90d2aa0f75cb4c4f071eaeea04
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53173
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-17261 lov: ignore broken components 96/52996/5
Alex Zhuravlev [Sun, 5 Nov 2023 13:51:29 +0000 (16:51 +0300)]
LU-17261 lov: ignore broken components

If some component of a mirrored file is broken, it makes sense
to try another (possibly valid) replica rather than give up
immediately.
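
The idea can be modelled in a few lines of Python. This is a sketch under simplifying assumptions, not the LOV code: mirrors are a dict of raw component lists, and unpack() is a hypothetical callback that raises ValueError for a broken component.

```python
def usable_mirrors(mirrors, unpack):
    """Unpack every mirror, skipping those with a broken component."""
    usable, errors = [], []
    for mirror_id, raw_components in mirrors.items():
        try:
            usable.append((mirror_id, [unpack(c) for c in raw_components]))
        except ValueError as exc:
            # Remember the failure but keep going: another replica may be fine.
            errors.append((mirror_id, exc))
    if not usable:
        raise ValueError("all mirrors are broken: %r" % (errors,))
    return usable
```

An error is reported only when no replica at all can be used.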

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I32ea0efa90109f5159bf8b6c4e0efe1d543580c3
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52996
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-17263 utils: 'lfs find -size' to use 512-byte units 94/52994/6
Andreas Dilger [Sun, 5 Nov 2023 05:44:02 +0000 (23:44 -0600)]
LU-17263 utils: 'lfs find -size' to use 512-byte units

Change the 'lfs find -size' argument to 512-byte blocks by default if
no unit is given.  This better matches find(1) and avoids confusion
when converting "find" arguments to "lfs find".  Accept the 'c' suffix
like find(1) to specify a number of characters (bytes).
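
The unit handling can be illustrated with a short Python sketch. It is not the lfs parser; the function name is hypothetical and the suffix table is an assumption modelled on find(1), not a list of what lfs accepts.

```python
# Multipliers modelled on find(1); 'b' (512-byte blocks) is the default unit.
UNIT_BYTES = {"c": 1, "b": 512, "k": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def size_to_bytes(arg):
    """Convert a '-size' style argument such as '8', '8c' or '+2M' to bytes."""
    arg = arg.lstrip("+-")  # the comparison prefix does not affect the unit
    if arg and arg[-1] in UNIT_BYTES:
        number, unit = arg[:-1], arg[-1]
    else:
        number, unit = arg, "b"  # no suffix: 512-byte blocks
    return int(number) * UNIT_BYTES[unit]
```

Under this reading, "-size 8" refers to 8 blocks (4096 bytes), whereas "-size 8c" refers to 8 bytes.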

Most users/scripts will specify a unit, so this change is not
expected to cause significant disruption.

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I3124e667acc06928f41a3d3006e1d9b4a43ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52994
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Anjus George <georgea@ornl.gov>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-17137 utils: l_getidentity Coverity cleanup 15/53315/3
Shaun Tancheff [Mon, 4 Dec 2023 10:42:41 +0000 (04:42 -0600)]
LU-17137 utils: l_getidentity Coverity cleanup

Do not assign newmod a value past the end of the allocated
space. This can confuse Coverity.

Instead only assign valid addresses (or NULL).

CoverityID: 410235 ("Memory - illegal access")

Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I767ed1273ebfab68d634b3ff22b81a4621405dd2
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53315
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 months ago LU-16762 statahead: wait until statahead thread quit 02/52302/3
Qian Yingjin [Thu, 7 Sep 2023 08:33:02 +0000 (04:33 -0400)]
LU-16762 statahead: wait until statahead thread quit

The caller must wait until the statahead thread quits. Only after
that can we get accurate hit/miss stats for stat() workloads such
as "ls -l".

Test-Parameters: trivial
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I902b299e039de6c584b386856fb3f7a8989eb73b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52302
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-16796 lov: Change struct pool_desc to use kref 22/52122/5
Arshad Hussain [Sun, 27 Aug 2023 21:39:47 +0000 (03:09 +0530)]
LU-16796 lov: Change struct pool_desc to use kref

This patch changes struct pool_desc to use
kref (refcount_t) instead of atomic_t.

Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I3829de190ec148c2e087f6a0262bf3bb76c196af
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52122
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 months ago LU-16397 test: check quota setting on QSD 33/49533/5
Hongchao Zhang [Fri, 24 Nov 2023 08:33:10 +0000 (16:33 +0800)]
LU-16397 test: check quota setting on QSD

In some cases, the quota setting at the QMT may not be transferred
to the QSD in time, which could cause the test to fail.
This patch adds a check on the QSD after setting the quota limit by LFS.

Test-Parameters: trivial testlist=sanity-quota
Change-Id: Ia999317a36a0f97c1f66726cdc10e9edac3d8a53
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49533
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-14810 lnet: Cancel discovery ping/push on shutdown 56/53356/2
Chris Horn [Tue, 5 Dec 2023 09:56:57 +0000 (03:56 -0600)]
LU-14810 lnet: Cancel discovery ping/push on shutdown

Discovery shutdown can race with ping and push events. In some cases
this can result in failing to unlink ping/push MDs on shutdown.
Protect against this by checking for PING/PUSH_FAILED state on peers
on the request queue.

Test-Parameters: trivial
Test-Parameters: testlist=sanity-lnet env=ONLY=500,ONLY_REPEAT=50
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I84a1f5beb6508651bc62e1dd93271f9e72f5081c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53356
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-14287 tests: Add 'fallocate' to racer 63/41663/23
Arshad Hussain [Fri, 1 Dec 2023 10:20:22 +0000 (15:50 +0530)]
LU-14287 tests: Add 'fallocate' to racer

This patch adds fallocate (file_fallocate.sh)
to racer runs.

Test-Parameters: trivial testlist=racer
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I8e807bfc5c2b29dfb610a0b35e7083a9609517b0
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/41663
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-10391 lnet: Ping target corrupted 20/53320/3
Chris Horn [Mon, 4 Dec 2023 21:14:53 +0000 (15:14 -0600)]
LU-10391 lnet: Ping target corrupted

If NIs with large NIDs are added then the discovery ping target
can become corrupted. This is due to a typo in
lnet_ping_target_install_locked(). Instead of writing the NI status to
the lnet_ni_large_status::ns_status field, we were instead writing the
status value 8 bytes into the lnet_ni_large_status (ns_status is
offset 8 into lnet_ni_status). This overwrites some of the struct
lnet_nid.
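
The effect of the wrong offset can be reproduced with Python's ctypes. The struct layouts below are assumptions reconstructed for illustration (only the offset of 8 for ns_status in lnet_ni_status is stated in this commit message), and ns_pad is a placeholder name for the trailing 32-bit field.

```python
import ctypes


class LnetNid(ctypes.Structure):  # assumed layout of struct lnet_nid
    _fields_ = [("nid_size", ctypes.c_uint8),
                ("nid_type", ctypes.c_uint8),
                ("nid_num", ctypes.c_uint16),
                ("nid_addr", ctypes.c_uint32 * 4)]


class LnetNiStatus(ctypes.Structure):  # assumed layout, small-NID entry
    _fields_ = [("ns_nid", ctypes.c_uint64),
                ("ns_status", ctypes.c_uint32),
                ("ns_pad", ctypes.c_uint32)]


class LnetNiLargeStatus(ctypes.Structure):  # assumed layout, large-NID entry
    _fields_ = [("ns_status", ctypes.c_uint32),
                ("ns_nid", LnetNid)]


# Offset that is correct for the small struct but wrong for the large one.
wrong_offset = LnetNiStatus.ns_status.offset

large = LnetNiLargeStatus()  # ctypes zero-initialises the instance
# Simulate the buggy store: a 32-bit status written at the small-struct offset.
ctypes.c_uint32.from_buffer(large, wrong_offset).value = 0x15AA15AA  # arbitrary marker

clobbered_word = large.ns_nid.nid_addr[0]  # part of the NID, now overwritten
status_field = large.ns_status             # the field that should have been set
```

In this model the marker value ends up in the first address word of the NID, and ns_status itself stays zero.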

Test-Parameters: trivial
Fixes: db0fb8f ("LU-10391 lnet: allow ping packet to contain large nids")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I7ccff7ae59feac5edc6a97a86c861ffbdb0bb333
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53320
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 months ago LU-17306 ofd: return error for reconnection 95/53195/5
Alexander Boyko [Thu, 16 Nov 2023 22:57:24 +0000 (17:57 -0500)]
LU-17306 ofd: return error for reconnection

During the orphan cleanup phase, reconnection leads to an unsynchronized
last id between the MDT and the OST. This means that the MDT could assign
nonexistent objects to a client for a file create operation.

ofd_create_hdl()) capstor-OST0087: dropping old orphan cleanup request
MDS LAST_ID [0x2540000400:0xb6941:0x0] (747841) is 352 behind OST
    LAST_ID [0x2540000400:0xb6aa1:0x0] (748193), trust the OST

recovery-small test 144c reproduces the bug where the MDT lost
synchronization with the OST.

Fixes: 63e17799a3 ("LU-8367 osp: enable replay for precreation request")
HPE-bug-id: LUS-11969
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: I22c3d3b3db2acc9ad8f1b978b234afe7d3eef51d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53195
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-16826 tests: lfsck to repair a dangling remote entry 98/50998/9
Alexander Zarochentsev [Mon, 15 May 2023 17:47:39 +0000 (13:47 -0400)]
LU-16826 tests: lfsck to repair a dangling remote entry

Test how lfsck repairs a directory entry that points to a
nonexistent Lustre object.

HPE-bug-id: LUS-11609
Test-Parameters: trivial testlist=sanity-lfsck fstype=ldiskfs mdscount=2 mdtcount=4 env=ONLY=23d
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: I88dc8be98bacd2a199facfe3569567de6a713ff6
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50998
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-17048 mdd: protect layout change in MDD layer 46/52146/17
Bobi Jam [Mon, 28 Aug 2023 13:08:34 +0000 (21:08 +0800)]
LU-17048 mdd: protect layout change in MDD layer

We need to detect changes to the LOD layout in between transaction
declaration and when the objects are locked during transaction
execution. Otherwise, if another thread has modified the layout
of an object used by the transaction then the declaration may
be incorrect.

This patch saves the objects' layout generation in the transaction
declaration phase and checks whether it has been changed by others
in the transaction execution phase. If so, the transaction is
retried several times.
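
The save-check-retry pattern can be sketched in Python as below. This is a simplified model, not the MDD code: objects only need a layout_gen attribute, and the function names, callbacks, and retry limit are assumptions.

```python
class LayoutChanged(Exception):
    """Raised when the layout keeps changing under the transaction."""


def run_transaction(objects, declare, execute, max_retries=3):
    for _attempt in range(max_retries + 1):
        # Declaration phase: remember each object's layout generation.
        seen = {id(obj): obj.layout_gen for obj in objects}
        plan = declare(objects)
        # Execution phase (the objects would be locked here): the declaration
        # is only trusted if no generation moved in the meantime.
        if all(obj.layout_gen == seen[id(obj)] for obj in objects):
            return execute(plan)
    raise LayoutChanged("layout changed during %d attempts" % (max_retries + 1))
```

Limiting the number of retries keeps a continuously modified object from stalling the caller indefinitely.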

Fixes: b7bd4e3422 ("LU-14621 mdd: fix lock-tx order in mdd_xattr_merge()")
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I25fe03c6e8fc4eebccc039e62dfc88db1179cb26
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52146
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-17312 tests: skip conf-sanity test_53 in interop 26/53226/2
Andreas Dilger [Fri, 24 Nov 2023 07:34:20 +0000 (00:34 -0700)]
LU-17312 tests: skip conf-sanity test_53 in interop

Skip conf-sanity test_53 in interop because older servers cannot
stop any running service threads above threads_max.

Remove old test interop for servers < 2.3.

Test-Parameters: trivial testlist=conf-sanity
Test-Parameters: testlist=conf-sanity env=ONLY=53 serverversion=2.12
Fixes: 183cb1e3cd ("LU-947 ptlrpc: allow stopping threads above threads_max")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ia95405060c607c7a070720ed32a7a43b1c3ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53226
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Sarah Liu <sarah@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-16739 uapi: restore LUSTRE_NODEMAP_NAME_LENGTH in lustre_disk.h 08/53208/4
James Simmons [Wed, 22 Nov 2023 15:29:58 +0000 (10:29 -0500)]
LU-16739 uapi: restore LUSTRE_NODEMAP_NAME_LENGTH in lustre_disk.h

sanity test 400b fails with:

lustre_disk.h:266:18: error: 'LUSTRE_NODEMAP_NAME_LENGTH' undeclared here (not in a function)
 char   ncr_name[LUSTRE_NODEMAP_NAME_LENGTH + 1];

This is due to the move of LUSTRE_NODEMAP_NAME_LENGTH to lustre_idl.h.
Move it back and this time add a message to avoid this in the
future.

Test-Parameters: trivial
Fixes: 8d828762d1 ("LU-17015 gss: support large kerberos token for rpc sec init")
Change-Id: I5479bbf13f26bfd3b4f6e5f7c0c1688d810fca53
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53208
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-17057 tests: Fix sanity-sec/0 94/53194/5
Arshad Hussain [Tue, 21 Nov 2023 15:10:51 +0000 (20:40 +0530)]
LU-17057 tests: Fix sanity-sec/0

A command executed through 'runas' that fails breaks out of the
running test script, even though this failure is expected,
because 'set -e' forces the script to exit immediately.
This patch fixes this by checking the return value and
then taking the appropriate action.

This patch also fixes the 'touch' command on file f4 by
correctly calling it with the uid and gid that were set
a few lines above.

Test-Parameters: trivial testlist=sanity-sec env=ONLY=0,ONLY_REPEAT=100
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I06e6d22840e31add8c24cf90c31b98464d580ae7
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53194
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-17301 utils: l_getidentity build fix 91/53191/6
Alexander Zarochentsev [Tue, 21 Nov 2023 23:15:06 +0000 (23:15 +0000)]
LU-17301 utils: l_getidentity build fix

The extra shared libs dependencies have an effect on
the l_getidentity libtool wrapper script created
in the source directory. The wrapper script fails
if it is executed as an identity upcall by a Lustre
md server, as it uses some of the core linux utilities
from /bin:

l_getidentity: line 150: ls: command not found
l_getidentity: line 197: rm: command not found
l_getidentity: line 211: rm: command not found
l_getidentity: line 212: mv: command not found
l_getidentity: line 213: rm: command not found

Removing the unnecessary build dependencies fixes
the issue.

Test-Parameters: trivial
Fixes: 5f9f92454e ("LU-16901 utils: l_getidentity with nss module support")
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: Ib8b83d5610a4d91ebb556406b563ca16e59dce76
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53191
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 months agoLU-13138 tests: improve sanity/101d 'dd' parsing 88/53188/3
Andreas Dilger [Tue, 21 Nov 2023 01:07:04 +0000 (18:07 -0700)]
LU-13138 tests: improve sanity/101d 'dd' parsing

If 'dd' takes a long time to complete, or if it finishes in an
exact number of seconds, it will not print a decimal point, so
the current regexp will fail to detect the runtime.

Improve test_101d to allow parsing the 'dd' runtime in this case.
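
The parsing problem can be illustrated with a small C sketch. Note that the
test itself is a shell script using a regexp; parse_dd_seconds() below is a
hypothetical helper, and the assumed summary format ("... copied, <time> s,
...") is based on typical GNU dd output rather than on the patch.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: extract the elapsed seconds from a dd summary line.
 * Returns 0 and stores the runtime in *secs, or -1 if the line does not
 * look like a dd summary.
 */
static int parse_dd_seconds(const char *line, double *secs)
{
    const char *p = strstr(line, "copied, ");
    char *end;

    if (p == NULL)
        return -1;
    p += strlen("copied, ");

    /* strtod() accepts "5" as well as "5.0123", so no decimal point
     * is required in the runtime. */
    *secs = strtod(p, &end);
    if (end == p)
        return -1;

    /* Require the " s" unit so unrelated numbers are not misread. */
    if (strncmp(end, " s", 2) != 0)
        return -1;
    return 0;
}
```

A pattern that insists on a decimal point would reject "copied, 5 s", which is
the case the commit describes.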

Test-Parameters: trivial testlist=sanity env=ONLY=101d,ONLY_REPEAT=100
Fixes: 43ebfad490 ("LU-13138 tests: measure 'dd' time more accurately")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Icb030467c76947d0546916e11a91e5afb33ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53188
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
5 months agoLU-17131 ldiskfs: el9.2 encdata and filename-encode 77/53077/4
Shaun Tancheff [Wed, 18 Oct 2023 09:37:21 +0000 (04:37 -0500)]
LU-17131 ldiskfs: el9.2 encdata and filename-encode

Add encryption support for el9.2

Test-Parameters: serverdistro=el9.2 testlist=sanity
Test-Parameters: serverdistro=el9.2 testlist=conf-sanity
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Iacda4c0e4107bccb57aece2e8d9cee12a4bcd09b
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53077
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-17000 gss: Fix Out-of-bounds access under svcgssd_proc.c 20/52920/9
Arshad Hussain [Wed, 1 Nov 2023 06:50:53 +0000 (12:20 +0530)]
LU-17000 gss: Fix Out-of-bounds access under svcgssd_proc.c

The problem reported by Coverity was passing a 32-bit type and
then dereferencing it as a larger 64-bit type in
handle_channel_request(). This patch addresses this issue.

Since this is a uapi, and to catch corner cases like
kernel modules being updated separately from user tools,
RSI_DOWNCALL_MAGIC is also changed from 0x6d6dd62a to
0x6d6dd63a.

This patch also changes the 32-bit member (sid_hash) of
'struct rsi_downcall_data' to 64-bit, which also requires
changing wiretest.c and wirecheck.c.

CoverityID: 404758 ("Out-of-bounds access")
Fixes: 8d828762d1 ("LU-17015 gss: support large kerberos token for rpc sec init")
Test-Parameters: kerberos=true testlist=sanity-krb5
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I8041cd4063f1b1cefdebf5681df426be61820f99
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52920
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-16859 lnet: incorrect check for duplicate NI 18/52918/6
Serguei Smirnov [Tue, 31 Oct 2023 21:11:54 +0000 (14:11 -0700)]
LU-16859 lnet: incorrect check for duplicate NI

When an NI is being added to an existing LNet, the check against
existing NI interface names currently fails if the new NI
happens to use an interface name which is a prefix of one used
by an existing NI.

The following example assumes ib0 and its alias ib0:1 are
configured:

lnetctl net add --net o2ib --if ib0:1
lnetctl net add --net o2ib --if ib0

Fix this by making sure interface strings are compared properly
regardless of relative length.
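
A minimal C sketch of this class of bug, assuming the faulty check compared
only the first strlen(new_name) bytes. Both function names are hypothetical,
and this is not the LNet code itself.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Assumed buggy pattern: only the first strlen(new_name) bytes are
 * compared, so a new "ib0" wrongly matches an existing "ib0:1".
 */
static bool ifname_matches_prefix_only(const char *existing,
                                       const char *new_name)
{
    return strncmp(existing, new_name, strlen(new_name)) == 0;
}

/* Length-independent comparison: names match only if they are identical. */
static bool ifname_matches_exact(const char *existing, const char *new_name)
{
    return strcmp(existing, new_name) == 0;
}
```

With an existing "ib0:1" and a new "ib0", the first function reports a
duplicate while the second does not.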

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I0d4047118e7d9982fa791a2e324a27aa5d4abaee
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52918
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-17015 sec: fix PTLRPC_CTX_STATUS_MASK 29/52629/2
Sebastien Buisson [Wed, 11 Oct 2023 13:29:46 +0000 (15:29 +0200)]
LU-17015 sec: fix PTLRPC_CTX_STATUS_MASK

PTLRPC_CTX_STATUS_MASK should not include PTLRPC_CTX_NEW_BIT, which is
a bit index and not a value. Also, according to code in
sptlrpc_req_refresh_ctx():
if (unlikely(test_bit(PTLRPC_CTX_NEW_BIT, &ctx->cc_flags))) {
   if (ctx->cc_ops->refresh)
      ctx->cc_ops->refresh(ctx);
}
a context needs to be refreshed if it has the PTLRPC_CTX_NEW_BIT bit.
So the function to check if context is refreshed, cli_ctx_is_refreshed
should not return true if the PTLRPC_CTX_NEW_BIT bit is set.

In the end, do not replace PTLRPC_CTX_NEW_BIT with anything else in
PTLRPC_CTX_STATUS_MASK. Having PTLRPC_CTX_NEW_BIT was a no-op (bitwise
OR with 0), but this was working as expected. Just clean up the code to
avoid headaches.
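
The difference between a bit index and a bit value can be shown with a short
C sketch. The SK_* names and the values other than index 0 for the "new" bit
are illustrative assumptions; the real definitions live in the Lustre headers.

```c
/* Bit indexes (positions), as passed to test_bit()-style helpers. */
enum {
    SK_CTX_NEW_BIT      = 0,
    SK_CTX_UPTODATE_BIT = 1,
    SK_CTX_DEAD_BIT     = 2,
};

/* Bit values (masks) derived from the indexes. */
#define SK_CTX_NEW      (1UL << SK_CTX_NEW_BIT)
#define SK_CTX_UPTODATE (1UL << SK_CTX_UPTODATE_BIT)
#define SK_CTX_DEAD     (1UL << SK_CTX_DEAD_BIT)

/* Mixing an index into a mask of values: OR with 0 changes nothing. */
#define SK_MASK_WITH_INDEX  (SK_CTX_NEW_BIT | SK_CTX_UPTODATE | SK_CTX_DEAD)
#define SK_MASK_CLEANED_UP  (SK_CTX_UPTODATE | SK_CTX_DEAD)

/* A context that only has the "new" bit set is not yet refreshed. */
static int sk_ctx_is_refreshed(unsigned long flags)
{
    return (flags & SK_MASK_CLEANED_UP) != 0;
}
```

Both masks evaluate to the same value, which is why removing the index from
the mask is a pure cleanup.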

Test-Parameters: trivial
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ibc2ca9dfaa176b098080f7f2867338b62953b50e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52629
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-17144 mdt: set dmv by setxattr 10/52510/6
Lai Siyao [Mon, 25 Sep 2023 14:28:51 +0000 (10:28 -0400)]
LU-17144 mdt: set dmv by setxattr

Client side: convert setxattr("trusted.dmv") to "setdirstripe -D", as
this will help restore the directory default LMV from backup.

Server side: add a tunable to enable setxattr("trusted.dmv"), which can
be turned on by "lctl set_param -n mdt.*.enable_dmv_xattr=1". It's
off by default. Since an empty buffer can be set by setxattr, add a
check in the server code to avoid a crash.

Add sanity 413j.

Test-Parameter: serverversion=2.14 mdtcount=4 testlist=sanity env=ONLY=413j
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I27d784998a9c4a182b4fffb8b06c84e9d9190919
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52510
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
5 months agoLU-17015 gss: remove legacy sunrpc-cache based gss caches 76/52376/18
Sebastien Buisson [Thu, 14 Sep 2023 12:23:07 +0000 (14:23 +0200)]
LU-17015 gss: remove legacy sunrpc-cache based gss caches

Now that GSS caches are based on Lustre's internal upcall cache
mechanism, we can remove the legacy ones based on the sunrpc cache
implementation, as this code is unused.

We can also remove support for updated get_expiry() in Linux 6.3, as
this function is no longer used.

Test-Parameters: trivial
Test-Parameters: kerberos=true testlist=sanity-krb5
Test-Parameters: testgroup=review-dne-selinux-ssk-part-2
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I98d8777d225c723ae061ef360011abfc092e09d8
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52376
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 months agoLU-16967 build: Separate lnet LND rpm packaging 92/51692/9
Shaun Tancheff [Sun, 17 Sep 2023 16:48:47 +0000 (11:48 -0500)]
LU-16967 build: Separate lnet LND rpm packaging

Enable packaging of the lnet lnd kernel modules into
separate packages.

Use --with multiple_lnds to enable separate packages for
lnet lnds:
  [always builds]: kmod-lustre-lnet-socklnd for socklnd.ko
  --with o2ib: for kmod-lustre-lnet-in-kernel-o2iblnd
         ko2iblnd.ko -> in-kernel-ko2iblnd.ko
  --with mofed: kmod-lustre-lnet-o2iblnd for ko2iblnd.ko
  --with kfi: kmod-lustre-lnet-kfilnd for kkfilnd.ko
  --with gni: kmod-lustre-lnet-gnilnd for kgnilnd.ko

Test-Parameters: trivial
HPE-bug-id: LUS-11711
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: If2dace7ced96be2a2194f66362e9419b017c625f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51692
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-16967 build: Add in-kernel-ko2iblnd driver 15/51915/9
Shaun Tancheff [Fri, 10 Nov 2023 06:22:31 +0000 (00:22 -0600)]
LU-16967 build: Add in-kernel-ko2iblnd driver

Add in-kernel-ko2iblnd.ko for users of in-kernel OFED
and only build ko2iblnd.ko if an external OFED is available

This allows for building and packaging both an external
(MOFED or OPA) o2ib driver and an in-kernel o2ib driver.

Packaging rules will be written so that only one
of the o2iblnd drivers can be installed.

In the case of the in-kernel-ko2iblnd.ko driver a symlink
named ko2iblnd.ko will be created to point to the in-kernel
based o2ib driver which allows for a reasonable migration path
for the majority of users.

It is useful for dist build and test to be able to build
both in-kernel IB and external OFED in the same build.

This also means there would be some install/configure
adjustments that ought to have some discussion.

Test-Parameters: trivial
HPE-bug-id: LUS-11711
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I8105fad0b20c36705d7e14e3ae976bf3d81e9f1b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51915
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-10283 mdd: fix parent FID in changelog of striped directory 22/51322/5
Dmitry Ivanov [Mon, 16 May 2022 18:15:19 +0000 (12:15 -0600)]
LU-10283 mdd: fix parent FID in changelog of striped directory

Changelog entries for file operations such as create, rename,
link, unlink, and mkdir referred to the parent FID ("p=") as a
shard's FID in a striped directory. The same was true for the
source's parent FID ("sp="). This commit hides these Lustre internals
from the user, displaying the parent directory's FID instead, as expected.

An object might be in a remote MDT, in which case obtaining the parent
FID via the linkEA can be an expensive operation, so the parent FID is
cached in the mdd_object, so that the cost of the cross-MDT RPC is
amortized over the lifetime of the object.

Certain userspace tools might depend on the previous behavior of
displaying the shard's parent FID in the changelog records, so this
can be enabled by setting mdd.*.enable_shard_pfid=1 if it is
required for compatibility.

HPE-bug-id: LUS-10721
Signed-off-by: Dmitry Ivanov <dmitry.ivanov2@hpe.com>
Change-Id: Iae15b49f5852f36ba62ae1706d3a5f4ebf307bc4
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51322
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
5 months agoLU-16695 llite: switch to ki_flags from f_flags 93/50493/11
Patrick Farrell [Fri, 31 Mar 2023 21:33:40 +0000 (17:33 -0400)]
LU-16695 llite: switch to ki_flags from f_flags

There are possible races between IO checking f_flags and
fcntl changing f_flags.  The kernel fixed most of these by
copying most of the file flags into the iocb.

Let's follow on and use those copied flags.  This also lets
us change them if we want, since they're now local to the
specific IO.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Signed-off-by: Guillaume Courrier <guillaume.courrier@cea.fr>
Change-Id: Ib98cccec0e7888865ec10dc5f76f1d9917a1aef7
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50493
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Etienne AUJAMES <eaujames@ddn.com>
5 months agoLU-17322 auster: move functions to test-framework.sh 71/53271/4
Timothy Day [Tue, 28 Nov 2023 17:46:35 +0000 (17:46 +0000)]
LU-17322 auster: move functions to test-framework.sh

auster is essentially a wrapper around
test-framework.sh. It's more sensible for functions
to be defined in the test library rather than a test
wrapper. A comment in auster even suggests this.

This small refactor will make it easier to write
alternative testing wrappers.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Ia10a44396f81f0423c9c15cc9779658292bc739f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53271
Tested-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-10973 lutf: Fix order of linking for python modules 78/46478/15
James Simmons [Sun, 31 Jul 2022 14:06:08 +0000 (10:06 -0400)]
LU-10973 lutf: Fix order of linking for python modules

LUTF normally works, but in some test cases we got the following at startup:

ImportError: lustre/test/lutrf/src/_lnetconfig.so:
undefined symbol: lustre_lnet_del_ni

If you check, the symbol is there. The issue is the linking order.
We need to put the generated module name before all its
dependencies.

Also, remove cfs_expr_list_match from string.h, move the definition
to nidstrings.c, and make it static.

Test-Parameters: @lnet
Change-Id: Ia57fbd9d5795d845ea14bc1416f968383afcba2b
Signed-off-by: James Simmons <jsimmons@infradead.org>
Signed-off-by: Timothy Day <timday@amazon.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/46478
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-16032 tests: restore delay_unlink_mb in sanity/360 18/53218/2
Andreas Dilger [Thu, 23 Nov 2023 22:56:00 +0000 (15:56 -0700)]
LU-16032 tests: restore delay_unlink_mb in sanity/360

Restore the original value of osd-ldiskfs.*.delay_unlink_mb after
sanity test_360 is finished, so that it doesn't have an impact on
later tests running, in particular sanity-quota.sh was seeing some
delay in freeing quota for files that were just deleted.

Test-Parameters: trivial testlist=sanity-quota
Fixes: a772e90243 ("LU-16032 osd: move unlink of large objects to separate thread")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I7c1ab02262afdef2fc51f9fbc3932d954a4f8304
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53218
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-6142 misc: add editorconfig file 11/53211/3
Timothy Day [Wed, 22 Nov 2023 20:23:41 +0000 (20:23 +0000)]
LU-6142 misc: add editorconfig file

EditorConfig is a unified configuration file recognized
by dozens of different text editors. This patch adds a
very simple config.

  https://editorconfig.org/

There is an upstream patch discussing editorconfig:

  https://lkml.org/lkml/2023/6/1/196

This patch aims to be much simpler. EditorConfig must
be enabled for many common editors (Emacs, Vim) and
is enabled by default (but can be disabled) for other
editors (NeoVim).

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I264377f96d6d155f92336083160a392edfc79a4f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53211
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-10391 ptlrpc: update import debugfs to support IPv6 formats 17/53117/7
James Simmons [Wed, 22 Nov 2023 00:26:52 +0000 (19:26 -0500)]
LU-10391 ptlrpc: update import debugfs to support IPv6 formats

When mounting with IPv6 NIDs, setting the connection will fail with:

LustreError: (lproc_ptlrpc.c:1417:ldebugfs_import_seq_write()) config: wrong instance # d967@tcp::1

This is due to IPv6 NIDs being able to contain "::" which is used
as a delimiter. Update the code to search for '@' which is unique
for the NID and then look for "::". For reading the import we need
to quote all the NID strings to make it valid YAML.
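
The delimiter search described above can be sketched as follows, assuming an
input of the form "<addr>@<net>::<instance>". find_instance_delim() is a
hypothetical helper, not the function changed by the patch.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: given "<addr>@<net>::<instance>", return a pointer
 * to the "::" that separates the NID from the instance, or NULL if the
 * input has no '@' or no delimiter after it.
 */
static const char *find_instance_delim(const char *buf)
{
    const char *at = strchr(buf, '@');

    if (at == NULL)
        return NULL;

    /* Any "::" before '@' belongs to the IPv6 address, so start the
     * search at the '@' rather than at the beginning of the string. */
    return strstr(at, "::");
}
```

For "fe80::d967@tcp::1" a plain search for "::" stops inside the address at
offset 4, whereas the helper returns the delimiter at offset 14.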

This changes the import output from:

import:
    name: lustre-MDT0000-mdc-ffff96c7070a2800
    target: lustre-MDT0000_UUID
    state: FULL
    ....
    connection:
       failover_nids: [ 10.37.248.15@tcp, 192.168.1.100@tcp ]
       current_connection: 10.37.248.15@tcp
       connection_attempts: 1
       generation: 1
       in-progress_invalidations: 0
       idle: 64 sec
    ....

to the following:

import:
    name: lustre-MDT0000-mdc-ffff96c7070a2800
    target: lustre-MDT0000_UUID
    state: FULL
    ....
    connection:
       failover_nids: [ "10.37.248.15@tcp", "192.168.1.100@tcp" ]
       current_connection: "10.37.248.15@tcp"
       connection_attempts: 1
       generation: 1
       in-progress_invalidations: 0
       idle: 64 sec
    ....

Change-Id: Ie68d544d8733b87d04fa0c2385de2319696b3289
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53117
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Feng Lei <flei@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 months agoLU-17025 llapi: Verify stripe pool name 63/51963/6
Rajeev Mishra [Tue, 4 Apr 2023 00:42:28 +0000 (00:42 +0000)]
LU-17025 llapi: Verify stripe pool name

Verify the pool exists when setting a stripe within a pool.
This avoids situations where the user specifies a missing
or invalid pool and an invalid stripe is created.

Test changes: ensure the pool exists before use.

ost-pools.sh:
test_6, test_7c and test_32 now create the pool before
calling setstripe -p

sanity-pfl.sh:
test_14 now creates the pool before using it

sanity-flr.sh:
test_0b, test_0c, test_0e and test_0f are
modified to create the pool before using it

Test-parameters: trivial testlist=ost-pools,sanity-pfl,sanity-flr
HPE-bug-id: LUS-11510
Fixes: cd1f8527d414 ("LU-14645 utils: optimise setstripe")
Signed-off-by: Rajeev Mishra <rajeevm@hpe.com>
Change-Id: I146bc6f886cd083b318dc9ea2e5d1943955bd7d6
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51963
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-16837 lustre: avoid the same member name 66/51766/6
Bobi Jam [Wed, 26 Jul 2023 10:17:47 +0000 (18:17 +0800)]
LU-16837 lustre: avoid the same member name

There are several structures using the same member name, such as
cl_ladvise_io::li_flags, layout_intent::li_flags and
lfsck_instance::li_flags, and this makes it hard to find where it
is used.

This patch renames the member prefix of some structures to avoid
the homonyms.

Test-Parameters: trivial
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Ie592afa06dd0abf0c1110843e5d8007a91c68145
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51766
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months agoLU-17307 mdt: get dirent count by request 29/53229/10
Lai Siyao [Sat, 4 Nov 2023 13:32:59 +0000 (09:32 -0400)]
LU-17307 mdt: get dirent count by request

Add MA_DIRENT_CNT/LA_DIRENT_CNT to notify osd to get dirent count.
Set it in mdt_getattr_name_lock() and when auto-split is enabled so it
won't cause overhead when auto-split is disabled, and change
oo_dirent_count type to atomic_t so the result does not become
inaccurate over time from repeated addition/removal (which may
be used to know whether directory is empty or compare directories in
the future).

In osd_dirent_count(), set oo_dirent_count to 0 before iteration to
avoid multiple threads iterating at the same time, which means the
result may not be accurate in this case, but it will be eventually.

Fixes: 03a4431dac ("LU-11025 osd: osd_attr_get() returns dirent count")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I2be6c0dcfda1c98995a269585c5d8d781a8a3b42
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53229
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
6 months agoLU-17046 tests: fix write success in 1g 38/53038/6
Sergey Cheremencev [Sun, 5 Nov 2023 00:37:54 +0000 (04:37 +0400)]
LU-17046 tests: fix write success in 1g

Increase the last write count by a factor of 3. It may
happen that the previous write didn't write the requested
amount of data. For example, if it had to write 20MB but
wrote only 17MB, the final write needs to write at least 3MB
to hit EDQUOT. If there are only 2 OSTs it writes only
19MB (17+2) and can't hit EDQUOT.

Test-Parameters: trivial testlist=sanity-quota
Test-Parameters: trivial testlist=sanity-quota
Test-Parameters: trivial testlist=sanity-quota
Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Change-Id: Ice5152fa4ba8504eda2ea5513201e340c5ff6220
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53038
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
6 months agoLU-16518 lnet: fix uninitialized variable in api-ni.c 74/53174/2
Timothy Day [Fri, 17 Nov 2023 18:24:47 +0000 (18:24 +0000)]
LU-16518 lnet: fix uninitialized variable in api-ni.c

Fix new Clang error in api-ni.c:

 warning: variable 'lpni' is used uninitialized whenever
 'if' condition is false [-Wsometimes-uninitialized]

Fixes: f0be006 ("LU-9680 lnet: collect data about peer_ni by using Netlink")
Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I0895f02aaeb4fbb3b40a6927e77b8f02cfb3bfe8
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53174
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-17293 kernel: update SLES15 SP5 [5.14.21-150500.55.36.1] 56/53156/2
Jian Yu [Thu, 16 Nov 2023 20:12:59 +0000 (12:12 -0800)]
LU-17293 kernel: update SLES15 SP5 [5.14.21-150500.55.36.1]

Update SLES15 SP5 kernel to 5.14.21-150500.55.36.1 for Lustre client.

Test-Parameters: trivial mdtcount=4 mdscount=2 \
clientdistro=sles15sp5 testlist=sanity

Change-Id: I5a9afb313e9bf315ef4af5b6602785ee68c4c247
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53156
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-17280 scrub: skip dir stripes with OI 78/53078/2
Alexander Boyko [Wed, 8 Nov 2023 10:32:55 +0000 (05:32 -0500)]
LU-17280 scrub: skip dir stripes with OI

After a fresh mount and LFSCK start, all directory stripes
are added to the inconsistent list, so scrub would print the
LFSCK message "inconsistent OI FID...fixed" for all stripes.
Let's check the FID to OI mapping before adding to the
inconsistent list.

Also fixing additional debug for scrub.

HPE-bug-id: LUS-11777
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: I869f1cf71eb6c10f386a3f388a38032c73d2b41a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53078
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-17275 kernel: RHEL 8.9 client and server support 71/53071/7
Jian Yu [Mon, 20 Nov 2023 18:46:20 +0000 (10:46 -0800)]
LU-17275 kernel: RHEL 8.9 client and server support

This patch makes changes to support RHEL 8.9 release
with kernel 4.18.0-513.5.1.el8_9 for Lustre client and server.

Test-Parameters: trivial fstype=ldiskfs mdtcount=4 mdscount=2 \
clientdistro=el8.9 serverdistro=el8.8 testlist=sanity

Test-Parameters: trivial fstype=zfs mdtcount=4 mdscount=2 \
clientdistro=el8.9 serverdistro=el8.8 testlist=sanity

Test-Parameters: trivial fstype=ldiskfs mdtcount=4 mdscount=2 \
clientdistro=el8.8 serverdistro=el8.9 testlist=sanity

Test-Parameters: trivial fstype=zfs mdtcount=4 mdscount=2 \
clientdistro=el8.8 serverdistro=el8.9 testlist=sanity

Test-Parameters: optional clientdistro=el8.9 serverdistro=el8.9 \
testgroup=full-part-1

Test-Parameters: optional clientdistro=el8.9 serverdistro=el8.9 \
testgroup=full-part-2

Test-Parameters: optional clientdistro=el8.9 serverdistro=el8.9 \
testgroup=full-part-3

Change-Id: Ia3672d134534b877bb6aaffb4cea0339bc55974f
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53071
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-17274 kernel: new kernel [RHEL 9.3 5.14.0-362.8.1.el9_3] 54/53054/5
Jian Yu [Mon, 13 Nov 2023 17:03:02 +0000 (09:03 -0800)]
LU-17274 kernel: new kernel [RHEL 9.3 5.14.0-362.8.1.el9_3]

This patch makes changes to support new RHEL 9.3 release
for Lustre client.

Test-Parameters: trivial env=SANITY_EXCEPT="906" \
  mdtcount=4 mdscount=2 clientdistro=el9.3 testlist=sanity
Test-Parameters: optional clientdistro=el9.3 testgroup=full-part-1
Test-Parameters: optional clientdistro=el9.3 testgroup=full-part-2
Test-Parameters: optional clientdistro=el9.3 testgroup=full-part-3

Change-Id: I9cce1a7d2249cb4df39106c44ba4417411ee0757
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53054
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
6 months agoLU-17278 ldlm: don't grant failed lock 51/53051/2
Alex Zhuravlev [Thu, 9 Nov 2023 13:29:03 +0000 (16:29 +0300)]
LU-17278 ldlm: don't grant failed lock

Lock convert can re-grant a lock if it loses some bits. This
procedure can race with the import's invalidation, thus the
lock can become invalid (l_granted_mode=LCK_MINMODE):
LustreError: 8637:0:(ldlm_lock.c:1095:ldlm_grant_lock_with_skiplist())
ASSERTION( ldlm_is_granted(lock) )

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I7bb20d62948224647d7632f2822fba44d39a7713
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53051
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-17265 tests: allow margin for sanity/39r 35/53035/5
Arshad Hussain [Wed, 8 Nov 2023 06:38:07 +0000 (12:08 +0530)]
LU-17265 tests: allow margin for sanity/39r

The timestamp may be a little outdated due to a gap between
writing a file and checking the timestamp, so take that into
consideration and allow 2 seconds of leniency when comparing
timestamps.

The on-disk inode may also not be flushed from the journal
immediately, so allow some time for it to be updated.

This patch also converts the hex value read via debugfs
to decimal.
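
The two checks can be sketched in C as follows. The test itself is a shell
script, so this only illustrates the arithmetic; hex_to_seconds() and
timestamps_close() are hypothetical helpers.

```c
#include <stdbool.h>
#include <stdlib.h>

/*
 * Convert a hexadecimal timestamp string to a number. With base 16,
 * strtoll() accepts the digits with or without a leading "0x".
 */
static long long hex_to_seconds(const char *hex)
{
    return strtoll(hex, NULL, 16);
}

/* True if the two timestamps differ by at most 'margin' seconds. */
static bool timestamps_close(long long a, long long b, long long margin)
{
    long long diff = a > b ? a - b : b - a;

    return diff <= margin;
}
```

With a margin of 2, timestamps of 100 and 102 compare as equal enough, while
100 and 103 do not.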

Test-Parameters: trivial testlist=sanity env=ONLY=39r,ONLY_REPEAT=100
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I9e765f9cd572fb25821f9a0401c34209b7c3f574
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53035
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: xinliang <xinliang.liu@linaro.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
6 months agoLU-17230 socklnd: treat UNKNOWN netif operstate as UP 42/52842/6
Serguei Smirnov [Thu, 26 Oct 2023 18:15:28 +0000 (11:15 -0700)]
LU-17230 socklnd: treat UNKNOWN netif operstate as UP

"UNKNOWN" (IF_OPER_UNKNOWN) operational state doesn't necessarily
mean that the interface can't be used and may be the result of
a particular network driver not providing UP/DOWN states,
so it may be incorrect for socklnd to initiate
setting of a "fatal error" flag on a NI using an interface
in "UNKNOWN" operstate.
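
The policy can be illustrated with a small userspace sketch. It is written in Python and is not the socklnd kernel code; the helper name is hypothetical, and the input is assumed to be the string Linux exposes in /sys/class/net/<interface>/operstate.

```python
# Operational states that should not cause an interface to be flagged as failed.
USABLE_OPERSTATES = {"up", "unknown"}


def interface_usable(operstate: str) -> bool:
    """Treat "unknown" like "up": some drivers never report UP/DOWN."""
    return operstate.strip().lower() in USABLE_OPERSTATES
```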

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I39dfa01f3758809440d50cf8b6b11555889ef366
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52842
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
6 months agoLU-17216 ofd: make enable_health_write tunable 82/52782/14
Timothy Day [Sat, 21 Oct 2023 19:11:42 +0000 (19:11 +0000)]
LU-17216 ofd: make enable_health_write tunable

enable_health_write should be tunable rather than a
compilation option. This allows us to test it more
easily and gives admins the option to try it out
without having to recompile their Lustre servers.
It will still be disabled by default.

Add sanity/70 test to run a simple check to ensure
enable_health_write and health_check don't explode.
It's not a thorough check. But it at least checks
that the interfaces appear to work.

Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I7b1832f8acf578b891386e28c5af760070a6862c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52782
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-17212 gss: survive improper obd or imp at ctx init 55/52755/3
Sebastien Buisson [Thu, 19 Oct 2023 09:11:48 +0000 (11:11 +0200)]
LU-17212 gss: survive improper obd or imp at ctx init

GSS context init requests can happen even after a client has been
unmounted, because they are coming from userspace (request-key,
lgss_keyring).
In this case they must be ignored, and the code must be robust enough to
survive an improper, already or partially shut down obd device or import.

Test-Parameters: kerberos=true testlist=sanity-krb5
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I541727165eadf1fcb7715e416da85d100976cf2f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52755
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-17097 osc: delete items in Xarray before its destroy 81/52381/23
James Simmons [Wed, 1 Nov 2023 19:25:12 +0000 (15:25 -0400)]
LU-17097 osc: delete items in Xarray before its destroy

For older debug kernels we get a double free with RCU usage with Xarray.

WARNING: CPU: 2 PID: 21477 at lib/debugobjects.c:286
debug_print_object+0x83/0xa0
 ODEBUG: activate active (active state 1) object type:
rcu_head hint:           (null)
 Modules linked in: lustre(OE) ofd(OE) osp(OE) lod(OE)
ost(OE) mdt(OE) mdd(OE) mgs(OE) lquota(OE) lfsck(OE) obdecho(OE) mgc(OE)
mdc(OE) lov(OE) osc(OE) lmv(OE) fid(OE) fld(OE) ptlrpc_gss(OE)
ptlrpc(OE) obdclass(OE) ksocklnd(OE) lnet(OE) crc32_generic libcfs(OE)
crc_t10dif crct10dif_generic crct10dif_common rpcsec_gss_krb5 squashfs
pcspkr i2c_piix4 i2c_core binfmt_misc ip_tables ext4 mbcache jbd2
ata_generic pata_acpi ata_piix serio_raw libata
 CPU: 2 PID: 21477 Comm: umount Tainted: G           OE
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.16.2-1.fc38 04/01/2014
 Call Trace:
  [<ffffffff817ded29>] dump_stack+0x19/0x1b
  [<ffffffff8108d558>] __warn+0xd8/0x100
  [<ffffffff8108d5df>] warn_slowpath_fmt+0x5f/0x80
  [<ffffffff81414723>] debug_print_object+0x83/0xa0
  [<ffffffff814150af>] debug_object_activate+0x1af/0x210
  [<ffffffff817e8d7e>] ? _raw_spin_unlock+0xe/0x20
  [<ffffffffa0189e60>] ? xas_alloc+0xd0/0xd0 [libcfs]
  [<ffffffff8114dc8f>] __call_rcu+0x3f/0x2d0
  [<ffffffff8114df3d>] call_rcu_sched+0x1d/0x20
  [<ffffffffa0189f44>] xas_free_nodes+0xa4/0xf0 [libcfs]
  [<ffffffffa018b26f>] xa_destroy+0xdf/0xf0 [libcfs]

This can be solved by cleaning up individual items in the Xarray
before destroying the Xarray.

Test-Parameters: trivial
Test-Parameters: testlist=sanity-quota env=ONLY=1,ONLY_REPEAT=100 clientdistro=el7.9
Change-Id: I49c5fb588d1b5c44f37e55500a6f33a2cd3988ee
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52381
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
6 months agoLU-8191 lustre: convert lmv,lod,lov functions to static 79/51479/7
Timothy Day [Fri, 23 Jun 2023 20:53:00 +0000 (20:53 +0000)]
LU-8191 lustre: convert lmv,lod,lov functions to static

Static analysis shows that a number of functions
could be made static. This patch declares several
functions in lmv, lod, and lov static.

Also, remove one unused function: lov_lsm_entry()

Another function is intentionally unused for
debugging purposes. It was detected by static
analysis, but it has been left untouched.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: If409226ea201587c7f95d4a65ffaef72671b5ac2
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51479
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-10391 socklnd: handle IPv6 for zero copy messages 50/53150/2
James Simmons [Wed, 15 Nov 2023 17:53:15 +0000 (12:53 -0500)]
LU-10391 socklnd: handle IPv6 for zero copy messages

When messages exceed a certain size, zero copy messages are
created. To support zero copy messages we need to add
KSOCK_PROTO_V4 support. This resolves the error:

LNetError: 5978:0:(socklnd_cb.c:1237:ksocknal_process_receive()) 12345-2601:8c1:c180:2000::36b6@tcp: Unknown ZC-ACK cookie: 0, 272

Test-Parameters: trivial testlist=sanity-lnet
Change-Id: I4bc3d03cc5157a0f6ddb1e36ddeac225ed5d0984
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53150
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-10391 mgs: copy full nid string 15/53115/5
James Simmons [Wed, 22 Nov 2023 01:51:18 +0000 (20:51 -0500)]
LU-10391 mgs: copy full nid string

For IPv6 testing in mgs_steal_client_llog_handler() the full NID
string was not being copied. Instead we copied only the size of a
pointer, not the NID string. Copy the full NID string.

Fixes: c0cb747e ("LU-13306 mgs: use large NIDS in the nid table on the MGS")
Change-Id: I7e19db0b0d3806c1c6fabe2ede0d880a45fe3052
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53115
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Etienne AUJAMES <eaujames@ddn.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
6 months agoLU-16827 obdfilter: Fix "emfperf obdfilter-survey" error 83/53083/4
Vitaliy Kuznetsov [Fri, 10 Nov 2023 20:09:42 +0000 (21:09 +0100)]
LU-16827 obdfilter: Fix "emfperf obdfilter-survey" error

This patch fixes the definition of the lctl variable. It changes
the logic so that the LCTL value is assigned only when it was
defined earlier.

Fixes: 91a3b286ba ("LU-16827 obdfilter: Fix obdfilter-survery/1a")
Test-Parameters: trivial testlist=obdfilter-survey
Signed-off-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
Change-Id: I4dfd7e3d1f78208b33b897d8e6680e59b690014c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53083
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
6 months agoLU-17277 build: Distribute lutf.sh unconditionally 50/53050/2
Shaun Tancheff [Thu, 9 Nov 2023 10:36:52 +0000 (04:36 -0600)]
LU-17277 build: Distribute lutf.sh unconditionally

Do not exclude lutf.sh when building the src.rpm regardless
of the build host suitability to run lutf.sh tests.

HPE-bug-id: LUS-11975
Test-Parameters: trivial
Fixes: ba1fa08a0fd ("LU-10973 lnet: LUTF Python infra")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I3beeabbae9f1435a002656bfd27d49a02c3bee27
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53050
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-10391 lnet: missing some peer functionality 18/53018/3
James Simmons [Fri, 10 Nov 2023 14:30:22 +0000 (09:30 -0500)]
LU-10391 lnet: missing some peer functionality

For peers, if we encounter a bad setup in the peer NI
settings during creation, we need to clean up the entire
peer setup.

For the peers API, if one of the peer NIs is the same as
the primary NID, then treat it as tearing down all peer NIs
in the peer deletion case.

Change-Id: I57d2a63a9e31860a5ad7587f73f159a9cad2b3c9
Test-Parameters: trivial testlist=sanity-lnet
Fixes: 8a0fdfa0b28 ("LU-10391 lnet: migrate peer NI control to Netlink")
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53018
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-10729 tests: replay-dual/22d to wait 43/52343/2
hxing [Tue, 12 Sep 2023 03:38:26 +0000 (11:38 +0800)]
LU-10729 tests: replay-dual/22d to wait

replay-dual/22d should have a similar procedure as 23d:
replay-dual/23d simulates a dropped reply for the executed
update, but previous tests can break this:
 - the update modifies remote llog
 - there can be another update to that remote log
   (from the previous tests)
 - fail_loc (OBD_FAIL_UPDATE_OBJ_NET) is applied to the
   old update
 - the 23d's update gets stuck

so the test has to ensure there are no pending/in-flight
updates.

Test-Parameters: trivial testlist=replay-dual mdscount=2 mdtcount=4
Test-Parameters: testlist=replay-dual fstype=zfs mdscount=2 mdtcount=4
Signed-off-by: Xing Huang <hxing@ddn.com>
Change-Id: I2e3d3d4d1e5e33ffbb5c953edb21bcae884022c3
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52343
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-17174 misc: fix hash functions 11/52611/4
Alexey Lyashkov [Tue, 10 Oct 2023 08:38:21 +0000 (11:38 +0300)]
LU-17174 misc: fix hash functions

1) LU-16518 landing caused a bug which is visible with a debug kernel

UBSAN: Undefined behaviour in include/linux/hash.h:81:31
shift exponent 64 is too large for 64-bit type
'long long unsigned int'
Call Trace:
dump_stack+0x8e/0xd0
ubsan_epilogue+0x5/0x21
ldlm_export_lock_hash+0x49/0x4d [ptlrpc]
cfs_hash_bd_from_key+0x88/0x2e0 [libcfs]

2) use the high bits instead of the low bits, as it is more accurate.
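
As an illustration of both points, here is a Python sketch of a multiplicative hash that keeps the high bits of the product. It is not the code from this patch. GOLDEN_RATIO_64 is the constant from the kernel's include/linux/hash.h; the explicit range check is an assumption standing in for whatever guard the real fix uses, since in C a request for 0 bits would shift a 64-bit value by 64, which is the undefined behaviour UBSAN reports above.

```python
GOLDEN_RATIO_64 = 0x61C8864680B583EB  # constant from include/linux/hash.h
MASK_64 = (1 << 64) - 1


def hash_64(value: int, bits: int) -> int:
    """Multiplicative hash that keeps the top `bits` bits of a 64-bit product."""
    if not 0 < bits <= 64:
        # bits == 0 would mean a shift by 64, which is undefined in C.
        raise ValueError("bits must be in the range 1..64")
    product = (value * GOLDEN_RATIO_64) & MASK_64  # emulate 64-bit wraparound
    return product >> (64 - bits)
```

The high bits of the product depend on every bit of the input, which is why they tend to distribute better than the low bits.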

HPE-bug-id: LUS-11925
Fixes: 239e8268 (LU-16518 misc: use fixed hash code)
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Change-Id: Ie1c531ad220f44e55fbf80674a49472fb6024252
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52611
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
6 months agoRevert "LU-17131 ldiskfs: el9.2 encdata and filename-encode" 69/53069/3
Andreas Dilger [Fri, 10 Nov 2023 04:54:35 +0000 (04:54 +0000)]
Revert "LU-17131 ldiskfs: el9.2 encdata and filename-encode"

This reverts commit b0cc96a1ff516f79f26be32945a237ef8373e408
as it is likely causing ldiskfs to crash immediately at mount:

 LDISKFS-fs (dm-0): mounted filesystem with ordered data mode. Quota mode: journalled.
 BUG: kernel NULL pointer dereference, address: 0000000000000000
 #PF: error_code(0x0000) - not-present page
 Oops: 0000 [#1] PREEMPT SMP PTI
 CPU: 0 PID: 7148 Comm: mkfs.lustre  5.14.0-284.30.1_lustre.el9.x86_64 #1
 Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
 RIP: 0010:__ldiskfs_find_entry+0xab/0x440 [ldiskfs]
 Call Trace:
  ldiskfs_lookup.part.0+0x6c/0x2c0 [ldiskfs]
  __lookup_hash+0x70/0xa0
  __filename_create+0x87/0x150
  do_mkdirat+0x4b/0x160
  __x64_sys_mkdir+0x48/0x70

Change-Id: Idc8448c9e6d2300bc5eccb6ea190252eaaca9f75
Test-Parameters: trivial
Test-Parameters: serverdistro=el9.2 testlist=sanity
Test-Parameters: serverdistro=el9.2 testlist=conf-sanity
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53069
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
6 months agoLU-17257 build: use pkg-config to find krb5 libdir 10/53010/3
Jian Yu [Tue, 7 Nov 2023 18:18:32 +0000 (10:18 -0800)]
LU-17257 build: use pkg-config to find krb5 libdir

This patch fixes kerberos5.m4 to use pkg-config to
find krb5 libdir instead of looking for the krb5
libraries in a static list of paths.

Test-Parameters: trivial kerberos=true testlist=sanity-krb5

Change-Id: Ia15812932942171b019f3e73034a78f9185c16ce
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53010
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-17263 utils: 'lfs find -blocks' to use 512-byte units 93/52993/3
Andreas Dilger [Sun, 5 Nov 2023 05:32:19 +0000 (23:32 -0600)]
LU-17263 utils: 'lfs find -blocks' to use 512-byte units

Change the default units for 'lfs find -blocks' from 1KiB blocks
to 512-byte blocks to better match the behavior of find(1).  This
also matches what "-printf %b" will print.

Change llapi_parse_size() to accept a 'c' argument to specify
characters, and accept a "B" or "iB" suffix if provided.

Fixes: c043f46025 ("LU-10705 utils: add "lfs find --blocks"")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: If8345f15bf53912501cadc0fa7f981a9f787b767
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52993
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-17251 osp: force precreate if create_count grows 68/52968/6
Andreas Dilger [Fri, 3 Nov 2023 00:32:44 +0000 (18:32 -0600)]
LU-17251 osp: force precreate if create_count grows

Force the MDS to precreate OST objects if "osp.*.create_count" is
written and the OSP does not have at least that many precreated
objects locally.  This avoids doing complex operations in test
scripts to force precreation to run, which slows down the tests
and increases the chance that a test might fail.

Previously opd_precreate_force was only used for handling OSTs
that were reformatted and this reset "create_count" to minimum, so
move that to the reformat case rather than in the precreate code
path so it does not reset "create_count" when it was just set.

Remove the "env" argument from several precreate-related functions,
since it wasn't used in those functions, and that made it difficult
to call them from the "create_count" parameter handling.

Test-Parameters: testlist=parallel-scale env=ONLY=test_rr_alloc
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Iac35c1b981fcd6ab2d1ea5abc9ffe2e4563ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52968
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
6 months agoLU-17000 coverity: Fix Logically dead code under lnetctl.c 21/52921/6
Arshad Hussain [Wed, 1 Nov 2023 10:44:08 +0000 (16:14 +0530)]
LU-17000 coverity: Fix Logically dead code under lnetctl.c

This patch fixes logically dead code reported by a
Coverity run. This uncovers the missing call
to lustre_lnet_list_peer() to list peers under the
old API.

CoverityID: 404746 ("Logically dead code")

Test-Parameters: trivial testlist=sanity-lnet
Fixes: f0be00678c ("LU-9680 lnet: collect data about peer_ni by using Netlink")
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I0659ce403110118697fb8c88ade70f1695509382
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52921
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-17000 coverity: Fix Dereference before null under obd_sysfs.c 03/52903/2
Arshad Hussain [Tue, 31 Oct 2023 11:14:49 +0000 (16:44 +0530)]
LU-17000 coverity: Fix Dereference before null under obd_sysfs.c

This patch fixes Dereference before null check reported
by coverity run.

CoverityID: 404751 ("Dereference before null")

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I89bcc820244ab17a60bf1d5c86f9d6a8747b43ed
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52903
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-17000 coverity: Fix Resource Leak(3) 02/52902/2
Arshad Hussain [Tue, 31 Oct 2023 10:24:27 +0000 (15:54 +0530)]
LU-17000 coverity: Fix Resource Leak(3)

This patch fixes error reported by coverity run.

CoverityID: 404744 ("Resource leak")

Test-Parameters: trivial testlist=sanity,conf-sanity
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ib5a22dd09870fe43a36047e407d1dd57944c9413
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52902
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-16868 tests: skip conf-sanity/66 in interop 99/52899/2
Andreas Dilger [Tue, 31 Oct 2023 07:36:49 +0000 (01:36 -0600)]
LU-16868 tests: skip conf-sanity/66 in interop

Do not run conf-sanity.sh test_66* in interop testing.  Otherwise,
it is possible that the version of the test script running on the
client does not perform the upgrades with the right steps needed
for remote servers that are running a different version.

Test-Parameters: trivial testlist=conf-sanity env=ONLY=66
Test-Parameters: testlist=conf-sanity env=ONLY=66 serverversion=2.12.9
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I7b28b5f123a7348f87d43c54c806eaf6173ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52899
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Sarah Liu <sarah@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-17221 kernel: update SLES15 SP4 [5.14.21-150400.24.92.1] 20/52820/2
Jian Yu [Tue, 24 Oct 2023 19:25:39 +0000 (12:25 -0700)]
LU-17221 kernel: update SLES15 SP4 [5.14.21-150400.24.92.1]

Update SLES15 SP4 kernel to 5.14.21-150400.24.92.1 for Lustre client.

Test-Parameters: trivial clientdistro=sles15sp4 testlist=sanity

Change-Id: Id82d0ce48179df1f12dc367cced8cf84e1b918d9
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52820
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-16796 mgc: Change config_llog_data to use refcount_t 13/52813/6
Arshad Hussain [Tue, 24 Oct 2023 11:44:15 +0000 (17:14 +0530)]
LU-16796 mgc: Change config_llog_data to use refcount_t

This patch changes struct config_llog_data to use refcount_t
instead of atomic_t

Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ieec4de5d957b8dfa82c8cdef80f3a9f73aa55126
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52813
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
6 months agoLU-9859 libcfs: refactor libcfs initialization. 00/52700/6
Mr. NeilBrown [Thu, 9 Nov 2023 02:15:09 +0000 (21:15 -0500)]
LU-9859 libcfs: refactor libcfs initialization.

Many lustre modules depend on libcfs having initialized
properly, but do not explicitly check that it did.
When lustre is built as discrete modules, this does not
cause a problem because if the libcfs module fails
initialization, the other modules don't even get loaded.

When lustre is compiled into the kernel, all module_init()
routines get run, so they need to check the required initialization
succeeded.

This patch splits out the initialization of libcfs into a new
libcfs_setup(), and has all modules call that.

The misc_register() call is kept separate as it does not allocate any
resources and if it fails, it fails hard - no point in retrying.
Other set-up allocates resources and so is best delayed until they
are needed, and can be worth retrying.
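
The shape of such a setup routine can be sketched as follows. This is a Python illustration, not the libcfs C code; the class and method names are hypothetical, and it assumes failures are reported as negative errno values while only a successful run is remembered.

```python
import threading
from typing import Callable


class LazySetup:
    """Run an initialization routine once; remember success, allow retry on failure."""

    def __init__(self, init: Callable[[], None]) -> None:
        self._init = init
        self._lock = threading.Lock()
        self._active = False

    def setup(self) -> int:
        """Return 0 if initialization has succeeded, else a negative errno."""
        with self._lock:
            if self._active:
                return 0  # already initialized by an earlier caller
            try:
                self._init()
            except OSError as err:
                return -(err.errno or 1)  # not marked active, so a retry is possible
            self._active = True
            return 0
```

Each dependent module would call setup() from its own initialization path and bail out if the return value is non-zero.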

Ideally, the initialization would happen at mount time (or similar)
rather than at load time.  Doing this requires each module to
check dependencies when they are activated rather than when
they are loaded.  Achieving that is a much larger job that would
have to progress in stages.

For now, this change ensures that if some initialization in libcfs
fails, other modules will fail-safe.

Linux-commit: 64bf0b1a079d61e9e059b9dc7a58e064c7d994ae

Change-Id: I6b5ecdba0defc6e033f78d8fc2b9be9e26c7f720
Signed-off-by: Mr. NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52700
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-17015 gss: avoid request replay 89/52689/9
Sebastien Buisson [Fri, 13 Oct 2023 15:19:16 +0000 (17:19 +0200)]
LU-17015 gss: avoid request replay

Lustre's upcall cache has a retry mechanism in case the upcall was
interrupted or failed and we timed out waiting. In this case we do our
best to retry and do the upcall again.
But when the upcall cache is used for GSS contexts, the upcall cannot
be done twice with the same data. The GSSAPI implements security measures
that forbid that kind of request replay, to prevent man-in-the-middle
attacks for instance.

Add a new uc_acquire_replay field to struct upcall_cache, so that
upcall cache users can tell if acquire upcall can be replayed.
For identity upcall, this replay is fine. But for GSS contexts we need
to avoid those replays.
And bump upcall cache timeout value from 20s to 30s for GSS context
init requests.
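
The effect of the new flag can be sketched in Python. This toy cache is not the kernel's struct upcall_cache; only the acquire_replay attribute mirrors the uc_acquire_replay field named above, and the retry count and use of TimeoutError are assumptions made for the example.

```python
from typing import Callable, Dict


class UpcallCache:
    """Toy cache whose entries are filled by a caller-supplied acquire function."""

    def __init__(self, acquire: Callable[[str], str], acquire_replay: bool,
                 max_attempts: int = 2) -> None:
        self.acquire = acquire
        self.acquire_replay = acquire_replay  # may the acquire upcall be repeated?
        self.max_attempts = max_attempts
        self.entries: Dict[str, str] = {}

    def get(self, key: str) -> str:
        if key in self.entries:
            return self.entries[key]
        # Without replay permission the upcall is attempted exactly once.
        attempts = self.max_attempts if self.acquire_replay else 1
        for _ in range(attempts):
            try:
                self.entries[key] = self.acquire(key)
                return self.entries[key]
            except TimeoutError:
                continue
        raise TimeoutError(f"acquire for {key!r} did not complete")
```

An identity cache would be created with acquire_replay=True, while a GSS context cache would pass acquire_replay=False so that a timed-out request is never sent twice.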

Also add more debug messages to gss code for both client and server
sides, and both kernel and userspace.

Test-Parameters: kerberos=true testlist=sanity-krb5
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I56decc83a4f0d21be420e87cb0417826011932af
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52689
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-16518 lod: fix new clang error in lod_lov.c 66/52566/5
Timothy Day [Tue, 3 Oct 2023 15:59:30 +0000 (15:59 +0000)]
LU-16518 lod: fix new clang error in lod_lov.c

The variable hsmsize was uninitialized in some
situations. By moving the initialization earlier,
we can avoid this.

Fixes: aebb405e32e ("LU-10499 pcc: use foreign layout for PCCRO")
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I3385e3349ad00d037b8b94337cb3352623d0a40a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52566
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
6 months agoLU-17054 lnet: Change cpt-of-nid to get result from kernel 02/52502/5
Chris Horn [Tue, 29 Aug 2023 16:46:13 +0000 (10:46 -0600)]
LU-17054 lnet: Change cpt-of-nid to get result from kernel

The lnetctl cpt-of-nid command leverages a userspace implementation
of the kernel hash_long() function to compute the CPT for a given
NID. However, the kernel hash_long() function has changed over time
such that the userspace version may give a different result than the
kernel version. Since Lustre supports such a wide range of kernels
we cannot simply update the userspace implementation of hash_long() to
match newer kernels.

Address this by re-implementing lnetctl cpt-of-nid to call into kernel
space to compute the CPT and return the result to userspace.

lnetctl cpt-of-nid now works with extended NIDs (e.g., IPv6).

lnetctl cpt-of-nid no longer accepts the --ncpt argument because the
kernel functions for computing the cpt do not support this.

lnetctl cpt-of-nid no longer accepts the --nid argument. Instead, the
command now takes a space separated list of nids.

Example:
$ lnetctl cpt-of-nid 867@kfi 5.3.0.9@tcp
cpt-of-nid:
- nid: 867@kfi
  cpt: 0
- nid: 5.3.0.9@tcp
  cpt: 1
$

Because the old implementation could return a wrong result it is
completely removed.

HPE-bug-id: LUS-11785
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I7c2bc48c5c0da7da8a4425d319c0b99207814ae1
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52502
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-17075 osd: destroy declare shouldn't panic 96/52496/7
Alex Zhuravlev [Mon, 25 Sep 2023 10:11:04 +0000 (13:11 +0300)]
LU-17075 osd: destroy declare shouldn't panic

if the object doesn't exist during declaration.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I7d42cad0c04e7941a2f7950fdddaf7c473998b12
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52496
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
6 months agoLU-17120 build: Remove deprecated option from dkms.conf 92/52392/3
Shaun Tancheff [Fri, 15 Sep 2023 15:45:37 +0000 (10:45 -0500)]
LU-17120 build: Remove deprecated option from dkms.conf

dkms-commit: 7114c62aa7ead0036b2c3dc9bac8eac482ef2b20
dkms-change: https://github.com/dell/dkms/commit/7114c62aa7ead0036b2c3dc9bac8eac482ef2b20
  Deprecated feature: --no-initrd
  Deprecated feature: REMAKE_INITRD

In dkms.mkconf REMAKE_INITRD="no" is a deprecated option.

It should be removed.
This also breaks installation with some versions of dkms.

Test-Parameters: trivial
HPE-bug-id: LUS-11846
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I628193b6b9920fed6037b31ef2344d37d8a85bd7
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52392
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
6 months agoLU-17015 gss: support large kerberos token for rpc sec ctxt 05/52305/23
Sebastien Buisson [Thu, 7 Sep 2023 07:33:36 +0000 (09:33 +0200)]
LU-17015 gss: support large kerberos token for rpc sec ctxt

If the current Kerberos setup is using large tokens, like when the PAC
feature is enabled for Kerberos, authentication can fail due to the server
side being unable to exchange tokens between kernel and userspace.
This limitation is inherent to the sunrpc cache mechanism, that can
only handle tokens up to PAGE_SIZE.

For RPC sec context phase, use Lustre's upcall cache mechanism
instead of deprecated kernel's sunrpc cache. Note this phase does not
involve a proper upcall, only the downcall part is relevant to
populate the context computed in userspace.

Test-Parameters: kerberos=true testlist=sanity-krb5
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I94e945a99cab60d5b6a4c40076c40fffede217ab
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52305
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
6 months agoLU-8191 liblustre: add missing functions to header 34/51434/7
Timothy Day [Sat, 24 Jun 2023 18:48:31 +0000 (18:48 +0000)]
LU-8191 liblustre: add missing functions to header

A number of functions were missing from lustreapi.h,
causing them to be marked incorrectly as functions that
could be made static. They have been added to the
header so applications can use them.

Static analysis shows that a number of functions
could be made static. This patch also declares
several functions in liblustre static.

liblustreapi_nodemap.c and liblustreapi_ioctl.c were
missing an internal header, causing some functions
to be incorrectly flagged. This patch also adds that
header.

Initialize a previously uninitialized variable in
llapi_fid_to_handle().

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I67b9c59418b62602ffe36eb4284eb1e8d4a3b19b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51434
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-16901 utils: l_getidentity with nss module support 29/51329/10
Shaun Tancheff [Sat, 23 Sep 2023 17:51:00 +0000 (12:51 -0500)]
LU-16901 utils: l_getidentity with nss module support

Enable l_getidentity to fetch a user's supplementary group
info from NIS, LDAP and/or any other service for which an
NSS module exists.

Add support for local Lustre-specific users in
  /etc/lustre/passwd
  /etc/lustre/group

Specify the list of modules to be searched, in order. This
allows lookup options to skip group or user searches for
particular user(s) and group(s).

To enable this feature, add "lookup <mod1> <mod2> ... <modN>"
as the first line of:
  /etc/lustre/perm.conf

An example usage:
[/etc/lustre/perm.conf]
lookup lustre ldap

[/etc/lustre/passwd]
root:x:0:0:root:/root:/bin/bash

[/etc/lustre/group]
root:x:0:root

The special users in /etc/lustre do not incur the
expense of ldap queries.
Other special users like nobody, anon, etc. may be
useful to have on a cluster but not present in ldap, nis, ...
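
The first-line "lookup" directive described above could be parsed roughly
as follows. This is a minimal sketch under stated assumptions, not the
l_getidentity implementation: parse_lookup_line() and MAX_LOOKUP_MODS are
hypothetical names introduced for illustration, and the real parser may
handle errors and limits differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_LOOKUP_MODS 8	/* hypothetical limit, for illustration only */

/*
 * Parse a line of the form "lookup <mod1> <mod2> ... <modN>".
 * The line is split in place and pointers to the module names are
 * stored in mods[], preserving the order in which they were listed.
 * Returns the number of modules found, or -1 if the line is not a
 * "lookup" directive.
 * Note: strtok() keeps internal state, so this sketch is not reentrant.
 */
static int parse_lookup_line(char *line, char *mods[], int max_mods)
{
	char *tok;
	int n = 0;

	assert(line != NULL && mods != NULL);

	tok = strtok(line, " \t\n");
	if (tok == NULL || strcmp(tok, "lookup") != 0)
		return -1;

	while (n < max_mods && (tok = strtok(NULL, " \t\n")) != NULL)
		mods[n++] = tok;

	return n;
}
```

With the example configuration above, "lookup lustre ldap" would give the
ordered list { "lustre", "ldap" }, so the local files under /etc/lustre
are consulted before ldap.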

HPE-bug-id: MRP-1137, MRP-2132, LUS-2503, LUS-2453
Test-Parameters: trivial
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I55387e1df08bf2786ab78740403a1daf5a718d64
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51329
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
6 months agoLU-16802 build: compatibility for 6.4 kernels 75/50875/14
Shaun Tancheff [Fri, 27 Oct 2023 06:21:13 +0000 (01:21 -0500)]
LU-16802 build: compatibility for 6.4 kernels

Linux commit v6.3-rc4-32-g6eb203e1a868
  iov_iter: remove iov_iter_iovec()

Provide a replacement iov_iter_iovec() when one is not provided.

Linux commit v6.3-rc4-34-g747b1f65d39a
  iov_iter: overlay struct iovec and ubuf/len

This renames the iov_iter member iov to __iov and provides the
iter_iov() accessor.
Define __iov as iov when __iov is not present.
Provide an iter_iov() for older kernels.

Linux commit v6.3-rc1-13-g1aaba11da9aa
  driver core: class: remove module * from class_create()

Provide an ll_class_create() to pass THIS_MODULE, or not,
as needed by class_create().

Linux commit v6.2-rc1-20-gf861646a6562
  quota: port to mnt_idmap

Update osd_dquot_transfer to use mnt_idmap and fallback
to user_ns, if needed, by dquot_transfer.

Linux commit v6.3-rc7-2433-gcf64b9bce950
  SUNRPC: return proper error from get_expiry()

Updated get_expiry() requires a time64_t pointer to be passed
to hold the expiry time. A non-zero return value indicates an
error, nominally -EINVAL. Provide a wrapper for kernels that
return a time64_t and return -EINVAL on error.
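
The shape of such a wrapper can be sketched in userspace C. This is an
illustration only, not the code in this patch: old_get_expiry() is a
hypothetical stand-in for the older kernel helper, compat_get_expiry() is
a hypothetical wrapper name, and the sketch assumes the older helper
signals failure by returning 0.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

typedef int64_t time64_t;	/* userspace stand-in for the kernel type */

/*
 * Stand-in for the older interface: parse a decimal expiry from *bpp,
 * advance *bpp past it, and return the value. Returns 0 on error
 * (assumption made for this sketch).
 */
static time64_t old_get_expiry(char **bpp)
{
	char *end;
	long long v;

	assert(bpp != NULL && *bpp != NULL);

	v = strtoll(*bpp, &end, 10);
	if (end == *bpp || v < 0)
		return 0;
	*bpp = end;
	return (time64_t)v;
}

/*
 * Wrapper exposing the newer calling convention: the expiry is stored
 * through rvp and the return value is 0 on success or -EINVAL on error.
 */
static int compat_get_expiry(char **bpp, time64_t *rvp)
{
	time64_t rv;

	assert(rvp != NULL);

	rv = old_get_expiry(bpp);
	if (rv == 0)
		return -EINVAL;

	*rvp = rv;
	return 0;
}
```

Callers can then use the pointer-based form unconditionally and check the
return code, regardless of which convention the underlying helper uses.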

Test-Parameters: trivial
HPE-bug-id: LUS-11614
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I765d6257eec8b5a9bf1bd5947f03370eb9df1625
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50875
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: xinliang <xinliang.liu@linaro.org>
6 months agoLU-17156 tests: Improve zfs_or_rotational() 73/52973/8
Arshad Hussain [Fri, 3 Nov 2023 09:06:03 +0000 (14:36 +0530)]
LU-17156 tests: Improve zfs_or_rotational()

Improve zfs_or_rotational() in test-framework.sh to handle
get_param failures gracefully instead of throwing a bash syntax error.

Fix ostname_from_index() to print the OST name once instead of
twice if there are multiple mountpoints (e.g. sanityn).  If the
caller wants the specific name when there are two different
filesystems mounted, the specific mountpoint should be given.

Test-Parameters: trivial testlist=sanityn
Fixes: 43c3a804fe ("LU-13805 tests: Add racing tests of BIO, DIO")
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I0b914236865574dadd4ba0cb9a0ba7a7775fefc5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52973
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
6 months agoLU-10391 lnet: filter out white spaces 20/53020/3
James Simmons [Tue, 7 Nov 2023 16:39:52 +0000 (11:39 -0500)]
LU-10391 lnet: filter out white spaces

The libyaml library offers two ways to construct an internal
YAML document. One is to create a yaml_event_t and submit it to the
emitter with yaml_emitter_emit(). The other is to parse some
source such as a file. In both cases the input is processed and
placed into an internal buffer which is passed to the read handler,
yaml_netlink_read_handler(). This buffer ends up holding the
raw text of the configuration file passed in, including all
of its whitespace. Due to an internal processing bug, the two
creation methods don't yield exactly the same internal buffer
contents. In the sequence case, a buffer sourced from a file
contains extra whitespace. Our current Netlink implementation
doesn't filter out that extra whitespace, so the values packed
into the Netlink packet contain leading whitespace, which is
seen as an error. The fix is to skip the extra whitespace if
it exists.
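
The idea of the fix can be sketched as a small helper that advances past
leading whitespace before a value is packed. This is a minimal
illustration, not the code in this patch: skip_leading_space() is a
hypothetical name, and the real handler works on libyaml's internal
buffer rather than a plain C string.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Return a pointer to the first non-whitespace byte in buf and reduce
 * *len by the number of bytes skipped. buf itself is not modified.
 */
static const char *skip_leading_space(const char *buf, size_t *len)
{
	assert(buf != NULL && len != NULL);

	while (*len > 0 && isspace((unsigned char)*buf)) {
		buf++;
		(*len)--;
	}
	return buf;
}
```

A value such as "   tcp0" read from a file would then be packed as "tcp0",
matching what the yaml_event_t path produces.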

Change-Id: I7445ffb486d6d39c681ab4e5a85e0b835509c9ee
Test-Parameters: trivial testlist=sanity-lnet
Fixes: 70149f4ea89 ("LU-9680 utils: fix Netlink / YAML library handling")
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53020
Reviewed-by: Feng Lei <flei@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
6 months agoLU-10391 lnet: hops -1 is valid 19/53019/2
James Simmons [Tue, 7 Nov 2023 16:30:02 +0000 (11:30 -0500)]
LU-10391 lnet: hops -1 is valid

For route setup a hops value of -1 is valid. We wrongly assumed
that userland would never send -1.
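
A check that accepts -1 alongside ordinary hop counts might look like the
sketch below. It is illustrative only: hops_is_valid(), UNDEFINED_HOPS and
MAX_HOPS are hypothetical names, and the 1..255 range for ordinary values
is an assumption rather than something stated in this patch.

```c
#include <assert.h>
#include <stdbool.h>

#define UNDEFINED_HOPS	(-1)	/* sentinel: hop count left unspecified */
#define MAX_HOPS	255	/* assumed upper bound for this sketch */

static_assert(UNDEFINED_HOPS < 1, "sentinel must lie outside the normal range");

/* Accept either the -1 sentinel or a hop count in 1..MAX_HOPS. */
static bool hops_is_valid(int hops)
{
	return hops == UNDEFINED_HOPS || (hops >= 1 && hops <= MAX_HOPS);
}
```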

Test-Parameters: trivial testlist=sanity-lnet
Fixes: 6557cd4b8c8 ("LU-10391 lnet: migrate router management to Netlink")
Change-Id: I616334fccfe3aba6409f1a856c62cf02d07782a9
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53019
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-10391 lnet: use lnet_ni_get_status_locked for lnet_net_show_dump 16/53016/2
James Simmons [Tue, 7 Nov 2023 15:57:46 +0000 (10:57 -0500)]
LU-10391 lnet: use lnet_ni_get_status_locked for lnet_net_show_dump

In my testing of IPv6 I was always seeing the NI state as "down".
This is incorrect and I found this was due to reading ni->ni_status
directly. Using lnet_ni_get_status_locked() fixes the issue.

Test-Parameters: trivial testlist=sanity-lnet
Fixes: 8f8f6e2f36e ("LU-10003 lnet: use Netlink to support old and new NI APIs.")
Change-Id: I490144ceae4a5c1cdd7c920661f8220033df8cd5
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53016
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-10391 lnet: support setting LND timeouts 13/53013/4
James Simmons [Thu, 9 Nov 2023 19:54:43 +0000 (14:54 -0500)]
LU-10391 lnet: support setting LND timeouts

The patch that added support for NI setup with Netlink was
developed before support for individual LND timeout settings was
merged. Add these missing settings. For ksocklnd we already
supported conns_per_peer, so rearrange the code into a switch
statement.

Test-Parameters: trivial testlist=sanity-lnet
Fixes: 8f8f6e2f36e ("LU-10003 lnet: use Netlink to support old and new NI APIs.")
Change-Id: Iba955da7f5fa78b8a624bab6af66b577c75917e0
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53013
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 months agoLU-17142 mgc: reconnection without pinger 98/52498/5
Alexander Boyko [Tue, 22 Aug 2023 09:53:14 +0000 (05:53 -0400)]
LU-17142 mgc: reconnection without pinger

When the MGS has been offline for some time, AT is increased and
the connection request deadline is high. Reconnecting with the pinger
waits for a request deadline before the next attempt. The situation is
worse with a failover partner, when different connections are used.
Reconnection could fail with a local MGS too.

Here is the error seen when the MGC could not connect to a local MGS
(MDT combined with MGS).

LustreError: 15c-8: MGC90@kfi:
Confguration from log kjlmo12-MDT0000 failed from MGS -5.

The patch forces reconnection with import invalidate and aborts
inflight requests.

ptlrpc_recover_import() aborts waiting when it sees the disconnected
import state, but a disconnect between connection attempts is valid.
This is fixed.

Reset Adaptive Timeout when a local MGS starts. This allows the MGC to
reconnect efficiently.

mgs_barrier_gl_interpret_reply() should handle EINVAL from a client;
it means the client doesn't have a lock.

HPE-bug-id: LUS-11633
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: Ie631e04fb3e72900af076cf7f268f20f7b285445
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52498
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>