Whamcloud - gitweb
fs/lustre-release.git
3 months agoLU-17414 lnet: Use POSIX error number for libnetconfig 57/53657/3
Arshad Hussain [Thu, 11 Jan 2024 09:35:02 +0000 (15:05 +0530)]
LU-17414 lnet: Use POSIX error number for libnetconfig

Currently liblnetconfig.c returns custom-defined
LUSTRE_CFG_RC_* numbers, which can be confusing to users.
This patch redefines LUSTRE_CFG_RC_* to use POSIX
error numbers for consistency.
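
As a rough illustration of the idea (the DEMO_CFG_RC_* names and the
errno values chosen below are hypothetical, not the actual
LUSTRE_CFG_RC_* definitions), return codes defined as negative POSIX
error numbers can be decoded with the standard strerror():

```c
#include <errno.h>
#include <string.h>

/* Hypothetical stand-ins; the real LUSTRE_CFG_RC_* mapping may differ. */
#define DEMO_CFG_RC_NO_ERR	0
#define DEMO_CFG_RC_BAD_PARAM	(-EINVAL)
#define DEMO_CFG_RC_OUT_OF_MEM	(-ENOMEM)
#define DEMO_CFG_RC_NO_MATCH	(-ENOENT)

/* Translate a return code into a message using the standard errno text. */
static const char *demo_cfg_rc_str(int rc)
{
	return rc == DEMO_CFG_RC_NO_ERR ? "success" : strerror(-rc);
}
```

With this kind of definition, callers no longer need a private table to
interpret library return codes.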

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I585d1dfd80d07160e5cdeef784920414132bcaf8
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53657
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-17242 debug: use dump_stack() where possible 25/53625/2
Timothy Day [Tue, 9 Jan 2024 17:17:10 +0000 (17:17 +0000)]
LU-17242 debug: use dump_stack() where possible

In some cases, libcfs_debug_dumpstack() can fail to output a
stack trace - either because the needed symbols are not exported
or those symbols can't be resolved at runtime. This seems to
occur more often with newer kernels. The message appears only
as:

 Lustre: ldlm_cb01_002: service thread pid 57876 was inactive for
   40.494 seconds. The thread might be hung, or it might only be
   slow and will resume later. Dumping the stack trace for
   debugging purposes:
 Pid: 57876, comm: ldlm_cb01_002 6.1.70 #1 SMP PREEMPT_DYNAMIC
   Thu Jan  4 18:52:41 UTC 2024
 Call Trace TBD:

with no stack trace (seen on CentOS 8.5 with ml 6.1.70).

For reference, the runtime symbol lookup was added and updated in:

 b49ce7a ("LU-12400 libcfs: save_stack_trace_tsk if ARCH_STACKWALK")
 58ac9d3 ("LU-14099 build: Fix for unconfigured arch_stackwalk")

First, add a message when the symbol can't be resolved correctly.
This makes it much easier to understand why the stack trace is
missing.

Second, replace libcfs_debug_dumpstack(NULL) with dump_stack().
When the task_struct is NULL, libcfs uses the current
task_struct. This replicates the functionality of dump_stack().
Using dump_stack() is more reliable, more in line with kernel
style, and not likely to be un-exported in the future.

Finally, in lustre/osc/osc_object.c the stack isn't dumped since
there is already an LBUG().

There only remains one user of libcfs_debug_dumpstack() which
uses a task_struct other than current. This can be cleaned up
in a future patch.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I196c1da7e39b1a694c0cb67ecfaab58ab3e4662c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53625
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-17400 uapi: Fix incorrect snamelen return value 24/53624/5
Josh Samuelson [Mon, 8 Jan 2024 19:03:52 +0000 (13:03 -0600)]
LU-17400 uapi: Fix incorrect snamelen return value

The sname char array is limited by the struct
changelog_rec.cr_namelen value and has no '\0' character allocated
to it, so strlen() will overrun the char array till it finds the next
'\0' char.

This issue can be seen on the client side when "lfs changelog"
is run and 08RENME record types are present.

Pointer arithmetic was used between sname and name to avoid the
GCC 11 warnings mentioned in 6331eadbd6.

Added Andreas's safety/range check code to changelog_rec_sname.
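
A minimal userspace sketch of the approach, assuming a simplified
record layout (struct demo_rec and the demo_* helpers are hypothetical
and only loosely modelled on changelog_rec): the source name length is
derived from the total name length by pointer arithmetic, never by
calling strlen() on the unterminated source name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record: names[] holds "target\0source"; the source
 * name is NOT NUL-terminated and namelen covers both names. */
struct demo_rec {
	unsigned short	namelen;
	char		names[64];
};

/* Return the start of the source name, or NULL if there is none. */
static const char *demo_rec_sname(const struct demo_rec *rec)
{
	/* Range check: only search inside the bytes the record owns. */
	const char *sep = memchr(rec->names, '\0', rec->namelen);

	return sep == NULL ? NULL : sep + 1;
}

/* Length of the source name, computed without reading past the end. */
static size_t demo_rec_snamelen(const struct demo_rec *rec)
{
	const char *sname = demo_rec_sname(rec);

	if (sname == NULL)
		return 0;
	return (size_t)rec->namelen - (size_t)(sname - rec->names);
}
```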

Fixes: 6331eadbd6 ("LU-15420 uapi: avoid gcc-11 -Werror=stringop-overread")
Signed-off-by: Josh Samuelson <josh@1up.unl.edu>
Change-Id: Ie0817dfdd1d02e06b9399e66f1affaadb9e156c4
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53624
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: xinliang <xinliang.liu@linaro.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-17000 lnet: don't assign unused return codes 08/53608/4
Arshad Hussain [Mon, 8 Jan 2024 09:02:24 +0000 (14:32 +0530)]
LU-17000 lnet: don't assign unused return codes

In lnet_peer_discovery(), the return values of lnet_peer_ping_failed()
and lnet_peer_push_failed() are unused, and the return value of the
former is overwritten before it is ever read.

Remove rc assignment and cast function to void to make it
clear the return code can be ignored.
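
The pattern looks roughly like this (a sketch only; demo_ping_failed()
and demo_discovery_step() are hypothetical stand-ins, not the LNet
functions):

```c
#include <errno.h>

/* Hypothetical helper: records a failure and reports an error code. */
static int demo_ping_failed(int *nfailed)
{
	(*nfailed)++;
	return -ETIMEDOUT;
}

static void demo_discovery_step(int *nfailed)
{
	/* The caller cannot act on the result, so discard it explicitly
	 * instead of storing it in an rc variable that is never read. */
	(void)demo_ping_failed(nfailed);
}
```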

Test-Parameters: trivial
CoverityID: 412758 ("Unused Value")
CoverityID: 412759 ("Unused Value")
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I02d5e883fc02814d5dbe307b78f028703023db52
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53608
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 months agoLU-9457 test: improve sanity 253 48/53548/3
Lai Siyao [Tue, 19 Dec 2023 08:24:07 +0000 (03:24 -0500)]
LU-9457 test: improve sanity 253

Improve sanity test_253: set high watermark to 50M, and fill OST with
fallocate.

Test-Parameters: trivial
Test-Parameters: testlist=sanity,sanity,sanity,sanity,sanity,sanity,sanity env=EXCEPT=77c
Test-Parameters: testlist=sanity,sanity,sanity,sanity,sanity,sanity,sanity env=EXCEPT=77c
Test-Parameters: testlist=sanity,sanity,sanity,sanity,sanity,sanity,sanity env=EXCEPT=77c
Test-Parameters: testlist=sanity,sanity,sanity,sanity,sanity,sanity,sanity env=EXCEPT=77c
Test-Parameters: testlist=sanity,sanity,sanity,sanity,sanity,sanity,sanity env=EXCEPT=77c
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I85139d7fc0697d08c21bdb19432b40c8dab82ee9
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53548
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-16796 lnet: Change sfw_session to use refcount_t 38/53438/4
Arshad Hussain [Wed, 13 Dec 2023 09:20:47 +0000 (14:50 +0530)]
LU-16796 lnet: Change sfw_session to use refcount_t

This patch changes struct sfw_session to use
refcount_t instead of atomic_t.

This patch also addresses checkpatch errors.
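
For readers unfamiliar with why refcount_t is preferred over a bare
atomic_t, the following userspace sketch imitates its main safety
properties with C11 atomics. It is an approximation under stated
assumptions: struct demo_refcount and the demo_* functions are
hypothetical, and the kernel implementation differs in detail.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_REFCOUNT_SATURATED	UINT_MAX

struct demo_refcount {
	atomic_uint refs;
};

static void demo_refcount_set(struct demo_refcount *r, unsigned int n)
{
	atomic_store(&r->refs, n);
}

/* Take a reference unless the object is already dead (count 0).
 * A saturated counter stays saturated instead of wrapping around. */
static bool demo_refcount_inc_not_zero(struct demo_refcount *r)
{
	unsigned int old = atomic_load(&r->refs);

	do {
		if (old == 0)
			return false;
		if (old == DEMO_REFCOUNT_SATURATED)
			return true;
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old + 1));
	return true;
}

/* Drop a reference; return true only when the last one is released.
 * Underflow (count already 0) is refused rather than wrapping. */
static bool demo_refcount_dec_and_test(struct demo_refcount *r)
{
	unsigned int old = atomic_load(&r->refs);

	do {
		if (old == 0 || old == DEMO_REFCOUNT_SATURATED)
			return false;
	} while (!atomic_compare_exchange_weak(&r->refs, &old, old - 1));
	return old == 1;
}
```

A plain atomic counter would silently wrap on overflow or underflow,
which is how use-after-free bugs tend to arise.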

Test-Parameters: trivial testlist=lnet-selftest
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ifa77b8d9280756ce52c8f59d1d193a866f0ba8a7
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53438
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-16796 ldlm: Change struct ldlm_resource to use refcount_t 16/53416/4
Arshad Hussain [Tue, 12 Dec 2023 07:16:19 +0000 (12:46 +0530)]
LU-16796 ldlm: Change struct ldlm_resource to use refcount_t

This patch changes struct ldlm_resource and
struct nrs_tbf_client to use refcount_t instead of atomic_t.

The only other change is converting spaces to tabs near the
lines of code being changed.

Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ic15f27bc6281725f00bddc465668f81291aad6ec
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53416
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-17354 osp: don't reset sequence client 06/53406/15
Alex Zhuravlev [Mon, 11 Dec 2023 15:15:40 +0000 (18:15 +0300)]
LU-17354 osp: don't reset sequence client

Do not reset the sequence client if sequence allocation returned an
error; instead try to get a sequence later upon reconnection.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ie23b688e4f93651c4615d77a9686c44a150d3961
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53406
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 months agoLU-17000 contrib: script to prepare coverity builds 00/53400/4
Timothy Day [Sun, 10 Dec 2023 22:58:09 +0000 (22:58 +0000)]
LU-17000 contrib: script to prepare coverity builds

Add script 'coverity-run' to semi-automate running
and submitting Coverity builds for Lustre. This
should make it much easier to reproducibly submit
builds to Coverity - and serve as an example of
how the Coverity build process works. It should
also provide more transparency in how builds are
being prepared for Coverity.

Add a Vagrantfile for the Vagrant VM used during
the build process.

Update in-tree Documentation.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I050b10d9df0e4e4c1b8bcc91a3c296c11f27ffef
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53400
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-17229 tests: rely on IR for replay-dual 33 67/53267/2
Etienne AUJAMES [Tue, 28 Nov 2023 13:26:15 +0000 (14:26 +0100)]
LU-17229 tests: rely on IR for replay-dual 33

Test 33 seems to fail with a combined MDT0000 and MGT.

This patch fails over MDT0001 instead of MDT0000 to keep the IR working
on the MGS.

Test-Parameters: testlist=replay-dual env=ONLY="33",ONLY_REPEAT=50
Test-Parameters: testlist=replay-dual
Test-Parameters: testlist=replay-dual
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: Ibf317283b005c103c5f28b7343a808fd25f992a1
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53267
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Deiter
Reviewed-by: Sarah Liu <sarah@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-17308 mgs: move pool_cmd check to the kernel 02/53202/12
Etienne AUJAMES [Tue, 21 Nov 2023 19:01:43 +0000 (20:01 +0100)]
LU-17308 mgs: move pool_cmd check to the kernel

Several checks for pool_cmd need to be done before touching the MGS
configuration.

e.g. the following cases should be denied before adding a destroy
record to the MGS configuration:
 - The pool does not exist
 - The pool is not empty (OSTs still in the pool)

This work is done in userspace (check_pool_cmd) by checking the client
lov parameters for pools. But nothing guarantees those parameters to
be in sync. So, only the MGS configuration should be trusted for that.

This patch moves those checks into the kernel. There are several reasons
for this:
 - It guarantees the pool configurations consistency even if an
   external tool is used.
 - For standalone MGS, it limits the overhead of reading the
   configuration several times.
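
The two denial cases for a pool destroy can be sketched as follows.
This is an illustrative userspace model only: struct demo_pool,
demo_check_pool_destroy() and the specific errno values are
assumptions, not the MGS code.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view of one pool as recorded in the authoritative
 * (MGS-side) configuration. */
struct demo_pool {
	const char	*name;
	int		 ost_count;
};

/* Return 0 if pool 'name' may be destroyed, or a negative errno. */
static int demo_check_pool_destroy(const struct demo_pool *pools,
				   size_t npools, const char *name)
{
	for (size_t i = 0; i < npools; i++) {
		if (strcmp(pools[i].name, name) != 0)
			continue;
		/* The pool still has OSTs in it: refuse to destroy. */
		return pools[i].ost_count > 0 ? -ENOTEMPTY : 0;
	}
	/* The pool does not exist. */
	return -ENOENT;
}
```

Running the check against the authoritative configuration, rather than
against client-side lov parameters, is what removes the dependence on
clients being in sync.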

This patch adds a "-n|--nowait" option for pool_cmd to skip waiting
for pool updates on the clients. This is useful when running many
pool_cmd operations in a row, and it avoids cancelling the clients'
CONFIG lock each time (because of mgc_requeue_timeout_min).

e.g:
  lctl pool_destroy -n lustre.old
  lctl pool_new -n lustre.test
  lctl pool_add -n lustre.test OST0001
  ...
  lctl pool_add lustre.test OST0010

check_pool_cmd_result() is modified to compute the client wait delay
with mgc_requeue_timeout_min.

Add a regression test "ost-pools 2f".

Test-Parameters: testlist=ost-pools
Test-Parameters: testlist=ost-pools
Test-Parameters: testlist=ost-pools env=ONLY=2f,ONLY_REPEAT=50
Test-Parameters: testlist=ost-pools env=ONLY=2f,ONLY_REPEAT=50
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: Ifbc49b5667bf17253716052a7480114936c65149
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53202
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Guillaume Courrier <guillaume.courrier@cea.fr>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-17173 tests: fix security related tests 12/53012/13
Sebastien Buisson [Mon, 13 Nov 2023 10:03:38 +0000 (11:03 +0100)]
LU-17173 tests: fix security related tests

Several cleanups required in security related tests.

In sanity-krb5, in order to get proper access to keyrings, use su -
instead of runas to initialize process more completely.
Also fix use of 'lfs flushctx', as some tests do not call it properly.
And in test_8, avoid waiting arbitrarily and change fail_loc to just
sleep once.

In sanity-krb5 and sanity-sec, fix parameters passed to
start_gss_daemons().

Test-Parameters: trivial
Test-Parameters: kerberos=true testlist=sanity-krb5
Test-Parameters: testgroup=review-dne-selinux-ssk-part-2
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I4598ae5a7d28afbc39d7cc2d0afd1096d877d03b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53012
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 months agoLU-16566 sptlrpc: remove rq_sepol from ptlrpc_request 45/52845/9
Etienne AUJAMES [Thu, 26 Oct 2023 19:28:55 +0000 (21:28 +0200)]
LU-16566 sptlrpc: remove rq_sepol from ptlrpc_request

This patch removes rq_sepol from ptlrpc_request to reduce the memory
consumption on the servers.

The rq_sepol field is 327 bytes long, is allocated for each request,
and is rarely used (it needs SELinux activated with the send_sepol
feature).

The patch stores the SELinux policy status string in a separate object.
The pointer is stored in ptlrpc_sec->ps_sepol and protected by RCU
(mostly read-only, the SELinux policy should rarely change).

When the policy status needs to be packed in a request, we take a
reference to the current ps_sepol object and release it after the
packing. If the policy has changed in the meantime, the object used
will be freed afterwards.
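
The reference lifetime described above can be modelled in userspace as
follows. This is a single-threaded sketch under stated assumptions:
struct demo_sepol, struct demo_sec and the demo_* helpers are
hypothetical, and the RCU protection used by the patch is not modelled
at all.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical shared policy-status object with a reference count. */
struct demo_sepol {
	int	refs;
	char	status[64];
};

struct demo_sec {
	struct demo_sepol *cur;	/* holds one reference */
};

static struct demo_sepol *demo_sepol_new(const char *status)
{
	struct demo_sepol *p = calloc(1, sizeof(*p));

	if (p == NULL)
		return NULL;
	p->refs = 1;
	strncpy(p->status, status, sizeof(p->status) - 1);
	return p;
}

/* Take a reference on the current object before packing a request. */
static struct demo_sepol *demo_sepol_get(struct demo_sec *sec)
{
	if (sec->cur != NULL)
		sec->cur->refs++;
	return sec->cur;
}

/* Drop a reference; returns 1 if the object was freed. */
static int demo_sepol_put(struct demo_sepol *p)
{
	if (p != NULL && --p->refs == 0) {
		free(p);
		return 1;
	}
	return 0;
}

/* Policy changed: publish a new object, drop the old reference. */
static int demo_sepol_replace(struct demo_sec *sec, const char *status)
{
	struct demo_sepol *fresh = demo_sepol_new(status);
	struct demo_sepol *old = sec->cur;

	if (fresh == NULL)
		return -1;
	sec->cur = fresh;
	demo_sepol_put(old);
	return 0;
}
```

A request that took its reference before the policy changed keeps
using the old string safely; the old object is freed only when that
last reference is dropped.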

A read operation is added to srpc_sepol parameter to return the
SELinux policy string cached in Lustre.

Test-Parameters: testlist=sanity-selinux env=ONLY=21,ONLY_REPEAT=50
Test-Parameters: testlist=sanity-selinux env=ONLY=21,ONLY_REPEAT=50
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: I80fb76c97885c4b2987eb7f91a9bfe6e0e6e6c70
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52845
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 months agoLU-17173 utils: cleanup lfs flushctx 04/52604/17
Sebastien Buisson [Mon, 13 Nov 2023 10:02:24 +0000 (11:02 +0100)]
LU-17173 utils: cleanup lfs flushctx

When lfs flushctx is called without mount points, build the list of
all mounts first, and then call the ioctl to flush associated
contexts. Otherwise fetching the mount points unfortunately refreshes
the contexts being flushed, because the mount points are being
accessed.

Test-Parameters: trivial
Test-Parameters: kerberos=true testlist=sanity-krb5
Test-Parameters: testgroup=review-dne-selinux-ssk-part-2
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I75b9efe4c65ce66f5f692f9e49a28fde705d0140
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52604
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 months agoLU-17173 gss: user keys go to user keyring 71/52771/14
Sebastien Buisson [Fri, 20 Oct 2023 08:27:14 +0000 (10:27 +0200)]
LU-17173 gss: user keys go to user keyring

Keys for root, that are used for Lustre internal processing, are
stored in the session keyring. That way they can be found by all
Lustre processes in userspace and in the kernel.
For end user keys, it is better to store them in the user keyring.
This simplifies key management, makes them shared across all user
sessions, and avoids unfortunate key leak if lfs flushctx is not
called at user logout.
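
The selection rule amounts to something like the sketch below. The
enum and demo_pick_keyring() are hypothetical; the numeric values
mirror the special keyring IDs KEY_SPEC_SESSION_KEYRING and
KEY_SPEC_USER_KEYRING from the Linux keyctl interface.

```c
/* Values mirror the special keyring IDs of the Linux keyctl interface. */
enum demo_keyring {
	DEMO_SESSION_KEYRING	= -3,
	DEMO_USER_KEYRING	= -4,
};

/* Root keys stay in the session keyring; end-user keys go to the
 * user keyring so they are shared across that user's sessions. */
static enum demo_keyring demo_pick_keyring(unsigned int uid)
{
	return uid == 0 ? DEMO_SESSION_KEYRING : DEMO_USER_KEYRING;
}
```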

Test-Parameters: kerberos=true testlist=sanity-krb5
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ibb3d326e89dcacc89e77eca76cdb773861d3a8a7
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52771
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-17078 ldlm: do not spin up thread for local cancels 92/52192/5
Patrick Farrell [Thu, 31 Aug 2023 00:07:03 +0000 (20:07 -0400)]
LU-17078 ldlm: do not spin up thread for local cancels

When doing lockless IO on the client, the server is
responsible for taking LDLM locks for each IO.

Currently, the server sends these locks to a separate
thread for cancellation.  This behavior is necessary on the
client where a lock may protect a large number of cached
pages, so cancelling it in a user thread may introduce
unacceptable delays.  But the server doesn't have cached
pages, so it makes more sense for the server to do the
cancellation in the same thread.

We do this by not spinning up an ldlm_bl thread for
cancellations of local (server side only) locks.

This improves 4K DIO random read performance by about 9%.

Without patch, maximum server IOPs on 4K reads:
2864k IOPS

With patch:
3118k IOPS

This is the maximum performance achieved with many clients
and client threads doing 4K random AIO reads from different
files.
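
The quoted improvement can be checked from the two IOPS figures above;
demo_pct_gain() is just a helper name chosen for this sketch.

```c
/* Relative gain in percent when going from 'before' to 'after'. */
static double demo_pct_gain(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

For 2864k and 3118k IOPS the gain is 254 / 2864, a little under 8.9%,
which matches the "about 9%" stated above.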

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ia996732780d278c5d0bc290c5484e3bc325a347a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52192
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 months agoLU-17029 lustre.spec.in: match rpm macro openEuler for openEuler Linux 54/51954/4
Xinliang Liu [Mon, 7 Aug 2023 10:18:49 +0000 (10:18 +0000)]
LU-17029 lustre.spec.in: match rpm macro openEuler for openEuler Linux

This allows the spec file to handle OSes derived from openEuler,
because each derived OS has a different vendor name; for example,
the vendor name of KylinOS is Kylin.

Change-Id: I12ceda5bf9d1f17a75d4adddbad292fd1ae9967b
Test-Parameters: trivial
Signed-off-by: Xinliang Liu <xinliang.liu@linaro.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51954
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-17016 mdd: no EXDEV for parent dir projid mismatch 68/51868/17
Andreas Dilger [Fri, 4 Aug 2023 05:01:42 +0000 (23:01 -0600)]
LU-17016 mdd: no EXDEV for parent dir projid mismatch

Don't return EXDEV if the parent directory projid of a renamed
directory does not match the projid of the target dir.  Only the
projid of the source directory itself and the target matter.
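
In simplified form (a sketch only; struct demo_attr and
demo_rename_projid_check() are hypothetical, and the real check in
mdd_rename_sanity_check() has further conditions not modelled here),
the rule is:

```c
#include <errno.h>

struct demo_attr {
	unsigned int projid;
};

/* Only the renamed directory itself and the target directory are
 * compared; the project ID of the source's parent is ignored. */
static int demo_rename_projid_check(const struct demo_attr *src,
				    const struct demo_attr *src_parent,
				    const struct demo_attr *tgt_dir)
{
	(void)src_parent;
	return src->projid == tgt_dir->projid ? 0 : -EXDEV;
}
```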

Rename variables in mdd_rename_sanity_check() and mdd_rename()
so the object and attribute variable names are consistent.

Improve console error messages to contain more useful information.
Replace spaces with tabs in affected functions.

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I7aa53f6d168926719ad9fd5df3c760e6c73ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51868
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
3 months agoLU-17131 ldiskfs: Add Ubuntu 20.04.5 release 5.15 kernel 14/52414/6
Shaun Tancheff [Thu, 18 Jan 2024 04:30:34 +0000 (11:30 +0700)]
LU-17131 ldiskfs: Add Ubuntu 20.04.5 release 5.15 kernel

Add support for Ubuntu 20.04.5 5.15 kernel similar to el9.2
with updated patches:
    ext4-corrupted-inode-block-bitmaps-handling-patches.patch
    ext4-data-in-dirent.patch
    ext4-dont-check-before-replay.patch
    ext4-inode-version.patch
    ext4-mballoc-extra-checks.patch
    ext4-prealloc.patch
    ext4-filename-encode.patch

Tested with tag Ubuntu-hwe-5.15-5.15.0-91.101_20.04.1

Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ic1b4b0f25a9ac984186cf4f37b5a73d93af93ebd
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52414
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-17131 ldiskfs: Refresh ubuntu 5.11 server 13/52413/6
Shaun Tancheff [Sun, 14 Jan 2024 00:50:37 +0000 (17:50 -0700)]
LU-17131 ldiskfs: Refresh ubuntu 5.11 server

Refresh ext4-pdirop and ext4-delayed-iput.
Add:
  ext4-filename-encode support
  ext4-add-periodic-superblock-update

Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Icd066a4f507842312924f7c7818208d8f07c8c70
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52413
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 months agoLU-17383 statahead: quit statahead with a long time wait 35/53535/5
Qian Yingjin [Fri, 22 Dec 2023 09:16:07 +0000 (04:16 -0500)]
LU-17383 statahead: quit statahead with a long time wait

If the thread has not done a stat for longer than a time threshold
(@sbi->ll_sa_timeout, 30 seconds by default) then it probably does
not care too much about performance, or is no longer using this
directory.
Quit the statahead thread after such a long wait.
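
The decision can be sketched as below; demo_sa_should_quit() and the
DEMO_SA_TIMEOUT_DEFAULT macro are hypothetical names, with the
30-second default taken from the description above.

```c
#include <stdbool.h>

#define DEMO_SA_TIMEOUT_DEFAULT	30	/* seconds */

/* Quit when no stat has been seen for longer than the threshold.
 * All arguments are in seconds. */
static bool demo_sa_should_quit(long now, long last_stat, long timeout)
{
	return now - last_stat > timeout;
}
```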

This patch also fixes defects reported by Coverity Scan for
Lustre.

Fixes: e10bf68d7c3 ("LU-14361 statahead: regularized fname statahead pattern")
Test-Parameters: testlist=parallel-scale-nfsv4
Test-Parameters: testlist=parallel-scale-nfsv4
Test-Parameters: testlist=parallel-scale-nfsv4
Test-Parameters: testlist=parallel-scale-nfsv4
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Ia7c478268fe12eeefa6dfae1b3c94451f010d1d5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53535
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-17426 mdt: relax same MDT file rename lock 26/53726/5
Lai Siyao [Tue, 16 Jan 2024 01:33:22 +0000 (20:33 -0500)]
LU-17426 mdt: relax same MDT file rename lock

Allow cross-directory rename of regular files (strictly, any
non-directory) on the same MDT without holding the BigFilesystemLock
(BFL), as file renames cannot change the directory hierarchy.

This should improve the performance for these rename operations, and
reduce contention between local MDT file renames in different parts of
the directory tree.

Add "mdt.*.enable_parallel_rename_crossdir" parameter to disable
cross-directory file renames if there is an issue with this change.
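
A sketch of the resulting decision for the cross-directory case only
(demo_crossdir_rename_needs_bfl() is a hypothetical helper; same
directory renames and the actual MDT locking are not modelled):

```c
#include <stdbool.h>

/* A cross-directory rename can skip the BFL only when the source is
 * not a directory, both ends are on the same MDT, and the tunable
 * has not been turned off. */
static bool demo_crossdir_rename_needs_bfl(bool src_is_dir, bool same_mdt,
					   bool enable_parallel_rename_crossdir)
{
	return src_is_dir || !same_mdt || !enable_parallel_rename_crossdir;
}
```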

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I511b392e46c46140cac6aa3ede02bfe793729f7f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53726
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 months agoLU-13906 build: Conditionally require kmod-zfs-devel 56/46356/13
Shaun Tancheff [Fri, 6 Oct 2023 08:03:09 +0000 (03:03 -0500)]
LU-13906 build: Conditionally require kmod-zfs-devel

A server with ZFS support requires either the kmod-zfs-devel package
or configure arguments that point to the required headers and
library files.

Check the configure arguments for '--with-zfs-obj='. If the ZFS
path is specified for configure, the package requirement is
not needed.

Otherwise require the kmod-zfs-devel package and require
one of libzfs-devel, libzfs4-devel or libzfs5-devel.

HPE-bug-id: LUS-9743, LUS-10363
Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ia12239ac7e3912ff50ec7c8e2ceb888862afbc34
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/46356
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
3 months agoLU-17385 tests: sanity-lfsck 23d fix and enable 91/53591/4
Alexander Zarochentsev [Thu, 4 Jan 2024 19:07:18 +0000 (19:07 +0000)]
LU-17385 tests: sanity-lfsck 23d fix and enable

lfsck "-t layout -o" requests that lfsck run on all MDTs, so
the test needs to wait for all of them before the next
test starts.

Test-Parameters: trivial testlist=sanity-lfsck mdscount=2 mdtcount=4
Test-Parameters: trivial testlist=sanity-lfsck mdscount=2 mdtcount=4
Test-Parameters: trivial testlist=sanity-lfsck mdscount=2 mdtcount=4
Test-Parameters: trivial testlist=sanity-lfsck mdscount=2 mdtcount=4
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: Ida0bf876b60a73258a5a9bf392f96383c88adcb9
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53591
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Alex Deiter
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-17444 utils: fix fd leak after conversion to llapi_root_path_open 36/53736/3
Dominique Martinet [Thu, 18 Jan 2024 20:46:10 +0000 (05:46 +0900)]
LU-17444 utils: fix fd leak after conversion to llapi_root_path_open

Conversions to llapi_root_path_open missed a few close() calls, leading
to fd leaks.

These should be obvious enough to regroup in a single commit.
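
The class of bug being fixed looks roughly like this. In the sketch,
plain open() stands in for llapi_root_path_open(), and
demo_with_root_fd() and demo_op_fail() are hypothetical helpers; the
point is that the descriptor is closed on every path once it has been
opened.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Example operation that always fails, to exercise the error path. */
static int demo_op_fail(int fd)
{
	(void)fd;
	return -EIO;
}

/* Open 'path', run 'op' on the descriptor, and always close it. */
static int demo_with_root_fd(const char *path, int (*op)(int fd))
{
	int fd, rc;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;

	rc = op(fd);
	/* Reached whether op() succeeded or failed: no leaked fd. */
	close(fd);
	return rc;
}
```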

Fixes: 7154244354e3 ("LU-16786 utils: Replace open call to WANT_FD")
Change-Id: I3af25ef2981367bfaea7f5280972f84bee09a5c2
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53736
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Stephane Thiell <sthiell@stanford.edu>
Reviewed-by: Etienne AUJAMES <eaujames@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 months agoLU-17364 llite: don't use stale page. 50/53550/8
Alexey Lyashkov [Mon, 25 Dec 2023 11:52:35 +0000 (14:52 +0300)]
LU-17364 llite: don't use stale page.

Using a stale page for a write might confuse the read path,
which expects any IO page to have the PG_uptodate flag set,
and it caused a panic when the page was removed from the IO.

Test-Parameters: testlist=sanityn env=SLOW=yes,ONLY=16k,ONLY_REPEAT=10
Test-Parameters: testlist=sanityn env=SLOW=yes,ONLY=16k,ONLY_REPEAT=10
Test-Parameters: testlist=sanityn env=SLOW=yes,ONLY=16k,ONLY_REPEAT=10
Test-Parameters: testlist=sanityn env=SLOW=yes,ONLY=16k,ONLY_REPEAT=10
Test-Parameters: testlist=sanityn env=SLOW=yes,ONLY=16k,ONLY_REPEAT=10
Test-Parameters: testlist=sanityn env=SLOW=yes,ONLY=16k,ONLY_REPEAT=10
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Change-Id: Ia01129ceaecf53d8d9f301c26cd2d65122f6a267
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53550
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 months agoLU-14361 statahead: increase the initial statahead count 34/53634/2
Qian Yingjin [Wed, 10 Jan 2024 09:25:21 +0000 (04:25 -0500)]
LU-14361 statahead: increase the initial statahead count

In this patch, we increase the initial statahead count from the
default 8 to 64 during the fname statahead pattern test
sanity/test_123i. The original starting statahead count is too small
and may cause the statahead thread to quit prematurely, which makes
sanity/test_123i fail fairly often.

We also improve aheadmany and use it to generate the fname stat()
workload to verify that the fname statahead pattern works correctly.

Test-Parameters: mdtcount=4 mdscount=2 testlist=sanity env=ONLY=123i,ONLY_REPEAT=100
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I7d13120a9480ea5b2e53963789074429c414ff90
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53634
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-17284 mdt: revalidate object for migration 87/53087/6
Alex Zhuravlev [Sat, 11 Nov 2023 12:21:48 +0000 (15:21 +0300)]
LU-17284 mdt: revalidate object for migration

If the source object is remote, then we should revalidate it
once the object's ldlm lock is granted. Otherwise we can't
use the object's attributes:
lu_object_attr())
ASSERTION( ((o)->lo_header->loh_attr & LOHA_EXISTS) != 0

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I9896cdd011f858091ac68b50b74e2f1f027f7331
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53087
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 months agoLU-16637 llite: tolerate fresh page cache pages after truncate 54/53554/4
Andrew Perepechko [Tue, 26 Dec 2023 17:02:12 +0000 (20:02 +0300)]
LU-16637 llite: tolerate fresh page cache pages after truncate

Truncate called by ll_layout_refresh() can race with a fast read
or tiny write, which can add an uninitialized non-uptodate page
into the page cache.

We want to avoid expensive locking for this rare case so if there
is any leftover in the cache after truncate, just check that
the pages are not uptodate, not dirty and do not have any
filesystem-specific information attached to them.
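
The three conditions can be expressed as a single predicate. The
struct below is a hypothetical userspace stand-in for the relevant
page state, not the kernel's struct page.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the page state that matters here. */
struct demo_page {
	bool	 uptodate;
	bool	 dirty;
	void	*private_data;	/* filesystem-specific information */
};

/* A page left in the cache after truncate is tolerated only if it is
 * fresh: not uptodate, not dirty, nothing attached by the filesystem. */
static bool demo_leftover_page_ok(const struct demo_page *pg)
{
	return !pg->uptodate && !pg->dirty && pg->private_data == NULL;
}
```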

Change-Id: I8cadc022a3d1822a585f32e1a765e59ad0ff434d
Signed-off-by: Andrew Perepechko <andrew.perepechko@hpe.com>
HPE-bug-id: LUS-11937
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53554
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-13307 nodemap: have nodemap_add_member support large NIDs 35/53135/12
James Simmons [Sun, 7 Jan 2024 15:13:38 +0000 (08:13 -0700)]
LU-13307 nodemap: have nodemap_add_member support large NIDs

Currently, mounting Lustre using an IPv6 address fails with:

Lustre: 27361:0:(nodemap_handler.c:395:nodemap_add_member())
  lustre-MDT0000: error adding to nodemap, no valid NIDs found
LustreError: 11-0: lustre-MDT0000-osp-MDT0003:
  operation mds_connect to node 0@lo failed: rc = -22

This was due to no nodemap being set so the ptlrpc layer was not
seeing any new peers. Adding minimal support to nodemap allows
mounting.

Change-Id: If9cfe88ec92afc3f14788f3f3ded8387a1b5d8c7
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53135
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-16861 obdfilter: Exclude quotes when getting NIDs 20/53620/9
Arshad Hussain [Tue, 9 Jan 2024 06:12:57 +0000 (11:42 +0530)]
LU-16861 obdfilter: Exclude quotes when getting NIDs

In get_targets(), when getting NIDs the quotes were also included.
Exclude quotes when generating NIDs as they are not required.

Use $LCTL instead of $lctl, and make it also work in Janitor testing.

Test-Parameters: trivial testlist=obdfilter-survey
Fixes: 9ef9906d7 ("LU-6863 tests: change obdfilter-survey.sh for CLIENTONLY mode")
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I8642539fc6b396f1339e20e4fef8bc78cda2d969
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53620
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-17054 lnet: use GFP_KERNEL for alloc w/o spinlock 96/53596/5
Andreas Dilger [Thu, 4 Jan 2024 21:32:13 +0000 (14:32 -0700)]
LU-17054 lnet: use GFP_KERNEL for alloc w/o spinlock

Do not use genradix_ptr_alloc(GFP_ATOMIC) when not allocating
under a spinlock in lnet_cpt_of_nid_show_start(), since this
puts unnecessary strain on the atomic memory pools.  This
function grabs mutex_lock(&the_lnet.ln_api_mutex) so the caller
cannot be holding a spinlock at the time.

Fix minor code style issues in this function.

Fixes: 466e25a6a3 ("LU-17054 lnet: Change cpt-of-nid to get result from kernel")
Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I091959940bffadc380bff9329bb83e8b099ed63f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53596
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-17394 libcfs: print cfs_fail_val when fail_loc hit 85/53585/3
Andreas Dilger [Thu, 4 Jan 2024 05:20:35 +0000 (22:20 -0700)]
LU-17394 libcfs: print cfs_fail_val when fail_loc hit

Add some more information to the console message when fail_loc is hit.

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I99fe4524f3764b068c96965c0b86bd4d7b341707
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53585
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
3 months agoLU-17172 lov: include FID in some lov asserts 02/52602/3
John L. Hammond [Thu, 4 Nov 2021 16:12:57 +0000 (11:12 -0500)]
LU-17172 lov: include FID in some lov asserts

Include the file FID in the assertions in lov_entry() and
lov_mirror_entry(). Use these two functions more consistently in the
lov layer.

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I65978fe409842289c158021fb1b8042916d90e23
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52602
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-16791 utils: ZFS 2.2 const prop args 19/52519/7
Brian Atkinson [Tue, 26 Sep 2023 18:35:43 +0000 (12:35 -0600)]
LU-16791 utils: ZFS 2.2 const prop args

ZFS 2.2 now expects const char * from certain interfaces in
sys/nvpair.h. I updated the build system to detect if this is the case
and if so update the parameters passed to certain functions in
libmount_utils_zfs.c to account for these changes.

Without this patch, Lustre master would not build with ZFS master and
the 2.2 release candidates.

Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Change-Id: I0469eeff6dafa6c276fc616381530b6b679d9da1
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52519
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Akash B <akash-b@hpe.com>
Reviewed-by: Thomas Bertschinger <bertschinger@lanl.gov>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
3 months agoLU-17351 ldiskfs: RHEL 9.3 ldiskfs server 94/53394/7
Shaun Tancheff [Thu, 4 Jan 2024 23:34:37 +0000 (15:34 -0800)]
LU-17351 ldiskfs: RHEL 9.3 ldiskfs server

Updated patch series for el9.3 needs an updated
ext4-data-in-dirent

Test-Parameters: trivial env=SANITY_EXCEPT="906" \
  mdtcount=4 mdscount=2 \
  clientdistro=el9.3 serverdistro=el9.2 testlist=sanity

Test-Parameters: trivial mdtcount=4 mdscount=2 \
  clientdistro=el9.2 serverdistro=el9.3 testlist=sanity

Test-Parameters: optional clientdistro=el9.3 serverdistro=el9.3 \
  testgroup=full-part-1

Test-Parameters: optional clientdistro=el9.3 serverdistro=el9.3 \
  testgroup=full-part-2

Test-Parameters: optional clientdistro=el9.3 serverdistro=el9.3 \
  testgroup=full-part-3

HPE-bug-id: LUS-12050
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Change-Id: Iac9731570422c57ef494602b1a40ac0b3d87d991
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53394
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: xinliang <xinliang.liu@linaro.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-10391 lnet: use GFP_ATOMIC for alloc under spinlock 97/53597/5
Andreas Dilger [Thu, 4 Jan 2024 21:07:43 +0000 (14:07 -0700)]
LU-10391 lnet: use GFP_ATOMIC for alloc under spinlock

Use genradix_ptr_alloc(GFP_ATOMIC) when allocating under a spinlock
(in this case lnet_net_lock_current() is acquiring the per-CPT lock)
to avoid "BUG: sleeping function called from invalid context" in
lnet_discover() and lnet_scan_route() when memory debugging is enabled.

 BUG: sleeping function called from invalid context at page_alloc.c:3423
 in_atomic(): 1, irqs_disabled(): 0, pid: 22268, name: lnetctl
 CPU: 3 PID: 22268 Comm: lnetctl  3.10.0-7.9-debug #1
 Call Trace:
   dump_stack+0x19/0x1b
   __might_sleep+0xd9/0x100
   __alloc_pages_nodemask+0x313/0xca0
   alloc_pages_current+0x98/0x110
   __get_free_pages+0xe/0x50
   __genradix_ptr_alloc+0xa2/0x1a0 [libcfs]
   lnet_discover+0x16e/0x220 [lnet]
   lnet_ping_cmd+0x6ab/0x1160 [lnet]
   genl_family_rcv_msg+0x1fa/0x420
   genl_rcv_msg+0x5b/0xc0
   netlink_rcv_skb+0xa9/0xc0
   genl_rcv+0x28/0x40
   netlink_unicast+0x16a/0x210
   netlink_sendmsg+0x308/0x420
   sock_sendmsg+0xb0/0xf0
   ___sys_sendmsg+0x401/0x410
   __sys_sendmsg+0x51/0x90
   SyS_sendmsg+0x12/0x20

Fixes: 68254c484a ("LU-10391 lnet: handle discovery with Netlink")
Fixes: 4ccac8297d ("LU-9680 lnet: collect data about routes using Netlink")
Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I96f1fd6f6273a7720d661526e58a94210f3ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53597
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 months agoLU-17398 build: detect mlnx-ofa_kernel-devel contents 13/53613/6
Shaun Tancheff [Tue, 9 Jan 2024 05:49:13 +0000 (12:49 +0700)]
LU-17398 build: detect mlnx-ofa_kernel-devel contents

Parse the configure_args for --with-o2ib and allow the
user specified path to override the mofed defaults.

Further when mlnx-ofa_kernel-devel contents are available silence
the BuildRequires: to allow for an mlnx source installation to
satisfy the lustre build requirements.

In addition, move the mlnx specific requirements to the mofed lnd
when '--with multiple_lnds' is enabled.

Fixes: 67cd54d05d ("LU-16967 build: Separate lnet LND rpm packaging")
Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I30c6b3a196021634c621f6f6c556bf32f28faeed
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53613
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
4 months agoLU-17398 build: quash rpmbuild false warning 04/53604/5
Shaun Tancheff [Sat, 6 Jan 2024 15:13:27 +0000 (22:13 +0700)]
LU-17398 build: quash rpmbuild false warning

Rewrite comment so %() is escaped and does not generate
a false and misleading warning during build:

 sh: -c: line 0: syntax error near unexpected token `)'
 sh: -c: line 0: `)'
 warning: Macro expanded in comment on line 261: %()

Test-Parameters: trivial
Fixes: 67cd54d05d ("LU-16967 build: Separate lnet LND rpm packaging")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I94fa0da88a5c7f64461cd8fc3cea343a7a087413
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53604
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
4 months agoLU-17370 utils: simplify lfs help text 64/53564/3
Alexandre Ioffe [Fri, 29 Dec 2023 05:42:41 +0000 (21:42 -0800)]
LU-17370 utils: simplify lfs help text

Simplify help text for lfs getstripe and lfs setstripe.
Update corresponding man pages lfs-getstripe and lfs-setstripe.
On man pages make left side adjustment and disable hyphenation:
'.nh', '.ad l' to prevent hyphenation of keywords

Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Test-Parameters: trivial
Change-Id: Iae9d3534230ee7d325fbeffd78b5c12632a4a161
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53564
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
4 months agoLU-17276 tests: performance test case for flock 58/53558/4
Yang Sheng [Wed, 27 Dec 2023 18:26:41 +0000 (02:26 +0800)]
LU-17276 tests: performance test case for flock

Add some test cases for flock performance.

Test-Parameters: trivial testlist=performance-sanity env=ONLY=6,ONLY_REPEAT=50
Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: Id76d38e14bec6a095fe26e22d08569731d4669c9
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53558
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
4 months agoLU-16502 lutf: add headers to lutf C code 80/53480/3
Timothy Day [Fri, 15 Dec 2023 21:17:18 +0000 (21:17 +0000)]
LU-16502 lutf: add headers to lutf C code

Add SPDX text and documentation to lutf C code.
This will make it easier for developers to
find where different functionality lives.

Test-Parameters: @lnet
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I67d8acc6b5968e76667130f38018ddcf0fcfd3b0
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53480
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-17289 test: disable sanity/test_906 temporarily 62/53362/5
Qian Yingjin [Thu, 7 Dec 2023 09:45:01 +0000 (04:45 -0500)]
LU-17289 test: disable sanity/test_906 temporarily

On rhel9.3, the fio io_uring engine test failed with error
"Operation not permitted" on both local file systems (Ext4 and
xfs) and Lustre:
"fio: pid=4551, err=1/file:engines/io_uring.c:1047,
func=io_queue_init, error=Operation not permitted"

This is a generic failure in rhel9.3.
Thus we disable sanity/test_906 temporarily until the bug is fixed
in rhel9.3.

Test-Parameters: trivial testlist=sanity clientdistro=el9.3
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I3805b475c5f3d0b62dc6c57c4cd93f2bc1b67b76
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53362
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 months agoLU-17160 build: Use PyConfig_InitPythonConfig 3.11 and later 58/52558/7
Shaun Tancheff [Sat, 11 Nov 2023 01:40:22 +0000 (19:40 -0600)]
LU-17160 build: Use PyConfig_InitPythonConfig 3.11 and later

Python PEP https://peps.python.org/pep-0587 changed the python
initialization scheme and deprecated Py_SetProgramName() among
other functions.

As of python 3.11 the new init scheme is available.

Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ia7bdb63a74295ab8cf0313f16bfd03d78cf8fcf7
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52558
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-10391 mgs: handle strscpy return value properly 71/53571/3
James Simmons [Tue, 2 Jan 2024 00:11:35 +0000 (19:11 -0500)]
LU-10391 mgs: handle strscpy return value properly

The function strscpy() returns either an error code or the length
of the string copied. On success of strscpy() in the function
mgs_steal_client_llog_handler() we need to reset rc to zero to
avoid the return value being seen as an error. While we could use
large NIDs, some functionality like nodemap was failing due to this
mishandling of the return code.
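
The return-code pitfall described above can be sketched in userspace
Python. This is only an illustration, not the MGS code: the
strscpy() below is an emulation of the kernel helper's documented
contract (copied length on success, -E2BIG on truncation), and
copy_name() is a hypothetical caller.

```python
import errno


def strscpy(dest: bytearray, src: bytes) -> int:
    """Emulate kernel strscpy(): NUL-terminated copy into a fixed buffer.

    Returns the number of bytes copied (without the NUL), or -E2BIG
    if the source had to be truncated.
    """
    size = len(dest)
    if size == 0:
        return -errno.E2BIG
    n = min(len(src), size - 1)   # always leave room for the terminator
    dest[:n] = src[:n]
    dest[n] = 0
    return n if n == len(src) else -errno.E2BIG


def copy_name(dest: bytearray, src: bytes) -> int:
    """Hypothetical caller following the 0-or-negative-errno convention."""
    rc = strscpy(dest, src)
    if rc < 0:
        return rc   # real failure: propagate the error code
    return 0        # success: drop the positive length so callers see 0
```

Without the final reset, a caller that treats any non-zero rc as a
failure would misread the positive length as an error.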

Fixes: c0cb747ebe9 ("LU-13306 mgs: use large NIDS in the nid table on the MGS")
Test-Parameters: trivial
Change-Id: I013d34e0d0223367efea97f71dd4baa1052e2e1b
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53571
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 months agoLU-10003 lnet: implement Netlink version of lnet distance API. 56/53556/2
James Simmons [Sat, 23 Dec 2023 18:43:20 +0000 (13:43 -0500)]
LU-10003 lnet: implement Netlink version of lnet distance API.

Userland can query the distance of a peer using an ioctl. Move
this over to Netlink so we can support large NIDs for IPv6
handling.

Test-Parameters: trivial testlist=sanity-lnet
Change-Id: I090538e4cc55fd26bd61888de659b99bba85a111
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53556
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-6142 build: ignore deleted lines for comment context 43/53543/3
Andreas Dilger [Sat, 23 Dec 2023 01:24:18 +0000 (18:24 -0700)]
LU-6142 build: ignore deleted lines for comment context

checkpatch should skip removed lines when checking comments context.
Otherwise, false "WARNING: memory barrier without comment" and other
messages can be reported when a comment exists on the previous line
but is hidden by the removed line.

For example, a change like below was previously incorrectly flagged:

        /* matched by smp_store_release() in some_function() */
 -      if (smp_load_acquire(&list->tail) == head)
 +      if (smp_load_acquire(&list->tail) == head && flags == 0)
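
The idea of skipping removed lines can be sketched as follows. This
is a simplified Python illustration, not the checkpatch
implementation; has_comment_context() is a hypothetical helper that
only looks at the nearest surviving line of a unified diff hunk.

```python
def has_comment_context(diff_lines, index):
    """Return True if the nearest non-removed line above `index` is a comment.

    `diff_lines` holds unified-diff hunk lines, each starting with
    ' ', '+' or '-'. Removed ('-') lines are skipped, since they will
    not exist in the patched file.
    """
    i = index - 1
    while i >= 0 and diff_lines[i].startswith("-"):
        i -= 1                      # ignore lines deleted by the patch
    if i < 0:
        return False
    text = diff_lines[i][1:].strip()
    return (text.startswith("/*") or text.startswith("*")
            or text.startswith("//") or text.endswith("*/"))
```

For the hunk shown above, the added line at index 2 is preceded by a
removed line; skipping it lets the check reach the comment at index 0.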

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ib0cd11e66a5b6a3c4505222eb89ff6479246023a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53543
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 months agoLU-17365 lod: handle llog errors gracefully 10/53510/7
Mikhail Pershin [Wed, 13 Dec 2023 12:43:53 +0000 (15:43 +0300)]
LU-17365 lod: handle llog errors gracefully

Distinguish remote llog errors by their source and type
in LOD lod_sub_prep_llog() and unify the errors reported
by llog_osd_read_header() and llog_init_handle().

- Partial llog header or 0-size llog is to be
  reinitialized, new header is created
- in llog_read_header(), errors from dt_attr_get() and
  read_header() are printed and returned as -EIO to the caller
- llog with invalid llog header data is skipped and new
  one is created to be used instead. To indicate that,
  llog_init_handle() returns -EINVAL error code instead
  of -EIO. Therefore network errors are to be handled by
  lod_sub_recovery_thread() retry logic while bad llog
  content will lead to immediate llog re-creation.
- lod_sub_init_llogs() tries to init all targets even
  if some failed
- always recreate llogs after recovery abort no matter
  if ctxt->loc_handle exists or not
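
The error split listed above can be sketched as a small decision
function. This is a hedged Python illustration only:
classify_llog_open() and its header_size argument are hypothetical
names, and the real logic lives in the LOD and llog C code.

```python
import errno


def classify_llog_open(rc, header_size):
    """Map an llog open result to a recovery action (hypothetical names).

    rc is 0 or a negative errno; header_size is the on-disk llog size.
    """
    if rc == 0 and header_size == 0:
        return "reinit"      # empty llog: write a fresh header
    if rc == 0:
        return "use"         # header read and validated
    if rc == -errno.EINVAL:
        return "recreate"    # bad content: retrying cannot help
    return "retry"           # -EIO and similar: leave to the retry loop
```

The point of returning -EINVAL rather than -EIO for bad content is
that the two cases then take different branches.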

The patch tries to cover known issues and error types during
update log recovery and also provides better debugging for
similar cases in the future.

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I2705e0dc245ed4365123ce47137193a9ed769673
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53510
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-10391 tests: except sanity-lnet/253+254 due to failures 93/53593/3
James Simmons [Thu, 4 Jan 2024 20:02:08 +0000 (15:02 -0500)]
LU-10391 tests: except sanity-lnet/253+254 due to failures

Moving to Netlink for ping support has exposed timing issues
with some of the sanity-lnet tests.  Skip those tests temporarily
until the issue can be fixed.

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: James Simmons <jsimmons@infradead.org>
Change-Id: Ia30fcbf8eca7baaf6f896e6e51229c69cea82804
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53593
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
4 months agoNew tag 2.15.60 2.15.60 v2_15_60
Oleg Drokin [Wed, 3 Jan 2024 18:17:29 +0000 (13:17 -0500)]
New tag 2.15.60

Change-Id: Iebcf2a80949d8d5d14a2e6aeb6582884b11d343f
Signed-off-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-17215 tests: sanity/398q to use $tfile 72/52772/4
Alex Zhuravlev [Fri, 20 Oct 2023 11:23:26 +0000 (14:23 +0300)]
LU-17215 tests: sanity/398q to use $tfile

tfile seems to be a typo

Fixes: 43c3a804fe2 ("LU-13805 tests: Add racing tests of BIO, DIO")

Test-Parameters: trivial
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I85b0afce577b708ef9e69747774bd248484bd9dd
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52772
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-17374 gss: get rid of rsi cache entries after req handle 88/53488/3
Sebastien Buisson [Mon, 18 Dec 2023 13:59:30 +0000 (14:59 +0100)]
LU-17374 gss: get rid of rsi cache entries after req handle

RPCSEC init requests are kept in the rsi cache. While this is useful
during request processing involving upcall/downcall with userspace,
rsi entries are never used again once RPCSEC init requests have been
handled completely.
And keeping entries in the rsi cache has some impact on authentication
speed. When a new RPCSEC init request is received, the first step is
to check if there is a valid matching entry in the cache. It is never
the case, except if an authentication request is replayed, but GSS
rejects that anyway.
So we spend time browsing a cache from which we expect no match. Even
if the upcall cache mechanism takes this lookup opportunity to remove
invalid or expired entries, it is even better to remove cache entries
as soon as we know they are done.

Test-Parameters: kerberos=true testlist=sanity-krb5
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ia9946578c3d3149e6235d832df28214ae8984f1e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53488
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 months agoLU-17366 kernel: update SLES15 SP5 [5.14.21-150500.55.39.1] 67/53467/2
Jian Yu [Thu, 14 Dec 2023 18:35:00 +0000 (10:35 -0800)]
LU-17366 kernel: update SLES15 SP5 [5.14.21-150500.55.39.1]

Update SLES15 SP5 kernel to 5.14.21-150500.55.39.1 for Lustre client.

Test-Parameters: trivial mdtcount=4 mdscount=2 \
clientdistro=sles15sp5 testlist=sanity

Change-Id: Id9476e8726728b00d4079cdaf31b081f89190eb1
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53467
Reviewed-by: Colin Faber <cfaber@ddn.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 months agoLU-8802 obd: str2dev is missing obd_device_unlock() 66/53466/2
Timothy Day [Thu, 14 Dec 2023 16:27:40 +0000 (16:27 +0000)]
LU-8802 obd: str2dev is missing obd_device_unlock()

class_str2dev() was missing an obd_device_unlock(). I haven't
seen any bugs related to this missing unlock. I suspect
the mount state machine avoids this. Add the unlock just
to be safe.

Fixes: c5e5060d950 ("LU-8802 obd: remove MAX_OBD_DEVICES")
Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I7a813f9d4931a7a9979223bfde5efea07f1e5228
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53466
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-16823 lustre: test if large nid is supported 98/53398/2
James Simmons [Sun, 10 Dec 2023 14:50:43 +0000 (09:50 -0500)]
LU-16823 lustre: test if large nid is supported

Update all LNetGetId() calls to use large NIDs if the connect
flags report large NID support. For the case of lmv_setup()
we update setting qos_rr_index, to avoid the thundering herd,
using nidhash().

Change-Id: I80fda9454f154e27fbc75abb1899c0ccca03097b
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53398
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-17352 utils: lljobstat can read dumped stats files 97/53397/6
Lei Feng [Sun, 10 Dec 2023 08:45:38 +0000 (16:45 +0800)]
LU-17352 utils: lljobstat can read dumped stats files

Improve lljobstat command to read dumped stats file.
Usually the file is generated by command:
  lctl get_param *.*.job_stats > all_job_stats.txt

Multiple files can be specified with multiple --statsfile
options. For example:
  lljobstat --statsfile=1.txt --statsfile=2.txt

Stats data from multiple files will be added up and
sorted. Then the top jobs will be listed.
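
The add-up-and-sort step can be sketched as below. This is a minimal
Python illustration that assumes the stats files were already parsed
into dicts of {job_id: {operation: count}}; that layout and the
function names are assumptions, not the lljobstat internals.

```python
from collections import Counter


def merge_job_ops(parsed_files):
    """Sum per-job operation counts across several parsed stats files."""
    totals = Counter()
    for jobs in parsed_files:
        for job_id, ops in jobs.items():
            # A YAML parser yields an int for a purely numeric job_id;
            # normalizing to str here is one possible way to handle it.
            totals[str(job_id)] += sum(ops.values())
    return totals


def top_jobs(totals, count):
    """Return the `count` busiest jobs, highest total first."""
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:count]
```

Ties are broken by job name so that the output order is stable
between runs.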

Try to use CLoader to accelerate the YAML parsing.

Handle SIGINT and exit silently if lljobstat is in the loop
of reading system job_stats files periodically.

Fix a bug when the job_id is a pure number.

Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial
Change-Id: Iee1ce69d2befb9d021e34effd4fc65a47297c1fb
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53397
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-16751 build: remove symlinks and add SPDX 96/53396/3
Timothy Day [Sat, 9 Dec 2023 21:54:41 +0000 (21:54 +0000)]
LU-16751 build: remove symlinks and add SPDX

Remove commit-msg and prepare-commit-msg links
in the build/ directory. Add proper headers and
SPDX to the remaining files in build/.

Test-Parameters: trivial
Fixes: 25c93758d6 ("LU-1199 build: Clean out the build directory")
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I649ed60f5541be2832555efa2e0cf64cd1a5c67c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53396
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 months agoLU-13791 mdt: parameter to tune capabilities 38/53538/7
Andreas Dilger [Wed, 20 Dec 2023 18:45:50 +0000 (11:45 -0700)]
LU-13791 mdt: parameter to tune capabilities

Add mdt.*.enable_cap_mask to allow specific capabilities to
be enabled and disabled individually.

Fixes: f05edf8e2b ("LU-13791 sec: enable FS capabilities")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I6fc0130a90693d673d8c2158e7e31c2de951553d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53538
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-17317 gss: do not continue using expired reverse context 75/53375/4
Sebastien Buisson [Thu, 7 Dec 2023 17:07:09 +0000 (18:07 +0100)]
LU-17317 gss: do not continue using expired reverse context

In case a server uses an expired gss context to send a callback
request to a client, it might be that the associated context on
the client has already expired, and been purged from the cache.
This results in a GSS_S_NO_CONTEXT reply.
In this specific scenario, the server must mark its reverse context
as dead. This will lead to destruction of the expired context, and
creation of a new context suitable for further callback requests.

Test-Parameters: kerberos=true testlist=sanity-krb5
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I4af90cd70a3815851ec555ea85b49714c8da4202
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53375
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-17245 utils: fix lfs error messages with multiple paths 42/52942/3
Thomas Bertschinger [Tue, 31 Oct 2023 19:59:18 +0000 (13:59 -0600)]
LU-17245 utils: fix lfs error messages with multiple paths

When using some lfs utilities (find, getstripe) with multiple paths,
if a subset of the paths has an error, for example due to a typo, the
error message produced can be misleading. For "getstripe" it refers to
the last file on the command line regardless of which file had the
error, and for "find" it prints out the right filename but uses
the error code from the last file on the command line.

This cleans up these error messages for "lfs find" and
"lfs getstripe".

This also adjusts "lfs setdirstripe" to continue for subsequent files
if it encounters an error for earlier files on the command line.
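
The intended behaviour can be sketched in Python. This is an
illustration of the pattern only, not the lfs code: check_paths() is
a hypothetical helper, and returning the first error code is an
assumption made for the sketch.

```python
import os


def check_paths(paths):
    """Stat every path, continuing past failures.

    Returns (rc, messages): rc is 0 or the negative errno of the first
    failure; each message pairs a path with the error from that same
    path, never with the last one on the command line.
    """
    first_rc = 0
    messages = []
    for path in paths:
        try:
            os.stat(path)
        except OSError as exc:
            messages.append(f"{path}: {exc.strerror}")
            if first_rc == 0:
                first_rc = -exc.errno
            continue    # keep processing the remaining paths
    return first_rc, messages
```

Building the message inside the except block keeps the file name and
the error code tied to the same failure.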

Signed-off-by: Thomas Bertschinger <bertschinger@lanl.gov>
Fixes: bc500536b6dd ("LU-930 utils: fix 'lfs find' error message")
Fixes: 4affa48f676b ("LU-5170 utils: Continue on error when multiple files requested")
Fixes: a24f61532927 ("LU-11213 dne: add new dir hash type "space"")
Change-Id: I9cdd007912ffb4f6ebc31e422851977e49186ae7
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52942
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-16771 llite: add statfs_project tunable 72/52872/5
Stephane Thiell [Fri, 3 Nov 2023 04:24:26 +0000 (21:24 -0700)]
LU-16771 llite: add statfs_project tunable

Add a client tunable and mount option to turn off project-enabled
statfs() if needed, for example to speed up statfs() execution by
avoiding project quota check.

This new llite tunable statfs_project is set to 1 by default (feature
enabled). To turn statfs_project off:

    lctl set_param llite.*.statfs_project=0

Additionally, statfs_project can be disabled at mount time with:

   mount -t lustre -o nostatfs_project ...

Signed-off-by: Stephane Thiell <sthiell@stanford.edu>
Change-Id: I1c3eb27e66b1d05a1c713732dfe0a4d8f7af769f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52872
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months agoLU-17186 utils: replace gethostby*() with get*info() 32/52632/10
Jian Yu [Tue, 17 Oct 2023 18:06:39 +0000 (11:06 -0700)]
LU-17186 utils: replace gethostby*() with get*info()

This patch replaces the deprecated gethostbyname() and
gethostbyaddr() functions with getaddrinfo() and getnameinfo()
functions respectively.

The getaddrinfo() function combines the functionality provided by the
gethostbyname() and getservbyname() functions into a single interface,
but unlike the latter functions, getaddrinfo() is reentrant and allows
programs to eliminate IPv4-versus-IPv6 dependencies.

The getnameinfo() function is the inverse of getaddrinfo(): it
converts a socket address to a corresponding host and service, in a
protocol-independent manner. It combines the functionality of
gethostbyaddr() and getservbyport(), but unlike those functions,
getnameinfo() is reentrant and allows programs to eliminate
IPv4-versus-IPv6 dependencies.
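
As a rough illustration of the replacement pair described above, the
following sketch resolves a host with getaddrinfo() and formats the first
result with getnameinfo(). The helper name resolve_numeric() is hypothetical
and is not taken from the patch; only the two libc calls are real.

```c
#define _POSIX_C_SOURCE 200112L
#include <netdb.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not from the patch): resolve a host name to its
 * first address and format that address numerically into buf.
 * Returns 0 on success, or a non-zero EAI_* code on failure.
 */
int resolve_numeric(const char *host, char *buf, size_t buflen)
{
	struct addrinfo hints;
	struct addrinfo *res = NULL;
	int rc;

	memset(&hints, 0, sizeof(hints));
	hints.ai_family = AF_UNSPEC;	/* accept IPv4 or IPv6 */
	hints.ai_socktype = SOCK_STREAM;

	rc = getaddrinfo(host, NULL, &hints, &res);
	if (rc != 0)
		return rc;

	/* NI_NUMERICHOST: format the address itself, no reverse lookup */
	rc = getnameinfo(res->ai_addr, res->ai_addrlen, buf, buflen,
			 NULL, 0, NI_NUMERICHOST);
	freeaddrinfo(res);
	return rc;
}
```

Because the address family is left as AF_UNSPEC, the same call path should
serve both IPv4 and IPv6 hosts; a failure code can be turned into text with
gai_strerror().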

Test-Parameters: kerberos=true testlist=sanity-krb5
Test-Parameters: testgroup=review-dne-selinux-ssk-part-2
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Iacb5583826cd2f7329455bc6cbb4477f9087f15a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52632
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-14361 statahead: regularized fname statahead pattern 08/41308/40
Qian Yingjin [Mon, 10 Oct 2022 08:17:31 +0000 (04:17 -0400)]
LU-14361 statahead: regularized fname statahead pattern

Some applications issue stat() calls in a directory in which
all child files have regularized file names:
- mdtest benchmark tool: mdtest.$rank.$i
- ML/AI workloads whose ingested data typically follows a naming
  rule for the files in the directory.

The most common format for a regularized file name is one whose
suffix is a numeric index.
However, the current statahead mechanism populates entries in
the hash order of the file names returned by readdir() calls,
not in any sorted order.

In this patch, we improve statahead to prefetch attributes for
files with regularized, indexed file names via asynchronous
batched RPCs.
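
To make the "numeric index suffix" idea concrete, here is a minimal
userspace sketch that predicts the next file name from the current one.
The function next_indexed_name() is purely illustrative and is an
assumption of this note, not the detection logic used by the patch.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: given "mdtest.0.41", write the predicted next
 * name "mdtest.0.42" into next. Returns 0 on success, or -1 if the name
 * has no numeric suffix or the result does not fit in len bytes.
 * Limitations of this sketch: zero padding is not preserved and index
 * overflow is not checked.
 */
int next_indexed_name(const char *name, char *next, size_t len)
{
	size_t n = strlen(name);
	size_t start = n;
	unsigned long long idx;
	int rc;

	/* walk back over the trailing decimal digits */
	while (start > 0 && isdigit((unsigned char)name[start - 1]))
		start--;
	if (start == n)
		return -1;	/* no numeric suffix */

	idx = strtoull(name + start, NULL, 10);
	rc = snprintf(next, len, "%.*s%llu", (int)start, name, idx + 1);
	return (rc < 0 || (size_t)rc >= len) ? -1 : 0;
}
```

A prefetcher built on such a prediction could request attributes for the
next few names ahead of the application, independent of readdir() order.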

This patch adds statahead support for these kinds of
applications, which can be optimized even though they do not
call opendir()/close() to start/stop the statahead thread
explicitly.

Instead, the statahead thread stops and quits when it finds
that there has been no activity for more than a certain time
period (i.e. 30 seconds).

Test-Parameters: mdtcount=4 mdscount=2 testlist=sanity env=ONLY=27p,ONLY_REPEAT=5
Test-Parameters: mdtcount=4 mdscount=2 testlist=sanity env=ONLY=27p,ONLY_REPEAT=5
Test-Parameters: mdtcount=4 mdscount=2 testlist=sanity env=ONLY=123f,ONLY_REPEAT=10
Test-Parameters: mdtcount=4 mdscount=2 testlist=sanity env=ONLY=123f,ONLY_REPEAT=10
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Ide11ec5a651ae74884ddbe1cecede4f5c961e38d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/41308
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-9680 lnet: Empty route/peer table is not an error 66/53366/4
Chris Horn [Tue, 5 Dec 2023 07:38:12 +0000 (01:38 -0600)]
LU-9680 lnet: Empty route/peer table is not an error

The lnetctl peer/route show commands, without any other arguments, should
not return an error if the peer/route tables are empty. If the user
specifies a particular peer/route to show, and that peer/route does
not exist, then this is an error.

Modify the dumpit routines to check the netlink message length to
determine whether the user supplied any arguments to the show
commands, and use this information to return the proper status. Some
dead code was also removed from lnet_route_show_dump().
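
The intended status rule can be summarized in a few lines. This is only a
sketch under assumptions: show_status() is a hypothetical name, and -ENOENT
is an assumed error code, since the message does not say which value the
dumpit routines return.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical status rule for a "show" command:
 * - entries found: success
 * - nothing found, no filter given: empty table, still success
 * - nothing found, specific entry requested: error (code assumed)
 */
int show_status(unsigned int nr_found, bool user_gave_filter)
{
	if (nr_found > 0)
		return 0;
	return user_gave_filter ? -ENOENT : 0;
}
```

In the patch itself, whether the user supplied arguments is derived from
the netlink message length rather than passed in as a flag.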

We also fix an issue with older kernels where non-zero return status
from old dumpit commands was not being returned correctly.

Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I9a188c573b0f373052208dbea52ea56181719769
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53366
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
4 months ago LU-10391 lnet: update ping to handle multiple NIDs 61/49361/7
James Simmons [Mon, 18 Dec 2023 21:02:38 +0000 (16:02 -0500)]
LU-10391 lnet: update ping to handle multiple NIDs

The original LNet ping Netlink code was set up to handle the pre-MR
API. Update this code to handle the newer MR version of ping. This
unifies all ping handling under one system and supports larger
NID handling for IPv6. The big change is that, now that we support
updating the key table, we can report failed pings in a different
format than the successful pings.

Instead of using the Netlink API version flag, test whether the
passed-in LNet process ID is LNET_PID_LUSTRE to display a NID or PID
for the ping results. Also clean up the memory for the failed ping
NID list.

Test-Parameters: trivial testlist=sanity-lnet
Change-Id: I77a0e313bf2b7035e501726068fd45bb3a118d06
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49361
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-17332 ldiskfs: do no update superblock after journal destroy 99/53499/2
Li Dongyang [Tue, 19 Dec 2023 01:00:15 +0000 (12:00 +1100)]
LU-17332 ldiskfs: do no update superblock after journal destroy

Trying to start a transaction after journal destroy
during umount will lead to a crash.

This patch is adding the same checks from
041340404e LU-16982 ldiskfs: Fix crash after "umount -d -f /mnt/..."
for el9 series.

Change-Id: Ibb89e9f5104b0980a8d9543561ac643322e3724d
Fixes: e27a7b33d6 ("LU-16298 ldiskfs: Periodically write ldiskfs superblock")
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53499
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-17385 tests: always_except sanity-lfsck/24 44/53544/8
Andreas Dilger [Sat, 23 Dec 2023 07:19:02 +0000 (00:19 -0700)]
LU-17385 tests: always_except sanity-lfsck/24

Sanity test_24/26a started failing recently due to the landing of
new test_23d.  Disable test_23d for now to avoid tests failing, but
do not remove it so that it is possible to continue debugging it.
Add extra debugging to see why this is happening.

Test-Parameters: trivial testlist=sanity-lfsck mdscount=2 mdtcount=4
Test-Parameters: testlist=sanity-lfsck mdscount=2 mdtcount=4
Test-Parameters: testlist=sanity-lfsck mdscount=2 mdtcount=4
Test-Parameters: testlist=sanity-lfsck mdscount=2 mdtcount=4
Test-Parameters: testlist=sanity-lfsck mdscount=2 mdtcount=4
Test-Parameters: testlist=sanity-lfsck mdscount=2 mdtcount=4
Test-Parameters: testlist=sanity-lfsck mdscount=2 mdtcount=4
Test-Parameters: testlist=sanity-lfsck mdscount=2 mdtcount=4
Test-Parameters: testlist=sanity-lfsck mdscount=2 mdtcount=4
Test-Parameters: testlist=sanity-lfsck mdscount=2 mdtcount=4
Fixes: 07e02a600e ("LU-16826 tests: lfsck to repair a dangling remote entry")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ib6edf1d014ceb6b5d965aadc11272a88e8c001d5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53544
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 months ago LU-17358 lprocfs: make job_stats job_id valid yaml 24/53424/2
Nathaniel Clark [Tue, 12 Dec 2023 18:05:22 +0000 (13:05 -0500)]
LU-17358 lprocfs: make job_stats job_id valid yaml

Fix quoting job_id to account for leading '@' being reserved.

Test-Parameters: trivial
Signed-off-by: Nathaniel Clark <nclark@whamcloud.com>
Change-Id: Ifce3edc9b636db2f059ab9960488972a152d2e7a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53424
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Feng Lei <flei@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 months ago LU-17349 tests: sanity-quota_81 decrease timeout 84/53384/3
Sergey Cheremencev [Sun, 3 Dec 2023 04:06:23 +0000 (07:06 +0300)]
LU-17349 tests: sanity-quota_81 decrease timeout

Decrease cfs fail timeout in sanity-quota_81 from 30
to 10 seconds to avoid soft lockup.

Fixes: 862f0baa7c21 ("LU-15097 quota: stop pool_recalc before killing pool")
Test-Parameters: trivial testlist=sanity-quota
Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Change-Id: I8630db7b3948b335fef5d5349f960f79cb877fc3
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53384
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 months ago LU-17347 debs: also move .ddeb files into debs/ 78/53378/2
Aurelien Degremont [Fri, 8 Dec 2023 12:34:09 +0000 (13:34 +0100)]
LU-17347 debs: also move .ddeb files into debs/

When building debian packages, the resulting packages are
moved into a 'debs/' subdir.

Don't miss the debug symbol packages 'dbgsym', which are
suffixed .ddeb.

Also add .buildinfo file.

Test-Parameters: trivial
Change-Id: I52d0bddfaafc67c4a2a2dbc786d7f320c0b979f8
Signed-off-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53378
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-17317 gss: no cache flush for rsi and rsc 77/53377/3
Sebastien Buisson [Tue, 5 Dec 2023 16:02:21 +0000 (17:02 +0100)]
LU-17317 gss: no cache flush for rsi and rsc

RPCSEC init and RPCSEC context caches hold gss-related information
of security contexts established between network peers. These cache
entries are tightly coupled with contexts handled in the sptlrpc layer
so they must not be purged directly. They are inserted into the cache
when sptlrpc security contexts are established, and removed when the
corresponding security contexts are destroyed.

Test-Parameters: trivial
Test-Parameters: kerberos=true testlist=sanity-krb5
Test-Parameters: testgroup=review-dne-selinux-ssk-part-2
Fixes: 8d828762d1 ("LU-17015 gss: support large kerberos token for rpc sec init")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I903f75a4b5229286fcaed3e9d96b5eee7f653f15
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53377
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 months ago LU-17338 kernel: update RHEL 8.9 [4.18.0-513.9.1.el8_9] 57/53357/2
Jian Yu [Thu, 7 Dec 2023 00:52:50 +0000 (16:52 -0800)]
LU-17338 kernel: update RHEL 8.9 [4.18.0-513.9.1.el8_9]

Update RHEL 8.9 kernel to 4.18.0-513.9.1.el8_9.

Test-Parameters: trivial fstype=ldiskfs mdtcount=4 mdscount=2 \
clientdistro=el8.9 serverdistro=el8.8 testlist=sanity

Test-Parameters: trivial fstype=zfs mdtcount=4 mdscount=2 \
clientdistro=el8.9 serverdistro=el8.8 testlist=sanity

Test-Parameters: trivial fstype=ldiskfs mdtcount=4 mdscount=2 \
clientdistro=el8.8 serverdistro=el8.9 testlist=sanity

Test-Parameters: trivial fstype=zfs mdtcount=4 mdscount=2 \
clientdistro=el8.8 serverdistro=el8.9 testlist=sanity

Test-Parameters: optional clientdistro=el8.9 serverdistro=el8.9 \
testgroup=full-part-1

Test-Parameters: optional clientdistro=el8.9 serverdistro=el8.9 \
testgroup=full-part-2

Test-Parameters: optional clientdistro=el8.9 serverdistro=el8.9 \
testgroup=full-part-3

Change-Id: Ied0d2873974a3c8cc6e346373457c8ebc09740d6
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53357
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-17336 gss: fix __user pointer in rsi_upcall_seq_write 42/53342/3
Sebastien Buisson [Wed, 6 Dec 2023 08:15:18 +0000 (09:15 +0100)]
LU-17336 gss: fix __user pointer in rsi_upcall_seq_write

rsi_upcall_seq_write() uses sscanf to get the string passed from
userspace, but this needs to be copied to a kernel buffer first.

Test-Parameters: trivial
Test-Parameters: kerberos=true testlist=sanity-krb5
Test-Parameters: testgroup=review-dne-selinux-ssk-part-2
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I2ec875b7c6c158695857fe912ec1dd9f41ddc25d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53342
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-17000 utils: Check return value of yaml_parser_initialize 31/53331/2
Arshad Hussain [Tue, 5 Dec 2023 08:54:42 +0000 (14:24 +0530)]
LU-17000 utils: Check return value of yaml_parser_initialize

This patch adds return value checks for the functions
yaml_parser_initialize() and fopen() in lustre_cfg.c,
and for the function cYAML_build_tree() in cyaml.c.

Test-Parameters: trivial
CoverityID: 410239 ("Unchecked return value")
CoverityID: 410238 ("Unchecked return value")
Fixes: 65062463 (LU-14359 hsm: support a flatter HSM archive format)
Fixes: 8961f2d8 (LU-4939 utils: allow configuration through yaml files)
Change-Id: I67a34adee3e4d25f97244487684a613426637a70
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53331
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-16097 quota: fix Null pointer dereference 30/53330/3
Hongchao Zhang [Sun, 26 Nov 2023 11:56:43 +0000 (19:56 +0800)]
LU-16097 quota: fix Null pointer dereference

Check whether "qbody" is NULL before dereferencing it.

CoverityID: 410242 ("Dereference after null check")

Fixes: 57ac32a2 ("LU-16097 quota: release preacquired quota when over limits")
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: Idab61f3ebac24307c6d5db0d42429914858d21cb
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53330
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 months ago LU-17000 lnet: Fix dereference after NULL under ksocknal_recv_hello 05/53305/3
Arshad Hussain [Fri, 1 Dec 2023 04:58:32 +0000 (23:58 -0500)]
LU-17000 lnet: Fix dereference after NULL under ksocknal_recv_hello

This patch fixes 'conn->ksnc_proto', which was
dereferenced in ksocknal_recv_hello()
even though it could be NULL.

This patch also replaces the 'return' statements in the
middle of the function with 'goto', allowing
exit from a single place.

CoverityID: 410244 ("Dereference after null check")
Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Fixes: cb5f92c0e (LU-10391 ksocklnd: use ksocknal_protocol v4 for IPv6)
Change-Id: I95196d481b537281ab8643f1ee6162db450bef20
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53305
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-7328 misc: no non-ASCII characters in commit messages 04/53304/2
Andreas Dilger [Thu, 30 Nov 2023 23:00:38 +0000 (16:00 -0700)]
LU-7328 misc: no non-ASCII characters in commit messages

Some commit messages have control characters, or fancy quotation
marks, or mdash hyphens or similar, and this messes up the display
of "git log" and other tools depending on the current locale and
character set used in the terminal.

Add a check into commit-msg to reject commit messages that have
non-ASCII characters. This does not apply to characters used in
the Signed-off-by: or similar fields that list people's names.
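
The kind of check described above can be sketched as a per-line test. The
real commit-msg hook is a script and may differ; the function
line_is_ascii_clean() below is a hypothetical illustration, and it exempts
only the Signed-off-by: tag, whereas the hook also exempts similar
name-bearing fields.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical check: return true if the line contains only printable
 * ASCII (plus tab and newline), or if it is a Signed-off-by: trailer,
 * which may legitimately carry non-ASCII names.
 */
bool line_is_ascii_clean(const char *line)
{
	const unsigned char *p;

	/* name-bearing trailer: exempt from the check */
	if (strncmp(line, "Signed-off-by:", 14) == 0)
		return true;

	for (p = (const unsigned char *)line; *p != '\0'; p++) {
		if (*p == '\t' || *p == '\n')
			continue;
		/* reject control characters and any byte outside 0x20..0x7e */
		if (*p < 0x20 || *p > 0x7e)
			return false;
	}
	return true;
}
```

Working on raw bytes means every byte of a multi-byte UTF-8 sequence (fancy
quotes, long dashes) is rejected without any locale handling.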

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I99d0954a68f8a5391195553ebf4b69181b6991f2
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53304
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-17334 lov: handle object created on newly added OST 35/53335/8
Andreas Dilger [Tue, 5 Dec 2023 20:45:44 +0000 (12:45 -0800)]
LU-17334 lov: handle object created on newly added OST

When a new OST is added to a filesystem without no_create,
a new object may be created on the OST relatively quickly
after it is added to the filesystem, in particular because
the new OST would be preferred by QOS space balancing
due to lots of free space. However, it might take a few
seconds for the addition of the new OST to be propagated
across all of the clients, so there is a risk that the MDS
creates file objects on OSTs that a client is not yet aware of,
which returns an error to the application immediately.

This patch fixes the issue by adding a loop in lsme_unpack()
that waits and retries for some number of seconds for
the filesystem layout to be updated if either the
"loi->loi_ost_idx >= lov->desc.ld_tgt_count" or "!ltd"
condition is hit.
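
A bounded wait-and-retry loop of this general shape can be sketched in
userspace as follows. Everything here is a stand-in: struct tgt_table,
refresh_fn and wait_for_ost() are hypothetical names, and the real code in
lsme_unpack() works on kernel structures with its own timing.

```c
#define _POSIX_C_SOURCE 200112L
#include <errno.h>
#include <stddef.h>
#include <time.h>

/* hypothetical stand-in for the client's table of known OSTs */
struct tgt_table {
	unsigned int count;	/* number of slots known so far */
	void **tgts;		/* tgts[i] == NULL: slot not set up yet */
};

/* hypothetical hook that re-reads the configuration; may be NULL */
typedef void (*refresh_fn)(struct tgt_table *tbl);

/*
 * Check whether ost_idx is usable; if not, sleep delay_ms, refresh, and
 * check again, up to max_retries times. Returns 0 once the target is
 * visible, or -ENOENT if it never shows up.
 */
int wait_for_ost(struct tgt_table *tbl, unsigned int ost_idx,
		 refresh_fn refresh, unsigned int max_retries, long delay_ms)
{
	struct timespec ts = {
		.tv_sec = delay_ms / 1000,
		.tv_nsec = (delay_ms % 1000) * 1000000L,
	};
	unsigned int i;

	for (i = 0; ; i++) {
		/* same two conditions as quoted above, inverted */
		if (ost_idx < tbl->count && tbl->tgts[ost_idx] != NULL)
			return 0;
		if (i >= max_retries)
			break;
		nanosleep(&ts, NULL);
		if (refresh != NULL)
			refresh(tbl);
	}
	return -ENOENT;
}
```

Bounding the number of retries keeps a genuinely invalid layout from
blocking the caller forever while still riding out the few seconds of
configuration propagation.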

Change-Id: Idc29b8c66079afaea25428577daf51370fa2b084
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53335
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-14651 build: fix build for el7.9 kernels 03/53503/2
Andrew Perepechko [Mon, 18 Dec 2023 18:19:26 +0000 (11:19 -0700)]
LU-14651 build: fix build for el7.9 kernels

Handle extra setattr_prepare() argument added in Linux 5.12 kernels
when building on older kernels.

HPE-bug-id: LUS-12059
Signed-off-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Change-Id: Ie7fd1c4d51b7a9b086cfca0db941321cbcce7057
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53503
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: Sebastien Buisson <sbuisson@ddn.com>
Tested-by: James Simmons <jsimmons@infradead.org>
4 months ago LU-17325 o2iblnd: CM_EVENT_UNREACHABLE on established conn 98/53298/2
Serguei Smirnov [Thu, 30 Nov 2023 18:55:11 +0000 (10:55 -0800)]
LU-17325 o2iblnd: CM_EVENT_UNREACHABLE on established conn

There were examples in the field with RoCE setups which demonstrate
that CM_EVENT_UNREACHABLE may be received when connection is already
in ESTABLISHED state. This causes an assert in kiblnd_cm_callback to
fail.

Handle this in a more graceful manner: report the event as unexpected
and allow the flow to continue. If there are indeed issues on
the connection, it is expected to report transaction errors later
and get cleaned up without crashing the whole system.

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: If32166fe9fc59e025609c2035cb1c03d3bed22f2
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53298
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-17251 osp: start OST object precreate earlier 45/53245/4
Andreas Dilger [Sun, 26 Nov 2023 05:58:09 +0000 (22:58 -0700)]
LU-17251 osp: start OST object precreate earlier

If the OST object precreate count gets large (usually due to high
MDT file create workload, but sometimes also forced during testing)
then send an OST_CREATE RPC sooner when the number of precreated
objects gets low.

Currently the MDS will wait until 1/2 of the precreated OST objects
are consumed, but if create_count = 10000, then this can put bursty
create workloads on the OST.  Instead, send an OST_CREATE RPC when
the precreate pool is at most 1024 objects below target, so that the
MDS keeps its precreated pool more full and the OST doesn't have to
create so many objects at once (which also locks object directories
for a longer time).
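
The difference between the two refill rules can be shown with a small
model. This is a sketch under assumptions, not the OSP code: the function
names are hypothetical, and the new rule is modeled as "refill once the
deficit reaches the smaller of half the pool and 1024 objects".

```c
#include <stdbool.h>

#define PRECREATE_MAX_DEFICIT 1024U	/* limit quoted in the text above */

/* objects consumed from a pool that is refilled up to create_count */
static unsigned int pool_deficit(unsigned int available,
				 unsigned int create_count)
{
	return available < create_count ? create_count - available : 0;
}

/* half of the pool, but never less than one object */
static unsigned int half_pool(unsigned int create_count)
{
	unsigned int half = create_count / 2;

	return half > 0 ? half : 1;
}

/* old rule: refill once half of the pool has been consumed */
bool precreate_needed_old(unsigned int available, unsigned int create_count)
{
	return pool_deficit(available, create_count) >=
	       half_pool(create_count);
}

/* new rule (assumed form): cap the tolerated deficit at 1024 objects */
bool precreate_needed_new(unsigned int available, unsigned int create_count)
{
	unsigned int limit = half_pool(create_count);

	if (limit > PRECREATE_MAX_DEFICIT)
		limit = PRECREATE_MAX_DEFICIT;
	return pool_deficit(available, create_count) >= limit;
}
```

With create_count = 10000, the old rule waits for 5000 consumed objects
while this model of the new rule asks for a refill after 1024, so each
OST_CREATE request stays comparatively small.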

Don't set opd_force_creation=true when osp.*.create_count is set
larger, and instead rely on the improved precreate check to force
OST object creation to start sooner, as opd_force_creation=true
can cause the OSP precreation to stop completely in some cases.

Test-Parameters: testlist=sanity env=ONLY=1-130,HONOR_EXCEPT=y
Test-Parameters: testlist=sanity env=ONLY=1-130,HONOR_EXCEPT=y
Test-Parameters: testlist=sanity env=ONLY=1-130,HONOR_EXCEPT=y
Test-Parameters: testlist=sanity env=ONLY=1-130,HONOR_EXCEPT=y
Test-Parameters: testlist=parallel-scale env=ONLY=rr_alloc,ONLY_REPEAT=10
Test-Parameters: testlist=parallel-scale env=ONLY=rr_alloc,ONLY_REPEAT=10
Test-Parameters: testlist=parallel-scale env=ONLY=rr_alloc,ONLY_REPEAT=10
Test-Parameters: testlist=parallel-scale env=ONLY=rr_alloc,ONLY_REPEAT=10
Fixes: df5b4c0a8b ("LU-17251 osp: force precreate if create_count grows")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Id2d12636d535485919ca5eec3adb18b1e6ce7057
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53245
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-16502 lutf: remove lutf_start.sh wrapper script 37/53237/2
Timothy Day [Fri, 24 Nov 2023 17:38:10 +0000 (17:38 +0000)]
LU-16502 lutf: remove lutf_start.sh wrapper script

Remove bash wrapper script lutf_start.sh. Set the environment
natively in Python. LUTF currently involves a number of nested
wrapper scripts. Hence, this patch aims to simplify LUTF.

It also makes it simpler to import this script into another
Python script, by providing a reusable function to set the
environment natively.

Test-Parameters: @lnet
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I56d80c12f9e50f3f8de1668ffa04c855a9829601
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53237
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-16502 python: improve support for virtual environments 09/53209/2
Timothy Day [Wed, 22 Nov 2023 17:56:17 +0000 (17:56 +0000)]
LU-16502 python: improve support for virtual environments

Python virtual environments make it easy to install the
correct Python packages isolated from the rest of the
system.

 https://docs.python.org/3/library/venv.html

.venv is added to .gitignore and a simple virtual environment
example has been added to the README.

This patch collects all of the requirements for various
scripts in the Lustre tree and consolidates them in a
top level requirements.txt. lu_object.py spacing was
fixed due to parsing errors.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I69d074e9ba50022817bd243fb82d004366ab6adf
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53209
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-16502 lutf: add proper config option and fix bugs 00/53200/3
Timothy Day [Tue, 21 Nov 2023 19:19:24 +0000 (19:19 +0000)]
LU-16502 lutf: add proper config option and fix bugs

LUTF did not have a proper configuration option. Since
no message was printed at configure time, this made it
hard to debug why LUTF was not being built.

Fix a few minor bugs in headers that prevented shared
libraries from being `import`ed by python.

Fix a small Clang error in liblutf_agent.c.

Test-Parameters: @lnet
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I6680b203bef08b7afa326a1cbe30c96b5c29e95c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53200
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-17000 utils: fix leak in 'lfs find' error handling 82/53182/4
Arshad Hussain [Mon, 20 Nov 2023 07:43:07 +0000 (13:13 +0530)]
LU-17000 utils: fix leak in 'lfs find' error handling

Fix memory leak reported by Coverity in
setup_indexes() in case of errors during
OST UUID initialization.

CoverityID: 397693 ("Resource leak")

Test-Parameters: trivial testlist=sanity,conf-sanity
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Fixes: 05334b90a5d ("LU-16331 utils: fix 'lfs find -O <uuid>' with gaps")
Change-Id: Ibfd10cebaf3198ae2e9bb35686be420e4cd0050b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53182
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Anjus George <georgea@ornl.gov>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Rick Mohr <mohrrf@ornl.gov>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-14361 tests: add aheadmany to .gitignore 73/53173/2
Timothy Day [Fri, 17 Nov 2023 18:14:15 +0000 (18:14 +0000)]
LU-14361 tests: add aheadmany to .gitignore

Add aheadmany to .gitignore.

Fixes: 5317f8a ("LU-14361 statahead: Add test for statahead advise")
Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I1003200b7ed34e90d2aa0f75cb4c4f071eaeea04
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53173
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-17261 lov: ignore broken components 96/52996/5
Alex Zhuravlev [Sun, 5 Nov 2023 13:51:29 +0000 (16:51 +0300)]
LU-17261 lov: ignore broken components

If some component of a mirrored file is broken, it makes sense
to try another (possibly valid) replica rather than give up
immediately.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I32ea0efa90109f5159bf8b6c4e0efe1d543580c3
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52996
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-17263 utils: 'lfs find -size' to use 512-byte units 94/52994/6
Andreas Dilger [Sun, 5 Nov 2023 05:44:02 +0000 (23:44 -0600)]
LU-17263 utils: 'lfs find -size' to use 512-byte units

Change the 'lfs find -size' argument to 512-byte blocks by default if
no unit is given.  This better matches find(1) and avoids confusion
when converting "find" arguments to "lfs find".  Accept the 'c' suffix
like find(1) to specify a number of characters (bytes).
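
The unit rule described above can be sketched as a small parser. The
function parse_size_arg() is hypothetical and covers only the two cases
named in this message (no suffix and 'c'); the real tool accepts further
unit suffixes and a leading +/- comparison sign, which this sketch rejects.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical parser: convert a size argument to bytes.
 *   "10"  -> 10 * 512 bytes (default unit: 512-byte blocks)
 *   "10c" -> 10 bytes       ('c' suffix: characters/bytes)
 * Returns 0 on success or -EINVAL on malformed input.
 * No overflow check is done on the multiplication.
 */
int parse_size_arg(const char *arg, unsigned long long *bytes)
{
	unsigned long long val;
	char *end;

	if (!isdigit((unsigned char)arg[0]))
		return -EINVAL;

	errno = 0;
	val = strtoull(arg, &end, 10);
	if (end == arg || errno != 0)
		return -EINVAL;

	if (end[0] == '\0') {
		*bytes = val * 512;
		return 0;
	}
	if (end[0] == 'c' && end[1] == '\0') {
		*bytes = val;
		return 0;
	}
	return -EINVAL;
}
```

Under this rule a bare "-size 10" means 5120 bytes, which is why scripts
that relied on a unit-less argument should be checked after the change.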

Most users/scripts will specify a unit, so this change is not
expected to cause significant disruption.

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I3124e667acc06928f41a3d3006e1d9b4a43ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52994
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Anjus George <georgea@ornl.gov>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-17137 utils: l_getidentity Coverity cleanup 15/53315/3
Shaun Tancheff [Mon, 4 Dec 2023 10:42:41 +0000 (04:42 -0600)]
LU-17137 utils: l_getidentity Coverity cleanup

Do not assign newmod a value past the end of the allocated
space. This can confuse Coverity.

Instead only assign valid addresses (or NULL).

CoverityID: 410235 ("Memory - illegal access")

Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I767ed1273ebfab68d634b3ff22b81a4621405dd2
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53315
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 months ago LU-16762 statahead: wait until statahead thread quit 02/52302/3
Qian Yingjin [Thu, 7 Sep 2023 08:33:02 +0000 (04:33 -0400)]
LU-16762 statahead: wait until statahead thread quit

Wait until the statahead thread has quit. Only after that can we get
accurate hit/miss stats for a stat() workload such as "ls -l".

Test-Parameters: trivial
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I902b299e039de6c584b386856fb3f7a8989eb73b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52302
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-16796 lov: Change struct pool_desc to use kref 22/52122/5
Arshad Hussain [Sun, 27 Aug 2023 21:39:47 +0000 (03:09 +0530)]
LU-16796 lov: Change struct pool_desc to use kref

This patch changes struct pool_desc to use
kref (refcount_t) instead of atomic_t.

Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I3829de190ec148c2e087f6a0262bf3bb76c196af
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52122
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 months ago LU-16397 test: check quota setting on QSD 33/49533/5
Hongchao Zhang [Fri, 24 Nov 2023 08:33:10 +0000 (16:33 +0800)]
LU-16397 test: check quota setting on QSD

In some cases, the quota setting at QMT could not be transferred to
QSD in time, which could cause the test to fail.
This patch adds a check on QSD after setting the quota limit by LFS.

Test-Parameters: trivial testlist=sanity-quota
Change-Id: Ia999317a36a0f97c1f66726cdc10e9edac3d8a53
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49533
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-14810 lnet: Cancel discovery ping/push on shutdown 56/53356/2
Chris Horn [Tue, 5 Dec 2023 09:56:57 +0000 (03:56 -0600)]
LU-14810 lnet: Cancel discovery ping/push on shutdown

Discovery shutdown can race with ping and push events. In some cases
this can result in failing to unlink ping/push MDs on shutdown.
Protect against this by checking for PING/PUSH_FAILED state on peers
on the request queue.

Test-Parameters: trivial
Test-Parameters: testlist=sanity-lnet env=ONLY=500,ONLY_REPEAT=50
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I84a1f5beb6508651bc62e1dd93271f9e72f5081c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53356
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-14287 tests: Add 'fallocate' to racer 63/41663/23
Arshad Hussain [Fri, 1 Dec 2023 10:20:22 +0000 (15:50 +0530)]
LU-14287 tests: Add 'fallocate' to racer

This patch adds fallocate (file_fallocate.sh)
to racer runs.

Test-Parameters: trivial testlist=racer
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I8e807bfc5c2b29dfb610a0b35e7083a9609517b0
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/41663
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-10391 lnet: Ping target corrupted 20/53320/3
Chris Horn [Mon, 4 Dec 2023 21:14:53 +0000 (15:14 -0600)]
LU-10391 lnet: Ping target corrupted

If NIs with large NIDs are added then the discovery ping target
can become corrupted. This is due to a typo in
lnet_ping_target_install_locked(). Instead of writing the NI status to
the lnet_ni_large_status::ns_status field, we were writing the
status value 8 bytes into the lnet_ni_large_status (ns_status is
at offset 8 in lnet_ni_status). This overwrites part of the struct
lnet_nid.
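
The effect described above can be sketched in plain C. The struct
layouts below are simplified, hypothetical stand-ins (the demo_ names are
not the Lustre definitions); they are only assumed to match the offsets
stated in the message, with the status at offset 8 in the small format
and at offset 0 in the large format. The buggy helper is an illustration
of the wrong offset, not a copy of the original code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical large NID: 4 header bytes followed by 16 address bytes. */
struct demo_nid {
	uint8_t  nid_size;
	uint8_t  nid_type;
	uint16_t nid_num;
	uint32_t nid_addr[4];
};

/* Hypothetical small-format status entry: status lives at offset 8. */
struct demo_ni_status {
	uint64_t ns_nid;
	uint32_t ns_status;
	uint32_t ns_unused;
};

/* Hypothetical large-format status entry: status comes first. */
struct demo_ni_large_status {
	uint32_t        ns_status;
	struct demo_nid ns_nid;
};

static_assert(offsetof(struct demo_ni_status, ns_status) == 8,
	      "small format keeps the status at offset 8");
static_assert(offsetof(struct demo_ni_large_status, ns_status) == 0,
	      "large format keeps the status at offset 0");
static_assert(offsetof(struct demo_ni_large_status, ns_nid) == 4,
	      "large format NID starts right after the status");

/* Buggy variant: applies the small-format offset to a large entry. */
void demo_set_status_buggy(struct demo_ni_large_status *lns, uint32_t status)
{
	memcpy((char *)lns + offsetof(struct demo_ni_status, ns_status),
	       &status, sizeof(status));
}

/* Fixed variant: writes the large entry's own ns_status field. */
void demo_set_status_fixed(struct demo_ni_large_status *lns, uint32_t status)
{
	lns->ns_status = status;
}
```

With these layouts the buggy variant leaves ns_status untouched and
lands the value in the first address word of the NID, which is the kind
of corruption the message describes.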

Test-Parameters: trivial
Fixes: db0fb8f ("LU-10391 lnet: allow ping packet to contain large nids")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I7ccff7ae59feac5edc6a97a86c861ffbdb0bb333
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53320
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 months ago LU-17306 ofd: return error for reconnection 95/53195/5
Alexander Boyko [Thu, 16 Nov 2023 22:57:24 +0000 (17:57 -0500)]
LU-17306 ofd: return error for reconnection

During the orphan cleanup phase, reconnection leaves the last ID
unsynchronized between the MDT and OST. This means that the MDT could
assign nonexistent objects to a client for a file create operation.

ofd_create_hdl()) capstor-OST0087: dropping old orphan cleanup request
MDS LAST_ID [0x2540000400:0xb6941:0x0] (747841) is 352 behind OST
    LAST_ID [0x2540000400:0xb6aa1:0x0] (748193), trust the OST

recovery-small test 144c reproduces the bug where the MDT loses
synchronization with the OST.

Fixes: 63e17799a3 ("LU-8367 osp: enable replay for precreation request")
HPE-bug-id: LUS-11969
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: I22c3d3b3db2acc9ad8f1b978b234afe7d3eef51d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53195
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-16826 tests: lfsck to repair a dangling remote entry 98/50998/9
Alexander Zarochentsev [Mon, 15 May 2023 17:47:39 +0000 (13:47 -0400)]
LU-16826 tests: lfsck to repair a dangling remote entry

Test how lfsck repairs a directory entry that points to a
nonexistent Lustre object.

HPE-bug-id: LUS-11609
Test-Parameters: trivial testlist=sanity-lfsck fstype=ldiskfs mdscount=2 mdtcount=4 env=ONLY=23d
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: I88dc8be98bacd2a199facfe3569567de6a713ff6
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50998
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 months ago LU-17048 mdd: protect layout change in MDD layer 46/52146/17
Bobi Jam [Mon, 28 Aug 2023 13:08:34 +0000 (21:08 +0800)]
LU-17048 mdd: protect layout change in MDD layer

We need to detect changes to the LOD layout in between transaction
declaration and when the objects are locked during transaction
execution. Otherwise, if another thread has modified the layout
of an object used by the transaction then the declaration may
be incorrect.

This patch saves the objects' layout generation in the transaction
declaration phase and checks in the transaction execution phase whether
it has been changed by others. If it has, the transaction is retried
several times.
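
The save-check-retry idea can be sketched as follows. This is a
simplified userspace model under stated assumptions, not the MDD code:
demo_object, demo_tx_run, the race hooks, the retry limit of 3 and the
-EAGAIN return value are all hypothetical and chosen only for
illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_TX_MAX_RETRIES 3	/* assumed limit, for illustration only */

/* Hypothetical object carrying a layout generation counter. */
struct demo_object {
	uint32_t layout_gen;
};

/* Hook that models another thread acting between declare and execute. */
typedef void (*demo_race_hook_t)(struct demo_object *obj, int attempt);

/* Changes the layout only during the first attempt. */
void demo_race_once(struct demo_object *obj, int attempt)
{
	if (attempt == 0)
		obj->layout_gen++;
}

/* Changes the layout during every attempt. */
void demo_race_always(struct demo_object *obj, int attempt)
{
	(void)attempt;
	obj->layout_gen++;
}

/*
 * Run one transaction: remember the generation at declaration time,
 * compare it at execution time, and retry if it moved in between.
 * Returns 0 on success or -EAGAIN once the retries are used up.
 */
int demo_tx_run(struct demo_object *obj, demo_race_hook_t hook, int *attempts)
{
	for (int attempt = 0; attempt <= DEMO_TX_MAX_RETRIES; attempt++) {
		/* Declaration phase: save the generation we declared against. */
		uint32_t declared_gen = obj->layout_gen;

		/* Window in which another thread may modify the layout. */
		if (hook != NULL)
			hook(obj, attempt);

		/* Execution phase: the object would be locked here. */
		*attempts = attempt + 1;
		if (obj->layout_gen == declared_gen)
			return 0;	/* declaration still valid */
	}
	return -EAGAIN;
}
```

In this model, a single concurrent layout change costs one extra
attempt, while a layout that changes on every attempt exhausts the
retries and surfaces an error to the caller.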

Fixes: b7bd4e3422 ("LU-14621 mdd: fix lock-tx order in mdd_xattr_merge()")
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I25fe03c6e8fc4eebccc039e62dfc88db1179cb26
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52146
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-17312 tests: skip conf-sanity test_53 in interop 26/53226/2
Andreas Dilger [Fri, 24 Nov 2023 07:34:20 +0000 (00:34 -0700)]
LU-17312 tests: skip conf-sanity test_53 in interop

Skip conf-sanity test_53 in interop because older servers cannot
stop any running service threads above threads_max.

Remove old test interop for servers < 2.3.

Test-Parameters: trivial testlist=conf-sanity
Test-Parameters: testlist=conf-sanity env=ONLY=53 serverversion=2.12
Fixes: 183cb1e3cd ("LU-947 ptlrpc: allow stopping threads above threads_max")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ia95405060c607c7a070720ed32a7a43b1c3ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53226
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Sarah Liu <sarah@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 months ago LU-16739 uapi: restore LUSTRE_NODEMAP_NAME_LENGTH in lustre_disk.h 08/53208/4
James Simmons [Wed, 22 Nov 2023 15:29:58 +0000 (10:29 -0500)]
LU-16739 uapi: restore LUSTRE_NODEMAP_NAME_LENGTH in lustre_disk.h

sanity test 400b fails with:

lustre_disk.h:266:18: error: 'LUSTRE_NODEMAP_NAME_LENGTH' undeclared here (not in a function)
 char   ncr_name[LUSTRE_NODEMAP_NAME_LENGTH + 1];

This is due to the move of LUSTRE_NODEMAP_NAME_LENGTH to lustre_idl.h.
Move it back and this time add a message to avoid this in the
future.

Test-Parameters: trivial
Fixes: 8d828762d1 ("LU-17015 gss: support large kerberos token for rpc sec init")
Change-Id: I5479bbf13f26bfd3b4f6e5f7c0c1688d810fca53
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53208
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>