Whamcloud - gitweb
fs/lustre-release.git
8 weeks ago LU-17512 utils: new ? operator for jobid_name 32/55332/13
Maximilian Dilger [Thu, 6 Jun 2024 05:27:05 +0000 (01:27 -0400)]
LU-17512 utils: new ? operator for jobid_name

Add a new ? operator for use when setting jobid_name. The intended use
is "jobid_name=%j?%H", which uses the job ID if it is available
and otherwise falls back to the short hostname.
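
The fallback behaviour can be sketched as follows. This is a hypothetical Python model, not the Lustre implementation: the function name `expand_jobid_name`, the restriction to `%j` and `%H`, and the rule that an empty job ID counts as "not available" are assumptions made for illustration.

```python
def expand_jobid_name(fmt, jobid, hostname):
    """Illustrative model of the '?' fallback in a jobid_name format.

    Hypothetical helper: only %j (job ID) and %H (short hostname) are
    modelled, and an empty or missing job ID counts as "not available".
    """
    values = {"%j": jobid or "", "%H": hostname.split(".")[0]}
    # Alternatives separated by '?' are tried from left to right.
    for alternative in fmt.split("?"):
        used = [code for code in values if code in alternative]
        # Skip an alternative if any code it uses has no value.
        if any(not values[code] for code in used):
            continue
        expanded = alternative
        for code, value in values.items():
            expanded = expanded.replace(code, value)
        return expanded
    return ""
```

With this model, `expand_jobid_name("%j?%H", None, "node01.example.com")` falls through to the hostname alternative, while a non-empty job ID is returned unchanged.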

Signed-off-by: Maximilian Dilger <mdilger@whamcloud.com>
Change-Id: I418860fce5a81aa8a0a0a43c2d8bdb6d107779f9
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55332
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Thomas Bertschinger <bertschinger@lanl.gov>
8 weeks ago LU-16741 llite: rename ptlrpc_req_finished for component llite 85/54985/4
Arshad Hussain [Thu, 2 May 2024 10:18:09 +0000 (06:18 -0400)]
LU-16741 llite: rename ptlrpc_req_finished for component llite

This patch renames ptlrpc_req_finished to ptlrpc_req_put for
the llite component.

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I216aa2797fbebeecae82b1d45301df7a860bde65
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54985
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <patrick.farrell@oracle.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 weeks ago LU-17812 ldlm: stack trace log for LDLM error 96/54896/18
Rajeev Mishra [Tue, 26 Mar 2024 02:15:31 +0000 (02:15 +0000)]
LU-17812 ldlm: stack trace log for LDLM error

Add support for dumping the stack trace in
ldlm_lock_debug(). The stack trace is logged only
for D_ERROR messages and only when dump_stack_on_error
is enabled.

Test-Parameters: testlist=sanity env=ONLY=105g
HPE-bug-id: LUS-12165
Signed-off-by: Rajeev Mishra <rajeevm@hpe.com>
Change-Id: I4ce280334e0273df1751257e8db03ea680831696
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54896
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
8 weeks ago LU-17760 lnet: Crash caused by uninitialized interface name 59/54859/13
Frank Sehr [Fri, 19 Apr 2024 22:33:12 +0000 (18:33 -0400)]
LU-17760 lnet: Crash caused by uninitialized interface name

When adding an interface with ip2net, a duplicate configuration of an
already existing interface can cause a crash or misconfiguration of
LNet. Incoming interface names have to be checked for null, and
duplicate interface configurations have to be removed.
When a duplicate is detected, it has to be added to a list so that it
can be shut down; otherwise shutdown would assert.
The problem can be reproduced on tcp and o2ib networks.
The following steps reproduce the problem in the original
configuration; it is also reproducible in other variations and
on tcp networks.
modprobe lnet
lnetctl lnet configure
lnetctl net add --net o2ib --if mlxib1
lnetctl net add --net o2ib --if mlxib1 \
       --ip2net "o2ib 172.30.12.*"
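
The two checks described above (null interface names and duplicate configurations) can be modelled roughly as below. This is only a Python sketch of the idea, not the LNet code: the function name `split_interfaces` and the list-based bookkeeping are assumptions for illustration.

```python
def split_interfaces(requested, configured):
    """Separate usable interface names from null and duplicate ones.

    Hypothetical model: 'configured' holds names that already exist,
    'requested' holds incoming names, some of which may be None.
    """
    accepted, duplicates = [], []
    seen = set(configured)
    for name in requested:
        if not name:
            # Null or empty interface name: skip it instead of crashing.
            continue
        if name in seen:
            # Duplicate: keep it on a separate list so it can be
            # shut down cleanly later.
            duplicates.append(name)
            continue
        seen.add(name)
        accepted.append(name)
    return accepted, duplicates
```

Keeping duplicates on their own list mirrors the point made in the message: they are not configured again, but they are still tracked for shutdown.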

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Frank Sehr <fsehr@whamcloud.com>
Change-Id: Ie76d97cc52855ab897a9e07a3697483189d4b19e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54859
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 weeks ago LU-6142 lnet: Fix style issues for conrpc.c 34/55734/2
Arshad Hussain [Mon, 15 Jul 2024 08:29:42 +0000 (04:29 -0400)]
LU-6142 lnet: Fix style issues for conrpc.c

This patch fixes issues reported by checkpatch
for file lnet/selftest/conrpc.c

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Icd8a9ffffd34c3330fc7c710359bcaf7f197ea52
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55734
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 weeks ago LU-6142 lnet: Fix style issues for brw_test.c 33/55733/2
Arshad Hussain [Mon, 15 Jul 2024 07:51:36 +0000 (03:51 -0400)]
LU-6142 lnet: Fix style issues for brw_test.c

This patch fixes issues reported by checkpatch
for file lnet/selftest/brw_test.c

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I6ccda68a9becf44801e3623acac30ce4c5804374
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55733
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 weeks ago LU-6142 lnet: Fix style issues for rpc.h 32/55732/2
Arshad Hussain [Mon, 15 Jul 2024 08:55:36 +0000 (04:55 -0400)]
LU-6142 lnet: Fix style issues for rpc.h

This patch fixes issues reported by checkpatch
for file lnet/selftest/rpc.h

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I99524bd815936c95d048a7617acfde3327d8d5e1
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55732
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 weeks ago LU-6142 lnet: Fix style issues for ping_test.c 31/55731/3
Arshad Hussain [Mon, 15 Jul 2024 09:07:11 +0000 (05:07 -0400)]
LU-6142 lnet: Fix style issues for ping_test.c

This patch fixes issues reported by checkpatch
for file lnet/selftest/ping_test.c

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I828ce5ccf6bfc9868fc7a8f9fc9bcb8a9293d118
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55731
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 weeks ago LU-17974 quota: fix qmt_pool_lqes_lookup_spec 35/55535/2
Sergey Cheremencev [Tue, 25 Jun 2024 19:52:21 +0000 (22:52 +0300)]
LU-17974 quota: fix qmt_pool_lqes_lookup_spec

Return 0 from qmt_pool_lqes_lookup_spec if a global lqe exists
among the found lqes. Return -ENOENT if
* no lqes have been found
* there is no global lqe among the found lqes
This patch aims to prevent the panic below:

 (qmt_lock.c:957:qmt_id_lock_notify())
ASSERTION( lqe->lqe_is_global ) failed:
 (qmt_lock.c:957:qmt_id_lock_notify()) LBUG
 ...
 Call Trace TBD:
 libcfs_call_trace+0x6f/0xa0 [libcfs]
 lbug_with_loc+0x3f/0x70 [libcfs]
 qmt_id_lock_notify+0x1ee/0x330 [lquota]
 qmt_site_recalc_cb+0x34b/0x550 [lquota]
 cfs_hash_for_each_tight+0x122/0x310 [libcfs]
 qmt_pool_recalc+0x375/0xa80 [lquota]
 kthread+0x134/0x150
 ret_from_fork+0x35/0x40
 Kernel panic - not syncing: LBUG
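
The return-value rule stated at the top of this message can be summarised in a few lines. This is a Python sketch under stated assumptions, not the kernel code: each lqe is modelled as a dict with a hypothetical `is_global` flag, and the function name `lookup_spec_result` is invented for illustration.

```python
import errno


def lookup_spec_result(found_lqes):
    """Return 0 if a global lqe is among the found lqes, else -ENOENT.

    An empty list (no lqes found) also yields -ENOENT.
    """
    if any(lqe["is_global"] for lqe in found_lqes):
        return 0
    return -errno.ENOENT
```

Both failure cases listed in the message collapse into the same check, because `any()` over an empty list is false.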

Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Change-Id: I62a2175b7b05c49f28b4e87c36ed653d1b9a71cc
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55535
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 weeks ago LU-17635 lfsck: detect missing LMV hash 64/55364/3
Alexander Zarochentsev [Fri, 7 Jun 2024 17:19:08 +0000 (17:19 +0000)]
LU-17635 lfsck: detect missing LMV hash

Detect striped dirs with a missing LMV hash,
attempt to set it in trivial cases, and
mark them BAD_TYPE otherwise.

HPE-bug-id: LUS-12379
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: Ibce4dd9cf01d653c431f7b7968691a4d704af9d9
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55364
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-18010 build: remove dpatch from dependency 11/55711/2
Shuichi Ihara [Thu, 11 Jul 2024 21:53:02 +0000 (06:53 +0900)]
LU-18010 build: remove dpatch from dependency

dpatch is no longer available in Ubuntu 24.04.
Remove it from the dependencies. If it is really needed, use quilt instead.

Test-Parameters: trivial
Signed-off-by: Shuichi Ihara <sihara@ddn.com>
Change-Id: I94939ec6fe87fdbfe2a5904298d90ec324796921
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55711
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 months ago LU-18018 kernel: update RHEL 9.4 [5.14.0-427.24.1.el9_4] 85/55685/3
Jian Yu [Wed, 10 Jul 2024 06:53:41 +0000 (23:53 -0700)]
LU-18018 kernel: update RHEL 9.4 [5.14.0-427.24.1.el9_4]

Update RHEL 9.4 kernel to 5.14.0-427.24.1.el9_4.

Test-Parameters: trivial fstype=ldiskfs mdtcount=4 mdscount=2 \
  clientdistro=el9.4 serverdistro=el9.3 testlist=sanity

Test-Parameters: trivial fstype=zfs mdtcount=4 mdscount=2 \
  clientdistro=el9.4 serverdistro=el9.3 testlist=sanity

Test-Parameters: trivial fstype=ldiskfs mdtcount=4 mdscount=2 \
  clientdistro=el9.3 serverdistro=el9.4 testlist=sanity

Test-Parameters: trivial fstype=zfs mdtcount=4 mdscount=2 \
  clientdistro=el9.3 serverdistro=el9.4 testlist=sanity

Test-Parameters: optional clientdistro=el9.4 serverdistro=el9.4 \
  testgroup=full-part-1

Test-Parameters: optional clientdistro=el9.4 serverdistro=el9.4 \
  testgroup=full-part-2

Test-Parameters: optional clientdistro=el9.4 serverdistro=el9.4 \
  testgroup=full-part-3

Change-Id: If795f9b12a4c7f7eac14b0d38c8078c0013d64da
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55685
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alex Deiter
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-930 doc: fix lfs-project.1 quoting 80/55680/2
Andreas Dilger [Tue, 9 Jul 2024 21:48:15 +0000 (15:48 -0600)]
LU-930 doc: fix lfs-project.1 quoting

Fix the quoting for the "-0" description, which otherwise is
formatted badly.  Improve the description to give more direction
as to the intended usage of this option.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Id32c542a0697bc0c3c79775051d98d05be4ece5f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55680
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-18011 pcc: fix build failure in ->fileattr_set() 71/55671/4
Qian Yingjin [Tue, 9 Jul 2024 08:31:45 +0000 (04:31 -0400)]
LU-18011 pcc: fix build failure in ->fileattr_set()

The build failed on linux-6.8 kernel:
rc = inode->i_op->fileattr_set(&init_user_ns, dentry, &fa);
       ^~~~~~~~~~~~~
       |
       struct user_namespace *
pcc.c:3265:40: note: expected 'struct mnt_idmap' but argument is
of type 'struct user_namespace'.

Replace "&init_user_ns" with "&nop_mnt_idmap" to fix the build
error.

Fixes: 2d1a906ff11 ("LU-12358 pcc: add project quota support on PCC backend")
Test-Parameters: trivial testlist=sanity-pcc
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Ib79d79fa1aa6e99719d1658cdc4c03e1fa1ea064
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55671
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shuichi Ihara <sihara@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 months ago LU-17714 gss: support revoked session keyring 27/55627/3
Sebastien Buisson [Thu, 4 Jul 2024 15:09:23 +0000 (17:09 +0200)]
LU-17714 gss: support revoked session keyring

In case the session keyring for a regular user has been revoked, the
key ends up being linked to the user session keyring. So we must
detect this case and properly unlink the key from the correct keyring.
This applies to the initial key creation workflow, as well as to the
explicit context flush ('lfs flushctx').

Add sanity-krb5 test_10 to exercise this capability.

Test-Parameters: trivial
Test-Parameters: testgroup=review-dne-selinux-ssk-part-1
Test-Parameters: testgroup=review-dne-selinux-ssk-part-2
Test-Parameters: kerberos=true testlist=sanity-krb5
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: If96703a2de9a4172613bfbd96e7529b16169cf58
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55627
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-17940 gss: get rid of root key in all cases 55/55555/6
Sebastien Buisson [Thu, 27 Jun 2024 15:20:58 +0000 (17:20 +0200)]
LU-17940 gss: get rid of root key in all cases

The root key associated with a GSS context (gck_key) is used to pass
information between kernel and userspace during GSS context
negotiation.
Whether the GSS context negotiation went well or not, the context and
the key used in this process should be unbound once done. And this
should mean unlinking the key but also directly invalidating it
instead of just revoking it, to make sure the key is ignored by all
searches and other operations.
For the same reasons, invalidate the key when the GSS upcall times
out or the context pre-initialization fails.

Test-Parameters: trivial
Test-Parameters: testgroup=review-dne-selinux-ssk-part-1
Test-Parameters: testgroup=review-dne-selinux-ssk-part-2
Test-Parameters: kerberos=true testlist=sanity-krb5
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I8b61d22e942d0dca16b96780889976c3a5f00f6a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55555
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-17971 gss: do not make lsvcgss record its PID 09/55509/2
Sebastien Buisson [Mon, 24 Jun 2024 07:32:35 +0000 (09:32 +0200)]
LU-17971 gss: do not make lsvcgss record its PID

The lsvcgssd daemon is expected to spawn a few additional threads at
startup to carry out extra work. In this case finding the PID of the
'main' thread can be complicated.
So do not try to record this by ourselves, and let systemctl handle
that.

Test-Parameters: trivial
Test-Parameters: testgroup=review-dne-selinux-ssk-part-1
Test-Parameters: testgroup=review-dne-selinux-ssk-part-2
Test-Parameters: kerberos=true testlist=sanity-krb5
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I7ddfcd5b5f3c69a46079b42d76fb9585953e30b1
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55509
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-17689 o2iblnd: handle unexpected network data gracefully 01/55501/5
Serguei Smirnov [Fri, 21 Jun 2024 17:40:20 +0000 (10:40 -0700)]
LU-17689 o2iblnd: handle unexpected network data gracefully

Remove assertions in favour of graceful handling of
unexpected data coming in: prefer to report and handle the error
and carry on.

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I62dc260e781ab0d2a5069560ca05f692a612bb8f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55501
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-17928 lnet: add kmod devel package 87/55387/7
Shaun Tancheff [Wed, 10 Jul 2024 03:20:01 +0000 (10:20 +0700)]
LU-17928 lnet: add kmod devel package

This creates a new kernel module development package for building
kernel modules that depend on the Lustre/LNet kAPI.

The most notable of these is DVS, which uses LNet.

Along with the kernel includes, add a package config file, lnet.pc,
and the Module.symvers needed for linking against the Lustre/LNet kAPI.

Use:
   pkg-config --variable=symversdir lnet
to find the path to Module.symvers and include files.
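
A build script could wrap the pkg-config query above roughly as follows. This is a minimal Python sketch, assuming pkg-config is on the PATH and the devel package has installed lnet.pc; the function name `lnet_symvers_dir` and the injectable `run` parameter are illustrative choices, not part of the package.

```python
import subprocess


def lnet_symvers_dir(run=subprocess.run):
    """Ask pkg-config for the 'symversdir' variable of the lnet package.

    Returns the directory as a string, or None if pkg-config is missing,
    fails, or prints nothing (e.g. the devel package is not installed).
    The 'run' parameter exists only so the call can be replaced in tests.
    """
    cmd = ["pkg-config", "--variable=symversdir", "lnet"]
    try:
        result = run(cmd, capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip() or None
```

Returning None rather than raising lets a caller fall back to another location for Module.symvers when the package is absent.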

In addition, the dkms build can differ enough that the packaged
Module.symvers and config.h (and possibly the headers as well)
are not interchangeable.

Use the update-alternatives subsystem to enable the dkms and kmp
packages to co-exist and the kmp-devel package to work with either.

Also loosen the user space requirement to:
 Lustre version >= major.minor
instead of the exact build.

Test-Parameters: trivial
HPE-bug-id: LUS-12246, LUS-12378, LUS-12351
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Signed-off-by: Caleb Carlson <caleb.carlson@hpe.com>
Change-Id: Idb00b881e8f6d4a703cc71fd0d8768e1f433fca3
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55387
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-17899 gss: improved systemd unit file for SSK daemon 79/55379/4
Chris Hunter [Thu, 6 Jun 2024 05:44:12 +0000 (01:44 -0400)]
LU-17899 gss: improved systemd unit file for SSK daemon

Add operation ordering to lsvcgss initscript/service unit
so it starts after systemd network services are running.

Signed-off-by: Chris Hunter <chunter@ddn.com>
Change-Id: Iad39d01aae16732ff646383814033d6efb34af5e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55379
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Peter Jones <pjones@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 months ago LU-12597 tests: allow comma-separated node lists 40/54340/5
Andreas Dilger [Tue, 1 Aug 2023 21:18:12 +0000 (15:18 -0600)]
LU-12597 tests: allow comma-separated node lists

Allow some functions that deal with space-separated node lists to
also accept comma-separated node lists, to prepare for a future
where $(osts_nodes) and $(mdts_nodes) will return comma-separated
lists already, instead of having to call comma_list each time.
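
The idea of accepting either separator can be illustrated as below. The Lustre test framework itself is shell, so this Python sketch is only a model: `split_node_list` is a hypothetical helper, and the `comma_list` shown here (unique, sorted, comma-joined) is an assumption about the behaviour of the shell function of the same name.

```python
import re


def split_node_list(nodes):
    """Split a node list given with spaces, commas, or a mix of both."""
    return [n for n in re.split(r"[,\s]+", nodes.strip()) if n]


def comma_list(nodes):
    """Return the unique node names, sorted and joined by commas."""
    return ",".join(sorted(set(split_node_list(nodes))))
```

Because `split_node_list` normalises its input first, a caller no longer needs to know which separator the list arrived with.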

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I36e5a2a0814fd6564ca560ad93fdaba0423ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54340
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-16959 lnet: auto-tune ARP-related sysctl setting 10/53310/24
Frank Sehr [Fri, 1 Dec 2023 23:00:51 +0000 (15:00 -0800)]
LU-16959 lnet: auto-tune ARP-related sysctl setting

Default linux settings for net.ipv4.neigh.default.gc_thresh* may be
too low. The configuration file contains recommended threshold values
for the arp table configuration for larger systems. These values are
not set by default and can be enabled by setting the
enable_sysctl_setup parameter to 1 in the configuration file.
To activate the changes immediately, execute
sysctl -p /etc/lnet-sysctl.conf as root.
New ticket for documentation:
LUDOC-528 - Adding documentation for enable_sysctl_setup

Test-Parameters: trivial testlist=sanity-lnet env=ONLY=260
Signed-off-by: Frank Sehr <fsehr@whamcloud.com>
Change-Id: I34af4b402b59341ee7e9cfb45fef7c67eb5e78e9
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53310
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-17131 ldiskfs: Refresh suse15 sp3 series 44/52944/6
Shaun Tancheff [Mon, 26 Feb 2024 15:24:01 +0000 (22:24 +0700)]
LU-17131 ldiskfs: Refresh suse15 sp3 series

Add:
  ext4-filename-encode.patch
  ext4-add-periodic-superblock-update.patch

Update:
  ext4-encdata.patch

Test-Parameters: trivial
HPE-bug-id: LUS-11967
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Idb942ecaf7bac4e335f448885cf3836bc900f416
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52944
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-17583 llite: getattr/open should not revalidate dentry 54/54354/5
Etienne AUJAMES [Mon, 11 Mar 2024 17:51:57 +0000 (18:51 +0100)]
LU-17583 llite: getattr/open should not revalidate dentry

ll_getattr() and ll_intent_file_open() do not perform a lookup; they
get the attributes and ldlm locks by FID (inode). So they should not
revalidate the dentry, otherwise they may produce dir cache
inconsistencies (e.g. with a cwd fd).

Add a regression test: sanityn 31s, 31t

Fixes: 14ca315 ("LU-10948 llite: Revalidate dentries in ll_intent_file_open")
Fixes: 92fadf9 ("LU-15200 llite: revalidate dentry if LOOKUP lock fetched")
Test-Parameters: testlist=sanityn env=ONLY=31s,ONLY_REPEAT=20
Test-Parameters: testlist=sanityn env=ONLY=31t,ONLY_REPEAT=20
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: Ic9823cddf37373dc95f4de3219c88c0fa0600fa7
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54354
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
2 months ago LU-18009 obd: remove o_fid_init/o_fid_fini 67/55667/2
Timothy Day [Mon, 8 Jul 2024 21:22:55 +0000 (21:22 +0000)]
LU-18009 obd: remove o_fid_init/o_fid_fini

In every case, o_fid_init is client_fid_init and o_fid_fini is
client_fid_fini. Remove these function pointers.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Idec4e5d7948b12d67f919f58b97a7119775aaf4e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55667
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-17000 lnet: Correctly handle args passed to jt_show_fault() 51/55651/4
Arshad Hussain [Sun, 7 Jul 2024 01:17:26 +0000 (21:17 -0400)]
LU-17000 lnet: Correctly handle args passed to jt_show_fault()

Remove 'return 0' from the default case of jt_show_fault() args
processing and instead set rc to -EINVAL. This correctly
handles bad args, e.g. 'lnetctl fault show -x delay'
or 'lnetctl fault show -t'. The 'rc' check deemed unnecessary
by Coverity now becomes legitimate.
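
The effect of the change can be modelled with a toy option loop. This is a Python sketch, not the lnetctl source: the function name `parse_fault_show_args` and the `type` key are hypothetical, and the assumption that `-t` takes a value is inferred only from the two failing examples above.

```python
import errno


def parse_fault_show_args(args):
    """Toy option parser: unknown or incomplete options yield -EINVAL."""
    rc = 0
    opts = {}
    i = 0
    while i < len(args):
        if args[i] == "-t" and i + 1 < len(args):
            opts["type"] = args[i + 1]
            i += 2
        else:
            # Default case: record the error instead of returning 0,
            # so the caller's rc check has something to act on.
            rc = -errno.EINVAL
            break
    return rc, opts
```

In this model both `["-x", "delay"]` and `["-t"]` produce -EINVAL, which is the behaviour the message describes for the two bad invocations.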

Test-Parameters: trivial testlist=sanity-lnet
CoverityID: 429592 ("Logically dead code")
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Id1dc52218405dbd094a7e8304aafeff57b46ab79
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55651
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-18004 ptlrpc: shrink timeout using MIN and not MAX value 39/55639/3
Aurelien Degremont [Fri, 5 Jul 2024 13:04:19 +0000 (15:04 +0200)]
LU-18004 ptlrpc: shrink timeout using MIN and not MAX value

Change import_select_connection() to correctly use
CONNECTION_SWITCH_MIN.

When trying to set a small timeout for quick connection tests,
patch v2_15_61-238-g94d05d0737 wrongly used CONNECTION_SWITCH_MAX
instead of CONNECTION_SWITCH_MIN.

Fixes: 94d05d0737 ("LU-17379 mgc: try MGS nodes faster")
Signed-off-by: Aurelien Degremont <adegremont@nvidia.com>
Change-Id: Ia85eac787441d7bef6fd47b083060bf14a8f9a31
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55639
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-16350 ldiskfs: removed unused ldiskfs patch 37/55637/3
Shaun Tancheff [Sat, 6 Jul 2024 01:55:16 +0000 (08:55 +0700)]
LU-16350 ldiskfs: removed unused ldiskfs patch

The ldiskfs patch:
   linux-5.18/ext4-prealloc.patch
is not used, so remove it.

Test-Parameters: trivial
HPE-bug-id: LUS-11376
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I628c05681366c937a2a60f1b731c4c628720a8f9
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55637
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-18000 kernel: update SLES15 SP5 [5.14.21-150500.55.68.1] 21/55621/2
Jian Yu [Thu, 4 Jul 2024 00:21:59 +0000 (17:21 -0700)]
LU-18000 kernel: update SLES15 SP5 [5.14.21-150500.55.68.1]

Update SLES15 SP5 kernel to 5.14.21-150500.55.68.1 for Lustre client.

Test-Parameters: trivial mdtcount=4 mdscount=2 \
  clientdistro=sles15sp5 testlist=sanity

Test-Parameters: optional clientdistro=sles15sp5 testgroup=full-part-1
Test-Parameters: optional clientdistro=sles15sp5 testgroup=full-part-2
Test-Parameters: optional clientdistro=sles15sp5 testgroup=full-part-3

Change-Id: Id88738be17f8fabe845f943c88d6428faecc63be
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55621
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Deiter
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-17998 kernel: update RHEL 8.10 [4.18.0-553.8.1.el8_10] 19/55619/2
Jian Yu [Thu, 4 Jul 2024 00:01:02 +0000 (17:01 -0700)]
LU-17998 kernel: update RHEL 8.10 [4.18.0-553.8.1.el8_10]

Update RHEL 8.10 kernel to 4.18.0-553.8.1.el8_10.

Test-Parameters: trivial fstype=ldiskfs mdtcount=4 mdscount=2 \
  clientdistro=el8.10 serverdistro=el8.9 testlist=sanity

Test-Parameters: trivial fstype=zfs mdtcount=4 mdscount=2 \
  clientdistro=el8.10 serverdistro=el8.9 testlist=sanity

Test-Parameters: trivial fstype=ldiskfs mdtcount=4 mdscount=2 \
  clientdistro=el8.9 serverdistro=el8.10 testlist=sanity

Test-Parameters: trivial fstype=zfs mdtcount=4 mdscount=2 \
  clientdistro=el8.9 serverdistro=el8.10 testlist=sanity

Test-Parameters: optional clientdistro=el8.10 serverdistro=el8.10 \
  testgroup=full-part-1

Test-Parameters: optional clientdistro=el8.10 serverdistro=el8.10 \
  testgroup=full-part-2

Test-Parameters: optional clientdistro=el8.10 serverdistro=el8.10 \
  testgroup=full-part-3

Change-Id: I578a3ecae6539d674b7078f08227a56a729a6e22
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55619
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Alex Deiter
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-17990 tests: sanity 33hh MDT index match often 11/55611/3
Frederick Dilger [Wed, 3 Jul 2024 03:52:45 +0000 (21:52 -0600)]
LU-17990 tests: sanity 33hh MDT index match often

test_33hh in sanity.sh likely failed due to random chance, as
occasionally the generated names will contain only numbers
or only letters.

To reduce the chance of this being an issue, if the test fails it
will re-run up to 3 times internally. If it still fails after that,
something is surely wrong and the test will fail.
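
The retry policy can be sketched as a small loop. The real test is a shell function in sanity.sh, so this Python version is only a model; `run_with_retries` is a hypothetical name, and reading "up to 3 times" as three attempts in total is an assumption.

```python
def run_with_retries(attempt, max_tries=3):
    """Call attempt() until it returns True, at most max_tries times.

    Returns (succeeded, number_of_attempts_made).
    """
    for tries in range(1, max_tries + 1):
        if attempt():
            return True, tries
    return False, max_tries
```

A failure caused by an unlucky name draw is very unlikely to repeat on every attempt, so a result of `(False, 3)` points at a real problem.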

Test-Parameters: trivial testlist=sanity env=ONLY=33hh,ONLY_REPEAT=100
Test-Parameters: trivial testlist=sanity env=ONLY=33hh,ONLY_REPEAT=100

Signed-off-by: Frederick Dilger <fdilger@whamcloud.com>
Change-Id: I4385bd2621f1305e9c11b27f9eb67f9a45aa606a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55611
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 months ago LU-17996 mgs: add ability to clear exports 96/55596/2
Sebastien Buisson [Tue, 2 Jul 2024 08:04:56 +0000 (10:04 +0200)]
LU-17996 mgs: add ability to clear exports

Just like with other targets (MDT, OST), give the ability to clear
dead exports from the exports list 'mgs.MGS.exports'.
Improve sanity-sec test_31 to benefit from this new ability.

Test-Parameters: trivial
Test-Parameters: testgroup=review-dne-part-2
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I4e99de31834753d223fd3cfe226f6f0343f2586b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55596
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-17848 dt: allow dio_lookup/insert/delete to be optional 33/55633/2
Timothy Day [Thu, 4 Jul 2024 18:32:06 +0000 (18:32 +0000)]
LU-17848 dt: allow dio_lookup/insert/delete to be optional

Not every user of the dio API requires these operations. Return
EOPNOTSUPP rather than LASSERT() if they are not implemented.
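
The pattern of an optional operation with an error return can be illustrated as follows. This is a Python model rather than the C dt API: the `IndexOps` class and the `dt_lookup` wrapper are hypothetical, and the negative errno value follows the usual kernel convention rather than anything stated in the message.

```python
import errno


class IndexOps:
    """Hypothetical operations table; an unset entry is None."""

    def __init__(self, lookup=None):
        self.lookup = lookup


def dt_lookup(ops, key):
    """Dispatch wrapper: report -EOPNOTSUPP instead of asserting.

    Returns (rc, value); value is None when the operation is missing.
    """
    if ops.lookup is None:
        return -errno.EOPNOTSUPP, None
    return 0, ops.lookup(key)
```

An implementer that has no lookup simply leaves the entry unset, and callers see an ordinary error code instead of a crash.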

Clean up some stub functions in osp and lfsck.

Test-Parameters: trivial
Test-Parameters: trivial fstype=zfs
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I2df23b87cfca5844f8c5ca843251c463909fcd47
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55633
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-17848 dt: allow declare functions to be optional 32/55632/2
Timothy Day [Thu, 4 Jul 2024 17:59:19 +0000 (17:59 +0000)]
LU-17848 dt: allow declare functions to be optional

If an OSD (or other dt implementer) doesn't have anything to
declare, don't force it to implement a declare function for
an operation.

Clean up some examples of useless declare functions in osp
and lfsck.

Test-Parameters: trivial
Test-Parameters: trivial fstype=zfs
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I0d7e9f491ff2a8f6e4f3bf315a10437cd42c2351
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55632
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17848 osd-zfs: use LU_TYPE_INIT_FINI() macro 09/55609/3
Timothy Day [Tue, 2 Jul 2024 23:27:02 +0000 (23:27 +0000)]
LU-17848 osd-zfs: use LU_TYPE_INIT_FINI() macro

Use LU_TYPE_INIT_FINI() macro rather than implementing the
required functions manually.

Test-Parameters: trivial
Test-Parameters: trivial fstype=zfs
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I4117d1174bc7d07b184eb16d826452b075b04ea3
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55609
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 months agoLU-17848 osd-zfs: remove osd_ladvise()/falloc() 08/55608/3
Timothy Day [Tue, 2 Jul 2024 23:14:00 +0000 (23:14 +0000)]
LU-17848 osd-zfs: remove osd_ladvise()/falloc()

These are implemented as stub functions that return EOPNOTSUPP.
Remove the functions and add a check in the corresponding dt
functions instead.

Test-Parameters: trivial
Test-Parameters: trivial fstype=zfs
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I6fad0a9ca8b07e3d09701e71773dc896a3845b9e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55608
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17848 osd: remove osd_invalidate() for ldiskfs/ZFS 07/55607/3
Timothy Day [Tue, 2 Jul 2024 23:06:12 +0000 (23:06 +0000)]
LU-17848 osd: remove osd_invalidate() for ldiskfs/ZFS

This is implemented as a stub function that returns 0.
Remove the implementations from the OSD and add a check into
dt_invalidate().

Test-Parameters: trivial
Test-Parameters: trivial fstype=zfs
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Ieee086218dc83c3129bc572689a14c79c981bcb7
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55607
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17848 osd: remove osd_check_stale() for ldiskfs/ZFS 06/55606/3
Timothy Day [Tue, 2 Jul 2024 22:54:20 +0000 (22:54 +0000)]
LU-17848 osd: remove osd_check_stale() for ldiskfs/ZFS

This is implemented as a stub function that returns false.
Remove the implementations from the OSD and add a check into
dt_check_stale().

Test-Parameters: trivial
Test-Parameters: trivial fstype=zfs
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Id7fb2c1d1600a3dcc5d278cb2dab5d65a10bdefd
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55606
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17165 tests: fix recovery-small test_141 91/55591/2
Sebastien Buisson [Mon, 1 Jul 2024 09:01:54 +0000 (11:01 +0200)]
LU-17165 tests: fix recovery-small test_141

Wait for import generation change before counting the locks when the
MGS has been restarted. And to make things clearer, check lock count
on OST side.

Test-Parameters: trivial
Test-Parameters: mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2 testlist=recovery-small env=ONLY=141,ONLY_REPEAT=20
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ie9526aa38e3a669b7865516a296dfeed438a83f3
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55591
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Deiter
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17985 osd-ldiskfs: drop osd object if failed to create 71/55571/4
Hongchao Zhang [Fri, 21 Jun 2024 21:51:31 +0000 (05:51 +0800)]
LU-17985 osd-ldiskfs: drop osd object if failed to create

In osd_create(), if the newly created inode already contains a
correct XATTR_NAME_LMA but the OI update fails, osd_object->oo_inode
is cleared. In that case the osd_object should also be dropped.

Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: I4ff5952c154ce459c78514b88b1810471635c703
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55571
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
2 months agoLU-14094 tests: improve sanity.sh test_311 66/55566/6
Emoly Liu [Fri, 28 Jun 2024 17:12:29 +0000 (01:12 +0800)]
LU-14094 tests: improve sanity.sh test_311

Improve sanity.sh test_311 to see why the number of the objects
doesn't decrease as expected.

Test-Parameters: trivial testlist=sanity env=ONLY=311,ONLY_REPEAT=200
Test-Parameters: trivial testlist=sanity env=ONLY=311,ONLY_REPEAT=200
Test-Parameters: trivial testlist=sanity env=ONLY=311,ONLY_REPEAT=200
Test-Parameters: trivial testlist=sanity env=ONLY=311,ONLY_REPEAT=200
Test-Parameters: trivial testlist=sanity env=ONLY=311,ONLY_REPEAT=200
Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Change-Id: Iabbaed42c5654ef31bc9f98fe9868785f8ff2f18
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55566
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17984 lnet: Remove the correct state on failure 52/55552/2
Shaun Tancheff [Thu, 27 Jun 2024 10:49:22 +0000 (17:49 +0700)]
LU-17984 lnet: Remove the correct state on failure

On CPU init, a failure to set up CPUHP_AP_ONLINE_DYN should
remove the previously set up state, CPUHP_BP_PREPARE_DYN.

CPUHP_AP_ONLINE_DYN should be CPUHP_BP_PREPARE_DYN

Test-Parameters: trivial
Fixes: 6d27c2c8c72 ("LU-17592 build: compatibility updates for kernel 6.8")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ic9fc9dd4e798be3a0db65092e2b8e545ec5d4687
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55552
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Caleb Carlson <caleb.carlson@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-9119 lnet: remove "struct" from generated comment 07/55507/2
Olaf Weber [Fri, 27 Jan 2017 15:17:01 +0000 (16:17 +0100)]
LU-9119 lnet: remove "struct" from generated comment

The CHECK_STRUCT() generates a comment saying
"Checks for struct " followed by the type name.
If the type name is 'struct mumble' the result
is "Checks for struct struct mumble". Drop the
extra "struct".
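
The effect can be sketched in standalone C. This is only an
illustration: demo_strip_struct() and demo_check_comment() are
hypothetical helpers, and the way the actual CHECK_STRUCT() macro
avoids the duplicate word may well differ.

```c
#include <stdio.h>
#include <string.h>

/* Return the type name without a leading "struct " prefix, if present. */
static const char *demo_strip_struct(const char *type_name)
{
	static const char prefix[] = "struct ";
	size_t len = sizeof(prefix) - 1;

	if (strncmp(type_name, prefix, len) == 0)
		return type_name + len;
	return type_name;
}

/* Format the generated comment into buf; returns snprintf()'s result. */
static int demo_check_comment(char *buf, size_t size, const char *type_name)
{
	return snprintf(buf, size, "/* Checks for struct %s */",
			demo_strip_struct(type_name));
}
```

Both "mumble" and "struct mumble" then produce the same comment text.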

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Olaf Weber <olaf@sgi.com>
Signed-off-by: Olaf Weber <olaf.weber@hpe.com>
Change-Id: I90b13a2c500c63accb90ef567b197defd5521dea
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55507
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-930 doc: document no_create mount option 03/55503/3
Andreas Dilger [Fri, 21 Jun 2024 22:15:33 +0000 (16:15 -0600)]
LU-930 doc: document no_create mount option

Add the "-o no_create" mount option to the mount.lustre.8 man page.

Test-Parameters: trivial
Fixes: 1dbcd0bab8 ("LU-12998 mds: add no_create parameter to stop creates")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I143f46f71fdcff8ce320861e7ade0f7a9a1f96f7
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55503
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Maximilian Dilger <mdilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17460 lnet: properly handle net_device referencing 82/55582/5
James Simmons [Mon, 8 Jul 2024 20:20:52 +0000 (16:20 -0400)]
LU-17460 lnet: properly handle net_device referencing

Most of the time LNet uses __in[6]_dev_get_xxx() which does no
reference changes. The one exception is the use of dev_get_by_index()
called in lnet_create_socket(). Replace dev_get_by_index() with
dev_get_by_index_rcu(). Also examined the code to make sure the
right type of locking was done. If we use rcu locking we should
use for_each_netdev_rcu() so update ksocknal_ip2index().

Test-Parameters: trivial testlist=sanity-lnet
Fixes: e4fa181abf1 ("LU-10391 lnet: allow creation of IPv6 socket.")
Fixes: 09c6e2b8722 ("LU-16836 lnet: ensure dev notification on lnd startup")
Change-Id: I0c496652553318bd0e47fa1e03d6e631fd8421bb
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55582
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 months agoLU-15053 tests: sanity-quota_13 fix 97/55197/7
Sergey Cheremencev [Fri, 24 May 2024 16:28:47 +0000 (19:28 +0300)]
LU-15053 tests: sanity-quota_13 fix

Handle the case where there are extra users with a quota limit
and non-zero usage besides TSTUSR and TSTUSR2. This is possible
when tests are started with ENABLE_QUOTA=yes. In such a case each
of these users may hold a lock between the OST and the QMT. Take
this into account in sanity-quota_13. This fixes the following
failure:

  sanity-quota test_13: @@@@@@ FAIL: 2 cached locks

Test-Parameters: trivial testlist=sanity-quota
Test-Parameters: testlist=sanity-quota env=ONLY=13,ONLY_REPEAT=100
Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Change-Id: Iaf48d0eb80eef0fc5ebc8246e8fac3f9c96563c0
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55197
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17629 utils: support hostname with 94/54894/9
James Simmons [Wed, 26 Jun 2024 17:43:35 +0000 (13:43 -0400)]
LU-17629 utils: support hostname with
 lustre_lnet_parse_nid_range()

For a hostname it's possible it maps to multiple IPs. In
this case lnetctl commands that attempt to use the hostname
can resolve to the wrong IP address. Update the function
lustre_lnet_parse_nid_range() to work with hostnames and
properly resolve the correct IP address. Update both
lnetctl ping and lnetctl discover to work with
lnet_parse_nid_range().

Test-Parameters: trivial testlist=sanity-lnet
Change-Id: I670799edcb04a02380e96c289ba26854b057d978
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54894
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17722 tests: trim tmpfs from wait_delete_completed() 20/54720/10
Alex Zhuravlev [Wed, 10 Apr 2024 12:27:22 +0000 (15:27 +0300)]
LU-17722 tests: trim tmpfs from wait_delete_completed()

Trim tmpfs in wait_delete_completed() to release unused RAM.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Idcd4d15e0f56184e1d1897f3a64d5b62baaf7edb
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54720
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Deiter
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 months agoLU-10499 pcc: add stats for attach|detach|auto_attach 18/54418/7
Qian Yingjin [Fri, 27 Aug 2021 03:46:13 +0000 (11:46 +0800)]
LU-10499 pcc: add stats for attach|detach|auto_attach

In this patch, we add stats for PCC attach, detach and
auto_attach.
With this feature, we verify that PCC can auto-attach the file
into PCC cache without having to re-fetch the data of the whole
file.
Add sanity-pcc test_44.

EX-bug-id: EX-3715
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Ia0c1cd6b414998e72859aaf34c125b5a4e4e743c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54418
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-10499 pcc: avoid to specify ID for every attach 14/54414/6
Qian Yingjin [Tue, 8 Jun 2021 09:54:49 +0000 (17:54 +0800)]
LU-10499 pcc: avoid to specify ID for every attach

This patch avoids the need to specify "-i <attach_id>" for
every attach, since in the very common case there is only a single
cache for that client.
If the attach ID is not specified, the first dataset on the client
is selected as the PCC backend.
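
A rough sketch of that selection rule, assuming a simple array of
backends. The demo_dataset structure and demo_dataset_select() are
hypothetical and only model the behavior described above; they are
not the PCC implementation.

```c
#include <stddef.h>

/* Hypothetical PCC backend descriptor; the real fields are not shown here. */
struct demo_dataset {
	unsigned int id;
	const char *root;
};

/*
 * Pick the backend for an attach. attach_id == NULL models a command
 * line without "-i <attach_id>": the first dataset is used.
 */
static const struct demo_dataset *
demo_dataset_select(const struct demo_dataset *sets, size_t count,
		    const unsigned int *attach_id)
{
	size_t i;

	if (count == 0)
		return NULL;
	if (attach_id == NULL)
		return &sets[0];
	for (i = 0; i < count; i++)
		if (sets[i].id == *attach_id)
			return &sets[i];
	return NULL;
}
```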

And the new format of PCC state is as follows:
file: /mnt/lustre/f42.sanity-pcc, type: readonly, PCC_file:
/d42.sanity-pcc/0402/0x200000401:0x3:0x0, open_count: 0, flags: 0

EX-3752 pcc: show attaching state for PCC state output

When llite.*.pcc_async_threshold=0 is set, the client performs the
PCC attach asynchronously.
When the file is large, attaching the file into PCC may take some
time.
This patch improves the output of the PCC command
"lfs pcc state" to show that the file is in PCC attaching state
while the file is still being copied from Lustre OSTs to PCC.
Was-Change-Id: I101d87638f5afac41fb4f55b4aaf95d938bc8ccd

EX-bug-id: EX-3292 EX-3752
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Icd23eda5dca4711f9bb7af940f6cef5ddb97ce69
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54414
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 months agoLU-10499 pcc: avoid dead lock for auto attach in PCC-RO 90/54390/9
Qian Yingjin [Wed, 12 May 2021 03:43:28 +0000 (11:43 +0800)]
LU-10499 pcc: avoid dead lock for auto attach in PCC-RO

This patch releases the pcc inode lock when calling
ll_layout_refresh() in @pcc_try_auto_attach(), as holding it may
cause the following deadlock:
1. The client is writing or truncating a file in readonly mode.
   At this time, it will send a write layout intent lock to clear
   the readonly state on the layout on MDT.
2. A read process tries to auto attach the file with the pcc inode
   lock held. During the progress of the auto attach, it will call
   ll_layout_refresh(). The client-side enqueue request for a
   layout lock returns a blocked lock, so it sleeps and waits for
   the lock to be granted.
3. MDT will take EX layout lock to cancel all cached layout lock
   on client to change the layout for clearing the PCC-RO state.
4. When the client handles the revocation of the layout lock, it
   needs to invalidate the PCC state, which must be done under the
   protection of the pcc inode lock.

EX-3191 pcc: add test for mmap | write | detach racer

This patch adds the mmap racer among: (write | read | mmap_cat |
detach | unlink): sanity-pcc/test_99.
Was-Change-Id: I5db160851a95937275fea6ae32f40dcd0fe69f46

EX-3478 pcc: avoid uninitialized pcc mutex lock in cleanup

Running racer concurrently crashed in the following way:
  RIP: 0010:[...]  [...] __list_add+0x1b/0xc0
  __mutex_lock_slowpath+0xa6/0x1d0
  mutex_lock+0x1f/0x2f
  pcc_inode_free+0x1e/0x60 [lustre]
  ll_clear_inode+0x64/0x6a0 [lustre]
  ll_delete_inode+0x5d/0x220 [lustre]
  evict+0xb4/0x180
  iput+0xfc/0x190
  ll_iget+0x156/0x350 [lustre]
  ll_prep_inode+0x212/0x9b0 [lustre]

After analysis, we found that the mutex @lli_pcc_lock is not
initialized. The reason is that ll_lli_init() is not called to
initialize @lli.
When pcc_inode_free() is called, it calls mutex_lock() on the
uninitialized @lli_pcc_lock and thus crashes the kernel.

In liblustreapi_pcc.c, it should set errno on error return.
Was-Change-Id: I612c79a5b8eb4fa9daeb9e446a457e95c666c04a

EX-3636 pcc: reset file mapping for the file once mmapped

For a file once mmapped and cached on PCC, a new open will set the
mapping for the file handle of PCC copy (@file->f_mapping) with
the one of the Lustre file handle. When the file is detached from
PCC due to manual detach or layout lock shrinking, the normal I/O
(read/write) will auto-attach the file into PCC again during I/O
as the layout version is unchanged. However, it still needs to
reset the file mapping (@pcc_file->f_mapping) with the mapping of
the PCC copy. Otherwise it will cause panic as follows:
[  935.516823] RIP: 0010:_raw_read_lock+0xa/0x20
[  935.517077]  ll_cl_find+0x19/0x60 [lustre]
[  935.517098]  ll_readpage+0x51/0x820 [lustre]
[  935.517110]  read_pages+0x122/0x190
[  935.517119]  __do_page_cache_readahead+0x1c1/0x1e0
[  935.517131]  ondemand_readahead+0x1f9/0x2c0
[  935.517142]  pagecache_get_page+0x30/0x2c0
[  935.517165]  generic_file_buffered_read+0x556/0xa00
[  935.517189]  pcc_try_auto_attach+0x3ac/0x400 [lustre]
[  935.517552]  pcc_io_init+0x146/0x560 [lustre]
[  935.517906]  pcc_file_read_iter+0x24d/0x2b0 [lustre]
[  935.518259]  ll_file_read_iter+0x74/0x2e0 [lustre]
[  935.518604]  new_sync_read+0x121/0x170
[  935.518937]  vfs_read+0x8a/0x140

This patch adds sanity-pcc test_98 to verify it.

I/O for a file opened before it was attached into PCC, or opened
while in ATTACHING state, will fall back to Lustre OSTs.
A later mmap() on the file also needs to fall back to Lustre OSTs
and cannot read directly from the local valid cached PCC copy until
all fallback file handles are closed, because the mapping of the
PCC copy is replaced with the one of the Lustre file when a file is
mmapped.
Add sanity-pcc test_97 to verify it.

We also forbid auto attaching a file that is still undergoing
mmapped I/O.

EX-3636 pcc: auto attach should skip if already attached

When trying to auto attach a file into PCC, if the file is found
to be already attached into PCC, the auto attach processing should
be skipped. Otherwise, it will result in a wrong PCC inode refcount
when multiple threads try to auto attach a file at the same time.

For a file once mmapped into PCC and detached due to layout lock
shrinking or a manual detach command, if the file is found to be
still validly cached (attached into PCC again by another thread)
in @pcc_mmap_io_init(), the mapping of the PCC copy should be set
to the one of the Lustre file again.
Was-Change-Id: I5f049ca7d6db8708712e79e9ad459fc60b80f2be

LU-17964 pcc: set mapneg bit in all cases of normal I/O fallback

While a file's data is being copied from Lustre OSTs to the PCC
copy, the file is in PCC ATTACHING state. New opens and I/O on this
file will fall back to the normal I/O path (Lustre OSTs) before the
attach is finished, and the file handle will have the fallback and
mapneg bits set. Currently we only clear the fallback and mapneg
bits when the file handle is closed.

To support mmap() I/O, we replace the mapping of the PCC copy with
the one of the Lustre file. However, we can do that only if the
Lustre file has no open file handle with the mapneg bit set.
Otherwise, we cannot switch the mapping and the mmap() I/O will
also fall back to Lustre OSTs and use the mapping of the Lustre
file.

Once a mmap()ed file was detached from PCC backend due to the
manual detach command or the revocation of the LAYOUT ibit lock
(which protects the cache validity of PCC cache access), we should
reset the mapping of the PCC file accordingly and set fallback and
mapneg bits if the I/O is falling back into the normal path
(Lustre OSTs).
Was-Change-Id: Ibd152aaf724dcff48efbe022dc7f3e70848b4e0d

EX-bug-id: EX-3080 EX-3191 EX-3478 EX-3480 EX-3636
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I18890d19d03726a5991c923505e8c5363382fdc2
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54390
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-16350 ldiskfs: Server support for linux v6.7 / Ubuntu 24.04 16/54216/10
Shaun Tancheff [Sat, 6 Jul 2024 01:54:39 +0000 (08:54 +0700)]
LU-16350 ldiskfs: Server support for linux v6.7 / Ubuntu 24.04

Exclude kunit tests [files matching *-test.c] from ldiskfs build

Updated patch series for Linux v6.7:
  ext4-corrupted-inode-block-bitmaps-handling-patches.patch
  ext4-ialloc-uid-gid-and-pass-owner-down.patch

Updated patch series for Linux v6.5:
   ext4-data-in-dirent.patch

Change struct osd_it_ea_dirent.oied_name from zero length
to flexible array so strncmp works as expected.
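
For illustration, a name stored in a C99 flexible array member can be
handled as below. demo_dirent is a hypothetical stand-in for struct
osd_it_ea_dirent, which has more fields. The assumption here is that
a zero-length array (name[0]) can be treated by the compiler and by
fortified string routines as a zero-sized object, while a flexible
array member is not. The commit message itself does not spell out the
mechanism.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical entry type; the real struct osd_it_ea_dirent differs. */
struct demo_dirent {
	unsigned short namelen;
	char name[];	/* flexible array member (C99), not name[0] */
};

/* Allocate an entry with room for the name plus a terminating NUL. */
static struct demo_dirent *demo_dirent_new(const char *name)
{
	size_t len = strlen(name);
	struct demo_dirent *ent = malloc(sizeof(*ent) + len + 1);

	if (ent == NULL)
		return NULL;
	ent->namelen = (unsigned short)len;
	memcpy(ent->name, name, len + 1);
	return ent;
}

/* Compare the stored name against a candidate of known length. */
static int demo_dirent_match(const struct demo_dirent *ent,
			     const char *name, size_t len)
{
	return ent->namelen == len && strncmp(ent->name, name, len) == 0;
}
```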

Test-Parameters: trivial
HPE-bug-id: LUS-11376
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I2b2325a5874a91096fbd63750096e459065668bc
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54216
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Shuichi Ihara <sihara@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 months agoLU-6142 misc: remove empty header 66/55666/2
Timothy Day [Mon, 8 Jul 2024 21:09:40 +0000 (21:09 +0000)]
LU-6142 misc: remove empty header

This header doesn't do anything.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Ia544c9629ea5c787390c393baa45c310126c14ea
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55666
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-6142 osd-ldiskfs: SPDX for ldiskfs OSD 13/55613/2
Timothy Day [Wed, 3 Jul 2024 04:35:43 +0000 (04:35 +0000)]
LU-6142 osd-ldiskfs: SPDX for ldiskfs OSD

Convert from verbose license text to SPDX.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Ie9a298fcf9999af1edfbf1acb58c23d7c83fbb7c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55613
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-6142 osd-zfs: SPDX for ZFS OSD 12/55612/2
Timothy Day [Wed, 3 Jul 2024 04:17:14 +0000 (04:17 +0000)]
LU-6142 osd-zfs: SPDX for ZFS OSD

Convert from verbose license text to SPDX.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I02da1c1772da64b88a4469a07d42522472f6ba72
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55612
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-14810 lnet: Do not issue multiple PUSHes 59/55559/2
Chris Horn [Thu, 27 Jun 2024 16:40:19 +0000 (10:40 -0600)]
LU-14810 lnet: Do not issue multiple PUSHes

PUSH ACK may be delayed in network. Meanwhile, some event could cause
peer to go through discovery again (e.g. config change or NI state
change). The discovery state machine doesn't consider whether there
is an outstanding PUSH so it may issue another one for the same peer.
When delayed ACK arrives it will then clear PUSH_SENT, so now
discovery doesn't know that there is an outstanding PUSH. If discovery
is stopped then it doesn't unlink the push MD and this can cause an
assert in lnet_assert_handler_unused() because the push event handler
is still in use.

Modify the discovery state machine to check for PUSH_SENT when
determining whether a peer needs a PUSH.
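
The check can be pictured with the following sketch. The flag names
DEMO_NIDS_UPTODATE and DEMO_PUSH_SENT and the helper
demo_peer_needs_push() are hypothetical; the real LNet peer state
bits and discovery code are more involved.

```c
#include <stdbool.h>

/* Hypothetical state bits; the real LNet peer flags differ. */
#define DEMO_NIDS_UPTODATE	0x1	/* peer has our current NI list */
#define DEMO_PUSH_SENT		0x2	/* a PUSH is in flight, ACK pending */

/* A peer needs a PUSH only if it is stale and no PUSH is outstanding. */
static bool demo_peer_needs_push(unsigned int state)
{
	if (state & DEMO_PUSH_SENT)
		return false;
	return !(state & DEMO_NIDS_UPTODATE);
}
```

Checking the in-flight bit first means a second trip through
discovery cannot queue another PUSH while the first ACK is delayed.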

sanity-lnet test_304 can reproduce this issue under ipv6
configuration if modules are unloaded at the end of the test.

Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ic3f7a8b44f85a18afb939fdbfa1f9bc5dc64d93d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55559
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-6142 lnet: Fix style issues for timer.c 47/55447/3
Arshad Hussain [Mon, 17 Jun 2024 05:39:25 +0000 (01:39 -0400)]
LU-6142 lnet: Fix style issues for timer.c

This patch fixes issues reported by checkpatch
for file lnet/selftest/timer.c

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ia672f0800389fc5fb1c323919b9345faba1b0347
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55447
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17893 tests: replay-dual/31 fails a lot 21/55321/3
Timothy Day [Wed, 5 Jun 2024 14:43:50 +0000 (14:43 +0000)]
LU-17893 tests: replay-dual/31 fails a lot

Add some debugging.

Test-Parameters: trivial testlist=replay-dual env=ONLY=31,ONLY_REPEAT=50
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Ic417ee5d6a3d53ce0e7aa51708dc9d9317b1ce30
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55321
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-6142 llite: Fix style issues for file.c 39/54139/12
Arshad Hussain [Thu, 22 Feb 2024 07:45:37 +0000 (13:15 +0530)]
LU-6142 llite: Fix style issues for file.c

This patch fixes issues reported by checkpatch
for file lustre/llite/file.c

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I66d7a4adfad48d6de26be8d009f004efb90b6d23
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54139
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-15491 mdt: rename lock inversion deadlock 52/46352/8
Oleg Drokin [Fri, 21 Jun 2024 04:43:37 +0000 (12:43 +0800)]
LU-15491 mdt: rename lock inversion deadlock

MDT rewrite got rid of lock ordering by fid and replaced it
with parent/child ordering, but child to child lock inversion is
still possible with hardlinks as follows:

thread 1: mv dir1/10 dir2/10
thread 2: mv dir2/11 dir2/2
where dir1/10 is hardlink to dir4/2 and dir2/10 is hardlink to dir2/11

To solve this we enforce child ordering by fid in case of local rename

This should not create problems anywhere other than rename, since
the next closest candidate, link(), does not delete the target
if it exists, so the target is not locked in that case.
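
The ordering idea can be sketched as follows. demo_fid is a
hypothetical two-field identifier and demo_lock_order() only decides
the order; it is an illustration of ordering by identifier, not the
MDT locking code.

```c
#include <stdint.h>

/* Hypothetical identifier modeled on a (sequence, object id) pair. */
struct demo_fid {
	uint64_t seq;
	uint32_t oid;
};

/* Total order on identifiers: by sequence, then by object id. */
static int demo_fid_cmp(const struct demo_fid *a, const struct demo_fid *b)
{
	if (a->seq != b->seq)
		return a->seq < b->seq ? -1 : 1;
	if (a->oid != b->oid)
		return a->oid < b->oid ? -1 : 1;
	return 0;
}

/* Put the two children in locking order: *first is locked before *second. */
static void demo_lock_order(const struct demo_fid *x, const struct demo_fid *y,
			    const struct demo_fid **first,
			    const struct demo_fid **second)
{
	if (demo_fid_cmp(x, y) <= 0) {
		*first = x;
		*second = y;
	} else {
		*first = y;
		*second = x;
	}
}
```

Because both threads derive the same order from the identifiers
alone, neither can hold the lock the other needs first.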

Fixes: d76cc65d5d68 ("LU-12125 mds: allow parallel regular file rename")
Change-Id: Idd3525e8b1a0de411766bdcaa480c60d3b5e491b
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/46352
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
2 months agoLU-17404 kernel: update RHEL 9.4 [5.14.0-427.22.1.el9_4] 87/55487/4
Jian Yu [Thu, 20 Jun 2024 22:26:19 +0000 (15:26 -0700)]
LU-17404 kernel: update RHEL 9.4 [5.14.0-427.22.1.el9_4]

Update RHEL 9.4 kernel to 5.14.0-427.22.1.el9_4.

Test-Parameters: trivial fstype=ldiskfs mdtcount=4 mdscount=2 \
  clientdistro=el9.4 serverdistro=el9.3 testlist=sanity

Test-Parameters: trivial fstype=zfs mdtcount=4 mdscount=2 \
  clientdistro=el9.4 serverdistro=el9.3 testlist=sanity

Test-Parameters: trivial fstype=ldiskfs mdtcount=4 mdscount=2 \
  clientdistro=el9.3 serverdistro=el9.4 testlist=sanity

Test-Parameters: trivial fstype=zfs mdtcount=4 mdscount=2 \
  clientdistro=el9.3 serverdistro=el9.4 testlist=sanity

Test-Parameters: optional clientdistro=el9.4 serverdistro=el9.4 \
  testgroup=full-part-1

Test-Parameters: optional clientdistro=el9.4 serverdistro=el9.4 \
  testgroup=full-part-2

Test-Parameters: optional clientdistro=el9.4 serverdistro=el9.4 \
  testgroup=full-part-3

Change-Id: Icb9aac22fbc4b7d34d13288e98fa4c28022db82e
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55487
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 months agoLU-17978 tests: always except sanity-pcc test_33 on el8.9+ 89/55589/5
Sebastien Buisson [Mon, 1 Jul 2024 08:21:47 +0000 (10:21 +0200)]
LU-17978 tests: always except sanity-pcc test_33 on el8.9+

sanity-pcc test_33 is failing 100% of the time on el8.9 and el8.10
due to the previously reported inconsistent LSOM problem in LU-17781,
so add it to ALWAYS_EXCEPT until a fix is found.

Test-Parameters: trivial clientdistro=el8.9 testlist=sanity-pcc
Test-Parameters: trivial clientdistro=el8.10 testlist=sanity-pcc
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Ic8a48708e26776ff84201b040cbb9993fc1fe25a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55589
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 months agoLU-17404 kernel: RHEL 9.4 server support 13/54713/12
Jian Yu [Fri, 28 Jun 2024 16:56:41 +0000 (09:56 -0700)]
LU-17404 kernel: RHEL 9.4 server support

This patch makes changes to support RHEL 9.4 release
with kernel 5.14.0-427.20.1.el9_4 for Lustre server.

Test-Parameters: trivial fstype=ldiskfs mdtcount=4 mdscount=2 \
  clientdistro=el9.4 serverdistro=el9.3 testlist=sanity

Test-Parameters: trivial fstype=zfs mdtcount=4 mdscount=2 \
  clientdistro=el9.4 serverdistro=el9.3 testlist=sanity

Test-Parameters: trivial fstype=ldiskfs mdtcount=4 mdscount=2 \
  clientdistro=el9.3 serverdistro=el9.4 testlist=sanity

Test-Parameters: trivial fstype=zfs mdtcount=4 mdscount=2 \
  clientdistro=el9.3 serverdistro=el9.4 testlist=sanity

Test-Parameters: optional clientdistro=el9.4 serverdistro=el9.4 \
  testgroup=full-part-1

Test-Parameters: optional clientdistro=el9.4 serverdistro=el9.4 \
  testgroup=full-part-2

Test-Parameters: optional clientdistro=el9.4 serverdistro=el9.4 \
  testgroup=full-part-3

Change-Id: I4741041c6b7f5604b13523a24060b6a2804a5ef2
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54713
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17986 tests: Update sanity-lnet 111 for IPv6 61/55561/2
Chris Horn [Thu, 27 Jun 2024 20:08:49 +0000 (14:08 -0600)]
LU-17986 tests: Update sanity-lnet 111 for IPv6

Modify test_111 to use setup_router_test() so that it will use
IPv6 NIDs for the --gateway argument when that is what the test
environment has configured.

The test is also modified to check success/failure of the route add
command, and to verify that the correct number of routes were added.

Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I8dea5874dd01eb7f35603bed46660cd6c6b85041
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55561
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17976 build: fix nla_strnid() -Werror=missing-prototypes 37/55537/2
Jian Yu [Tue, 25 Jun 2024 23:07:30 +0000 (16:07 -0700)]
LU-17976 build: fix nla_strnid() -Werror=missing-prototypes

This patch explicitly defines nla_strnid() as a static function
to fix the following build failure:

  lnet/lnet/api-ni.c:2929:1: error:
  no previous prototype for 'nla_strnid' [-Werror=missing-prototypes]
   2929 | nla_strnid(struct nlattr **attr, struct lnet_nid *nid, int *rem,
        | ^~~~~~~~~~

It also fixes:
- lnet_fault_show_done()
- lnet_fault_show_start()
- lnet_fault_show_dump()
- __ll_dio_user_copy()
- ll_dio_user_copy_helper()
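
As a rough illustration of this class of fix (not the actual Lustre
code): with -Wmissing-prototypes, a non-static function defined without
a previous declaration triggers the diagnostic, and marking a file-local
helper static resolves it. The helper name parse_id() below is
hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical file-local helper. Without "static" and without a prior
 * prototype in a header, gcc -Wmissing-prototypes would report
 * "no previous prototype for 'parse_id'".
 */
static int parse_id(const char *str, long *id)
{
	char *end;

	assert(str != NULL && id != NULL);
	*id = strtol(str, &end, 10);
	/* Reject empty input and trailing garbage. */
	return (end != str && *end == '\0') ? 0 : -1;
}
```

A function that really is used from other files would instead need a
prototype in a shared header rather than the static keyword.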

Change-Id: I0225794b3fac2f36aafadff783ca921fbc757edd
Fixes: f1c6623 ("LU-10391 lnet: Fault injection add/del ioctls to netlink")
Fixes: 1fa633c ("LU-17478 clio: parallelize unaligned DIO write copy")
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55537
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 months agoLU-17975 tests: execvp error on file write_append_truncate 30/55530/4
Elena Gryaznova [Tue, 25 Jun 2024 10:08:27 +0000 (13:08 +0300)]
LU-17975 tests: execvp error on file write_append_truncate

The test fails on sles15sp4 clients because write_append_truncate
is not found:
  execvp error on file write_append_truncate
    (No such file or directory)

HPE-bug-id: LUS-8427
Test-Parameters: trivial testlist=parallel-scale env=ONLY=write_append_truncate
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: I88a5a6c30510dff19b59d32eca2a90566a21f64e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55530
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17793 libcfs: fix objtool warning in lbug_with_loc() 05/55505/4
Jian Yu [Tue, 25 Jun 2024 03:00:20 +0000 (20:00 -0700)]
LU-17793 libcfs: fix objtool warning in lbug_with_loc()

After lbug_with_loc() was removed from the objtool
global_noreturns array in Linux commit v6.4-rc2-10-g34245659debd,
building Lustre hit the following warning:

  libcfs/libcfs/fail.o: warning: objtool: __cfs_fail_check_set()
  falls through to next function __cfs_fail_timeout_set()

This patch fixes the above warning by adding an unreachable
panic() at the end of lbug_with_loc() to terminate all of
the call paths in that function.

As a consequence of this change, the patch also needs to
fix further errors, such as:

  lnet/lnet/api-ni.c: In function 'lnet_res_type2str':
  libcfs/include/libcfs/libcfs_private.h:119:9:
  error: this statement may fall through [-Werror=implicit-fallthrough=]
    119 |         lbug_with_loc(&msgdata); \
        |         ^~~~~~~~~~~~~~~~~~~~~~~
  lnet/lnet/api-ni.c:1143:17: note: in expansion of macro 'LBUG'
   1143 |                 LBUG();
        |                 ^~~~
  lnet/lnet/api-ni.c:1144:9: note: here
   1144 |         case LNET_COOKIE_TYPE_MD:
        |         ^~~~

and

  lustre/obdclass/lprocfs_status.c: In function 'lprocfs_stats_lock':
  lustre/obdclass/lprocfs_status.c:470:1:
  error: control reaches end of non-void function [-Werror=return-type]
    470 | }
        | ^
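
A userspace sketch of the general idea, under stated assumptions: when
the compiler can see that a function never returns, every call path
through it is terminated, so callers need no return or fall-through
handling after the call. This is not the Lustre change itself;
fatal_bug() and cookie_type2str() are hypothetical names, and abort()
stands in for the kernel's panic().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum cookie_type { COOKIE_ME, COOKIE_MD };	/* hypothetical values */

/* Stand-in for an LBUG-style helper: reports and never returns. */
static void __attribute__((noreturn)) fatal_bug(const char *where)
{
	fprintf(stderr, "BUG at %s\n", where);
	/* The unconditional terminating call ends every path here. */
	abort();
}

static const char *cookie_type2str(enum cookie_type type)
{
	switch (type) {
	case COOKIE_ME:
		return "ME";
	case COOKIE_MD:
		return "MD";
	default:
		/* No return needed: fatal_bug() is known not to return. */
		fatal_bug(__func__);
	}
}
```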

Change-Id: I5574559619b4b6746f4e7da51f3213ede246a73b
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55505
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Bruno Faccini <bfaccini@nvidia.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17969 tests: don't use hard coded mount path for quota testing 04/55504/2
James Simmons [Sat, 22 Jun 2024 12:25:29 +0000 (08:25 -0400)]
LU-17969 tests: don't use hard coded mount path for quota testing

During setup for quota testing, sanity-quota assumes the mount path
is always /mnt/lustre. This is not always true, so use $MOUNT
instead.

Fixes: 7e1fb1a296e ("LU-17179 tests: check the system is clean")
Test-Parameters: trivial testlist=sanity-quota
Change-Id: I2c1fbf5f8f38b4c508137b2c6e956d47031e9c12
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55504
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17904 build: fix typo in vvp_set_batch_dirty 01/55301/4
Shaun Tancheff [Thu, 6 Jun 2024 02:40:46 +0000 (09:40 +0700)]
LU-17904 build: fix typo in vvp_set_batch_dirty

Fix a typo in vvp_set_batch_dirty() in the case where
kallsyms_lookup_name() is exported and account_page_dirtied is not.

HPE-bug-id: LUS-12374
Test-Parameters: trivial
Fixes: b82eab822c0 ("LU-17081 build: Prefer folio_batch to pagevec")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I8b2e6884e74e384aba6e563bef30072175cc0efc
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55301
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
2 months agoLU-17903 build: enable fast path of vvp_set_batch_dirty 00/55300/4
Shaun Tancheff [Fri, 28 Jun 2024 07:49:31 +0000 (14:49 +0700)]
LU-17903 build: enable fast path of vvp_set_batch_dirty

The SUSE 15 SP6 6.4 kernel retains kallsyms_lookup_name, so
the fast path of vvp_set_batch_dirty() can be enabled.

However, the combination of kallsyms_lookup_name without
lock_page_memcg breaks some old assumptions.

Prefer folio_memcg_lock to lock_page_memcg. However, since

Linux commit v5.15-12272-g913ffbdd9985
  mm: unexport folio_memcg_{,un}lock

folio_memcg_lock is also not exported, so use
kallsyms_lookup_name to acquire the symbol.

HPE-bug-id: LUS-12371
Test-Parameters: trivial
Fixes: 61e83a6f130 ("LU-16113 build: Fix configure tests for lock_page_memcg")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I8ac6b7bde8ee8964db5a801c2f3c4dfb2ef459f9
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55300
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-10522 utils: new --mindepth for lfs find 26/55226/14
Maximilian Dilger [Tue, 28 May 2024 17:55:27 +0000 (13:55 -0400)]
LU-10522 utils: new --mindepth for lfs find

Added a [--mindepth | -d] option for 'lfs find' to print
only the results found at least N levels below the starting point.
Usage is similar to the existing -mindepth option of find.
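
A minimal sketch of a depth filter with find-like semantics, assuming
depth 0 is the starting point itself. The function name and the
"negative maxdepth means unlimited" convention are hypothetical and are
not taken from the lfs source.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Decide whether an entry at the given depth should be printed.
 * Entries shallower than mindepth are still traversed by the caller,
 * they are just not reported.
 */
static bool depth_selected(int depth, int mindepth, int maxdepth)
{
	assert(depth >= 0);
	if (depth < mindepth)
		return false;
	if (maxdepth >= 0 && depth > maxdepth)
		return false;
	return true;
}
```

With mindepth set to 1, the starting directory itself is skipped while
everything beneath it is still reported.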

Signed-off-by: Maximilian Dilger <mdilger@whamcloud.com>
Change-Id: I7816e27355f6796edc8437f700342da1b7e564d0
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55226
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 months agoLU-17833 ptlrpc: Check lru_resize during connection 60/55060/19
Di Wang [Wed, 8 May 2024 23:20:40 +0000 (23:20 +0000)]
LU-17833 ptlrpc: Check lru_resize during connection

Since the parameter log processing might finish before the connection
is established, ptlrpc_connect_set_flags() should check whether
lru size has been disabled by the parameter log.

OCI-bug-id: LFS-229
Signed-off-by: Di Wang <di.d.wang@oracle.com>
Change-Id: I246fcbcd17aa201f80b6950d8eff57489dc81645
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55060
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Patrick Farrell <patrick.farrell@oracle.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 months agoLU-14712 ldiskfs: add bg_trimmed_threshold interface 67/55567/2
Li Dongyang [Fri, 28 Jun 2024 02:18:50 +0000 (12:18 +1000)]
LU-14712 ldiskfs: add bg_trimmed_threshold interface

Export the bg_trimmed_threshold interface via sysfs, and only clear
the BG_TRIMMED flag when enough blocks have been freed since the
last fstrim (default 128).

Make sure we use cpu_to_le32() with EXT2_FLAGS_TRACK_TRIM.

Change-Id: I98d86f6d7335af53b8e74c747797b0dff3abb5d0
Test-Parameters: trivial
Fixes: ad30edf910 ("LU-14712 ldiskfs: introduce EXT4_BG_TRIMMED to optimize fstrim")
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55567
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-10499 pcc: keep PCC copy when it is being attached 89/54389/5
Qian Yingjin [Tue, 11 May 2021 03:38:36 +0000 (11:38 +0800)]
LU-10499 pcc: keep PCC copy when it is being attached

When detaching a file from the PCC backend via FID, if the file is
being attached, do not purge the corresponding PCC copy from the
PCC backend. Just keep the PCC copy so the attach process can finish.

EX-3144 pcc: revalidate the pointer after attach
In this patch, we also fix a bug during PCC open attach. It
refreshes @pcci pointer since the lock may be released in
@pcc_try_readonly_open_attach().
Was-Change-Id: I470358dfde525e08e7110e862b30b527e5db94fe

EX-bug-id: EX-3133 EX-3144
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I8a8f7c6986d51eaf9b2516e5dd5a6f21aa38b7db
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54389
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-10499 pcc: don't reopen mountpoint for each cache file 88/54388/6
Qian Yingjin [Fri, 19 Mar 2021 08:45:26 +0000 (16:45 +0800)]
LU-10499 pcc: don't reopen mountpoint for each cache file

When scanning and processing files in the PCC cache filesystem
(e.g. "llapi_pcc_scan_detach()"), the code looks for the Lustre
mountpoint and reopens it for every file processed.

This patch changes it to open the Lustre mountpoint only once,
then reuse the file handle for all of the later calls. The file
handle is closed when the processing is finished.

This patch also switches to llapi_fid_parse() to get the FID from
a given string.

EX-bug-id: EX-2861
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Iad92c216262296096e30ca4a4c6b2765dfd3afaa
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54388
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-10499 pcc: async attach in the background for PCC-RO file 79/54379/9
Qian Yingjin [Mon, 22 Mar 2021 09:16:15 +0000 (17:16 +0800)]
LU-10499 pcc: async attach in the background for PCC-RO file

In the current PCC, the whole file must be copied into the cache
before it can be used. If the file is large, this causes a
significant delay on the first file access, which wastes valuable
computing time. Being able to shorten this time to first access
may help application efficiency.

This patch adds a tuning parameter "async_threshold", the file
size threshold that determines whether PCC-RO attach is done
asynchronously in the background.

When the file size is smaller than the threshold, the PCC attach
during open() is performed synchronously.
Otherwise, the client starts a dedicated kernel thread to
copy data from Lustre OSTs to the PCC copy in the background, and
reads can fall back to the normal Lustre I/O path from Lustre
OSTs until the file is fully cached.

This may double the reads to the Lustre filesystem initially if
the file is not read sequentially, but avoids the high
latency for data access. There may be some cache sharing (avoiding
double reads) if the PCC copy and the application both share
the filesystem cached pages on the client.

The tuning parameter "llite.*.pcc_async_threshold" is set with
256MiB by default.
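
The size check described above amounts to a single comparison. A
minimal sketch, with hypothetical names; only the 256MiB default and
the "smaller than the threshold means synchronous" rule come from the
description.

```c
#include <assert.h>
#include <stdint.h>

enum attach_mode { ATTACH_SYNC, ATTACH_ASYNC };

#define ASYNC_THRESHOLD_DEFAULT (256ULL << 20)	/* 256MiB */

/* Files below the threshold attach during open(), others in background. */
static enum attach_mode choose_attach_mode(uint64_t file_size,
					   uint64_t threshold)
{
	assert(threshold > 0);
	return file_size < threshold ? ATTACH_SYNC : ATTACH_ASYNC;
}
```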

EX-3880 pcc: add pcc_async_affinity for async PCC attach

This patch adds a tunable parameter "llite.*.pcc_async_affinity"
that enables or disables the CPT selection in PCC-RO asynchronous
attach for testing.
Was-Change-Id: I1473a7547555a2d6c615d37182b6cc359194aae0

EX-bug-id: EX-2873 EX-3880
Test-Parameters: clientcount=3 testlist=sanity-pcc,sanity-pcc,sanity-pcc
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Ia80992e9050cc6e4c7f61949fc4013dec303e150
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54379
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-12358 pcc: add project quota support on PCC backend 31/39831/11
Qian Yingjin [Sun, 6 Sep 2020 08:52:04 +0000 (16:52 +0800)]
LU-12358 pcc: add project quota support on PCC backend

Current PCC can enforce a quota limitation of the capacity usage
for each user and group to provide cache isolation. An admin
can specify the quota enforcement on the local PCC file system.

Users can perform PCC-cached I/O on files until they receive a
return value of -ENOSPC or -EDQUOT, which means that they hit the
quota limit or that there is no free capacity left on the local
PCC backend fs during I/O or the attach process. At this time,
I/O will fall back to the normal I/O path.
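
A small sketch of the fallback decision, assuming the usual kernel
convention of negative errno return values. The helper name is
hypothetical and does not appear in the PCC code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Return true when an I/O result from the PCC backend means the cache
 * is out of space or over quota, so the caller should retry through
 * the normal Lustre I/O path.
 */
static bool pcc_io_should_fall_back(int rc)
{
	return rc == -ENOSPC || rc == -EDQUOT;
}
```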

This patch adds project quota on the PCC backend file system
along with user/group quota.

With this feature, a single client can have multiple PCC backends
with different caching rules, so we can define upfront
how much of the client FS can be used for each cache.

Test-Parameters: clientcount=3 testlist=sanity-pcc,sanity-pcc,sanity-pcc
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Ib93da953d4a3a7091f62094f8175bde91e819895
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/39831
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-10499 pcc: get PCC state for a file without opening itself 72/54372/8
Qian Yingjin [Thu, 25 Feb 2021 12:43:58 +0000 (20:43 +0800)]
LU-10499 pcc: get PCC state for a file without opening itself

Originally, to get the PCC state for a given file, the user needed
to open the file, get the current PCC state of the file via the
file handle, and then close the file.

If the file meets the predefined condition for auto prefetching
into PCC at open time, the "lfs pcc state" command on the file
will attach the file into the PCC cache. This may not be the
intention of the user.

In this patch, we rework the "lfs pcc state" command. It always
opens the parent directory, and then does the lookup by name/FID
without opening the file itself to get the PCC state.

EX-5358 pcc: remove realpath() from lfs_pcc_state()

Before Ubuntu 20.04, realpath() executes lstat() for each
component of the path. If the file is still validly cached on the
PCC device with the layout generation unchanged, the Lustre file
will be automatically re-attached during the stat() call in the
Lustre kernel code. This may make the output of "lfs pcc state"
misleading for a file that has already been detached but is still
validly cached on PCC according to the unchanged layout generation.

This problem is exposed on the newer Ubuntu 22.04, in which
realpath() executes readlink() for each component of the path
instead of lstat():
readlink("/mnt", 0x7fffd5760800, 1023)  = -1
readlink("/mnt/lustre", 0x7fffd5760800, 1023) = -1
readlink("/mnt/lustre/sanity-pcc.f15", 0x7fffd5760800, 1023) = -1
Was-Change-Id: I50ae46a1e952a3faaf0d7a7293579e239156d6d3

LU-16030 pcc: enlarge PCC backend size for sanity-pcc script

This patch removes realpath() from lfs_pcc_state() to avoid
this misleading behavior for the command: $LFS pcc state.
It also fixes the sanity-pcc test scripts: test_15, test_16,
test_27, test_39.
Was-Change-Id: Ib19f01ed054cb6c9eecceabea1f1da72dea0b113

EX-bug-id: EX-2455 EX-5358 LU-16030
Test-Parameters: clientcount=3 testlist=sanity-pcc,sanity-pcc,sanity-pcc
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I310a7e73dc6c0f4318dc27df2e02ecf6559ee5b4
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54372
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 months agoLU-10499 pcc: check first before set PCC-RO on a file 70/54370/8
Qian Yingjin [Fri, 5 Feb 2021 03:48:26 +0000 (11:48 +0800)]
LU-10499 pcc: check first before set PCC-RO on a file

In this patch, the MDT takes a CR layout lock against the file
object first to check whether the file is already PCC-RO cached.
If so, return immediately; otherwise, take an EX lock on the file
to update the FLR PCC-RO state accordingly. This check avoids
heavy lock contention and unnecessary revocation of the
layout lock granted to the other clients when multiple processes
from many clients perform read-only attach on a shared file
simultaneously.

EX-bug-id: EX-2455
Test-Parameters: clientcount=3 testlist=sanity-pcc,sanity-pcc,sanity-pcc
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: If59315abe444917f8a890b60a38c239b8ee045bf
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54370
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 months agoLU-14003 pcc: rework PCC mmap implementation 92/40092/22
Qian Yingjin [Wed, 30 Sep 2020 03:00:43 +0000 (11:00 +0800)]
LU-14003 pcc: rework PCC mmap implementation

In the old PCC mmap implementation, the vm_file is replaced with
the file of the PCC copy, ->fault() or ->page_mkwrite() is called
on the PCC copy, and then the vm_file is restored to the one of
the Lustre file.
This design is problematic, as a mmapped region (vma) could be
faulted concurrently by multiple child threads (each child
thread can clone the VM of the parent process). There is no
atomicity guarantee for replacing and restoring the vm_file while
calling ->fault() or ->page_mkwrite().

This patch reworks the mmap() implementation for PCC.
In the new design, PCC mmap replaces the inode mapping of the PCC
copy on the PCC backend filesystem with the one of the Lustre file.
This way, the mmapped region (vma) links into the mapping of
the Lustre inode, not the mapping of the PCC copy.
It keeps using vm_file with the file handle of the PCC copy until
the PCC cached file is detached or unmapped.

LU-14003 pcc: convert mapping pagecache for mmap

In the PCC mmap implementation, the mapping of the PCC copy is
replaced with the one of the Lustre file during mmap(), to
make the mmapped region (vma) link into the mapping of the
Lustre file, not the mapping of the PCC copy.
In the old design, the pagecache in the original mapping of the
PCC copy is simply dropped at this time, as the mapping of each
page is different after the replacement of the mapping.

This may have a negative impact on mmap performance.
The reason is that PCC attach writes the data from Lustre into
the PCC copy in buffered I/O mode, so the data stays in the
pagecache, managed by the mapping of the PCC copy, if there
is enough system memory. A later mmap page fault could then
read the data directly from the pagecache to speed up the mmap
operation.
If this pagecache is dropped due to the different mapping of each
page, the page fault must read the page from disk, which may
result in bad performance.

To make full use of the pagecache of the PCC copy, the mmap
call can first remove each page from the original mapping of
the PCC copy, and then convert and add it into the mapping of the
Lustre file. This way, all of the pagecache is converted and can
be reused for later page faults.
Was-Change-Id: I1591937543d7d31b8811ec62088accd0070d7d37

EX-8421 llite: disable kernel readahead for pcc mmap

Set ra_pages to 0 for PCC files when mmapped, because
otherwise this setting carries through to Lustre and will
cause crashes and possible inconsistencies.  This happens
because the PCC file and Lustre file share a mapping, which
is a weird trick required to have mmap work on PCC.

Add a set of asserts which confirm kernel readahead is
disabled and wasn't used for mmap.
Was-Change-Id: I117042d68fac25158e8141c243acba698cf1930f

LU-17866 pcc: zero ra_pages explictly for a file after PCC mmap

To support mmap under PCC, we do some special magic with mmap to
allow Lustre and PCC to share the page mapping.
The mapping host (@mapping->host) for the Lustre file is replaced
with the PCC copy for mmap. This may result in the wrong setting
of @ra_pages for the Lustre file handle with the backing store of
the PCC copy in the kernel:
->do_dentry_open()->file_ra_state_init():
  file_ra_state_init(struct file_ra_state *ra,
                     struct address_space *mapping)
  {
          ra->ra_pages = inode_to_bdi(mapping->host)->ra_pages;
          ra->prev_pos = -1;
  }

Setting readahead pages for a file handle is the last step of the
open() call and is not under the control of the Lustre file
system.
Thus, to avoid setting @ra_pages wrongly, we explicitly set
@ra_pages to zero for the Lustre file handle in all read I/O paths.

When invalidating a PCC copy, we switch back the mapping
between Lustre and PCC. We also set mapping->a_ops back to
@ll_aops.
The readahead path in the PCC backend may enter ->readpage() in
Lustre. We then check whether the file handle is a Lustre file
handle. If not, it must be from the mmap readahead I/O path of the
PCC copy, and an error code is returned directly in this case.
Was-Change-Id: Id1e4a9e47bb484e97053759e1743fd2fce040149

Test-Parameters: clientcount=3 testlist=sanity-pcc,sanity-pcc,sanity-pcc
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Icc5019a691dfb04b5e1fdd580d83915cfe590158
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/40092
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17464 lod: use OBD_ALLOC_LARGE for ldo_comp_entries 49/55449/3
Bobi Jam [Mon, 17 Jun 2024 10:10:05 +0000 (18:10 +0800)]
LU-17464 lod: use OBD_ALLOC_LARGE for ldo_comp_entries

The lod_object::ldo_comp_entries array is allocated/freed with the
_LARGE macros so that it can be large enough to use vmalloc instead
of kmalloc for memory allocation. Some places use OBD_ALLOC without
_LARGE to re-allocate the memory, which does not match this assumption.

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Ie356ae875329af07c893586fa4b1485dbd17afe6
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55449
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 months agoLU-6142 lnet: Fix style issues for console.[ch] 48/55448/2
Arshad Hussain [Mon, 17 Jun 2024 08:29:19 +0000 (04:29 -0400)]
LU-6142 lnet: Fix style issues for console.[ch]

This patch fixes issues reported by checkpatch
for file lnet/selftest/console.[ch]

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I484b2ffee5d5add360055b424e23fdc97c5618ae
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55448
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-16822 tests: Modify local_node to check for IPv6 63/55463/4
Chris Horn [Mon, 17 Jun 2024 18:09:06 +0000 (12:09 -0600)]
LU-16822 tests: Modify local_node to check for IPv6

Nodes may be configured with just IPv6 addresses, so local_node()
needs to look for both IPv4 and IPv6 addresses to determine if a given
host is local.

sanity-lnet/110 is re-written so that it does not rely on
local_addr_list(). Otherwise the test may attempt to configure an NI
using an invalid address. This test case can now execute on o2ib
configs.

Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Id3471376dbb2089a44b00ed7cb9bc2256e5e7501
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55463
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17404 kernel: fix filemap_splice_read detection 54/55454/3
Sebastien Buisson [Mon, 17 Jun 2024 14:29:33 +0000 (16:29 +0200)]
LU-17404 kernel: fix filemap_splice_read detection

On CentOS 9 kernel 5.14, filemap_splice_read is in the header files,
but the symbol is not exported by the kernel.
So instead of trying to build a kernel module with a call to this
function, just use LB_CHECK_EXPORT on this symbol.

Test-Parameters: trivial
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I1f55d0b41c46a992204c1cebc3f5c8c7dbc6128e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55454
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
2 months agoLU-16822 tests: Force IPv6 testing in mixed environment 22/55422/9
Srikanth Ramamurthy [Mon, 17 Jun 2024 17:57:40 +0000 (11:57 -0600)]
LU-16822 tests: Force IPv6 testing in mixed environment

When an interface has both IPv4 and v6 addresses LNet will, by
default, configure the NI using the v4 address. The '--large' option
to lnetctl lnet configure tells LNet to configure the NI using the
v6 address instead. This patch adds a test-framework environment
variable that, when set, passes the --large option to lnet configure.
This allows us to force testing of IPv6 when running in a mixed v4/v6
environment.

This patch implements ip_is_v6() which is needed by some of the
router tests when using IPv6 NIDs.

Some test cases are added to the except list:
 - 230 requires lctl conn_list to be updated to work with large nids.
 - 303 and 500 have been found to trip the LU-17460 bug.

Change-Id: I8934a87bfd836779b167df39c5d09d97ff78debf
Test-Parameters: trivial
Signed-off-by: Srikanth Ramamurthy <srramamu@microsoft.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55422
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 months agoNew release 2.15.64 2.15.64 v2_15_64
Oleg Drokin [Tue, 25 Jun 2024 03:34:26 +0000 (23:34 -0400)]
New release 2.15.64

Change-Id: I0d760b4b58bd24b72e781c52465c07417725cffe
Signed-off-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-9119 lnet: whitespace cleanup for wirecheck.c 46/55446/5
Olaf Weber [Mon, 17 Jun 2024 04:14:17 +0000 (00:14 -0400)]
LU-9119 lnet: whitespace cleanup for wirecheck.c

Clean up the whitespace use in lnet/utils/wirecheck.c

Test-Parameters: trivial
Signed-off-by: Olaf Weber <olaf@sgi.com>
Signed-off-by: Olaf Weber <olaf.weber@hpe.com>
Change-Id: I5c90d09fd694c8151f6f11f716c491ac3db79eb0
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55446
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 months agoLU-17856 tests: ignore sanity stripe-count off-by-1 27/55427/3
Frederick Dilger [Fri, 14 Jun 2024 03:29:10 +0000 (23:29 -0400)]
LU-17856 tests: ignore sanity stripe-count off-by-1

In some cases the MDS may not create all stripes on a file, if the
MDT-OST connection does not have precreated objects. This is OK,
so the tests should not fail the stripe-count check if trying to
create a fully-striped file and one of the stripes is missing.

parse_layout_param was modified to change the output value of
stripe count to be $OSTCOUNT if the stripe_count=$OSTCOUNT - 1.

Even if the stripe_count was meant to be $OSTCOUNT - 1 this
shouldn't fail any tests as both tested values will be modified.
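
The adjustment can be expressed as a one-line normalization. This is a
C rendering for illustration only, with a hypothetical function name;
parse_layout_param itself is part of the test scripts and is not
reproduced here.

```c
#include <assert.h>

/* Treat "one stripe short of fully striped" as fully striped. */
static int normalize_stripe_count(int stripe_count, int ostcount)
{
	assert(ostcount > 0);
	return stripe_count == ostcount - 1 ? ostcount : stripe_count;
}
```

Because both the expected and the observed value pass through the same
normalization, a file that was deliberately created with one stripe
fewer still compares equal to itself.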

Test-Parameters: trivial testlist=sanity env=ONLY=56xd-56xe,ONLY_REPEAT=100
Test-Parameters: trivial testlist=sanity env=ONLY=65n,ONLY_REPEAT=100
Test-Parameters: trivial testlist=sanity env=ONLY=184d,ONLY_REPEAT=100
Signed-off-by: Frederick Dilger <fdilger@whamcloud.com>
Change-Id: Ie908a07d21b75e3ba60b7e6ca326675684ee2037
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55427
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 months agoLU-17948 pcc: fix llapi_pcc_del() -Werror=enum-int-mismatch 17/55417/3
Jian Yu [Thu, 13 Jun 2024 00:07:07 +0000 (17:07 -0700)]
LU-17948 pcc: fix llapi_pcc_del() -Werror=enum-int-mismatch

gcc 13 does not allow mixing of enum and integer
types between function declaration and implementation.

This patch fixes the following build failures:
liblustreapi_pcc.c:755:5: error:
conflicting types for 'llapi_pcc_del' due to enum/integer mismatch;
have 'int(const char *, const char *, enum lu_pcc_cleanup_flags)'
[-Werror=enum-int-mismatch]
  755 | int llapi_pcc_del(const char *mntpath, const char *pccpath,
      |     ^~~~~~~~~~~~~

liblustreapi_pcc.c:790:5: error:
conflicting types for 'llapi_pcc_clear' due to enum/integer mismatch;
have 'int(const char *, enum lu_pcc_cleanup_flags)' [-Werror=enum-int-mismatch]
  790 | int llapi_pcc_clear(const char *mntpath, enum lu_pcc_cleanup_flags flags)
      |     ^~~~~~~~~~~~~~~
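
A minimal sketch of a declaration and definition that agree on the
parameter type, which is what -Wenum-int-mismatch checks. The names
demo_cleanup_flags and demo_pcc_del() are hypothetical stand-ins and do
not show which side of the real llapi_pcc_del() prototype was changed.

```c
/* Hypothetical flag type standing in for enum lu_pcc_cleanup_flags. */
enum demo_cleanup_flags {
	DEMO_CLEANUP_NONE = 0x0,
	DEMO_CLEANUP_KEEP_DATA = 0x1,
};

/* Declaration, as it would appear in a public header. */
int demo_pcc_del(const char *mntpath, const char *pccpath,
		 enum demo_cleanup_flags flags);

/*
 * Definition: uses the same enum type as the declaration. Spelling
 * the last parameter as a plain integer type here instead is the
 * kind of mismatch gcc 13 reports.
 */
int demo_pcc_del(const char *mntpath, const char *pccpath,
		 enum demo_cleanup_flags flags)
{
	if (mntpath == 0 || pccpath == 0)
		return -1;

	return (flags & DEMO_CLEANUP_KEEP_DATA) ? 1 : 0;
}
```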

Test-Parameters: trivial testlist=sanity-pcc

Change-Id: I2900b59a609410c6faab78d24f6176bc5c268e98
Fixes: 0d7d9ae ("LU-17657 build: gcc 13 stricter enum checking")
Fixes: c74878c ("LU-12373 pcc: uncache the pcc copies when remove a PCC backend")
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55417
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 months agoLU-17948 llite: replace i_mtime.tv_sec with inode_get_mtime_sec() 16/55416/2
Jian Yu [Wed, 12 Jun 2024 23:32:53 +0000 (16:32 -0700)]
LU-17948 llite: replace i_mtime.tv_sec with inode_get_mtime_sec()

This patch replaces i_mtime.tv_sec with inode_get_mtime_sec() to
fix the following build failure:

lustre/llite/pcc.c:1691:32: error:
'struct inode' has no member named 'i_mtime'; did you mean '__i_mtime'?
 1691 |         item.pm_mtime = inode->i_mtime.tv_sec;
      |                                ^~~~~~~
      |                                __i_mtime

Test-Parameters: trivial testlist=sanity-pcc

Change-Id: Iaed264c32be3d48039c5350ebd306f4fc3ef5eb9
Fixes: 3835f4d ("LU-13881 pcc: comparator support for PCC rules")
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55416
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 months agoLU-17947 build: fix LASSERTF [-Werror=format=] failure 15/55415/3
Jian Yu [Thu, 13 Jun 2024 11:44:46 +0000 (07:44 -0400)]
LU-17947 build: fix LASSERTF [-Werror=format=] failure

This patch fixes the following build failures:

libcfs/include/libcfs/libcfs_private.h:89:34:
error: format '%o' expects argument of type 'unsigned int',
but argument 4 has type 'long unsigned int' [-Werror=format=]
   89 | "ASSERTION( %s ) failed: " fmt, #cond, \
      | ^~~~~~~~~~~~~~~~~~~~~~~~~~
lustre/ptlrpc/wiretest.c:2718:9: note: in expansion of macro 'LASSERTF'
 2718 | LASSERTF(MDS_FMODE_CLOSED == 000000000000UL, "found 0%.11oUL\n",
      | ^~~~~~~~
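
One way to make such a format string match an 'unsigned long' argument
is the 'l' length modifier. The sketch below uses plain snprintf() and a
hypothetical helper name, demo_format_octal_ul(); it does not claim to
show how the patch itself resolves the mismatch (a cast of the argument
would be another option).

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: format an unsigned long as a zero-padded octal
 * constant. "%.11lo" takes an unsigned long, whereas "%.11o" would
 * expect an unsigned int and trigger -Wformat.
 */
int demo_format_octal_ul(char *buf, size_t len, unsigned long val)
{
	return snprintf(buf, len, "0%.11loUL", val);
}
```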

Test-Parameters: trivial
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Change-Id: I97a895e6234721c34f681d0ee7ce91ead4dd30f8
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55415
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17945 lnet: fix nla_extract_val() -Werror=missing-prototypes 13/55413/2
Jian Yu [Wed, 12 Jun 2024 21:38:10 +0000 (14:38 -0700)]
LU-17945 lnet: fix nla_extract_val() -Werror=missing-prototypes

This patch explicitly defines nla_extract_val() as a static function
to fix the following build failure:

lnet/lnet/api-ni.c:2888:1: error:
no previous prototype for 'nla_extract_val' [-Werror=missing-prototypes]
 2888 | nla_extract_val(struct nlattr **attr, int *rem,
      | ^~~~~~~~~~~~~~~
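
A small sketch of the general pattern: a helper used in only one file is
marked static, while an externally visible function gets a prototype
before its definition. demo_extract_val() and demo_lookup() are
hypothetical and unrelated to the real nla_extract_val() signature.

```c
#include <stddef.h>

/*
 * File-local helper: 'static' gives it internal linkage, so
 * -Wmissing-prototypes does not require a prior declaration.
 */
static int demo_extract_val(const int *vals, size_t count, size_t idx,
			    int *out)
{
	if (vals == NULL || out == NULL || idx >= count)
		return -1;

	*out = vals[idx];
	return 0;
}

/* Externally visible function: declared before it is defined. */
int demo_lookup(size_t idx);

int demo_lookup(size_t idx)
{
	static const int table[] = { 10, 20, 30 };
	int val;

	if (demo_extract_val(table, sizeof(table) / sizeof(table[0]),
			     idx, &val) != 0)
		return -1;

	return val;
}
```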

Test-Parameters: trivial testlist=sanity-lnet

Change-Id: Ieb11d25ea8fcd19b715e2decf958cfd9d920bcc8
Fixes: 629d80d ("LU-10003 lnet: migrate fail nid to Netlink")
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55413
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17940 gss: get rid of root key sooner 06/55406/5
Sebastien Buisson [Thu, 13 Jun 2024 09:19:04 +0000 (11:19 +0200)]
LU-17940 gss: get rid of root key sooner

The root key associated with a GSS context (gck_key) is used to pass
information between kernel and userspace during GSS context
negotiation.
Once the GSS context for root is up-to-date, the key is never used
again, although it has a permanent validity. And when the context
expires, the key is directly revoked and replaced with a new one to
serve the negotiation of a new root context.
So to avoid issues with keys staying in the root's kernel keyring and
being accidentally revoked, just get rid of the key associated with a
root context as soon as the negotiation process has finished.

Test-Parameters: trivial
Test-Parameters: testgroup=review-dne-selinux-ssk-part-1
Test-Parameters: testgroup=review-dne-selinux-ssk-part-2
Test-Parameters: kerberos=true testlist=sanity-krb5
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I4be773723b9046ed451684bd141d5ef2bc584bfb
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55406
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-11671 tests: re-enable sanity test_45 for aarch64 03/55403/6
Xinliang Liu [Wed, 12 Jun 2024 10:24:53 +0000 (10:24 +0000)]
LU-11671 tests: re-enable sanity test_45 for aarch64

This is fixed by patch https://review.whamcloud.com/54763
 ("LU-17733 tests: sanity test_45 fix dirty count").

Test-Parameters: trivial
Test-Parameters: testlist=sanity clientarch=aarch64 \
  clientdistro=el9.3 env=ONLY=45,ONLY_REPEAT=100

Change-Id: I4716a0bee2689ffb33db8a81f1f33be6562b929e
Signed-off-by: Xinliang Liu <xinliang.liu@linaro.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55403
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 months agoLU-17000 utils: Initialize 'idgot' time before using 02/55402/2
Arshad Hussain [Wed, 12 Jun 2024 09:35:17 +0000 (05:35 -0400)]
LU-17000 utils: Initialize 'idgot' time before using

If there is an error reading the contents of the permission file,
gettimeofday() is correctly not called on 'idgot'. However, this
leaves the 'idgot' timeval uninitialized. This patch initializes
the 'idgot' timeval to 0 so that in this case the value is printed
as zero and not garbage.
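
A minimal sketch of the pattern, assuming a hypothetical function
demo_identity_timestamp() whose read_ok argument stands in for the
result of reading the permission file. It is not the l_getidentity code.

```c
#include <stddef.h>
#include <sys/time.h>

/*
 * Hypothetical sketch: the timeval is zero-initialized up front, so a
 * skipped gettimeofday() call yields 0 rather than stack garbage.
 */
long demo_identity_timestamp(int read_ok)
{
	struct timeval idgot = { 0 };

	/* Only record the time when the read succeeded. */
	if (read_ok)
		gettimeofday(&idgot, NULL);

	return (long)idgot.tv_sec;
}
```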

Test-Parameters: trivial
CoverityID: 397122 ("Uninitialized scalar variable")
Fixes: d5b26443 ("LU-16615 utils: add messages in l_getidentity")
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ie3d5dff1f02ede83690472e60cc14c12ec5d978a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55402
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17935 kfilnd: Cleanup debug logging 98/55398/2
Chris Horn [Fri, 10 May 2024 17:41:57 +0000 (11:41 -0600)]
LU-17935 kfilnd: Cleanup debug logging

Log messages that refer to a struct kfilnd_transaction should print
the pointer to the struct with "TN %p".

Assign kfilnd_transaction::msg_type in the kfilnd_process_rx_event
path so that debug messages show the correct message type.

HPE-bug-id: LUS-11325
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Iabe3bf245b64f1eb66c85259072491c723fb6119
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55398
Reviewed-by: Caleb Carlson <caleb.carlson@hpe.com>
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 months agoLU-17930 gss: node principal expectations 92/55392/4
Sebastien Buisson [Tue, 11 Jun 2024 10:40:26 +0000 (12:40 +0200)]
LU-17930 gss: node principal expectations

When a credentials cache exists for Kerberos, lgss_keyring looks into
it to find a valid entry. The cache's principal must match the
expected role for the GSS request being processed:
- LGSS_ROOT_CRED_MDT: expect "lustre_mds" principal;
- LGSS_ROOT_CRED_OST: expect "lustre_oss" principal;
- LGSS_ROOT_CRED_ROOT: expect "lustre_root" or "host" principal.
And there is the special case of the GSS request on the MGC, for which
by convention all 3 roles are applied at the same time.
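
The role-to-principal expectations listed above could be sketched as a
simple check. The enum values and the function demo_principal_matches()
are hypothetical; only the principal names come from this message, and
the real lgss_keyring logic is not reproduced here.

```c
#include <string.h>

/* Hypothetical bit values; the real LGSS_ROOT_CRED_* flags may differ. */
enum demo_root_cred {
	DEMO_ROOT_CRED_ROOT = 0x1,
	DEMO_ROOT_CRED_MDT  = 0x2,
	DEMO_ROOT_CRED_OST  = 0x4,
};

/*
 * Return 1 if the service part of a cached principal is acceptable
 * for any of the roles set in 'roles', 0 otherwise.
 */
int demo_principal_matches(unsigned int roles, const char *svc)
{
	if ((roles & DEMO_ROOT_CRED_MDT) && strcmp(svc, "lustre_mds") == 0)
		return 1;
	if ((roles & DEMO_ROOT_CRED_OST) && strcmp(svc, "lustre_oss") == 0)
		return 1;
	if ((roles & DEMO_ROOT_CRED_ROOT) &&
	    (strcmp(svc, "lustre_root") == 0 || strcmp(svc, "host") == 0))
		return 1;

	return 0;
}
```

In this sketch the MGC case corresponds to passing all three bits at
once, so any of the four principal names is accepted.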

Test-Parameters: trivial
Test-Parameters: kerberos=true testlist=sanity-krb5
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I4c46b03bb012c5f56bd26efdfaa6dab5fc7de31a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55392
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17929 ptlrpc: ptlrpc_request_alloc_pack() always returns an error code 91/55391/6
Aurelien Degremont [Tue, 11 Jun 2024 08:33:11 +0000 (10:33 +0200)]
LU-17929 ptlrpc: ptlrpc_request_alloc_pack() always returns an error code

The current code always assumed that a NULL return from this
function meant ENOMEM, but this is not always true, for example
when using GSS or when reconnecting from an IDLE state.
Also, instead of having every caller convert NULL to
ENOMEM, do that directly in the function when
appropriate.

Make ptlrpc_request_alloc_pack() return -errno in case
of error instead of a NULL pointer.

With this change, the error code is propagated up,
which helps error reporting and debugging.

Take the opportunity to simplify the related error paths
of 2 HSM functions.

Also change param.status to a signed type, as it can
store -errno.
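
A userspace sketch of returning -errno through a pointer, modeled on the
kernel's ERR_PTR()/IS_ERR()/PTR_ERR() helpers. All demo_* names, the
conn_ok parameter, and the choice of -ENOTCONN are hypothetical; the
sketch only illustrates why a caller no longer has to assume ENOMEM.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_MAX_ERRNO 4095

/* Userspace stand-ins for the kernel's ERR_PTR/PTR_ERR/IS_ERR. */
static inline void *demo_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static inline long demo_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int demo_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-DEMO_MAX_ERRNO;
}

struct demo_request {
	int opcode;
};

/* Returns a request, or an encoded -errno instead of NULL. */
static struct demo_request *demo_request_alloc_pack(int opcode, int conn_ok)
{
	struct demo_request *req;

	if (!conn_ok)
		return demo_err_ptr(-ENOTCONN);	/* not an ENOMEM case */

	req = calloc(1, sizeof(*req));
	if (req == NULL)
		return demo_err_ptr(-ENOMEM);

	req->opcode = opcode;
	return req;
}

/* Caller propagates whatever error the allocator reported. */
int demo_send(int opcode, int conn_ok)
{
	struct demo_request *req = demo_request_alloc_pack(opcode, conn_ok);

	if (demo_is_err(req))
		return (int)demo_ptr_err(req);

	free(req);
	return 0;
}
```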

Signed-off-by: Aurelien Degremont <adegremont@nvidia.com>
Change-Id: Id2b873d5f0c5cb89db070f6db00269545e6c85e8
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55391
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17919 tests: wait to resolve ENOSPC in sanity 398l 59/55359/4
Patrick Farrell [Fri, 7 Jun 2024 15:49:40 +0000 (11:49 -0400)]
LU-17919 tests: wait to resolve ENOSPC in sanity 398l

Test 398l does not wait to clear up the ENOSPC it induces,
so sometimes it causes 398m to fail with ENOSPC.

Wait for deletes to resolve this.

OCI-bug-id: LFS-288

Note on test-parameters - we can't 'REPEAT' a pair of
tests, it would run 398l over and over and then 398m,
which doesn't test what we need to test.  So instead we
just create 5 sessions like this.

Test-Parameters: testlist=sanity envdefinitions=ONLY="398l 398m"
Test-Parameters: testlist=sanity envdefinitions=ONLY="398l 398m"
Test-Parameters: testlist=sanity envdefinitions=ONLY="398l 398m"
Test-Parameters: testlist=sanity envdefinitions=ONLY="398l 398m"
Test-Parameters: testlist=sanity envdefinitions=ONLY="398l 398m"
Test-Parameters: trivial
Signed-off-by: Patrick Farrell <patrick.farrell@oracle.com>
Change-Id: I2fcc1069a0304bc6edfa576331b6255289b71b98
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55359
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 months agoLU-16491 utils: update getdirstripe yaml format 46/55346/10
Frederick Dilger [Thu, 6 Jun 2024 20:00:25 +0000 (16:00 -0400)]
LU-16491 utils: update getdirstripe yaml format

'lfs getdirstripe --yaml' now prints the directory layout in YAML
format. getdirstripe now also prints the "self" FID whether the
directory is striped or not. The "migrating" fields
(lmv_migrate_offset, lmv_migrate_hash) are not included because
of the additional code complexity required to add them.
They are stored in 'struct lmv_mds_md_v1', which
AFAIK isn't available through getdirstripe.

For directories with a stripe count of 0, lmv_objects: now
contains an entry for the directory itself. This entry is
redundant with -v, but it is useful when the lmv_fid
isn't being shown.

New YAML layout:

    lmv_fid:           0x280000404:0x5:0x0
    lmv_magic:         0xcd20cd0
    lmv_stripe_count:  4
    lmv_stripe_offset: 2
    lmv_hash_type:     crush
    lmv_objects:
          - l_mdt_idx: 2
            l_fid:     0x280000400:0x4:0x0
          - l_mdt_idx: 0
            l_fid:     0x200000402:0x4:0x0
          - l_mdt_idx: 1
            l_fid:     0x240000402:0x2:0x0
          - l_mdt_idx: 3
            l_fid:     0x2c0000403:0x2:0x0

Signed-off-by: Frederick Dilger <fdilger@whamcloud.com>
Change-Id: I03ddc24816484d11c8c70892831e9edc9da5455a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55346
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>