Whamcloud - gitweb
fs/lustre-release.git
3 weeks ago | LU-17900 llite: handle AT_GETATTR_NOSEC flag if present | 96/55296/3
Bruno Faccini [Mon, 3 Jun 2024 14:47:51 +0000 (16:47 +0200)]
LU-17900 llite: handle AT_GETATTR_NOSEC flag if present

Starting with kernel v6.7-rc1-1-g8a924db2d7b5, vfs_getattr_nosec() can
pass an additional AT_GETATTR_NOSEC flag to the underlying filesystem's
getattr() routine.
ll_vfs_getattr() must therefore mask this flag before passing the flags
back to vfs_getattr(), as ecryptfs and overlayfs already do, so that the
warning and stack trace are no longer displayed.
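The masking step can be pictured with a minimal userspace sketch. The real change lives in ll_vfs_getattr(). The helper name demo_mask_getattr_flags() is hypothetical, and the flag value below is an assumption that mirrors the kernel's definition, used only if no header provides it.

```c
#include <assert.h>

#ifndef AT_GETATTR_NOSEC
/* Assumed value; kernels >= v6.7 define this flag in their own headers. */
#define AT_GETATTR_NOSEC 0x80000000u
#endif

/*
 * Hypothetical helper: drop the kernel-internal AT_GETATTR_NOSEC bit
 * before the query flags are handed back to vfs_getattr(), and keep
 * every other flag unchanged.
 */
static unsigned int demo_mask_getattr_flags(unsigned int query_flags)
{
	return query_flags & ~(unsigned int)AT_GETATTR_NOSEC;
}
```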

Signed-off-by: Bruno Faccini <bfaccini@nvidia.com>
Change-Id: I1d041913a6fc3ab9158fd611cb7d14dd1b7f694b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55296
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 weeks ago | LU-6142 mgc: SPDX for management client | 85/55285/2
Timothy Day [Sun, 2 Jun 2024 22:50:52 +0000 (22:50 +0000)]
LU-6142 mgc: SPDX for management client

Convert from the verbose license text to an SPDX license identifier.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I24de13d3c859710e439b880afd1c6024c2da8937
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55285
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 weeks ago | LU-17675 tests: flush opencache in sanity-flr/61a | 88/54788/5
Alex Zhuravlev [Mon, 15 Apr 2024 05:38:39 +0000 (08:38 +0300)]
LU-17675 tests: flush opencache in sanity-flr/61a

Flush the opencache so that the close RPC updates the atime on the MDS.

Test-Parameters: trivial testlist=sanity-flr clientdistro=el9.3
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I5f4d3400b3f772553ee6004ac271a4aa644699e0
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54788
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 weeks ago | LU-6142 llite: Fix style issues for llite_lib.c | 40/54140/6
Arshad Hussain [Thu, 22 Feb 2024 06:23:20 +0000 (11:53 +0530)]
LU-6142 llite: Fix style issues for llite_lib.c

This patch fixes issues reported by checkpatch
in the file lustre/llite/llite_lib.c.

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I593c37a3dd19c9915c44e18033ce53dc965bbbda
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54140
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 weeks ago | LU-17711 osd-ldiskfs: do not delete dotdot during rename | 23/54723/10
Li Dongyang [Wed, 17 Apr 2024 05:36:55 +0000 (15:36 +1000)]
LU-17711 osd-ldiskfs: do not delete dotdot during rename

Since upstream kernel commit v5.12-rc4-32-g6c0912739699, the part of
an ext4_dir_entry_2 that follows its rec_len field, up to the full
rec_len, is wiped when the entry is deleted.

This creates a problem with rename. We delete the dotdot entry first,
and if it is a dx dir, the kernel wipes the entire dx_root that follows
the dotdot entry in the block.

Instead, we can simply update the dotdot entry in place without
deleting it.

For dx dirs, ext4_update_dotdot() already takes care of dotdot, and
inserting dotdot is really an update, so use it for linear dirs as well.

Rewrite ext4_update_dotdot() to get a few fixes:
* use ext4_read_dirblock() to get the first block.
* do not assert on data read from disk. Check the dot and dotdot
  entries instead, and return -EFSCORRUPTED if anything looks wrong.
* make sure the change is journalled.
* set metadata_csum correctly for dx dirs.
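The in-place update can be illustrated with a simplified userspace model. struct demo_dirent is a hypothetical, host-endian stand-in for struct ext4_dir_entry_2. demo_update_dotdot() is not the real ext4_update_dotdot(): it omits reading the block with ext4_read_dirblock(), journalling and metadata_csum handling. It shows only two steps: validating the dot/dotdot entries instead of asserting, and rewriting the parent inode number without deleting the entry. EFSCORRUPTED is mapped to EUCLEAN, as ext4 does.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#ifndef EFSCORRUPTED
#define EFSCORRUPTED EUCLEAN	/* same mapping ext4 uses */
#endif

/* Hypothetical, host-endian stand-in for struct ext4_dir_entry_2. */
struct demo_dirent {
	uint32_t inode;
	uint16_t rec_len;
	uint8_t  name_len;
	uint8_t  file_type;
	char     name[4];
};

/*
 * Point ".." at @new_parent without deleting the entry. rec_len and
 * everything it covers (e.g. dx_root in a dx dir) are left untouched.
 */
static int demo_update_dotdot(const struct demo_dirent *dot,
			      struct demo_dirent *dotdot,
			      uint32_t new_parent)
{
	/* Validate on-disk data instead of asserting on it. */
	if (dot->name_len != 1 || memcmp(dot->name, ".", 1) != 0)
		return -EFSCORRUPTED;
	if (dotdot->name_len != 2 || memcmp(dotdot->name, "..", 2) != 0)
		return -EFSCORRUPTED;

	dotdot->inode = new_parent;
	return 0;
}
```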

Update ext4-data-in-dirent.patch so that, if the dotdot entry has no
space for dirdata, it tries to expand the dotdot entry by moving the
entries behind it, or by moving the dx_root for dx dirs.

Add conf-sanity/154 to verify that the ".." entry was updated
properly after restore, including with an htree split directory
with dx_root entry.

Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Change-Id: I33e862739fa44f583aaa4369190d6d80271db13b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54723
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 weeks ago | LU-16822 lnet: use proper Netlink flags for setup | 29/55129/5
James Simmons [Fri, 31 May 2024 17:33:46 +0000 (11:33 -0600)]
LU-16822 lnet: use proper Netlink flags for setup

The Netlink flags sent to lnet_net_conf_cmd() were incorrect.
NLM_F_EXCL and NLM_F_REPLACE cannot be used together, because they have
opposite meanings. Combined, these flags also equal NLM_F_DUMP, which
the kernel doesn't support for this operation. The request therefore
failed with EOPNOTSUPP, which tells userland to fall back to the old
API, so the failure was not easily detected.
Replace NLM_F_REPLACE with NLM_F_APPEND to avoid this issue. Also,
lct_version gets stomped on for an unknown reason, so it can't be used.
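The flag arithmetic behind this bug can be checked with the uapi constants from <linux/netlink.h>. NLM_F_DUMP is defined as NLM_F_ROOT | NLM_F_MATCH (0x100 | 0x200), which are the same bit values as NLM_F_REPLACE and NLM_F_EXCL. demo_looks_like_dump() is a hypothetical sketch of the "all dump bits set" test that generic netlink applies to incoming requests. It is not Lustre code.

```c
#include <assert.h>
#include <stdbool.h>
#include <linux/netlink.h>

/*
 * Hypothetical helper: a request is treated as a dump when every bit
 * of NLM_F_DUMP (NLM_F_ROOT | NLM_F_MATCH) is set in its flags.
 */
static bool demo_looks_like_dump(unsigned int nlmsg_flags)
{
	return (nlmsg_flags & NLM_F_DUMP) == NLM_F_DUMP;
}
```

Swapping NLM_F_REPLACE for NLM_F_APPEND (0x800) clears the bit that NLM_F_REPLACE shares with NLM_F_ROOT. As a result, the request is no longer mistaken for a dump.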

Fixes: ab6c8bd18e1 ("LU-16822 lnet: always initialize IPv6 at start up")
Test-Parameters: trivial testlist=sanity-lnet
Change-Id: I6b9eb013f6fc10276e91848d7b5f17d406fbbdb4
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55129
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 weeks ago | LU-16822 tests: Setup IPv6 with fake network device | 99/53599/3
James Simmons [Fri, 5 Jan 2024 14:00:24 +0000 (07:00 -0700)]
LU-16822 tests: Setup IPv6 with fake network device

Several of the LNet sanity tests create a fake network device and
set up an IP address on it. Only an IPv4 address was set up, so also
set up an IPv6 address to increase test coverage for large NID support.

Test-Parameters: trivial testlist=sanity-lnet
Change-Id: If29adf74f1fe6449ad3f48663c2872a39bf4664c
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53599
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 weeks ago | LU-17871 ldlm: FLOCK ownlocks may be not set | 84/55184/7
Andriy Skulysh [Wed, 3 Apr 2024 10:34:32 +0000 (13:34 +0300)]
LU-17871 ldlm: FLOCK ownlocks may be not set

The conflict-checking loop should continue until ownlocks is set,
because the ownlocks variable is essential for lock merges.
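The loop condition described above can be sketched with a hypothetical, heavily simplified lock model. struct demo_flock, demo_conflicts() and demo_scan() are not the ldlm flock code. Locks here have only an owner, an inclusive byte range and a read/write mode. The point is the exit condition: an early break on a conflict is allowed only once the owner's own lock has also been found.

```c
#include <assert.h>
#include <stdbool.h>

enum demo_mode { DEMO_READ, DEMO_WRITE };

/* Hypothetical, simplified stand-in for a granted flock lock. */
struct demo_flock {
	int owner;
	long start, end;	/* inclusive byte range */
	enum demo_mode mode;
};

static bool demo_conflicts(const struct demo_flock *a,
			   const struct demo_flock *b)
{
	bool overlap = a->start <= b->end && b->start <= a->end;

	return overlap && (a->mode == DEMO_WRITE || b->mode == DEMO_WRITE);
}

/*
 * Scan granted locks for a conflict with @req and for the first lock
 * held by the same owner ("ownlocks", needed later to merge locks).
 * Returns the index of ownlocks, or -1 if the owner holds no lock.
 */
static int demo_scan(const struct demo_flock *locks, int n,
		     const struct demo_flock *req, bool *conflict)
{
	int ownlocks = -1;

	*conflict = false;
	for (int i = 0; i < n; i++) {
		if (locks[i].owner == req->owner) {
			if (ownlocks < 0)
				ownlocks = i;
		} else if (demo_conflicts(&locks[i], req)) {
			*conflict = true;
		}
		/* Stop early only when both facts are known. */
		if (*conflict && ownlocks >= 0)
			break;
	}
	return ownlocks;
}
```

If the loop instead broke out on the first conflict (index 0 in the test data), ownlocks would remain unset even though the owner holds a lock further down the list.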

Change-Id: Ied526581dd7d4f100c95f2fe582d117a87a8a584
Fixes: b07a57027e (LU-15402 ldlm: speedup RD flock enqueue)
HPE-bug-id: LUS-12243
Signed-off-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55184
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 weeks ago | LU-17881 build: fix _flavor definition | 25/55225/2
Caleb Carlson [Tue, 28 May 2024 17:40:56 +0000 (11:40 -0600)]
LU-17881 build: fix _flavor definition

Allow the user to override the _flavor definition.
Move the _kver definition so that it is fully
defined before it is used to define _flavor.
This prevents _flavor from being defined with
the kernel patch version field.

Signed-off-by: Caleb Carlson <caleb.carlson@hpe.com>
Test-Parameters: trivial
HPE-bug-id: LUS-12267
Change-Id: Ibd4db360d8c16f487453593cb0a9fd2a6a5a8c62
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55225
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 weeks ago | LU-17877 lnet: export REGISTER_FUNC with EXPORT_SYMBOL_GPL | 17/55217/3
Rebanta Mitra [Mon, 27 May 2024 07:57:13 +0000 (00:57 -0700)]
LU-17877 lnet: export REGISTER_FUNC with EXPORT_SYMBOL_GPL

This patch exports REGISTER_FUNC and UNREGISTER_FUNC with
EXPORT_SYMBOL_GPL so that GPL-licensed modules that use them can load.

Test-Parameters: trivial

Change-Id: I3a0d4e2b27911af36e210692d28892590eb0371c
Signed-off-by: Rebanta Mitra <rmitra@nvidia.com>
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55217
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 weeks ago | LU-17343 utils: added --path option for lctl list_param | 02/55202/3
Frederick Dilger [Sat, 25 May 2024 23:23:20 +0000 (19:23 -0400)]
LU-17343 utils: added --path option for lctl list_param

Add the 'lctl list_param [-p] PARAM' option, which prints the
actual pathname(s) for PARAM instead of the parameter name(s).
This lets users "resolve" PARAM pathnames so that they can be used
directly, without having to hard-code them. Also rename
"po_only_path" and "po_show_path" to "po_only_name" and
"po_show_name" to avoid confusion with "po_only_pathname" for the new
option.

Signed-off-by: Frederick Dilger <fdilger@whamcloud.com>
Change-Id: I2259b930f3ac5cc46ac7a9a36218a44fa110157c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55202
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 weeks ago | LU-17699 utils: new --skip option for lfs find | 00/55200/5
Frederick Dilger [Sat, 25 May 2024 21:46:08 +0000 (17:46 -0400)]
LU-17699 utils: new --skip option for lfs find

Add the [--skip | -k] options to 'lfs find' to skip a percentage of
all files. This allows a certain percentage of the files in the
scanned directory to be migrated to new OSTs, instead of migrating
entire directory trees. Note that file size is not taken into account
when skipping files, which could result in an unbalanced set of files
being returned. With fewer than 25 files, the margin of error in the
results may increase, but it should still be relatively small (at
most 10%).
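One way to skip a fixed percentage of files deterministically is an error-accumulator, sketched below. This illustration assumes a simple per-file skip decision. The struct and function names are hypothetical, and the real 'lfs find --skip' may select files differently (for example randomly, which would account for the margin of error mentioned above).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state for skipping roughly @percent% of visited files. */
struct demo_skip {
	unsigned int percent;		/* 0..100 */
	unsigned long long seen;
	unsigned long long skipped;
};

/*
 * Return true if the next file should be skipped. After n files,
 * skipped == ceil(n * percent / 100), so the skipped count never
 * drifts more than one file from the target.
 */
static bool demo_skip_next(struct demo_skip *s)
{
	s->seen++;
	if (s->skipped * 100 < s->seen * s->percent) {
		s->skipped++;
		return true;
	}
	return false;
}
```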

A further utility, --skip-rebalance, is planned. It would calculate
the percentage of files that need to be returned vs. skipped, based on
the fullness ratio of each OST vs. the average fullness of a balanced
filesystem, so that users do not have to calculate the skip percentage
themselves.

Signed-off-by: Frederick Dilger <fdilger@whamcloud.com>
Change-Id: I3ff1600f25f3be54f2a353fa78f7b8b7f98f591a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55200
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 weeks ago | LU-17873 test: ignore WIFSIGNALED if rc is 0 | 94/55194/3
Hongchao Zhang [Sat, 20 Apr 2024 17:53:11 +0000 (01:53 +0800)]
LU-17873 test: ignore WIFSIGNALED if rc is 0

Ignore the result of the WIFSIGNALED check if the return status
of the "lctl test_create" thread is zero.

Test-Parameters: trivial envdefinitions=SLOW=yes,DEBUG_SIZE=64 mdtcount=1 testlist=mds-survey,mds-survey,mds-survey,mds-survey,mds-survey,mds-survey
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: Ifc3727d48010c9f00f38baff9ff91b5cc3afce5c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55194
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 weeks ago | LU-9859 libcfs: move crypto wrappers to lnet | 87/55187/5
James Simmons [Fri, 24 May 2024 00:07:21 +0000 (20:07 -0400)]
LU-9859 libcfs: move crypto wrappers to lnet

The crypto wrappers are one of the last items in libcfs that are not
debugging related. Moving them to LNet brings libcfs closer to being
just a debugging module.

Test-Parameters: trivial
Change-Id: Idbc058fe2cafc04e4300a576e3368c0961ce98a4
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55187
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 weeks ago | LU-15644 llog: don't replace llog error with -ENOTDIR | 51/55151/2
Mikhail Pershin [Sat, 18 May 2024 19:43:05 +0000 (22:43 +0300)]
LU-15644 llog: don't replace llog error with -ENOTDIR

dt_try_as_dir() contains a check for object existence, and a missing
object is reported as -ENOTDIR. For llog, that error propagates to the
upper level and causes an error to be reported on the console, which
is appropriate in neither error code nor debug level.

The patch skips the object existence check for llog, since it is
redundant there anyway.
The debug level is also reduced so that -ENOENT, -ESTALE or -EIO
errors no longer spawn console messages.
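The debug-level change can be pictured as a mapping from error code to debug level. In this sketch the enum values are hypothetical stand-ins for Lustre's D_ERROR/D_INFO debug masks. The actual patch may choose the level inline rather than through a helper like demo_llog_err_level().

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for Lustre's console and quiet debug masks. */
enum demo_dbg { DEMO_D_ERROR, DEMO_D_INFO };

/*
 * Pick the debug level for an llog error: missing (-ENOENT), stale
 * (-ESTALE) and I/O (-EIO) errors are logged quietly, while anything
 * else still reaches the console.
 */
static enum demo_dbg demo_llog_err_level(int rc)
{
	switch (rc) {
	case -ENOENT:
	case -ESTALE:
	case -EIO:
		return DEMO_D_INFO;
	default:
		return DEMO_D_ERROR;
	}
}
```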

Fixes: 1ebc9ed460 ("LU-15902 obdclass: dt_try_as_dir() check dir exists")
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Id404204566898a6ac2e258b7824491effc5fc92e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55151
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 weeks ago | LU-17848 dt: cleanup dt_object.h header | 26/55126/4
Timothy Day [Sat, 18 May 2024 16:54:37 +0000 (16:54 +0000)]
LU-17848 dt: cleanup dt_object.h header

Clean up a number of LASSERT statements to unify their style.

Use kernel-doc style instead of the old Doxygen style, and avoid
the /** opener for comments that are not kernel-doc.
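For reference, a kernel-doc comment has four parts:
- it opens with /**;
- it names the function with (), followed by " - " and a short description;
- it documents each parameter as @name:;
- it may end with a Return: section.

Ordinary comments open with a plain /*. The function below is a hypothetical example and is not taken from dt_object.h.

```c
#include <assert.h>
#include <stddef.h>

/**
 * demo_obj_bytes() - compute the byte size of an object
 * @blocks: number of blocks used by the object
 * @blksize: block size in bytes
 *
 * Return: @blocks multiplied by @blksize
 */
static size_t demo_obj_bytes(size_t blocks, size_t blksize)
{
	/* A plain comment: not kernel-doc, so it opens with a single '*'. */
	return blocks * blksize;
}
```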

Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Ia23492534a05bce4850ca38ab7c06a07000504d3
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55126
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 weeks ago | LU-16904 tests: Fix sanity test 248c and 360 when PFL layout is used | 18/54918/3
Wei Liu [Thu, 25 Apr 2024 18:35:13 +0000 (11:35 -0700)]
LU-16904 tests: Fix sanity test 248c and 360 when PFL layout is used

For 248c, use stripe count 1 for root until the readahead issues described
in LU-17755 and LU-15155 are fixed.
For 360, use stripe count 1 for the test dir to make sure the files created
under it have objects larger than 1M on a single OST, to test delayed iput.

Test-Parameters: trivial
Test-Parameters: testlist=sanity env=fs_STRIPEPARAMS="-E 1M -c1 -E eof" env=ONLY="248c,360"
Test-Parameters: testlist=sanity env=fs_STRIPEPARAMS="-E 64k -c 1 -E eof" env=ONLY="248c,360"
Test-Parameters: testlist=sanity env=fs_STRIPEPARAMS="-E 64k -c 1 -E eof -c 2" env=ONLY="248c,360"
Test-Parameters: testlist=sanity env=fs_STRIPEPARAMS="-E 64k -c 1 -E 1M -c 2 -E eof -c 4 -S 4M" env=ONLY="248c,360"

Signed-off-by: Wei Liu <sarah@whamcloud.com>
Change-Id: I93341001714c5d0942f2f8f2895ca8bb545dc344
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54918
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <patrick.farrell@oracle.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 weeks ago | LU-17710 llite: protect parallel accesses to lli_*id | 27/54727/5
Etienne AUJAMES [Wed, 10 Apr 2024 17:08:49 +0000 (19:08 +0200)]
LU-17710 llite: protect parallel accesses to lli_*id

OSC obtains the process uid/gid/jobid from ll_inode_info. This is
racy if several processes access the same file, and can lead to a
corrupted or incoherent set of values.

This patch replaces the lli_jobid/lli_uid/lli_gid fields with a common
"struct job_info lli_jobinfo" field.

struct job_info {
       char ji_jobid[LUSTRE_JOBID_SIZE];
       __u32 ji_uid;
       __u32 ji_gid;
};

The accesses are protected by a seqlock (lli_jobinfo_seqlock).

Additionally, this saves and restores the process uid/gid values for
readahead work (cra_jobid is replaced by cra_jobinfo).
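The seqlock protocol can be sketched in userspace with C11 atomics, using the usual writer/reader sequence-counter pattern. In the kernel, lli_jobinfo_seqlock would be a seqlock_t used with write_seqlock()/write_sequnlock() and read_seqbegin()/read_seqretry(). The names below (demo_inode_info, DEMO_JOBID_SIZE and the helpers) are hypothetical. The sketch assumes writers are already serialized; the kernel seqlock_t embeds a spinlock for that. The plain memcpy() of the payload is the usual seqlock compromise and is formally a data race under the C11 memory model.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define DEMO_JOBID_SIZE 32	/* assumed stand-in for LUSTRE_JOBID_SIZE */

struct job_info {
	char ji_jobid[DEMO_JOBID_SIZE];
	uint32_t ji_uid;
	uint32_t ji_gid;
};

/* Hypothetical slice of ll_inode_info: a sequence counter plus payload. */
struct demo_inode_info {
	atomic_uint seq;	/* even: stable, odd: update in progress */
	struct job_info jobinfo;
};

/* Writer: make seq odd, update the payload, then make seq even again. */
static void demo_jobinfo_set(struct demo_inode_info *lli,
			     const struct job_info *ji)
{
	unsigned int s = atomic_load_explicit(&lli->seq, memory_order_relaxed);

	atomic_store_explicit(&lli->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	memcpy(&lli->jobinfo, ji, sizeof(*ji));
	atomic_store_explicit(&lli->seq, s + 2, memory_order_release);
}

/* Reader: retry until a copy was taken while no update was in progress. */
static void demo_jobinfo_get(struct demo_inode_info *lli,
			     struct job_info *out)
{
	unsigned int s1, s2;

	do {
		s1 = atomic_load_explicit(&lli->seq, memory_order_acquire);
		memcpy(out, &lli->jobinfo, sizeof(*out));
		atomic_thread_fence(memory_order_acquire);
		s2 = atomic_load_explicit(&lli->seq, memory_order_relaxed);
	} while ((s1 & 1) || s1 != s2);
}
```

Readers never block writers. A reader that overlaps an update simply retries, so the jobid, uid and gid it returns always come from a single consistent write.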

Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: Idf01c1e4b533aea405c3a4439c0df0fcfc4dea56
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54727
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Thomas Bertschinger <bertschinger@lanl.gov>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 weeks ago | LU-17710 obdclass: background jobid garbage collection | 26/54726/3
Etienne AUJAMES [Wed, 10 Apr 2024 12:16:41 +0000 (14:16 +0200)]
LU-17710 obdclass: background jobid garbage collection

The jobid pidmap garbage collection was done directly in
lustre_get_jobid()/jobid_get_from_cache() every 5 minutes.

This patch runs the garbage collection in the background with a
"delayed work" handler.

Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: I5719e278ec6bde0f8c15fd2e3fe9757c714747c4
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54726
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Thomas Bertschinger <bertschinger@lanl.gov>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 weeks ago | LU-17718 obdclass: potential string overflow upcall_cache.c | 10/54710/6
Sebastien Buisson [Tue, 9 Apr 2024 13:00:41 +0000 (15:00 +0200)]
LU-17718 obdclass: potential string overflow upcall_cache.c

Use strncpy() in upcall_cache_set_upcall() to quiet a Coverity warning,
and reorganize the function so that the code flow is more linear in
the success case.
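strncpy() does not NUL-terminate the destination when the source fills the whole length. A bounded copy of this kind therefore usually pairs a length check with explicit termination. The sketch below shows that pattern with a single early check followed by a straight-line success path. demo_set_upcall() and its signature are hypothetical and do not reproduce upcall_cache_set_upcall().

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Copy @count bytes of @buf into @dst (capacity @dstlen), rejecting
 * input that would not fit together with its terminating NUL.
 */
static int demo_set_upcall(char *dst, size_t dstlen,
			   const char *buf, size_t count)
{
	if (count >= dstlen)
		return -EINVAL;

	strncpy(dst, buf, count);
	dst[count] = '\0';	/* strncpy() may leave dst unterminated */
	return 0;
}
```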

CoverityID: 424705: ("String overflow")

Fixes: 2153e86541 ("LU-17497 obdclass: check upcall incorrect values")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I1aee77f78c92c6c571dfe358435a2733cc3ba9d9
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54710
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 weeks ago | LU-7665 test: improve sanity 300p | 25/54625/2
Lai Siyao [Mon, 18 Mar 2024 05:13:14 +0000 (01:13 -0400)]
LU-7665 test: improve sanity 300p

Sanity test 300p sets OBD_FAIL_OUT_ENOSPC to fire once, but the
injected failure may hit an llog operation instead (not critical), so
the subsequent mkdir succeeds. Change the fail_loc to always fail so
that the test is more robust.
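A small model shows why a one-shot fail_loc is fragile. If an unrelated, non-critical llog operation consumes the single injected failure, the mkdir that the test expects to fail succeeds instead. DEMO_FAIL_ONCE, struct demo_fail and demo_should_fail() are hypothetical stand-ins for Lustre's fail_loc machinery, not its actual flag values or code.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_FAIL_ONCE 0x1u	/* hypothetical "fire only once" modifier */

struct demo_fail {
	unsigned int id;	/* which injection point is armed */
	unsigned int flags;
	bool fired;
};

/* Return true if the operation at injection point @id should fail. */
static bool demo_should_fail(struct demo_fail *f, unsigned int id)
{
	if (f->id != id)
		return false;
	if ((f->flags & DEMO_FAIL_ONCE) && f->fired)
		return false;
	f->fired = true;
	return true;
}
```

In the test, the first call plays the llog operation and the second plays the mkdir. With the once modifier, only the first call fails. Without it, both fail.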

Test-Parameters: trivial
Test-Parameters: mdscount=2 mdtcount=4 testlist=sanity
Test-Parameters: mdscount=2 mdtcount=4 testlist=sanity
Test-Parameters: mdscount=2 mdtcount=4 testlist=sanity
Test-Parameters: mdscount=2 mdtcount=4 testlist=sanity
Test-Parameters: mdscount=2 mdtcount=4 testlist=sanity
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I128ce39aaf97e1785a8c135a696d0b404b48a2a8
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54625
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 weeks ago | LU-11085 ldlm: optimise extent locks with identical extent | 87/54587/17
Mr NeilBrown [Thu, 28 Mar 2024 23:15:55 +0000 (19:15 -0400)]
LU-11085 ldlm: optimise extent locks with identical extent

Many locks with identical extents are (apparently) common. Rather than
putting all of these locks in the extent tree, possibly making it much
bigger than needed, link them all together with only one of them in
the extent tree.

When removing the lock that is in the extent tree, if there are others,
one of those must be placed in the tree where the original was.
extent_replace() does this; it could live in generic code.
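The linking scheme can be modelled in a few lines. In this sketch a singly linked list stands in for the interval tree. demo_insert() and demo_remove() only mimic what the text above says extent_insert_unique() and extent_replace() do. None of the names or structures are taken from the ldlm code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock with an extent; a plain list stands in for the tree. */
struct demo_lock {
	long start, end;
	struct demo_lock *tree_next;	/* "interval tree" linkage */
	struct demo_lock *same_next;	/* other locks with identical extent */
	bool in_tree;
};

struct demo_tree {
	struct demo_lock *head;
};

static struct demo_lock *demo_find(struct demo_tree *t, long start, long end)
{
	for (struct demo_lock *n = t->head; n; n = n->tree_next)
		if (n->start == start && n->end == end)
			return n;
	return NULL;
}

/* Like extent_insert_unique(): only one lock per extent enters the tree. */
static void demo_insert(struct demo_tree *t, struct demo_lock *lk)
{
	struct demo_lock *n = demo_find(t, lk->start, lk->end);

	if (n) {
		lk->same_next = n->same_next;	/* hang off the tree node */
		n->same_next = lk;
		lk->in_tree = false;
		return;
	}
	lk->same_next = NULL;
	lk->tree_next = t->head;
	t->head = lk;
	lk->in_tree = true;
}

static void demo_remove(struct demo_tree *t, struct demo_lock *lk)
{
	struct demo_lock **pp;

	if (!lk->in_tree) {
		/* Unlink lk from the duplicate list of the node in the tree. */
		struct demo_lock *n = demo_find(t, lk->start, lk->end);

		if (!n)
			return;
		pp = &n->same_next;
		while (*pp && *pp != lk)
			pp = &(*pp)->same_next;
		if (*pp)
			*pp = lk->same_next;
		return;
	}
	for (pp = &t->head; *pp != lk; pp = &(*pp)->tree_next)
		;
	if (lk->same_next) {
		/* Like extent_replace(): a duplicate takes lk's place. */
		struct demo_lock *rep = lk->same_next;

		rep->tree_next = lk->tree_next;
		rep->in_tree = true;
		*pp = rep;
	} else {
		*pp = lk->tree_next;
	}
}

static size_t demo_tree_size(const struct demo_tree *t)
{
	size_t n = 0;

	for (const struct demo_lock *l = t->head; l; l = l->tree_next)
		n++;
	return n;
}
```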

A new extent_insert_unique() is added.  Ideally this would be provided
by the standard interval_tree code.

As extent_insert() is no longer used, INTERVAL_TREE_DEFINE is told to
make all functions 'static inline' so we don't get warnings about the
unused function.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I9f8433514f8451abc80bbb6050499599e0f93520
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54587
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
3 weeks ago | LU-17462 build: make some deb packages optional | 88/53788/10
Shaun Tancheff [Sat, 3 Feb 2024 07:21:03 +0000 (14:21 +0700)]
LU-17462 build: make some deb packages optional

Make building the utils, tests and iokit packages optional.

MPI is also optional, via --disable-mpitests.

If --disable-mpitests or --disable-tests is given, the MPI
package dependencies should also be dropped.

Test-Parameters: trivial
HPE-bug-id: LUS-12091
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Icd232571f7052ec0a4b25c32ff573c3b5f76de21
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53788
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Caleb Carlson <caleb.carlson@hpe.com>
3 weeks ago | LU-17233 dkms: support for kfilnd and gnilnd | 56/52856/11
Shaun Tancheff [Sat, 25 May 2024 17:12:10 +0000 (11:12 -0600)]
LU-17233 dkms: support for kfilnd and gnilnd

dkms should try to build kkfilnd if kfi support is
detected. Similarly, kgnilnd can be built if its prerequisites
can be found.

Test-Parameters: trivial
HPE-bug-id: LUS-12070, LUS-11893, LUS-11902
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ie4b0957e7a0eda4f25ae96a12619baae6d6d170a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52856
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Caleb Carlson <caleb.carlson@hpe.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 weeks ago | LU-17509 dkms: SUSE15 depends on unavailable libmount | 45/53945/6
Shaun Tancheff [Wed, 7 Feb 2024 09:53:13 +0000 (16:53 +0700)]
LU-17509 dkms: SUSE15 depends on unavailable libmount

SUSE renamed libmount to libmount1; however, the requirement
is satisfied by libmount-devel, which in turn requires the
appropriate libmount package.

Drop the explicit libmount requirement from lustre-dkms.spec

Test-Parameters: trivial
HPE-bug-id: LUS-12141
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ifdee172483b73f9f66eb97883851febf94134309
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53945
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
3 weeks ago | LU-17461 dkms: improve /etc/sysconfig/lustre | 87/53787/8
Shaun Tancheff [Tue, 6 Feb 2024 18:48:44 +0000 (01:48 +0700)]
LU-17461 dkms: improve /etc/sysconfig/lustre

Expand the features available in /etc/sysconfig/lustre
to give dkms users more flexibility.

Provide y/n switches for common features:
    LUSTRE_DKMS_ENABLE_GSS=y/n
    LUSTRE_DKMS_ENABLE_GSS_KEYRING=y/n
    LUSTRE_DKMS_ENABLE_CRYPTO=y/n
    LUSTRE_DKMS_ENABLE_IOKIT=y/n

As well as a catch-all to pass to configure:
    LUSTRE_DKMS_CONFIGURE_EXTRA='string passed to configure'

Add support for checking for libkrb5-dev via dpkg, to enable or
disable GSS by default if it is not otherwise specified.

Test-Parameters: trivial
HPE-bug-id: LUS-12097
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Id8dd17c867d9aeb1ec27632729433ba128dcfd0a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53787
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Caleb Carlson <caleb.carlson@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 weeks ago | LU-13881 pcc: comparator support for PCC rules | 85/39585/20
Qian Yingjin [Thu, 6 Aug 2020 08:29:21 +0000 (16:29 +0800)]
LU-13881 pcc: comparator support for PCC rules

There are increasing requirements for PCC rules to support
comparators:
- Files whose data is larger or smaller than a certain threshold
  should not be automatically cached in PCC (e.g. files larger than
  the capacity of the PCC backend on a client).
- Users can specify a range of UID/GID/ProjID for auto caching in
  PCC when defining a rule.

In addition to the original equal (=) operator, this patch also
adds greater-than (>) and less-than (<) comparison operators.

The following rule expressions are supported:
- "projid={100}&size>{1M}&size<{500G}"
- "projid>{100}&projid<{110}"
- "uid<{1500}&uid>{1000}"
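The semantics of these rule expressions, a conjunction of comparisons, can be illustrated with a small evaluator. This is a hypothetical sketch, not the parser in lctl or llite. It assumes that '&' means logical AND and that values are written as {N} with an optional binary K/M/G/T suffix. It supports only the uid/gid/projid/size keys.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical attributes a rule is evaluated against. */
struct demo_attrs {
	unsigned long long uid, gid, projid, size;
};

static bool demo_lookup(const struct demo_attrs *a, const char *name,
			size_t len, unsigned long long *val)
{
	if (len == 3 && !strncmp(name, "uid", 3))
		*val = a->uid;
	else if (len == 3 && !strncmp(name, "gid", 3))
		*val = a->gid;
	else if (len == 6 && !strncmp(name, "projid", 6))
		*val = a->projid;
	else if (len == 4 && !strncmp(name, "size", 4))
		*val = a->size;
	else
		return false;
	return true;
}

/* Parse "{N}" with optional K/M/G/T suffix; return position after '}'. */
static const char *demo_parse_value(const char *p, unsigned long long *out)
{
	char *end;

	if (*p != '{')
		return NULL;
	*out = strtoull(p + 1, &end, 10);
	if (end == p + 1)
		return NULL;
	switch (*end) {
	case 'T': *out <<= 10; /* fall through */
	case 'G': *out <<= 10; /* fall through */
	case 'M': *out <<= 10; /* fall through */
	case 'K': *out <<= 10; end++; break;
	default: break;
	}
	return *end == '}' ? end + 1 : NULL;
}

/* Return 1 if every '&'-joined term matches, 0 if not, -1 if malformed. */
static int demo_rule_match(const char *rule, const struct demo_attrs *a)
{
	const char *p = rule;
	int match = 1;

	while (*p) {
		size_t len = strcspn(p, "=<>");
		unsigned long long have, want;
		char op = p[len];

		if (op == '\0' || !demo_lookup(a, p, len, &have))
			return -1;
		p = demo_parse_value(p + len + 1, &want);
		if (!p)
			return -1;
		if ((op == '=' && have != want) ||
		    (op == '<' && !(have < want)) ||
		    (op == '>' && !(have > want)))
			match = 0;
		if (*p == '&')
			p++;
		else if (*p != '\0')
			return -1;
	}
	return match;
}
```

A range such as "projid>{100}&projid<{110}" is just two terms on the same key that must both hold.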

EX-2872 pcc: mtime rule for 'lctl pcc add'

Add an "mtime>N" rule so that PCC-RO auto-attach skips files unless
they were created or last modified more than N seconds ago. Otherwise,
files may be added to the PCC cache before they have finished being
written, or shortly before they are modified again.
Was-Change-Id: Ibb99bff5b483717ae6e5b83f82f1bcd86c3ebbe5

This patch disables sanity-pcc/test_33 on the rhel9.3 kernel until the
inconsistent LSOM problem is solved.

EX-bug-id: EX-2872
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I9f024eb6903f5652ba3cf04fa289456803493b2c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/39585
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 weeks ago | LU-12373 pcc: uncache the pcc copies when remove a PCC backend | 52/38352/25
Qian Yingjin [Fri, 14 Jun 2019 09:29:55 +0000 (05:29 -0400)]
LU-12373 pcc: uncache the pcc copies when remove a PCC backend

Currently, when a PCC backend is removed from a client, previously
cached files get no special handling at all, and users can still use
the PCC caching service for them. This may not be what users want,
for the following reasons:

1) For RW-PCC cached files, the data is not restored back into the
Lustre OSTs of the main filesystem. The PCC backend falls back to
acting as a traditional HSM storage solution, since the
lhsmtool_posix copytool is still running on this client. However,
this is dangerous and likely to cause user data loss if the PCC
device becomes permanently unavailable.

2) The space used by these PCC cached files may not be released.

In this patch, when a PCC backend is removed from a client, the
default action is to scan the PCC backend fs and uncache (detach and
remove) each PCC copy from PCC by FID.

We also add a "--keep|-k" option for PCC backend removal. It behaves
as before: it just removes the PCC backend, but retains the data in
the cache.

This patch also introduces a common library to scan the HSM
backend.

EX-2579 pcc: support a flatter HSM archive format

Add versioning (v1 and v2) to the HSM (PCC) archive format (directory
layout):
v1: (oid & 0xffff)/-/-/-/-/-/FID
v2: ((oid ^ seq) & 0xffff)/FID

v1 is the original layout and the default. v2 is the new layout which
should be selected for new installs.
Was-Change-Id: If660f3cf4c02469bb23e65a44f86f0346367adf6
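The v2 layout places each copy one directory level below the archive root, and XOR-ing seq into the bucket spreads FIDs that share an oid across directories. The sketch below computes such a relative path. The %04x directory name and the 0x<seq>:0x<oid>:0x<ver> spelling of the FID are formatting assumptions, and struct demo_fid is a hypothetical stand-in for struct lu_fid.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct lu_fid. */
struct demo_fid {
	uint64_t f_seq;
	uint32_t f_oid;
	uint32_t f_ver;
};

/* Build a v2-style "((oid ^ seq) & 0xffff)/FID" relative path. */
static int demo_v2_path(const struct demo_fid *fid, char *buf, size_t len)
{
	unsigned int dir = (unsigned int)((fid->f_oid ^ fid->f_seq) & 0xffff);

	return snprintf(buf, len, "%04x/0x%llx:0x%x:0x%x", dir,
			(unsigned long long)fid->f_seq,
			(unsigned int)fid->f_oid, (unsigned int)fid->f_ver);
}
```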

LU-12373 pcc: delete stale PCC copy when remove PCC backend

By default, when removing a PCC backend from a client, the action
is to scan the PCC backend FS, uncache (detach and remove) all
scanned PCC copies from PCC by FIDs.

However, during testing we found that some old, stale PCC copies
are not removed when an administrator runs "lctl pcc del|clear".
The reason is that these PCC copies were already detached from PCC
when the commands ran.

This patch fixes this bug: when removing a PCC backend from a
client, it also deletes all non-cached PCC copies from the PCC
backend to free up space.
Was-Change-Id: Id829abe7e6cb1294e6baea76452f4a9178711451

EX-bug-id: EX-2579
Test-Parameters: clientcount=3 testlist=sanity-pcc,sanity-pcc,sanity-pcc
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Ib4db36137c025fd78c7022c8b8c39b63e3b9ad4d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/38352
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 weeks ago | LU-10918 pcc: auto RO-PCC caching when O_RDONLY open files | 46/38346/33
Qian Yingjin [Wed, 22 Aug 2018 13:19:48 +0000 (21:19 +0800)]
LU-10918 pcc: auto RO-PCC caching when O_RDONLY open files

During the file open() operation, if the file is being opened with
the O_RDONLY flag and matches the predefined rule, it is prefetched
and attached into RO-PCC automatically.
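A read-only open has to be detected through the access-mode bits. O_RDONLY is 0 on Linux, so a test such as "flags & O_RDONLY" is always false. The sketch below shows the O_ACCMODE comparison. demo_auto_ro_attach() is hypothetical, and its rule_matches argument stands in for whatever rule evaluation PCC performs.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>

/*
 * Decide whether an open should trigger RO-PCC auto caching: the
 * access mode must be read-only and the predefined rule must match.
 */
static bool demo_auto_ro_attach(int open_flags, bool rule_matches)
{
	return (open_flags & O_ACCMODE) == O_RDONLY && rule_matches;
}
```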

Test-Parameters: clientcount=3 testlist=sanity-pcc,sanity-pcc,sanity-pcc
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Ib2c2ab51d67aed84eb7676c8df191faa33dfad39
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/38346
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
3 weeks ago | LU-10499 pcc: test interoperability with PCC-RO | 86/54386/14
Qian Yingjin [Thu, 25 Mar 2021 02:44:16 +0000 (10:44 +0800)]
LU-10499 pcc: test interoperability with PCC-RO

Against Lustre 2.15.0 servers, many of the PCC-RO-specific subtests
fail.
In this patch, each PCC-RO-related subtest adds a connect flag
check and is skipped when run against older servers without
PCC-RO support.

EX-4006 pcc: make "pccro=1" default

To avoid the risk that users accidentally configure PCC-RW and
potentially lose data if those client nodes go offline, this patch
makes "pccro=1" the default for PCC backends.

This patch adds a new option "--w|--write" for PCC-RW cache
mode when attaching a file.
It also makes "--r|--readonly" the default option for the PCC
attach command.
Was-Change-Id: I56735b0ebe8f0d9ef22b3f7e39e8cccfa3aad443

EX-8739 tests: skip sanity-pcc tests on el9.3

Skip sanity-pcc test_6, test_7a/7b, test_23, test_35 on RHEL9.3
clients due to continuous failures with PCC-RW, which is unused.

Skip sanity-pcc test_102 due to el9.3 fio io_uring bug.
Was-Change-Id: I76cbd0342788fff8b0167c0656e941f96d73fc48

EX-bug-id: EX-2860 EX-4006 EX-8739
Test-Parameters: clientdistro=el9.3 serverversion=EXA6 serverdistro=el8.8 testlist=sanity-pcc
Test-Parameters: clientdistro=el8.9 serverversion=EXA6 serverdistro=el8.8 testlist=sanity-pcc
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Ie4fc41b2dc51a038027009fbcc6e86f9d61cd54f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54386
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 weeks agoLU-17743 o2iblnd: fix privileged port check in passive_connect 66/55266/2
Serguei Smirnov [Thu, 30 May 2024 18:14:09 +0000 (11:14 -0700)]
LU-17743 o2iblnd: fix privileged port check in passive_connect

Check that the port is in the "privileged" range only if
kib_require_priv_port is set.
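The corrected condition can be sketched as below. `require_priv_port` is a hypothetical global standing in for the kib_require_priv_port tunable, and the port is assumed to be already converted to host byte order; IPPORT_RESERVED (1024) from <netinet/in.h> marks the end of the privileged range.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kib_require_priv_port tunable. */
static int require_priv_port = 1;

/* Accept a passive connection unless privileged ports are required and
 * the peer's source port lies outside [0, IPPORT_RESERVED). */
static bool passive_port_ok(unsigned short peer_port)
{
	if (!require_priv_port)
		return true;	/* check disabled: any port is accepted */

	return peer_port < IPPORT_RESERVED;
}
```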

Test-Parameters: trivial testlist=sanity-lnet
Fixes: 9b18afa ("LU-17743 ko2iblnd: move to struct lnet_nid")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I3ed9c174d983be68aecc4b8e12aaae7c096d26e8
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55266
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 weeks agoLU-17250 mgs: fix resource leak in name_create_osp 38/55238/2
Etienne AUJAMES [Wed, 29 May 2024 19:32:27 +0000 (21:32 +0200)]
LU-17250 mgs: fix resource leak in name_create_osp

This patch fixes a resource leak detected by Coverity:

CID 425355:    (RESOURCE_LEAK)
/lustre/mgs/mgs_llog.c: 189 in name_create_osp()

Fixes: d4682ff ("LU-17250 mgs: generate a new MDT configuration by copy")
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: I8e0cbc3507e5a9882b2cfadfd68aea318575fc7a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55238
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 weeks agoLU-17869 llapi: Fixed function header comments 80/55180/3
Rajeev Mishra [Wed, 22 May 2024 23:41:34 +0000 (23:41 +0000)]
LU-17869 llapi: Fixed function header comments

Updated input parameter descriptions for
`llapi_layout_v2_sanity` and `llapi_layout_sanity` functions.

Test-parameters: trivial
Fixes: ee7dfc5ad1 ("LU-17025 llapi: Verify stripe pool name")
Signed-off-by: Rajeev Mishra <rajeevm@hpe.com>
Change-Id: I72f4973d8be70ad60d088ea0e18d1e961f01cd50
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55180
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
3 weeks agoLU-15781 ldiskfs: support 5.15.0-106+ ubuntu kernels 78/55078/3
James Simmons [Wed, 15 May 2024 14:22:33 +0000 (10:22 -0400)]
LU-15781 ldiskfs: support 5.15.0-106+ ubuntu kernels

Starting with 5.15.0-106 kernels, the ext4-prealloc patch no
longer applies. Update ext4-prealloc.patch so it can build
again.

Test-Parameters: trivial
Change-Id: I958c64842c5e1dc8b974e8a188fa18541d458ab5
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55078
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 weeks agoLU-17356 quota: fix qmt_pool_new_conn 19/55019/3
Sergey Cheremencev [Mon, 6 May 2024 13:05:36 +0000 (16:05 +0300)]
LU-17356 quota: fix qmt_pool_new_conn

An incorrect argument passed to qmt_dom from
qmt_pool_new_conn caused a panic:

  qmt_sarr_get_idx()) ASSERTION( arr_idx <
    qpi->qpi_sarr.osts.op_count && arr_idx >= 0 )
    failed: idx invalid 0 op_count 0

Add conf-sanity_33d, which reproduces the above
assertion without the fix.

Fixes: 67f90e4288 ("LU-17034 quota: lqeg_arr memmory corruption")
Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Change-Id: I48801f1fb7e69097cbfbe083f1d31a4639d4bf4d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55019
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 weeks agoLU-17653 sec: avoid unlocking unlocked page 34/54734/6
Shaun Tancheff [Thu, 11 Apr 2024 14:03:30 +0000 (22:03 +0800)]
LU-17653 sec: avoid unlocking unlocked page

If a page has already been unlocked by @cl_2queue_disown after
explicitly writing the newly modified page, the subsequent page
unlock must not be performed.

Track the page locked state and do not unlock pages which
are not locked in ll_io_zero_page().
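A userspace model of the fix, using hypothetical `demo_*` types in place of kernel pages and the cl_page API: a local flag records whether this path still owns the page lock, and the final unlock is skipped once the disown step has released it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page with a lock bit; stands in for a kernel page. */
struct demo_page {
	bool locked;
	int unlock_calls;
};

static void demo_unlock_page(struct demo_page *pg)
{
	assert(pg->locked);	/* unlocking an unlocked page is a bug */
	pg->locked = false;
	pg->unlock_calls++;
}

/* Stand-in for a disown step that may release the page lock itself;
 * reports whether it did so. */
static bool demo_disown(struct demo_page *pg, bool releases_lock)
{
	if (releases_lock)
		demo_unlock_page(pg);
	return releases_lock;
}

/* Zero-page path: track whether we still hold the lock so the final
 * unlock is skipped if the disown step already dropped it. */
static void demo_io_zero_page(struct demo_page *pg, bool disown_unlocks)
{
	bool page_locked;

	pg->locked = true;	/* models lock_page() */
	page_locked = true;

	/* ... zero the tail of the page and queue it for write ... */

	if (demo_disown(pg, disown_unlocks))
		page_locked = false;

	if (page_locked)
		demo_unlock_page(pg);
}
```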

Fixes: adf46db962 ("LU-12275 sec: support truncate for encrypted file")
Test-Parameters: testlist=sanity-sec clientdistro=el9.3 env=ONLY=48a,ONLY_REPEAT=10
Test-Parameters: testlist=sanity-sec clientdistro=el9.3
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I6e121920c7e86e4d0004def77b0ce066ae2ba81a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54734
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Patrick Farrell <patrick.farrell@oracle.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 weeks agoLU-10003 lnet: migrate fail nid to Netlink 51/55051/5
James Simmons [Thu, 23 May 2024 21:25:30 +0000 (17:25 -0400)]
LU-10003 lnet: migrate fail nid to Netlink

We have the ability to make peers fail when they reach a specific
threshold, using an ioctl that currently only supports small NIDs.
Move to Netlink to be able to use large NIDs. The Netlink code is
also written to support more than one peer at a time, even though
the original userland tool only supports setting one peer at a
time.

Test-Parameters: trivial testlist=sanity-lnet
Change-Id: I8e5b38fcb582624530d208fac731183488662138
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55051
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 weeks agoLU-17450 test: disable test 56x 56xa 56xb in sanity 62/54962/3
Hongchao Zhang [Mon, 15 Apr 2024 19:48:38 +0000 (03:48 +0800)]
LU-17450 test: disable test 56x 56xa 56xb in sanity

Add the interop tests 56x, 56xa and 56xb to always_except until
the patch https://review.whamcloud.com/53997 in LU-17525 has landed.

Test-Parameters: trivial testlist=sanity
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: I99fa7be9dc7f50113d463aea4b321502b31d7348
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54962
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 weeks agoLU-16025 llite: allow unaligned DIO reaching EOF 18/54718/10
Bobi Jam [Wed, 10 Apr 2024 09:19:53 +0000 (17:19 +0800)]
LU-16025 llite: allow unaligned DIO reaching EOF

If the server does not support unaligned DIO, direct IO requires the
file offset and iov_iter count to be page aligned.

Old servers lack OBD_CONNECT2_UNALIGNED_DIO support and are therefore
deemed not to support unaligned DIO.

Mirror resync uses direct IO to read data from a mirror. If the file
size is not page aligned, the last read iov_iter is truncated by
commit 4468f6c9d9 and ends up with an unaligned iov_iter count, so
the read fails against old servers.

This patch fixes this interop issue by allowing unaligned DIO that
reaches the end of the file.
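One way to express the relaxed rule, as a sketch under stated assumptions: the offset must remain page aligned, and an unaligned count is accepted only when the IO reaches or passes EOF. `dio_allowed()` is a hypothetical helper; the actual llite check may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Return true if a direct IO of 'count' bytes at 'offset' may be sent.
 * Assumption: the offset must stay page aligned, but an unaligned count
 * is tolerated when the IO ends at or beyond EOF. */
static bool dio_allowed(uint64_t offset, uint64_t count, uint64_t file_size,
			uint64_t page_size, bool server_unaligned_dio)
{
	if (server_unaligned_dio)
		return true;	/* server handles any alignment */

	if (offset % page_size != 0)
		return false;

	if (count % page_size == 0)
		return true;

	/* Unaligned tail: only acceptable if it reaches end of file. */
	return offset + count >= file_size;
}
```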

Test-Parameters: testlist=sanity-sec serverversion=EXA6
Fixes: 7194eb6431 ("LU-13805 clio: bounce buffer for unaligned DIO")
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I229e193c3f0df0c21284991809573e312d18a556
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54718
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Patrick Farrell <patrick.farrell@oracle.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks agoLU-17676 build: configure should prefer to ask if 71/54571/2
Shaun Tancheff [Tue, 26 Mar 2024 08:04:24 +0000 (15:04 +0700)]
LU-17676 build: configure should prefer to ask if

In general, configure messages should ask 'if <something>',
as configure is asking the question and trying to automatically
determine the answer.

In most cases, prefer 'if <something>'.

This updates configure messages to ask if ...

Test-Parameters: trivial
HPE-bug-id: LUS-12117
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I11a42583faf2f88194c93a9aeea3b64f0d95f0eb
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54571
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks agoLU-17836 build: allow builds without libpthread 62/55062/2
Shaun Tancheff [Thu, 9 May 2024 10:10:54 +0000 (17:10 +0700)]
LU-17836 build: allow builds without libpthread

Configure currently allows --disable-libpthread. It is not
frequently used, but may be needed by some users.

Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I32049bab8e0f278b4c80fe37839c8c90c45d4c74
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55062
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks agoLU-16327 build: Ubuntu jammy 5.19 server support 25/50225/18
Shaun Tancheff [Sat, 2 Mar 2024 13:08:15 +0000 (20:08 +0700)]
LU-16327 build: Ubuntu jammy 5.19 server support

The Ubuntu 5.19 server ldiskfs series is close to the one
for the mainline LTS 6.1.38 kernel.

Updated for Jammy 5.19.0-46-generic kernel
   ext4-mballoc-extra-checks.patch
   ext4-prealloc.patch

Tested with Ubuntu-hwe-5.19-5.19.0-46.47_22.04.1
Ubuntu Jammy 5.19.0-46-generic kernel

Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Iff2f3b29a7cf4778abb69505143ca2ea32022edf
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50225
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
4 weeks agoLU-15496 tests: fix sanity/398c to use proper OSC name 32/55132/4
Andreas Dilger [Thu, 16 May 2024 19:57:42 +0000 (21:57 +0200)]
LU-15496 tests: fix sanity/398c to use proper OSC name

For ppc64le and aarch64 clients, the OSC import instance name does
not have "ffff" at the start, so use the proper device name for this
subtest.

Clean up the rest of test_398c to meet modern test code style.

Test-Parameters: trivial testlist=sanity env=ONLY=398c clientarch=ppc64le clientdistro=el8.8
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: If8c72fa9b13eace009f39daf82454221eba6761b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55132
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Alex Deiter
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks agoLU-17844 target: remove some LCONSOLE_ERROR_MSG() 13/55113/2
Timothy Day [Wed, 15 May 2024 03:41:59 +0000 (03:41 +0000)]
LU-17844 target: remove some LCONSOLE_ERROR_MSG()

Replace LCONSOLE_ERROR_MSG() with LCONSOLE_ERROR().
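The difference between the two macros can be illustrated with hypothetical DEMO_ stand-ins built on vsnprintf(). This is a simplification: the real LCONSOLE_ERROR_MSG() goes through CDEBUG and prints its numeric ID in its own format. The idea carries over, though: the _MSG variant requires a magic message number and the plain variant does not.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

static char console_buf[256];

/* Format into console_buf; msg_id < 0 means "no message ID". */
static void console_emit(int msg_id, const char *fmt, ...)
{
	va_list ap;
	int n = 0;

	if (msg_id >= 0)
		n = snprintf(console_buf, sizeof(console_buf), "%#x: ", msg_id);
	if (n < 0 || (size_t)n >= sizeof(console_buf))
		n = 0;

	va_start(ap, fmt);
	vsnprintf(console_buf + n, sizeof(console_buf) - n, fmt, ap);
	va_end(ap);
}

/* Hypothetical stand-ins: only the _MSG form carries a numeric ID. */
#define DEMO_CONSOLE_ERROR_MSG(id, fmt, ...) console_emit((id), fmt, ##__VA_ARGS__)
#define DEMO_CONSOLE_ERROR(fmt, ...)         console_emit(-1, fmt, ##__VA_ARGS__)
```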

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I8de0221e29c8ec70759eea38a67001f283f6fe39
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55113
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks agoLU-17848 osd: deduplicate osd_fid_init()/fini()/alloc() 10/55110/6
Timothy Day [Tue, 14 May 2024 20:59:16 +0000 (20:59 +0000)]
LU-17848 osd: deduplicate osd_fid_init()/fini()/alloc()

These functions are identical in the two OSD implementations, so
they can be moved to lustre/fid/ and made generic.

These functions are forced to live in fid.ko rather than obdclass.ko
due to module dependency issues.

Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: If97ca5615d9bdfe0fe9886686e9ce3ec2b740f7d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55110
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks agoLU-17850 build: prefer LINUXRELEASE over uname -r 08/55108/2
Jian Yu [Tue, 14 May 2024 17:50:20 +0000 (10:50 -0700)]
LU-17850 build: prefer LINUXRELEASE over uname -r

In a container or chroot environment, "uname -r" reports
the host kernel version instead of the target kernel version.
We should use the LINUXRELEASE variable, which is configured in
config/lustre-build-linux.m4 with the value from UTS_RELEASE.

Change-Id: Iaa48027f5ae873e1298695a264db1c351d9eac5c
Test-Parameters: trivial
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55108
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: ake sandgren <ake.sandgren@hpc2n.umu.se>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks agoLU-17848 osd: purge key_rec() from dt API 99/55099/2
Timothy Day [Tue, 14 May 2024 02:19:28 +0000 (02:19 +0000)]
LU-17848 osd: purge key_rec() from dt API

This is a pointless function pointer field that has
spawned a number of pointless function implementations.
Even the documentation has no idea why this exists.

Remove everything to do with key_rec().

Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I7f84853a3fa285bf2ac53661b30384d099be1b91
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55099
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks agoLU-17844 lnet: remove a few LCONSOLE_ERROR_MSG() in api-ni.c 98/55098/2
Timothy Day [Tue, 14 May 2024 01:23:34 +0000 (01:23 +0000)]
LU-17844 lnet: remove a few LCONSOLE_ERROR_MSG() in api-ni.c

These magic numbers aren't so magical anymore. Just
use LCONSOLE_ERROR().

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I629ef2ceaa51dc1422d87dc056de2c46079438c0
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55098
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks agoLU-17844 lnet: remove a few LCONSOLE_ERROR_MSG() 97/55097/2
Timothy Day [Tue, 14 May 2024 01:12:05 +0000 (01:12 +0000)]
LU-17844 lnet: remove a few LCONSOLE_ERROR_MSG()

These magic numbers aren't so magical anymore. Just
use LCONSOLE_ERROR().

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I6c1d46449487127545d785a9fdc368005197d3e2
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55097
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks agoLU-17847 sec: wake up for rsc entry 94/55094/2
Yang Sheng [Mon, 13 May 2024 14:44:16 +0000 (22:44 +0800)]
LU-17847 sec: wake up for rsc entry

We should wake up the waiter after rsc do_upcall.
Otherwise it may be stuck for a long time.
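The pattern being fixed is a missed wakeup. This sketch models it with POSIX threads rather than the kernel cache code, and every `demo_*` name is hypothetical: the upcall completion publishes its result and broadcasts, so waiters do not sit until their own timeout.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical cache entry whose contents are filled in by an upcall. */
struct demo_entry {
	pthread_mutex_t lock;
	pthread_cond_t  waitq;
	bool            ready;
	int             value;
};

/* Upcall completion: publish the result and wake every waiter. Without
 * the broadcast, waiters would only notice on their own timeout. */
static void demo_upcall_done(struct demo_entry *e, int value)
{
	pthread_mutex_lock(&e->lock);
	e->value = value;
	e->ready = true;
	pthread_cond_broadcast(&e->waitq);
	pthread_mutex_unlock(&e->lock);
}

/* Waiter: sleep until the entry is ready, then return its value. */
static int demo_wait_entry(struct demo_entry *e)
{
	int v;

	pthread_mutex_lock(&e->lock);
	while (!e->ready)
		pthread_cond_wait(&e->waitq, &e->lock);
	v = e->value;
	pthread_mutex_unlock(&e->lock);
	return v;
}

/* Thread body that performs the (simulated) upcall. */
static void *demo_upcall_thread(void *arg)
{
	demo_upcall_done(arg, 42);
	return NULL;
}
```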

Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: I87d1e5a9687056c8ee2428aad45dafda16247de2
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55094
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks agoLU-17844 lnds: remove a few LCONSOLE_ERROR_MSG() 85/55085/3
Timothy Day [Mon, 13 May 2024 03:51:16 +0000 (03:51 +0000)]
LU-17844 lnds: remove a few LCONSOLE_ERROR_MSG()

I doubt these magic numbers help anyone.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I7c2505ec0eb7fc6524a13d4bf330a72188a26b4e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55085
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 weeks agoLU-16518 lnet: fix incorrectly initialized variables 84/55084/2
Timothy Day [Mon, 13 May 2024 03:39:08 +0000 (03:39 +0000)]
LU-16518 lnet: fix incorrectly initialized variables

Clang 12 complained about an uninitialized 'off' in
brw_test.c; fix it by removing the duplicate declaration.

Also, initialize 'rc' in yaml_import_global_settings().
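The bug class behind this warning is an inner declaration shadowing an outer variable of the same name, so the outer one is never assigned. The sketch below is illustrative only: it is not the brw_test.c code, and `page_offset_demo()` is hypothetical. It shows the single-declaration form the fix moves to.

```c
#include <assert.h>

/* Illustration of the bug class only (not the brw_test.c code):
 *
 *	int off;
 *	if (flag) {
 *		int off = base * 2;	<- shadows the outer 'off'
 *	}
 *	return off;			<- outer 'off' may be uninitialized
 *
 * The fix keeps one declaration and assigns it on every path. */
static int page_offset_demo(int flag, int base)
{
	int off = 0;

	if (flag)
		off = base * 2;

	return off;
}
```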

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I893149110120975c91839e73241b311a53c6e195
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55084
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 weeks agoLU-17490 tests: update .gitignore 82/55082/2
Timothy Day [Mon, 13 May 2024 03:27:11 +0000 (03:27 +0000)]
LU-17490 tests: update .gitignore

Otherwise, we'll see this monitor_lustrefs binary in the
build tree.

Fixes: 7101742 ("LU-17490 tests: verify fanotify works for lustre")
Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I129c12515e607e97ab42917220a439ebb1823e8c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55082
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks agoLU-17646 llapi: lustreapi: add FID in error messages 74/55074/2
Alexandre Ioffe [Sat, 11 May 2024 01:28:05 +0000 (18:28 -0700)]
LU-17646 llapi: lustreapi: add FID in error messages

Use llapi_fd2fid() to print FID in llapi_lease_set() and
llapi_lease_check() error messages.
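A sketch of the resulting message shape. `struct demo_fid` is a hypothetical local mirror of the three fields of struct lu_fid, formatted the way Lustre's DFID macro prints a FID. In the real code the FID would come from llapi_fd2fid(), which is not called here so the example stays self-contained.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Mirrors the layout of struct lu_fid (sequence, object id, version). */
struct demo_fid {
	uint64_t f_seq;
	uint32_t f_oid;
	uint32_t f_ver;
};

/* Format an error message that names the file by FID, e.g. for a
 * failed lease operation. Returns the snprintf() result. */
static int format_lease_error(char *buf, size_t len, const char *op,
			      const struct demo_fid *fid, int rc)
{
	return snprintf(buf, len, "%s failed on [%#llx:0x%x:0x%x]: rc = %d",
			op, (unsigned long long)fid->f_seq, fid->f_oid,
			fid->f_ver, rc);
}
```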

Test-Parameters: trivial
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Change-Id: Iac97ea721860652e304c674007ac7646d183e2fd
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55074
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks agoLU-17841 kfilnd: Race between hello and tagged RMA 72/55072/2
Chris Horn [Fri, 3 May 2024 19:22:12 +0000 (13:22 -0600)]
LU-17841 kfilnd: Race between hello and tagged RMA

A race exists between processing an incoming hello and initiating the
RMA for bulk operations that can result in RKEY re-use.

Initiator:
Posts tagged receive with RKEY based on peerA::kp_local_session_key X
and tn_mr_key Y
Bulk request (1) sent to target
Some earlier transaction fails:
 - Deletes peerA::kp_local_session_key X
 - Creates peerA::kp_local_session_key Z
 - HELLO request sent to peerA

Target:
Processes HELLO request - updates kp_remote_session_key from X to Z.
Handles bulk request (1)
Performs RMA using session key Z and tn_mr_key Y, but completion is
delayed

Initiator:
Bulk request (1) hits timeout
 - Tagged receive canceled, and tn_mr_key Y is released
Posts tagged receive with RKEY based on peerA::kp_local_session_key Z
and tn_mr_key Y
Bulk request (2) sent to target

Target:
RMA for (1) is completed using the RKEY for (2)

The solution is to create a new bulk request message that contains
the session key used to set up the tagged buffer on the initiator.
This is compared against the session key exchanged during hello
handshake prior to initiating the RMA. If there's a mismatch
then the RMA is failed and the transaction is finalized. The session
key stored in the new bulk request is also used to generate the RKEY
rather than using the session key stored in the kfilnd_peer. This is
a protocol change so the KFILND_MSG_VERSION is bumped.
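A sketch of the target-side check. The RKEY layout (session key shifted above a 16-bit MR key) and the -ESTALE return value are assumptions for illustration, not the kfilnd encoding. What matters is that the key carried in the bulk request is compared against the peer's current remote session key before the RMA, and that the RKEY is built from the request's key.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical RKEY layout: session key in the high bits, MR key in the
 * low 16 bits. The real kfilnd bit layout may differ. */
#define DEMO_MR_KEY_BITS 16

static uint64_t demo_make_rkey(uint32_t session_key, uint16_t mr_key)
{
	return ((uint64_t)session_key << DEMO_MR_KEY_BITS) | mr_key;
}

/* Target side: the bulk request carries the initiator's session key.
 * Refuse the RMA if it no longer matches the key learned from the most
 * recent hello, and build the RKEY from the key in the request rather
 * than from the peer structure, which may change concurrently. */
static int demo_start_rma(uint32_t req_session_key, uint32_t peer_remote_key,
			  uint16_t mr_key, uint64_t *rkey)
{
	if (req_session_key != peer_remote_key)
		return -ESTALE;	/* hypothetical errno choice */

	*rkey = demo_make_rkey(req_session_key, mr_key);
	return 0;
}
```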

During testing it was found that kfilnd_msg::version was not
being set correctly for immediate and bulk messages. To allow
interoperability, kfilnd_msg::version must be set to the version
negotiated during the hello handshake, which is stored in
kfilnd_peer::kp_version. This has been fixed. This issue only impacts
kfilnd peers with message version > 1, so backwards compatibility
between versions 1 and 2 will work correctly.

The KFILND_TN_DEBUG macro is modified to print additional information
that was useful when debugging this issue.

Lastly, TN_EVENT_TAG_TX_OK was missing from tn_event_to_str(), so
it is added.

HPE-bug-id: LUS-12317
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I0b52a8367cd45b7587ba9ec3fa5212f548bebb57
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55072
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks agoLU-17840 kfilnd: Race between peer del RKEY reuse 71/55071/2
Chris Horn [Wed, 1 May 2024 16:33:33 +0000 (11:33 -0500)]
LU-17840 kfilnd: Race between peer del RKEY reuse

kfilnd_peer object deletion is a two step process. First a flag
(kfilnd_peer::kp_remove_peer = 1) is atomically set in the object to
mark it for removal via a call to kfilnd_peer_del(). Then, the next
caller of kfilnd_peer_put() will atomically modify this flag
(kfilnd_peer::kp_remove_peer = 2) again to denote that it is removing
the peer from the rhashtable before actually removing the object.

The window between marking a peer for deletion and removing it from
the peer cache allows a race where an RKEY may be re-used. For
example:

Thread 1: Posts tagged receive with RKEY based on
      peerA::kp_local_session_key X and tn_mr_key Y
Thread 1: Cancels tagged receive
Thread 1: kfilnd_peer_del() -> peerA::kp_remove_peer = 1
Thread 2: kfilnd_peer_put() -> peerA::kp_remove_peer = 2
Thread 1: kfilnd_peer_put() -> kfilnd_tn_finalize() -> releases
tn_mr_key Y
Thread 3: allocates tn_mr_key Y
Thread 3: Fetches peerA with kp_local_session_key X
Thread 2: Removes peerA from rhashtable

At this point, thread 3 has the same RKEY used by thread 1.

The fix is to check on the peer lookup path whether a peer found in
the rhashtable has been marked for removal. If it has then we perform
the lookup again. We do this in a loop until either no peer is found,
or a peer is found that has not been marked for removal.
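The retry loop can be sketched as follows. The lookup is passed in as a hypothetical function pointer so that a scripted lookup can simulate another thread removing the dying peer between attempts. `demo_peer_get()` and the other `demo_*` names are illustrative, not kfilnd symbols.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical peer; remove_peer mirrors kp_remove_peer (0 = live). */
struct demo_peer {
	int remove_peer;
	int id;
};

typedef struct demo_peer *(*demo_lookup_fn)(void *ctx);

/* Repeat the lookup until it finds nothing or finds a peer that has not
 * been marked for removal. A marked peer is about to leave the table,
 * so its session key must not be used to build a new RKEY. */
static struct demo_peer *demo_peer_get(demo_lookup_fn lookup, void *ctx)
{
	struct demo_peer *peer;

	for (;;) {
		peer = lookup(ctx);
		if (peer == NULL || peer->remove_peer == 0)
			return peer;
		/* real code would drop its reference before retrying */
	}
}

/* Scripted lookup for demonstration: returns successive results,
 * simulating a concurrent removal of the dying peer. */
struct demo_script {
	struct demo_peer **results;
	int next;
};

static struct demo_peer *demo_scripted_lookup(void *ctx)
{
	struct demo_script *s = ctx;

	return s->results[s->next++];
}
```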

To reduce the size of this window, the process for kfilnd_peer
deletion is modified so that the first thread to call
kfilnd_peer_del() will also remove the peer from the rhashtable.

HPE-bug-id: LUS-12312
Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ibbbb38cd5ee2d90956791f8350dafbee5fe5d888
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55071
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks agoLU-17839 kfilnd: Wait for hello response to mark peer uptodate 70/55070/2
Chris Horn [Wed, 15 Nov 2023 19:22:24 +0000 (12:22 -0700)]
LU-17839 kfilnd: Wait for hello response to mark peer uptodate

We need to ensure that a target peer has processed a hello request
from the sender before initiating network transactions. This can be
positively affirmed only when we receive a hello response message
from the target.

There are two issues where messages may be dropped because a hello
request or response has not yet been processed.

Issue 1 - Race:
A@kfi -> HELLO REQ -> B@kfi
A@kfi <- HELLO REQ <- B@kfi
A@kfi processes HELLO REQ, marks B@kfi uptodate
A@kfi -> MSG -> B@kfi
A@kfi -> HELLO RSP -> B@kfi

MSG is dropped by B@kfi because it did not process A@kfi's HELLO REQ
or RSP.

Issue 2 - HELLO target already considers originator as uptodate:
A@kfi -> HELLO REQ -> B@kfi
B@kfi processes HELLO REQ
A@kfi <- MSG <- B@kfi
A@kfi <- HELLO RSP <- B@kfi

MSG is dropped by A@kfi because it did not process B@kfi's HELLO RSP.

We resolve the first race by waiting for the hello responses to
be processed before marking the peer as uptodate. To ensure that
we will always receive a hello response, the target of a hello request
must initiate its own handshake with the originator. When we receive
a hello request from a new peer, instead of setting the peer state to
KP_STATE_UPTODATE we set it to KP_STATE_WAIT_RSP. We can process RX
events for a peer in this state, but sends to this peer will be
throttled until we receive a hello response from it.
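A reduced model of the new state handling, with hypothetical `DEMO_` names following the states in the commit text: a hello request from a new peer moves it to WAIT_RSP and triggers our own handshake, receives are allowed in that state, and sends are held back until a hello response arrives.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of peer states, named after the commit text. */
enum demo_peer_state {
	DEMO_KP_STATE_NEW,
	DEMO_KP_STATE_WAIT_RSP,
	DEMO_KP_STATE_UPTODATE,
};

enum demo_hello_type { DEMO_HELLO_REQ, DEMO_HELLO_RSP };

struct demo_kp {
	enum demo_peer_state state;
	bool sent_own_hello;	/* we initiated our own handshake */
};

/* A request alone does not prove the peer processed our hello, so it
 * only moves a new peer to WAIT_RSP and makes us start our own
 * handshake; only a response marks the peer uptodate. */
static void demo_handle_hello(struct demo_kp *kp, enum demo_hello_type type)
{
	if (type == DEMO_HELLO_RSP) {
		kp->state = DEMO_KP_STATE_UPTODATE;
		return;
	}

	if (kp->state == DEMO_KP_STATE_NEW) {
		kp->state = DEMO_KP_STATE_WAIT_RSP;
		kp->sent_own_hello = true;	/* models posting a HELLO REQ */
	}
}

/* RX is allowed once the handshake has started; TX waits for UPTODATE. */
static bool demo_can_rx(const struct demo_kp *kp)
{
	return kp->state != DEMO_KP_STATE_NEW;
}

static bool demo_can_tx(const struct demo_kp *kp)
{
	return kp->state == DEMO_KP_STATE_UPTODATE;
}
```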

To resolve the second race we need an additional change to allow
TN_EVENT_RX_OK events to be replayed until the hello response is
received and processed. However, this could result in state changes
that invalidate RX_OK events on replay. Thus, this race will remain
open.

Add CFS_KFI_REPLAY_RX_HELLO_REQ fail_loc to delay the processing of
an incoming hello request.

Add CFS_KFI_FAIL_MSG_TYPE_EAGAIN to delay the sending of specified
message types.

HPE-bug-id: LUS-11673
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Iaaa6b4a533dbcf13cd2a8c1365a89ba521d70af0
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55070
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks agoLU-17838 kfilnd: Prevent simultaneous hellos 69/55069/2
Chris Horn [Tue, 14 Nov 2023 16:35:15 +0000 (09:35 -0700)]
LU-17838 kfilnd: Prevent simultaneous hellos

There is a race condition with checking, setting and clearing the
kp_hello_pending flag that can result in multiple hello requests being
sent for the same peer. If no hello response is received after the
LND timeout then multiple threads can race with each other in
clearing the kp_hello_pending flag and posting a new hello request
message.

Thread 1: sets kp_hello_pending and posts hello request message
<No hello response received after LND timeout>
Thread 2: Clears kp_hello_pending, then sets kp_hello_sending
Thread 3: Clears kp_hello_pending, then sets kp_hello_sending
Thread 2/3: Both post hello request message

To resolve this issue we change kp_hello_pending from a simple binary
to instead track three states of a hello request: KP_HELLO_NONE,
KP_HELLO_INIT, and KP_HELLO_SENT. State is NONE when there is no
hello in the process of being sent. State is INIT when a thread is
allocating a HELLO request in preparation for sending. State is SENT
when the HELLO request is being posted. Now, when several threads
detect that no hello response has been received within the LND
timeout, only one of them is able to transition the hello state
from SENT to NONE.
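The single-winner property comes from compare-and-swap. A sketch with C11 atomics, using hypothetical `demo_hello_*` helpers and DEMO_HELLO_* constants in place of the KP_HELLO_* states: only one thread can move NONE to INIT to start a hello, and only one of several timed-out threads can move SENT back to NONE.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Three hello states, named after the commit text. */
enum { DEMO_HELLO_NONE, DEMO_HELLO_INIT, DEMO_HELLO_SENT };

/* Begin a hello: only the thread that wins NONE -> INIT may allocate
 * and post the request. */
static bool demo_hello_begin(atomic_int *state)
{
	int expected = DEMO_HELLO_NONE;

	return atomic_compare_exchange_strong(state, &expected, DEMO_HELLO_INIT);
}

/* Called once the request has been posted. */
static void demo_hello_posted(atomic_int *state)
{
	atomic_store(state, DEMO_HELLO_SENT);
}

/* Timeout path: many threads may notice the missing response, but only
 * one CAS from SENT -> NONE succeeds, so at most one of them goes on to
 * start a new hello. */
static bool demo_hello_timeout_reset(atomic_int *state)
{
	int expected = DEMO_HELLO_SENT;

	return atomic_compare_exchange_strong(state, &expected, DEMO_HELLO_NONE);
}
```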

Add CFS_KFI_REPLAY_IDLE_EVENT fail_loc that can be used to delay
processing of TNs in the idle state depending on the TN event
value specified in fail_val.

HPE-bug-id: LUS-11974
Test-Parameters: trivial
Fixes: 11a32d886b ("LU-16213 kfilnd: Allow one HELLO in-flight per peer")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I4dddf57971848a80a550df7523d55ad03f4a083e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55069
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
Reviewed-by: Caleb Carlson <caleb.carlson@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks agoLU-17837 kfilnd: Set dev_cpt 68/55068/2
Ron Gredvig [Fri, 20 Oct 2023 19:46:48 +0000 (19:46 +0000)]
LU-17837 kfilnd: Set dev_cpt

The dev_cpt value was not being set by kfilnd.

Query the kfabric provider to get the low level
device. Using the device, determine the dev_cpt.

This change is backwards compatible with older
versions of the kfabric provider. If the query
is not supported the dev_cpt is set to
CFS_CPT_ANY.
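A sketch of the fallback logic. The provider query is modeled as a hypothetical function pointer that returns -EOPNOTSUPP on older providers, and the NUMA-node-to-CPT mapping is a placeholder. DEMO_CPT_ANY uses the same value (-1) as libcfs' CFS_CPT_ANY.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEMO_CPT_ANY (-1)	/* same value as libcfs CFS_CPT_ANY */

/* Hypothetical provider query: returns 0 and fills *numa_node on
 * success, or -EOPNOTSUPP when the provider lacks the query. */
typedef int (*demo_query_dev_fn)(int *numa_node);

/* Hypothetical NUMA-node -> CPT mapping (1:1 for illustration). */
static int demo_node_to_cpt(int numa_node)
{
	return numa_node;
}

static int demo_get_dev_cpt(demo_query_dev_fn query)
{
	int node;

	if (query == NULL || query(&node) != 0)
		return DEMO_CPT_ANY;	/* older provider: no placement info */

	return demo_node_to_cpt(node);
}

static int demo_old_provider(int *numa_node)
{
	(void)numa_node;
	return -EOPNOTSUPP;
}

static int demo_new_provider(int *numa_node)
{
	*numa_node = 1;
	return 0;
}
```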

HPE-bug-id: LUS-11352
Test-Parameters: trivial
Signed-off-by: Ron Gredvig <ron.gredvig@hpe.com>
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Id8af36b7aa5e89969de93dc8db9c0bba03236140
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55068
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Caleb Carlson <caleb.carlson@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 weeks agoLU-15988 osp: don't print nid on -ESTALE 49/55049/2
Lai Siyao [Fri, 3 May 2024 00:27:04 +0000 (20:27 -0400)]
LU-15988 osp: don't print nid on -ESTALE

osp_send_update_req() should not access the import upon -ESTALE,
because this MDT may be unmounting.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ibd869e4e8da4f90ffd608a36d866264d5d552d0e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55049
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks agoLU-17000 obdclass: Add NULL check for parms under class_exp2cliimp 30/55030/3
Arshad Hussain [Tue, 7 May 2024 05:29:03 +0000 (01:29 -0400)]
LU-17000 obdclass: Add NULL check for parms under class_exp2cliimp

This patch adds a NULL pointer check for the parameters
passed to class_exp2cliimp().
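The shape of the check, using minimal hypothetical stand-ins for obd_export, obd_device and obd_import (the real structures are far larger, and the import lives in a client-specific union): every pointer is tested before it is dereferenced, rather than dereferenced first and tested afterwards, which is the pattern Coverity flags.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal hypothetical stand-ins for obd_export/obd_device/obd_import. */
struct demo_import { int id; };
struct demo_obd { struct demo_import *cl_import; };
struct demo_export { struct demo_obd *exp_obd; };

/* Test each pointer before dereferencing it. */
static struct demo_import *demo_exp2cliimp(struct demo_export *exp)
{
	if (exp == NULL || exp->exp_obd == NULL)
		return NULL;

	return exp->exp_obd->cl_import;
}
```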

Test-Parameters: trivial
CoverityID: 424699 ("Dereference before null check")
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ie7d96c10086959a3f31b290d56621261da480a36
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55030
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 weeks ago LU-17817 llapi: avoid potential NULL component 28/55028/4
Rajeev Mishra [Mon, 6 May 2024 20:12:54 +0000 (20:12 +0000)]
LU-17817 llapi: avoid potential NULL component

Avoid a potential NULL dereference of the layout component in
llapi_layout_file_open() and llapi_layout_file_comp_add().

CoverityID: 425352 ("Dereferencing 'comp', which is known to be NULL")
HPE-bug-id: LUS-12326
Signed-off-by: Rajeev Mishra <rajeevm@hpe.com>
Change-Id: Id773fdbf031a2d11256140590f570f90da46ec3a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55028
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 weeks ago LU-17816 llapi: ensure pool name is nul terminated 18/55018/2
Shaun Tancheff [Mon, 6 May 2024 09:26:22 +0000 (16:26 +0700)]
LU-17816 llapi: ensure pool name is nul terminated

strncpy() usage is inconsistent about the size of the pool name,
and sometimes forgets to ensure that a NUL byte is placed at the
end of the copy.

CoverityID: 397181 ("Buffer not null terminated (BUFFER_SIZE)")
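strncpy() leaves the destination unterminated when the source fills the buffer. A minimal sketch of the safe pattern, assuming a hypothetical 16-byte pool-name buffer (POOL_NAME_BUF and copy_pool_name() are illustrative names, not llapi functions):

```c
#include <string.h>

/* Hypothetical buffer size: 15 name characters plus the NUL byte. */
#define POOL_NAME_BUF 16

/*
 * strncpy() does not write a terminating NUL when src fills all of the
 * bytes it is allowed to copy, so copy at most size - 1 bytes and
 * always terminate explicitly.
 */
static void copy_pool_name(char *dst, size_t size, const char *src)
{
	if (size == 0)
		return;
	strncpy(dst, src, size - 1);
	dst[size - 1] = '\0';
}
```

Long names are silently truncated here; a caller that must reject them would compare strlen(src) against size - 1 first.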

Also clean up a case of checking that an unsigned value is >= 0,
which is always true.

CoverityID: 397820 ("Unsigned compared against 0 (NO_EFFECT)")

Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Idec7adaf89c9dabc0275687c4a069fc8fa63e7a7
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55018
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks ago LU-17504 libcfs: safer LIBCFS_ALLOC 15/55015/2
Shaun Tancheff [Mon, 6 May 2024 05:11:15 +0000 (12:11 +0700)]
LU-17504 libcfs: safer LIBCFS_ALLOC

Make the LIBCFS_ALLOC() family of macros safer by adding
parentheses around arguments such as (size) to avoid unintended
expansion.

CoverityID: 415056 ("Integer handling issues")
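To show why the parentheses matter, here is a sketch with two hypothetical macros (not the real LIBCFS_ALLOC definitions). Macro arguments are pasted in textually, so an expression argument such as 3 + 1 can be split apart by operator precedence:

```c
/* Unsafe: arguments are substituted without parentheses. */
#define ARRAY_BYTES_UNSAFE(count, size)	count * size

/* Safe: each argument, and the whole result, is parenthesized. */
#define ARRAY_BYTES_SAFE(count, size)	((count) * (size))
```

With these definitions, ARRAY_BYTES_UNSAFE(2, 3 + 1) expands to 2 * 3 + 1, which is 7 rather than the intended 8.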

Fixes: 718e3f3e68 ("LU-17504 build: fix gcc-13 [-Werror=stringop-overread] error")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I9701f87025bc5ce038a6bf34413b64a3f019d998
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55015
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks ago LU-17815 tests: skip conf-sanity.sh test_5h 12/55012/4
Emoly Liu [Mon, 6 May 2024 03:15:37 +0000 (11:15 +0800)]
LU-17815 tests: skip conf-sanity.sh test_5h

Skip conf-sanity.sh test_5h because it consistently caused test_102
and test_108 to fail in recent interop testing.

Test-Parameters: trivial serverbuildno=170 serverjob=lustre-b2_12 serverdistro=el7.9 testlist=conf-sanity env=ONLY="5h 102 108",HONOR_EXCEPT=y
Test-Parameters: trivial testlist=conf-sanity

Fixes: d1b5146eda ("LU-12206 mdt: mdt_init0 failure handling")

Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Change-Id: Id6ffe8b5d88e1d79883cbf2d84d73796945fc734
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55012
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks ago LU-17791 build: use external o2ib path for ko2iblnd.ko 84/54984/2
Shaun Tancheff [Thu, 2 May 2024 09:20:49 +0000 (16:20 +0700)]
LU-17791 build: use external o2ib path for ko2iblnd.ko

The O2IBPATH variable was split into INT_O2IBPATH used
for in-kernel o2iblnd and EXT_O2IBPATH for the external
o2iblnd driver.

Correct a case where the transition from @O2IBPATH@ to
@EXT_O2IBPATH@ was missed when deb packaging support for
multiple LNDs was initially added.

Fixes: 95287378fab ("LU-16967 build: Separate lnet LND deb packaging")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I94ff393a437c6875cda9db266ab636fd88871188
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54984
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Shuichi Ihara <sihara@ddn.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Shuichi Ihara <sihara@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks ago LU-17756 lod: add tunable lod.*.max_stripes_per_mdt 45/54945/4
Lai Siyao [Thu, 25 Apr 2024 08:15:49 +0000 (04:15 -0400)]
LU-17756 lod: add tunable lod.*.max_stripes_per_mdt

Add a tunable lod.*.max_stripes_per_mdt for directory overstriping.
The default value is LMV_MAX_STRIPES_PER_MDT(5).
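The tunable bounds how many stripes a directory may place on a single MDT. A hedged sketch of the resulting limit, assuming the total stripe count is simply capped at mdt_count * max_stripes_per_mdt; dir_stripe_count_ok() is hypothetical, and the real LOD allocation logic may differ:

```c
#include <stdbool.h>

/* Default per-MDT cap, from the commit message: LMV_MAX_STRIPES_PER_MDT (5). */
#define DEFAULT_MAX_STRIPES_PER_MDT 5

/*
 * With directory overstriping, one MDT may hold several stripes of the
 * same directory, but at most max_per_mdt of them, so the total stripe
 * count is bounded by mdt_count * max_per_mdt.
 */
static bool dir_stripe_count_ok(unsigned int stripes, unsigned int mdt_count,
				unsigned int max_per_mdt)
{
	return stripes <= mdt_count * max_per_mdt;
}
```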

Add sanity tests 300uh and 300ui.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Id8199f01f5e2d62ead6bf43d239eee8ec1e4cbb5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54945
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 weeks ago LU-17431 utils: adapt dynamic use in nodemap_cmd 00/55000/2
Sebastien Buisson [Tue, 30 Apr 2024 16:08:22 +0000 (18:08 +0200)]
LU-17431 utils: adapt dynamic use in nodemap_cmd

In nodemap_cmd(), try to detect if we are running on an MGS
before using the dynamic parameter.

Test-Parameters: trivial
Fixes: fecc3bd4e2 ("LU-17431 utils: add 'dynamic' parameter to nodemap_cmd")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I63a727491c839e457e44eaf1f4b4d11b164fd8b4
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55000
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks ago LU-17431 utils: fix various ret codes in lctl 01/54501/6
Sebastien Buisson [Wed, 13 Mar 2024 13:19:25 +0000 (14:19 +0100)]
LU-17431 utils: fix various ret codes in lctl

When nodemap_cmd() returns an error, use errno to print the
correct return code.
Make get_mgs_device() return an errno in case of failure.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I74f6e27fc17158bf454f0d8be490a087aa137079
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54501
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
4 weeks ago LU-17431 nodemap: sanity check ioctl user buffer 28/54928/4
Sebastien Buisson [Fri, 26 Apr 2024 14:49:17 +0000 (16:49 +0200)]
LU-17431 nodemap: sanity check ioctl user buffer

In server_iocontrol_nodemap(), user data is copied into a struct
lustre_cfg. This data must then be sanity checked by calling
lustre_cfg_sanity_check().

CoverityID: 425252 ("Passing tainted expression lcfg->lcfg_buflens to lustre_cfg_string")
CoverityID: 397130 ("Passing tainted expression lcfg->lcfg_buflens")
Fixes: 72734cf178 ("LU-17431 ptlrpc: move nodemap related ioctls to ptlrpc")
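Because lcfg_buflens comes from user space, every length must be checked against the number of bytes actually copied in before any buffer is accessed. A minimal sketch of that kind of check, assuming a simplified, hypothetical header layout (struct cfg_hdr and cfg_sanity_check() are illustrative; the real lustre_cfg_sanity_check() may check more or different fields):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified layout loosely modelled on struct lustre_cfg. */
#define CFG_MAX_BUFCOUNT 8

struct cfg_hdr {
	uint32_t bufcount;
	uint32_t buflens[CFG_MAX_BUFCOUNT];
};

/*
 * Reject a header whose user-controlled counts or lengths do not fit in
 * the len bytes that were actually copied in. Returns 0 if sane, -1 if not.
 */
static int cfg_sanity_check(const struct cfg_hdr *hdr, size_t len)
{
	size_t total = sizeof(*hdr);
	uint32_t i;

	if (len < sizeof(*hdr))
		return -1;
	if (hdr->bufcount > CFG_MAX_BUFCOUNT)
		return -1;
	for (i = 0; i < hdr->bufcount; i++) {
		/* Compare against the remaining space so the sum cannot wrap. */
		if (hdr->buflens[i] > len - total)
			return -1;
		total += hdr->buflens[i];
	}
	return 0;
}
```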

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I268b53fc0e977716ffd1985d145dc27b6acccf94
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54928
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks ago LU-17649 ptlrpc: fix -EACCES connection error handling 48/54448/13
Mikhail Pershin [Mon, 18 Mar 2024 15:37:02 +0000 (18:37 +0300)]
LU-17649 ptlrpc: fix -EACCES connection error handling

Connection errors -EACCES and -EROFS leave the import in an
intermediate state: the import and the pinger over it are still
active, but obd_no_recov is set. That allows the import to
recover eventually if server security is updated. But even
in the FULL state, any RPC over the import gets -ESHUTDOWN
because obd_no_recov is set.

However, obd_no_recov is not supposed to be used in that
way: it reflects a particular mount option and should never
be cleared. So the patch sets the import to the deactivated
state instead, which also makes the import non-operational,
but allows it to be activated manually or by remounting.

Server connections like LWP, MDT-OST and MDT-MDT are
excluded and are never deactivated. Such errors are
considered temporary until the remote target updates its own
security as required, or until an administrator restarts
the target as needed.

In both cases a console message is issued.
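The policy described above can be sketched as a small predicate. This is a hedged illustration, assuming a hypothetical enum of import kinds; the real ptlrpc code identifies server-side imports differently, and should_deactivate() is not a Lustre function:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical classification of imports by connection type. */
enum import_kind { IMP_CLIENT, IMP_LWP, IMP_MDT_OST, IMP_MDT_MDT };

/*
 * Decide whether a failed connection should deactivate the import.
 * Client imports are deactivated on -EACCES/-EROFS (recoverable by manual
 * activation or remount); server-to-server imports stay active because
 * the error is treated as temporary.
 */
static bool should_deactivate(enum import_kind kind, int rc)
{
	if (rc != -EACCES && rc != -EROFS)
		return false;
	return kind == IMP_CLIENT;
}
```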

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ib83e1b0ac541823ec236591f08145340d6f6bf04
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54448
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 weeks ago LU-16314 tests: enable debug_raw_pointers on mount 54/54254/8
Shaun Tancheff [Wed, 17 Apr 2024 08:44:12 +0000 (15:44 +0700)]
LU-16314 tests: enable debug_raw_pointers on mount

When the MGS is mounted:
  do_facet mgs "$LCTL set_param -P debug_raw_pointers=Y"

So debug_raw_pointers needs to be set only once instead of
being enabled and disabled for each test.

Switching kptr_restrict for every node on every test (twice)
does not add value when testing on dedicated test VMs.

This adds a KPTR_ON_MOUNT option to allow a less restrictive setting
during test-framework setupall()/cleanall().

The initial kptr_restrict values are persisted to and restored
from a well-known temporary file, $TMP/kptr-$PPID-env.

The patch enables KPTR_ON_MOUNT by default.

HPE-bug-id: LUS-10945
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I4d8975f26e57ea064608663f309400d09406d500
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54254
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
4 weeks ago LU-17342 o2ib: build without Module.symvers 58/53358/3
Timothy Day [Thu, 7 Dec 2023 05:12:57 +0000 (05:12 +0000)]
LU-17342 o2ib: build without Module.symvers

When building against an external kernel tree, the
configure script fails if there isn't a Module.symvers
available. This prevents us from using the
'modules_prepare' make target on the kernel tree.
ko2iblnd.ko can be built even without Module.symvers.
Hence, downgrade this message from an error to a
warning.

Also, don't fail if ko2iblnd can't be built. Just
emit a warning.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I8bca7f945c753fdac3aa5d9889d3347613baf059
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53358
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks ago LU-16819 build: use mofed path based on target kernel 37/50937/9
Ake Sandgren [Thu, 11 May 2023 06:48:32 +0000 (08:48 +0200)]
LU-16819 build: use mofed path based on target kernel

Instead of using "uname -r", which limits builds to the currently
running kernel, use the target kernel version from LINUXRELEASE,
if the corresponding directory exists.
Building for a specific kernel is common practice when using DKMS.

Test-Parameters: trivial
Signed-off-by: Ake Sandgren <ake.sandgren@hpc2n.umu.se>
Change-Id: Ifce912061a74fc5b7435cd940105190f0c3cd544
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50937
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 weeks ago LU-16350 ldiskfs: Server support for LTS linux v6.1 60/52260/13
Shaun Tancheff [Thu, 25 Apr 2024 15:38:20 +0000 (22:38 +0700)]
LU-16350 ldiskfs: Server support for LTS linux v6.1

Keep the ldiskfs series for LTS kernels and for very recent
kernels. Squash older series and drop any unused patches.

Drop the non-LTS 5.8 and 5.9 kernel series.
Add patches under the kernel version in which the change
originated:
   linux-5.18/ext4-lookup-dotdot.patch
   linux-6.0/ext4-data-in-dirent.patch
   linux-6.0/ext4-pdirop.patch
   linux-6.1/ext4-dont-check-before-replay.patch
   linux-6.1/ext4-mballoc-extra-checks.patch
   linux-6.1/ext4-prealloc.patch
Refresh linux-5.16/ext4-misc.patch to use strscpy instead of strlcpy.

Test-Parameters: trivial
HPE-bug-id: LUS-11376
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Id747e200f5d3f50475094ee5ad948c389cce3184
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52260
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
4 weeks ago LU-11085 ldlm: convert ldlm extent locks to linux extent-tree 92/41792/17
Mr NeilBrown [Fri, 21 Aug 2020 00:28:53 +0000 (10:28 +1000)]
LU-11085 ldlm: convert ldlm extent locks to linux extent-tree

As Linux has a fully customizable extent tree implementation, use that
instead of the one in Lustre.  This removes the need to store the
extent endpoints in the lock twice, thus recovering some of the space
wasted in a previous patch.

It also allows iteration loops to be in-line rather than requiring a
callback, though in some cases we keep the callback.

Note that interval_expand() will not expand the lower boundary down if
the tree is not empty.  We now make that explicit in the loop in
ldlm_extent_internal_policy_granted().  Consequently, testing
'conflicting > 4' is irrelevant.

The Linux extent tree does not have a direct equivalent to
interval_is_overlapped(); however, we can use extent_iter_first() to
achieve the same effect.

We ask extent_iter_first() for the first interval in the tree that
overlaps the range of the given interval.  If nothing is returned, then
nothing in the tree overlaps the interval and interval_is_overlapped()
would return false.
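The overlap test reduces to "does iter_first() return anything". A minimal sketch, assuming closed [start, last] intervals as in the Linux interval tree API; iter_first() here is a hypothetical linear-scan stand-in for extent_iter_first() over an array sorted by start, so it returns the overlapping interval with the lowest start, mirroring the tree's first match:

```c
#include <stddef.h>

/* Hypothetical closed interval [start, last]. */
struct extent {
	unsigned long start;
	unsigned long last;
};

/*
 * Stand-in for extent_iter_first(): return the first extent in tree[]
 * (sorted by start) that overlaps [start, last], or NULL if none does.
 * A real interval tree finds it in O(log n); a scan keeps this short.
 */
static const struct extent *iter_first(const struct extent *tree, size_t n,
				       unsigned long start, unsigned long last)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (tree[i].start <= last && start <= tree[i].last)
			return &tree[i];
	return NULL;
}

/* interval_is_overlapped() equivalent: true iff iter_first() finds a match. */
static int is_overlapped(const struct extent *tree, size_t n,
			 const struct extent *ext)
{
	return iter_first(tree, n, ext->start, ext->last) != NULL;
}
```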

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: Ie28c6fb0d40d2c92c7067c7a79f48ee1fc633ce9
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/41792
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
4 weeks ago LU-11085 ldlm: move interval_insert call from ldlm_lock to ldlm_extent 21/34021/18
NeilBrown [Fri, 9 Aug 2019 17:10:03 +0000 (13:10 -0400)]
LU-11085 ldlm: move interval_insert call from ldlm_lock to ldlm_extent

Moving this call results in all interval-tree handling code
being in the one file. This will simplify conversion to
use Linux interval trees.

The addition of 'struct cb' is a little ugly, but will be gone
in a subsequent patch.

Change-Id: I7b392cc57b69969f4bb3c4b51fa406ed643a37b3
Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/34021
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
4 weeks ago LU-17865 osc: fiemap deadlock fix 63/55163/3
Alexander Zarochentsev [Mon, 20 May 2024 18:33:18 +0000 (18:33 +0000)]
LU-17865 osc: fiemap deadlock fix

A fiemap call may deadlock by wrongly requesting an ldlm lock on the
server while the same lock is cached and pinned on the client. Two PR
lock requests are compatible, so the deadlock also requires a concurrent
write lock request.

ll_fiemap_info_key is shared between osc_object_fiemap()
calls; once the OBD_FL_SRVLOCK flag is set, it is reused for
all subsequent RPCs regardless of the local lock caching status.
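A sketch of the failure mode, assuming a hypothetical key structure and flag value (FL_SRVLOCK stands in for OBD_FL_SRVLOCK). The first function shows how a flag that is only ever set leaks from one call into the next; the second recomputes it from the current lock caching status on every call, which is one plausible shape for a fix, not necessarily the one the patch uses:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag value standing in for OBD_FL_SRVLOCK. */
#define FL_SRVLOCK 0x1u

/* Hypothetical stand-in for the fiemap key shared between calls. */
struct fiemap_key {
	uint32_t flags;
};

/* Buggy shape: the flag is only ever set, so it sticks for later calls. */
static void prepare_rpc_sticky(struct fiemap_key *key, bool lock_cached)
{
	if (!lock_cached)
		key->flags |= FL_SRVLOCK;
}

/* Fixed shape: derive the flag from the cache status on every call. */
static void prepare_rpc(struct fiemap_key *key, bool lock_cached)
{
	if (lock_cached)
		key->flags &= ~FL_SRVLOCK;
	else
		key->flags |= FL_SRVLOCK;
}
```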

HPE-bug-id: LUS-12353
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: I6e76bc5e4549ed887b8f6177432acf90f9ec614d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55163
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks ago LU-6142 socklnd: SPDX for sockets LND 14/55114/2
Timothy Day [Wed, 15 May 2024 03:51:51 +0000 (03:51 +0000)]
LU-6142 socklnd: SPDX for sockets LND

Convert from verbose license text to SPDX.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Ifb655ba3ad59fb467e288916e4229968450e9788
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55114
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks ago LU-17851 ldiskfs: restart long fallocate tx 11/55111/3
Alexander Zarochentsev [Mon, 29 Apr 2024 17:37:34 +0000 (17:37 +0000)]
LU-17851 ldiskfs: restart long fallocate tx

__ext4_journal_ensure_credits() may allow a long fs operation
like fallocate to run for too long if the initial credit
estimate is high enough.
The fix is to force a transaction restart if the transaction state is not T_RUNNING.
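The added restart condition can be expressed as a predicate. A hedged sketch, assuming a hypothetical subset of the jbd2 transaction states and a simplified credit count; need_restart() is illustrative, and the real ext4/jbd2 logic involves more state than this:

```c
#include <stdbool.h>

/* Hypothetical subset of jbd2 transaction states. */
enum tx_state { T_RUNNING, T_LOCKED, T_FLUSH, T_COMMIT };

/*
 * Restart the handle if it lacks credits, or (the added condition) if
 * its transaction is no longer T_RUNNING, e.g. it has been locked for
 * commit, so a long operation does not hold the commit back.
 */
static bool need_restart(int have, int need, enum tx_state state)
{
	return have < need || state != T_RUNNING;
}
```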

HPE-bug-id: LUS-12311
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: Ib03d78739997caa6d13690b41ef7d01609a3623b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55111
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 weeks ago LU-16938 utils: setstripe overstripe multiple OST count 92/54192/13
Rajeev Mishra [Fri, 9 Feb 2024 16:49:45 +0000 (16:49 +0000)]
LU-16938 utils: setstripe overstripe multiple OST count

Add an option to "lfs setstripe -C" to specify stripe counts
that are a multiple of the number of OSTs in the filesystem.
Using "-C -1" will create one stripe on all (available) OSTs,
as with "-c -1", to avoid too many stripes.  Using "-C -2"
will create two stripes on each OST, etc.

The maximum multiplier is currently "-C -32", which will
create 32 stripes per OST. It is still possible to specify
a large positive stripe count directly to "-C" for testing
purposes and to maintain compatibility with current usage.
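The mapping from a "-C" value to a stripe count can be sketched as follows, assuming ost_count is the number of available OSTs; overstripe_count() is a hypothetical helper, and the exact validation performed by lfs is an assumption:

```c
/* Largest per-OST multiplier, per the description above ("-C -32"). */
#define MAX_OST_MULTIPLIER 32

/*
 * Map an "lfs setstripe -C" value to a stripe count. Values -1 .. -32
 * mean "that many stripes on each OST"; non-negative values are used
 * as-is. Returns -1 for a multiplier beyond the supported range.
 */
static long overstripe_count(long c, long ost_count)
{
	if (c >= 0)
		return c;
	if (c < -MAX_OST_MULTIPLIER)
		return -1;
	return -c * ost_count;
}
```

For example, on a filesystem with 8 available OSTs, "-C -2" yields 16 stripes, two per OST.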

HPE-bug-id: LUS-11793
Signed-off-by: Rajeev Mishra <rajeevm@hpe.com>
Change-Id: Ib0462d7a9b71853419ea7c30741bb35d576f0d71
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54192
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <patrick.farrell@oracle.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks ago LU-13802 llite: add hybrid IO switch proc stats 96/52596/28
Patrick Farrell [Wed, 13 Mar 2024 14:50:40 +0000 (10:50 -0400)]
LU-13802 llite: add hybrid IO switch proc stats

Hybrid IO switching proc stats are useful for telling us if
and why we switched to DIO.  They're also helpful for
writing tests.

Test-Parameters: trivial
Signed-off-by: Patrick Farrell <patrick.farrell@oracle.com>
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I68649474cf11ffc445574fcca105a81fd6ecd458
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52596
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks ago LU-13802 llite: add read & write switch thresholds 95/52595/35
Patrick Farrell [Mon, 1 Apr 2024 15:30:29 +0000 (11:30 -0400)]
LU-13802 llite: add read & write switch thresholds

The main criterion for switching from buffered IO to
hybrid IO is IO size.  This patch adds that switching.  The correct
size for cutover is not the same for read and write, so we
have separate checks for read and write.

These checks are elaborated on in further patches, adding
different thresholds based on the backing storage type.

Adding the switching thresholds is what really enables
hybrid IO, so we have to adjust a number of tests which
assume buffered IO.

There are a few obscure hang bugs which have been difficult
to track down, and we are past feature freeze, so this patch
now leaves hybrid IO disabled by default.
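A minimal sketch of size-based switching with separate read and write thresholds. struct hybrid_io_cfg, switch_to_dio() and the values used below are hypothetical; the real thresholds are tunables whose defaults are not given here:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical configuration: separate cutover sizes for read and write. */
struct hybrid_io_cfg {
	bool   enabled;          /* disabled by default, per the commit message */
	size_t read_threshold;   /* bytes */
	size_t write_threshold;  /* bytes */
};

/* Return true if an IO of count bytes should switch from buffered to DIO. */
static bool switch_to_dio(const struct hybrid_io_cfg *cfg, bool is_write,
			  size_t count)
{
	if (!cfg->enabled)
		return false;
	return count >= (is_write ? cfg->write_threshold : cfg->read_threshold);
}
```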

Signed-off-by: Patrick Farrell <patrick.farrell@oracle.com>
Change-Id: I491cd7b2bdafe8bb2c1a4d692442a62154324bec
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52595
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks ago LU-17525 tests: fix sanity hash 2.15 interop 76/55076/5
Andreas Dilger [Sat, 11 May 2024 05:38:29 +0000 (23:38 -0600)]
LU-17525 tests: fix sanity hash 2.15 interop

Fix test version checks for interop testing of DNE directory hash
usage in sanity with 2.15 servers.  The tests incorrectly assumed
that the CRUSH2 dir hash was included in the 2.15.0 release, but it
was not backported to that branch, and only landed in 2.15.51.

Exclude UDIO interop failures, which are fixed via LU-17525.
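Packing a version into one integer makes the interop check a single comparison. A sketch, assuming a VERSION_CODE() macro modelled loosely on Lustre's OBD_OCD_VERSION() packing; the macro and helper names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Pack major.minor.patch.fix into one comparable 32-bit value. */
#define VERSION_CODE(major, minor, patch, fix)			\
	(((uint32_t)(major) << 24) | ((uint32_t)(minor) << 16) |	\
	 ((uint32_t)(patch) << 8) | (uint32_t)(fix))

/* CRUSH2 first landed in 2.15.51, not 2.15.0, per the commit message. */
static bool server_has_crush2(uint32_t server_version)
{
	return server_version >= VERSION_CODE(2, 15, 51, 0);
}
```

Because 2.15.4 packs to a larger value than 2.15.0, a check against 2.15.0 wrongly treats a 2.15.4 server as CRUSH2-capable, while the 2.15.51 threshold correctly excludes it.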

Fixes: 1ac4b9598a ("LU-15720 dne: add crush2 hash type")
Test-Parameters: trivial testlist=sanity serverversion=2.15.4 serverdistro=el8.9 env=SANITY_EXCEPT="56 119 398"
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: If2097ebc30c7c4dbce88af7774ce3c0e8fb3cb75
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55076
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Patrick Farrell <patrick.farrell@oracle.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
4 weeks ago LU-17783 statahead: disable batch statahead for old server 17/55017/4
Qian Yingjin [Mon, 6 May 2024 08:16:19 +0000 (04:16 -0400)]
LU-17783 statahead: disable batch statahead for old server

Disable batched statahead for old servers that do not
support the MDS_BATCH RPC.

Fixes: 4435d0121f ("LU-14139 statahead: batched statahead processing")
Test-Parameters: testlist=sanity serverjob=lustre-b_es6_0 serverbuildno=638 clientdistro=el9.3 serverdistro=el8.8 env=ONLY=123
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I79fba4204e0ed44e2bc9a4c4f2758d087f0e406b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55017
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
4 weeks ago LU-17867 ko2iblnd: gcc bug work around 72/55172/4
James Simmons [Wed, 22 May 2024 14:53:24 +0000 (10:53 -0400)]
LU-17867 ko2iblnd: gcc bug work around

Gcc 11 reports
 error: array subscript 'struct sockaddr_in6[0]' is partly
 outside array bounds of 'struct sockaddr[1]'

This is due to a bug in gcc, which becomes confused by the union.
To work around it, switch from struct sockaddr to
struct sockaddr_storage.
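A user-space sketch of the same idea: on Linux, struct sockaddr is only 16 bytes, smaller than struct sockaddr_in6, whereas struct sockaddr_storage is sized to hold any supported address family. set_ipv6_loopback() is a hypothetical helper; the ko2iblnd code uses the same struct types, but its surrounding code differs:

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Store an IPv6 address in a struct sockaddr_storage. Casting a plain
 * struct sockaddr to struct sockaddr_in6 would read past its end, which
 * is the kind of access gcc's array-bounds warning reports.
 */
static void set_ipv6_loopback(struct sockaddr_storage *ss, unsigned short port)
{
	struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)ss;

	memset(ss, 0, sizeof(*ss));
	sin6->sin6_family = AF_INET6;
	sin6->sin6_port = htons(port);
	sin6->sin6_addr = in6addr_loopback;
}
```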

Test-Parameters: trivial
Change-Id: I586042d6e3c59be8c63e2821659cf9d3bcdac8e3
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55172
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
5 weeks ago LU-17662 osd-zfs: Support for ZFS 2.2.3 30/54530/9
Shaun Tancheff [Mon, 6 May 2024 03:06:31 +0000 (10:06 +0700)]
LU-17662 osd-zfs: Support for ZFS 2.2.3

ZFS commit zfs-2.2.99-269-g9b1677fb5
   dmu: Allow buffer fills to fail
adds a boolean_t argument to dmu_buf_will_fill() and dmu_buf_fill_done().

Lustre always uses B_FALSE for this argument.

Also rearrange and split some configure macros so that all
the zfs and ldiskfs tests can be run in the same parallel pass.

Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I71a4723bfa8ce62ae6f270e26ab149bf98278d3f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54530
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks ago LU-17477 tests: conf-sanity/48 with debug=0 99/53799/13
Alex Zhuravlev [Wed, 24 Jan 2024 07:52:20 +0000 (10:52 +0300)]
LU-17477 tests: conf-sanity/48 with debug=0

conf-sanity/48 takes quite a long time to set 4.5K ACLs.
Setting debug=0 improves this significantly.

Test-Parameters: trivial testlist=conf-sanity env=ONLY=48
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ifa39b9efc80b41050a13323474dd19b865cc6273
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53799
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks ago LU-16741 fid: rename ptlrpc_req_finished for component fid 94/54994/2
Arshad Hussain [Thu, 2 May 2024 11:28:21 +0000 (07:28 -0400)]
LU-16741 fid: rename ptlrpc_req_finished for component fid

This patch renames ptlrpc_req_finished() to ptlrpc_req_put() for
the fid component.

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: If5bf08719ab9be8255f1145fa7bcdfebd68da52c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54994
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <patrick.farrell@oracle.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks ago LU-16741 fld: rename ptlrpc_req_finished for component fld 93/54993/2
Arshad Hussain [Thu, 2 May 2024 11:24:57 +0000 (07:24 -0400)]
LU-16741 fld: rename ptlrpc_req_finished for component fld

This patch renames ptlrpc_req_finished() to ptlrpc_req_put() for
the fld component.

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I7229ccdb4a6440700c120a5d75edd018252b0b8a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54993
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <patrick.farrell@oracle.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks ago LU-16741 ldlm: rename ptlrpc_req_finished for component ldlm 92/54992/2
Arshad Hussain [Thu, 2 May 2024 11:21:02 +0000 (07:21 -0400)]
LU-16741 ldlm: rename ptlrpc_req_finished for component ldlm

This patch renames ptlrpc_req_finished() to ptlrpc_req_put() for
the ldlm component.

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I0daff368ed1b4448f236e7f8f17e1534b3db5e58
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54992
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <patrick.farrell@oracle.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks ago LU-16741 lfsck: rename ptlrpc_req_finished for component lfsck 91/54991/2
Arshad Hussain [Thu, 2 May 2024 11:15:06 +0000 (07:15 -0400)]
LU-16741 lfsck: rename ptlrpc_req_finished for component lfsck

This patch renames ptlrpc_req_finished() to ptlrpc_req_put() for
the lfsck component.

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I57fa0bac6ecf03a6143ca8342d0fb753dc815d60
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54991
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <patrick.farrell@oracle.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks ago LU-16741 quota: rename ptlrpc_req_finished for component quota 90/54990/2
Arshad Hussain [Thu, 2 May 2024 11:11:06 +0000 (07:11 -0400)]
LU-16741 quota: rename ptlrpc_req_finished for component quota

This patch renames ptlrpc_req_finished() to ptlrpc_req_put() for
the quota component.

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I7e671d68be8c0209a7439dc9762b5b10039aa0a3
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54990
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <patrick.farrell@oracle.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks ago LU-16741 mgc: rename ptlrpc_req_finished for component mgc 89/54989/2
Arshad Hussain [Thu, 2 May 2024 11:07:12 +0000 (07:07 -0400)]
LU-16741 mgc: rename ptlrpc_req_finished for component mgc

This patch renames ptlrpc_req_finished() to ptlrpc_req_put() for
the mgc component.

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I7b7fac8b3cfc30b6b6e92f68018b494d24390a7c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54989
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks ago LU-16741 ptlrpc: rename ptlrpc_req_finished for component ptlrpc 88/54988/2
Arshad Hussain [Thu, 2 May 2024 10:57:31 +0000 (06:57 -0400)]
LU-16741 ptlrpc: rename ptlrpc_req_finished for component ptlrpc

This patch renames ptlrpc_req_finished() to ptlrpc_req_put() for
the ptlrpc component.

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ic41d76ace564132a369288676398bc881048f851
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54988
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <patrick.farrell@oracle.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks ago LU-16741 mdc: rename ptlrpc_req_finished for component mdc 87/54987/2
Arshad Hussain [Thu, 2 May 2024 10:49:26 +0000 (06:49 -0400)]
LU-16741 mdc: rename ptlrpc_req_finished for component mdc

This patch renames ptlrpc_req_finished() to ptlrpc_req_put() for
the mdc component.

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I46de8facbafcabbeb5c12daefcc5172f6c9bafd5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54987
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <patrick.farrell@oracle.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks ago LU-16741 osp: rename ptlrpc_req_finished for component osp 86/54986/2
Arshad Hussain [Thu, 2 May 2024 10:40:02 +0000 (06:40 -0400)]
LU-16741 osp: rename ptlrpc_req_finished for component osp

This patch renames ptlrpc_req_finished() to ptlrpc_req_put()
in the osp component.

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I0da0f922be2a062459c14585f910ef2a6c425b14
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54986
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <patrick.farrell@oracle.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks ago LU-17797 lnet: avoid use after free of lnet ifaces 75/54975/2
Shaun Tancheff [Wed, 1 May 2024 04:39:26 +0000 (11:39 +0700)]
LU-17797 lnet: avoid use after free of lnet ifaces

During inet4/inet6 enumeration, the array of NIDs can be
reallocated or freed.

When the array is freed, the originating reference should be
set to NULL to avoid a possible use-after-free.
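A minimal C sketch of the general pattern described above, assuming a
simplified, hypothetical struct nid_array (this is not the LNet code the
patch touches): realloc() may move the array, so the owner's pointer is
always refreshed from its return value, and whenever the array is freed
the owning pointer is reset to NULL so that a later access or free()
cannot reach released memory.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical NID array holder; not the actual LNet data structure. */
struct nid_array {
	unsigned int *nids;
	size_t count;
};

/* Append one NID. realloc() may move the array, so arr->nids is updated
 * from its return value. If realloc() fails, the old array is freed and
 * the originating reference is cleared. Returns 0 on success, -1 on
 * allocation failure. */
static int nid_array_append(struct nid_array *arr, unsigned int nid)
{
	unsigned int *tmp = realloc(arr->nids,
				    (arr->count + 1) * sizeof(*arr->nids));

	if (tmp == NULL) {
		free(arr->nids);
		arr->nids = NULL;	/* null the originating reference */
		arr->count = 0;
		return -1;
	}
	arr->nids = tmp;		/* the array may have moved */
	arr->nids[arr->count++] = nid;
	return 0;
}

/* Free the array and clear the reference, so a second release (or a
 * NULL check by another user) is safe instead of a use-after-free. */
static void nid_array_release(struct nid_array *arr)
{
	free(arr->nids);
	arr->nids = NULL;
	arr->count = 0;
}
```

Because free(NULL) is a no-op, clearing the pointer also makes a
repeated release harmless.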

CoverityID: 425360 ("USE_AFTER_FREE")

Test-Parameters: trivial
Fixes: ab6c8bd18 ("LU-16822 lnet: always initialize IPv6 at start up")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ifd751e0c2f0095b33f8b2cd8dd58cfd8572c5ff4
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54975
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 weeks ago LU-17795 lnet: unused return code in lnet_peer_data_present 71/54971/2
Serguei Smirnov [Tue, 30 Apr 2024 17:55:29 +0000 (10:55 -0700)]
LU-17795 lnet: unused return code in lnet_peer_data_present

A Coverity check found that the return code from the call to
lnet_peer_set_primary_nid(), in code added by the LU-17379 patch,
was never used. Fix it.
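The kind of fix involved can be sketched in C as checking and
propagating the return code instead of dropping it. This is only an
illustration under stated assumptions: set_primary_nid() and
peer_data_present() are hypothetical stand-ins, not the real
lnet_peer_set_primary_nid() or lnet_peer_data_present(), and the actual
change may handle the error differently.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for lnet_peer_set_primary_nid(): fails with
 * -EINVAL for a zero NID, succeeds otherwise. */
static int set_primary_nid(unsigned long nid)
{
	return nid == 0 ? -EINVAL : 0;
}

/* Hypothetical caller. The defect Coverity reports is an rc that is
 * assigned but never examined; here a failure is propagated to the
 * caller through the usual goto-out exit path. */
static int peer_data_present(unsigned long nid)
{
	int rc;

	rc = set_primary_nid(nid);
	if (rc)
		goto out;

	/* ... further peer processing would go here ... */
out:
	return rc;
}
```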

Test-Parameters: trivial testlist=sanity-lnet
Fixes: ae6d37 ("LU-17379 lnet: parallelize peer discovery via LNetAddPeer")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I8b9df330200ff2732efd2a54d8de910463993fae
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54971
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks ago LU-17788 ptlrpc: restore watchdog revival message 42/54942/12
Andreas Dilger [Sat, 27 Apr 2024 02:48:15 +0000 (20:48 -0600)]
LU-17788 ptlrpc: restore watchdog revival message

Restore the "Service thread pid NNN completed after SSS.mmm
seconds.  This likely indicates the system was overloaded"
message that was lost during ptlrpc watchdog restructuring.

Do not rate limit this message, so that it is possible to see
when all threads are restored, even if their corresponding
"Service thread pid NNN was inactive" message was throttled.
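As a small illustration of the "SSS.mmm seconds" layout quoted above,
the sketch below formats the message from an elapsed time in
milliseconds. The helper name format_revival_msg() and the
millisecond input are assumptions; the wording follows the commit
message, but the exact format string and print call in the ptlrpc
watchdog code may differ. In the kernel the message would be printed
without a rate limiter; here it is written to a buffer so it can be
checked.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build the revival message for thread "pid" that
 * completed after elapsed_ms milliseconds, printed as seconds with a
 * zero-padded three-digit millisecond part (SSS.mmm). */
static int format_revival_msg(char *buf, size_t len, int pid,
			      unsigned long elapsed_ms)
{
	return snprintf(buf, len,
			"Service thread pid %d completed after %lu.%03lu seconds.  "
			"This likely indicates the system was overloaded",
			pid, elapsed_ms / 1000, elapsed_ms % 1000);
}
```

For example, 61005 ms is rendered as "61.005 seconds", since %03lu pads
the remainder 5 to "005".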

Update recovery-small test_10a to check for these messages,
so that they are not removed again in the future.

Test-Parameters: testlist=recovery-small env=ONLY=10a
Test-Parameters: testlist=recovery-small env=ONLY=10a
Test-Parameters: testlist=recovery-small env=ONLY=10a
Test-Parameters: testlist=recovery-small env=ONLY=10a
Test-Parameters: testlist=recovery-small env=ONLY=10a
Fixes: fc9de679a4 ("LU-9859 libcfs: add watchdog for ptlrpc service threads.")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I0c7e96fb7f73ca5562a6f5ad780a79ffc83ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54942
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>