Whamcloud - gitweb
fs/lustre-release.git
2 months ago New tag 2.15.61 2.15.61 v2_15_61
Oleg Drokin [Sat, 17 Feb 2024 07:29:48 +0000 (02:29 -0500)]
New tag 2.15.61

Change-Id: I2df53b16d604cc066e9118f4e404a649e177e7fd
Signed-off-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-17413 llite: protect check in ll_merge_md_attr() 39/53639/2
Alex Zhuravlev [Wed, 10 Jan 2024 19:09:18 +0000 (22:09 +0300)]
LU-17413 llite: protect check in ll_merge_md_attr()

Striping can be applied by a concurrent process, so the check for
striping must be serialized against such processes.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Iffac2f1f9b53abc26705d70a30c2201b48156ac8
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53639
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 months ago LU-17498 tests: show NIDs in node summary page 00/52500/4
Andreas Dilger [Mon, 25 Sep 2023 17:53:18 +0000 (11:53 -0600)]
LU-17498 tests: show NIDs in node summary page

Instead of only showing the network type for each node,
show the full NID in the YAML file to help with debugging and
identifying nodes in the logs.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I7ee39b08c5cae5a3f9ee4ea4dbee001a6d889fbb
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52500
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Lee Ochoa <lochoa@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Alex Deiter
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 months ago LU-17287 tests: remove trap 0 27/53127/4
Alex Zhuravlev [Tue, 14 Nov 2023 05:53:00 +0000 (08:53 +0300)]
LU-17287 tests: remove trap 0

Remove trap 0 from destroy_test_pools(), as it interrupts the current
trap chain and makes stack_trap useless.

Test-Parameters: trivial
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: If978389a140f21ac520ef21b505378b8f64d8f73
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53127
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 months ago LU-16296 tests: sanity-flr/36c to save on writes 25/49025/8
Alex Zhuravlev [Thu, 3 Nov 2022 09:36:40 +0000 (12:36 +0300)]
LU-16296 tests: sanity-flr/36c to save on writes

There is no need to write 600MB, as this may take significant
time when run on HDDs.

Test-Parameters: trivial testlist=sanity-flr
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ic6001aaba7f349a14ade1c720d175430370dd7e9
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49025
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 months ago LU-11990 tests: enable conf-sanity 66 77/53877/3
Alexander Boyko [Mon, 15 Jan 2024 16:30:23 +0000 (11:30 -0500)]
LU-11990 tests: enable conf-sanity 66

The test was skipped because it failed with a standalone MGS.
This has been fixed since LU-13356, so enable the test again.

Test-Parameters: trivial testlist=conf-sanity
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: Idb684bb2780832f089fba1441d3b9375e9740431
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53877
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-17495 build: cleanup configure messages 74/53874/2
Shaun Tancheff [Thu, 1 Feb 2024 07:24:48 +0000 (14:24 +0700)]
LU-17495 build: cleanup configure messages

Convert some remaining configure checks to use
  LB2_MSG_LINUX_TEST_RESULT

Also drop the undefined macro LC_CONFIG_HEALTH_CHECK_WRITE

Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: If0ae4f7549d5e1a46d6a5ce99d40ebcbd76c5e85
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53874
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-17486 ldiskfs: fix race in ext4_destroy_inode 68/53868/2
Alex Zhuravlev [Wed, 31 Jan 2024 05:16:12 +0000 (08:16 +0300)]
LU-17486 ldiskfs: fix race in ext4_destroy_inode

ext4_i_callback() can race with the access to i_reserved_data_blocks
in ext4_destroy_inode() when used with a preemption-enabled kernel.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I69c6bcfbb24e6c07d28ebcd2bdd9d9e6f06ec8d1
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53868
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-17475 tests: Do not pass IP to do_node in wait_nm_sync 38/53838/2
Chris Horn [Sat, 13 Jan 2024 17:06:10 +0000 (11:06 -0600)]
LU-17475 tests: Do not pass IP to do_node in wait_nm_sync

If do_node() resolves to pdsh then the ':' in an IPv6 NID is
misinterpreted as specifying an rcmd module. Avoid the issue by
passing the node hostname instead of IP.
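
The misparse can be illustrated with a small Python model. This is a
sketch only, not pdsh's actual parser: split_rcmd_prefix() is a
hypothetical helper that assumes a target of the form "rcmd_type:host"
is split at the first colon.

```python
def split_rcmd_prefix(target):
    """Split 'rcmd_type:host' at the first colon (simplified, hypothetical model)."""
    module, sep, host = target.partition(":")
    if sep:
        # Everything before the first ':' is taken as the rcmd module name.
        return module, host
    # No colon: the whole string is the host and no module is selected.
    return None, target
```

Under this model an IPv6 address such as fe80::1 is split into a bogus
module name and a broken host, while a plain hostname passes through
unchanged, which is why passing the hostname avoids the problem.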

Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I511308e3fb5247a85dec7f20a0ff4f3da2de4f3a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53838
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 months ago LU-17474 tests: Update sanity 215 for IPv6 36/53836/2
Chris Horn [Sat, 13 Jan 2024 04:16:29 +0000 (22:16 -0600)]
LU-17474 tests: Update sanity 215 for IPv6

Update regexes to handle IPv6 NIDs.
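
As an illustration of the kind of change involved, the sketch below
contrasts an IPv4-only NID pattern with one that also accepts IPv6
addresses. Both patterns and the parse_nid() helper are hypothetical and
are not the regexes used by sanity test 215; the sketch assumes NIDs of
the form "address@network".

```python
import ipaddress
import re

# Hypothetical patterns for illustration only.
IPV4_ONLY_NID = re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}@[a-z0-9]+$")
ANY_NID = re.compile(r"^(?P<addr>[0-9A-Fa-f:.]+)@(?P<net>[a-z0-9]+)$")


def parse_nid(nid):
    """Return (address, network) for an IPv4 or IPv6 NID, or None if invalid."""
    match = ANY_NID.match(nid)
    if match is None:
        return None
    try:
        # Let the standard library decide whether the address part is valid.
        addr = ipaddress.ip_address(match.group("addr"))
    except ValueError:
        return None
    return addr, match.group("net")
```

The IPv4-only pattern rejects a NID such as fe80::1@tcp, while the
broader pattern accepts it and leaves address validation to the
ipaddress module.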

Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ie8e8cba0294ac241fddeb5af9c75799d67bb6638
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53836
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-17467 build: Expand CUDA source detection logic 32/53832/2
Jean-Baptiste Skutnik [Thu, 25 Jan 2024 18:52:26 +0000 (21:52 +0300)]
LU-17467 build: Expand CUDA source detection logic

Fix the configure logic, which did not handle disabling the package
(variable set to 'no') for the CUDA and GDS source paths.

Signed-off-by: Jean-Baptiste Skutnik <jb.skutnik@gmail.com>
Change-Id: Icb96274a6df2508f8e3010daef0ba1d17b4471dc
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53832
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-17471 osd: add symlink for brw_stats 29/53829/10
Hongchao Zhang [Fri, 26 Jan 2024 13:43:36 +0000 (21:43 +0800)]
LU-17471 osd: add symlink for brw_stats

Add a symlink at /proc/fs/lustre/osd-*/*/brw_stats pointing to
/sys/kernel/debug/lustre/osd-*/*/brw_stats to fix the
compatibility issue with older utilities that are
still using the old proc entry.

Test-Parameters: testlist=sanity env=ONLY=0f serverversion=2.15.4
Fixes: 8a84c7f9c7d6 ("LU-14927 osd: share brw_stats code between OSD back ends.")
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: Ie86b2b384e3b91f98ead00b6325ddeb020e47aa5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53829
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-17465 nodemap: change squash default value to 65534 02/53802/3
Sebastien Buisson [Tue, 23 Jan 2024 09:07:25 +0000 (10:07 +0100)]
LU-17465 nodemap: change squash default value to 65534

Initially, default values for nodemap.squash_uid/gid/projid were set
to 99, to match user 'nobody'. But on newer systems, nobody has
changed to 65534 and 99 no longer exists.
It is safe to use 65534 in all cases, as even on older systems it
exists and corresponds to 'nfsnobody'.
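
The effect of the default can be sketched with a toy model of ID
squashing. This is not the nodemap implementation: map_id() and its
idmap argument are hypothetical names, and the model assumes that any
client ID without an explicit mapping is replaced by the squash ID.

```python
SQUASH_DEFAULT = 65534  # new default from this change; the old default was 99


def map_id(client_id, idmap, squash_id=SQUASH_DEFAULT):
    """Return the mapped ID, or the squash ID when no mapping exists."""
    return idmap.get(client_id, squash_id)
```

With an empty map, map_id(1234, {}) yields 65534, whereas passing
squash_id=99 reproduces the previous behaviour.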

Test-Parameters: testlist=sanity env=ONLY=432 serverversion=2.15
Test-Parameters: testlist=sanity env=ONLY=432 clientversion=2.15
Test-Parameters: testlist=sanity-quota env=ONLY=75 serverversion=2.15
Test-Parameters: testlist=sanity-quota env=ONLY=75 clientversion=2.15
Test-Parameters: testlist=sanity-selinux env=ONLY=21 serverversion=2.15
Test-Parameters: testlist=sanity-selinux env=ONLY=21 clientversion=2.15
Test-Parameters: testlist=sanity-sec env=ONLY="7 8 9 10 11 12 13 14 15 16 18 19 20 21 22 23 24 25 26 27 32 33 34 35 36 55 61 64" serverversion=2.15
Test-Parameters: testlist=sanity-sec env=ONLY="7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 32 33 34 35 36 55 61 64" clientversion=2.15
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I2e20fda0fdc0d5bfdf964a890bfbd0b54b943cf4
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53802
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
2 months ago LU-17459 lod: incorrect assert in lod_statfs_and_check() 83/53783/3
Alex Zhuravlev [Tue, 23 Jan 2024 17:02:14 +0000 (20:02 +0300)]
LU-17459 lod: incorrect assert in lod_statfs_and_check()

the assertion must be done once we're sure this target
has not been counted/marked as active.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I56ae3fad92b8518f6aba2c880ecdac55f53cb689
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53783
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 months ago LU-17216 ofd: skip sanity/70a on old OSTs 70/53770/4
Timothy Day [Tue, 23 Jan 2024 03:33:27 +0000 (03:33 +0000)]
LU-17216 ofd: skip sanity/70a on old OSTs

OSTs older than 2.15.59 won't have enable_health_write,
so skip sanity/70a, which requires it.
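
A version gate of this kind can be sketched as follows. The helper names
are hypothetical and this is not the test framework's own version check;
the sketch assumes purely numeric dotted versions and compares them as
integer tuples to avoid string-ordering mistakes.

```python
def version_tuple(version):
    """Convert a dotted version such as '2.15.59' into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


def should_skip_70a(ost_version, minimum="2.15.59"):
    """True when the OST is too old to provide enable_health_write."""
    return version_tuple(ost_version) < version_tuple(minimum)
```

Comparing tuples rather than strings matters for a version such as
2.9.0, which sorts after 2.15.59 as a string but before it numerically.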

Test-Parameters: trivial
Test-Parameters: testlist=sanity clientversion=2.15 env=ONLY=70a,ONLY_REPEAT=10
Test-Parameters: testlist=sanity serverversion=2.15 env=ONLY=70a,ONLY_REPEAT=10
Fixes: e383791 ("LU-17216 ofd: make enable_health_write tunable")
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I320f6911e7b7064d49761a022c462b7c20f3a2e1
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53770
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Deiter
2 months ago LU-17452 tests: fix interop sanityn tests with b2_15 59/53759/3
Etienne AUJAMES [Mon, 22 Jan 2024 10:44:11 +0000 (11:44 +0100)]
LU-17452 tests: fix interop sanityn tests with b2_15

sanityn 77q and 77r require server fixes to pass.
This patch adds a server version check to the tests.

Fixes: 44cc782 ("LU-9859 ptlrpc: simplifying expression parsing in nrs_tbf")
Fixes: c098c09 ("LU-14976 nrs: change nrs policies at run time")
Test-Parameters: trivial
Test-Parameters: clientversion=2.15.4 testlist=sanityn
Test-Parameters: serverversion=2.15.4 testlist=sanityn
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: I91b30284e9a3c24c9709215f509ca75923214c5b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53759
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Deiter
2 months ago LU-8191 llverfs: fix non-static functions 54/53754/5
Timothy Day [Mon, 22 Jan 2024 02:51:49 +0000 (02:51 +0000)]
LU-8191 llverfs: fix non-static functions

Static analysis shows that a number of functions
could be made static. This patch declares several
functions in llverfs.c static.

Making functions new_file() and new_dir() static
causes new format truncation errors. Check the
return of snprintf() to silence these.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Ieccf1e40c1da627571a7a95adbb85599185f1342
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53754
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-17438 utils: fix build for wirecheck 16/53716/3
Etienne AUJAMES [Wed, 17 Jan 2024 16:51:58 +0000 (17:51 +0100)]
LU-17438 utils: fix build for wirecheck

Fix wirecheck compilation and regenerate wiretest files.

Fixes: 6a20bdc ("LU-11376 lov: new foreign LOV format")
Fixes: 15d44e7 ("LU-12682 llite: fake symlink type of foreign file/dir")
Fixes: aebb405 ("LU-10499 pcc: use foreign layout for PCCRO on server side")
Fixes: 0ea23e0 ("LU-13307 nodemap: have nodemap_add_member support large NIDs")
Test-Parameters: trivial testlist=sanity env=ONLY=58
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: I3a312136da00ba726887660575f6558faf167241
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53716
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-17421 build: Update check for arc_prune_func_t parameters 64/53664/6
Brian Atkinson [Fri, 12 Jan 2024 00:36:59 +0000 (17:36 -0700)]
LU-17421 build: Update check for arc_prune_func_t parameters

In OpenZFS 2.2.1 the code for arc_prune_async() was unified so that
FreeBSD and Linux no longer have their own implementations of
the same code. Part of this update changed the first parameter of
arc_prune_func_t to be a uint64_t.

Without this patch, Lustre would not build with ZFS 2.2.1 because of
an incompatible pointer type error for the arc_prune_func_t
function pointer passed to arc_add_prune_callback().

Test-Parameters: trivial
Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Change-Id: Iaa03cc9421f27a8517ce04817f04102de9adb86a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53664
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Akash B <akash-b@hpe.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
2 months ago LU-16913 quota: notify newest lqe in qmt_set_id_notify 37/53637/2
Sergey Cheremencev [Wed, 10 Jan 2024 18:56:03 +0000 (21:56 +0300)]
LU-16913 quota: notify newest lqe in qmt_set_id_notify

lqe_locate may call lqe_find inside qmt_pool_lqes_lookup_spec and
insert a 2nd lqe into lqs_hash while the previous one is still being
processed. Do not add the 1st lqe to be processed by qmt_reba_thread
in qmt_id_lock_notify, as this lqe will be freed at the end of
lqe_locate_find due to the race with the 2nd lqe that already exists
in lqs_hash.
This should fix the following assertion:

  (qmt_lock.c:950:qmt_id_lock_glimpse()) ASSERTION( lqe->lqe_gl ) failed:
  (qmt_lock.c:950:qmt_id_lock_glimpse()) LBUG

Fixes: 09f9fb3211 ("LU-11023 quota: quota pools for OSTs")
Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Change-Id: I3a3114d880077c87e61fccf4f32e3845bd42d842
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53637
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-15496 tests: add debugging to sanity/398c 62/53462/6
Andreas Dilger [Thu, 14 Dec 2023 14:23:12 +0000 (07:23 -0700)]
LU-15496 tests: add debugging to sanity/398c

Dump the rpc_stats to help understand why the test is failing.

Test-Parameters: trivial testlist=sanity clientarch=ppc64le env=ONLY=398c,ONLY_REPEAT=100
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I5ed1b7133eddd242b234a05a670e152e4ca359b7
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53462
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Alex Deiter
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-17276 ldlm: add interval in flock 47/53447/11
Yang Sheng [Wed, 13 Dec 2023 20:30:36 +0000 (04:30 +0800)]
LU-17276 ldlm: add interval in flock

Add the changes necessary to use an interval tree in flock.

Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: I94c416b4215b863b54eccfe7025f2976fe40181a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53447
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-17334 lmv: handle object created on newly added MDT 63/53363/6
Lai Siyao [Thu, 7 Dec 2023 12:39:09 +0000 (07:39 -0500)]
LU-17334 lmv: handle object created on newly added MDT

When a new MDT is added to a filesystem without no_create, then a new
object is created on the MDT relatively quickly after it is added to
the filesystem, in particular because the new MDT would be preferred
by QOS space balancing due to lots of free space. However, it might
take a few seconds for the addition of the new MDT to be propagated
across all of the clients, so there is a risk that one client creates
a directory on an MDT that another client is not yet aware of, which
returns an error to the application immediately.

This patch fixes the issue by adding lmv_tgt_retry(), which retries
using the MDT and waits for some number of seconds for the filesystem
layout to be updated if the MDT index of an existing file/directory is
not found.

Commands that depend on user input, like 'lfs mkdir -i' and 'lfs df',
and round-robin MDT allocation will continue to use lmv_tgt(), which
doesn't retry in case the user specifies a wrong MDT index; otherwise
it could hang the command for an extended period of time.
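
The difference between the two lookup modes can be sketched in Python.
This is a simplified model, not the lmv code: find_target() and its
parameters are hypothetical, and the target table is a plain dict keyed
by MDT index. With retry_seconds=0 it behaves like an immediate lookup;
with a positive value it polls until the index appears or the deadline
passes.

```python
import time


def find_target(index, targets, retry_seconds=0.0, poll_interval=0.1,
                sleep=time.sleep, now=time.monotonic):
    """Look up an MDT by index, optionally waiting for it to appear."""
    deadline = now() + retry_seconds
    while True:
        target = targets.get(index)
        if target is not None:
            return target
        if now() >= deadline:
            # No retry budget left: fail instead of blocking the caller.
            raise LookupError(f"MDT index {index} not found")
        # Give the layout update a chance to arrive before checking again.
        sleep(poll_interval)
```

Keeping the non-retrying mode for user-supplied indices means a
mistyped index fails at once instead of waiting out the full retry
period.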

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Idb0cf65e95f665628d6799298732b7a06cde4a86
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53363
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-17297 grant: move tgt_grant_sanity_check() calls 71/53171/4
Vladimir Saveliev [Fri, 17 Nov 2023 15:30:06 +0000 (18:30 +0300)]
LU-17297 grant: move tgt_grant_sanity_check() calls

Call tgt_grant_sanity_check() in ofd_obd_disconnect() and in
mdt_obd_disconnect() after call to tgt_grant_discard().

Otherwise, the sum of grants does not match the total grant counter,
which is reported as a LustreError:
    ofd_obd_disconnect: tot_granted 0 != fo_tot_granted 8388608

This is because on stale export eviction,
class_disconnect_stale_exports() moves stale exports to a separate list
but does not update the obd's grant counters.

A test to illustrate the issue is included.
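
The ordering problem can be shown with a toy model of the accounting.
GrantModel and its methods are hypothetical stand-ins rather than the
target code: the model assumes the check compares the sum of grants over
the active export list with a target-wide total, and that only the
discard step lowers that total.

```python
class GrantModel:
    """Toy model: per-export grants plus a target-wide total counter."""

    def __init__(self):
        self.exports = {}
        self.tot_granted = 0

    def grant(self, export, amount):
        self.exports[export] = self.exports.get(export, 0) + amount
        self.tot_granted += amount

    def move_to_stale_list(self, export):
        # Removes the export from the active list but leaves the total untouched.
        return self.exports.pop(export)

    def discard(self, amount):
        # Models the effect of discarding the evicted export's grant.
        self.tot_granted -= amount

    def sanity_check(self):
        # The sum over active exports must equal the target-wide counter.
        return sum(self.exports.values()) == self.tot_granted
```

After granting 8388608 to one export and moving it to the stale list,
the check sees 0 against 8388608 and fails; it passes again once the
grant has been discarded, which is why the check belongs after the
discard.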

HPE-bug-id: LUS-11469
Signed-off-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Change-Id: I0b4568b88a2fe7b50f4eac50b4b064d7afbc7a75
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53171
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 months ago LU-17271 kfilnd: Allocate tn_mr_key before kfilnd_peer 29/53029/4
Chris Horn [Tue, 7 Nov 2023 22:19:26 +0000 (15:19 -0700)]
LU-17271 kfilnd: Allocate tn_mr_key before kfilnd_peer

A race exists between kfilnd_peer and tn_mr_key allocation that could
result in RKEY re-use and data corruption.

Thread 1: Posts tagged receive with RKEY based on
          peerA::kp_local_session_key X and tn_mr_key Y
Thread 2: Fetches peerA with kp_local_session_key X
Thread 1: Cancels tagged receive, marks peerA for removal, and
          releases tn_mr_key Y
Thread 2: allocates tn_mr_key Y
At this point, thread 2 has the same RKEY used by thread 1.

The fix is to always allocate the tn_mr_key before looking up the
peer, and always mark peers for removal before releasing tn_mr_key.
This commit modifies the TN allocation to ensure the tn_mr_key is
allocated before looking up the target peer.
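
The interleaving above can be replayed with a small sequential model.
All names here are hypothetical and the model makes simplifying
assumptions: an RKEY is the pair (session key, tn_mr_key), the key pool
hands out the lowest free integer, and marking a peer for removal is
modelled as immediate removal from the cache.

```python
class KeyPool:
    """Hands out the lowest free integer key (stand-in for tn_mr_key)."""

    def __init__(self):
        self.in_use = set()

    def alloc(self):
        key = 0
        while key in self.in_use:
            key += 1
        self.in_use.add(key)
        return key

    def release(self, key):
        self.in_use.discard(key)


class PeerCache:
    """Assigns a fresh session key whenever a peer is (re)created."""

    def __init__(self):
        self.peers = {}
        self.next_session = 100

    def lookup(self, name):
        if name not in self.peers:
            self.peers[name] = self.next_session
            self.next_session += 1
        return self.peers[name]

    def remove(self, name):
        self.peers.pop(name, None)


def replay(alloc_key_first):
    """Replay the interleaving; return (thread 1 RKEY, thread 2 RKEY)."""
    pool, cache = KeyPool(), PeerCache()
    # Thread 1 posts its tagged receive.
    mr1 = pool.alloc()
    rkey1 = (cache.lookup("peerA"), mr1)
    # Thread 2 performs its first step before thread 1 cleans up.
    if alloc_key_first:
        mr2 = pool.alloc()
    else:
        session2 = cache.lookup("peerA")
    # Thread 1 cancels: the peer is removed first, then the key is released.
    cache.remove("peerA")
    pool.release(mr1)
    # Thread 2 performs its second step.
    if alloc_key_first:
        session2 = cache.lookup("peerA")
    else:
        mr2 = pool.alloc()
    return rkey1, (session2, mr2)
```

With the peer looked up first, replay(False) returns two identical
pairs, matching the re-use described above. With the key allocated
first, replay(True) returns two different pairs.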

HPE-bug-id: LUS-11972
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I2e0948ae4fe7c5dfb86e297a3437213f193bf67c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53029
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-17271 kfilnd: Protect RKEY for bulk Put/Get 28/53028/3
Chris Horn [Tue, 7 Nov 2023 21:14:42 +0000 (14:14 -0700)]
LU-17271 kfilnd: Protect RKEY for bulk Put/Get

The initiator of a bulk Put/Get generates an RKEY based on the
values of the struct kfilnd_tn::tn_mr_key and
struct kfilnd_peer::kp_local_session_key. kp_local_session_key is
assigned at peer creation, and tn_mr_key is assigned when the
kfilnd_tn is allocated.

A bulk Put/Get can fail in various ways such that the target of the
operation may have a reference to the RKEY, but the originator cannot
know the state of the operation at the target. In these cases, the
initiator must ensure that the RKEY is not re-used. To accomplish
this, we need to delete the target peer from the originator's peer
cache to ensure that subsequent bulk Put/Get operations will use
a new kp_local_session_key, and thus avoid re-using any old RKEY
values.

HPE-bug-id: LUS-11972
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: If270a2df745ee88c35addc8194cdb160cb373c3e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53028
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-17270 kfilnd: Check status of TAG_RX_OK in WAIT_COMP 27/53027/2
Chris Horn [Tue, 7 Nov 2023 17:36:29 +0000 (10:36 -0700)]
LU-17270 kfilnd: Check status of TAG_RX_OK in WAIT_COMP

When the target of a bulk Get/Put drops the message, it sends
ENODATA back to the initiator via immediate data. This status needs to
be accounted for while the transaction is in the TN_STATE_WAIT_COMP
state, otherwise it can be lost if the TN_EVENT_TAG_RX_OK event
arrives before the TN_EVENT_TX_OK event.

HPE-bug-id: LUS-11971
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I52d6ea52746cbc14a86478fcccb32b25badd3b0a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53027
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-16822 tests: Skip tests lacking large NID support 27/53727/4
Chris Horn [Tue, 5 Dec 2023 04:25:01 +0000 (22:25 -0600)]
LU-16822 tests: Skip tests lacking large NID support

Test 230 - Needs lctl conn_list but this does not support large NIDs.

Tests 204-207,209-213,216,218,231,302,500 - These test cases use
commands that do not support large NIDs (drop rules, printing recovery
queue, etc.), or do not work properly (lnetctl import/export,
lctl which_nid).

Tests 101,103 - These tests exercise NID ranges, so they do not
work with large NIDs.

Tests 100,102,105-106 - These test cases need to be re-written to
specify valid IPv6 NIDs.

Test 220 - Calls lst which does not support large NIDs.

Test 250 - Uses ksocklnd-config but this does not support IPv6.

Tests 208 and 255 use ip2nets and routes parameters that do not support
large NIDs.

Test 214 - If the destination NID to the ping commands is IPv6, then
the fake interface cannot be cleaned up.

Some places where drop or delay rules were added did not check for
success or failure. This has been corrected.

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I7f251a82aa2eee304419a765df728a014b9c9e27
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53727
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-16967 build: Separate lnet LND deb packaging 97/52397/6
Shaun Tancheff [Fri, 26 Jan 2024 17:57:35 +0000 (00:57 +0700)]
LU-16967 build: Separate lnet LND deb packaging

Enable packaging of lnet lnd kernel modules into
separate packages with build profile multiple-lnds:

  lustre-lnet-module-socklnd for socklnd.ko
  lustre-lnet-module-gnilnd for kgnilnd.ko, profile gnilnd
  lustre-lnet-module-kfilnd for kkfilnd.ko, profile kfilnd
  lustre-lnet-module-o2iblnd for o2iblnd.ko, profile ext_o2ib
  lustre-lnet-module-in-kernel-o2iblnd for ko2iblnd.ko,
     profile int_o2ib

Test-Parameters: trivial
HPE-bug-id: LUS-11711
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I3a5ca03fa410238f66083289db0899c8b4bfab5c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52397
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-16498 obdclass: change uc_lock to rwlock 95/52395/13
Sebastien Buisson [Thu, 14 Sep 2023 16:00:04 +0000 (18:00 +0200)]
LU-16498 obdclass: change uc_lock to rwlock

Change the upcall cache uc_lock to a read-write lock so that threads
can get the read lock to do concurrent lookups in the upcall cache,
and only grab the write lock in the rare case when a new entry is
added or old entries are expired. That reduces serialization between
server threads during normal operation, and avoids all of the threads
spinning for some time if the requested key (UID or gss context) is
not in the cache at all, before they sleep.
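
The locking pattern can be sketched in userspace Python. This is an
illustrative model only, not the upcall cache code: RWLock and
UpcallCacheModel are hypothetical, the lock has no fairness guarantees,
and the loader is called synchronously under the write lock, which is a
simplification.

```python
import threading


class RWLock:
    """Minimal readers-writer lock (no fairness guarantees)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            while self._writer or self._readers:
                self._cond.wait()
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()


class UpcallCacheModel:
    """Lookups share the read lock; only inserts take the write lock."""

    def __init__(self, loader):
        self._lock = RWLock()
        self._entries = {}
        self._loader = loader

    def lookup(self, key):
        self._lock.acquire_read()
        try:
            if key in self._entries:
                return self._entries[key]
        finally:
            self._lock.release_read()
        # Miss: take the write lock and re-check before inserting.
        self._lock.acquire_write()
        try:
            if key not in self._entries:
                self._entries[key] = self._loader(key)
            return self._entries[key]
        finally:
            self._lock.release_write()
```

Repeated lookups of a cached key only ever take the read lock, so they
can proceed concurrently; the re-check under the write lock keeps two
threads that miss on the same key from both inserting it.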

Test-Parameters: kerberos=true testlist=sanity-krb5
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I812400104fd2115d19386fb4a03bb3ce99c49383
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52395
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-16314 debug: Enable optional unhashed pointers 77/51877/15
Shaun Tancheff [Wed, 18 Oct 2023 09:27:10 +0000 (04:27 -0500)]
LU-16314 debug: Enable optional unhashed pointers

This patch takes a page out of the kernel trace debug
playbook to rewrite format strings and change %p -> %px
on-the-fly when:

   libcfs_debug_raw_pointers

is enabled.

The module parameter can be viewed and modified by root
via lctl:
    lctl get_param debug_raw_pointers
    lctl set_param debug_raw_pointers=1

Since nothing uses the return value from libcfs_debug_msg,
change it to void.

Use percpu pre-allocated buffers for holding modified
format strings to avoid kmalloc/kfree as well as avoid
bloating stack usage.
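
The on-the-fly rewrite can be sketched in Python as a model of the
idea. It is not the libcfs implementation: unhash_pointers() is a
hypothetical helper, and it assumes that only a bare %p (one not
followed by an extension character such as the K in %pK) is rewritten,
and that a literal %% is left alone.

```python
import re

# Matches either a literal "%%" or a bare "%p" with no extension character.
_BARE_P = re.compile(r"%%|%p(?![A-Za-z0-9])")


def unhash_pointers(fmt, raw_pointers=True):
    """Rewrite bare %p to %px when raw pointer output is enabled."""
    if not raw_pointers:
        return fmt
    # Keep "%%" as-is so that "%%p" is not mistaken for a pointer conversion.
    return _BARE_P.sub(
        lambda m: m.group(0) if m.group(0) == "%%" else "%px", fmt)
```

In this model "obj %p state %d" becomes "obj %px state %d", while
conversions that already carry an extension, such as %pK or %px, are
returned unchanged.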

HPE-bug-id: LUS-10945
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I63d90d614ce4435b07f5e84991a12ae7351ac2bb
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51877
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
2 months ago LU-16314 lnet: Migrate LASSERTF %p to %px 31/51231/7
Shaun Tancheff [Tue, 6 Jun 2023 04:07:44 +0000 (11:07 +0700)]
LU-16314 lnet: Migrate LASSERTF %p to %px

This change covers libcfs and lnet and converts LASSERTF
statements to explicitly use %px.

Use %px to explicitly report the non-hashed pointer value in
messages printed when a kernel panic is imminent. When
analyzing a crash dump, the associated kernel address can
be used to determine the system state that led to the
system crash.

As crash dumps can be, and are, provided by customers from
production systems, the use of the kernel command line
parameter:
    no_hash_pointers
is not always possible.

Ref: Documentation/core-api/printk-formats.rst

Test-Parameters: trivial
HPE-bug-id: LUS-10945
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I4d0c956e1b914cea9517b632d46f1714bcd43a85
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51231
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 months ago LU-16314 llite: Migrate LASSERTF %p to %px 13/51213/8
Shaun Tancheff [Tue, 6 Jun 2023 03:44:53 +0000 (10:44 +0700)]
LU-16314 llite: Migrate LASSERTF %p to %px

This change covers lustre/ec through lustre/mgs and
converts LASSERTF statements to explicitly use %px.

Use %px to explicitly report the non-hashed pointer value in
messages printed when a kernel panic is imminent. When
analyzing a crash dump, the associated kernel address can
be used to determine the system state that led to the
system crash.

As crash dumps can be, and are, provided by customers from
production systems, the use of the kernel command line
parameter:
    no_hash_pointers
is not always possible.

Ref: Documentation/core-api/printk-formats.rst

Test-Parameters: trivial
HPE-bug-id: LUS-10945
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I708d9ef60c63f5b4006c7986599a2f39fc9e5fdf
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51213
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-16314 obdclass: Migrate LASSERTF %p to %px 05/49405/10
Shaun Tancheff [Thu, 25 May 2023 12:01:32 +0000 (07:01 -0500)]
LU-16314 obdclass: Migrate LASSERTF %p to %px

This change covers lustre/obdclass through lustre/target and
converts LASSERTF statements to explicitly use %px.

Use %px to explicitly report the non-hashed pointer value in
messages printed when a kernel panic is imminent. When
analyzing a crash dump the associated kernel address can
be used to determine the system state that led to the
system crash.

As crash dumps can be, and are, provided by customers from
production systems the use of the kernel command line
parameter:
    no_hash_pointers
is not always possible.

Ref: Documentation/core-api/printk-formats.rst

Test-Parameters: trivial
HPE-bug-id: LUS-10945
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ia256dc1f74f976640ec82746a5d761ef662f45ae
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49405
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17426 tests: add crossdir parallel rename test 38/53738/12
Andreas Dilger [Fri, 19 Jan 2024 03:44:33 +0000 (20:44 -0700)]
LU-17426 tests: add crossdir parallel rename test

Add sanityn test_81d to test cross-dir (same-MDT) parallel rename
if the MDT supports this functionality.

Test-Parameters: trivial testlist=sanityn
Test-Parameters: testlist=sanityn serverversion=2.15 env=SANITYN_EXCEPT="77q 77r"
Test-Parameters: testlist=sanityn env=ONLY=81d,ONLY_REPEAT=10
Test-Parameters: testlist=sanityn env=ONLY=81d,ONLY_REPEAT=10 mdtcount=2
Test-Parameters: testlist=sanityn env=ONLY=81d,ONLY_REPEAT=10 mdtcount=4
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ic8717e6865a9c6c9698186f4fdf34c1f4f74083f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53738
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-11112 lnet: improve error msg in lnet_sock_create() 58/32758/6
Karsten Weiss [Fri, 29 Jun 2018 15:22:09 +0000 (17:22 +0200)]
LU-11112 lnet: improve error msg in lnet_sock_create()

The kernel_bind() call in lnet_sock_create() may fail due to
problems with the local port, or the local IP address. Make sure
to include both items when indicating a fatal error.

Test-Parameters: trivial
Signed-off-by: Karsten Weiss <karsten.weiss@atos.net>
Signed-off-by: Daniel Kobras <d.kobras@science-computing.de>
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I43c1f089d3b12e61c18c97e532b6872a6c8cf272
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/32758
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-6142 uapi: Fix style issues for lustre_disk.h 19/53919/2
Arshad Hussain [Mon, 5 Feb 2024 05:23:36 +0000 (10:53 +0530)]
LU-6142 uapi: Fix style issues for lustre_disk.h

This patch fixes issues reported by checkpatch
for file lustre/include/lustre_disk.h

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I20cc1784602b7e95a5c1541851684d3c2199be1b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53919
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-6142 ldlm: Fix style issues for lustre_dlm_flags.h 17/53917/2
Arshad Hussain [Mon, 5 Feb 2024 05:44:51 +0000 (11:14 +0530)]
LU-6142 ldlm: Fix style issues for lustre_dlm_flags.h

This patch fixes issues reported by checkpatch
for file lustre/include/lustre_dlm_flags.h

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Icf57d55fe3806e1990d9f88d78787137171f83a3
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53917
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-6142 fid: Fix style issues for lustre_fid.h 15/53915/2
Arshad Hussain [Mon, 5 Feb 2024 08:54:46 +0000 (14:24 +0530)]
LU-6142 fid: Fix style issues for lustre_fid.h

This patch fixes issues reported by checkpatch
for file lustre/include/lustre_fid.h

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I8128bf72826187cc3d941c25e16e42e0736d8e3e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53915
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-6142 fld: Fix style issues for lustre_fld.h 14/53914/2
Arshad Hussain [Mon, 5 Feb 2024 09:14:44 +0000 (14:44 +0530)]
LU-6142 fld: Fix style issues for lustre_fld.h

This patch fixes issues reported by checkpatch
for file lustre/include/lustre_fld.h

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ic02ff3ec463d9115de5483e4333e935406c0ed93
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53914
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-6142 llog: Fix style issues for lustre_log.h 12/53912/2
Arshad Hussain [Mon, 5 Feb 2024 11:04:33 +0000 (16:34 +0530)]
LU-6142 llog: Fix style issues for lustre_log.h

This patch fixes issues reported by checkpatch
for file lustre/include/lustre_log.h

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I86650ff994c851a91c109c359251e80dd761b245
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53912
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-6142 build: Fix style issues for lustre_compat.h 07/53907/2
Arshad Hussain [Mon, 5 Feb 2024 04:47:51 +0000 (10:17 +0530)]
LU-6142 build: Fix style issues for lustre_compat.h

This patch fixes issues reported by checkpatch
for file lustre/include/lustre_compat.h

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I2e91f4faba98d3862d8180f622478518c4b8b0c3
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53907
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-6142 obdclass: Fix style issues for lprocfs_status.h 06/53906/2
Arshad Hussain [Sun, 4 Feb 2024 22:50:06 +0000 (04:20 +0530)]
LU-6142 obdclass: Fix style issues for lprocfs_status.h

This patch fixes issues reported by checkpatch
for file lustre/include/lprocfs_status.h

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I99d364f698e4319e83d129a2a0c529d6f7ce8dec
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53906
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-6142 clio: Fix style issues for cl_object.h 02/53902/5
Arshad Hussain [Sat, 3 Feb 2024 22:56:09 +0000 (04:26 +0530)]
LU-6142 clio: Fix style issues for cl_object.h

This patch fixes issues reported by checkpatch
for file lustre/include/cl_object.h

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I0332d07569ed3c4ddff1ed6514918b2afdd48179
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53902
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-6142 utils: Fix style issues for folder lustre/include/lustre 00/53900/4
Arshad Hussain [Sat, 3 Feb 2024 21:38:05 +0000 (03:08 +0530)]
LU-6142 utils: Fix style issues for folder lustre/include/lustre

This patch fixes issues reported by checkpatch
for all files under folder lustre/include/lustre/

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I65e87c8343bc0b90b71684827d8ca2bd7efd652e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53900
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 months agoLU-6142 fld: Fix style issues for lproc_fld.c 88/53888/2
Arshad Hussain [Fri, 2 Feb 2024 13:16:41 +0000 (18:46 +0530)]
LU-6142 fld: Fix style issues for lproc_fld.c

This patch fixes issues reported by checkpatch
for file lustre/fld/lproc_fld.c

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I27e3719cce73f04460fffa5b0583603adb1cd05f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53888
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-6142 fid: Fix style issues for fid_lib/lproc_fid.c 87/53887/3
Arshad Hussain [Fri, 2 Feb 2024 12:19:47 +0000 (17:49 +0530)]
LU-6142 fid: Fix style issues for fid_lib/lproc_fid.c

This patch fixes issues reported by checkpatch
for file lustre/fid/fid_lib.c and lustre/fid/lproc_fid.c

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I829027962170e9494e6ff726804a579dd987dc17
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53887
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17489 utils: fix 'lfs getname' ioctl breakage 67/53867/3
Andreas Dilger [Wed, 31 Jan 2024 00:06:36 +0000 (17:06 -0700)]
LU-17489 utils: fix 'lfs getname' ioctl breakage

The removal of the explicit ioctl(OBD_IOC_GETNAME_OLD) fallback
in llapi_file_fget_lov_uuid() caused 'lfs getname' to break when
running against a 2.14 mountpoint missing OBD_IOC_GETDTNAME.

Change this to call llapi_ioctl(OBD_IOC_GETDTNAME) which handles
this compatibility mapping internally.

Also fix the include header ordering to ensure LUSTRE_VERSION_CODE
is defined before including lustre_ioctl_old.h.

Test-Parameters: trivial
Fixes: 0f38a0b9db ("LU-13107 uapi: remove obsolete ioctls")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I944acd498a42ba4882c8391e27f4156cb63ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53867
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-10003 lnet: Update lctl ping to work with large NIDs 35/53835/3
Chris Horn [Sat, 13 Jan 2024 03:50:00 +0000 (21:50 -0600)]
LU-10003 lnet: Update lctl ping to work with large NIDs

jt_ptl_ping()/lnet_parse_nid() updated to use the large struct
lnet_processid. This allows lctl ping to work with large NIDs. If we
need to fallback to the old_api then the large lnet_processid is
converted to the old struct lnet_process_id.

Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ie0fb05ed631d0432e6c6caba0f64fc377f785bc2
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53835
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17464 lod: set llc_ostlist to NULL after free 97/53797/2
Bobi Jam [Wed, 24 Jan 2024 06:15:49 +0000 (14:15 +0800)]
LU-17464 lod: set llc_ostlist to NULL after free

Default LOV striping could free component entry llc_ostlist if needed,
e.g. to expand component entries; unless it is set to NULL it could be
double allocated/freed later.
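
The pattern behind this fix can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the lod code: struct comp_entry, its fields, and both helper functions are hypothetical stand-ins, with ostlist playing the role of llc_ostlist.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a component entry; not the real lod structure. */
struct comp_entry {
	unsigned int *ostlist;	/* plays the role of llc_ostlist */
	size_t ostcount;
};

/* Free the list and clear the pointer so a repeated call is harmless. */
static void comp_entry_free_ostlist(struct comp_entry *ce)
{
	assert(ce != NULL);
	free(ce->ostlist);	/* free(NULL) is a no-op */
	ce->ostlist = NULL;
	ce->ostcount = 0;
}

/* Allocate a list only when none is attached. Returns 0, or -1 on failure. */
static int comp_entry_alloc_ostlist(struct comp_entry *ce, size_t count)
{
	assert(ce != NULL);
	if (ce->ostlist != NULL)
		return 0;	/* a stale non-NULL pointer would mislead this test */
	ce->ostlist = calloc(count, sizeof(*ce->ostlist));
	if (ce->ostlist == NULL)
		return -1;
	ce->ostcount = count;
	return 0;
}
```

Because the pointer is cleared, calling comp_entry_free_ostlist() twice in a row is safe, and a later allocation check sees an accurate state.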

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I25824cb61dd47ba284403039259593b88d25fa9d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53797
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 months agoLU-10391 nrs: use struct lnet_nid for nrs_crrn_hashfn() 99/53399/4
James Simmons [Sun, 10 Dec 2023 15:10:53 +0000 (10:10 -0500)]
LU-10391 nrs: use struct lnet_nid for nrs_crrn_hashfn()

In the move to large NIDs nrs_crrn_hashfn() was missed. Update
the hash value generated to use nidhash() since our value
is much larger than the 64 bits of lnet_nid_t.

Test-Parameters: trivial testlist=sanityn envdefinitions=ONLY=77b
Fixes: 36a199db2b ("LU-10391 ptlrpc: change cc_nid in nrs to be struct lnet_nid")
Change-Id: I19d7150c773db4755b3b8a18791f1411e13fe2b3
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53399
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Nikitas Angelinas <nikitas.angelinas@hpe.com>
Reviewed-by: Etienne AUJAMES <eaujames@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 months agoLU-17479 utils: Update lnet tools to support PyYAML format 45/53845/3
James Simmons [Mon, 29 Jan 2024 00:45:40 +0000 (19:45 -0500)]
LU-17479 utils: Update lnet tools to support PyYAML format

The current cYAML implementation can't handle PyYAML indentation
style. The reason is the underlying libyaml library creates
different yaml events / tokens for the PyYAML format. I attempted
to inject the missing yaml tokens from the PyYAML format but that
failed to work. Also the tokens with the PyYAML produced the
wrong scalar strings. I looked at moving to yaml events instead
of tokens but that required a large change. The simplest change
was to capture the YAML config input and place it into a locally
allocated buffer. Then alter the location of '-' which changed
the YAML config from PyYAML to something cYAML can handle. For
this to work I needed to move the YAML config data handling out
of cYAML_build_tree() to jt_import(). The reason was that
lustre_yaml_cb_helper() is called more than once and for stdin
it can only be read once.

Test-Parameters: trivial testlist=sanity-lnet
Change-Id: Ic8529ae264c9cbe6872da9a9e3421db78f8ea371
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53845
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-10391 lnet: Fix lnetctl peer set --all 91/53791/5
Chris Horn [Tue, 23 Jan 2024 21:54:05 +0000 (14:54 -0700)]
LU-10391 lnet: Fix lnetctl peer set --all

When the --all flag is specified as part of the lnetctl peer set
command, a primary NID argument is not sent to the peer doit command.
The "peer set" case uses NLM_F_REPLACE flag, so in this case the
primary NID argument is optional.

Added test cases to validate behavior of the lnetctl peer set
command.

Test-Parameters: trivial
Fixes: 8a0fdfa0b2 ("LU-10391 lnet: migrate peer NI control to Netlink")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I58de030b061280e837de27611bc701a3affab0f3
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53791
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-6142 ldlm: Fix style issues for lustre_sec.h 53/51853/3
Arshad Hussain [Wed, 2 Aug 2023 12:18:22 +0000 (17:48 +0530)]
LU-6142 ldlm: Fix style issues for lustre_sec.h

This patch fixes issues reported by checkpatch
for file lustre/include/lustre_sec.h

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ic4a8d3c5188e79dee2347db9b9d951afdee72630
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51853
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-6142 ldlm: Fix style issues for sec_gss.c 10/51810/4
Arshad Hussain [Mon, 31 Jul 2023 06:39:32 +0000 (12:09 +0530)]
LU-6142 ldlm: Fix style issues for sec_gss.c

This patch fixes issues reported by checkpatch
for file lustre/ptlrpc/gss/sec_gss.c

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ia56b8f9881aab705e27daaa7539c9c1388f4bb5d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51810
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17189 o2ib: assign tx_gpu properly 02/52702/7
Jinshan Xiong [Thu, 12 Oct 2023 22:55:58 +0000 (15:55 -0700)]
LU-17189 o2ib: assign tx_gpu properly

tx_gpu is not assigned or initialized properly.

Test-Parameters: trivial
Fixes: f792297212 ("LU-16211 o2iblnd: Avoid NULL md deref")
Signed-off-by: Jinshan Xiong <jinshan.xiong@gmail.com>
Change-Id: I5e14d66f41f6194203fec7832493efd432b54c36
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52702
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17285 obdclass: remove debug from obd_get_at_* 06/53106/4
Alex Zhuravlev [Mon, 13 Nov 2023 04:48:40 +0000 (07:48 +0300)]
LU-17285 obdclass: remove debug from obd_get_at_*

Remove the debugging CDEBUG() from the obd_get_at_*() helpers.
This eliminates messages like the following:
00010000:00100000:0.0:1699657756.854481:0:27978:0:
(ldlm_request.c:181:ldlm_cp_timeout()) NULL obd

Fixes: 0f2bc318d7 ("LU-15246 ptlrpc: per-device adaptive timeout parameters")
Test-Parameters: trivial
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Iaaa964d4e3ca62de5fb273f865d7cd4c3a4f9a29
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53106
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17000 utils: In mydaemon() check after calling open() 58/53758/2
Arshad Hussain [Mon, 22 Jan 2024 10:33:02 +0000 (16:03 +0530)]
LU-17000 utils: In mydaemon() check after calling open()

This patch adds a check after calling open() in function
mydaemon() instead of directly using the value
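
A minimal userspace sketch of this kind of check follows. It assumes the descriptor is used the way daemonizing code typically uses it (redirecting a standard stream to /dev/null); the helper name redirect_to_devnull() is hypothetical and this is not the actual mydaemon() code.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Point target_fd at /dev/null.
 * Returns 0 on success, -1 on failure (errno is set by the failing call).
 */
static int redirect_to_devnull(int target_fd)
{
	int fd;

	assert(target_fd >= 0);

	fd = open("/dev/null", O_RDWR);
	if (fd < 0)
		return -1;	/* never hand a negative fd to dup2() */

	if (dup2(fd, target_fd) < 0) {
		close(fd);
		return -1;
	}

	/* dup2() leaves fd open unless it was already target_fd */
	if (fd != target_fd)
		close(fd);
	return 0;
}
```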

Test-Parameters: trivial kerberos=true testlist=sanity-krb5
CoverityID: 397666 ("Argument cannot be negative")
Fixes: d2d56f38da0 ("make HEAD from b_post_cmd3")
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ic59414977029221e8618c5bb3320e95d39d9cded
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53758
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17000 utils: Remove check for errno != 0 42/53742/2
Arshad Hussain [Fri, 19 Jan 2024 07:44:21 +0000 (13:14 +0530)]
LU-17000 utils: Remove check for errno != 0

In lustre_rename_fsname() after calls to system calls
like open/read/write/lseek there is a check if global
errno is not equal to zero. This is a no-op and not
required, as these system calls always set errno to a
non-zero value on failure.

The side effect of this check was that it appeared
that the errno could be 0 which would leave 'ret' as
negative. This would never happen, but was causing
Coverity to complain.

This patch fixes this coverity issue by removing
the comparison of errno != 0 which is not required.
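
For illustration, the usual idiom is to convert errno straight into a negative return code on the failure path, since POSIX requires a failing call to set errno to a positive value. The helper below is a hypothetical sketch, not code from lustre_rename_fsname().

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 0 if path can be opened read-only, or a negative errno value. */
static int probe_readable(const char *path)
{
	int fd;

	assert(path != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;	/* no "errno != 0" test: it is non-zero here */

	close(fd);
	return 0;
}
```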

Test-Parameters: trivial testlist=conf-sanity,sanityn
Fixes: d0c6e97fa53 ("LU-8900 snapshot: rename filesysetem fsname")
CoverityID: 397153 ("Argument cannot be negative")
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I7257434e6546f56f30841f49a3bde35e80360bb8
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53742
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 months agoLU-17370 utils: simplify lfs-mirror-extend help text 19/53719/2
Alexandre Ioffe [Thu, 4 Jan 2024 03:34:40 +0000 (19:34 -0800)]
LU-17370 utils: simplify lfs-mirror-extend help text

Add list of lfs setstripe command line options
to help text of lfs mirror extend.
Simplify syntax of lfs mirror extend help text.
Update corresponding lfs-mirror-extend man page.
In the man pages, use left-side adjustment and disable hyphenation
('.ad l', '.nh') to prevent hyphenation of keywords.

Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Test-Parameters: trivial
Change-Id: I6cffcdb9651062e169f53868827646b876a82cb5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53719
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17430 tests: fix interop sanity-hsm testing with b2_15 94/53694/3
Etienne AUJAMES [Wed, 17 Jan 2024 10:28:35 +0000 (11:28 +0100)]
LU-17430 tests: fix interop sanity-hsm testing with b2_15

Add an MDS server check for sanity-hsm tests 114 and 409a. Those tests
require fixes on the MDS server side.

Test-Parameters: trivial testlist=sanity-hsm
Test-Parameters: serverversion=2.15.4 testlist=sanity-hsm
Test-Parameters: clientversion=2.15.4 testlist=sanity-hsm
Fixes: b13a5b351e ("LU-16188 mdt: fix incompatible HSM request handling")
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: I8bb5fb93e38f2428432fd3469317d3d13899a107
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53694
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Deiter
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 months agoLU-17000 utils: Add check after calling sysconf(_SC_PAGESIZE) 93/53693/2
Arshad Hussain [Wed, 17 Jan 2024 09:34:25 +0000 (15:04 +0530)]
LU-17000 utils: Add check after calling sysconf(_SC_PAGESIZE)

Calling sysconf(_SC_PAGESIZE) could return -1
on error. This patch adds a check after calling
sysconf() instead of directly using the value
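
A small userspace sketch of such a check is shown here. The helper alloc_one_page() is hypothetical and only illustrates validating the sysconf() result before it is used as a size or alignment; it is not taken from the lfs sources.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Allocate one page-aligned buffer of one page.
 * Returns NULL on failure; on success stores the page size in *size_out.
 */
static void *alloc_one_page(size_t *size_out)
{
	long page_size;
	void *buf = NULL;
	int rc;

	assert(size_out != NULL);

	page_size = sysconf(_SC_PAGESIZE);
	if (page_size <= 0)	/* -1 on error; never use it as a size */
		return NULL;

	/* posix_memalign() returns an error number instead of setting errno */
	rc = posix_memalign(&buf, (size_t)page_size, (size_t)page_size);
	if (rc != 0) {
		errno = rc;
		return NULL;
	}

	*size_out = (size_t)page_size;
	return buf;
}
```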

Test-Parameters: trivial testlist=sanity-flr
CoverityID: 397578 ("Argument cannot be negative")
CoverityID: 397246 ("Argument cannot be negative")
CoverityID: 397320 ("Argument cannot be negative")
CoverityID: 397671 ("Argument cannot be negative")
CoverityID: 397826 ("Argument cannot be negative")
CoverityID: 397898 ("Argument cannot be negative")
CoverityID: 397917 ("Argument cannot be negative")
CoverityID: 399702 ("Argument cannot be negative")
Fixes: 0561c144 (LU-13397 lfs: mirror extend/copy keeps sparseness)
Fixes: a5905b2a (LU-11245 flr: lfs mirror dump command)
Fixes: f1daa8fc (LU-10287 flr: lfs mirror verify command)
Fixes: 0e5c12ac (LU-10916 lfs: improve lfs mirror resync)
Fixes: 5d7c4fa6 (LU-9771 flr: mirror read and write)
Fixes: 5999c0b8 (LU-9771 flr: resync support and test tool)
Fixes: 9b44cf70 (LU-13224 utils: expose llapi_param* functions)
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I19d13528f63d4586a17aaa9d15313872f8c40c94
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53693
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17000 utils: Fix check after return from fopen() 86/53686/2
Arshad Hussain [Tue, 16 Jan 2024 10:22:54 +0000 (15:52 +0530)]
LU-17000 utils: Fix check after return from fopen()

Return from fopen() for option 't' (trace) was
checked with a different variable. This patch
fixes this.

Test-Parameters: trivial
CoverityID: 397651 ("Copy-paste error")
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I04bd207582d421cc3744e7b0f4298e738502edbe
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53686
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-930 doc: update lfs-setstripe.1 man page 72/53672/3
Alex Deiter [Sun, 14 Jan 2024 19:40:42 +0000 (23:40 +0400)]
LU-930 doc: update lfs-setstripe.1 man page

Update the lfs-setstripe.1 man page to describe the
maximum value for the mirror_count option.

Test-Parameters: trivial
Signed-off-by: Alex Deiter <adeiter@tintri.com>
Change-Id: I1879eabef5c397730e0795858e6c6a103d6a2259
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53672
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
2 months agoLU-17414 lnet: Use POSIX error number for libnetconfig 57/53657/3
Arshad Hussain [Thu, 11 Jan 2024 09:35:02 +0000 (15:05 +0530)]
LU-17414 lnet: Use POSIX error number for libnetconfig

Currently liblnetconfig.c returns custom-defined
LUSTRE_CFG_RC_* numbers, which can be confusing to users.
This patch redefines LUSTRE_CFG_RC_* to use POSIX
error numbers for consistency.

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I585d1dfd80d07160e5cdeef784920414132bcaf8
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53657
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17242 debug: use dump_stack() where possible 25/53625/2
Timothy Day [Tue, 9 Jan 2024 17:17:10 +0000 (17:17 +0000)]
LU-17242 debug: use dump_stack() where possible

In some cases, libcfs_debug_dumpstack() can fail to output a
stack trace - either because the needed symbols are not exported
or those symbols can't be resolved at runtime. This seems to
occur more often with newer kernels. The message appears only
as:

 Lustre: ldlm_cb01_002: service thread pid 57876 was inactive for
   40.494 seconds. The thread might be hung, or it might only be
   slow and will resume later. Dumping the stack trace for
   debugging purposes:
 Pid: 57876, comm: ldlm_cb01_002 6.1.70 #1 SMP PREEMPT_DYNAMIC
   Thu Jan  4 18:52:41 UTC 2024
 Call Trace TBD:

with no stack trace (seen on CentOS 8.5 with ml 6.1.70).

For reference, the runtime symbol lookup was added and updated in:

 b49ce7a ("LU-12400 libcfs: save_stack_trace_tsk if ARCH_STACKWALK")
 58ac9d3 ("LU-14099 build: Fix for unconfigured arch_stackwalk")

First, add a message when the symbol can't be resolved correctly.
This makes it much easier to understand why the stack trace is
missing.

Second, replace libcfs_debug_dumpstack(NULL) with dump_stack().
When the task_struct is NULL, libcfs uses the current
task_struct. This replicates the functionality of dump_stack().
Using dump_stack() is more reliable, more in line with kernel
style, and not likely to be un-exported in the future.

Finally, in lustre/osc/osc_object.c the stack isn't dumped since
there is already an LBUG().

There only remains one user of libcfs_debug_dumpstack() which
uses a task_struct other than current. This can be cleaned up
in a future patch.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I196c1da7e39b1a694c0cb67ecfaab58ab3e4662c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53625
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17400 uapi: Fix incorrect snamelen return value 24/53624/5
Josh Samuelson [Mon, 8 Jan 2024 19:03:52 +0000 (13:03 -0600)]
LU-17400 uapi: Fix incorrect snamelen return value

The sname char array is limited by the struct
changelog_rec.cr_namelen value and has no '\0' character allocated
to it, so strlen() will overrun the char array till it finds the next
'\0' char.

This issue can be seen on the client side when "lfs changelog"
is run and 08RENME record types are present.

Pointer arithmetic was used between sname and name to avoid the
GCC 11 warnings mentioned in 6331eadbd6.

Added Andreas's safety/range check code to changelog_rec_sname.
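
The idea can be sketched in userspace C as follows. This is a simplified illustration, not the uapi code: struct rename_rec and both helpers are hypothetical, and the layout (target name, a '\0' separator, then the source name, with no terminating '\0') is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical record: names holds "<name>\0<sname>" in exactly namelen
 * bytes, so the source name is NOT NUL-terminated.
 */
struct rename_rec {
	size_t namelen;
	const char *names;
};

/* Locate sname without reading past namelen bytes. */
static const char *rec_sname(const struct rename_rec *rec)
{
	const char *sep;

	assert(rec != NULL && rec->names != NULL);

	sep = memchr(rec->names, '\0', rec->namelen);
	if (sep == NULL)
		return rec->names + rec->namelen;	/* no sname present */
	return sep + 1;
}

/* Length from pointer arithmetic, never strlen() on unterminated data. */
static size_t rec_snamelen(const struct rename_rec *rec)
{
	return rec->namelen - (size_t)(rec_sname(rec) - rec->names);
}
```

The bytes following the record in memory never influence the result, which is the property a strlen() based length lacks.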

Fixes: 6331eadbd6 ("LU-15420 uapi: avoid gcc-11 -Werror=stringop-overread")
Signed-off-by: Josh Samuelson <josh@1up.unl.edu>
Change-Id: Ie0817dfdd1d02e06b9399e66f1affaadb9e156c4
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53624
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: xinliang <xinliang.liu@linaro.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17000 lnet: don't assign unused return codes 08/53608/4
Arshad Hussain [Mon, 8 Jan 2024 09:02:24 +0000 (14:32 +0530)]
LU-17000 lnet: don't assign unused return codes

In lnet_peer_discovery() the return values from lnet_peer_ping_failed()
and lnet_peer_push_failed() are unused, and the return value of the
former gets overwritten without ever being used.

Remove rc assignment and cast function to void to make it
clear the return code can be ignored.
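The pattern can be sketched as follows. All demo_* names are hypothetical stand-ins and this is not the LNet code; it only shows how a cast to void documents that a result is intentionally dropped instead of being stored in rc and then overwritten.

```c
#include <assert.h>
#include <errno.h>

static int demo_failures;

/* Hypothetical failure handler: records the event and returns an error. */
static int demo_ping_failed(void)
{
	demo_failures++;
	return -ETIMEDOUT;
}

/* Hypothetical follow-up step whose result is the one that matters. */
static int demo_next_step(void)
{
	return 0;
}

static int demo_discover(int ping_ok)
{
	int rc;

	assert(ping_ok == 0 || ping_ok == 1);
	if (!ping_ok)
		(void)demo_ping_failed();	/* was: rc = demo_ping_failed(); */

	rc = demo_next_step();	/* rc is assigned exactly once */
	return rc;
}
```

With the cast in place, static analyzers no longer see a value assigned to rc that is never read.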

Test-Parameters: trivial
CoverityID: 412758 ("Unused Value")
CoverityID: 412759 ("Unused Value")
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I02d5e883fc02814d5dbe307b78f028703023db52
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53608
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 months ago LU-9457 test: improve sanity 253 48/53548/3
Lai Siyao [Tue, 19 Dec 2023 08:24:07 +0000 (03:24 -0500)]
LU-9457 test: improve sanity 253

Improve sanity test_253: set high watermark to 50M, and fill OST with
fallocate.

Test-Parameters: trivial
Test-Parameters: testlist=sanity,sanity,sanity,sanity,sanity,sanity,sanity env=EXCEPT=77c
Test-Parameters: testlist=sanity,sanity,sanity,sanity,sanity,sanity,sanity env=EXCEPT=77c
Test-Parameters: testlist=sanity,sanity,sanity,sanity,sanity,sanity,sanity env=EXCEPT=77c
Test-Parameters: testlist=sanity,sanity,sanity,sanity,sanity,sanity,sanity env=EXCEPT=77c
Test-Parameters: testlist=sanity,sanity,sanity,sanity,sanity,sanity,sanity env=EXCEPT=77c
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I85139d7fc0697d08c21bdb19432b40c8dab82ee9
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53548
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-16796 lnet: Change sfw_session to use refcount_t 38/53438/4
Arshad Hussain [Wed, 13 Dec 2023 09:20:47 +0000 (14:50 +0530)]
LU-16796 lnet: Change sfw_session to use refcount_t

This patch changes struct sfw_session to use
refcount_t instead of atomic_t
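For readers unfamiliar with the difference, here is a user-space analogue written with C11 atomics. It is a sketch under assumptions and not the kernel implementation: struct demo_ref and the demo_ref_*() helpers are hypothetical, and they only imitate the two properties that distinguish a reference counter from a bare atomic integer, namely refusing to resurrect a zero count and saturating instead of wrapping.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

struct demo_ref {
	atomic_uint count;
};

static void demo_ref_set(struct demo_ref *r, unsigned int n)
{
	atomic_store(&r->count, n);
}

/* Take a reference unless the object is already dead (count == 0). */
static bool demo_ref_inc_not_zero(struct demo_ref *r)
{
	unsigned int old = atomic_load(&r->count);

	do {
		if (old == 0)
			return false;	/* never resurrect a freed object */
		if (old == UINT_MAX)
			return true;	/* saturated: stay pinned, do not wrap */
	} while (!atomic_compare_exchange_weak(&r->count, &old, old + 1));
	return true;
}

/* Drop a reference; returns true when the caller dropped the last one. */
static bool demo_ref_dec_and_test(struct demo_ref *r)
{
	unsigned int old = atomic_load(&r->count);

	do {
		assert(old != 0);	/* underflow is a caller bug */
		if (old == UINT_MAX)
			return false;	/* saturated counters are never freed */
	} while (!atomic_compare_exchange_weak(&r->count, &old, old - 1));
	return old == 1;
}
```

A plain atomic integer would allow both the increment from zero and the wraparound silently, which is the class of bug this kind of conversion is meant to expose.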

This patch also addresses checkpatch errors.

Test-Parameters: trivial testlist=lnet-selftest
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ifa77b8d9280756ce52c8f59d1d193a866f0ba8a7
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53438
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-16796 ldlm: Change struct ldlm_resource to use refcount_t 16/53416/4
Arshad Hussain [Tue, 12 Dec 2023 07:16:19 +0000 (12:46 +0530)]
LU-16796 ldlm: Change struct ldlm_resource to use refcount_t

This patch changes struct ldlm_resource and
struct nrs_tbf_client to use refcount_t instead of atomic_t

This patch also changes spaces to tabs, but only near the lines of
code being changed.

Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ic15f27bc6281725f00bddc465668f81291aad6ec
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53416
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-17354 osp: don't reset sequence client 06/53406/15
Alex Zhuravlev [Mon, 11 Dec 2023 15:15:40 +0000 (18:15 +0300)]
LU-17354 osp: don't reset sequence client

Do not reset the sequence client if sequence allocation returned an
error; instead try to get a sequence later upon reconnection.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ie23b688e4f93651c4615d77a9686c44a150d3961
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53406
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 months ago LU-17000 contrib: script to prepare coverity builds 00/53400/4
Timothy Day [Sun, 10 Dec 2023 22:58:09 +0000 (22:58 +0000)]
LU-17000 contrib: script to prepare coverity builds

Add script 'coverity-run' to semi-automate running
and submitting Coverity builds for Lustre. This
should make it much easier to reproducibly submit
builds to Coverity - and serve as an example of
how the Coverity build process works. It should
also provide more transparency in how builds are
being prepared for Coverity.

Add a Vagrantfile for the Vagrant VM used during
the build process.

Update in-tree Documentation.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I050b10d9df0e4e4c1b8bcc91a3c296c11f27ffef
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53400
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-17229 tests: rely on IR for replay-dual 33 67/53267/2
Etienne AUJAMES [Tue, 28 Nov 2023 13:26:15 +0000 (14:26 +0100)]
LU-17229 tests: rely on IR for replay-dual 33

test 33 seems to fail with a combined MDT0000 and MGT.

This patch fails over MDT0001 instead of MDT0000 to keep the IR working
on the MGS.

Test-Parameters: testlist=replay-dual env=ONLY="33",ONLY_REPEAT=50
Test-Parameters: testlist=replay-dual
Test-Parameters: testlist=replay-dual
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: Ibf317283b005c103c5f28b7343a808fd25f992a1
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53267
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Deiter
Reviewed-by: Sarah Liu <sarah@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-17308 mgs: move pool_cmd check to the kernel 02/53202/12
Etienne AUJAMES [Tue, 21 Nov 2023 19:01:43 +0000 (20:01 +0100)]
LU-17308 mgs: move pool_cmd check to the kernel

Several checks for pool_cmd need to be done before touching the MGS
configuration.

e.g. the following cases should be denied before adding a destroy
record in the MGS configuration:
 - The pool does not exist
 - The pool is not empty (OSTs still in the pool)

This work is done in userspace (check_pool_cmd) by checking the client
lov parameters for pools. But nothing guarantees those parameters to
be in sync. So, only the MGS configuration should be trusted for that.

This patch moves those checks into the kernel. There are several reasons
for this:
 - It guarantees the pool configurations consistency even if an
   external tool is used.
 - For standalone MGS, it limits the overhead of reading the
   configuration several times.
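The kind of check being moved can be sketched in C as below. This is an illustration under assumptions, not the MGS code: struct demo_pool and demo_pool_destroy_check() are hypothetical, and the specific errno values are a guess at reasonable choices, not the codes returned by the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical view of a pool as recorded in the authoritative config. */
struct demo_pool {
	const char *name;
	int ost_count;	/* OSTs still listed in the pool */
};

/* Validate a destroy request before any destroy record is written.
 * pool is NULL when the lookup in the configuration found nothing. */
static int demo_pool_destroy_check(const struct demo_pool *pool)
{
	if (pool == NULL)
		return -ENOENT;		/* the pool does not exist */

	assert(pool->ost_count >= 0);
	if (pool->ost_count > 0)
		return -ENOTEMPTY;	/* OSTs are still in the pool */

	return 0;
}
```

Because the check runs against the single authoritative configuration, an out-of-date client view cannot cause an invalid record to be appended.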

This patch adds a "-n|--nowait" option for pool_cmd to skip waiting
for pool updates on the clients. This is useful when running many
pool_cmd operations in a row, and it avoids cancelling the clients'
CONFIG lock each time (because of mgc_requeue_timeout_min).

e.g:
  lctl pool_destroy -n lustre.old
  lctl pool_new -n lustre.test
  lctl pool_add -n lustre.test OST0001
  ...
  lctl pool_add lustre.test OST0010

check_pool_cmd_result() is modified to compute the client wait delay
with mgc_requeue_timeout_min.

Add a regression test "ost-pools 2f".

Test-Parameters: testlist=ost-pools
Test-Parameters: testlist=ost-pools
Test-Parameters: testlist=ost-pools env=ONLY=2f,ONLY_REPEAT=50
Test-Parameters: testlist=ost-pools env=ONLY=2f,ONLY_REPEAT=50
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: Ifbc49b5667bf17253716052a7480114936c65149
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53202
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Guillaume Courrier <guillaume.courrier@cea.fr>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-17173 tests: fix security related tests 12/53012/13
Sebastien Buisson [Mon, 13 Nov 2023 10:03:38 +0000 (11:03 +0100)]
LU-17173 tests: fix security related tests

Several cleanups required in security related tests.

In sanity-krb5, in order to get proper access to keyrings, use su -
instead of runas to initialize process more completely.
Also fix use of 'lfs flushctx', as some tests do not call it properly.
And in test_8, avoid waiting arbitrarily and change fail_loc to just
sleep once.

In sanity-krb5 and sanity-sec, fix parameters passed to
start_gss_daemons().

Test-Parameters: trivial
Test-Parameters: kerberos=true testlist=sanity-krb5
Test-Parameters: testgroup=review-dne-selinux-ssk-part-2
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I4598ae5a7d28afbc39d7cc2d0afd1096d877d03b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53012
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 months ago LU-16566 sptlrpc: remove rq_sepol from ptlrpc_request 45/52845/9
Etienne AUJAMES [Thu, 26 Oct 2023 19:28:55 +0000 (21:28 +0200)]
LU-16566 sptlrpc: remove rq_sepol from ptlrpc_request

This patch removes rq_sepol from ptlrpc_request to reduce the memory
consumption on the servers.

The rq_sepol field is 327 bytes long and is allocated for each request,
yet it is rarely used (it needs SELinux activated with the send_sepol
feature).

The patch stores the SELinux policy status string in a separate object.
The pointer is stored in ptlrpc_sec->ps_sepol and protected by RCU
(mostly read-only, the SELinux policy should rarely change).

When the policy status needs to be packed in a request, we take a
reference to the current ps_sepol object and release it after the
packing. If the policy has changed in the meantime, the object used
will be freed afterwards.
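The object lifetime described above can be sketched in user-space C. This is a simplified illustration, not the sptlrpc code: the demo_* names are hypothetical, and a pthread mutex stands in for RCU purely to keep the example self-contained.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical shared policy-status object with a reference count. */
struct demo_sepol {
	int refs;
	char policy[64];
};

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static struct demo_sepol *demo_current;

/* Take a reference on the current object for the duration of packing. */
static struct demo_sepol *demo_sepol_get(void)
{
	struct demo_sepol *p;

	pthread_mutex_lock(&demo_lock);
	p = demo_current;
	if (p != NULL)
		p->refs++;
	pthread_mutex_unlock(&demo_lock);
	return p;
}

/* Drop a reference; the last holder frees the object. */
static void demo_sepol_put(struct demo_sepol *p)
{
	int last;

	if (p == NULL)
		return;
	pthread_mutex_lock(&demo_lock);
	assert(p->refs > 0);
	last = (--p->refs == 0);
	pthread_mutex_unlock(&demo_lock);
	if (last)
		free(p);
}

/* Publish a new policy string; the old object lives until its users finish. */
static int demo_sepol_set(const char *policy)
{
	struct demo_sepol *fresh = calloc(1, sizeof(*fresh));
	struct demo_sepol *old;

	if (fresh == NULL)
		return -1;
	fresh->refs = 1;	/* reference owned by demo_current */
	strncpy(fresh->policy, policy, sizeof(fresh->policy) - 1);

	pthread_mutex_lock(&demo_lock);
	old = demo_current;
	demo_current = fresh;
	pthread_mutex_unlock(&demo_lock);

	demo_sepol_put(old);	/* drop the reference demo_current held */
	return 0;
}
```

A request that took its reference before a policy change keeps reading the old string safely, and the old object is released only when that request calls demo_sepol_put().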

A read operation is added to srpc_sepol parameter to return the
SELinux policy string cached in Lustre.

Test-Parameters: testlist=sanity-selinux env=ONLY=21,ONLY_REPEAT=50
Test-Parameters: testlist=sanity-selinux env=ONLY=21,ONLY_REPEAT=50
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: I80fb76c97885c4b2987eb7f91a9bfe6e0e6e6c70
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52845
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 months ago LU-17173 utils: cleanup lfs flushctx 04/52604/17
Sebastien Buisson [Mon, 13 Nov 2023 10:02:24 +0000 (11:02 +0100)]
LU-17173 utils: cleanup lfs flushctx

When lfs flushctx is called without mount points, build the list of
all mounts first, and then call the ioctl to flush associated
contexts. Otherwise fetching the mount points unfortunately refreshes
the contexts being flushed, because the mount points are being
accessed.

Test-Parameters: trivial
Test-Parameters: kerberos=true testlist=sanity-krb5
Test-Parameters: testgroup=review-dne-selinux-ssk-part-2
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I75b9efe4c65ce66f5f692f9e49a28fde705d0140
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52604
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 months ago LU-17173 gss: user keys go to user keyring 71/52771/14
Sebastien Buisson [Fri, 20 Oct 2023 08:27:14 +0000 (10:27 +0200)]
LU-17173 gss: user keys go to user keyring

Keys for root, that are used for Lustre internal processing, are
stored in the session keyring. That way they can be found by all
Lustre processes in userspace and in the kernel.
For end user keys, it is better to store them in the user keyring.
This simplifies key management, makes them shared across all user
sessions, and avoids an unfortunate key leak if lfs flushctx is not
called at user logout.

Test-Parameters: kerberos=true testlist=sanity-krb5
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ibb3d326e89dcacc89e77eca76cdb773861d3a8a7
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52771
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-17078 ldlm: do not spin up thread for local cancels 92/52192/5
Patrick Farrell [Thu, 31 Aug 2023 00:07:03 +0000 (20:07 -0400)]
LU-17078 ldlm: do not spin up thread for local cancels

When doing lockless IO on the client, the server is
responsible for taking LDLM locks for each IO.

Currently, the server sends these locks to a separate
thread for cancellation.  This behavior is necessary on the
client where a lock may protect a large number of cached
pages, so cancelling it in a user thread may introduce
unacceptable delays.  But the server doesn't have cached
pages, so it makes more sense for the server to do the
cancellation in the same thread.

We do this by not spinning up an ldlm_bl thread for
cancellations of local (server side only) locks.

This improves 4K DIO random read performance by about 9%.

Without patch, maximum server IOPs on 4K reads:
2864k IOPS

With patch:
3118k IOPS

This is the maximum performance achieved with many clients
and client threads doing 4K random AIO reads from different
files.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ia996732780d278c5d0bc290c5484e3bc325a347a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52192
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 months ago LU-17029 lustre.spec.in: match rpm macro openEuler for openEuler Linux 54/51954/4
Xinliang Liu [Mon, 7 Aug 2023 10:18:49 +0000 (10:18 +0000)]
LU-17029 lustre.spec.in: match rpm macro openEuler for openEuler Linux

So that it can handle openEuler-derived OSes, because each derived
OS has a different vendor name; e.g. KylinOS's vendor name
is Kylin.

Change-Id: I12ceda5bf9d1f17a75d4adddbad292fd1ae9967b
Test-Parameters: trivial
Signed-off-by: Xinliang Liu <xinliang.liu@linaro.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51954
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-17016 mdd: no EXDEV for parent dir projid mismatch 68/51868/17
Andreas Dilger [Fri, 4 Aug 2023 05:01:42 +0000 (23:01 -0600)]
LU-17016 mdd: no EXDEV for parent dir projid mismatch

Don't return EXDEV if the parent directory projid of a renamed
directory does not match the projid of the target dir.  Only the
projid of the source directory itself and the target matter.

Rename variables in mdd_rename_sanity_check() and mdd_rename()
so the object and attribute variable names are consistent.

Improve console error messages to contain more useful information.
Replace spaces with tabs in affected functions.

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I7aa53f6d168926719ad9fd5df3c760e6c73ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51868
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
2 months ago LU-17131 ldiskfs: Add Ubuntu 20.04.5 release 5.15 kernel 14/52414/6
Shaun Tancheff [Thu, 18 Jan 2024 04:30:34 +0000 (11:30 +0700)]
LU-17131 ldiskfs: Add Ubuntu 20.04.5 release 5.15 kernel

Add support for Ubuntu 20.04.5 5.15 kernel similar to el9.2
with updated patches:
    ext4-corrupted-inode-block-bitmaps-handling-patches.patch
    ext4-data-in-dirent.patch
    ext4-dont-check-before-replay.patch
    ext4-inode-version.patch
    ext4-mballoc-extra-checks.patch
    ext4-prealloc.patch
    ext4-filename-encode.patch

Tested with tag Ubuntu-hwe-5.15-5.15.0-91.101_20.04.1

Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ic1b4b0f25a9ac984186cf4f37b5a73d93af93ebd
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52414
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-17131 ldiskfs: Refresh ubuntu 5.11 server 13/52413/6
Shaun Tancheff [Sun, 14 Jan 2024 00:50:37 +0000 (17:50 -0700)]
LU-17131 ldiskfs: Refresh ubuntu 5.11 server

Refresh ext4-pdirop and ext4-delayed-iput,
Add
  ext4-filename-encode support
  ext4-add-periodic-superblock-update

Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Icd066a4f507842312924f7c7818208d8f07c8c70
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52413
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 months ago LU-17383 statahead: quit statahead with a long time wait 35/53535/5
Qian Yingjin [Fri, 22 Dec 2023 09:16:07 +0000 (04:16 -0500)]
LU-17383 statahead: quit statahead with a long time wait

If the thread is not doing stat for more than a time threshold
(@sbi->ll_sa_timeout, 30 seconds by default) then it probably does
not care too much about performance, or is no longer using this
directory.
Quit the statahead thread with a long time wait in this case.
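The quit condition amounts to an idle-time comparison, sketched below. The function and macro names are hypothetical and times are plain seconds; the real code keeps the threshold in sbi->ll_sa_timeout and uses kernel time helpers.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_SA_TIMEOUT_DEFAULT 30	/* seconds, the default named above */

/* Return true when no stat has been seen for more than timeout_sec. */
static bool demo_sa_should_quit(long last_stat_sec, long now_sec,
				long timeout_sec)
{
	assert(now_sec >= last_stat_sec);
	return now_sec - last_stat_sec > timeout_sec;
}
```

An idle period of exactly 30 seconds keeps the thread alive in this sketch; whether the real comparison is strict is an assumption taken from the wording "more than a time threshold".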

This patch also fixes defects reported by Coverity Scan for
Lustre.

Fixes: e10bf68d7c3 ("LU-14361 statahead: regularized fname statahead pattern")
Test-Parameters: testlist=parallel-scale-nfsv4
Test-Parameters: testlist=parallel-scale-nfsv4
Test-Parameters: testlist=parallel-scale-nfsv4
Test-Parameters: testlist=parallel-scale-nfsv4
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Ia7c478268fe12eeefa6dfae1b3c94451f010d1d5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53535
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-17426 mdt: relax same MDT file rename lock 26/53726/5
Lai Siyao [Tue, 16 Jan 2024 01:33:22 +0000 (20:33 -0500)]
LU-17426 mdt: relax same MDT file rename lock

Allow cross-directory rename of regular files (strictly, any
non-directory) on the same MDT without holding the BigFilesystemLock
(BFL), as file renames cannot change the directory hierarchy.

This should improve the performance for these rename operations, and
reduce contention between local MDT file renames in different parts of
the directory tree.

Add "mdt.*.enable_parallel_rename_crossdir" parameter to disable
cross-directory file renames if there is an issue with this change.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I511b392e46c46140cac6aa3ede02bfe793729f7f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53726
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 months ago LU-13906 build: Conditionally require kmod-zfs-devel 56/46356/13
Shaun Tancheff [Fri, 6 Oct 2023 08:03:09 +0000 (03:03 -0500)]
LU-13906 build: Conditionally require kmod-zfs-devel

A server with ZFS support requires either the kmod-zfs-devel
package or configure arguments that point to the required
headers and library files.

Here we check the configure arguments for '--with-zfs-obj='.
If the zfs path is specified for configure, the package
requirement is not needed.

Otherwise require the kmod-zfs-devel package and require
one of libzfs-devel, libzfs4-devel or libzfs5-devel.

HPE-bug-id: LUS-9743, LUS-10363
Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ia12239ac7e3912ff50ec7c8e2ceb888862afbc34
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/46356
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
2 months ago LU-17385 tests: sanity-lfsck 23d fix and enable 91/53591/4
Alexander Zarochentsev [Thu, 4 Jan 2024 19:07:18 +0000 (19:07 +0000)]
LU-17385 tests: sanity-lfsck 23d fix and enable

lfsck "-t layout -o" requests that lfsck run on all MDTs, so
the test needs to wait for all of them before the next
test starts.

Test-Parameters: trivial testlist=sanity-lfsck mdscount=2 mdtcount=4
Test-Parameters: trivial testlist=sanity-lfsck mdscount=2 mdtcount=4
Test-Parameters: trivial testlist=sanity-lfsck mdscount=2 mdtcount=4
Test-Parameters: trivial testlist=sanity-lfsck mdscount=2 mdtcount=4
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: Ida0bf876b60a73258a5a9bf392f96383c88adcb9
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53591
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Alex Deiter
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-17444 utils: fix fd leak after conversion to llapi_root_path_open 36/53736/3
Dominique Martinet [Thu, 18 Jan 2024 20:46:10 +0000 (05:46 +0900)]
LU-17444 utils: fix fd leak after conversion to llapi_root_path_open

Conversions to llapi_root_path_open missed a few close() calls, leading
to fd leaks.
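The class of bug can be illustrated with plain POSIX calls. This sketch does not use llapi_root_path_open(); demo_is_dir() is a hypothetical helper that only shows the shape of the fix, where every path out of the function after a successful open() goes through close().

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: 1 if path is a directory, 0 if not, or -errno. */
static int demo_is_dir(const char *path)
{
	struct stat st;
	int fd, rc;

	assert(path != NULL);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;	/* nothing was opened, nothing to close */

	if (fstat(fd, &st) < 0)
		rc = -errno;	/* save errno before close() can change it */
	else
		rc = S_ISDIR(st.st_mode) ? 1 : 0;

	close(fd);	/* single exit path: the descriptor is always released */
	return rc;
}
```

Calling such a helper in a loop is a quick way to notice a leak, since a leaking version eventually fails with EMFILE once the per-process descriptor limit is reached.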

These should be obvious enough to regroup in a single commit.

Fixes: 7154244354e3 ("LU-16786 utils: Replace open call to WANT_FD")
Change-Id: I3af25ef2981367bfaea7f5280972f84bee09a5c2
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53736
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Stephane Thiell <sthiell@stanford.edu>
Reviewed-by: Etienne AUJAMES <eaujames@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 months ago LU-17364 llite: don't use stale page. 50/53550/8
Alexey Lyashkov [Mon, 25 Dec 2023 11:52:35 +0000 (14:52 +0300)]
LU-17364 llite: don't use stale page.

Using a stale page for write might confuse the read path,
which expects any IO page to have the PG_uptodate flag set,
and it caused a panic when removing the page from IO.

Test-Parameters: testlist=sanityn env=SLOW=yes,ONLY=16k,ONLY_REPEAT=10
Test-Parameters: testlist=sanityn env=SLOW=yes,ONLY=16k,ONLY_REPEAT=10
Test-Parameters: testlist=sanityn env=SLOW=yes,ONLY=16k,ONLY_REPEAT=10
Test-Parameters: testlist=sanityn env=SLOW=yes,ONLY=16k,ONLY_REPEAT=10
Test-Parameters: testlist=sanityn env=SLOW=yes,ONLY=16k,ONLY_REPEAT=10
Test-Parameters: testlist=sanityn env=SLOW=yes,ONLY=16k,ONLY_REPEAT=10
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Change-Id: Ia01129ceaecf53d8d9f301c26cd2d65122f6a267
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53550
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 months ago LU-14361 statahead: increase the initial statahead count 34/53634/2
Qian Yingjin [Wed, 10 Jan 2024 09:25:21 +0000 (04:25 -0500)]
LU-14361 statahead: increase the initial statahead count

In this patch, we increase the initial stat-ahead count from the
default 8 to 64 during the fname statahead pattern test
sanity/test_123i. The original starting statahead count is too small
and may cause the statahead thread to quit wrongly, which makes
sanity/test_123i fail fairly often.

We also improve aheadmany and use it to generate the fname stat()
workload to verify that fname statahead pattern works correctly.

Test-Parameters: mdtcount=4 mdscount=2 testlist=sanity env=ONLY=123i,ONLY_REPEAT=100
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I7d13120a9480ea5b2e53963789074429c414ff90
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53634
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-17284 mdt: revalidate object for migration 87/53087/6
Alex Zhuravlev [Sat, 11 Nov 2023 12:21:48 +0000 (15:21 +0300)]
LU-17284 mdt: revalidate object for migration

If the source object is remote, then we should revalidate it
once the object's ldlm lock is granted. Otherwise we can't
use the object's attributes:
lu_object_attr())
ASSERTION( ((o)->lo_header->loh_attr & LOHA_EXISTS) != 0

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I9896cdd011f858091ac68b50b74e2f1f027f7331
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53087
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 months ago LU-16637 llite: tolerate fresh page cache pages after truncate 54/53554/4
Andrew Perepechko [Tue, 26 Dec 2023 17:02:12 +0000 (20:02 +0300)]
LU-16637 llite: tolerate fresh page cache pages after truncate

Truncate called by ll_layout_refresh() can race with a fast read
or tiny write, which can add an uninitialized non-uptodate page
into the page cache.

We want to avoid expensive locking for this rare case so if there
is any leftover in the cache after truncate, just check that
the pages are not uptodate, not dirty and do not have any
filesystem-specific information attached to them.

Change-Id: I8cadc022a3d1822a585f32e1a765e59ad0ff434d
Signed-off-by: Andrew Perepechko <andrew.perepechko@hpe.com>
HPE-bug-id: LUS-11937
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53554
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-13307 nodemap: have nodemap_add_member support large NIDs 35/53135/12
James Simmons [Sun, 7 Jan 2024 15:13:38 +0000 (08:13 -0700)]
LU-13307 nodemap: have nodemap_add_member support large NIDs

Currently, mounting Lustre using an IPv6 address fails with:

Lustre: 27361:0:(nodemap_handler.c:395:nodemap_add_member())
  lustre-MDT0000: error adding to nodemap, no valid NIDs found
LustreError: 11-0: lustre-MDT0000-osp-MDT0003:
  operation mds_connect to node 0@lo failed: rc = -22

This was due to no nodemap being set so the ptlrpc layer was not
seeing any new peers. Adding minimal support to nodemap allows
mounting.

Change-Id: If9cfe88ec92afc3f14788f3f3ded8387a1b5d8c7
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53135
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-16861 obdfilter: Exclude quotes when getting NIDs 20/53620/9
Arshad Hussain [Tue, 9 Jan 2024 06:12:57 +0000 (11:42 +0530)]
LU-16861 obdfilter: Exclude quotes when getting NIDs

In get_targets(), when getting NIDs the quotes were also included.
Exclude quotes when generating NIDs as they are not required.

Use $LCTL instead of $lctl, and make it also work in Janitor testing.

Test-Parameters: trivial testlist=obdfilter-survey
Fixes: 9ef9906d7 ("LU-6863 tests: change obdfilter-survey.sh for CLIENTONLY mode")
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I8642539fc6b396f1339e20e4fef8bc78cda2d969
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53620
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-17054 lnet: use GFP_KERNEL for alloc w/o spinlock 96/53596/5
Andreas Dilger [Thu, 4 Jan 2024 21:32:13 +0000 (14:32 -0700)]
LU-17054 lnet: use GFP_KERNEL for alloc w/o spinlock

Do not use genradix_ptr_alloc(GFP_ATOMIC) when not allocating
under a spinlock in lnet_cpt_of_nid_show_start(), since this
puts unnecessary strain on the atomic memory pools.  This
function grabs mutex_lock(&the_lnet.ln_api_mutex) so the caller
cannot be holding a spinlock at the time.

Fix minor code style issues in this function.

Fixes: 466e25a6a3 ("LU-17054 lnet: Change cpt-of-nid to get result from kernel")
Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I091959940bffadc380bff9329bb83e8b099ed63f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53596
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-17394 libcfs: print cfs_fail_val when fail_loc hit 85/53585/3
Andreas Dilger [Thu, 4 Jan 2024 05:20:35 +0000 (22:20 -0700)]
LU-17394 libcfs: print cfs_fail_val when fail_loc hit

Add some more information to the console message when fail_loc is hit.

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I99fe4524f3764b068c96965c0b86bd4d7b341707
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53585
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
3 months ago LU-17172 lov: include FID in some lov asserts 02/52602/3
John L. Hammond [Thu, 4 Nov 2021 16:12:57 +0000 (11:12 -0500)]
LU-17172 lov: include FID in some lov asserts

Include the file FID in the assertions in lov_entry() and
lov_mirror_entry(). Use these two functions more consistently in the
lov layer.

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I65978fe409842289c158021fb1b8042916d90e23
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52602
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months ago LU-16791 utils: ZFS 2.2 const prop args 19/52519/7
Brian Atkinson [Tue, 26 Sep 2023 18:35:43 +0000 (12:35 -0600)]
LU-16791 utils: ZFS 2.2 const prop args

ZFS 2.2 now expects const char * from certain interfaces in
sys/nvpair.h. I updated the build system to detect whether this is the
case and, if so, update the parameters passed to certain functions in
libmount_utils_zfs.c to account for these changes.

Without this patch, Lustre master would not build with ZFS master and
the 2.2 release candidates.

Signed-off-by: Brian Atkinson <batkinson@lanl.gov>
Change-Id: I0469eeff6dafa6c276fc616381530b6b679d9da1
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52519
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Akash B <akash-b@hpe.com>
Reviewed-by: Thomas Bertschinger <bertschinger@lanl.gov>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
3 months ago LU-17351 ldiskfs: RHEL 9.3 ldiskfs server 94/53394/7
Shaun Tancheff [Thu, 4 Jan 2024 23:34:37 +0000 (15:34 -0800)]
LU-17351 ldiskfs: RHEL 9.3 ldiskfs server

The updated patch series for el9.3 needs an updated
ext4-data-in-dirent.

Test-Parameters: trivial env=SANITY_EXCEPT="906" \
  mdtcount=4 mdscount=2 \
  clientdistro=el9.3 serverdistro=el9.2 testlist=sanity

Test-Parameters: trivial mdtcount=4 mdscount=2 \
  clientdistro=el9.2 serverdistro=el9.3 testlist=sanity

Test-Parameters: optional clientdistro=el9.3 serverdistro=el9.3 \
  testgroup=full-part-1

Test-Parameters: optional clientdistro=el9.3 serverdistro=el9.3 \
  testgroup=full-part-2

Test-Parameters: optional clientdistro=el9.3 serverdistro=el9.3 \
  testgroup=full-part-3

HPE-bug-id: LUS-12050
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Change-Id: Iac9731570422c57ef494602b1a40ac0b3d87d991
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53394
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: xinliang <xinliang.liu@linaro.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>