Whamcloud - gitweb
fs/lustre-release.git
3 weeks ago LU-17653 sec: avoid unlocking unlocked page 34/54734/6
Shaun Tancheff [Thu, 11 Apr 2024 14:03:30 +0000 (22:03 +0800)]
LU-17653 sec: avoid unlocking unlocked page

If a page is unlocked by @cl_2queue_disown after explicitly writing the
newly modified page, the following page_unlock must not be performed.

Track the page locked state and do not unlock pages which
are not locked in ll_io_zero_page().
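
The locked-state tracking described above can be sketched roughly as
follows. This is a minimal userspace illustration, not the code of the
patch: struct demo_page and every demo_* function are hypothetical
stand-ins, and the write path is assumed to drop the page lock itself,
as @cl_2queue_disown is described as doing.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a cached page; not a Lustre or kernel type. */
struct demo_page {
	bool locked;
	int unlock_calls;	/* counts unlocks so a double unlock is visible */
};

static void demo_unlock(struct demo_page *pg)
{
	pg->locked = false;
	pg->unlock_calls++;
}

/* Assumed behaviour: the write path disowns the page and unlocks it. */
static void demo_write_and_disown(struct demo_page *pg)
{
	demo_unlock(pg);
}

/*
 * Lock a page, optionally write it out, and unlock it only if this
 * function still holds the lock. Returns the total number of unlocks.
 */
static int demo_zero_page(struct demo_page *pg, bool dirty)
{
	bool locked;

	pg->locked = true;
	locked = true;

	if (dirty) {
		demo_write_and_disown(pg);
		locked = false;	/* the write path already dropped the lock */
	}

	if (locked)
		demo_unlock(pg);

	return pg->unlock_calls;
}
```

With the local flag, both the dirty and the clean path end with exactly
one unlock per page.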

Fixes: adf46db962 ("LU-12275 sec: support truncate for encrypted file")
Test-Parameters: testlist=sanity-sec clientdistro=el9.3 env=ONLY=48a,ONLY_REPEAT=10
Test-Parameters: testlist=sanity-sec clientdistro=el9.3
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I6e121920c7e86e4d0004def77b0ce066ae2ba81a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54734
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Patrick Farrell <patrick.farrell@oracle.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 weeks ago LU-10003 lnet: migrate fail nid to Netlink 51/55051/5
James Simmons [Thu, 23 May 2024 21:25:30 +0000 (17:25 -0400)]
LU-10003 lnet: migrate fail nid to Netlink

We have the ability to make peers fail when they reach a specific
threshold using an ioctl that currently only uses small NIDs.
Move to Netlink to be able to use large NIDs. Also the Netlink
code is written to support more than one peer at a time even if
the original user land tool only supports setting one peer at a
time.

Test-Parameters: trivial testlist=sanity-lnet
Change-Id: I8e5b38fcb582624530d208fac731183488662138
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55051
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 weeks ago LU-17450 test: disable test 56x 56xa 56xb in sanity 62/54962/3
Hongchao Zhang [Mon, 15 Apr 2024 19:48:38 +0000 (03:48 +0800)]
LU-17450 test: disable test 56x 56xa 56xb in sanity

Add the interop tests 56x, 56xa, 56xb to always_except until
the patch https://review.whamcloud.com/53997 in LU-17525 has landed.

Test-Parameters: trivial testlist=sanity
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: I99fa7be9dc7f50113d463aea4b321502b31d7348
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54962
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 weeks ago LU-16025 llite: allow unaligned DIO reaching EOF 18/54718/10
Bobi Jam [Wed, 10 Apr 2024 09:19:53 +0000 (17:19 +0800)]
LU-16025 llite: allow unaligned DIO reaching EOF

Direct IO requires the file offset and iov_iter count to be page aligned
if the server does not support unaligned DIO.

Old servers do not have OBD_CONNECT2_UNALIGNED_DIO support and are
deemed as not supporting unaligned DIO.

Mirror resync uses direct IO to read data from a mirror. If the file
size is not page aligned, the last read iov_iter is truncated by
commit 4468f6c9d9 and contains an unaligned iov_iter count, so it
fails with old servers.

This patch fixes this interop issue by allowing unaligned DIO
reaching the end of the file.
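
The relaxed alignment rule might look something like the sketch below.
It is only an illustration under stated assumptions: demo_dio_allowed()
and DEMO_PAGE_SIZE are hypothetical names, a 4096-byte page is assumed,
and "reaching EOF" is interpreted here as the IO ending exactly at the
file size. The actual condition used in llite may differ.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096ULL	/* assumed page size for this sketch */

/*
 * Return true if a direct IO request may proceed.
 * server_unaligned_dio models whether the server advertised
 * unaligned DIO support at connect time.
 */
static bool demo_dio_allowed(uint64_t offset, uint64_t count,
			     uint64_t file_size, bool server_unaligned_dio)
{
	if (server_unaligned_dio)
		return true;

	/* The file offset must always be page aligned for old servers. */
	if (offset % DEMO_PAGE_SIZE != 0)
		return false;

	if (count % DEMO_PAGE_SIZE == 0)
		return true;

	/* Assumption: an unaligned count is fine only if the IO ends at EOF. */
	return offset + count == file_size;
}
```

For a 10000-byte file, a final read of 1808 bytes at offset 8192 passes
this check even against a server without unaligned DIO support, while a
100-byte read at offset 0 does not.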

Test-Parameters: testlist=sanity-sec serverversion=EXA6
Fixes: 7194eb6431 ("LU-13805 clio: bounce buffer for unaligned DIO")
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I229e193c3f0df0c21284991809573e312d18a556
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54718
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Patrick Farrell <patrick.farrell@oracle.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks ago LU-17676 build: configure should prefer to ask if 71/54571/2
Shaun Tancheff [Tue, 26 Mar 2024 08:04:24 +0000 (15:04 +0700)]
LU-17676 build: configure should prefer to ask if

In general configure messages should ask 'if <something>'
as configure is asking the question and trying to automatically
determine the answer.

In most cases prefer 'if <something>'.

This updates configure messages to ask 'if ...'.

Test-Parameters: trivial
HPE-bug-id: LUS-12117
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I11a42583faf2f88194c93a9aeea3b64f0d95f0eb
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54571
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks ago LU-17836 build: allow builds without libpthread 62/55062/2
Shaun Tancheff [Thu, 9 May 2024 10:10:54 +0000 (17:10 +0700)]
LU-17836 build: allow builds without libpthread

Configure currently allows for --disable-libpthread. It is not
frequently used but may be needed for some users.

Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I32049bab8e0f278b4c80fe37839c8c90c45d4c74
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55062
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks ago LU-16327 build: Ubuntu jammy 5.19 server support 25/50225/18
Shaun Tancheff [Sat, 2 Mar 2024 13:08:15 +0000 (20:08 +0700)]
LU-16327 build: Ubuntu jammy 5.19 server support

The Ubuntu 5.19 server ldiskfs series is close to that of the
mainline LTS 6.1.38 kernel.

Updated for Jammy 5.19.0-46-generic kernel
   ext4-mballoc-extra-checks.patch
   ext4-prealloc.patch

Tested with Ubuntu-hwe-5.19-5.19.0-46.47_22.04.1
Ubuntu Jammy 5.19.0-46-generic kernel

Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Iff2f3b29a7cf4778abb69505143ca2ea32022edf
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50225
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
4 weeks ago LU-15496 tests: fix sanity/398c to use proper OSC name 32/55132/4
Andreas Dilger [Thu, 16 May 2024 19:57:42 +0000 (21:57 +0200)]
LU-15496 tests: fix sanity/398c to use proper OSC name

For ppc64le and aarch64 clients, the OSC import instance name does
not have "ffff" at the start, so use the proper device name for this
subtest.

Clean up the rest of test_398c to meet modern test code style.

Test-Parameters: trivial testlist=sanity env=ONLY=398c clientarch=ppc64le clientdistro=el8.8
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: If8c72fa9b13eace009f39daf82454221eba6761b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55132
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Alex Deiter
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks ago LU-17844 target: remove some LCONSOLE_ERROR_MSG() 13/55113/2
Timothy Day [Wed, 15 May 2024 03:41:59 +0000 (03:41 +0000)]
LU-17844 target: remove some LCONSOLE_ERROR_MSG()

Replace LCONSOLE_ERROR_MSG() with LCONSOLE_ERROR().

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I8de0221e29c8ec70759eea38a67001f283f6fe39
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55113
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks ago LU-17848 osd: deduplicate osd_fid_init()/fini()/alloc() 10/55110/6
Timothy Day [Tue, 14 May 2024 20:59:16 +0000 (20:59 +0000)]
LU-17848 osd: deduplicate osd_fid_init()/fini()/alloc()

These functions are identical in the two OSD implementations. This
can be moved to lustre/fid/ and made generic.

These functions are forced to live in fid.ko rather than obdclass.ko
due to module dependency issues.

Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: If97ca5615d9bdfe0fe9886686e9ce3ec2b740f7d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55110
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks ago LU-17850 build: prefer LINUXRELEASE over uname -r 08/55108/2
Jian Yu [Tue, 14 May 2024 17:50:20 +0000 (10:50 -0700)]
LU-17850 build: prefer LINUXRELEASE over uname -r

In a container or chroot environment, "uname -r" reports
the host instead of the target kernel version. We should
use the LINUXRELEASE variable which is configured in
config/lustre-build-linux.m4 with the value from UTS_RELEASE.

Change-Id: Iaa48027f5ae873e1298695a264db1c351d9eac5c
Test-Parameters: trivial
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55108
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: ake sandgren <ake.sandgren@hpc2n.umu.se>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks ago LU-17848 osd: purge key_rec() from dt API 99/55099/2
Timothy Day [Tue, 14 May 2024 02:19:28 +0000 (02:19 +0000)]
LU-17848 osd: purge key_rec() from dt API

This is a pointless function pointer field that has
spawned a number of pointless function implementations.
Even the documentation has no idea why this exists.

Remove everything to do with key_rec().

Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I7f84853a3fa285bf2ac53661b30384d099be1b91
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55099
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks ago LU-17844 lnet: remove a few LCONSOLE_ERROR_MSG() in api-ni.c 98/55098/2
Timothy Day [Tue, 14 May 2024 01:23:34 +0000 (01:23 +0000)]
LU-17844 lnet: remove a few LCONSOLE_ERROR_MSG() in api-ni.c

These magic numbers aren't so magical anymore. Just
use LCONSOLE_ERROR().

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I629ef2ceaa51dc1422d87dc056de2c46079438c0
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55098
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks ago LU-17844 lnet: remove a few LCONSOLE_ERROR_MSG() 97/55097/2
Timothy Day [Tue, 14 May 2024 01:12:05 +0000 (01:12 +0000)]
LU-17844 lnet: remove a few LCONSOLE_ERROR_MSG()

These magic numbers aren't so magical anymore. Just
use LCONSOLE_ERROR().

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I6c1d46449487127545d785a9fdc368005197d3e2
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55097
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks ago LU-17847 sec: wake up for rsc entry 94/55094/2
Yang Sheng [Mon, 13 May 2024 14:44:16 +0000 (22:44 +0800)]
LU-17847 sec: wake up for rsc entry

We should wake up the waiter after rsc do_upcall.
Otherwise it may be stuck for a long time.

Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: I87d1e5a9687056c8ee2428aad45dafda16247de2
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55094
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks ago LU-17844 lnds: remove a few LCONSOLE_ERROR_MSG() 85/55085/3
Timothy Day [Mon, 13 May 2024 03:51:16 +0000 (03:51 +0000)]
LU-17844 lnds: remove a few LCONSOLE_ERROR_MSG()

I doubt these magic numbers help anyone.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I7c2505ec0eb7fc6524a13d4bf330a72188a26b4e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55085
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 weeks ago LU-16518 lnet: fix incorrectly initialized variables 84/55084/2
Timothy Day [Mon, 13 May 2024 03:39:08 +0000 (03:39 +0000)]
LU-16518 lnet: fix incorrectly initialized variables

Clang 12 complained about an uninitialized 'off' in
brw_test.c, fixed by removing the dual declaration.

Also, init 'rc' in yaml_import_global_settings().

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I893149110120975c91839e73241b311a53c6e195
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55084
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 weeks ago LU-17490 tests: update .gitignore 82/55082/2
Timothy Day [Mon, 13 May 2024 03:27:11 +0000 (03:27 +0000)]
LU-17490 tests: update .gitignore

Otherwise, we'll see this monitor_lustrefs binary in the
build tree.

Fixes: 7101742 ("LU-17490 tests: verify fanotify works for lustre")
Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I129c12515e607e97ab42917220a439ebb1823e8c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55082
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks ago LU-17646 llapi: lustreapi: add FID in error messages 74/55074/2
Alexandre Ioffe [Sat, 11 May 2024 01:28:05 +0000 (18:28 -0700)]
LU-17646 llapi: lustreapi: add FID in error messages

Use llapi_fd2fid() to print FID in llapi_lease_set() and
llapi_lease_check() error messages.

Test-Parameters: trivial
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Change-Id: Iac97ea721860652e304c674007ac7646d183e2fd
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55074
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks ago LU-17841 kfilnd: Race between hello and tagged RMA 72/55072/2
Chris Horn [Fri, 3 May 2024 19:22:12 +0000 (13:22 -0600)]
LU-17841 kfilnd: Race between hello and tagged RMA

A race exists between processing an incoming hello and initiating the
RMA for bulk operations that can result in RKEY re-use.

Initiator:
Posts tagged receive with RKEY based on peerA::kp_local_session_key X
and tn_mr_key Y
Bulk request (1) sent to target
Some earlier transaction fails:
 - Deletes peerA::kp_local_session_key X
 - Creates peerA::kp_local_session_key Z
 - HELLO request send to peerA

Target:
Processes HELLO request - updates kp_remote_session_key from X to Z.
Handles bulk request (1)
Performs RMA using session key Z and tn_mr_key Y, but completion is
delayed

Initiator:
Bulk request (1) hits timeout
 - Tagged receive canceled, and tn_mr_key Y is released
Posts tagged receive with RKEY based on peerA::kp_local_session_key Z
and tn_mr_key Y
Bulk request (2) sent to target

Target:
RMA for (1) is completed using the RKEY for (2)

The solution is to create a new bulk request message that contains
the session key used to set up the tagged buffer on the initiator.
This is compared against the session key exchanged during hello
handshake prior to initiating the RMA. If there's a mismatch
then the RMA is failed and the transaction is finalized. The session
key stored in the new bulk request is also used to generate the RKEY
rather than using the session key stored in the kfilnd_peer. This is
a protocol change so the KFILND_MSG_VERSION is bumped.
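
A rough sketch of the session key check described above follows. The
struct layouts, the demo_* names, the RKEY bit layout, and the -ESTALE
return value are all assumptions made for illustration. They are not
the kfilnd wire format or its actual error handling.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical bulk request carrying the initiator's session key. */
struct demo_bulk_req {
	uint32_t session_key;	/* key used to set up the tagged buffer */
	uint16_t mr_key;	/* per-transaction memory region key */
};

/* Hypothetical target-side view of the peer after the hello handshake. */
struct demo_peer {
	uint32_t remote_session_key;
};

/* Assumed RKEY layout: session key in the high bits, mr_key in the low. */
static uint64_t demo_rkey(uint32_t session_key, uint16_t mr_key)
{
	return ((uint64_t)session_key << 16) | mr_key;
}

/*
 * Validate the request before starting the RMA. On a mismatch the RMA
 * is failed; on success the RKEY is built from the key in the request
 * rather than from the key stored in the peer.
 */
static int demo_start_rma(const struct demo_peer *peer,
			  const struct demo_bulk_req *req, uint64_t *rkey)
{
	if (req->session_key != peer->remote_session_key)
		return -ESTALE;	/* assumed error code for this sketch */

	*rkey = demo_rkey(req->session_key, req->mr_key);
	return 0;
}
```

In the race above, bulk request (1) would carry session key X while the
target already holds Z, so the check fails instead of completing the
RMA against the buffer posted for request (2).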

During testing it was found that the kfilnd_msg::version was not
being set correctly for immediate and bulk messages. To allow interop,
the kfilnd_msg::version must be set to the version negotiated during
the handshake, which is stored in kfilnd_peer::kp_version. This has
been fixed. This issue only impacts kfilnd peers with message
version > 1, so backwards compatibility between versions 1 and 2 will
work correctly.

The KFILND_TN_DEBUG macro is modified to print additional information
that was useful when debugging this issue.

Lastly, the TN_EVENT_TAG_TX_OK was missing from tn_event_to_str(), so
this is added.

HPE-bug-id: LUS-12317
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I0b52a8367cd45b7587ba9ec3fa5212f548bebb57
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55072
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks ago LU-17840 kfilnd: Race between peer del RKEY reuse 71/55071/2
Chris Horn [Wed, 1 May 2024 16:33:33 +0000 (11:33 -0500)]
LU-17840 kfilnd: Race between peer del RKEY reuse

kfilnd_peer object deletion is a two step process. First a flag
(kfilnd_peer::kp_remove_peer = 1) is atomically set in the object to
mark it for removal via a call to kfilnd_peer_del(). Then, the next
caller of kfilnd_peer_put() will atomically modify this flag
(kfilnd_peer::kp_remove_peer = 2) again to denote that it is removing
the peer from the rhashtable before actually removing the object.

The window between marking a peer for deletion and removing it from
the peer cache allows a race where an RKEY may be re-used. For
example:

Thread 1: Posts tagged receive with RKEY based on
      peerA::kp_local_session_key X and tn_mr_key Y
Thread 1: Cancels tagged receive
Thread 1: kfilnd_peer_del() -> peerA::kp_remove_peer = 1
Thread 2: kfilnd_peer_put() -> peerA::kp_remove_peer = 2
Thread 1: kfilnd_peer_put() -> kfilnd_tn_finalize() -> releases
tn_mr_key Y
Thread 3: allocates tn_mr_key Y
Thread 3: Fetches peerA with kp_local_session_key X
Thread 2: Removes peerA from rhashtable

At this point, thread 3 has the same RKEY used by thread 1.

The fix is to check on the peer lookup path whether a peer found in
the rhashtable has been marked for removal. If it has then we perform
the lookup again. We do this in a loop until either no peer is found,
or a peer is found that has not been marked for removal.

To reduce the size of this window, the process for kfilnd_peer
deletion is modified so that the first thread to call
kfilnd_peer_del() will also remove the peer from the rhashtable.

HPE-bug-id: LUS-12312
Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ibbbb38cd5ee2d90956791f8350dafbee5fe5d888
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55071
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks ago LU-17839 kfilnd: Wait for hello response to mark peer uptodate 70/55070/2
Chris Horn [Wed, 15 Nov 2023 19:22:24 +0000 (12:22 -0700)]
LU-17839 kfilnd: Wait for hello response to mark peer uptodate

We need to ensure that a target peer has processed a hello request
from the sender before initiating network transactions. This can be
positively affirmed if and only if we receive a hello response message from
the target.

There are two issues where messages may be dropped because hello
request or response has not been processed.

Issue 1 - Race:
A@kfi -> HELLO REQ -> B@kfi
A@kfi <- HELLO REQ <- B@kfi
A@kfi processes HELLO REQ, marks B@kfi uptodate
A@kfi -> MSG -> B@kfi
A@kfi -> HELLO RSP -> B@kfi

MSG is dropped by B@kfi because it did not process A@kfi's HELLO REQ
or RSP.

Issue 2 - HELLO target already considers originator as uptodate
A@kfi -> HELLO REQ -> B@kfi
B@kfi processes HELLO REQ
A@kfi <- MSG <- B@kfi
A@kfi <- HELLO RSP <- B@kfi

MSG is dropped by A@kfi because it did not process B@kfi's HELLO RSP.

We resolve the first race by waiting for the hello responses to
be processed before marking the peer as uptodate. To ensure that
we will always receive a hello response, the target of a hello request
must initiate its own handshake with the originator. When we receive
a hello request from a new peer then instead of setting the peer state
to KP_STATE_UPTODATE we instead set it to KP_STATE_WAIT_RSP. We can
process RX events for peer in this state, but sends to this peer will
be throttled until we receive a hello response from it.

To resolve the second race we need an additional change to allow
TN_EVENT_RX_OK events to be replayed until the hello response is
received and processed. However, this could result in state changes
that invalidate RX_OK events on replay. Thus, this race will remain
open.

Add CFS_KFI_REPLAY_RX_HELLO_REQ fail_loc to delay the processing of
an incoming hello request.

Add CFS_KFI_FAIL_MSG_TYPE_EAGAIN to delay the sending of specified
message types.

HPE-bug-id: LUS-11673
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Iaaa6b4a533dbcf13cd2a8c1365a89ba521d70af0
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55070
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks ago LU-17838 kfilnd: Prevent simultaneous hellos 69/55069/2
Chris Horn [Tue, 14 Nov 2023 16:35:15 +0000 (09:35 -0700)]
LU-17838 kfilnd: Prevent simultaneous hellos

There is a race condition with checking, setting and clearing the
kp_hello_pending flag that can result in multiple hello requests being
sent for the same peer. If no hello response is received after the
LND timeout then multiple threads can race with each other in
clearing the kp_hello_pending flag and posting a new hello request
message.

Thread 1: sets kp_hello_pending and posts hello request message
<No hello response received after LND timeout>
Thread 2: Clears kp_hello_pending, then sets kp_hello_sending
Thread 3: Clears kp_hello_pending, then sets kp_hello_sending
Thread 2/3: Both post hello request message

To resolve this issue we change kp_hello_pending from a simple binary
to instead track three states of a hello request: KP_HELLO_NONE,
KP_HELLO_INIT, and KP_HELLO_SENT. State is NONE when there is no
hello in the process of being sent. State is INIT when a thread is
allocating a HELLO request in preparation for sending. State is SENT
when the HELLO request is being posted. Now, when several threads
detect that no hello response has been received after LND timeout
seconds, only one of them will be able to transition the hello state
from SENT -> NONE.
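
The three-state tracking can be illustrated with an atomic
compare-and-exchange, as sketched below. This is a userspace
approximation using C11 atomics. The DEMO_HELLO_* values and demo_*
functions are hypothetical, and the kernel code presumably uses its own
atomic primitives rather than <stdatomic.h>.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical mirror of KP_HELLO_NONE / KP_HELLO_INIT / KP_HELLO_SENT. */
enum demo_hello_state {
	DEMO_HELLO_NONE,
	DEMO_HELLO_INIT,
	DEMO_HELLO_SENT,
};

struct demo_peer {
	_Atomic int hello_state;
};

/* NONE -> INIT: only one thread may start building a hello request. */
static bool demo_hello_claim(struct demo_peer *p)
{
	int expected = DEMO_HELLO_NONE;

	return atomic_compare_exchange_strong(&p->hello_state, &expected,
					      DEMO_HELLO_INIT);
}

/* INIT -> SENT: called by the claiming thread once the hello is posted. */
static void demo_hello_posted(struct demo_peer *p)
{
	atomic_store(&p->hello_state, DEMO_HELLO_SENT);
}

/* SENT -> NONE on timeout: exactly one of the racing threads succeeds. */
static bool demo_hello_timeout(struct demo_peer *p)
{
	int expected = DEMO_HELLO_SENT;

	return atomic_compare_exchange_strong(&p->hello_state, &expected,
					      DEMO_HELLO_NONE);
}
```

Because each transition names the state it expects, a second thread that
observes the same timeout finds the state already changed and backs off
instead of posting another hello.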

Add CFS_KFI_REPLAY_IDLE_EVENT fail_loc that can be used to delay
processing of TNs in the idle state depending on the TN event
value specified in fail_val.

HPE-bug-id: LUS-11974
Test-Parameters: trivial
Fixes: 11a32d886b ("LU-16213 kfilnd: Allow one HELLO in-flight per peer")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I4dddf57971848a80a550df7523d55ad03f4a083e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55069
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
Reviewed-by: Caleb Carlson <caleb.carlson@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks ago LU-17837 kfilnd: Set dev_cpt 68/55068/2
Ron Gredvig [Fri, 20 Oct 2023 19:46:48 +0000 (19:46 +0000)]
LU-17837 kfilnd: Set dev_cpt

The dev_cpt value was not being set by kfilnd.

Query the kfabric provider to get the low level
device. Using the device, determine the dev_cpt.

This change is backwards compatible with older
versions of the kfabric provider. If the query
is not supported the dev_cpt is set to
CFS_CPT_ANY.

HPE-bug-id: LUS-11352
Test-Parameters: trivial
Signed-off-by: Ron Gredvig <ron.gredvig@hpe.com>
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Id8af36b7aa5e89969de93dc8db9c0bba03236140
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55068
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Caleb Carlson <caleb.carlson@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 weeks ago LU-15988 osp: don't print nid on -ESTALE 49/55049/2
Lai Siyao [Fri, 3 May 2024 00:27:04 +0000 (20:27 -0400)]
LU-15988 osp: don't print nid on -ESTALE

osp_send_update_req() should not access the import upon -ESTALE, because
this MDT may be in the middle of umount.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ibd869e4e8da4f90ffd608a36d866264d5d552d0e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55049
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks ago LU-17000 obdclass: Add NULL check for parms under class_exp2cliimp 30/55030/3
Arshad Hussain [Tue, 7 May 2024 05:29:03 +0000 (01:29 -0400)]
LU-17000 obdclass: Add NULL check for parms under class_exp2cliimp

This patch adds a NULL pointer check for the parameters
passed to class_exp2cliimp().

Test-Parameters: trivial
CoverityID: 424699 ("Dereference before null check")
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ie7d96c10086959a3f31b290d56621261da480a36
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55030
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 weeks ago LU-17817 llapi: avoid potential NULL component 28/55028/4
Rajeev Mishra [Mon, 6 May 2024 20:12:54 +0000 (20:12 +0000)]
LU-17817 llapi: avoid potential NULL component

Avoid a potential NULL dereference of the component in
llapi_layout_file_open() and llapi_layout_file_comp_add().

CoverityID: 425352 ("Dereferencing 'comp', which is known to be NULL")
HPE-bug-id: LUS-12326
Signed-off-by: Rajeev Mishra <rajeevm@hpe.com>
Change-Id: Id773fdbf031a2d11256140590f570f90da46ec3a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55028
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 weeks ago LU-17816 llapi: ensure pool name is nul terminated 18/55018/2
Shaun Tancheff [Mon, 6 May 2024 09:26:22 +0000 (16:26 +0700)]
LU-17816 llapi: ensure pool name is nul terminated

strncpy() usage is inconsistent about the size of the pool name
and sometimes forgets to ensure a nul byte is placed at the
end of the copy.
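
For reference, a bounded copy that always terminates the destination
could look like the sketch below. DEMO_POOL_NAME_MAX and
demo_copy_pool_name() are hypothetical names chosen for this example,
and the limit of 15 characters is an assumption rather than a value
taken from the patch.

```c
#include <string.h>

#define DEMO_POOL_NAME_MAX 15	/* assumed limit, excluding the nul byte */

/*
 * Copy at most DEMO_POOL_NAME_MAX characters and always terminate.
 * strncpy() alone leaves dst unterminated when src is too long.
 */
static void demo_copy_pool_name(char dst[DEMO_POOL_NAME_MAX + 1],
				const char *src)
{
	strncpy(dst, src, DEMO_POOL_NAME_MAX);
	dst[DEMO_POOL_NAME_MAX] = '\0';
}
```

Sizing the buffer as the limit plus one and writing the terminator
unconditionally keeps the size convention the same at every call site.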

CoverityID: 397181 ("Buffer not null terminated (BUFFER_SIZE)")

Also clean up a case of checking that an unsigned value is >= 0.

CoverityID: 397820 ("Unsigned compared against 0 (NO_EFFECT)")

Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Idec7adaf89c9dabc0275687c4a069fc8fa63e7a7
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55018
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks ago LU-17504 libcfs: safer LIBCFS_ALLOC 15/55015/2
Shaun Tancheff [Mon, 6 May 2024 05:11:15 +0000 (12:11 +0700)]
LU-17504 libcfs: safer LIBCFS_ALLOC

Make the LIBCFS_ALLOC() family of macros safer by adding
parentheses around arguments such as (size) to avoid unintended
expansion.
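
The effect of the missing parentheses can be shown with a pair of toy
macros. These are hypothetical examples written for this note, not the
LIBCFS_ALLOC() definitions themselves.

```c
#include <stddef.h>

/* Unsafe: the argument is spliced in without parentheses. */
#define DEMO_BYTES_UNSAFE(count) (count * sizeof(int))

/* Safe: the argument is evaluated as a unit before the multiply. */
#define DEMO_BYTES_SAFE(count) ((count) * sizeof(int))

/* Expands to (a + b * sizeof(int)), which is not what the caller meant. */
static size_t demo_size_unsafe(size_t a, size_t b)
{
	return DEMO_BYTES_UNSAFE(a + b);
}

/* Expands to ((a + b) * sizeof(int)), the intended size. */
static size_t demo_size_safe(size_t a, size_t b)
{
	return DEMO_BYTES_SAFE(a + b);
}
```

Passing an expression such as a + b exposes the difference: operator
precedence silently changes the unsafe result, which for an allocation
size means a buffer smaller than the caller expects.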

CoverityID: 415056 ("Integer handling issues")

Fixes: 718e3f3e68 ("LU-17504 build: fix gcc-13 [-Werror=stringop-overread] error")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I9701f87025bc5ce038a6bf34413b64a3f019d998
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55015
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks ago LU-17815 tests: skip conf-sanity.sh test_5h 12/55012/4
Emoly Liu [Mon, 6 May 2024 03:15:37 +0000 (11:15 +0800)]
LU-17815 tests: skip conf-sanity.sh test_5h

Skip conf-sanity.sh test_5h because it always caused test_102 and
test_108 failures in recent interop testing.

Test-Parameters: trivial serverbuildno=170 serverjob=lustre-b2_12 serverdistro=el7.9 testlist=conf-sanity env=ONLY="5h 102 108",HONOR_EXCEPT=y
Test-Parameters: trivial testlist=conf-sanity

Fixes: d1b5146eda ("LU-12206 mdt: mdt_init0 failure handling")

Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Change-Id: Id6ffe8b5d88e1d79883cbf2d84d73796945fc734
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55012
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks ago LU-17791 build: use external o2ib path for ko2iblnd.ko 84/54984/2
Shaun Tancheff [Thu, 2 May 2024 09:20:49 +0000 (16:20 +0700)]
LU-17791 build: use external o2ib path for ko2iblnd.ko

The O2IBPATH variable was split into INT_O2IBPATH used
for in-kernel o2iblnd and EXT_O2IBPATH for the external
o2iblnd driver.

Correct a case where the transition from @O2IBPATH@ to
@EXT_O2IBPATH@ was missed when deb packaging support for multiple
LNDs was initially added.

Fixes: 95287378fab ("LU-16967 build: Separate lnet LND deb packaging")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I94ff393a437c6875cda9db266ab636fd88871188
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54984
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Shuichi Ihara <sihara@ddn.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Shuichi Ihara <sihara@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks agoLU-17756 lod: add tunable lod.*.max_stripes_per_mdt 45/54945/4
Lai Siyao [Thu, 25 Apr 2024 08:15:49 +0000 (04:15 -0400)]
LU-17756 lod: add tunable lod.*.max_stripes_per_mdt

Add a tunable lod.*.max_stripes_per_mdt for directory overstriping.
The default value is LMV_MAX_STRIPES_PER_MDT(5).

Add sanity 300uh 300ui.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Id8199f01f5e2d62ead6bf43d239eee8ec1e4cbb5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54945
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 weeks agoLU-17431 utils: adapt dynamic use in nodemap_cmd 00/55000/2
Sebastien Buisson [Tue, 30 Apr 2024 16:08:22 +0000 (18:08 +0200)]
LU-17431 utils: adapt dynamic use in nodemap_cmd

In nodemap_cmd(), try to detect if we are running on an MGS
before using the dynamic parameter.

Test-Parameters: trivial
Fixes: fecc3bd4e2 ("LU-17431 utils: add 'dynamic' parameter to nodemap_cmd")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I63a727491c839e457e44eaf1f4b4d11b164fd8b4
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55000
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks agoLU-17431 utils: fix various ret codes in lctl 01/54501/6
Sebastien Buisson [Wed, 13 Mar 2024 13:19:25 +0000 (14:19 +0100)]
LU-17431 utils: fix various ret codes in lctl

When nodemap_cmd() returns an error, use errno to print the
correct return code.
Make get_mgs_device() return an errno in case of failure.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I74f6e27fc17158bf454f0d8be490a087aa137079
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54501
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
4 weeks agoLU-17431 nodemap: sanity check ioctl user buffer 28/54928/4
Sebastien Buisson [Fri, 26 Apr 2024 14:49:17 +0000 (16:49 +0200)]
LU-17431 nodemap: sanity check ioctl user buffer

In server_iocontrol_nodemap(), user data is copied into a struct
lustre_cfg. Then this data must be sanity checked, by calling
lustre_cfg_sanity_check().

CoverityID: 425252 ("Passing tainted expression lcfg->lcfg_buflens to lustre_cfg_string")
CoverityID: 397130 ("Passing tainted expression lcfg->lcfg_buflens")
Fixes: 72734cf178 ("LU-17431 ptlrpc: move nodemap related ioctls to ptlrpc")

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I268b53fc0e977716ffd1985d145dc27b6acccf94
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54928
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks agoLU-17649 ptlrpc: fix -EACCES connection error handling 48/54448/13
Mikhail Pershin [Mon, 18 Mar 2024 15:37:02 +0000 (18:37 +0300)]
LU-17649 ptlrpc: fix -EACCES connection error handling

Connection errors -EACCES and -EROFS leave the import in an
intermediate state. It is still active, as is the pinger
over it, but has obd_no_recov set. That allows the import to
recover after all if server security is updated. But even
in FULL state any RPC over the import gets -ESHUTDOWN as
obd_no_recov is set.

Meanwhile, obd_no_recov is not supposed to be used in that
way: it reflects a particular mount option and should not
be recovered ever. So the patch sets the import to the
deactivated state instead, making the import non-operational
too, but with the option to be activated manually or remounted.

Server connections like LWP, MDT-OST and MDT-MDT are
excluded and are never deactivated. Such errors are
considered temporary until the remote target updates its own
security as required or administrative intervention
restarts the target as needed.

In both cases a console message is issued.

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ib83e1b0ac541823ec236591f08145340d6f6bf04
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54448
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 weeks agoLU-16314 tests: enable debug_raw_pointers on mount 54/54254/8
Shaun Tancheff [Wed, 17 Apr 2024 08:44:12 +0000 (15:44 +0700)]
LU-16314 tests: enable debug_raw_pointers on mount

When the MGS is mounted:
  do_facet mgs "$LCTL set_param -P debug_raw_pointers=Y"

So debug_raw_pointers need only be set once instead of being
enabled and disabled for each test.

Switching kptr_restrict for every node on every test (twice)
does not add value when testing on dedicated test VMs.

This adds a KPTR_ON_MOUNT to allow a less restrictive setting
during test-framework setupall()/cleanall().

The initial kptr restrict values are persisted to and restored
from a well-known temporary file $TMP/kptr-$PPID-env

The patch enables KPTR_ON_MOUNT by default.

HPE-bug-id: LUS-10945
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I4d8975f26e57ea064608663f309400d09406d500
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54254
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
4 weeks agoLU-17342 o2ib: build without Module.symvers 58/53358/3
Timothy Day [Thu, 7 Dec 2023 05:12:57 +0000 (05:12 +0000)]
LU-17342 o2ib: build without Module.symvers

When building against an external kernel tree, the
configure script fails if there isn't a Module.symvers
available. This prevents us from using the
'modules_prepare' make target on the kernel tree.
ko2iblnd.ko can be built even without Module.symvers.
Hence, downgrade this message from an error to a
warning.

Also, don't fail if ko2iblnd can't be built. Just
emit a warning.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I8bca7f945c753fdac3aa5d9889d3347613baf059
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53358
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks agoLU-16819 build: use mofed path based on target kernel 37/50937/9
Ake Sandgren [Thu, 11 May 2023 06:48:32 +0000 (08:48 +0200)]
LU-16819 build: use mofed path based on target kernel

Instead of using "uname -r", which limits builds to the currently
running kernel, use the target kernel which is available in
LINUXRELEASE, if the directory is available.
Building for a specific kernel is common practice when using DKMS.
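
The selection rule described above can be sketched as follows. This is an
illustrative Python model, not the configure logic itself: the function
name and the base_dir layout (one subdirectory per kernel release) are
assumptions made for the example.

```python
import os
import platform


def pick_kernel_release(linuxrelease: str, base_dir: str) -> str:
    """Prefer the target kernel release when its directory exists.

    base_dir is a hypothetical parent directory that holds one
    subdirectory per kernel release; the real layout may differ.
    """
    if linuxrelease and os.path.isdir(os.path.join(base_dir, linuxrelease)):
        return linuxrelease
    # Fall back to the running kernel, the equivalent of "uname -r".
    return platform.release()
```

With DKMS-style builds, linuxrelease would name the kernel being built
for, which need not be the one currently running.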

Test-Parameters: trivial
Signed-off-by: Ake Sandgren <ake.sandgren@hpc2n.umu.se>
Change-Id: Ifce912061a74fc5b7435cd940105190f0c3cd544
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50937
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 weeks agoLU-16350 ldiskfs: Server support for LTS linux v6.1 60/52260/13
Shaun Tancheff [Thu, 25 Apr 2024 15:38:20 +0000 (22:38 +0700)]
LU-16350 ldiskfs: Server support for LTS linux v6.1

Keep LTS kernel support and very recent kernel
ldiskfs series. Squash older series and drop
any unused patches.

Dropping 5.8 and 5.9 non LTS kernel series
Adding patches with kernel version that originated
the change
   linux-5.18/ext4-lookup-dotdot.patch
   linux-6.0/ext4-data-in-dirent.patch
   linux-6.0/ext4-pdirop.patch
   linux-6.1/ext4-dont-check-before-replay.patch
   linux-6.1/ext4-mballoc-extra-checks.patch
   linux-6.1/ext4-prealloc.patch
refresh linux-5.16/ext4-misc.patch to use strscpy instead of strlcpy

Test-Parameters: trivial
HPE-bug-id: LUS-11376
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Id747e200f5d3f50475094ee5ad948c389cce3184
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52260
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
4 weeks agoLU-11085 ldlm: convert ldlm extent locks to linux extent-tree 92/41792/17
Mr NeilBrown [Fri, 21 Aug 2020 00:28:53 +0000 (10:28 +1000)]
LU-11085 ldlm: convert ldlm extent locks to linux extent-tree

As Linux has a fully customizable extent tree implementation, use that
instead of the one in lustre.  This removes the need to store the
extent endpoints in the lock twice, thus recovering some of the space
wasted in a previous patch.

It also allows iteration loops to be in-line rather than requiring a
callback - though in some cases we keep the callback.

Note that interval_expand() will not expand the lower boundary down if
the tree is not empty.  We now make that explicit in the loop in
ldlm_extent_internal_policy_granted().  Consequently testing of
'conflicting > 4' is irrelevant.

Linux extent-trees do not have a direct equivalent to
interval_is_overlapped(); however, we can use extent_iter_first() to
achieve the same effect.

We ask for the first interval in the tree that covers the range of the
given interval with extent_iter_first().  If nothing is returned, then
nothing in the tree overlaps the interval and interval_is_overlapped()
would return false.
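
The "first match or nothing" idea can be sketched in Python. This is only
a model of the logic: a plain list stands in for the tree, endpoints are
assumed to be inclusive, and first_overlapping() and is_overlapped() are
hypothetical names, not the kernel or Lustre functions.

```python
from typing import List, Optional, Tuple

Extent = Tuple[int, int]  # (start, last), both endpoints inclusive


def first_overlapping(extents: List[Extent], start: int,
                      last: int) -> Optional[Extent]:
    """Return the lowest extent that intersects [start, last], or None."""
    for ext in sorted(extents):
        # Two inclusive ranges intersect when each one starts
        # no later than the other one ends.
        if ext[0] <= last and start <= ext[1]:
            return ext
    return None


def is_overlapped(extents: List[Extent], start: int, last: int) -> bool:
    """An empty result from the first-match query means no overlap."""
    return first_overlapping(extents, start, last) is not None
```

A real interval tree answers the same query without scanning every
extent; the linear scan here is only for clarity.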

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: Ie28c6fb0d40d2c92c7067c7a79f48ee1fc633ce9
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/41792
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
4 weeks agoLU-11085 ldlm: move interval_insert call from ldlm_lock to ldlm_extent 21/34021/18
NeilBrown [Fri, 9 Aug 2019 17:10:03 +0000 (13:10 -0400)]
LU-11085 ldlm: move interval_insert call from ldlm_lock to ldlm_extent

Moving this call results in all interval-tree handling code
being in the one file. This will simplify conversion to
use Linux interval trees.

The addition of 'struct cb' is a little ugly, but will be gone
in a subsequent patch.

Change-Id: I7b392cc57b69969f4bb3c4b51fa406ed643a37b3
Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/34021
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
4 weeks agoLU-17865 osc: fiemap deadlock fix 63/55163/3
Alexander Zarochentsev [Mon, 20 May 2024 18:33:18 +0000 (18:33 +0000)]
LU-17865 osc: fiemap deadlock fix

A fiemap call may deadlock due to wrongly requesting an ldlm lock at
server while the same lock is cached and pinned at the client. Two PR
lock requests are compatible so the deadlock also needs a concurrent
write lock.

ll_fiemap_info_key is shared between osc_object_fiemap()
calls; once the OBD_FL_SRVLOCK flag is set, it is reused for
all subsequent RPCs regardless of the local lock caching status.

HPE-bug-id: LUS-12353
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: I6e76bc5e4549ed887b8f6177432acf90f9ec614d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55163
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks agoLU-6142 socklnd: SPDX for sockets LND 14/55114/2
Timothy Day [Wed, 15 May 2024 03:51:51 +0000 (03:51 +0000)]
LU-6142 socklnd: SPDX for sockets LND

Convert from verbose license text to SPDX.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Ifb655ba3ad59fb467e288916e4229968450e9788
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55114
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks agoLU-17851 ldiskfs: restart long fallocate tx 11/55111/3
Alexander Zarochentsev [Mon, 29 Apr 2024 17:37:34 +0000 (17:37 +0000)]
LU-17851 ldiskfs: restart long fallocate tx

__ext4_journal_ensure_credits() may allow a long fs operation
like fallocate to run for too long if the initial credits
estimation is high enough.
The fix is to force a tx restart if the tx state is not T_RUNNING.

HPE-bug-id: LUS-12311
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: Ib03d78739997caa6d13690b41ef7d01609a3623b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55111
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 weeks agoLU-16938 utils: setstripe overstripe multiple OST count 92/54192/13
Rajeev Mishra [Fri, 9 Feb 2024 16:49:45 +0000 (16:49 +0000)]
LU-16938 utils: setstripe overstripe multiple OST count

Add an option to "lfs setstripe -C" to specify stripe counts
that are a multiple of the number of OSTs in the filesystem.
Using "-C -1" will create one stripe on all (available) OSTs,
as with "-c -1", to avoid too many stripes.  Using "-C -2"
will create two stripes on each OST, etc.

The maximum multiplier is currently "-C -32", which will
create 32 stripes per OST. It is still possible to specify
a large positive stripe count directly to  "-C" for testing
purposes and to maintain compatibility with current usage.
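
The mapping from a "-C" argument to a stripe count can be sketched as
follows. This is an illustrative Python model, not the lfs code: the
function name and the error handling are assumptions; only the multiplier
semantics and the -32 limit come from the description above.

```python
MAX_OVERSTRIPE_MULTIPLIER = 32  # "-C -32" is the current maximum


def overstripe_count(c_arg: int, ost_count: int) -> int:
    """Stripe count implied by 'lfs setstripe -C <c_arg>' (hypothetical helper)."""
    if ost_count <= 0:
        raise ValueError("ost_count must be positive")
    if c_arg > 0:
        # A positive value is still taken as a literal stripe count.
        return c_arg
    if c_arg == 0 or -c_arg > MAX_OVERSTRIPE_MULTIPLIER:
        # Zero and multipliers past the limit are outside this sketch.
        raise ValueError("unsupported -C value: %d" % c_arg)
    # A negative value is a per-OST multiplier.
    return -c_arg * ost_count
```

For a filesystem with 8 available OSTs, "-C -2" would correspond to
16 stripes under this model.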

HPE-bug-id: LUS-11793
Signed-off-by: Rajeev Mishra <rajeevm@hpe.com>
Change-Id: Ib0462d7a9b71853419ea7c30741bb35d576f0d71
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54192
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <patrick.farrell@oracle.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks agoLU-13802 llite: add hybrid IO switch proc stats 96/52596/28
Patrick Farrell [Wed, 13 Mar 2024 14:50:40 +0000 (10:50 -0400)]
LU-13802 llite: add hybrid IO switch proc stats

Hybrid IO switching proc stats are useful for telling us if
and why we switched to DIO.  They're also helpful for
writing tests.

Test-Parameters: trivial
Signed-off-by: Patrick Farrell <patrick.farrell@oracle.com>
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I68649474cf11ffc445574fcca105a81fd6ecd458
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52596
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks agoLU-13802 llite: add read & write switch thresholds 95/52595/35
Patrick Farrell [Mon, 1 Apr 2024 15:30:29 +0000 (11:30 -0400)]
LU-13802 llite: add read & write switch thresholds

The main criterion for switching from buffered IO to
hybrid is IO size.  This adds that switching.  The correct
size for cutover is not the same for read and write, so we
have separate checks for read and write.
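
A size-based cutover with separate read and write limits might look like
the sketch below. It is a Python model under stated assumptions: the
class and function names, the ">=" comparison and any threshold values
are hypothetical, not the llite implementation or its tunables.

```python
from dataclasses import dataclass


@dataclass
class HybridIOThresholds:
    read_bytes: int        # hypothetical cutover size for reads
    write_bytes: int       # hypothetical cutover size for writes
    enabled: bool = False  # modelled as off unless explicitly enabled


def use_direct_io(is_write: bool, io_size: int,
                  th: HybridIOThresholds) -> bool:
    """Decide whether a buffered IO of io_size bytes should switch to DIO."""
    if not th.enabled:
        return False
    limit = th.write_bytes if is_write else th.read_bytes
    return io_size >= limit
```

Keeping the two limits separate lets a 4 MiB write switch while a 4 MiB
read stays buffered, if the read limit is set higher.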

These checks are elaborated on in further patches, adding
different thresholds based on the backing storage type.

Adding the switching thresholds is what really enables
hybrid IO, so we have to adjust a number of tests which
assume buffered IO.

There are a few obscure hang bugs which have been difficult
to track down, and we are past feature freeze, so this patch
now leaves hybrid IO disabled by default.

Signed-off-by: Patrick Farrell <patrick.farrell@oracle.com>
Change-Id: I491cd7b2bdafe8bb2c1a4d692442a62154324bec
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52595
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks agoLU-17525 tests: fix sanity hash 2.15 interop 76/55076/5
Andreas Dilger [Sat, 11 May 2024 05:38:29 +0000 (23:38 -0600)]
LU-17525 tests: fix sanity hash 2.15 interop

Fix test version checks for interop testing for DNE directory hash
usage in sanity with 2.15 servers.  The checks incorrectly assumed
that the CRUSH2 dir hash was included in the 2.15.0 release, but it
was not backported to that branch, and only landed in 2.15.51.

Exclude UDIO interop failures, which are fixed via LU-17525.

Fixes: 1ac4b9598a ("LU-15720 dne: add crush2 hash type")
Test-Parameters: trivial testlist=sanity serverversion=2.15.4 serverdistro=el8.9 env=SANITY_EXCEPT="56 119 398"
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: If2097ebc30c7c4dbce88af7774ce3c0e8fb3cb75
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55076
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Patrick Farrell <patrick.farrell@oracle.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
4 weeks agoLU-17783 statahead: disable batch statahead for old server 17/55017/4
Qian Yingjin [Mon, 6 May 2024 08:16:19 +0000 (04:16 -0400)]
LU-17783 statahead: disable batch statahead for old server

Disable the batch statahead for the old server that does not
support MDS_BATCH batch RPC.

Fixes: 4435d0121f ("LU-14139 statahead: batched statahead processing")
Test-Parameters: testlist=sanity serverjob=lustre-b_es6_0 serverbuildno=638 clientdistro=el9.3 serverdistro=el8.8 env=ONLY=123
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I79fba4204e0ed44e2bc9a4c4f2758d087f0e406b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55017
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
4 weeks agoLU-17867 ko2iblnd: gcc bug work around 72/55172/4
James Simmons [Wed, 22 May 2024 14:53:24 +0000 (10:53 -0400)]
LU-17867 ko2iblnd: gcc bug work around

Gcc 11 reports
 error: array subscript 'struct sockaddr_in6[0]' is partly
 outside array bounds of 'struct sockaddr[1]'

due to a gcc bug in which the compiler is confused by the union.
To work around this, move from struct sockaddr to
struct sockaddr_storage.

Test-Parameters: trivial
Change-Id: I586042d6e3c59be8c63e2821659cf9d3bcdac8e3
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55172
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
5 weeks agoLU-17662 osd-zfs: Support for ZFS 2.2.3 30/54530/9
Shaun Tancheff [Mon, 6 May 2024 03:06:31 +0000 (10:06 +0700)]
LU-17662 osd-zfs: Support for ZFS 2.2.3

ZFS commit zfs-2.2.99-269-g9b1677fb5
   dmu: Allow buffer fills to fail
Adds a boolean_t to dmu_buf_will_fill() and dmu_buf_fill_done()

Lustre always uses B_FALSE for this argument.

Also re-arrange and split some configure macros so that all
the zfs and ldiskfs tests can be run in the same parallel pass.

Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I71a4723bfa8ce62ae6f270e26ab149bf98278d3f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54530
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-17477 tests: conf-sanity/48 with debug=0 99/53799/13
Alex Zhuravlev [Wed, 24 Jan 2024 07:52:20 +0000 (10:52 +0300)]
LU-17477 tests: conf-sanity/48 with debug=0

conf-sanity/48 takes quite a long time setting 4.5K ACLs.
debug=0 improves this significantly.

Test-Parameters: trivial testlist=conf-sanity env=ONLY=48
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ifa39b9efc80b41050a13323474dd19b865cc6273
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53799
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-16741 fid: rename ptlrpc_req_finished for component fid 94/54994/2
Arshad Hussain [Thu, 2 May 2024 11:28:21 +0000 (07:28 -0400)]
LU-16741 fid: rename ptlrpc_req_finished for component fid

Patch renames ptlrpc_req_finished to ptlrpc_req_put for
fid component

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: If5bf08719ab9be8255f1145fa7bcdfebd68da52c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54994
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <patrick.farrell@oracle.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-16741 fld: rename ptlrpc_req_finished for component fld 93/54993/2
Arshad Hussain [Thu, 2 May 2024 11:24:57 +0000 (07:24 -0400)]
LU-16741 fld: rename ptlrpc_req_finished for component fld

Patch renames ptlrpc_req_finished to ptlrpc_req_put for
fld component

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I7229ccdb4a6440700c120a5d75edd018252b0b8a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54993
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <patrick.farrell@oracle.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-16741 ldlm: rename ptlrpc_req_finished for component ldlm 92/54992/2
Arshad Hussain [Thu, 2 May 2024 11:21:02 +0000 (07:21 -0400)]
LU-16741 ldlm: rename ptlrpc_req_finished for component ldlm

Patch renames ptlrpc_req_finished to ptlrpc_req_put for
ldlm component

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I0daff368ed1b4448f236e7f8f17e1534b3db5e58
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54992
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <patrick.farrell@oracle.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-16741 lfsck: rename ptlrpc_req_finished for component lfsck 91/54991/2
Arshad Hussain [Thu, 2 May 2024 11:15:06 +0000 (07:15 -0400)]
LU-16741 lfsck: rename ptlrpc_req_finished for component lfsck

Patch renames ptlrpc_req_finished to ptlrpc_req_put for
lfsck component

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I57fa0bac6ecf03a6143ca8342d0fb753dc815d60
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54991
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <patrick.farrell@oracle.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-16741 quota: rename ptlrpc_req_finished for component quota 90/54990/2
Arshad Hussain [Thu, 2 May 2024 11:11:06 +0000 (07:11 -0400)]
LU-16741 quota: rename ptlrpc_req_finished for component quota

Patch renames ptlrpc_req_finished to ptlrpc_req_put for
quota component

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I7e671d68be8c0209a7439dc9762b5b10039aa0a3
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54990
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <patrick.farrell@oracle.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-16741 mgc: rename ptlrpc_req_finished for component mgc 89/54989/2
Arshad Hussain [Thu, 2 May 2024 11:07:12 +0000 (07:07 -0400)]
LU-16741 mgc: rename ptlrpc_req_finished for component mgc

Patch renames ptlrpc_req_finished to ptlrpc_req_put for
mgc component

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I7b7fac8b3cfc30b6b6e92f68018b494d24390a7c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54989
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-16741 ptlrpc: rename ptlrpc_req_finished for component ptlrpc 88/54988/2
Arshad Hussain [Thu, 2 May 2024 10:57:31 +0000 (06:57 -0400)]
LU-16741 ptlrpc: rename ptlrpc_req_finished for component ptlrpc

Patch renames ptlrpc_req_finished to ptlrpc_req_put for
ptlrpc component

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ic41d76ace564132a369288676398bc881048f851
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54988
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <patrick.farrell@oracle.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-16741 mdc: rename ptlrpc_req_finished for component mdc 87/54987/2
Arshad Hussain [Thu, 2 May 2024 10:49:26 +0000 (06:49 -0400)]
LU-16741 mdc: rename ptlrpc_req_finished for component mdc

Patch renames ptlrpc_req_finished to ptlrpc_req_put for
mdc component

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I46de8facbafcabbeb5c12daefcc5172f6c9bafd5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54987
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <patrick.farrell@oracle.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-16741 osp: rename ptlrpc_req_finished for component osp 86/54986/2
Arshad Hussain [Thu, 2 May 2024 10:40:02 +0000 (06:40 -0400)]
LU-16741 osp: rename ptlrpc_req_finished for component osp

Patch renames ptlrpc_req_finished to ptlrpc_req_put for
osp component

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I0da0f922be2a062459c14585f910ef2a6c425b14
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54986
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <patrick.farrell@oracle.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-17797 lnet: avoid use after free of lnet ifaces 75/54975/2
Shaun Tancheff [Wed, 1 May 2024 04:39:26 +0000 (11:39 +0700)]
LU-17797 lnet: avoid use after free of lnet ifaces

During inet4 / inet6 enumeration the array of nids can be
reallocated or freed.

When the array is freed the originating reference should be
nulled to avoid a possible use after free.

CoverityID: 425360 ("USE_AFTER_FREE")

Test-Parameters: trivial
Fixes: ab6c8bd18 ("LU-16822 lnet: always initialize IPv6 at start up")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ifd751e0c2f0095b33f8b2cd8dd58cfd8572c5ff4
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54975
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 weeks agoLU-17795 lnet: unused return code in lnet_peer_data_present 71/54971/2
Serguei Smirnov [Tue, 30 Apr 2024 17:55:29 +0000 (10:55 -0700)]
LU-17795 lnet: unused return code in lnet_peer_data_present

Coverity check detected an issue with the return code from the call to
lnet_peer_set_primary_nid() in the code added by LU-17379 patch.
Fix it.

Test-Parameters: trivial testlist=sanity-lnet
Fixes: ae6d37 ("LU-17379 lnet: parallelize peer discovery via LNetAddPeer")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I8b9df330200ff2732efd2a54d8de910463993fae
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54971
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-17788 ptlrpc: restore watchdog revival message 42/54942/12
Andreas Dilger [Sat, 27 Apr 2024 02:48:15 +0000 (20:48 -0600)]
LU-17788 ptlrpc: restore watchdog revival message

Restore the "Service thread pid NNN completed after SSS.mmm
seconds.  This likely indicates the system was overloaded"
message that was lost during ptlrpc watchdog restructuring.

Do not rate limit this message, so that it is possible to see
when all threads are restored, even if their corresponding
"Service thread pid NNN was inactive" message was throttled.

Update recovery-small test_10a to check for these messages,
so that they are not removed again in the future.

Test-Parameters: testlist=recovery-small env=ONLY=10a
Test-Parameters: testlist=recovery-small env=ONLY=10a
Test-Parameters: testlist=recovery-small env=ONLY=10a
Test-Parameters: testlist=recovery-small env=ONLY=10a
Test-Parameters: testlist=recovery-small env=ONLY=10a
Fixes: fc9de679a4 ("LU-9859 libcfs: add watchdog for ptlrpc service threads.")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I0c7e96fb7f73ca5562a6f5ad780a79ffc83ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54942
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
5 weeks agoLU-17786 tests: use $TSTUSR instead of hard coding quota_usr 40/54940/2
James Simmons [Fri, 26 Apr 2024 22:26:46 +0000 (18:26 -0400)]
LU-17786 tests: use $TSTUSR instead of hard coding quota_usr

The bash function check_system_is_clean() hard-codes the user.
On many external systems we cannot create special users for security
reasons, so use $TSTUSR instead, which may already exist for us.

Change-Id: I80d522f04bc813cd6d5aef000eeeb34d6ec81ebd
Fixes: 7e1fb1a296e ("LU-17179 tests: check the system is clean")
Test-Parameters: trivial testlist=sanity-quota
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54940
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-17504 build: fix lock_handle array-index-out-of-bounds 26/54926/5
Andreas Dilger [Sat, 27 Apr 2024 01:13:52 +0000 (18:13 -0700)]
LU-17504 build: fix lock_handle array-index-out-of-bounds

After Linux kernel patch "ubsan: Tighten UBSAN_BOUNDS on GCC"
(commit v6.4-rc2-1-g2d47c6956ab3), flexible trailing arrays
declared like 'lock_handle[2]' will generate warnings when
CONFIG_UBSAN & co. is enabled:

    UBSAN: array-index-out-of-bounds in ldlm_request.c:1282:18
    index 2 is out of range for type 'lustre_handle [2]'

The declaration lock_handle[LDLM_LOCKREQ_HANDLES] confuses the
compiler into thinking there are only two fields in lock_handle,
but the caller often allocates extra fields beyond this for more
locks to be cancelled due to Early Lock Cancellation or from LRU.

Rather than have a second flexible array after lustre_handle[2],
declare the whole array as flexible, and fix up the few sites
that are allocating this array to ensure LDLM_LOCKREQ_HANDLES
fields are allocated at a minimum.
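
As a rough illustration of the pattern described above (not the actual ldlm_request definition), the sketch below declares the whole array as flexible and makes the allocator enforce a minimum element count. All names (demo_handle, demo_request, demo_request_size, demo_request_alloc, DEMO_MIN_HANDLES) are hypothetical userspace stand-ins.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_MIN_HANDLES 2	/* hypothetical stand-in for LDLM_LOCKREQ_HANDLES */

struct demo_handle {
	uint64_t cookie;
};

struct demo_request {
	uint32_t flags;
	uint32_t count;			/* number of allocated handle slots */
	struct demo_handle handles[];	/* fully flexible, no fixed [2] */
};

/* Bytes needed for 'count' handles, never fewer than the minimum. */
static size_t demo_request_size(size_t count)
{
	if (count < DEMO_MIN_HANDLES)
		count = DEMO_MIN_HANDLES;
	return offsetof(struct demo_request, handles) +
	       count * sizeof(struct demo_handle);
}

/* Allocate a zeroed request with at least DEMO_MIN_HANDLES slots. */
static struct demo_request *demo_request_alloc(size_t count)
{
	struct demo_request *req;

	if (count < DEMO_MIN_HANDLES)
		count = DEMO_MIN_HANDLES;
	req = calloc(1, demo_request_size(count));
	if (req)
		req->count = (uint32_t)count;
	return req;
}
```

Because the array has no declared bound, a bounds checker has nothing to compare the index against, while the allocator still guarantees the two "base" slots exist.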

This subtly changes the checks in wiretest.c due to the removal
of the 2 "base" handles in ldlm_request, but I believe this is not
changing the wire protocol because it still allocates those handles
directly, and I have verified interoperability with a 2.14.0 server.

Test-Parameters: testlist=runtests clientversion=2.14
Test-Parameters: testlist=runtests serverversion=2.14
Test-Parameters: testlist=runtests clientversion=2.15
Test-Parameters: testlist=runtests serverversion=2.15
Test-Parameters: testlist=runtests clientversion=EXA5
Test-Parameters: testlist=runtests serverversion=EXA5
Test-Parameters: testlist=runtests clientversion=EXA6
Test-Parameters: testlist=runtests serverversion=EXA6
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I9695fb44f1b5c84bb750d2983cdd8b939e3ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54926
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-17784 build: improve wiretest for flexible arrays 29/54929/2
Shaun Tancheff [Fri, 26 Apr 2024 11:24:34 +0000 (18:24 +0700)]
LU-17784 build: improve wiretest for flexible arrays

Flexible array checking can additionally probe that the size
of the array element is correct.
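
One way such a probe could look, as a sketch only: the macro and struct names below (FLEX_ELEM_SIZE, demo_wire_handle, demo_wire_req) are hypothetical and are not the wiretest macros themselves. The element size of a flexible array member can be taken at compile time without an instance of the struct.

```c
#include <stddef.h>
#include <stdint.h>

struct demo_wire_handle {
	uint64_t cookie;
};

struct demo_wire_req {
	uint32_t flags;
	uint32_t count;
	struct demo_wire_handle handles[];
};

/* Size of one element of a flexible array member.  sizeof does not
 * evaluate its operand, so the null pointer is never dereferenced. */
#define FLEX_ELEM_SIZE(type, member) sizeof(((type *)0)->member[0])

/* Compile-time checks in the spirit of a wire-format test (C11). */
_Static_assert(offsetof(struct demo_wire_req, handles) == 8,
	       "offset of handles changed");
_Static_assert(FLEX_ELEM_SIZE(struct demo_wire_req, handles) ==
	       sizeof(uint64_t),
	       "size of one handle changed");
```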

Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ib7de3d156a2e77dfaf2e9ab1df8fab524c073610
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54929
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-17741 gss: fix lsvcgss service for systemd 15/54915/3
Sebastien Buisson [Thu, 25 Apr 2024 16:42:44 +0000 (18:42 +0200)]
LU-17741 gss: fix lsvcgss service for systemd

Add a systemd unit file for lsvcgss service, so that the lsvcgssd
daemon can be handled correctly via systemctl.

Test-Parameters: trivial
Test-Parameters: kerberos=true testlist=sanity-krb5 clientdistro=el9.3 serverdistro=el9.3
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I7581996e1e28567415da0827681841ac228ad6c5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54915
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-17774 build: pass systemdsystemunitdir to "make debs" 02/54902/3
Jian Yu [Fri, 26 Apr 2024 17:10:03 +0000 (10:10 -0700)]
LU-17774 build: pass systemdsystemunitdir to "make debs"

This patch passes "--with-systemdsystemunitdir" configure
option to the configure command performed in "make debs".
It also updates debian/lustre-{client,server}-utils.install
with the detected/specified directory for systemd service files.

Test-Parameters: trivial clientdistro=ubuntu2204

Signed-off-by: Jian Yu <yujian@whamcloud.com>
Change-Id: I7c36904ea0ed0f393a76b0fb0ad444b330dfa78c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54902
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-17767 build: struct lsmcontext has slot or id member 81/54881/3
Sebastien Buisson [Tue, 23 Apr 2024 17:48:32 +0000 (10:48 -0700)]
LU-17767 build: struct lsmcontext has slot or id member

With Ubuntu 24.04 kernel 6.8.0-31-generic, the struct lsmcontext uses
a field named 'id' to identify the LSM module, instead of 'slot' in
previous kernel versions.

Fixes: 0e66489401 ("LU-16619 build: Ubuntu jammy 5.19 client support")
Test-Parameters: trivial
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I5080e60614b42ed63103f93cae1f481851742d0b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54881
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-17769 tests: run_one() repeats subtests for set duration 69/54869/6
Charlie Olmstead [Mon, 22 Apr 2024 16:37:12 +0000 (10:37 -0600)]
LU-17769 tests: run_one() repeats subtests for set duration

Implement ONLY_MINUTES=M environment variable to allow test runners
to execute a subtest for at least M minutes. Each time the subtest
completes, the duration is checked to see if it has exceeded
ONLY_MINUTES, therefore the parameter represents a minimum number
of minutes to run rather than an exact duration.

If, for some reason, both ONLY_REPEAT and ONLY_MINUTES are set,
the ONLY_REPEAT value takes precedence.
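
The policy above can be modelled in a few lines. This is a C model of the described behaviour only; the real logic lives in the bash test framework, and every name here (demo_run_one, demo_now, demo_subtest, the simulated 90-second subtest) is a hypothetical stand-in.

```c
typedef long (*demo_clock_fn)(void);
typedef void (*demo_subtest_fn)(void);

static long demo_elapsed;	/* simulated clock, in seconds */

static long demo_now(void)
{
	return demo_elapsed;
}

/* Each simulated subtest run "takes" 90 seconds. */
static void demo_subtest(void)
{
	demo_elapsed += 90;
}

/* Returns how many times the subtest was run. */
static int demo_run_one(int only_repeat, int only_minutes,
			demo_clock_fn now, demo_subtest_fn subtest)
{
	int runs = 0;
	long start;

	if (only_repeat > 0) {
		/* ONLY_REPEAT takes precedence over ONLY_MINUTES */
		while (runs < only_repeat) {
			subtest();
			runs++;
		}
		return runs;
	}

	start = now();
	do {
		subtest();
		runs++;
		/* The duration is checked only after a complete run,
		 * so ONLY_MINUTES is a minimum, not an exact time. */
	} while (only_minutes > 0 &&
		 now() - start <= only_minutes * 60L);

	return runs;
}
```

With the simulated 90-second subtest and ONLY_MINUTES=5, the loop stops after the fourth run (360 seconds), the first check at which 300 seconds has been exceeded.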

Test-Parameters: trivial testlist=sanity env=ONLY=73
Test-Parameters: testlist=sanity env=ONLY=73,ONLY_REPEAT=10
Test-Parameters: testlist=sanity env=ONLY=73,ONLY_MINUTES=5
Test-Parameters: testlist=sanity env=ONLY=73,ONLY_REPEAT=100,ONLY_MINUTES=10
Test-Parameters: testlist=sanity env=ONLY=73,ONLY_REPEAT=10,ONLY_MINUTES=10
Signed-off-by: Charlie Olmstead <charlie@whamcloud.com>
Change-Id: I4b454fd8582d2b875762ee15451150afb3117d15
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54869
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 weeks agoLU-17000 misc: fix strscpy() Coverity warnings 65/54865/6
Arshad Hussain [Mon, 22 Apr 2024 09:25:50 +0000 (14:55 +0530)]
LU-17000 misc: fix strscpy() Coverity warnings

Fix warning reported for use of uninitialized variable

CoverityID: 425254 ("Uninitialized scalar variable")

Fix warning reported when changing call from strlcpy()
to strscpy()

CoverityID: 425253 ("Unsigned compared against 0")
CoverityID: 425262 ("Unsigned compared against 0")
Fixes: 7a0517fa2 ("LU-17592 build: kernel 6.8 removed strlcpy()")
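
For context, a sketch of why this class of warning appears after such a conversion. The kernel's strscpy() returns a signed value (-E2BIG on truncation), so storing the result in an unsigned variable makes any "< 0" test always false. demo_strscpy() below is a hypothetical userspace stand-in that only mimics that return convention; demo_copy_name() is likewise illustrative and is not the code touched by this patch.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>	/* ssize_t */

/* Stand-in mimicking strscpy(): returns the number of bytes copied
 * (excluding the NUL), or -E2BIG if size is 0 or src was truncated.
 * The destination is always NUL-terminated when size > 0. */
static ssize_t demo_strscpy(char *dst, const char *src, size_t size)
{
	size_t len;

	if (size == 0)
		return -E2BIG;

	for (len = 0; len < size && src[len] != '\0'; len++)
		;

	if (len == size) {
		memcpy(dst, src, size - 1);
		dst[size - 1] = '\0';
		return -E2BIG;
	}
	memcpy(dst, src, len + 1);
	return (ssize_t)len;
}

/* Returns 0 on success or -E2BIG if the name did not fit. */
static int demo_copy_name(char *dst, size_t size, const char *src)
{
	/* Must be signed: with "size_t rc" the test "rc < 0" can
	 * never be true, which is what the checker flags. */
	ssize_t rc = demo_strscpy(dst, src, size);

	return rc < 0 ? (int)rc : 0;
}
```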

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Id3804c77a105e4776a0242db787dc1ca2528d9ca
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54865
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-17761 tests: make sanity-compr sanity/sanityn return 0 55/54855/2
Jian Yu [Fri, 19 Apr 2024 18:54:04 +0000 (11:54 -0700)]
LU-17761 tests: make sanity-compr sanity/sanityn return 0

While running sanity-compr sanity/sanityn, if there was a
sub-subtest failure, the sanity/sanityn test_cleanup would
be incorrectly marked as FAIL.

We should leave it to the individual sanity/sanityn subtests
to mark their failures; test_sanity() and test_sanityn()
should not also return an error.

Change-Id: I1fd645b80b92e583f1a564f85e6d2d6d871b8fa8
Test-Parameters: trivial testlist=sanity-compr
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54855
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 weeks agoLU-14391 lnet: optimize the Netlink packet size for routes 44/54844/12
James Simmons [Fri, 26 Apr 2024 17:15:02 +0000 (13:15 -0400)]
LU-14391 lnet: optimize the Netlink packet size for routes

Currently Netlink by default sets the maximum size of a packet
sent back to user land to 64K. Some sites set up many routes
(above ~430), which exceeds this limit. We can avoid this
limitation by calculating the actual size of the Netlink packet
and setting cb->min_dump_alloc. The new maximum is then 4GB,
which should be plenty (27K routes).
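
The sizing idea can be sketched as simple arithmetic. This is a model under stated assumptions, not the LNet code: the per-route and header byte counts are values the caller would have to supply (the real code would derive them from the attributes it emits), and demo_min_dump_alloc and DEMO_DEFAULT_DUMP are hypothetical names.

```c
#include <stdint.h>

#define DEMO_DEFAULT_DUMP 65536u	/* the 64K default mentioned above */

/* Estimate a dump buffer size for 'nroutes' routes.  The result is
 * clamped to the 32-bit range, which is where a 4GB ceiling comes
 * from, and never drops below the default size. */
static uint32_t demo_min_dump_alloc(uint64_t nroutes,
				    uint32_t per_route_bytes,
				    uint32_t header_bytes)
{
	uint64_t need = (uint64_t)header_bytes + nroutes * per_route_bytes;

	if (need < DEMO_DEFAULT_DUMP)
		return DEMO_DEFAULT_DUMP;
	if (need > UINT32_MAX)
		return UINT32_MAX;
	return (uint32_t)need;
}
```

For instance, if about 430 routes fill 64K, each route costs roughly 152 bytes, so 1000 routes would need on the order of 150K rather than the default.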

Test-Parameters: trivial testlist=sanity-lnet
Change-Id: Ica01f0cf290992a5d27b8ac2d09508d0a6e8151a
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54844
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-17455 scripts: add IPv6 support to ksocklnd-config 33/54833/4
Serguei Smirnov [Wed, 17 Apr 2024 21:15:22 +0000 (14:15 -0700)]
LU-17455 scripts: add IPv6 support to ksocklnd-config

Expand the ksocklnd-config script to support IPv6.
For every interface listed as an argument, check if an IPv6
address is configured and set up routing accordingly.
The change replicates existing behavior for IPv4:
   - if existing route is found for the interface,
     or skip_mr_routing is enabled, the script skips
     adding a new route and prints a warning
   - if default gateway is found on the same subnet,
     a source-based rule and route are added for the
     IP/interface using the gateway
   - if default gateway is not found, a source-based rule
     and a local route are added for the IP/interface
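
The three cases listed above amount to a small decision function. The sketch below models only that decision in C; the script itself is shell, and the enum and function names are hypothetical.

```c
#include <stdbool.h>

enum demo_route_action {
	DEMO_SKIP_WITH_WARNING,	/* route exists or skip_mr_routing set */
	DEMO_RULE_VIA_GATEWAY,	/* source-based rule + route via gateway */
	DEMO_RULE_LOCAL_ROUTE,	/* source-based rule + local route */
};

/* Pick the action for one IP/interface pair, in the order the
 * cases are described. */
static enum demo_route_action
demo_pick_action(bool route_exists, bool skip_mr_routing,
		 bool gateway_on_subnet)
{
	if (route_exists || skip_mr_routing)
		return DEMO_SKIP_WITH_WARNING;
	if (gateway_on_subnet)
		return DEMO_RULE_VIA_GATEWAY;
	return DEMO_RULE_LOCAL_ROUTE;
}
```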

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I69e249f2858a201f1b108afa05cce9fdf4ee8c80
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54833
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-14535 utils: fix FORWARD_NULL issue from Coverity 27/54827/4
Hongchao Zhang [Sun, 14 Apr 2024 23:13:57 +0000 (07:13 +0800)]
LU-14535 utils: fix FORWARD_NULL issue from Coverity

Fix the possible NULL pointer issue reported by Coverity:

   case 'e':
CID 424708:    (FORWARD_NULL)
Passing null pointer "optarg" to "strtoul", which dereferences it.
      end_qid = strtoul(optarg, NULL, 0);
      break;

CoverityID: 424708 ("FORWARD NULL")
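
A defensive way to handle this class of report is to validate the argument before it reaches strtoul(). The helper below is a minimal sketch with a hypothetical name (demo_parse_qid); it is not necessarily how the patch resolves the warning.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Parse a numeric ID from 'arg' into *qid.
 * Returns 0 on success, -EINVAL for a NULL, empty, or malformed
 * argument, so strtoul() never sees a null pointer. */
static int demo_parse_qid(const char *arg, unsigned long *qid)
{
	char *end;
	unsigned long val;

	if (arg == NULL || *arg == '\0')
		return -EINVAL;

	errno = 0;
	val = strtoul(arg, &end, 0);
	if (errno != 0 || *end != '\0')
		return -EINVAL;

	*qid = val;
	return 0;
}
```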

Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: Idfb5cb4c6fe63ec08dd9048742f3f280b125eb8a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54827
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 weeks agoLU-17625 statahead: avoid to use @sai after its has been freed 26/54826/3
Qian Yingjin [Wed, 17 Apr 2024 08:22:02 +0000 (04:22 -0400)]
LU-17625 statahead: avoid to use @sai after its has been freed

There is a race between a statahead thread starting up and another
statahead request trying to access the same statahead structure.
If the statahead thread fails to start, it frees the statahead
structure too early. The user stat() request then wrongly uses the
statahead structure after its memory has already been freed.

In this patch, we replace @ll_sai_free/@ll_sax_free with
@ll_sai_put/@ll_sax_put to avoid freeing the statahead structure
too early while it is still being used by a user stat()
request.
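
The difference between an unconditional free and a reference-dropping put can be shown with a small single-threaded sketch. The names (demo_sai, demo_sai_get, demo_sai_put) are hypothetical, and a plain int stands in for the atomic reference count that kernel code would use.

```c
#include <stdbool.h>
#include <stdlib.h>

struct demo_sai {
	int refcount;		/* plain int: single-threaded sketch only */
	bool *freed_flag;	/* lets a caller observe the final free */
};

/* Allocate with one reference held by the creator. */
static struct demo_sai *demo_sai_alloc(bool *freed_flag)
{
	struct demo_sai *sai = calloc(1, sizeof(*sai));

	if (sai) {
		sai->refcount = 1;
		sai->freed_flag = freed_flag;
	}
	return sai;
}

/* Take an extra reference, e.g. for a concurrent stat() caller. */
static struct demo_sai *demo_sai_get(struct demo_sai *sai)
{
	sai->refcount++;
	return sai;
}

/* Drop one reference; memory is released only by the last holder. */
static void demo_sai_put(struct demo_sai *sai)
{
	if (--sai->refcount == 0) {
		if (sai->freed_flag)
			*sai->freed_flag = true;
		free(sai);
	}
}
```

If the failing startup path calls the put instead of freeing directly, a second user that took its own reference keeps the structure alive until it is done.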

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I3840be959160aed2887a91be81da05f796306cd9
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54826
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-17734 build: Debian: oblige --disable-tests if asked 64/54764/4
Ellis Wilson [Fri, 15 Oct 2021 20:23:25 +0000 (16:23 -0400)]
LU-17734 build: Debian: oblige --disable-tests if asked

Do not disable tests by default for debian-based builds, but permit
users to disable them if they choose by passing in --disable-tests.

Test-Parameters: trivial
Signed-off-by: Ellis Wilson <elliswilson@microsoft.com>
Change-Id: I90088e6e95fa9e46ae063dfc061a324293fde9a2
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54764
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 weeks agoLU-17714 gss: protect against revoked session keyring 06/54706/5
Sebastien Buisson [Mon, 8 Apr 2024 15:52:50 +0000 (17:52 +0200)]
LU-17714 gss: protect against revoked session keyring

In case the session keyring is revoked, request_key() still tries to
search it. Sadly this keyring is searched before the user keyring, so
it will return -EKEYREVOKED, and the user keyring, that does contain
the Lustre key, will not even be searched.
To work around this issue in the kernel implementation of request_key,
override the current process's credentials with no session keyring,
if we detect it has been revoked.

Test-Parameters: kerberos=true testlist=sanity-krb5 serverdistro=el8.9
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I64b6ac4693a47cf43d6fa1bf4e17bfb4907670fa
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54706
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-17714 gss: cleanup user keyring usage 92/54692/8
Sebastien Buisson [Mon, 8 Apr 2024 09:06:50 +0000 (11:06 +0200)]
LU-17714 gss: cleanup user keyring usage

User keys are linked to the user keyring. But we should not keep an
extra reference on the user keyring for every user key being created.
This leads to too many references on this keyring, and prevents proper
destruction in case the system wants to clean it up (because the user
logged off for instance).
And when unlinking a user key, we need to take care of the user
namespace, in order to fetch the real user keyring, and not the one
associated with the mapped uid in the user namespace.
Finally we must handle the case where the user key is explicitly
revoked via 'keyctl revoke' on the command line, by carrying out the
same cleanup as when 'lfs flushctx' is called. This properly drops
references on the key, and frees the security context associated with
the key.

Test-Parameters: kerberos=true testlist=sanity-krb5 serverdistro=el8.9
Fixes: 02b456e4a4 ("LU-17173 gss: user keys go to user keyring")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ic168b68f8652689aa4402eaa4fcdbd852743d320
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54692
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Bruno Faccini <bfaccini@nvidia.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-17704 revert: "LU-17379 ptlrpc: fix check for callback discard" 86/54686/4
Andreas Dilger [Fri, 5 Apr 2024 22:42:48 +0000 (22:42 +0000)]
LU-17704 revert: "LU-17379 ptlrpc: fix check for callback discard"

This reverts commit a6886dba0ed8a622c9831cd33d310d933492c72d.
This is failing dbench intermittently in sanity-benchmark.

Change-Id: Id3720c79ca8dd9276e086aab5d3fcfe43ddd680a
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54686
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Stephane Thiell <sthiell@stanford.edu>
5 weeks agoLU-17657 build: gcc 13 stricter enum checking 68/54468/6
Shaun Tancheff [Fri, 26 Apr 2024 15:25:19 +0000 (22:25 +0700)]
LU-17657 build: gcc 13 stricter enum checking

gcc 13 does not allow mixing of enum and integer
types between function declaration and implementation.

Clean up a couple of instances where an enum is treated
as a uint32_t / __u32 and treat it as an enum type.
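
A minimal sketch of the kind of mismatch involved, using hypothetical names (demo_magic, demo_magic_is_known). The conflicting form is shown only in a comment so that the block compiles; GCC 13 reports it under -Wenum-int-mismatch, which becomes an error with -Werror.

```c
enum demo_magic {
	DEMO_MAGIC_V1 = 1,
	DEMO_MAGIC_V3 = 3,
};

/* The declaration uses the enum type ... */
int demo_magic_is_known(enum demo_magic magic);

/* ... so the definition must use it too.  Writing the definition as
 *     int demo_magic_is_known(uint32_t magic)
 * mixes an enum and an integer type between declaration and
 * implementation, which is what the stricter check rejects. */
int demo_magic_is_known(enum demo_magic magic)
{
	return magic == DEMO_MAGIC_V1 || magic == DEMO_MAGIC_V3;
}
```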

lustre/lov/lov_ea.c: In function 'lsme_unpack_comp':
lustre/lov/lov_ea.c:531:21: error: array subscript
   'struct lov_stripe_md_entry[0]' is partly outside array bounds
    of 'struct lov_stripe_md_entry[0]' [-Werror=array-bounds=]
  531 |                 lsme->lsme_magic = magic;

Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I8e2ef989ecbdebe5e13bcea0fbb210c4a14eb45e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54468
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 weeks agoLU-17580 llite: Remove all referance of LOOKUP_CONTINUE 69/54169/7
Arshad Hussain [Sun, 25 Feb 2024 01:13:22 +0000 (06:43 +0530)]
LU-17580 llite: Remove all referance of LOOKUP_CONTINUE

In newer kernels (3.1 and beyond) the LOOKUP_CONTINUE flag is
replaced by (and is the same as) the LOOKUP_PARENT flag. Any
definitions of LOOKUP_CONTINUE can safely be removed.

Linux-commit: 49084c3bb2055c401f3493c13edae14d49128ca0
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I05eac0ec1321d230c7a215f95888d4040b7c670a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54169
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: James Simmons <jsimmons@infradead.org>
5 weeks agoLU-13791 mdt: allow using symbolic capability names 18/54118/3
Andreas Dilger [Wed, 21 Feb 2024 00:59:25 +0000 (17:59 -0700)]
LU-13791 mdt: allow using symbolic capability names

Allow the "mdt.*.enable_cap_mask" param to set and print symbolic names,
similar to the "debug" and "subsystem_debug" parameters.  The
allowed parameter names are in the capabilities(7) man page, in
either upper or lowercase, like cap_chown, cap_dac_read_search,
etc. along with "all" to enable all capabilities if clients are
trusted.  For example:

    lctl set_param -P mdt.lfs-*.enable_cap_mask=+cap_dac_read_search

Since kernel_cap_t is a 64-bit value, enhance cfs_str2mask() to
take u64 mask arguments.  The calling libcfs_debug_str2mask()
sticks with "int mask" for now.

Split the core out from libcfs_debug_mask2str() into a new helper
function cfs_mask2str() so it can be called directly.
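
To illustrate why a 64-bit mask matters for symbolic names, here is a small userspace sketch. It is not cfs_str2mask(): the function name demo_str2mask, the reduced name table, and the handling of a bare name (treated like "+name") and of "all" (all 64 bits) are assumptions made for the example. The bit numbers follow <linux/capability.h>.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

struct demo_cap_name {
	const char *name;
	unsigned int bit;
};

/* Small subset of capability names, for illustration only. */
static const struct demo_cap_name demo_caps[] = {
	{ "cap_chown", 0 },
	{ "cap_dac_override", 1 },
	{ "cap_dac_read_search", 2 },
	{ "cap_fowner", 3 },
};

/* Case-insensitive string equality. */
static int demo_name_eq(const char *a, const char *b)
{
	while (*a && *b) {
		if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
			return 0;
		a++;
		b++;
	}
	return *a == *b;
}

/* Apply one "+name", "-name", or "name" token to *mask.
 * Returns 0 on success, -1 if the name is unknown. */
static int demo_str2mask(const char *token, uint64_t *mask)
{
	int clear = 0;
	size_t i;

	if (*token == '+' || *token == '-') {
		clear = (*token == '-');
		token++;
	}
	if (demo_name_eq(token, "all")) {
		*mask = clear ? 0 : ~(uint64_t)0;
		return 0;
	}
	for (i = 0; i < sizeof(demo_caps) / sizeof(demo_caps[0]); i++) {
		if (!demo_name_eq(token, demo_caps[i].name))
			continue;
		/* Shift a 64-bit one so bits above 31 would also work. */
		if (clear)
			*mask &= ~((uint64_t)1 << demo_caps[i].bit);
		else
			*mask |= (uint64_t)1 << demo_caps[i].bit;
		return 0;
	}
	return -1;
}
```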

Fixes: 54f677651b ("LU-13791 mdt: parameter to tune capabilities")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I3f71f61a17d4d3614e46a526c60e709d9eb825b3
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54118
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-17523 ldiskfs: sync series to include el8.4 92/53992/7
Shaun Tancheff [Tue, 5 Mar 2024 02:23:33 +0000 (09:23 +0700)]
LU-17523 ldiskfs: sync series to include el8.4

el8.4, el8.5 and el8.6 include:
  rhel8/ext4-deep-tree.patch
  rhel7.6/ext4-dquot-commit-speedup.patch
  rhel8/ext4-ext-merge.patch
  rhel8/ext4-mballoc-dense.patch

el8.6 includes:
  rhel8/ext4-race-in-ext4-destroy-inode.patch
  rhel8/ext4-mballoc-dense.patch

el8.7 includes:
  rhel8/ext4-deep-tree.patch
  rhel8/ext4-race-in-ext4-destroy-inode.patch
  rhel8/ext4-mballoc-dense.patch

el8.8 and el8.9 include:
  rhel8/ext4-limit-per-inode-preallocation-list.patch

el8.9 includes:
  rhel8/ext4-race-in-ext4-destroy-inode.patch
  rhel8/ext4-mballoc-dense.patch

Test-Parameters: trivial fstype=ldiskfs clientdistro=el8.9 serverdistro=el8.9 testlist=sanity
Test-Parameters: trivial fstype=ldiskfs clientdistro=el8.8 serverdistro=el8.8 testlist=sanity
Test-Parameters: optional fstype=ldiskfs clientdistro=el8.8 serverdistro=el8.7 testlist=sanity
Test-Parameters: optional fstype=ldiskfs clientdistro=el8.8 serverdistro=el8.6 testlist=sanity
Test-Parameters: optional fstype=ldiskfs clientdistro=el8.8 serverdistro=el8.5 testlist=sanity
Test-Parameters: optional fstype=ldiskfs clientdistro=el8.8 serverdistro=el8.4 testlist=sanity
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I2f5515947a16dff7f2502ec281675f56b2470ea7
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53992
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-17483 gss: refresh req context with already existing one 59/53859/8
Sebastien Buisson [Tue, 30 Jan 2024 12:13:52 +0000 (13:13 +0100)]
LU-17483 gss: refresh req context with already existing one

When we are processing a request with a root GSS context that
has the PTLRPC_CTX_ERROR_BIT bit set, try to replace it with an
already existing context. Such a context can already be up-to-date
thanks to other authentication requests sent to failover NIDs while
the current request was in the delay list. This valid context can be
fetched from the struct ptlrpc_sec.

Test-Parameters: kerberos=true testlist=sanity-krb5
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Iff1cf727c4579cba6456e010aac6537cf888b0ae
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53859
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-12885 mds: add enums for MDS_OPEN flags 69/36469/33
Andreas Dilger [Tue, 9 Apr 2024 08:22:07 +0000 (04:22 -0400)]
LU-12885 mds: add enums for MDS_OPEN flags

This patch is the first in a series of patches that separate
kernel open flags from MDS open flags.

The first step is to add enum mds_open_flags to the code to
make it easier to follow the logic. Rename it_flags to
it_open_flags and use enum mds_open_flags in the code so it
is clear that MDS_OPEN flags are being used.

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I933a6e6102f947a9276cb6bf03826fd4a53ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/36469
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
5 weeks agoLU-11085 ldlm: save space in struct ldlm_lock 31/53931/9
Mr NeilBrown [Mon, 5 Feb 2024 22:46:49 +0000 (09:46 +1100)]
LU-11085 ldlm: save space in struct ldlm_lock

Moving the 'interval' handle into ldlm_lock has made the structure
bigger.  Compensate for this by sharing space between fields that are
only needed for specific lock types.

i.e.  some fields are only needed for EXTENT locks, some for FLOCK
locks, some for PLAIN and IBITS which use "skiplists".

On x86_64 this reduces the size of ldlm_lock to what it was before the
previous patch.  A future patch will reduce it even more.
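
The space-sharing idea can be sketched with a union of per-type field groups. The structs and field names below are hypothetical and far smaller than the real ldlm_lock; they only show that the shared layout is as large as its biggest variant rather than the sum of all of them.

```c
#include <stdint.h>

struct demo_extent_fields {
	uint64_t start, end;
	void *tree_node[3];
};

struct demo_flock_fields {
	uint64_t owner;
	uint32_t pid;
	void *tree_node[3];
};

struct demo_skiplist_fields {	/* PLAIN and IBITS style lists */
	void *mode_link[2];
	void *policy_link[2];
};

/* Every lock carries every group of fields. */
struct demo_lock_flat {
	int type;
	struct demo_extent_fields extent;
	struct demo_flock_fields flock;
	struct demo_skiplist_fields skip;
};

/* Only the group matching 'type' is valid at any time. */
struct demo_lock_shared {
	int type;
	union {
		struct demo_extent_fields extent;
		struct demo_flock_fields flock;
		struct demo_skiplist_fields skip;
	} u;
};

_Static_assert(sizeof(struct demo_lock_shared) <
	       sizeof(struct demo_lock_flat),
	       "sharing space should shrink the lock");
```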

As extent and flock both used the interval tree node, they now have
different instances.  So the names in flock are changed.  Both of
these will disappear in future patches.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Iec92a41c174e4884852ebf8fbb2cd50d4e165035
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53931
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
5 weeks agoLU-11085 ldlm: simplify use of interval-tree. 21/33221/28
NeilBrown [Wed, 25 Mar 2020 02:50:16 +0000 (22:50 -0400)]
LU-11085 ldlm: simplify use of interval-tree.

The interval tree used for keeping track of extent locks is currently
separate from those locks themselves.  A separate 'ldlm_interval'
structure is allocated and linked to all locks which have the same
extent.

This requires that the interval tree library handles an insert where
exactly the same interval already exists differently from any other
insert.  No other user of the interval tree library wants this, and
the library which is part of Linux doesn't support it.  So it would be
good to remove this requirement.

This patch changes the library, removes the 'ldlm_interval' structure,
and stores each lock in the tree.  This substantially simplifies a lot
of code, but has some costs.

The ldlm_lock is now larger - it contains three pointers for the
rbtree where previously it had one, and it now has an extra copy of
the range start/end.  These will be resolved in later patches by
removing duplication and sharing space with other fields that aren't
used for extent locks.

The extent-tree can now be substantially larger as it now contains
every lock for a given extent rather than each extent only once.  As
the depth of the tree grows with the log of the number of elements,
this isn't an enormous cost, but it may still be measurable.  In
particular, locks that cover the full extent [0..MAX] are common and
can swamp other locks (citation needed).  Such locks can be easily
kept in a separate list.  This will restore some of the code
complexity, but is otherwise of little cost.

Linux-commit: 71236833ad7a98b69e6e675efefbdc04a74c1d4b

Change-Id: I6c82d971aabd02bb036ac0bd27a934d48e972895
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/33221
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
5 weeks agoLU-14810 lnet: ongoing push when discovery is stopped 84/54884/3
Cyril Bordage [Wed, 24 Apr 2024 02:21:53 +0000 (04:21 +0200)]
LU-14810 lnet: ongoing push when discovery is stopped

If a push is not completed when the discovery thread is stopped, then
we still have ln_dc_handler used as the md handler (from
lnet_peer_send_push). That leads to an assertion failure in
lnet_assert_handler_unused.

To fix that, we call lnet_assert_handler_unused only after the monitor
thread has been stopped. Thus, the patch for LU-17496 is not needed
anymore.

Fixes: 36b14a23a6 ("LU-17207 lnet: race b/w monitor thr stop and discovery push")
Test-Parameters: testlist=sanity-lnet env=ONLY="212 220",ONLY_REPEAT=100
Signed-off-by: Cyril Bordage <cbordage@whamcloud.com>
Change-Id: I426c37b12a3d29327a7295f528a5b875a9ac88a0
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54884
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-17745 llite: fix the umount panic due to BDI unregister 50/54850/4
Qian Yingjin [Fri, 19 Apr 2024 02:53:10 +0000 (22:53 -0400)]
LU-17745 llite: fix the umount panic due to BDI unregister

There is a regression in the patch for LU-16954 on the old RHEL
kernel (RHEL8.2). When Lustre is unmounted, the client
crashes.

In LU-16954, to avoid the remount failure, we explicitly
unregister the sysfs for the @bdi on new kernels such as the Ubuntu
2204 v5.15 kernel.
However, this is not needed for old kernels such as RHEL 8.2.
In this patch, we remove the explicit unregister for the old kernel
to avoid the client crash during unmount.

Fixes: dcc1dd39a6 ("LU-16954 llite: add SB_I_CGROUPWB on super block for cgroup")
Test-Parameters: clientdistro=ubuntu2204 testlist=sanity-sec
Test-Parameters: clientdistro=el8.9 testlist=sanity-sec
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Ic6df572744bed8994c08fb1369cc9beccbe2d87a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54850
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Shuichi Ihara <sihara@ddn.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-6142 osd-zfs: Fix style issues for osd_io.c 64/54264/5
Arshad Hussain [Mon, 4 Mar 2024 07:45:23 +0000 (02:45 -0500)]
LU-6142 osd-zfs: Fix style issues for osd_io.c

This patch fixes issues reported by checkpatch
for file lustre/osd-zfs/osd_io.c

Test-Parameters: trivial fstype=zfs
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ia9153be34a1d583195e3ecfc56ca4ab279781566
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54264
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-17743 ko2iblnd: move to struct lnet_nid 71/54771/6
James Simmons [Thu, 25 Apr 2024 23:00:24 +0000 (19:00 -0400)]
LU-17743 ko2iblnd: move to struct lnet_nid

Move all non-wire data structures using lnet_nid_t to
struct lnet_nid. This is the first step to support
IPv6 / GUID.

Test-Parameters: trivial testlist=sanity-lnet
Change-Id: I9d1281a1b7ab7bda566369be2bc5f07ba3ce17f9
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54771
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks ago LU-13814 osc: Remove osc delete for transient pages 79/52079/19
Patrick Farrell [Fri, 23 Feb 2024 16:16:42 +0000 (11:16 -0500)]
LU-13814 osc: Remove osc delete for transient pages

Transient pages do not need an extra reference for being
part of a transfer, because they are referenced throughout
by cl_io.  This requires a tweak to the page completion
behavior.

This allows us to remove osc_page_delete for transient
pages.

Signed-off-by: Patrick Farrell <patrick.farrell@oracle.com>
Change-Id: I96539731f972b19830b2e08bf0f1d1f1e9674241
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52079
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
5 weeks ago LU-13814 osc: specialize osc_page_delete 78/52078/20
Patrick Farrell [Fri, 23 Feb 2024 16:05:35 +0000 (11:05 -0500)]
LU-13814 osc: specialize osc_page_delete

Nearly all of osc_page_delete is only done for cacheable pages,
so make that explicit.  osc_lru_del() doesn't do anything because
transient pages can't go in the LRU.  In osc_teardown_async_page(),
the latter side of the if statement is a search in cache, so it
never finds the page, then the earlier part is a check that the
page isn't in an RPC.  That's not really possible for DIO pages
unless something is *really* off.

Signed-off-by: Patrick Farrell <patrick.farrell@oracle.com>
Change-Id: I998fc196c276aa97829f5b368e23aa4b7a797294
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52078
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
5 weeks ago LU-17524 llite: DIO and writev and readv syscalls 96/53996/19
Shaun Tancheff [Wed, 24 Apr 2024 22:24:44 +0000 (18:24 -0400)]
LU-17524 llite: DIO and writev and readv syscalls

Linux kernel v3.15-rc4-329-g62a8067a7f35
  bio_vec-backed iov_iter
Introduced iov_iter_get_pages_alloc

In kernels prior to iov_iter_get_pages_alloc, the family
of iovec iter syscalls such as readv and writev fails to
iterate over the iovec segments.

In this case the iter() handler should submit the iovec
while looping over the segments.
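
As a rough userspace analogy of "submit while looping over the
segments" (not the Lustre code itself), the sketch below walks a
struct iovec array and hands each segment to a callback. The names
submit_iovec_segments(), seg_submit_fn and count_bytes() are
hypothetical and exist only for illustration.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical per-segment submit callback: returns bytes handled or -1. */
typedef ssize_t (*seg_submit_fn)(void *base, size_t len, void *ctx);

/* Example callback: adds the segment length to a size_t counter in ctx. */
ssize_t count_bytes(void *base, size_t len, void *ctx)
{
	(void)base;
	*(size_t *)ctx += len;
	return (ssize_t)len;
}

/*
 * Walk every segment of an iovec array and submit each one in turn.
 * Stops on error or on a short submit and returns the bytes handled
 * so far (or -1 if an error occurred before any byte was handled).
 */
ssize_t submit_iovec_segments(const struct iovec *iov, int iovcnt,
			      seg_submit_fn submit, void *ctx)
{
	ssize_t total = 0;
	int i;

	for (i = 0; i < iovcnt; i++) {
		ssize_t rc;

		if (iov[i].iov_len == 0)
			continue; /* empty segments carry no data */

		rc = submit(iov[i].iov_base, iov[i].iov_len, ctx);
		if (rc < 0)
			return total > 0 ? total : -1;

		total += rc;
		if ((size_t)rc < iov[i].iov_len)
			break; /* short submit: report partial progress */
	}
	return total;
}
```

Submitting per segment means a failure partway through still reports
the bytes already handled, which mirrors the usual readv/writev
partial-completion semantics.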

Linux kernel v5.19-10287-gfcb14cb1bdac
  new iov_iter flavour - ITER_UBUF

That kernel change introduces user_backed_iter(); this patch
provides a user_backed_iter() for older kernels.
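
A compat helper of this kind might look roughly like the sketch
below. It is a userspace mock, not the patch's code: struct
mock_iov_iter, mock_iter_is_iovec() and the HAVE_USER_BACKED_ITER
guard are assumptions made for illustration. The idea is that,
before ITER_UBUF existed, an iovec-backed iterator was the only
user-backed flavour.

```c
#include <stdbool.h>

/* Mock of an older kernel's iterator flavours: illustrative only. */
enum mock_iter_type { MOCK_ITER_IOVEC, MOCK_ITER_KVEC, MOCK_ITER_BVEC };

struct mock_iov_iter {
	enum mock_iter_type type;
};

/* Stand-in for the kernel's iter_is_iovec() check. */
static inline bool mock_iter_is_iovec(const struct mock_iov_iter *i)
{
	return i->type == MOCK_ITER_IOVEC;
}

#ifndef HAVE_USER_BACKED_ITER /* hypothetical configure-time macro */
/* Without ITER_UBUF, only iovec iterators point at user memory. */
static inline bool user_backed_iter(const struct mock_iov_iter *i)
{
	return mock_iter_is_iovec(i);
}
#endif
```

Callers can then test user_backed_iter() unconditionally, and the
fallback is compiled only where the kernel does not supply one.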

Fixes: 0006eb3644 ("LU-16328 llite: migrate_folio, vfs_setxattr")
Fixes: 044503492c ("LU-6260 llite: add support for new iter functionality")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Idec6a956918a1744f2801ffce9b40acb2c074523
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53996
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Patrick Farrell <patrick.farrell@oracle.com>
Reviewed-by: xinliang <xinliang.liu@linaro.org>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
5 weeks ago LU-16822 tests: Update sanity-lnet router tests for IPv6 28/53728/6
Chris Horn [Thu, 25 Apr 2024 17:36:25 +0000 (13:36 -0400)]
LU-16822 tests: Update sanity-lnet router tests for IPv6

Modify sanity-lnet test cases that test routing to work with IPv6
NIDs.

test_100/102/105/106:
  - Modified to use setup_router_test() to create a real router and
    use the associated LNet configuration in their tests.
test_101/103:
  - These test cases exercise the NID range functionality. They are
    skipped under an IPv6 configuration.

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I47b23e9c63d74d937cae7c7b8b1b27dd383fc0dc
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53728
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
8 weeks ago New tag 2.15.63 2.15.63 v2_15_63
Oleg Drokin [Thu, 2 May 2024 05:05:18 +0000 (01:05 -0400)]
New tag 2.15.63

Change-Id: I2ceb1e0afe9bd966555579b5d70bd263016884e2
Signed-off-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-17504 build: fix gcc-13 [-Werror=stringop-overread] error 34/54834/6
Shaun Tancheff [Thu, 25 Apr 2024 17:57:36 +0000 (00:57 +0700)]
LU-17504 build: fix gcc-13 [-Werror=stringop-overread] error

This patch fixes the following [-Werror=stringop-overread] and
[-Werror=attribute-warning] errors detected by gcc 13:

lustre/mgc/mgc_request.c:190:21: error: 'strcmp' reading 1 or
more bytes from a region of size 0 [-Werror=stringop-overread]
  190 | if (strcmp(logname, cld->cld_logname) == 0) {
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In function 'fortify_memcpy_chk',
    inlined from 'class_handle_ioctl' at
/root/lustre-release/lustre/obdclass/class_obd.c:381:3:
include/linux/fortify-string.h:528:25: error:
call to '__write_overflow_field' declared with attribute warning:
detected write beyond size of field (1st parameter);
maybe use struct_group()? [-Werror=attribute-warning]
  528 |  __write_overflow_field(p_size_field, size);
      |  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Jian Yu <yujian@whamcloud.com>
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I59f5a88b4cd64c9f4e67e568546baada371543b1
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54834
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>