4 weeks ago LU-17844 lnds: remove a few LCONSOLE_ERROR_MSG() 85/55085/3
Timothy Day [Mon, 13 May 2024 03:51:16 +0000 (03:51 +0000)]
LU-17844 lnds: remove a few LCONSOLE_ERROR_MSG()

I doubt these magic numbers help anyone.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I7c2505ec0eb7fc6524a13d4bf330a72188a26b4e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55085
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 weeks ago LU-16518 lnet: fix incorrectly initialized variables 84/55084/2
Timothy Day [Mon, 13 May 2024 03:39:08 +0000 (03:39 +0000)]
LU-16518 lnet: fix incorrectly initialized variables

Clang 12 complained about an uninitialized 'off' in
brw_test.c, fixed by removing the dual declaration.

Also, init 'rc' in yaml_import_global_settings().

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I893149110120975c91839e73241b311a53c6e195
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55084
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 weeks ago LU-17490 tests: update .gitignore 82/55082/2
Timothy Day [Mon, 13 May 2024 03:27:11 +0000 (03:27 +0000)]
LU-17490 tests: update .gitignore

Otherwise, we'll see this monitor_lustrefs binary in the
build tree.

Fixes: 7101742 ("LU-17490 tests: verify fanotify works for lustre")
Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I129c12515e607e97ab42917220a439ebb1823e8c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55082
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks ago LU-17646 llapi: lustreapi: add FID in error messages 74/55074/2
Alexandre Ioffe [Sat, 11 May 2024 01:28:05 +0000 (18:28 -0700)]
LU-17646 llapi: lustreapi: add FID in error messages

Use llapi_fd2fid() to print FID in llapi_lease_set() and
llapi_lease_check() error messages.

Test-Parameters: trivial
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Change-Id: Iac97ea721860652e304c674007ac7646d183e2fd
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55074
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks ago LU-17841 kfilnd: Race between hello and tagged RMA 72/55072/2
Chris Horn [Fri, 3 May 2024 19:22:12 +0000 (13:22 -0600)]
LU-17841 kfilnd: Race between hello and tagged RMA

A race exists between processing an incoming hello and initiating the
RMA for bulk operations that can result in RKEY re-use.

Initiator:
Posts tagged receive with RKEY based on peerA::kp_local_session_key X
and tn_mr_key Y
Bulk request (1) sent to target
Some earlier transaction fails:
 - Deletes peerA::kp_local_session_key X
 - Creates peerA::kp_local_session_key Z
 - HELLO request sent to peerA

Target:
Processes HELLO request - updates kp_remote_session_key from X to Z.
Handles bulk request (1)
Performs RMA using session key Z and tn_mr_key Y, but completion is
delayed

Initiator:
Bulk request (1) hits timeout
 - Tagged receive canceled, and tn_mr_key Y is released
Posts tagged receive with RKEY based on peerA::kp_local_session_key Z
and tn_mr_key Y
Bulk request (2) sent to target

Target:
RMA for (1) is completed using the RKEY for (2)

The solution is to create a new bulk request message that contains
the session key used to set up the tagged buffer on the initiator.
This is compared against the session key exchanged during hello
handshake prior to initiating the RMA. If there's a mismatch
then the RMA is failed and the transaction is finalized. The session
key stored in the new bulk request is also used to generate the RKEY
rather than using the session key stored in the kfilnd_peer. This is
a protocol change so the KFILND_MSG_VERSION is bumped.

During testing it was found that the kfilnd_msg::version was not
being set correctly for immediate and bulk messages. To allow interop
the kfilnd_msg::version must be set to the version negotiated during
the handshake, which is stored in kfilnd_peer::kp_version. This has
been fixed. This issue only impacts kfilnd peers with message
version > 1, so backwards compatibility between versions 1 and 2 will
work correctly.

The KFILND_TN_DEBUG macro is modified to print additional information
that was useful when debugging this issue.

Lastly, the TN_EVENT_TAG_TX_OK was missing from tn_event_to_str(), so
this is added.

HPE-bug-id: LUS-12317
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I0b52a8367cd45b7587ba9ec3fa5212f548bebb57
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55072
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks ago LU-17840 kfilnd: Race between peer del RKEY reuse 71/55071/2
Chris Horn [Wed, 1 May 2024 16:33:33 +0000 (11:33 -0500)]
LU-17840 kfilnd: Race between peer del RKEY reuse

kfilnd_peer object deletion is a two-step process. First a flag
(kfilnd_peer::kp_remove_peer = 1) is atomically set in the object to
mark it for removal via a call to kfilnd_peer_del(). Then, the next
caller of kfilnd_peer_put() will atomically modify this flag
(kfilnd_peer::kp_remove_peer = 2) again to denote that it is removing
the peer from the rhashtable before actually removing the object.

The window between marking a peer for deletion and removing it from
the peer cache allows a race where an RKEY may be re-used. For
example:

Thread 1: Posts tagged receive with RKEY based on
      peerA::kp_local_session_key X and tn_mr_key Y
Thread 1: Cancels tagged receive
Thread 1: kfilnd_peer_del() -> peerA::kp_remove_peer = 1
Thread 2: kfilnd_peer_put() -> peerA::kp_remove_peer = 2
Thread 1: kfilnd_peer_put() -> kfilnd_tn_finalize() -> releases
tn_mr_key Y
Thread 3: allocates tn_mr_key Y
Thread 3: Fetches peerA with kp_local_session_key X
Thread 2: Removes peerA from rhashtable

At this point, thread 3 has the same RKEY used by thread 1.

The fix is to check on the peer lookup path whether a peer found in
the rhashtable has been marked for removal. If it has then we perform
the lookup again. We do this in a loop until either no peer is found,
or a peer is found that has not been marked for removal.

To reduce the size of this window, the process for kfilnd_peer
deletion is modified so that the first thread to call
kfilnd_peer_del() will also remove the peer from the rhashtable.

HPE-bug-id: LUS-12312
Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ibbbb38cd5ee2d90956791f8350dafbee5fe5d888
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55071
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks ago LU-17839 kfilnd: Wait for hello response to mark peer uptodate 70/55070/2
Chris Horn [Wed, 15 Nov 2023 19:22:24 +0000 (12:22 -0700)]
LU-17839 kfilnd: Wait for hello response to mark peer uptodate

We need to ensure that a target peer has processed a hello request
from the sender before initiating network transactions. This can be
confirmed if and only if we receive a hello response message from
the target.

There are two issues where messages may be dropped because a hello
request or response has not been processed.

Issue 1 - Race:
A@kfi -> HELLO REQ -> B@kfi
A@kfi <- HELLO REQ <- B@kfi
A@kfi processes HELLO REQ, marks B@kfi uptodate
A@kfi -> MSG -> B@kfi
A@kfi -> HELLO RSP -> B@kfi

MSG is dropped by B@kfi because it did not process A@kfi's HELLO REQ
or RSP.

Issue 2 - HELLO target already considers originator as uptodate
A@kfi -> HELLO REQ -> B@kfi
B@kfi processes HELLO REQ
A@kfi <- MSG <- B@kfi
A@kfi <- HELLO RSP <- B@kfi

MSG is dropped by A@kfi because it did not process B@kfi's HELLO RSP.

We resolve the first race by waiting for the hello responses to
be processed before marking the peer as uptodate. To ensure that
we will always receive a hello response, the target of a hello request
must initiate its own handshake with the originator. When we receive
a hello request from a new peer, we set the peer state to
KP_STATE_WAIT_RSP rather than KP_STATE_UPTODATE. We can process RX
events for a peer in this state, but sends to this peer will be
throttled until we receive a hello response from it.
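
A minimal sketch of the state handling described above, written as plain C. KP_STATE_WAIT_RSP and KP_STATE_UPTODATE are the names used in this message; KP_STATE_NEW_SKETCH and all four helper functions are hypothetical and only model the rule, not the kfilnd implementation.

```c
#include <assert.h>
#include <stdbool.h>

enum kp_state_sketch {
	KP_STATE_NEW_SKETCH,	/* hypothetical: no handshake seen yet */
	KP_STATE_WAIT_RSP,	/* hello request seen, response outstanding */
	KP_STATE_UPTODATE,	/* hello response processed */
};

/* A hello request from a new peer only moves it to WAIT_RSP. */
static enum kp_state_sketch on_hello_req(enum kp_state_sketch s)
{
	return s == KP_STATE_NEW_SKETCH ? KP_STATE_WAIT_RSP : s;
}

/* Only a hello response marks the peer uptodate. */
static enum kp_state_sketch on_hello_rsp(enum kp_state_sketch s)
{
	assert(s <= KP_STATE_UPTODATE);
	return KP_STATE_UPTODATE;
}

/* RX events are processed while waiting for the response... */
static bool may_process_rx(enum kp_state_sketch s)
{
	return s == KP_STATE_WAIT_RSP || s == KP_STATE_UPTODATE;
}

/* ...but sends are held back until the peer is uptodate. */
static bool may_send(enum kp_state_sketch s)
{
	return s == KP_STATE_UPTODATE;
}
```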

To resolve the second race we need an additional change to allow
TN_EVENT_RX_OK events to be replayed until the hello response is
received and processed. However, this could result in state changes
that invalidate RX_OK events on replay. Thus, this race will remain
open.

Add CFS_KFI_REPLAY_RX_HELLO_REQ fail_loc to delay the processing of
an incoming hello request.

Add CFS_KFI_FAIL_MSG_TYPE_EAGAIN to delay the sending of specified
message types.

HPE-bug-id: LUS-11673
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Iaaa6b4a533dbcf13cd2a8c1365a89ba521d70af0
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55070
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks ago LU-17838 kfilnd: Prevent simultaneous hellos 69/55069/2
Chris Horn [Tue, 14 Nov 2023 16:35:15 +0000 (09:35 -0700)]
LU-17838 kfilnd: Prevent simultaneous hellos

There is a race condition with checking, setting and clearing the
kp_hello_pending flag that can result in multiple hello requests being
sent for the same peer. If no hello response is received after the
LND timeout then multiple threads can race with each other in
clearing the kp_hello_pending flag and posting a new hello request
message.

Thread 1: sets kp_hello_pending and posts hello request message
<No hello response received after LND timeout>
Thread 2: Clears kp_hello_pending, then sets kp_hello_sending
Thread 3: Clears kp_hello_pending, then sets kp_hello_sending
Thread 2/3: Both post hello request message

To resolve this issue we change kp_hello_pending from a simple binary
flag into a field that tracks three states of a hello request:
KP_HELLO_NONE, KP_HELLO_INIT, and KP_HELLO_SENT. State is NONE when
there is no hello in the process of being sent. State is INIT when a
thread is allocating a HELLO request in preparation for sending. State
is SENT when the HELLO request is being posted. Now, when several
threads detect that we have not received a hello response after LND
timeout seconds, only one of them will be able to transition the hello
state from SENT -> NONE.

Add CFS_KFI_REPLAY_IDLE_EVENT fail_loc that can be used to delay
processing of TNs in the idle state depending on the TN event
value specified in fail_val.

HPE-bug-id: LUS-11974
Test-Parameters: trivial
Fixes: 11a32d886b ("LU-16213 kfilnd: Allow one HELLO in-flight per peer")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I4dddf57971848a80a550df7523d55ad03f4a083e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55069
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
Reviewed-by: Caleb Carlson <caleb.carlson@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks ago LU-17837 kfilnd: Set dev_cpt 68/55068/2
Ron Gredvig [Fri, 20 Oct 2023 19:46:48 +0000 (19:46 +0000)]
LU-17837 kfilnd: Set dev_cpt

The dev_cpt value was not being set by kfilnd.

Query the kfabric provider to get the low level
device. Using the device, determine the dev_cpt.

This change is backwards compatible with older
versions of the kfabric provider. If the query
is not supported the dev_cpt is set to
CFS_CPT_ANY.

HPE-bug-id: LUS-11352
Test-Parameters: trivial
Signed-off-by: Ron Gredvig <ron.gredvig@hpe.com>
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Id8af36b7aa5e89969de93dc8db9c0bba03236140
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55068
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Caleb Carlson <caleb.carlson@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 weeks ago LU-15988 osp: don't print nid on -ESTALE 49/55049/2
Lai Siyao [Fri, 3 May 2024 00:27:04 +0000 (20:27 -0400)]
LU-15988 osp: don't print nid on -ESTALE

osp_send_update_req() should not access the import upon -ESTALE,
because this MDT may be unmounting.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ibd869e4e8da4f90ffd608a36d866264d5d552d0e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55049
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks ago LU-17000 obdclass: Add NULL check for parms under class_exp2cliimp 30/55030/3
Arshad Hussain [Tue, 7 May 2024 05:29:03 +0000 (01:29 -0400)]
LU-17000 obdclass: Add NULL check for parms under class_exp2cliimp

This patch adds a NULL pointer check for the parameters
passed to class_exp2cliimp().

Test-Parameters: trivial
CoverityID: 424699 ("Dereference before null check")
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ie7d96c10086959a3f31b290d56621261da480a36
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55030
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 weeks ago LU-17817 llapi: avoid potential NULL component 28/55028/4
Rajeev Mishra [Mon, 6 May 2024 20:12:54 +0000 (20:12 +0000)]
LU-17817 llapi: avoid potential NULL component

Avoid a potential NULL dereference of the component in
llapi_layout_file_open() and llapi_layout_file_comp_add().

CoverityID: 425352 ("Dereferencing 'comp', which is known to be NULL")
HPE-bug-id: LUS-12326
Signed-off-by: Rajeev Mishra <rajeevm@hpe.com>
Change-Id: Id773fdbf031a2d11256140590f570f90da46ec3a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55028
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 weeks ago LU-17816 llapi: ensure pool name is nul terminated 18/55018/2
Shaun Tancheff [Mon, 6 May 2024 09:26:22 +0000 (16:26 +0700)]
LU-17816 llapi: ensure pool name is nul terminated

strncpy() usage is inconsistent about the size of the pool name
and sometimes forgets to ensure a nul byte is placed at the
end of the copy.
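
For reference, strncpy() does not write a terminating nul when the source is at least as long as the size it is given, so the caller has to add one. A minimal sketch of the safe pattern follows; POOL_NAME_MAX_SKETCH and copy_pool_name() are hypothetical names for this example, not the llapi definitions.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical limit: longest pool name, excluding the nul byte. */
#define POOL_NAME_MAX_SKETCH 15

/* Copy at most POOL_NAME_MAX_SKETCH bytes and always terminate. */
static void copy_pool_name(char dst[POOL_NAME_MAX_SKETCH + 1], const char *src)
{
	assert(dst != NULL && src != NULL);
	strncpy(dst, src, POOL_NAME_MAX_SKETCH);
	dst[POOL_NAME_MAX_SKETCH] = '\0';	/* strncpy() may not have done this */
}
```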

CoverityID: 397181 ("Buffer not null terminated (BUFFER_SIZE)")

Also clean up a case of checking that an unsigned value is >= 0.

CoverityID: 397820 ("Unsigned compared against 0 (NO_EFFECT)")

Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Idec7adaf89c9dabc0275687c4a069fc8fa63e7a7
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55018
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks ago LU-17504 libcfs: safer LIBCFS_ALLOC 15/55015/2
Shaun Tancheff [Mon, 6 May 2024 05:11:15 +0000 (12:11 +0700)]
LU-17504 libcfs: safer LIBCFS_ALLOC

Make the LIBCFS_ALLOC() family of macros safer by adding
parentheses around arguments such as (size) to avoid unintended
expansion.
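
The hazard is ordinary C macro expansion: an argument such as "n + 1" is pasted in textually, so operator precedence can change the result. The macros below are hypothetical stand-ins for illustration, not the LIBCFS_ALLOC() definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical macros: the same computation without and with parentheses
 * around the argument. */
#define INTS_TO_BYTES_UNSAFE(count) (count * sizeof(int))
#define INTS_TO_BYTES_SAFE(count)   ((count) * sizeof(int))

static size_t unsafe_bytes(size_t n)
{
	/* Expands to (n + 1 * sizeof(int)): only the 1 is multiplied. */
	return INTS_TO_BYTES_UNSAFE(n + 1);
}

static size_t safe_bytes(size_t n)
{
	/* Guard so that (n + 1) * sizeof(int) cannot overflow. */
	assert(n < SIZE_MAX / sizeof(int));
	/* Expands to ((n + 1) * sizeof(int)), as intended. */
	return INTS_TO_BYTES_SAFE(n + 1);
}
```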

CoverityID: 415056 ("Integer handling issues")

Fixes: 718e3f3e68 ("LU-17504 build: fix gcc-13 [-Werror=stringop-overread] error")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I9701f87025bc5ce038a6bf34413b64a3f019d998
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55015
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks ago LU-17815 tests: skip conf-sanity.sh test_5h 12/55012/4
Emoly Liu [Mon, 6 May 2024 03:15:37 +0000 (11:15 +0800)]
LU-17815 tests: skip conf-sanity.sh test_5h

Skip conf-sanity.sh test_5h because it has consistently caused
test_102 and test_108 to fail in recent interop testing.

Test-Parameters: trivial serverbuildno=170 serverjob=lustre-b2_12 serverdistro=el7.9 testlist=conf-sanity env=ONLY="5h 102 108",HONOR_EXCEPT=y
Test-Parameters: trivial testlist=conf-sanity

Fixes: d1b5146eda ("LU-12206 mdt: mdt_init0 failure handling")

Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Change-Id: Id6ffe8b5d88e1d79883cbf2d84d73796945fc734
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55012
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks ago LU-17791 build: use external o2ib path for ko2iblnd.ko 84/54984/2
Shaun Tancheff [Thu, 2 May 2024 09:20:49 +0000 (16:20 +0700)]
LU-17791 build: use external o2ib path for ko2iblnd.ko

The O2IBPATH variable was split into INT_O2IBPATH used
for in-kernel o2iblnd and EXT_O2IBPATH for the external
o2iblnd driver.

Correct a case where the transition from @O2IBPATH@ to
@EXT_O2IBPATH@ was missed when deb packaging support for
multiple LNDs was initially added.

Fixes: 95287378fab ("LU-16967 build: Separate lnet LND deb packaging")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I94ff393a437c6875cda9db266ab636fd88871188
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54984
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Shuichi Ihara <sihara@ddn.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Shuichi Ihara <sihara@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks ago LU-17756 lod: add tunable lod.*.max_stripes_per_mdt 45/54945/4
Lai Siyao [Thu, 25 Apr 2024 08:15:49 +0000 (04:15 -0400)]
LU-17756 lod: add tunable lod.*.max_stripes_per_mdt

Add a tunable lod.*.max_stripes_per_mdt for directory overstriping.
The default value is LMV_MAX_STRIPES_PER_MDT(5).

Add sanity tests 300uh and 300ui.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Id8199f01f5e2d62ead6bf43d239eee8ec1e4cbb5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54945
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 weeks ago LU-17431 utils: adapt dynamic use in nodemap_cmd 00/55000/2
Sebastien Buisson [Tue, 30 Apr 2024 16:08:22 +0000 (18:08 +0200)]
LU-17431 utils: adapt dynamic use in nodemap_cmd

In nodemap_cmd(), try to detect if we are running on an MGS
before using the dynamic parameter.

Test-Parameters: trivial
Fixes: fecc3bd4e2 ("LU-17431 utils: add 'dynamic' parameter to nodemap_cmd")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I63a727491c839e457e44eaf1f4b4d11b164fd8b4
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55000
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks ago LU-17431 utils: fix various ret codes in lctl 01/54501/6
Sebastien Buisson [Wed, 13 Mar 2024 13:19:25 +0000 (14:19 +0100)]
LU-17431 utils: fix various ret codes in lctl

When nodemap_cmd() returns an error, use errno to print
correct return code.
Make get_mgs_device() return an errno in case of failure.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I74f6e27fc17158bf454f0d8be490a087aa137079
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54501
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
4 weeks ago LU-17431 nodemap: sanity check ioctl user buffer 28/54928/4
Sebastien Buisson [Fri, 26 Apr 2024 14:49:17 +0000 (16:49 +0200)]
LU-17431 nodemap: sanity check ioctl user buffer

In server_iocontrol_nodemap(), user data is copied into a struct
lustre_cfg. Then this data must be sanity checked, by calling
lustre_cfg_sanity_check().
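
As a generic illustration of why such a check matters, the sketch below validates length fields copied from user space before anything indexes by them. It is not lustre_cfg_sanity_check() and does not reproduce struct lustre_cfg; the struct, the limit and the function are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CFG_MAX_BUFS_SKETCH 8	/* hypothetical upper bound on buffers */

/* Hypothetical header copied from user space: a buffer count followed by
 * per-buffer lengths. Every field is untrusted until checked. */
struct cfg_sketch {
	uint32_t bufcount;
	uint32_t buflens[CFG_MAX_BUFS_SKETCH];
};

/* Reject headers whose counts or lengths do not fit in total_len bytes. */
static bool cfg_sketch_is_sane(const struct cfg_sketch *cfg, size_t total_len)
{
	size_t needed = sizeof(*cfg);
	uint32_t i;

	if (cfg == NULL || total_len < sizeof(*cfg))
		return false;
	if (cfg->bufcount > CFG_MAX_BUFS_SKETCH)
		return false;
	for (i = 0; i < cfg->bufcount; i++) {
		assert(needed <= total_len);	/* loop invariant */
		/* Compare against the remaining space to avoid overflow. */
		if (cfg->buflens[i] > total_len - needed)
			return false;
		needed += cfg->buflens[i];
	}
	return true;
}
```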

CoverityID: 425252 ("Passing tainted expression lcfg->lcfg_buflens to lustre_cfg_string")
CoverityID: 397130 ("Passing tainted expression lcfg->lcfg_buflens")
Fixes: 72734cf178 ("LU-17431 ptlrpc: move nodemap related ioctls to ptlrpc")

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I268b53fc0e977716ffd1985d145dc27b6acccf94
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54928
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks ago LU-17649 ptlrpc: fix -EACCES connection error handling 48/54448/13
Mikhail Pershin [Mon, 18 Mar 2024 15:37:02 +0000 (18:37 +0300)]
LU-17649 ptlrpc: fix -EACCES connection error handling

Connection errors -EACCES and -EROFS leave the import in an
intermediate state. It is still active, as is the pinger
over it, but has obd_no_recov set. That allows the import to
recover after all if server security is updated. But even
in FULL state any RPC over the import gets -ESHUTDOWN because
obd_no_recov is set.

Meanwhile, obd_no_recov is not supposed to be used in that
way: it reflects a particular mount option and should never
be recovered. So this patch sets the import to the deactivated
state instead, which also makes the import non-operational
but leaves the option of activating it manually or remounting.

Server connections like LWP, MDT-OST and MDT-MDT are
excluded and are never deactivated. Such errors are
considered temporary until the remote target updates its own
security as required or an administrative intervention
restarts the target as needed.

In both cases a console message is issued.

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ib83e1b0ac541823ec236591f08145340d6f6bf04
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54448
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 weeks ago LU-16314 tests: enable debug_raw_pointers on mount 54/54254/8
Shaun Tancheff [Wed, 17 Apr 2024 08:44:12 +0000 (15:44 +0700)]
LU-16314 tests: enable debug_raw_pointers on mount

When the MGS is mounted:
  do_facet mgs "$LCTL set_param -P debug_raw_pointers=Y"

So debug_raw_pointers need only be set once instead of
being enabled and disabled for each test.

Switching kptr_restrict for every node on every test (twice)
does not add value when testing on dedicated test VMs.

This adds a KPTR_ON_MOUNT to allow a less restrictive setting
during test-framework setupall()/cleanall().

The initial kptr_restrict values are persisted to and restored
from a well-known temporary file, $TMP/kptr-$PPID-env.

The patch enables KPTR_ON_MOUNT by default.

HPE-bug-id: LUS-10945
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I4d8975f26e57ea064608663f309400d09406d500
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54254
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
4 weeks ago LU-17342 o2ib: build without Module.symvers 58/53358/3
Timothy Day [Thu, 7 Dec 2023 05:12:57 +0000 (05:12 +0000)]
LU-17342 o2ib: build without Module.symvers

When building against an external kernel tree, the
configure script fails if there isn't a Module.symvers
available. This prevents us from using the
'modules_prepare' make target on the kernel tree.
ko2iblnd.ko can be built even without Module.symvers.
Hence, downgrade this message from an error to a
warning.

Also, don't fail if ko2iblnd can't be built. Just
emit a warning.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I8bca7f945c753fdac3aa5d9889d3347613baf059
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53358
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks ago LU-16819 build: use mofed path based on target kernel 37/50937/9
Ake Sandgren [Thu, 11 May 2023 06:48:32 +0000 (08:48 +0200)]
LU-16819 build: use mofed path based on target kernel

Instead of using "uname -r", which limits builds to the currently
running kernel, use the target kernel which is available in
LINUXRELEASE, if the directory is available.
Building for a specific kernel is common practice when using DKMS.

Test-Parameters: trivial
Signed-off-by: Ake Sandgren <ake.sandgren@hpc2n.umu.se>
Change-Id: Ifce912061a74fc5b7435cd940105190f0c3cd544
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50937
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 weeks ago LU-16350 ldiskfs: Server support for LTS linux v6.1 60/52260/13
Shaun Tancheff [Thu, 25 Apr 2024 15:38:20 +0000 (22:38 +0700)]
LU-16350 ldiskfs: Server support for LTS linux v6.1

Keep LTS kernel support and very recent kernel
ldiskfs series. Squash older series and drop
any unused patches.

Drop the non-LTS 5.8 and 5.9 kernel series.
Add patches under the kernel version that originated
the change:
   linux-5.18/ext4-lookup-dotdot.patch
   linux-6.0/ext4-data-in-dirent.patch
   linux-6.0/ext4-pdirop.patch
   linux-6.1/ext4-dont-check-before-replay.patch
   linux-6.1/ext4-mballoc-extra-checks.patch
   linux-6.1/ext4-prealloc.patch
Refresh linux-5.16/ext4-misc.patch to use strscpy instead of strlcpy.

Test-Parameters: trivial
HPE-bug-id: LUS-11376
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Id747e200f5d3f50475094ee5ad948c389cce3184
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52260
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
4 weeks ago LU-11085 ldlm: convert ldlm extent locks to linux extent-tree 92/41792/17
Mr NeilBrown [Fri, 21 Aug 2020 00:28:53 +0000 (10:28 +1000)]
LU-11085 ldlm: convert ldlm extent locks to linux extent-tree

As Linux has a fully customizable extent tree implementation, use that
instead of the one in lustre.  This removes the need to store the
extent endpoints in the lock twice, thus recovering some of the space
wasted in a previous patch.

It also allows iteration loops to be in-line rather than requiring a
callback - though in some cases we keep the callback.

Note that interval_expand() will not expand the lower boundary down if
the tree is not empty.  We now make that explicit in the loop in
ldlm_extent_internal_policy_granted().  Consequently testing of
'conflicting > 4' is irrelevant.

Linux extent-trees do not have a direct equivalent to
interval_is_overlapped(); however, we can use extent_iter_first() to
achieve the same effect.

We ask for the first interval in the tree that covers the range of the
given interval with extent_iter_first().  If nothing is returned, then
nothing in the tree overlaps the interval and interval_is_overlapped()
would return false.
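
The equivalence described above can be illustrated with a small stand-alone sketch. A linear scan over an array stands in for the tree lookup here, so this shows only the logic (an overlap exists exactly when a "first overlapping extent" query returns something); the struct and function names are hypothetical and this is not the kernel tree API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Closed interval [start, end], as used for extent locks. */
struct extent_sketch {
	uint64_t start;
	uint64_t end;
};

/* Stand-in for a "first extent overlapping [start, end]" query. A real
 * tree answers this in logarithmic time; the result is the same. */
static const struct extent_sketch *
first_overlap(const struct extent_sketch *v, size_t n,
	      uint64_t start, uint64_t end)
{
	size_t i;

	assert(start <= end);
	for (i = 0; i < n; i++) {
		/* Two closed intervals overlap iff each starts no later
		 * than the other ends. */
		if (v[i].start <= end && start <= v[i].end)
			return &v[i];
	}
	return NULL;
}

/* The overlap predicate is just "did the query return anything?". */
static bool is_overlapped(const struct extent_sketch *v, size_t n,
			  uint64_t start, uint64_t end)
{
	return first_overlap(v, n, start, end) != NULL;
}
```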

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: Ie28c6fb0d40d2c92c7067c7a79f48ee1fc633ce9
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/41792
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
4 weeks ago LU-11085 ldlm: move interval_insert call from ldlm_lock to ldlm_extent 21/34021/18
NeilBrown [Fri, 9 Aug 2019 17:10:03 +0000 (13:10 -0400)]
LU-11085 ldlm: move interval_insert call from ldlm_lock to ldlm_extent

Moving this call results in all interval-tree handling code
being in the one file. This will simplify conversion to
use Linux interval trees.

The addition of 'struct cb' is a little ugly, but will be gone
in a subsequent patch.

Change-Id: I7b392cc57b69969f4bb3c4b51fa406ed643a37b3
Signed-off-by: NeilBrown <neilb@suse.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/34021
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
4 weeks ago LU-17865 osc: fiemap deadlock fix 63/55163/3
Alexander Zarochentsev [Mon, 20 May 2024 18:33:18 +0000 (18:33 +0000)]
LU-17865 osc: fiemap deadlock fix

A fiemap call may deadlock by wrongly requesting an ldlm lock at the
server while the same lock is cached and pinned at the client. Two PR
lock requests are compatible, so the deadlock also needs a concurrent
write lock.

ll_fiemap_info_key is shared between osc_object_fiemap()
calls; once the OBD_FL_SRVLOCK flag is set, it is reused for
all subsequent RPCs regardless of the local lock caching status.

HPE-bug-id: LUS-12353
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: I6e76bc5e4549ed887b8f6177432acf90f9ec614d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55163
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks ago LU-6142 socklnd: SPDX for sockets LND 14/55114/2
Timothy Day [Wed, 15 May 2024 03:51:51 +0000 (03:51 +0000)]
LU-6142 socklnd: SPDX for sockets LND

Convert from verbose license text to SPDX.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Ifb655ba3ad59fb467e288916e4229968450e9788
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55114
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks agoLU-17851 ldiskfs: restart long fallocate tx 11/55111/3
Alexander Zarochentsev [Mon, 29 Apr 2024 17:37:34 +0000 (17:37 +0000)]
LU-17851 ldiskfs: restart long fallocate tx

__ext4_journal_ensure_credits() may allow a long fs operation
like fallocate to run for too long if the initial credits
estimate is high enough.
The fix is to force a tx restart if the tx state is not T_RUNNING.

HPE-bug-id: LUS-12311
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: Ib03d78739997caa6d13690b41ef7d01609a3623b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55111
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
4 weeks agoLU-16938 utils: setstripe overstripe multiple OST count 92/54192/13
Rajeev Mishra [Fri, 9 Feb 2024 16:49:45 +0000 (16:49 +0000)]
LU-16938 utils: setstripe overstripe multiple OST count

Add an option to "lfs setstripe -C" to specify stripe counts
that are a multiple of the number of OSTs in the filesystem.
Using "-C -1" will create one stripe on all (available) OSTs,
as with "-c -1", to avoid too many stripes.  Using "-C -2"
will create two stripes on each OST, etc.

The maximum multiplier is currently "-C -32", which will
create 32 stripes per OST. It is still possible to specify
a large positive stripe count directly to "-C" for testing
purposes and to maintain compatibility with current usage.
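
As a rough illustration of the rule described above, the following Python
sketch resolves a "-C" argument to a total stripe count. The function name
overstripe_count() and the ost_count parameter are hypothetical and are not
taken from the lfs source; only the multiplier limit of 32 comes from this
message.

```python
MAX_OVERSTRIPE_MULTIPLIER = 32  # "-C -32" is the largest multiplier named above


def overstripe_count(arg: int, ost_count: int) -> int:
    """Resolve a "-C" argument to a total stripe count (illustrative only)."""
    if arg >= 0:
        # Assumption: non-negative values are taken as a literal stripe count.
        return arg
    multiplier = -arg
    if multiplier > MAX_OVERSTRIPE_MULTIPLIER:
        raise ValueError(
            f"multiplier {multiplier} exceeds {MAX_OVERSTRIPE_MULTIPLIER}")
    # A negative value means "this many stripes on each available OST".
    return multiplier * ost_count


if __name__ == "__main__":
    for arg in (-1, -2, -32, 2000):
        print(arg, "->", overstripe_count(arg, ost_count=8))
```

With 8 available OSTs, "-C -2" would resolve to 16 stripes under these
assumptions, while a positive count is passed through unchanged.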

HPE-bug-id: LUS-11793
Signed-off-by: Rajeev Mishra <rajeevm@hpe.com>
Change-Id: Ib0462d7a9b71853419ea7c30741bb35d576f0d71
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54192
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <patrick.farrell@oracle.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks agoLU-13802 llite: add hybrid IO switch proc stats 96/52596/28
Patrick Farrell [Wed, 13 Mar 2024 14:50:40 +0000 (10:50 -0400)]
LU-13802 llite: add hybrid IO switch proc stats

Hybrid IO switching proc stats are useful for telling us if
and why we switched to DIO.  They're also helpful for
writing tests.

Test-Parameters: trivial
Signed-off-by: Patrick Farrell <patrick.farrell@oracle.com>
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I68649474cf11ffc445574fcca105a81fd6ecd458
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52596
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks agoLU-13802 llite: add read & write switch thresholds 95/52595/35
Patrick Farrell [Mon, 1 Apr 2024 15:30:29 +0000 (11:30 -0400)]
LU-13802 llite: add read & write switch thresholds

The main criterion for switching from buffered IO to
hybrid is IO size.  This adds that switching.  The correct
size for cutover is not the same for read and write, so we
have separate checks for read and write.

These checks are elaborated on in further patches, adding
different thresholds based on the backing storage type.

Adding the switching thresholds is what really enables
hybrid IO, so we have to adjust a number of tests which
assume buffered IO.

There are a few obscure hang bugs which have been difficult
to track down, and we are past feature freeze, so this patch
now leaves hybrid IO disabled by default.
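
A minimal sketch of the size-based cutover, assuming that IOs at or above
a threshold are the ones switched to hybrid. The threshold values, the
function name use_hybrid_io() and its parameters are hypothetical; this
message does not give the real values.

```python
# Hypothetical cutover sizes; the real values are not stated in this message.
READ_THRESHOLD = 8 * 1024 * 1024
WRITE_THRESHOLD = 2 * 1024 * 1024


def use_hybrid_io(io_size: int, is_write: bool,
                  hybrid_enabled: bool = False) -> bool:
    """Decide whether a buffered IO should be switched to hybrid IO."""
    if not hybrid_enabled:
        # Hybrid IO is left disabled by default, so nothing is switched.
        return False
    # Reads and writes use separate cutover sizes.
    threshold = WRITE_THRESHOLD if is_write else READ_THRESHOLD
    return io_size >= threshold


if __name__ == "__main__":
    four_mib = 4 * 1024 * 1024
    print(use_hybrid_io(four_mib, is_write=True, hybrid_enabled=True))
    print(use_hybrid_io(four_mib, is_write=False, hybrid_enabled=True))
    print(use_hybrid_io(four_mib, is_write=True))
```

Keeping the two thresholds as separate constants mirrors the separate read
and write checks described above.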

Signed-off-by: Patrick Farrell <patrick.farrell@oracle.com>
Change-Id: I491cd7b2bdafe8bb2c1a4d692442a62154324bec
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52595
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
4 weeks agoLU-17525 tests: fix sanity hash 2.15 interop 76/55076/5
Andreas Dilger [Sat, 11 May 2024 05:38:29 +0000 (23:38 -0600)]
LU-17525 tests: fix sanity hash 2.15 interop

Fix test version checks for interop testing for DNE directory hash
usage in sanity with 2.15 servers.  The checks incorrectly assumed
that the CRUSH2 dir hash was included in the 2.15.0 release, but it
was not backported to that branch, and only landed in 2.15.51.
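
The corrected check amounts to comparing the server version against
2.15.51 rather than 2.15.0. A small Python sketch of that comparison
follows; parse_version() and server_has_crush2() are hypothetical names
used for illustration, not the helpers used by the test scripts.

```python
CRUSH2_MIN_VERSION = (2, 15, 51)  # first version with CRUSH2, per the text above


def parse_version(text: str) -> tuple:
    """Turn a dotted version string such as "2.15.4" into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def server_has_crush2(server_version: str) -> bool:
    """Return True if the server is new enough to know the CRUSH2 dir hash."""
    # Tuples compare field by field, so (2, 15, 4) sorts before (2, 15, 51).
    return parse_version(server_version) >= CRUSH2_MIN_VERSION


if __name__ == "__main__":
    for version in ("2.15.0", "2.15.4", "2.15.51", "2.16.0"):
        print(version, server_has_crush2(version))
```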

Exclude UDIO interop failures, which are fixed via LU-17525.

Fixes: 1ac4b9598a ("LU-15720 dne: add crush2 hash type")
Test-Parameters: trivial testlist=sanity serverversion=2.15.4 serverdistro=el8.9 env=SANITY_EXCEPT="56 119 398"
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: If2097ebc30c7c4dbce88af7774ce3c0e8fb3cb75
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55076
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Patrick Farrell <patrick.farrell@oracle.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
4 weeks agoLU-17783 statahead: disable batch statahead for old server 17/55017/4
Qian Yingjin [Mon, 6 May 2024 08:16:19 +0000 (04:16 -0400)]
LU-17783 statahead: disable batch statahead for old server

Disable the batch statahead for the old server that does not
support MDS_BATCH batch RPC.

Fixes: 4435d0121f ("LU-14139 statahead: batched statahead processing")
Test-Parameters: testlist=sanity serverjob=lustre-b_es6_0 serverbuildno=638 clientdistro=el9.3 serverdistro=el8.8 env=ONLY=123
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I79fba4204e0ed44e2bc9a4c4f2758d087f0e406b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55017
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
4 weeks agoLU-17867 ko2iblnd: gcc bug work around 72/55172/4
James Simmons [Wed, 22 May 2024 14:53:24 +0000 (10:53 -0400)]
LU-17867 ko2iblnd: gcc bug work around

Gcc 11 reports
 error: array subscript 'struct sockaddr_in6[0]' is partly
 outside array bounds of 'struct sockaddr[1]'

due to a gcc bug in which the compiler is confused by the union.
To work around this, we move from struct sockaddr to
struct sockaddr_storage.

Test-Parameters: trivial
Change-Id: I586042d6e3c59be8c63e2821659cf9d3bcdac8e3
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55172
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
5 weeks agoLU-17662 osd-zfs: Support for ZFS 2.2.3 30/54530/9
Shaun Tancheff [Mon, 6 May 2024 03:06:31 +0000 (10:06 +0700)]
LU-17662 osd-zfs: Support for ZFS 2.2.3

ZFS commit zfs-2.2.99-269-g9b1677fb5
   dmu: Allow buffer fills to fail
Adds a boolean_t to dmu_buf_will_fill() and dmu_buf_fill_done()

Lustre always uses B_FALSE for this argument.

Also re-arrange and split some configure macros so that all
the zfs and ldiskfs tests can be run in the same parallel pass.

Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I71a4723bfa8ce62ae6f270e26ab149bf98278d3f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54530
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Brian Atkinson <batkinson@lanl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-17477 tests: conf-sanity/48 with debug=0 99/53799/13
Alex Zhuravlev [Wed, 24 Jan 2024 07:52:20 +0000 (10:52 +0300)]
LU-17477 tests: conf-sanity/48 with debug=0

conf-sanity/48 takes quite a long time setting 4.5K ACLs.
debug=0 improves this significantly.

Test-Parameters: trivial testlist=conf-sanity env=ONLY=48
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ifa39b9efc80b41050a13323474dd19b865cc6273
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53799
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-16741 fid: rename ptlrpc_req_finished for component fid 94/54994/2
Arshad Hussain [Thu, 2 May 2024 11:28:21 +0000 (07:28 -0400)]
LU-16741 fid: rename ptlrpc_req_finished for component fid

Patch renames ptlrpc_req_finished to ptlrpc_req_put for
fid component

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: If5bf08719ab9be8255f1145fa7bcdfebd68da52c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54994
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <patrick.farrell@oracle.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-16741 fld: rename ptlrpc_req_finished for component fld 93/54993/2
Arshad Hussain [Thu, 2 May 2024 11:24:57 +0000 (07:24 -0400)]
LU-16741 fld: rename ptlrpc_req_finished for component fld

Patch renames ptlrpc_req_finished to ptlrpc_req_put for
fld component

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I7229ccdb4a6440700c120a5d75edd018252b0b8a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54993
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <patrick.farrell@oracle.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-16741 ldlm: rename ptlrpc_req_finished for component ldlm 92/54992/2
Arshad Hussain [Thu, 2 May 2024 11:21:02 +0000 (07:21 -0400)]
LU-16741 ldlm: rename ptlrpc_req_finished for component ldlm

Patch renames ptlrpc_req_finished to ptlrpc_req_put for
ldlm component

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I0daff368ed1b4448f236e7f8f17e1534b3db5e58
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54992
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <patrick.farrell@oracle.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-16741 lfsck: rename ptlrpc_req_finished for component lfsck 91/54991/2
Arshad Hussain [Thu, 2 May 2024 11:15:06 +0000 (07:15 -0400)]
LU-16741 lfsck: rename ptlrpc_req_finished for component lfsck

Patch renames ptlrpc_req_finished to ptlrpc_req_put for
lfsck component

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I57fa0bac6ecf03a6143ca8342d0fb753dc815d60
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54991
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <patrick.farrell@oracle.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-16741 quota: rename ptlrpc_req_finished for component quota 90/54990/2
Arshad Hussain [Thu, 2 May 2024 11:11:06 +0000 (07:11 -0400)]
LU-16741 quota: rename ptlrpc_req_finished for component quota

Patch renames ptlrpc_req_finished to ptlrpc_req_put for
quota component

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I7e671d68be8c0209a7439dc9762b5b10039aa0a3
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54990
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <patrick.farrell@oracle.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-16741 mgc: rename ptlrpc_req_finished for component mgc 89/54989/2
Arshad Hussain [Thu, 2 May 2024 11:07:12 +0000 (07:07 -0400)]
LU-16741 mgc: rename ptlrpc_req_finished for component mgc

Patch renames ptlrpc_req_finished to ptlrpc_req_put for
mgc component

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I7b7fac8b3cfc30b6b6e92f68018b494d24390a7c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54989
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-16741 ptlrpc: rename ptlrpc_req_finished for component ptlrpc 88/54988/2
Arshad Hussain [Thu, 2 May 2024 10:57:31 +0000 (06:57 -0400)]
LU-16741 ptlrpc: rename ptlrpc_req_finished for component ptlrpc

Patch renames ptlrpc_req_finished to ptlrpc_req_put for
ptlrpc component

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ic41d76ace564132a369288676398bc881048f851
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54988
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <patrick.farrell@oracle.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-16741 mdc: rename ptlrpc_req_finished for component mdc 87/54987/2
Arshad Hussain [Thu, 2 May 2024 10:49:26 +0000 (06:49 -0400)]
LU-16741 mdc: rename ptlrpc_req_finished for component mdc

Patch renames ptlrpc_req_finished to ptlrpc_req_put for
mdc component

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I46de8facbafcabbeb5c12daefcc5172f6c9bafd5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54987
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <patrick.farrell@oracle.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-16741 osp: rename ptlrpc_req_finished for component osp 86/54986/2
Arshad Hussain [Thu, 2 May 2024 10:40:02 +0000 (06:40 -0400)]
LU-16741 osp: rename ptlrpc_req_finished for component osp

Patch renames ptlrpc_req_finished to ptlrpc_req_put for
osp component

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I0da0f922be2a062459c14585f910ef2a6c425b14
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54986
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <patrick.farrell@oracle.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-17797 lnet: avoid use after free of lnet ifaces 75/54975/2
Shaun Tancheff [Wed, 1 May 2024 04:39:26 +0000 (11:39 +0700)]
LU-17797 lnet: avoid use after free of lnet ifaces

During inet4 / inet6 enumeration the array of nids can be
reallocated or freed.

When the array is freed the originating reference should be
nulled to avoid a possible use after free.

CoverityID: 425360 ("USE_AFTER_FREE")

Test-Parameters: trivial
Fixes: ab6c8bd18 ("LU-16822 lnet: always initialize IPv6 at start up")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ifd751e0c2f0095b33f8b2cd8dd58cfd8572c5ff4
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54975
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 weeks agoLU-17795 lnet: unused return code in lnet_peer_data_present 71/54971/2
Serguei Smirnov [Tue, 30 Apr 2024 17:55:29 +0000 (10:55 -0700)]
LU-17795 lnet: unused return code in lnet_peer_data_present

Coverity check detected an issue with the return code from the call to
lnet_peer_set_primary_nid() in the code added by LU-17379 patch.
Fix it.

Test-Parameters: trivial testlist=sanity-lnet
Fixes: ae6d37 ("LU-17379 lnet: parallelize peer discovery via LNetAddPeer")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I8b9df330200ff2732efd2a54d8de910463993fae
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54971
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-17788 ptlrpc: restore watchdog revival message 42/54942/12
Andreas Dilger [Sat, 27 Apr 2024 02:48:15 +0000 (20:48 -0600)]
LU-17788 ptlrpc: restore watchdog revival message

Restore the "Service thread pid NNN completed after SSS.mmm
seconds.  This likely indicates the system was overloaded"
message that was lost during ptlrpc watchdog restructuring.

Do not rate limit this message, so that it is possible to see
when all threads are restored, even if their corresponding
"Service thread pid NNN was inactive" message was throttled.

Update recovery-small test_10a to check for these messages,
so that they are not removed again in the future.

Test-Parameters: testlist=recovery-small env=ONLY=10a
Test-Parameters: testlist=recovery-small env=ONLY=10a
Test-Parameters: testlist=recovery-small env=ONLY=10a
Test-Parameters: testlist=recovery-small env=ONLY=10a
Test-Parameters: testlist=recovery-small env=ONLY=10a
Fixes: fc9de679a4 ("LU-9859 libcfs: add watchdog for ptlrpc service threads.")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I0c7e96fb7f73ca5562a6f5ad780a79ffc83ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54942
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
5 weeks agoLU-17786 tests: use $TSTUSR instead of hard coding quota_usr 40/54940/2
James Simmons [Fri, 26 Apr 2024 22:26:46 +0000 (18:26 -0400)]
LU-17786 tests: use $TSTUSR instead of hard coding quota_usr

The bash function check_system_is_clean() hard codes the user.
On many external systems we can't create special users for
security reasons, so use $TSTUSR instead, which may already exist.

Change-Id: I80d522f04bc813cd6d5aef000eeeb34d6ec81ebd
Fixes: 7e1fb1a296e ("LU-17179 tests: check the system is clean")
Test-Parameters: trivial testlist=sanity-quota
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54940
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-17504 build: fix lock_handle array-index-out-of-bounds 26/54926/5
Andreas Dilger [Sat, 27 Apr 2024 01:13:52 +0000 (18:13 -0700)]
LU-17504 build: fix lock_handle array-index-out-of-bounds

After Linux kernel patch "ubsan: Tighten UBSAN_BOUNDS on GCC"
(commit v6.4-rc2-1-g2d47c6956ab3), flexible trailing arrays
declared like 'lock_handle[2]' will generate warnings when
CONFIG_UBSAN & co. is enabled:

    UBSAN: array-index-out-of-bounds in ldlm_request.c:1282:18
    index 2 is out of range for type 'lustre_handle [2]'

The declaration lock_handle[LDLM_LOCKREQ_HANDLES] confuses the
compiler into thinking there are only two fields in lock_handle,
but the caller often allocates extra fields beyond this for more
locks to be cancelled due to Early Lock Cancellation or from LRU.

Rather than have a second flexible array after lustre_handle[2],
declare the whole array as flexible, and fix up the few sites
that are allocating this array to ensure LDLM_LOCKREQ_HANDLES
fields are allocated at a minimum.

This subtly changes the checks in wiretest.c due to the removal
of the 2 "base" handles in ldlm_request, but I believe this is not
changing the wire protocol because it still allocates those handles
directly, and I have verified interoperability with a 2.14.0 server.

Test-Parameters: testlist=runtests clientversion=2.14
Test-Parameters: testlist=runtests serverversion=2.14
Test-Parameters: testlist=runtests clientversion=2.15
Test-Parameters: testlist=runtests serverversion=2.15
Test-Parameters: testlist=runtests clientversion=EXA5
Test-Parameters: testlist=runtests serverversion=EXA5
Test-Parameters: testlist=runtests clientversion=EXA6
Test-Parameters: testlist=runtests serverversion=EXA6
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I9695fb44f1b5c84bb750d2983cdd8b939e3ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54926
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-17784 build: improve wiretest for flexible arrays 29/54929/2
Shaun Tancheff [Fri, 26 Apr 2024 11:24:34 +0000 (18:24 +0700)]
LU-17784 build: improve wiretest for flexible arrays

Flexible array checking can additionally probe that the size
of the array element is correct.

Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ib7de3d156a2e77dfaf2e9ab1df8fab524c073610
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54929
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-17741 gss: fix lsvcgss service for systemd 15/54915/3
Sebastien Buisson [Thu, 25 Apr 2024 16:42:44 +0000 (18:42 +0200)]
LU-17741 gss: fix lsvcgss service for systemd

Add a systemd unit file for lsvcgss service, so that the lsvcgssd
daemon can be handled correctly via systemctl.

Test-Parameters: trivial
Test-Parameters: kerberos=true testlist=sanity-krb5 clientdistro=el9.3 serverdistro=el9.3
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I7581996e1e28567415da0827681841ac228ad6c5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54915
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-17774 build: pass systemdsystemunitdir to "make debs" 02/54902/3
Jian Yu [Fri, 26 Apr 2024 17:10:03 +0000 (10:10 -0700)]
LU-17774 build: pass systemdsystemunitdir to "make debs"

This patch passes "--with-systemdsystemunitdir" configure
option to the configure command performed in "make debs".
It also updates debian/lustre-{client,server}-utils.install
with the detected/specified directory for systemd service files.

Test-Parameters: trivial clientdistro=ubuntu2204

Signed-off-by: Jian Yu <yujian@whamcloud.com>
Change-Id: I7c36904ea0ed0f393a76b0fb0ad444b330dfa78c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54902
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-17767 build: struct lsmcontext has slot or id member 81/54881/3
Sebastien Buisson [Tue, 23 Apr 2024 17:48:32 +0000 (10:48 -0700)]
LU-17767 build: struct lsmcontext has slot or id member

With Ubuntu 24.04 kernel 6.8.0-31-generic, the struct lsmcontext uses
a field named 'id' to identify the LSM module, instead of 'slot' in
previous kernel versions.

Fixes: 0e66489401 ("LU-16619 build: Ubuntu jammy 5.19 client support")
Test-Parameters: trivial
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I5080e60614b42ed63103f93cae1f481851742d0b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54881
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-17769 tests: run_one() repeats subtests for set duration 69/54869/6
Charlie Olmstead [Mon, 22 Apr 2024 16:37:12 +0000 (10:37 -0600)]
LU-17769 tests: run_one() repeats subtests for set duration

Implement ONLY_MINUTES=M environment variable to allow test runners
to execute a subtest for at least M minutes. Each time the subtest
completes, the duration is checked to see if it has exceeded
ONLY_MINUTES; therefore the parameter represents a minimum number
of minutes to run rather than an exact duration.

If, for some reason, both ONLY_REPEAT and ONLY_MINUTES are set,
the ONLY_REPEAT value takes precedence.
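
The repeat logic described above can be sketched in Python as follows.
This is not the run_one() shell code; run_one_repeated() and its clock
parameter are hypothetical and exist only to show the order of the checks.

```python
import time


def run_one_repeated(subtest, only_repeat=None, only_minutes=None,
                     clock=time.monotonic):
    """Run subtest repeatedly and return how many times it ran."""
    if only_repeat is not None:
        # ONLY_REPEAT takes precedence when both values are set.
        for _ in range(only_repeat):
            subtest()
        return only_repeat

    runs = 0
    start = clock()
    while True:
        subtest()
        runs += 1
        # The duration is only checked after a run completes, so
        # only_minutes is a minimum rather than an exact duration.
        if only_minutes is None or clock() - start >= only_minutes * 60:
            return runs


if __name__ == "__main__":
    print(run_one_repeated(lambda: None))
    print(run_one_repeated(lambda: None, only_repeat=3, only_minutes=10))
```

Because the elapsed time is checked only between runs, a subtest that takes
90 seconds would overshoot a 5 minute limit by up to one run.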

Test-Parameters: trivial testlist=sanity env=ONLY=73
Test-Parameters: testlist=sanity env=ONLY=73,ONLY_REPEAT=10
Test-Parameters: testlist=sanity env=ONLY=73,ONLY_MINUTES=5
Test-Parameters: testlist=sanity env=ONLY=73,ONLY_REPEAT=100,ONLY_MINUTES=10
Test-Parameters: testlist=sanity env=ONLY=73,ONLY_REPEAT=10,ONLY_MINUTES=10
Signed-off-by: Charlie Olmstead <charlie@whamcloud.com>
Change-Id: I4b454fd8582d2b875762ee15451150afb3117d15
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54869
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 weeks agoLU-17000 misc: fix strscpy() Coverity warnings 65/54865/6
Arshad Hussain [Mon, 22 Apr 2024 09:25:50 +0000 (14:55 +0530)]
LU-17000 misc: fix strscpy() Coverity warnings

Fix warning reported for use of an uninitialized variable.

CoverityID: 425254 ("Uninitialized scalar variable")

Fix warning reported when changing call from strlcpy()
to strscpy()

CoverityID: 425253 ("Unsigned compared against 0")
CoverityID: 425262 ("Unsigned compared against 0")
Fixes: 7a0517fa2 ("LU-17592 build: kernel 6.8 removed strlcpy()")

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Id3804c77a105e4776a0242db787dc1ca2528d9ca
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54865
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-17761 tests: make sanity-compr sanity/sanityn return 0 55/54855/2
Jian Yu [Fri, 19 Apr 2024 18:54:04 +0000 (11:54 -0700)]
LU-17761 tests: make sanity-compr sanity/sanityn return 0

While running sanity-compr sanity/sanityn, if there was a
sub-subtest failure, the sanity/sanityn test_cleanup would
be incorrectly marked as FAIL.

We should leave it to the individual sanity/sanityn subtests
to mark their failures; test_sanity() and test_sanityn()
should not also return an error.

Change-Id: I1fd645b80b92e583f1a564f85e6d2d6d871b8fa8
Test-Parameters: trivial testlist=sanity-compr
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54855
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 weeks agoLU-14391 lnet: optimize the Netlink packet size for routes 44/54844/12
James Simmons [Fri, 26 Apr 2024 17:15:02 +0000 (13:15 -0400)]
LU-14391 lnet: optimize the Netlink packet size for routes

Currently Netlink by default sets its maximum packet size
to send back to user land to 64K. Some sites set up many
routes, more than ~430, which exceeds this limit. We can avoid
this limitation by calculating the actual size of
the Netlink packet and setting cb->min_dump_alloc. The
new max is then 4GB, which should be plenty (27K of routes).

Test-Parameters: trivial testlist=sanity-lnet
Change-Id: Ica01f0cf290992a5d27b8ac2d09508d0a6e8151a
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54844
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-17455 scripts: add IPv6 support to ksocklnd-config 33/54833/4
Serguei Smirnov [Wed, 17 Apr 2024 21:15:22 +0000 (14:15 -0700)]
LU-17455 scripts: add IPv6 support to ksocklnd-config

Expand ksocklnd-config script to support IPv6.
For every interface listed as an argument, check if an IPv6
address is configured and set up routing accordingly.
The change replicates existing behavior for IPv4:
   - if existing route is found for the interface,
     or skip_mr_routing is enabled, the script skips
     adding a new route and prints a warning
   - if default gateway is found on the same subnet,
     a source-based rule and route are added for the
     IP/interface using the gateway
   - if default gateway is not found, a source-based rule
     and a local route are added for the IP/interface
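
The three cases above reduce to a small decision function. The Python
sketch below only models the decision; plan_routing() and the returned
action strings are hypothetical and do not correspond to commands issued
by the ksocklnd-config script.

```python
def plan_routing(has_existing_route: bool, skip_mr_routing: bool,
                 gateway_on_subnet: bool) -> list:
    """Return the actions to take for one IP/interface pair."""
    if has_existing_route or skip_mr_routing:
        # Nothing is added; the script only prints a warning.
        return ["warn: skipping route setup"]
    if gateway_on_subnet:
        # A default gateway exists on the same subnet: route through it.
        return ["add source-based rule", "add route via default gateway"]
    # No default gateway found: fall back to a local route.
    return ["add source-based rule", "add local route"]


if __name__ == "__main__":
    print(plan_routing(True, False, True))
    print(plan_routing(False, False, True))
    print(plan_routing(False, False, False))
```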

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I69e249f2858a201f1b108afa05cce9fdf4ee8c80
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54833
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-14535 utils: fix FORWARD_NULL issue from Coverity 27/54827/4
Hongchao Zhang [Sun, 14 Apr 2024 23:13:57 +0000 (07:13 +0800)]
LU-14535 utils: fix FORWARD_NULL issue from Coverity

Fix the possible NULL pointer issue reported by Coverity:

   case 'e':
CID 424708:    (FORWARD_NULL)
Passing null pointer "optarg" to "strtoul", which dereferences it.
      end_qid = strtoul(optarg, NULL, 0);
      break;

CoverityID: 424708 ("FORWARD NULL")

Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: Idfb5cb4c6fe63ec08dd9048742f3f280b125eb8a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54827
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 weeks agoLU-17625 statahead: avoid to use @sai after its has been freed 26/54826/3
Qian Yingjin [Wed, 17 Apr 2024 08:22:02 +0000 (04:22 -0400)]
LU-17625 statahead: avoid to use @sai after its has been freed

There is a race between a statahead thread starting up and another
statahead request trying to access the same statahead structure.
If the statahead thread startup fails, it frees the statahead
structure too early. The user stat() request then uses the
statahead structure after its memory has already been freed.

In this patch, we replace @ll_sai_free/@ll_sax_free with
@ll_sai_put/@ll_sax_put to avoid freeing the statahead structure
too early while it is still being used by a user stat()
request.
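
The difference between freeing directly and dropping a reference can
be sketched as follows. This is an illustration only: demo_sai and its
helpers are hypothetical stand-ins, not the ll_sai_*/ll_sax_* code, and
a plain int is used where kernel code would need an atomic counter:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a shared statahead-style structure. */
struct demo_sai {
	int refcount;		/* number of active users */
};

static int demo_sai_freed;	/* counts objects actually released */

static struct demo_sai *demo_sai_alloc(void)
{
	struct demo_sai *sai = calloc(1, sizeof(*sai));

	if (sai != NULL)
		sai->refcount = 1;	/* reference held by the creator */
	return sai;
}

/* Take an extra reference on behalf of another user. */
static void demo_sai_get(struct demo_sai *sai)
{
	assert(sai->refcount > 0);
	sai->refcount++;
}

/* Drop one reference; the memory is released only by the last user. */
static void demo_sai_put(struct demo_sai *sai)
{
	assert(sai->refcount > 0);
	if (--sai->refcount == 0) {
		free(sai);
		demo_sai_freed++;
	}
}
```

With this scheme a failed startup path calls demo_sai_put() instead of
free(), so a concurrent user that already holds a reference keeps the
structure alive until it drops its own reference.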

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I3840be959160aed2887a91be81da05f796306cd9
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54826
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-17734 build: Debian: oblige --disable-tests if asked 64/54764/4
Ellis Wilson [Fri, 15 Oct 2021 20:23:25 +0000 (16:23 -0400)]
LU-17734 build: Debian: oblige --disable-tests if asked

Do not disable tests by default for debian-based builds, but permit
users to disable them if they choose by passing in --disable-tests.

Test-Parameters: trivial
Signed-off-by: Ellis Wilson <elliswilson@microsoft.com>
Change-Id: I90088e6e95fa9e46ae063dfc061a324293fde9a2
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54764
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 weeks agoLU-17714 gss: protect against revoked session keyring 06/54706/5
Sebastien Buisson [Mon, 8 Apr 2024 15:52:50 +0000 (17:52 +0200)]
LU-17714 gss: protect against revoked session keyring

In case the session keyring is revoked, request_key() still tries to
search it. Sadly this keyring is searched before the user keyring, so
it will return -EKEYREVOKED, and the user keyring, that does contain
the Lustre key, will not even be searched.
To work around this issue in the kernel implementation of request_key,
override the current process's credentials with no session keyring,
if we detect it has been revoked.

Test-Parameters: kerberos=true testlist=sanity-krb5 serverdistro=el8.9
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I64b6ac4693a47cf43d6fa1bf4e17bfb4907670fa
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54706
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-17714 gss: cleanup user keyring usage 92/54692/8
Sebastien Buisson [Mon, 8 Apr 2024 09:06:50 +0000 (11:06 +0200)]
LU-17714 gss: cleanup user keyring usage

User keys are linked to the user keyring. But we should not keep an
extra reference on the user keyring for every user key being created.
This leads to too many references on this keyring, and prevents proper
destroy in case the system wants to clean it up (because the user
logged off for instance).
And when unlinking a user key, we need to take care of the user
namespace, in order to fetch the real user keyring, and not the one
associated with the mapped uid in the user namespace.
Finally we must handle the case where the user key is explicitly
revoked via 'keyctl revoke' on the command line, by carrying out the
same cleanup as when 'lfs flushctx' is called. This properly drops
references on the key, and frees the security context associated with
the key.

Test-Parameters: kerberos=true testlist=sanity-krb5 serverdistro=el8.9
Fixes: 02b456e4a4 ("LU-17173 gss: user keys go to user keyring")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ic168b68f8652689aa4402eaa4fcdbd852743d320
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54692
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Bruno Faccini <bfaccini@nvidia.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-17704 revert: "LU-17379 ptlrpc: fix check for callback discard" 86/54686/4
Andreas Dilger [Fri, 5 Apr 2024 22:42:48 +0000 (22:42 +0000)]
LU-17704 revert: "LU-17379 ptlrpc: fix check for callback discard"

This reverts commit a6886dba0ed8a622c9831cd33d310d933492c72d.
This is failing dbench intermittently in sanity-benchmark.

Change-Id: Id3720c79ca8dd9276e086aab5d3fcfe43ddd680a
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54686
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Stephane Thiell <sthiell@stanford.edu>
5 weeks agoLU-17657 build: gcc 13 stricter enum checking 68/54468/6
Shaun Tancheff [Fri, 26 Apr 2024 15:25:19 +0000 (22:25 +0700)]
LU-17657 build: gcc 13 stricter enum checking

gcc 13 does not allow mixing of enum and integer
types between function declaration and implementation.

Clean up a couple of instances where an enum is treated
as a uint32_t / __u32 and treat it as an enum type.

lustre/lov/lov_ea.c: In function 'lsme_unpack_comp':
lustre/lov/lov_ea.c:531:21: error: array subscript
   'struct lov_stripe_md_entry[0]' is partly outside array bounds
    of 'struct lov_stripe_md_entry[0]' [-Werror=array-bounds=]
  531 |                 lsme->lsme_magic = magic;

Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I8e2ef989ecbdebe5e13bcea0fbb210c4a14eb45e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54468
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 weeks agoLU-17580 llite: Remove all referance of LOOKUP_CONTINUE 69/54169/7
Arshad Hussain [Sun, 25 Feb 2024 01:13:22 +0000 (06:43 +0530)]
LU-17580 llite: Remove all referance of LOOKUP_CONTINUE

In newer kernels (3.1 and beyond), the LOOKUP_CONTINUE flag is
replaced by, and the same as, the LOOKUP_PARENT flag, so any
definitions of LOOKUP_CONTINUE can safely be removed.

Linux-commit: 49084c3bb2055c401f3493c13edae14d49128ca0
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I05eac0ec1321d230c7a215f95888d4040b7c670a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54169
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: James Simmons <jsimmons@infradead.org>
5 weeks agoLU-13791 mdt: allow using symbolic capability names 18/54118/3
Andreas Dilger [Wed, 21 Feb 2024 00:59:25 +0000 (17:59 -0700)]
LU-13791 mdt: allow using symbolic capability names

Allow the "mdt.*.enable_cap_mask" param to set and print symbolic names,
similar to the "debug" and "subsystem_debug" parameters.  The
allowed parameter names are in the capabilities(7) man page, in
either upper or lowercase, like cap_chown, cap_dac_read_search,
etc. along with "all" to enable all capabilities if clients are
trusted.  For example:

    lctl set_param -P mdt.lfs-*.enable_cap_mask=+cap_dac_read_search

Since kernel_cap_t is a 64-bit value, enhance cfs_str2mask() to
take u64 mask arguments.  The calling libcfs_debug_str2mask()
sticks with "int mask" for now.

Split the core out from libcfs_debug_mask2str() into a new helper
function cfs_mask2str() so it can be called directly.
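
Parsing symbolic names into a 64-bit mask can be sketched as below.
This is not cfs_str2mask(); demo_caps and demo_cap_apply() are
hypothetical, the table is a small subset, and treating a bare name
like "+name" is an assumption made for the example. The bit numbers
are those of the corresponding CAP_* constants in the Linux
capability header:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <strings.h>

/* Illustrative subset of capability names and their bit numbers. */
static const struct {
	const char *name;
	unsigned int bit;
} demo_caps[] = {
	{ "cap_chown",           0 },
	{ "cap_dac_override",    1 },
	{ "cap_dac_read_search", 2 },
	{ "cap_fowner",          3 },
};

#define DEMO_NCAPS (sizeof(demo_caps) / sizeof(demo_caps[0]))

/*
 * Apply one token such as "+cap_chown", "-CAP_FOWNER" or "all" to a
 * 64-bit mask.  Names are matched case-insensitively; "all" selects
 * every entry in the table.  Returns 0 on success, -1 if the name is
 * unknown (the mask is left untouched in that case).
 */
static int demo_cap_apply(uint64_t *mask, const char *tok)
{
	uint64_t bits = 0;
	int clear = 0;
	size_t i;

	assert(mask != NULL && tok != NULL);

	if (*tok == '+' || *tok == '-') {
		clear = (*tok == '-');
		tok++;
	}

	for (i = 0; i < DEMO_NCAPS; i++) {
		if (strcasecmp(tok, "all") == 0 ||
		    strcasecmp(tok, demo_caps[i].name) == 0)
			bits |= UINT64_C(1) << demo_caps[i].bit;
	}
	if (bits == 0)
		return -1;

	if (clear)
		*mask &= ~bits;
	else
		*mask |= bits;
	return 0;
}
```

Using uint64_t for both the mask and the shifted bit keeps the
arithmetic correct for bit numbers of 32 and above, which an "int mask"
could not represent.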

Fixes: 54f677651b ("LU-13791 mdt: parameter to tune capabilities")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I3f71f61a17d4d3614e46a526c60e709d9eb825b3
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54118
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-17523 ldiskfs: sync series to include el8.4 92/53992/7
Shaun Tancheff [Tue, 5 Mar 2024 02:23:33 +0000 (09:23 +0700)]
LU-17523 ldiskfs: sync series to include el8.4

el8.4 .5 and .6 include:
  rhel8/ext4-deep-tree.patch
  rhel7.6/ext4-dquot-commit-speedup.patch
  rhel8/ext4-ext-merge.patch
  rhel8/ext4-mballoc-dense.patch

el8.6 include:
  rhel8/ext4-race-in-ext4-destroy-inode.patch
  rhel8/ext4-mballoc-dense.patch

el8.7 include:
  rhel8/ext4-deep-tree.patch
  rhel8/ext4-race-in-ext4-destroy-inode.patch
  rhel8/ext4-mballoc-dense.patch

el8.8 and .9 include:
  rhel8/ext4-limit-per-inode-preallocation-list.patch

el8.9 include:
  rhel8/ext4-race-in-ext4-destroy-inode.patch
  rhel8/ext4-mballoc-dense.patch

Test-Parameters: trivial fstype=ldiskfs clientdistro=el8.9 serverdistro=el8.9 testlist=sanity
Test-Parameters: trivial fstype=ldiskfs clientdistro=el8.8 serverdistro=el8.8 testlist=sanity
Test-Parameters: optional fstype=ldiskfs clientdistro=el8.8 serverdistro=el8.7 testlist=sanity
Test-Parameters: optional fstype=ldiskfs clientdistro=el8.8 serverdistro=el8.6 testlist=sanity
Test-Parameters: optional fstype=ldiskfs clientdistro=el8.8 serverdistro=el8.5 testlist=sanity
Test-Parameters: optional fstype=ldiskfs clientdistro=el8.8 serverdistro=el8.4 testlist=sanity
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I2f5515947a16dff7f2502ec281675f56b2470ea7
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53992
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-17483 gss: refresh req context with already existing one 59/53859/8
Sebastien Buisson [Tue, 30 Jan 2024 12:13:52 +0000 (13:13 +0100)]
LU-17483 gss: refresh req context with already existing one

When we are processing a request with a root GSS context that
has the PTLRPC_CTX_ERROR_BIT bit set, try to replace it with an
already existing context. Such a context can already be up-to-date
thanks to other authentication requests sent to failover NIDs while
the current request was in the delay list. This valid context can be
fetched from the struct ptlrpc_sec.

Test-Parameters: kerberos=true testlist=sanity-krb5
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Iff1cf727c4579cba6456e010aac6537cf888b0ae
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53859
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-12885 mds: add enums for MDS_OPEN flags 69/36469/33
Andreas Dilger [Tue, 9 Apr 2024 08:22:07 +0000 (04:22 -0400)]
LU-12885 mds: add enums for MDS_OPEN flags

This patch is the first in a series of patches that separate
kernel open flags from MDS open flags.

The first step is to add enum mds_open_flags to the code to
make it easier to follow the logic. Rename it_flags to
it_open_flags and use enum mds_open_flags in the code so it
is clear that MDS_OPEN flags are being used.

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I933a6e6102f947a9276cb6bf03826fd4a53ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/36469
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
5 weeks agoLU-11085 ldlm: save space in struct ldlm_lock 31/53931/9
Mr NeilBrown [Mon, 5 Feb 2024 22:46:49 +0000 (09:46 +1100)]
LU-11085 ldlm: save space in struct ldlm_lock

Moving the 'interval' handle into ldlm_lock has made the structure
bigger.  Compensate for this by sharing space between fields that are
only needed for specific lock types.

i.e.  some fields are only needed for EXTENT locks, some for FLOCK
locks, some for PLAIN and IBITS which use "skiplists".

On x86_64 this reduces the size of ldlm_lock to what it was before the
previous patch.  A future patch will reduce it even more.

As extent and flock both used the interval tree node, they now have
different instances.  So the names in flock are changed.  Both of
these will disappear in future patches.
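
The space saving comes from overlapping fields that are never needed
at the same time. A minimal sketch of the idea follows; the demo_*
structures and their members are hypothetical and do not reflect the
real layout of struct ldlm_lock:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-type field groups. */
struct demo_extent_fields {
	unsigned long start, end;
	void *tree_node[3];
};

struct demo_flock_fields {
	unsigned long owner;
	int pid;
	void *tree_node[3];
};

struct demo_skiplist_fields {
	void *mode_link[2];
	void *policy_link[2];
};

/* Every lock carries every type-specific field. */
struct demo_lock_flat {
	int type;
	struct demo_extent_fields ext;
	struct demo_flock_fields flock;
	struct demo_skiplist_fields skip;
};

/* Type-specific fields overlap; 'type' says which member is valid. */
struct demo_lock_shared {
	int type;
	union {
		struct demo_extent_fields ext;
		struct demo_flock_fields flock;
		struct demo_skiplist_fields skip;
	} u;
};

/* The union costs only as much as its largest member. */
static_assert(sizeof(struct demo_lock_shared) <
	      sizeof(struct demo_lock_flat),
	      "sharing space should shrink the structure");
```

The trade-off is that code must consult the lock type before touching
a member of the union, since only one of them is meaningful at a time.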

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Iec92a41c174e4884852ebf8fbb2cd50d4e165035
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53931
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
5 weeks agoLU-11085 ldlm: simplify use of interval-tree. 21/33221/28
NeilBrown [Wed, 25 Mar 2020 02:50:16 +0000 (22:50 -0400)]
LU-11085 ldlm: simplify use of interval-tree.

The interval tree used for keeping track of extent locks is currently
separate from those locks themselves.  A separate 'ldlm_interval'
structure is allocated and linked to all locks which have the same
extent.

This requires that the interval tree library handles an insert where
exactly the same interval already exists differently from any other
insert.  No other user of the interval tree library wants this, and
the library which is part of Linux doesn't support it.  So it would be
good to remove this requirement.

This patch changes the library, removes the 'ldlm_interval' structure,
and stores each lock in the tree.  This substantially simplifies a lot
of code, but has some costs.

The ldlm_lock is now larger - it contains three pointers for the
rbtree where previously it had one, and it now has an extra copy of
the range start/end.  These will be resolved in later patches by
removing duplication and sharing space with other fields that aren't
used for extent locks.

The extent-tree can now be substantially larger as it now contains
every lock for a given extent rather than each extent only once.  As
the depth of the tree grows with the log of the number of elements,
this isn't an enormous cost, but it may still be measurable.  In
particular, locks that cover the full extent [0..MAX] are common and
can swamp other locks (citation needed).  Such locks can be easily
kept in a separate list.  This will restore some of the code
complexity, but is otherwise of little cost.

Linux-commit: 71236833ad7a98b69e6e675efefbdc04a74c1d4b

Change-Id: I6c82d971aabd02bb036ac0bd27a934d48e972895
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/33221
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
5 weeks agoLU-14810 lnet: ongoing push when discovery is stopped 84/54884/3
Cyril Bordage [Wed, 24 Apr 2024 02:21:53 +0000 (04:21 +0200)]
LU-14810 lnet: ongoing push when discovery is stopped

If a push is not completed when discovery thread is stopped, then we
still have ln_dc_handler used as md handler (from
lnet_peer_send_push). That leads to assert failure from
lnet_assert_handler_unused.

To fix that, we call lnet_assert_handler_unused only after the monitor
thread has been stopped. Thus, the patch for LU-17496 is not needed
anymore.

Fixes: 36b14a23a6 ("LU-17207 lnet: race b/w monitor thr stop and discovery push")
Test-Parameters: testlist=sanity-lnet env=ONLY="212 220",ONLY_REPEAT=100
Signed-off-by: Cyril Bordage <cbordage@whamcloud.com>
Change-Id: I426c37b12a3d29327a7295f528a5b875a9ac88a0
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54884
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-17745 llite: fix the umount panic due to BDI unregister 50/54850/4
Qian Yingjin [Fri, 19 Apr 2024 02:53:10 +0000 (22:53 -0400)]
LU-17745 llite: fix the umount panic due to BDI unregister

There is a regression in the patch for LU-16954 on the old RHEL
kernel (RHEL8.2). When Lustre is unmounted, the client
crashes.

In LU-16954, to avoid the remount failure, we explicitly
unregister the sysfs for the @bdi on new kernels such as the Ubuntu
22.04 v5.15 kernel.
However, this is not needed for old kernels such as RHEL 8.2.
In this patch, we remove the explicit unregister for the old kernel
to avoid the client crash during unmount.

Fixes: dcc1dd39a6 ("LU-16954 llite: add SB_I_CGROUPWB on super block for cgroup")
Test-Parameters: clientdistro=ubuntu2204 testlist=sanity-sec
Test-Parameters: clientdistro=el8.9 testlist=sanity-sec
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Ic6df572744bed8994c08fb1369cc9beccbe2d87a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54850
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Shuichi Ihara <sihara@ddn.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-6142 osd-zfs: Fix style issues for osd_io.c 64/54264/5
Arshad Hussain [Mon, 4 Mar 2024 07:45:23 +0000 (02:45 -0500)]
LU-6142 osd-zfs: Fix style issues for osd_io.c

This patch fixes issues reported by checkpatch
for file lustre/osd-zfs/osd_io.c

Test-Parameters: trivial fstype=zfs
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ia9153be34a1d583195e3ecfc56ca4ab279781566
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54264
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-17743 ko2iblnd: move to struct lnet_nid 71/54771/6
James Simmons [Thu, 25 Apr 2024 23:00:24 +0000 (19:00 -0400)]
LU-17743 ko2iblnd: move to struct lnet_nid

Move all non-wire data structures using lnet_nid_t to
struct lnet_nid. This is the first step to support
IPv6 / GUID.

Test-Parameters: trivial testlist=sanity-lnet
Change-Id: I9d1281a1b7ab7bda566369be2bc5f07ba3ce17f9
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54771
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 weeks agoLU-13814 osc: Remove osc delete for transient pages 79/52079/19
Patrick Farrell [Fri, 23 Feb 2024 16:16:42 +0000 (11:16 -0500)]
LU-13814 osc: Remove osc delete for transient pages

Transient pages do not need an extra reference for being
part of a transfer, because they are referenced throughout
by cl_io.  This requires a tweak to the page completion
behavior.

This allows us to remove osc_page_delete for transient
pages.

Signed-off-by: Patrick Farrell <patrick.farrell@oracle.com>
Change-Id: I96539731f972b19830b2e08bf0f1d1f1e9674241
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52079
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
5 weeks agoLU-13814 osc: specialize osc_page_delete 78/52078/20
Patrick Farrell [Fri, 23 Feb 2024 16:05:35 +0000 (11:05 -0500)]
LU-13814 osc: specialize osc_page_delete

Nearly all of osc_page_delete is only done for cacheable pages,
so make that explicit.  osc_lru_del() doesn't do anything because
transient pages can't go in the LRU.  In osc_teardown_async_page(),
the latter side of the if statement is a search in cache, so it
never finds the page, then the earlier part is a check that the
page isn't in an RPC.  That's not really possible for DIO pages
unless something is *really* off.

Signed-off-by: Patrick Farrell <patrick.farrell@oracle.com>
Change-Id: I998fc196c276aa97829f5b368e23aa4b7a797294
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52078
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
5 weeks agoLU-17524 llite: DIO and writev and readv syscalls 96/53996/19
Shaun Tancheff [Wed, 24 Apr 2024 22:24:44 +0000 (18:24 -0400)]
LU-17524 llite: DIO and writev and readv syscalls

Linux kernel v3.15-rc4-329-g62a8067a7f35
  bio_vec-backed iov_iter
Introduced iov_iter_get_pages_alloc

In kernels prior to iov_iter_get_pages_alloc, the family
of iovec iter syscalls such as readv and writev fail to
iterate over the iovec segments.

In this case the iter() handler should submit the iovec
while looping over the segments.

Linux kernel v5.19-10287-gfcb14cb1bdac
  new iov_iter flavour - ITER_UBUF

That commit introduces user_backed_iter(); this patch provides a
user_backed_iter() for older kernels.

Fixes: 0006eb3644 ("LU-16328 llite: migrate_folio, vfs_setxattr")
Fixes: 044503492c ("LU-6260 llite: add support for new iter functionality")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Idec6a956918a1744f2801ffce9b40acb2c074523
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53996
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Patrick Farrell <patrick.farrell@oracle.com>
Reviewed-by: xinliang <xinliang.liu@linaro.org>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
5 weeks agoLU-16822 tests: Update sanity-lnet router tests for IPv6 28/53728/6
Chris Horn [Thu, 25 Apr 2024 17:36:25 +0000 (13:36 -0400)]
LU-16822 tests: Update sanity-lnet router tests for IPv6

Modify sanity-lnet test cases that test routing to work with IPv6
NIDs.

test_100/102/105/106:
  - Modified to use setup_router_test() to create a real router and
    use the associated LNet configuration in their tests.
test_101/103:
  - These test cases exercise the NID range functionality. They are
    skipped under IPv6 config

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I47b23e9c63d74d937cae7c7b8b1b27dd383fc0dc
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53728
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
8 weeks agoNew tag 2.15.63 2.15.63 v2_15_63
Oleg Drokin [Thu, 2 May 2024 05:05:18 +0000 (01:05 -0400)]
New tag 2.15.63

Change-Id: I2ceb1e0afe9bd966555579b5d70bd263016884e2
Signed-off-by: Oleg Drokin <green@whamcloud.com>
8 weeks agoLU-17504 build: fix gcc-13 [-Werror=stringop-overread] error 34/54834/6
Shaun Tancheff [Thu, 25 Apr 2024 17:57:36 +0000 (00:57 +0700)]
LU-17504 build: fix gcc-13 [-Werror=stringop-overread] error

This patch fixes the following [-Werror=stringop-overread] and
[-Werror=attribute-warning] errors detected by gcc 13:

lustre/mgc/mgc_request.c:190:21: error: 'strcmp' reading 1 or
more bytes from a region of size 0 [-Werror=stringop-overread]
  190 | if (strcmp(logname, cld->cld_logname) == 0) {
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In function 'fortify_memcpy_chk',
    inlined from 'class_handle_ioctl' at
/root/lustre-release/lustre/obdclass/class_obd.c:381:3:
include/linux/fortify-string.h:528:25: error:
call to '__write_overflow_field' declared with attribute warning:
detected write beyond size of field (1st parameter);
maybe use struct_group()? [-Werror=attribute-warning]
  528 |  __write_overflow_field(p_size_field, size);
      |  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Jian Yu <yujian@whamcloud.com>
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I59f5a88b4cd64c9f4e67e568546baada371543b1
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54834
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
8 weeks agoLU-17587 build: use kernel version from dkms for client 30/54830/2
snehring [Wed, 17 Apr 2024 16:09:17 +0000 (11:09 -0500)]
LU-17587 build: use kernel version from dkms for client

The current behavior of the dkms build for clients is to only build
for the running kernel. This is fine if the other kernels are ABI
compatible with the running kernel because we tell dkms to run
weak-updates as part of the install process. However, if kernels that
are not ABI compatible with the running kernel are installed they
won't be targeted and weak-updates won't add in the modules. This
could be worked around by running 'dkms install' once booted into the
new kernel, but that's additional administrator overhead and not the
assumed behavior for a dkms module.

This modifies the dkms build script to accept the kernel version from
dkms and configure for that version. It also changes the behavior of
dkms wrt lustre to disable weak module updates since we're now
building for individual kernel versions. This will likely result in
longer times to install the client since we're building for each
installed version of the kernel, but it _should_ mean the client is
actually installed for each version.

Signed-off-by: snehring <snehring@iastate.edu>
Change-Id: I55fb1bb7159772d7ecd9d1837e870c7097c02d78
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54830
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 weeks agoLU-17736 tests: Fix sanityn/73 for test machines with auditd 09/54809/3
Ellis Wilson [Fri, 8 Oct 2021 14:27:39 +0000 (10:27 -0400)]
LU-17736 tests: Fix sanityn/73 for test machines with auditd

getfattr performs one stat followed by two getxattr syscalls against
the provided file.  Normally, the stat results in no getxattr calls
internally (as it's not something stat is required to return).

However, if auditd is enabled AND one of the rules includes a
filesystem-specific rule such as watch directory X and record if it's
modified, then for every lookup (each of the three syscalls includes
one) an additional getxattr will be performed, resulting in 5 total
getxattrs.

Because there is significant fuzz here, revise the check to be
at minimum the two "expected" getxattrs but allow for more.
Comments have been added explaining this.

Signed-off-by: Ellis Wilson <elliswilson@microsoft.com>
Test-Parameters: trivial testlist=sanityn env=ONLY=73,ONLY_REPEAT=10
Change-Id: I0da5c2a5331f7dba4e65051a073e2bec05327a25
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54809
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
8 weeks agoLU-17362 build: Update ZFS version to 2.1.15 69/54769/2
Jian Yu [Fri, 12 Apr 2024 15:46:44 +0000 (08:46 -0700)]
LU-17362 build: Update ZFS version to 2.1.15

Update ZFS version to 2.1.15. The changes are listed in:
https://github.com/openzfs/zfs/releases/tag/zfs-2.1.15

Test-Parameters: trivial fstype=zfs mdtcount=4 mdscount=2 \
  clientdistro=el8.9 serverdistro=el8.9 testlist=sanity

Test-Parameters: trivial fstype=zfs mdtcount=4 mdscount=2 \
  clientdistro=el9.3 serverdistro=el9.3 testlist=sanity

Test-Parameters: optional fstype=zfs testgroup=full-dne-zfs-part-1
Test-Parameters: optional fstype=zfs testgroup=full-dne-zfs-part-2
Test-Parameters: optional fstype=zfs testgroup=full-dne-zfs-part-3

Change-Id: I51532dbf9dbcadf64bb9dbd3b10e88d0cab38ffd
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54769
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Peter Jones <pjones@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 weeks agoLU-17727 tests: add to auster --stop-on-error option 55/54755/3
Xiaolin (Charlene) Zang [Thu, 8 Jul 2021 04:32:40 +0000 (00:32 -0400)]
LU-17727 tests: add to auster --stop-on-error option

Add a --stop-on-error option to auster that takes a comma-separated
list of tests.

If any such test fails, auster will exit immediately without any
cleanup, to make debugging particularly difficult and rare bugs more
tractable.

Signed-off-by: Xiaolin (Charlene) Zang <xiaolinzang@microsoft.com>
Change-Id: Icd8d1eaf8ae799bd74f9147ac9080a0950977526
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54755
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Charlie Olmstead <charlie@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 weeks agoLU-17497 tests: skip sanity-sec/69 for old MDS 82/54782/2
Andreas Dilger [Sun, 14 Apr 2024 07:43:08 +0000 (01:43 -0600)]
LU-17497 tests: skip sanity-sec/69 for old MDS

Older MDS versions do not have strict checking for identity_upcall
or rsi_upcall, so don't run the test with those servers.

Test-Parameters: trivial testlist=sanity-sec env=ONLY=69 serverversion=2.15
Fixes: 2153e86541 ("LU-17497 obdclass: check upcall incorrect values")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Icdfda82eca32c2de7e88991ead0d9723023ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54782
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Reviewed-by: Sarah Liu <sarah@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 weeks agoLU-16741 lvm: rename ptlrpc_req_finished for component lvm 93/54693/2
Arshad Hussain [Mon, 8 Apr 2024 10:51:37 +0000 (06:51 -0400)]
LU-16741 lvm: rename ptlrpc_req_finished for component lvm

Patch renames ptlrpc_req_finished to ptlrpc_req_put for
lvm component

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I58dd90e4ae1a8834866491bf866cbacbd1c6e609
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54693
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 weeks agoLU-17706 lnet: reserve TOFULND and EFALND 74/54674/2
Andreas Dilger [Thu, 4 Apr 2024 18:42:02 +0000 (12:42 -0600)]
LU-17706 lnet: reserve TOFULND and EFALND

Reserve network numbers for Fujitsu Torus Fusion LND and Amazon
Elastic Fabric Adapter LND to avoid hard-to-fix conflicts in the
future.

Add comments for the other LND numbers to provide some context.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Icea6cecf5a951c5a44527c937a2631c9cc3ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54674
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Shuichi Ihara <sihara@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 weeks agoLU-17703 lod: check the inherited pool for conflicts 61/54661/5
Vitaly Fertman [Wed, 3 Apr 2024 20:33:20 +0000 (23:33 +0300)]
LU-17703 lod: check the inherited pool for conflicts

In addition to LU-15658, the start index could be inherited from
parent and the pool from root: drop the pool in case of conflict
as well.

Another case of problematic inheritance is saving the inherited LOVEA
to a subdir when all the parameters are inherited except the OST list.

HPE-bug-id: LUS-11330, LUS-11631
Signed-off-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Change-Id: Ief1dbd8c1ee0433bb625cbff1834b248d4fb2992
Reviewed-on: https://es-gerrit.hpc.amslabs.hpecorp.net/161800
Tested-by: Alexander Lezhoev <alexander.lezhoev@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54661
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 weeks agoLU-17669 test: using unintialized variable in sanity:160n 49/54549/2
Li Xi [Mon, 25 Mar 2024 02:20:35 +0000 (10:20 +0800)]
LU-17669 test: using unintialized variable in sanity:160n

This patch fixes a simple typo that caused an uninitialized variable
to be used.

Fixes: d813c75df ("LU-14688 mdt: changelog purge deletes plain llog")

Test-Parameters: trivial testlist=sanity env=ONLY=160n
Change-Id: I2e29cce33733c925dfe9a53c06af7ac17b2c6be3
Signed-off-by: Li Xi <lixi@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54549
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 weeks agoLU-930 ptlrpc: quiet idle import logging 40/54540/2
Andreas Dilger [Fri, 22 Mar 2024 23:20:38 +0000 (16:20 -0700)]
LU-930 ptlrpc: quiet idle import logging

Don't log a debug message for every idle import every 25s, as this
pushes out other more important messages from the logs.

Fixes: 5a6ceb664f ("LU-7236 ptlrpc: idle connections can disconnect")
Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Id98c2acad07cec62af0d705a437a4d2915ce9f62
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54540
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 weeks agoLU-17431 utils: add 'dynamic' parameter to nodemap_cmd 03/54503/10
Sebastien Buisson [Wed, 20 Mar 2024 08:05:41 +0000 (09:05 +0100)]
LU-17431 utils: add 'dynamic' parameter to nodemap_cmd

Adding a 'dynamic' parameter to nodemap_cmd() will enable
'lctl nodemap_*' commands to handle dynamic nodemaps, i.e.
nodemaps created directly on MDS/OSS side, and stored in memory.

If both MDT and OST are running on the same node, the MDS device
is used for the ioctl.  It doesn't matter which one is actually
used, since it gets to the same place in ptlrpc anyway; it just
needs to find a valid OBD device to run the ioctl.

Test-Parameters: trivial
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Id58199e1ad6622aad896737604c0a8e1287ba34e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54503
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
8 weeks agoLU-17431 nodemap: add function to know if nodemap is on MGS 06/54506/8
Sebastien Buisson [Wed, 20 Mar 2024 08:33:11 +0000 (09:33 +0100)]
LU-17431 nodemap: add function to know if nodemap is on MGS

Add a nodemap_mgs() function to tell whether nodemaps are defined
on an MGS node (pointer to a nodemap config file) or not.

Test-Parameters: trivial
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Id87e34dd8d13cd21c88c87ef9e8e91ff9ff142c8
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54506
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 weeks agoLU-17627 build: fix new mofed version 36/54336/5
Minh Diep [Wed, 6 Mar 2024 02:26:58 +0000 (18:26 -0800)]
LU-17627 build: fix new mofed version

Allow multi-digit MOFED version numbers.
Fix the compare_version function to return the correct result.
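
To illustrate why multi-digit version components need numeric handling,
here is a minimal sketch in Python. It is not the compare_version
function touched by this patch; the name compare_versions and its
-1/0/1 return convention are assumptions made for the example. It
compares versions component by component as integers, which a plain
string comparison gets wrong once a component has more than one digit.

```python
import re
from itertools import zip_longest


def compare_versions(a: str, b: str) -> int:
    """Hypothetical helper: return -1, 0 or 1 when a is older than,
    equal to, or newer than b. Components are compared as integers."""
    # Extract every run of digits, so "23.10-0.5" yields [23, 10, 0, 5].
    parts_a = [int(p) for p in re.findall(r"\d+", a)]
    parts_b = [int(p) for p in re.findall(r"\d+", b)]
    # Missing trailing components count as 0, so "5.8" equals "5.8.0".
    for x, y in zip_longest(parts_a, parts_b, fillvalue=0):
        if x != y:
            return -1 if x < y else 1
    return 0


if __name__ == "__main__":
    # A string comparison orders "23.10" before "5.8" because '2' < '5'.
    print("23.10" < "5.8")
    # The numeric comparison reports 23.10 as the newer version.
    print(compare_versions("23.10", "5.8"))
```

The version strings above are illustrative only and are not taken from
the patch.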

Test-Parameters: trivial
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Change-Id: I0f585cb355bb34270003ae1139688080c301186a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54336
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
8 weeks agoLU-17592 build: kernel 6.8 -Werror=missing-prototypes 28/54228/13
Shaun Tancheff [Mon, 15 Apr 2024 18:29:30 +0000 (11:29 -0700)]
LU-17592 build: kernel 6.8 -Werror=missing-prototypes

Linux commit v6.7-rc4-156-g0fcb70851fbf
  Makefile.extrawarn: turn on missing-prototypes globally

With -Wmissing-prototypes and -Werror, clean up some additional
functions that are implicitly static and provide declarations
for those that are exported.

Add SERVER_ONLY and SERVER_ONLY_EXPORT_SYMBOL to wrap functions
that are only exported for and used by server components.

Test-Parameters: trivial
HPE-bug-id: LUS-12181
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ice5219df5463effe964d2cd2114f003d185337da
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54228
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 weeks agoLU-17379 lnet: parallelize peer discovery via LNetAddPeer 33/53933/10
Serguei Smirnov [Tue, 6 Feb 2024 03:24:01 +0000 (19:24 -0800)]
LU-17379 lnet: parallelize peer discovery via LNetAddPeer

Initiate peer discovery via its non-primary NIDs
as they are being added in LNetAddPeer by pretending
that they belong to different peers. This may be
useful if some of the comma-separated NIDs in the
mount command (including the first listed NID) are down.
If discovery is performed in the background and there's
at least one reachable NID in the list, the discovery
will succeed and peer records will get consolidated.

If primary NID locking is enabled, the first NID in the list
provided by Lustre to LNetAddPeer always gets locked as primary,
even if it doesn't get discovered.
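
The behaviour described above can be modelled with a small Python toy.
This is only a conceptual sketch, not LNet code: discover_peer, the
probe callback and the dictionary layout are all hypothetical, and it
assumes primary NID locking is enabled. Every NID is probed
concurrently, discovery succeeds if at least one NID answers, and the
first listed NID stays primary even when it is down.

```python
from concurrent.futures import ThreadPoolExecutor


def discover_peer(nids, probe):
    """Toy model: probe all NIDs concurrently and consolidate them into
    one peer record, or return None if no NID is reachable.

    probe is a hypothetical callable returning True when a NID answers.
    """
    if not nids:
        return None
    # Probe every NID in parallel rather than waiting on each in turn.
    with ThreadPoolExecutor(max_workers=len(nids)) as pool:
        results = list(pool.map(probe, nids))
    reachable = [nid for nid, ok in zip(nids, results) if ok]
    if not reachable:
        return None
    # With primary NID locking assumed on, the first listed NID is kept
    # as primary whether or not it was itself discovered.
    return {"primary": nids[0], "nids": list(nids), "reachable": reachable}


if __name__ == "__main__":
    # Illustrative NIDs; the first one is treated as down.
    def down_first(nid):
        return nid != "10.0.0.1@tcp"

    print(discover_peer(["10.0.0.1@tcp", "10.0.0.2@tcp"], down_first))
```

In this model one unreachable NID does not block the others, which is
the benefit the commit message describes for comma-separated NID lists.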

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I449cb9898c0242db874555a62fe8099352e913e6
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53933
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>