Whamcloud - gitweb
fs/lustre-release.git
2 months ago LU-10499 pcc: check first before set PCC-RO on a file 70/54370/8
Qian Yingjin [Fri, 5 Feb 2021 03:48:26 +0000 (11:48 +0800)]
LU-10499 pcc: check first before set PCC-RO on a file

In this patch, the MDT first takes a CR layout lock on the file object
to check whether the file is already PCC-RO cached. If so, it
returns immediately; otherwise, it takes an EX lock on the file to
update the FLR PCC-RO state accordingly. This check avoids
heavy lock contention and unnecessary revocation of the
layout lock granted to other clients when multiple processes
on many clients perform a read-only attach on a shared file
simultaneously.
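
The check-then-upgrade pattern described above can be sketched in plain C.
This is only an illustration under stated assumptions: struct demo_file,
demo_attach_ro() and the lock counters are hypothetical stand-ins, and no
real LDLM locking is performed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a file object guarded by a layout lock. */
struct demo_file {
	bool pcc_ro_cached;	/* already marked PCC-RO? */
	int shared_locks;	/* times a shared (CR-like) lock was taken */
	int exclusive_locks;	/* times an exclusive (EX-like) lock was taken */
};

/*
 * Mark the file PCC-RO, taking the exclusive lock only when needed.
 * Returns true if this call changed the state.
 */
bool demo_attach_ro(struct demo_file *f)
{
	bool cached;

	assert(f != NULL);

	f->shared_locks++;		/* cheap check under the shared lock */
	cached = f->pcc_ro_cached;
	/* the shared lock would be dropped here */
	if (cached)
		return false;

	f->exclusive_locks++;		/* slow path: exclusive lock */
	if (!f->pcc_ro_cached) {	/* re-check: state may have changed */
		f->pcc_ro_cached = true;
		return true;
	}
	return false;
}
```

In this toy model, repeated attaches on an already cached file only bump
the shared counter, which is the contention saving the patch is after.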

EX-bug-id: EX-2455
Test-Parameters: clientcount=3 testlist=sanity-pcc,sanity-pcc,sanity-pcc
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: If59315abe444917f8a890b60a38c239b8ee045bf
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54370
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 months ago LU-14003 pcc: rework PCC mmap implementation 92/40092/22
Qian Yingjin [Wed, 30 Sep 2020 03:00:43 +0000 (11:00 +0800)]
LU-14003 pcc: rework PCC mmap implementation

The old PCC mmap implementation replaces the vm_file with
the file of the PCC copy, calls ->fault() or
->page_mkwrite() on the PCC copy, and then restores the vm_file
to the one of the Lustre file.
This design is problematic because a mmapped region (vma) can be
faulted concurrently by multiple child threads (each child
thread can clone the VM of the parent process). There is no
atomicity guarantee for replacing and restoring the vm_file while
calling ->fault() or ->page_mkwrite().

This patch reworks the mmap() implementation for PCC.
In the new design, PCC mmap replaces the inode mapping of the PCC
copy on the PCC backend filesystem with the one of the Lustre file.
This way, the mmapped region (vma) links into the mapping of
the Lustre inode, not the mapping of the PCC copy.
It keeps using vm_file with the file handle of the PCC copy until
the PCC cached file is detached or unmapped.

LU-14003 pcc: convert mapping pagecache for mmap

In the PCC mmap implementation, mmap() replaces the mapping of
the PCC copy with the one of the Lustre file so that
the mmapped region (vma) links into the mapping of the
Lustre file, not the mapping of the PCC copy.
In the old design, the pagecache in the original
mapping of the PCC copy is simply dropped at this point, because the
mapping of each page is different after the replacement.

This may hurt mmap performance.
During PCC attach, the data is written from
Lustre into the PCC copy in buffered I/O mode, so it stays
in the pagecache, managed by the mapping of the PCC copy, if there
is enough system memory. A later mmap page fault
could then read the data directly from the pagecache to speed up the
mmap operation.
If these pages are dropped because their mapping differs,
the page fault must read the page from disk, which may result in poor
performance.

To make full use of the pagecache of the PCC copy, the mmap
call can first remove each page from the original mapping of
the PCC copy, and then convert and add it to the mapping of the
Lustre file. This way, all cached pages are converted and can be
reused by later page faults.
Was-Change-Id: I1591937543d7d31b8811ec62088accd0070d7d37

EX-8421 llite: disable kernel readahead for pcc mmap

Set ra_pages to 0 for PCC files when mmapped, because
otherwise this setting carries through to Lustre and will
cause crashes and possible inconsistencies.  This happens
because the PCC file and Lustre file share a mapping, which
is a weird trick required to have mmap work on PCC.

Add a set of asserts which confirm kernel readahead is
disabled and wasn't used for mmap.
Was-Change-Id: I117042d68fac25158e8141c243acba698cf1930f

LU-17866 pcc: zero ra_pages explicitly for a file after PCC mmap

To support mmap under PCC, we do some special magic with mmap to
allow Lustre and PCC to share the page mapping.
The mapping host (@mapping->host) for the Lustre file is replaced
with the PCC copy for mmap. This may result in the wrong setting
of @ra_pages for the Lustre file handle with the backing store of
the PCC copy in the kernel:
->do_dentry_open()->file_ra_state_init():
file_ra_state_init(struct file_ra_state *ra,
		   struct address_space *mapping)
{
	ra->ra_pages = inode_to_bdi(mapping->host)->ra_pages;
	ra->prev_pos = -1;
}

Setting readahead pages for a file handle is the last step of the
open() call and is outside the control of the Lustre file
system.
Thus, to avoid setting @ra_pages wrongly, we explicitly set @ra_pages
to zero for the Lustre file handle in all read I/O paths.

When invalidating a PCC copy, we switch the mapping back
between Lustre and PCC. We also set mapping->a_ops back to
@ll_aops.
The readahead path in the PCC backend may enter ->readpage() in
Lustre. We then check whether the file handle is a Lustre file
handle. If not, it must come from the mmap readahead I/O path of the
PCC copy, and we return an error code directly in this case.
Was-Change-Id: Id1e4a9e47bb484e97053759e1743fd2fce040149

Test-Parameters: clientcount=3 testlist=sanity-pcc,sanity-pcc,sanity-pcc
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Icc5019a691dfb04b5e1fdd580d83915cfe590158
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/40092
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-17464 lod: use OBD_ALLOC_LARGE for ldo_comp_entries 49/55449/3
Bobi Jam [Mon, 17 Jun 2024 10:10:05 +0000 (18:10 +0800)]
LU-17464 lod: use OBD_ALLOC_LARGE for ldo_comp_entries

The lod_object::ldo_comp_entries is allocated/freed with the _LARGE macros
so that it can be large enough to use vmalloc instead of kmalloc
for memory allocation. Some places use OBD_ALLOC without
_LARGE to re-allocate the memory, which violates this assumption.

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Ie356ae875329af07c893586fa4b1485dbd17afe6
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55449
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 months ago LU-6142 lnet: Fix style issues for console.[ch] 48/55448/2
Arshad Hussain [Mon, 17 Jun 2024 08:29:19 +0000 (04:29 -0400)]
LU-6142 lnet: Fix style issues for console.[ch]

This patch fixes issues reported by checkpatch
for file lnet/selftest/console.[ch]

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I484b2ffee5d5add360055b424e23fdc97c5618ae
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55448
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-16822 tests: Modify local_node to check for IPv6 63/55463/4
Chris Horn [Mon, 17 Jun 2024 18:09:06 +0000 (12:09 -0600)]
LU-16822 tests: Modify local_node to check for IPv6

Nodes may be configured with just IPv6 addresses, so local_node()
needs to look for both IPv4 and IPv6 addresses to determine if a given
host is local.

sanity-lnet/110 is re-written so that it does not rely on
local_addr_list(). Otherwise the test may attempt to configure an NI
using an invalid address. This test case can now execute on o2ib
configs.

Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Id3471376dbb2089a44b00ed7cb9bc2256e5e7501
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55463
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-17404 kernel: fix filemap_splice_read detection 54/55454/3
Sebastien Buisson [Mon, 17 Jun 2024 14:29:33 +0000 (16:29 +0200)]
LU-17404 kernel: fix filemap_splice_read detection

On CentOS 9 kernel 5.14, filemap_splice_read is in the header files,
but the symbol is not exported by the kernel.
So instead of trying to build a kernel module with a call to this
function, just use LB_CHECK_EXPORT on this symbol.

Test-Parameters: trivial
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I1f55d0b41c46a992204c1cebc3f5c8c7dbc6128e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55454
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
2 months ago LU-16822 tests: Force IPv6 testing in mixed environment 22/55422/9
Srikanth Ramamurthy [Mon, 17 Jun 2024 17:57:40 +0000 (11:57 -0600)]
LU-16822 tests: Force IPv6 testing in mixed environment

When an interface has both IPv4 and v6 addresses LNet will, by
default, configure the NI using the v4 address. The '--large' option
to lnetctl lnet configure tells LNet to configure the NI using the
v6 address instead. This patch adds a test-framework environment
variable that, when set, passes the --large option to lnet configure.
This allows us to force testing of IPv6 when running in a mixed v4/v6
environment.

This patch implements ip_is_v6() which is needed by some of the
router tests when using IPv6 NIDs.
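
For readers who want to see what such an address-family check involves,
here is a minimal C analogue built on the standard inet_pton() call. The
patch's own ip_is_v6() belongs to the test scripts and is not reproduced
here; demo_ip_is_v6() and demo_ip_is_v4() are hypothetical names used only
for this sketch.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if addr parses as an IPv6 literal. */
bool demo_ip_is_v6(const char *addr)
{
	struct in6_addr buf;

	assert(addr != NULL);
	/* inet_pton() returns 1 only when the string is a valid address */
	return inet_pton(AF_INET6, addr, &buf) == 1;
}

/* Hypothetical helper: true if addr parses as an IPv4 dotted quad. */
bool demo_ip_is_v4(const char *addr)
{
	struct in_addr buf;

	assert(addr != NULL);
	return inet_pton(AF_INET, addr, &buf) == 1;
}
```

A caller deciding whether a host is local would try both helpers, since a
node may carry only one of the two address families.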

Some test cases are added to the except list:
 - 230 requires lctl conn_list to be updated to work with large nids.
 - 303 and 500 have been found to trip the LU-17460 bug.

Change-Id: I8934a87bfd836779b167df39c5d09d97ff78debf
Test-Parameters: trivial
Signed-off-by: Srikanth Ramamurthy <srramamu@microsoft.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55422
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 months ago New release 2.15.64 2.15.64 v2_15_64
Oleg Drokin [Tue, 25 Jun 2024 03:34:26 +0000 (23:34 -0400)]
New release 2.15.64

Change-Id: I0d760b4b58bd24b72e781c52465c07417725cffe
Signed-off-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-9119 lnet: whitespace cleanup for wirecheck.c 46/55446/5
Olaf Weber [Mon, 17 Jun 2024 04:14:17 +0000 (00:14 -0400)]
LU-9119 lnet: whitespace cleanup for wirecheck.c

Clean up the whitespace use in lnet/utils/wirecheck.c

Test-Parameters: trivial
Signed-off-by: Olaf Weber <olaf@sgi.com>
Signed-off-by: Olaf Weber <olaf.weber@hpe.com>
Change-Id: I5c90d09fd694c8151f6f11f716c491ac3db79eb0
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55446
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 months ago LU-17856 tests: ignore sanity stripe-count off-by-1 27/55427/3
Frederick Dilger [Fri, 14 Jun 2024 03:29:10 +0000 (23:29 -0400)]
LU-17856 tests: ignore sanity stripe-count off-by-1

In some cases the MDS may not create all stripes on a file, if the
MDT-OST connection does not have precreated objects. This is OK,
so the tests should not fail the stripe-count check if trying to
create a fully-striped file and one of the stripes is missing.

parse_layout_param was modified to change the output value of
stripe count to be $OSTCOUNT if the stripe_count=$OSTCOUNT - 1.

Even if the stripe_count was meant to be $OSTCOUNT - 1 this
shouldn't fail any tests as both tested values will be modified.
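
The normalization rule can be written down in a few lines. The real change
is in the parse_layout_param test helper; the C function below,
demo_normalize_stripe_count(), is a hypothetical rendering of the same rule
for illustration only.

```c
#include <assert.h>

/*
 * Hypothetical C rendering of the off-by-one tolerance: a stripe count
 * one short of the OST count is reported as the full OST count.
 */
int demo_normalize_stripe_count(int stripe_count, int ost_count)
{
	assert(ost_count > 0);
	if (stripe_count == ost_count - 1)
		return ost_count;
	return stripe_count;
}
```

Because both the expected and the observed value pass through the same
rule, a layout that really asked for one stripe fewer still compares equal.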

Test-Parameters: trivial testlist=sanity env=ONLY=56xd-56xe,ONLY_REPEAT=100
Test-Parameters: trivial testlist=sanity env=ONLY=65n,ONLY_REPEAT=100
Test-Parameters: trivial testlist=sanity env=ONLY=184d,ONLY_REPEAT=100
Signed-off-by: Frederick Dilger <fdilger@whamcloud.com>
Change-Id: Ie908a07d21b75e3ba60b7e6ca326675684ee2037
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55427
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 months ago LU-17948 pcc: fix llapi_pcc_del() -Werror=enum-int-mismatch 17/55417/3
Jian Yu [Thu, 13 Jun 2024 00:07:07 +0000 (17:07 -0700)]
LU-17948 pcc: fix llapi_pcc_del() -Werror=enum-int-mismatch

gcc 13 does not allow mixing of enum and integer
types between function declaration and implementation.

This patch fixes the following build failures:
liblustreapi_pcc.c:755:5: error:
conflicting types for 'llapi_pcc_del' due to enum/integer mismatch;
have 'int(const char *, const char *, enum lu_pcc_cleanup_flags)'
[-Werror=enum-int-mismatch]
  755 | int llapi_pcc_del(const char *mntpath, const char *pccpath,
      |     ^~~~~~~~~~~~~

liblustreapi_pcc.c:790:5: error:
conflicting types for 'llapi_pcc_clear' due to enum/integer mismatch;
have 'int(const char *, enum lu_pcc_cleanup_flags)' [-Werror=enum-int-mismatch]
  790 | int llapi_pcc_clear(const char *mntpath, enum lu_pcc_cleanup_flags flags)
      |     ^~~~~~~~~~~~~~~
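
The general shape of the fix for this class of error is to use the same
enum type in both the declaration and the definition. The sketch below
shows that shape with hypothetical names (enum demo_cleanup_flags,
demo_pcc_clear()); it is not the liblustreapi code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag type standing in for enum lu_pcc_cleanup_flags. */
enum demo_cleanup_flags {
	DEMO_CLEANUP_NONE	= 0x0,
	DEMO_CLEANUP_KEEP_DATA	= 0x1,
};

/* Declaration: the flags parameter is the enum type ... */
int demo_pcc_clear(const char *mntpath, enum demo_cleanup_flags flags);

/*
 * ... and the definition repeats it exactly. Declaring one side with an
 * integer type instead is what -Werror=enum-int-mismatch rejects.
 */
int demo_pcc_clear(const char *mntpath, enum demo_cleanup_flags flags)
{
	if (mntpath == NULL)
		return -1;
	return (flags & DEMO_CLEANUP_KEEP_DATA) ? 1 : 0;
}
```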

Test-Parameters: trivial testlist=sanity-pcc

Change-Id: I2900b59a609410c6faab78d24f6176bc5c268e98
Fixes: 0d7d9ae ("LU-17657 build: gcc 13 stricter enum checking")
Fixes: c74878c ("LU-12373 pcc: uncache the pcc copies when remove a PCC backend")
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55417
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 months ago LU-17948 llite: replace i_mtime.tv_sec with inode_get_mtime_sec() 16/55416/2
Jian Yu [Wed, 12 Jun 2024 23:32:53 +0000 (16:32 -0700)]
LU-17948 llite: replace i_mtime.tv_sec with inode_get_mtime_sec()

This patch replaces i_mtime.tv_sec with inode_get_mtime_sec() to
fix the following build failure:

lustre/llite/pcc.c:1691:32: error:
'struct inode' has no member named 'i_mtime'; did you mean '__i_mtime'?
 1691 |         item.pm_mtime = inode->i_mtime.tv_sec;
      |                                ^~~~~~~
      |                                __i_mtime
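
The accessor style that replaces direct field access can be illustrated in
userspace. struct demo_inode and demo_inode_get_mtime_sec() are
hypothetical; they only mimic the idea of hiding the timestamp field behind
a getter such as inode_get_mtime_sec().

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical inode whose timestamp field is private to accessors. */
struct demo_inode {
	struct timespec mtime_private;
};

/* Accessor in the style of inode_get_mtime_sec(). */
time_t demo_inode_get_mtime_sec(const struct demo_inode *inode)
{
	assert(inode != NULL);
	return inode->mtime_private.tv_sec;
}
```

Callers that go through the getter keep compiling when the field behind it
is renamed or changes representation.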

Test-Parameters: trivial testlist=sanity-pcc

Change-Id: Iaed264c32be3d48039c5350ebd306f4fc3ef5eb9
Fixes: 3835f4d ("LU-13881 pcc: comparator support for PCC rules")
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55416
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 months ago LU-17947 build: fix LASSERTF [-Werror=format=] failure 15/55415/3
Jian Yu [Thu, 13 Jun 2024 11:44:46 +0000 (07:44 -0400)]
LU-17947 build: fix LASSERTF [-Werror=format=] failure

This patch fixes the following build failures:

libcfs/include/libcfs/libcfs_private.h:89:34:
error: format '%o' expects argument of type 'unsigned int',
but argument 4 has type 'long unsigned int' [-Werror=format=]
   89 | "ASSERTION( %s ) failed: " fmt, #cond, \
      | ^~~~~~~~~~~~~~~~~~~~~~~~~~
lustre/ptlrpc/wiretest.c:2718:9: note: in expansion of macro 'LASSERTF'
 2718 | LASSERTF(MDS_FMODE_CLOSED == 000000000000UL, "found 0%.11oUL\n",
      | ^~~~~~~~
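
One common way to resolve this kind of warning is to give the conversion a
length modifier that matches the argument type. The source does not show
which approach the patch took, so the sketch below is an assumption;
demo_format_octal() is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Format an unsigned long in octal in the style of the wiretest message.
 * The 'l' length modifier makes the conversion match the argument type.
 */
int demo_format_octal(char *buf, size_t len, unsigned long val)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "0%.11loUL", val);
}
```

An alternative with the same effect is to cast the argument to unsigned int
and keep the plain %o conversion.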

Test-Parameters: trivial
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Change-Id: I97a895e6234721c34f681d0ee7ce91ead4dd30f8
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55415
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-17945 lnet: fix nla_extract_val() -Werror=missing-prototypes 13/55413/2
Jian Yu [Wed, 12 Jun 2024 21:38:10 +0000 (14:38 -0700)]
LU-17945 lnet: fix nla_extract_val() -Werror=missing-prototypes

This patch explicitly defines nla_extract_val() as a static function
to fix the following build failure:

lnet/lnet/api-ni.c:2888:1: error:
no previous prototype for 'nla_extract_val' [-Werror=missing-prototypes]
 2888 | nla_extract_val(struct nlattr **attr, int *rem,
      | ^~~~~~~~~~~~~~~
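
A small sketch of why 'static' silences this warning: a file-local helper
needs no prior prototype, while an externally visible function does. The
names demo_extract_val() and demo_sum_attrs() are hypothetical and
unrelated to the Netlink code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * A helper used only in this file is declared 'static', so the compiler
 * does not expect a prior prototype in a header.
 */
static int demo_extract_val(const int *attr, int *rem)
{
	assert(attr != NULL && rem != NULL);
	(*rem)--;		/* one fewer attribute left to consume */
	return *attr;
}

/* Public entry point: its prototype is declared before the definition. */
int demo_sum_attrs(const int *attrs, int count);

int demo_sum_attrs(const int *attrs, int count)
{
	int rem = count;
	int sum = 0;

	while (rem > 0)
		sum += demo_extract_val(&attrs[count - rem], &rem);
	return sum;
}
```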

Test-Parameters: trivial testlist=sanity-lnet

Change-Id: Ieb11d25ea8fcd19b715e2decf958cfd9d920bcc8
Fixes: 629d80d ("LU-10003 lnet: migrate fail nid to Netlink")
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55413
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-17940 gss: get rid of root key sooner 06/55406/5
Sebastien Buisson [Thu, 13 Jun 2024 09:19:04 +0000 (11:19 +0200)]
LU-17940 gss: get rid of root key sooner

The root key associated with a GSS context (gck_key) is used to pass
information between kernel and userspace during GSS context
negotiation.
Once the GSS context for root is up-to-date, the key is never used
again, although it has a permanent validity. And when the context
expires, the key is directly revoked and replaced with a new one to
serve the negotiation of a new root context.
So to avoid issues with keys staying in the root's kernel keyring and
being accidentally revoked, just get rid of the key associated with a
root context as soon as the negotiation process has finished.

Test-Parameters: trivial
Test-Parameters: testgroup=review-dne-selinux-ssk-part-1
Test-Parameters: testgroup=review-dne-selinux-ssk-part-2
Test-Parameters: kerberos=true testlist=sanity-krb5
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I4be773723b9046ed451684bd141d5ef2bc584bfb
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55406
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-11671 tests: re-enable sanity test_45 for aarch64 03/55403/6
Xinliang Liu [Wed, 12 Jun 2024 10:24:53 +0000 (10:24 +0000)]
LU-11671 tests: re-enable sanity test_45 for aarch64

This is fixed by patch https://review.whamcloud.com/54763
 ("LU-17733 tests: sanity test_45 fix dirty count").

Test-Parameters: trivial
Test-Parameters: testlist=sanity clientarch=aarch64 \
  clientdistro=el9.3 env=ONLY=45,ONLY_REPEAT=100

Change-Id: I4716a0bee2689ffb33db8a81f1f33be6562b929e
Signed-off-by: Xinliang Liu <xinliang.liu@linaro.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55403
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 months ago LU-17000 utils: Initialize 'idgot' time before using 02/55402/2
Arshad Hussain [Wed, 12 Jun 2024 09:35:17 +0000 (05:35 -0400)]
LU-17000 utils: Initialize 'idgot' time before using

If there is an error reading the contents of the permission
file, gettimeofday() is correctly not called on 'idgot'.
However, this means that the 'idgot' timeval is left uninitialized.
This patch initializes the 'idgot' timeval to 0 so that, in the above
case, the value is printed as zero and not garbage.
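
The pattern is small enough to show in full. This is a sketch with a
hypothetical function, demo_read_stamp(), not the l_getidentity code: the
timeval is zero-initialized so the error path never exposes stack garbage.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/time.h>

/*
 * Hypothetical flow: the timestamp is only taken when the read succeeds,
 * so it is zero-initialized up front to keep the error path well defined.
 */
void demo_read_stamp(bool read_ok, struct timeval *out)
{
	struct timeval idgot = { 0 };	/* tv_sec and tv_usec start at 0 */

	assert(out != NULL);
	if (read_ok)
		gettimeofday(&idgot, NULL);
	*out = idgot;
}
```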

Test-Parameters: trivial
CoverityID: 397122 ("Uninitialized scalar variable")
Fixes: d5b26443 ("LU-16615 utils: add messages in l_getidentity")
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ie3d5dff1f02ede83690472e60cc14c12ec5d978a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55402
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-17935 kfilnd: Cleanup debug logging 98/55398/2
Chris Horn [Fri, 10 May 2024 17:41:57 +0000 (11:41 -0600)]
LU-17935 kfilnd: Cleanup debug logging

Log messages that refer to a struct kfilnd_transaction should print
the pointer to the struct with "TN %p".

Assign kfilnd_transaction::msg_type in the kfilnd_process_rx_event
path so that debug messages show the correct message type.

HPE-bug-id: LUs-11325
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Iabe3bf245b64f1eb66c85259072491c723fb6119
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55398
Reviewed-by: Caleb Carlson <caleb.carlson@hpe.com>
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 months ago LU-17930 gss: node principal expectations 92/55392/4
Sebastien Buisson [Tue, 11 Jun 2024 10:40:26 +0000 (12:40 +0200)]
LU-17930 gss: node principal expectations

When a credentials cache exists for Kerberos, lgss_keyring looks into
it to find a valid entry. The cache's principal must match the
expected role for the GSS request being processed:
- LGSS_ROOT_CRED_MDT: expect "lustre_mds" principal;
- LGSS_ROOT_CRED_OST: expect "lustre_oss" principal;
- LGSS_ROOT_CRED_ROOT: expect "lustre_root" or "host" principal.
And there is the special case of the GSS request on the MGC, for which
by convention all 3 roles are applied at the same time.
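
The role-to-principal expectations can be sketched as a bitmask check. Only
the principal strings and the LGSS_ROOT_CRED_* role names come from the
text above; the DEMO_ bit values and demo_principal_ok() are hypothetical.
The sketch also assumes that a request carrying several role bits (the MGC
case) accepts a principal matching any of them, which the source does not
spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bit values; only the role names come from the text. */
enum demo_root_cred {
	DEMO_ROOT_CRED_ROOT = 0x1,
	DEMO_ROOT_CRED_MDT  = 0x2,
	DEMO_ROOT_CRED_OST  = 0x4,
};

/* True if the cache's principal name is acceptable for the role mask. */
bool demo_principal_ok(unsigned int roles, const char *name)
{
	assert(name != NULL);
	if ((roles & DEMO_ROOT_CRED_MDT) && strcmp(name, "lustre_mds") == 0)
		return true;
	if ((roles & DEMO_ROOT_CRED_OST) && strcmp(name, "lustre_oss") == 0)
		return true;
	if ((roles & DEMO_ROOT_CRED_ROOT) &&
	    (strcmp(name, "lustre_root") == 0 || strcmp(name, "host") == 0))
		return true;
	return false;
}
```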

Test-Parameters: trivial
Test-Parameters: kerberos=true testlist=sanity-krb5
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I4c46b03bb012c5f56bd26efdfaa6dab5fc7de31a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55392
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-17929 ptlrpc: ptlrpc_request_alloc_pack() always returns an error code 91/55391/6
Aurelien Degremont [Tue, 11 Jun 2024 08:33:11 +0000 (10:33 +0200)]
LU-17929 ptlrpc: ptlrpc_request_alloc_pack() always returns an error code

The current code always assumed that a NULL return from this
function meant an ENOMEM error, but this is not always
true, for example when using GSS or when
reconnecting from an IDLE state.
Also, instead of having every caller convert NULL to
ENOMEM, do that directly in the function when
appropriate.

Make ptlrpc_request_alloc_pack() return -errno in case
of error instead of a NULL pointer.

Thanks to that change, the error code is propagated up,
which helps error reporting and debugging.
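
The difference between "NULL means ENOMEM" and an encoded error pointer can
be shown in userspace. The sketch assumes the kernel's ERR_PTR() convention
is the mechanism, which the message does not state explicitly; the demo_*
helpers, struct demo_req and the choice of -ENOTCONN are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace stand-ins for the kernel's ERR_PTR()/PTR_ERR()/IS_ERR(). */
#define DEMO_MAX_ERRNO	4095
static inline void *demo_err_ptr(long err) { return (void *)(intptr_t)err; }
static inline long demo_ptr_err(const void *p) { return (long)(intptr_t)p; }
static inline int demo_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)(-DEMO_MAX_ERRNO);
}

struct demo_req { int opcode; };

/* Hypothetical allocator: reports why it failed instead of NULL. */
struct demo_req *demo_request_alloc_pack(int opcode, int import_ready)
{
	struct demo_req *req;

	assert(opcode >= 0);
	if (!import_ready)
		return demo_err_ptr(-ENOTCONN);	/* not a memory problem */
	req = malloc(sizeof(*req));
	if (req == NULL)
		return demo_err_ptr(-ENOMEM);
	req->opcode = opcode;
	return req;
}

/* The caller propagates the real error code upward. */
int demo_send(int opcode, int import_ready)
{
	struct demo_req *req = demo_request_alloc_pack(opcode, import_ready);

	if (demo_is_err(req))
		return (int)demo_ptr_err(req);
	free(req);
	return 0;
}
```

With the old convention, the second failure mode would have been reported
to the caller as an out-of-memory condition.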

Took the opportunity to simplify related error path
for 2 HSM functions.

Also changed param.status to a signed type, as it can
store -errno.

Signed-off-by: Aurelien Degremont <adegremont@nvidia.com>
Change-Id: Id2b873d5f0c5cb89db070f6db00269545e6c85e8
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55391
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-17919 tests: wait to resolve ENOSPC in sanity 398l 59/55359/4
Patrick Farrell [Fri, 7 Jun 2024 15:49:40 +0000 (11:49 -0400)]
LU-17919 tests: wait to resolve ENOSPC in sanity 398l

Test 398l does not wait to clear up the ENOSPC it induces,
so sometimes it causes 398m to fail with ENOSPC.

Wait for deletes to resolve this.

OCI-bug-id: LFS-288

Note on test-parameters - we can't 'REPEAT' a pair of
tests, it would run 398l over and over and then 398m,
which doesn't test what we need to test.  So instead we
just create 5 sessions like this.

Test-Parameters: testlist=sanity envdefinitions=ONLY="398l 398m"
Test-Parameters: testlist=sanity envdefinitions=ONLY="398l 398m"
Test-Parameters: testlist=sanity envdefinitions=ONLY="398l 398m"
Test-Parameters: testlist=sanity envdefinitions=ONLY="398l 398m"
Test-Parameters: testlist=sanity envdefinitions=ONLY="398l 398m"
Test-Parameters: trivial
Signed-off-by: Patrick Farrell <patrick.farrell@oracle.com>
Change-Id: I2fcc1069a0304bc6edfa576331b6255289b71b98
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55359
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 months ago LU-16491 utils: update getdirstripe yaml format 46/55346/10
Frederick Dilger [Thu, 6 Jun 2024 20:00:25 +0000 (16:00 -0400)]
LU-16491 utils: update getdirstripe yaml format

'lfs getdirstripe --yaml' now prints directory layout in yaml
format. getdirstripe now also prints the "self" FID whether the
directory is striped or not. "migrating" fields
(lmv_migrate_offset, lmv_migrate_hash) were not included because
of the additional code complexity required to add the two fields.
The migrating fields are stored in 'struct lmv_mds_md_v1' which
AFAIK isn't available through getdirstripe.

For 0 striped directories, lmv_objects: will now contain
information on the directory itself. This information
becomes redundant with -v, but it is useful when the
lmv_fid isn't being shown.

New YAML layout:

    lmv_fid:           0x280000404:0x5:0x0
    lmv_magic:         0xcd20cd0
    lmv_stripe_count:  4
    lmv_stripe_offset: 2
    lmv_hash_type:     crush
    lmv_objects:
          - l_mdt_idx: 2
            l_fid:     0x280000400:0x4:0x0
          - l_mdt_idx: 0
            l_fid:     0x200000402:0x4:0x0
          - l_mdt_idx: 1
            l_fid:     0x240000402:0x2:0x0
          - l_mdt_idx: 3
            l_fid:     0x2c0000403:0x2:0x0

Signed-off-by: Frederick Dilger <fdilger@whamcloud.com>
Change-Id: I03ddc24816484d11c8c70892831e9edc9da5455a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55346
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-17855 lnet: Set peer NI down on lnet_notify 42/55342/3
Chris Horn [Fri, 15 Mar 2024 18:52:07 +0000 (12:52 -0600)]
LU-17855 lnet: Set peer NI down on lnet_notify

The LNet router peer health feature is intended to allow LNet routers
to drop messages for peer NIs that they consider down/unreachable so that
resources can be freed to forward messages to peer NIs that are
up/reachable.

This feature was integrated with the LNet health feature under
LU-11300, and, as a result, routers only consider a peer NI
down/unreachable if two criteria are met:
1. The router hasn't received a message from the peer NI within the
LND's "peer_timeout" value (default 180 seconds).
2. The health value of the peer NI has been decremented or the cached
peer NI status is LNET_NI_STATUS_DOWN.

(1) is problematic because a lot of messages can be queued to a down
peer while we wait for the peer_timeout to expire. This can
introduce latency for messages being forwarded to other peers.

(2) is problematic because there are some cases where LNet health
will not be decremented (namely single-rail peers), and the cached
peer NI status can only be set to LNET_NI_STATUS_DOWN if the router
receives a discovery push from the peer. If the peer loses all
connectivity to the router then it is possible the router will never
consider it down.

To address the problems with (1) the requirement is dropped
completely.

To address the problems with (2), LNet routers will now decrement
health values of single-rail peers and lnet_notify() is modified to
set the peer NI status UP/DOWN according to the aliveness information
provided by the LND.
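
The change to the decision rule can be summarized as two predicates. This
is a simplified model under stated assumptions: struct demo_peer_ni and
both functions are hypothetical, and the real router code tracks more
state than shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of what a router knows about a peer NI. */
struct demo_peer_ni {
	long secs_since_last_rx;	/* time since the last message arrived */
	bool health_decremented;	/* health value has been decremented */
	bool status_down;		/* cached status is DOWN */
};

/* Old rule: both criteria (1) and (2) had to hold. */
bool demo_peer_down_old(const struct demo_peer_ni *p, long peer_timeout)
{
	assert(p != NULL);
	return p->secs_since_last_rx > peer_timeout &&
	       (p->health_decremented || p->status_down);
}

/* New rule: criterion (1) is dropped, so (2) alone decides. */
bool demo_peer_down_new(const struct demo_peer_ni *p)
{
	assert(p != NULL);
	return p->health_decremented || p->status_down;
}
```

A peer marked DOWN five seconds after its last message is still treated as
up by the old rule with a 180 second timeout, but as down by the new one.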

HPE-bug-id: LUS-12209
Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I7823cc7ae73bcb0b6b52db8d4f84cff7b999d8c0
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55342
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-17910 build: iokit should also be named per lustre_name 28/55328/2
Shaun Tancheff [Sat, 13 Apr 2024 02:39:01 +0000 (10:39 +0800)]
LU-17910 build: iokit should also be named per lustre_name

Update lustre-iokit to follow the ${lustre_name} scheme

HPE-bug-id: LUS-12250
Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I236aaaac1acaf86f08aa584c6a7d5d3a3d75ff49
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55328
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-17909 dkms: enable dkms with external lustre_name 27/55327/2
Shaun Tancheff [Thu, 6 Jun 2024 02:17:15 +0000 (09:17 +0700)]
LU-17909 dkms: enable dkms with external lustre_name

Allow dkms package naming with --define 'lustre_name cray-lustre'

HPE-bug-id: LUS-12249
Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I35ea6ed1017b691e5c0c105ff5c3f3a0028b2cbd
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55327
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Caleb Carlson <caleb.carlson@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-17907 enc: enc flag should not remove other flags 17/55317/6
Sebastien Buisson [Wed, 5 Jun 2024 12:35:02 +0000 (14:35 +0200)]
LU-17907 enc: enc flag should not remove other flags

When updating inode flags, the lli_flags must be taken into account
so that they do not get lost. So provide helper functions for callers
of ll_update_inode_flags(), as an overlay to ll_inode_to_ext_flags().
And on server side, the mdd layer must fetch the existing flags when
setting LUSTRE_ENCRYPT_FL attr flag.
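
The bug class here is overwriting a flag word instead of merging into it.
A minimal sketch, using hypothetical bit values and function names rather
than the real Lustre flag definitions:

```c
#include <assert.h>

/* Hypothetical flag bits; the real values live in the Lustre headers. */
#define DEMO_IMMUTABLE_FL	0x01u
#define DEMO_APPEND_FL		0x02u
#define DEMO_ENCRYPT_FL		0x10u

/* Lossy pattern: assigning the new flag discards whatever was set. */
unsigned int demo_set_enc_lossy(unsigned int existing)
{
	(void)existing;
	return DEMO_ENCRYPT_FL;
}

/* Preserving pattern: fetch the existing flags and OR the new one in. */
unsigned int demo_set_enc_keep(unsigned int existing)
{
	unsigned int out = existing | DEMO_ENCRYPT_FL;

	assert((out & existing) == existing);	/* nothing was lost */
	return out;
}
```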

Fixes: 40d91eafe2 ("LU-12275 sec: atomicity of encryption context getting/setting")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I150f2d87cef112beab81d1d2030133671d4b7361
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55317
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months ago LU-17844 debug: purge the final LCONSOLE_ERROR_MSG() 82/55282/3
Timothy Day [Sat, 1 Jun 2024 04:37:13 +0000 (04:37 +0000)]
LU-17844 debug: purge the final LCONSOLE_ERROR_MSG()

Replace the remaining LCONSOLE_ERROR_MSG() with LCONSOLE_ERROR().
Remove the LCONSOLE_ERROR_MSG() macro.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I289ced87e5f77dd3eebcf94ee978e23b79e55698
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55282
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17844 llite: remove all LCONSOLE_ERROR_MSG() 80/55280/3
Timothy Day [Sat, 1 Jun 2024 04:20:36 +0000 (04:20 +0000)]
LU-17844 llite: remove all LCONSOLE_ERROR_MSG()

These magic numbers aren't so magical anymore. Just
use LCONSOLE_ERROR().

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I6409f22f9707428d89cbbe1f92e1b2e11ce6f10a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55280
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17844 mgs: remove all LCONSOLE_ERROR_MSG() 35/55135/3
Timothy Day [Fri, 17 May 2024 00:26:26 +0000 (00:26 +0000)]
LU-17844 mgs: remove all LCONSOLE_ERROR_MSG()

These magic numbers aren't so magical anymore. Just
use LCONSOLE_ERROR().

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I933f7c7acbf85cddd1bdb95e9506cb0f37abdba5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55135
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
2 months agoLU-17852 gss: do not use expired reverse gss contexts 27/55127/9
Sebastien Buisson [Thu, 16 May 2024 09:58:24 +0000 (11:58 +0200)]
LU-17852 gss: do not use expired reverse gss contexts

On the server side, a reverse context matches a gss context established
on the client side. These reverse contexts have an expiration time, and are
replaced with fresh ones when they expire.
So get rid of expired reverse contexts when we find them in the
gsk_clist. And when we look up a context, do not continue using
the current one if it is expired.
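
A minimal sketch of the lookup behaviour described above, assuming a plain
list stands in for gsk_clist. The ReverseContext class, its handle field and
lookup_context() are hypothetical names used only for illustration; the
actual kernel code differs.

```python
import time


class ReverseContext:
    """Hypothetical stand-in for a reverse gss context."""

    def __init__(self, handle, expiry):
        self.handle = handle
        self.expiry = expiry  # absolute expiration time, in seconds

    def expired(self, now):
        return now >= self.expiry


def lookup_context(ctx_list, handle, now=None):
    """Drop expired contexts, then return a live match or None."""
    now = time.time() if now is None else now
    # Purge expired entries in place so later lookups never see them.
    ctx_list[:] = [c for c in ctx_list if not c.expired(now)]
    for ctx in ctx_list:
        if ctx.handle == handle:
            return ctx
    return None
```

A caller that gets None back would then establish a fresh context instead of
reusing a stale one.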

Add sanity-krb5 test_200 to check the expired reverse contexts.

Test-Parameters: kerberos=true testlist=sanity-krb5
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I11f2d8ab298073f9d5bedff187b67f2ca289ae47
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55127
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-13577 tests: fix interop checking for intent_mkdir tests 61/55161/2
Li Dongyang [Tue, 21 May 2024 00:35:48 +0000 (10:35 +1000)]
LU-13577 tests: fix interop checking for intent_mkdir tests

Use the correct interop version for replay-single/137a|b|c,
sanity/852 and sanityn/116.

Change-Id: I0b7fc03c542574dfb468a17719a6be81d738c5b3
Fixes: 668dfb53de ("LU-13577 wbc: reimplement mkdir() by using intent lock")
Test-Parameters: trivial
Test-Parameters: serverjob=lustre-master serverbuildno=4524 testlist=replay-single env=ONLY="137a 137b 137c"
Test-Parameters: serverjob=lustre-master serverbuildno=4524 testlist=sanity env=ONLY=852
Test-Parameters: serverjob=lustre-master serverbuildno=4524 testlist=sanityn env=ONLY=116
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55161
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17404 kernel: new kernel [RHEL 9.4 5.14.0-427.20.1.el9_4] 12/54712/8
Jian Yu [Tue, 11 Jun 2024 06:25:06 +0000 (23:25 -0700)]
LU-17404 kernel: new kernel [RHEL 9.4 5.14.0-427.20.1.el9_4]

This patch makes changes to support the new RHEL 9.4 release
for the Lustre client.

Test-Parameters: trivial \
  mdtcount=4 mdscount=2 clientdistro=el9.4 testlist=sanity
Test-Parameters: optional clientdistro=el9.4 testgroup=full-part-1
Test-Parameters: optional clientdistro=el9.4 testgroup=full-part-2
Test-Parameters: optional clientdistro=el9.4 testgroup=full-part-3

Change-Id: Ic292c01ad16dc06e8dee966c4a211896fea284c0
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54712
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-11085 tests: add a test case for same range lock 65/54565/6
Yang Sheng [Mon, 25 Mar 2024 21:14:39 +0000 (05:14 +0800)]
LU-11085 tests: add a test case for same range lock

Add a new case in ldlm_extent to test the performance of
same-range locks with the interval tree.

Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: Ia3ffe90263a2a7f01a3a44f8801a32fc789b5abc
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54565
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17276 tests: performance test for same range lock 64/54364/5
Yang Sheng [Tue, 12 Mar 2024 17:20:05 +0000 (01:20 +0800)]
LU-17276 tests: performance test for same range lock

Lustre uses an optimized interval tree to manage the
extents. It keeps only one entry in the tree for all locks
with the same range, whereas upstream links every lock into
the tree. This test compares the performance of the
two cases.
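
The difference between the two layouts can be sketched as follows. This is
an illustration only: a dict and a list stand in for the interval tree, and
the function names and the (lock_id, start, end) tuple layout are
assumptions, not the ldlm data structures.

```python
from collections import defaultdict


def build_shared_nodes(locks):
    """One node per distinct (start, end) range; same-range locks share it."""
    nodes = defaultdict(list)
    for lock_id, start, end in locks:
        nodes[(start, end)].append(lock_id)
    return dict(nodes)


def build_per_lock_nodes(locks):
    """One node per lock, even when several locks cover the same range."""
    return [((start, end), lock_id) for lock_id, start, end in locks]
```

For three locks of which two cover the same range, the shared layout holds
two nodes and the per-lock layout holds three, which is the size difference
the performance test is meant to exercise.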

Test-Parameters: trivial testlist=performance-sanity
Test-Parameters: trivial testlist=performance-sanity
Test-Parameters: trivial testlist=performance-sanity
Test-Parameters: trivial testlist=performance-sanity
Test-Parameters: trivial testlist=performance-sanity
Test-Parameters: trivial testlist=performance-sanity
Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: I653ad1fc2cdc1012312d11af77b1e2c133b7c34e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54364
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17480 o2iblnd: add a timeout for rdma_connect 86/53986/7
Etienne AUJAMES [Mon, 5 Feb 2024 14:12:20 +0000 (15:12 +0100)]
LU-17480 o2iblnd: add a timeout for rdma_connect

For a RoCE network, if a RDMA connection request is sent to an
unreachable node, the CM can take >4min to return
CM_EVENT_UNREACHABLE.
This hangs lustre_rmmod if a Lustre router is down.

This patch tracks connection requests and applies a timeout of
lnd_timeout/4 (with a minimum of 5s) to destroy the hanging
connection.

Also, the patch decreases the timeout for
rdma_resolve_addr()/rdma_resolve_route() to 5s (like most of
the upstream drivers: sunrpc, smb).

The default timeouts should be:

lnd_timeout = (transaction_timeout - 1) / (retry_count + 1)
lnd_timeout = (150 - 1) / 3 = 49s
lnd_connreq_timeout = max(5, lnd_timeout / 4) = 12s
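
The arithmetic above can be reproduced with a short sketch. Integer
division is an assumption here, chosen because it matches the 49s and 12s
results quoted above; the function names simply mirror the formulas and
are not taken from the o2iblnd code.

```python
def lnd_timeout(transaction_timeout, retry_count):
    """lnd_timeout = (transaction_timeout - 1) / (retry_count + 1)."""
    return (transaction_timeout - 1) // (retry_count + 1)


def lnd_connreq_timeout(lnd_to):
    """Connection request timeout: a quarter of lnd_timeout, at least 5s."""
    return max(5, lnd_to // 4)
```

With transaction_timeout=150 and retry_count=2 this gives 49 and then 12.
For a small lnd_timeout such as 16, the 5s minimum applies.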

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: I09e40ffaa75424c4acca1d0cf986e1ff9c6dc96b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53986
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17478 clio: parallelize unaligned DIO write copy 44/53844/16
Patrick Farrell [Wed, 24 Apr 2024 22:26:04 +0000 (18:26 -0400)]
LU-17478 clio: parallelize unaligned DIO write copy

The data copying for unaligned/hybrid IO reads is already
parallel because it is done by the ptlrpc threads at the
end of the IO.  That for the writes is not - it is done
by the submitting thread during IO submission.

This has a huge performance impact, limiting writes to
around 3.0 GiB/s when reads are at 12 GiB/s.

With the iov iter issue fixed, we can do this copy in the
ptlrpcd threads instead of during IO submission.

With this and the patch to use page pools for buffer
allocation (https://review.whamcloud.com/53670), the
maximum performance of hybrid IO is basically the same as
DIO, at least for current speeds.

This means hybrid reads and writes at 20 GiB/s with
current master + this and the pool patch.

Note this requires a funny workaround: If a user thread
calls fsync while a DIO write is in progress, the user
thread can pick that write up at the RPC layer and become
responsible for writing it out, even though that write
isn't in cache.  (Because the write is waiting to be
picked up by a ptlrpcd thread.)

If that DIO write is unaligned, the user thread is unable
to do the memory copy.  It's not an option to have the
thread ignore a ready RPC, so instead we spawn a kthread
to handle this case.

This only occurs when DIO is racing with fsync, so
performance doesn't really matter, and this cost is OK.

Signed-off-by: Patrick Farrell <patrick.farrell@oracle.com>
Change-Id: Ic8209e1fda97cda83e5b87baba48d15dd4dcc15f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53844
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 months agoLU-17000 utils: Fix negative argument passed 96/53796/10
Arshad Hussain [Mon, 22 Jan 2024 16:09:19 +0000 (21:39 +0530)]
LU-17000 utils: Fix negative argument passed

This patch fixes a bunch of "Argument cannot be negative"
issues reported by Coverity.

CoverityID: 397713 ("Argument cannot be negative")
CoverityID: 397899 ("Argument cannot be negative")
Pass -rc to strerror() since rc is guaranteed to be negative.

CoverityID: 397769 ("Argument cannot be negative")
read() could return -1 on error. Check before using the value

CoverityID: 397780 ("Argument cannot be negative")
Use static allocation removing the need to call sysconf()

CoverityID: 403104 ("Argument cannot be negative")
On failure to open do not try to call close()
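
The first fix relies on the convention that rc holds a negative errno
value, so it has to be negated before being turned into a message. A
minimal Python sketch of that convention (describe_error() is a
hypothetical helper, not part of the Lustre utilities):

```python
import errno
import os


def describe_error(rc):
    """Return the message for a negative errno-style return code."""
    if rc >= 0:
        raise ValueError("rc must be a negative errno value")
    # strerror() expects the positive errno, hence the negation.
    return os.strerror(-rc)
```

For example, describe_error(-errno.ENOENT) returns the same text as
os.strerror(errno.ENOENT).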

Fixes: 58d744e3 (LU-10092 pcc: Non-blocking PCC caching)
Fixes: f172b116 (LU-10092 llite: Add persistent cache on client)
Fixes: aed82919 (LU-16029 utils: add options to lr_reader to parse raw files)
Fixes: 86ba46c2 (LU-9680 obdclass: user netlink to collect devices information)
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Id51c6c184d30dce596e7ab948a6fd0768eed1503
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53796
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
2 months agoLU-17246 build: Refine SUSE 15 SP3 ldiskfs version 43/52943/3
Shaun Tancheff [Wed, 1 Nov 2023 09:52:49 +0000 (04:52 -0500)]
LU-17246 build: Refine SUSE 15 SP3 ldiskfs version

SUSE linux-5.3.18-150300.59.60_7.0.4.15 should select
5.3.18-sles15sp3.series however a recent change maps
linux-5.3.18-150300.59.* to 5.3.18-sles15sp3-59.series
which does not apply.

Lustre should also consider the patch level [Ex: 60]
and map 59.60* to 5.3.18-sles15sp3.series while
61 and later should map to 5.3.18-sles15sp3-59.series

The sles15sp3*.series should also include
  base/ext4-delayed-iput.patch

Test-Parameters: trivial
HPE-bug-id: LUS-11910
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I68d6ecfc94e9edf7c657e743fbdbdb58b39bf191
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52943
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 months agoLU-17579 build: drop SUSE in-kernel ofed special handling 76/54176/5
Shaun Tancheff [Wed, 28 Feb 2024 16:00:26 +0000 (23:00 +0700)]
LU-17579 build: drop SUSE in-kernel ofed special handling

SUSE special case handling for mofed does not apply to in-kernel
ofed build.

Drop the complicated search and filter looking for Module.symvers

Test-Parameters: trivial
Fixes: 8b1d2a72f1 ("LU-16967 build: Add in-kernel-ko2iblnd driver")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I90c3de9020a26bdbc12c0ca859ada722b7c73c94
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54176
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Caleb Carlson <caleb.carlson@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-16350 ldiskfs: Server support for LTS linux v6.6 19/52919/17
Shaun Tancheff [Mon, 10 Jun 2024 18:25:38 +0000 (11:25 -0700)]
LU-16350 ldiskfs: Server support for LTS linux v6.6

Migrate uapi ext4 headers into staging for ldiskfs

Updated patch series for Linux LTS v6.6.10
   ext4-attach-jinode-in-writepages.patch
   ext4-dont-check-before-replay.patch
   ext4-mballoc-pa-free-mismatch.patch
   ext4-pdirop.patch
   ext4-prealloc.patch
   ext4-corrupted-inode-block-bitmaps-handling-patches.patch
   ext4-delayed-iput.patch
   ext4-encdata.patch
   ext4-ialloc-uid-gid-and-pass-owner-down.patch
   ext4-mballoc-extra-checks.patch
Dropped:
   ext4-add-periodic-superblock-update.patch

Test-Parameters: trivial
HPE-bug-id: LUS-11376
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I2a0a5d4be1e724ed1936178ccc3f7a7e7a2672c7
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52919
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17085 llite: safely duplicate iov_iter 66/52266/25
Shaun Tancheff [Tue, 4 Jun 2024 23:18:34 +0000 (06:18 +0700)]
LU-17085 llite: safely duplicate iov_iter

When copying an iov_iter, the underlying iovec/bvec also
needs to be duplicated.
Discard and xarray iters are not relevant here; however, the
memory allocated needs to be tracked and freed.

This borrows heavily from dup_iter()

The iter argument to ll_dio_user_copy() is no longer used
and can be removed.

Explicitly handle ubuf() cases to work with el9.4 back-port
of ITER_UBUF series.

Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I9f68eaa14abc8915d543dba91eea598edbd9872d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52266
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <patrick.farrell@oracle.com>
Reviewed-by: xinliang <xinliang.liu@linaro.org>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-14712 ldiskfs: introduce EXT4_BG_TRIMMED to optimize fstrim 23/51923/9
Li Dongyang [Fri, 11 Aug 2023 05:32:31 +0000 (15:32 +1000)]
LU-14712 ldiskfs: introduce EXT4_BG_TRIMMED to optimize fstrim

Currently the flag indicating that a block group has been trimmed is not
persistent, so the trim status is lost after remount. As
a result, fstrim cannot skip the already trimmed groups, which
can be slow on very large devices.

This patch introduces a new block group flag, EXT4_BG_TRIMMED.
It requires 1 extra block group descriptor write after trimming each
block group.
When clearing the flag, the block group descriptor is already
journalled, so there is no extra overhead.

Add a new super block flag EXT2_FLAGS_TRACK_TRIM, to indicate if
we should honour EXT4_BG_TRIMMED when doing fstrim.
The new super block flag can be turned on/off via tune2fs.
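
How the two flags interact can be sketched as follows. This is a
simplified illustration, not the ldiskfs code: the constant values are
placeholders and groups_to_trim() is a hypothetical helper.

```python
# Placeholder bit values, for illustration only.
BG_TRIMMED = 0x1      # stands in for the EXT4_BG_TRIMMED group flag
SB_TRACK_TRIM = 0x1   # stands in for the EXT2_FLAGS_TRACK_TRIM sb flag


def groups_to_trim(sb_flags, group_flags):
    """Return the indices of the block groups that fstrim should visit."""
    honour = bool(sb_flags & SB_TRACK_TRIM)
    return [
        index
        for index, flags in enumerate(group_flags)
        # Skip a group only when tracking is on and it is marked trimmed.
        if not (honour and flags & BG_TRIMMED)
    ]
```

With tracking enabled, groups already marked as trimmed are skipped. With
tracking disabled, every group is visited regardless of its flag.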

Change-Id: I7faaca4754b1726ad05d0aafe3e90e0e9f591617
Test-Parameters: trivial
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51923
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-6142 utils: Fix style issues for routerstat.c 20/55420/2
Arshad Hussain [Wed, 12 Jun 2024 05:56:51 +0000 (01:56 -0400)]
LU-6142 utils: Fix style issues for routerstat.c

This patch fixes issues reported by checkpatch
for file lnet/utils/routerstat.c

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I679fcf0530b1ce25c00fa35c05a904d5b0a4ecb8
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55420
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17941 ofd: do not copy over filter_fid structure 08/55408/4
Bobi Jam [Mon, 17 Jun 2024 17:20:33 +0000 (10:20 -0700)]
LU-17941 ofd: do not copy over filter_fid structure

When a bigger filter_fid has been written to disk by a newer server,
a downgraded Lustre would read more data, but it needs to store less to
fit the smaller filter_fid structure.

Test-Parameters: serverdistro=el8.10
Fixes: 28c366cee6d ("LU-17218 ofd: improve filter_fid upgrade compatibility")
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Idb5c8fffe4af22f35b64aa93e7efce7f9dd206d6
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55408
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17440 tests: Fix sanity-lnet/test_226 97/55397/4
Chris Horn [Tue, 11 Jun 2024 17:55:10 +0000 (11:55 -0600)]
LU-17440 tests: Fix sanity-lnet/test_226

test_226 is not deleting a route from the remote peer as intended.
This is because of two issues. The first issue is that the wrong
gateway NID is being provided to do_route_del(). The second issue is
that do_route_del() is not correctly quoting the remote command to
detect and delete routes. The correct NID is now specified and
the quoting in do_route_del() has been fixed. Some additional
output was also added so that we can verify via test output that it is
working correctly.

Test-Parameters: trivial testlist=sanity-lnet
Fixes: 2b210f3905 ("LU-17440 lnet: prevent errorneous decref for asym route")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ie21a1bdb174cb65ae64134a727d69e1c95a4ddd5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55397
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Gian-Carlo DeFazio <defazio1@llnl.gov>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17911 lustre: fix faked flexible arrays in getinfo_fid2path 54/55354/5
Bruno Faccini [Fri, 7 Jun 2024 09:22:44 +0000 (11:22 +0200)]
LU-17911 lustre: fix faked flexible arrays in getinfo_fid2path

Faked (0-length) flexible arrays need some rework to comply with
the new coding rules and stay Fortify feature compliant (see the document
at https://people.kernel.org/kees/bounded-flexible-arrays-in-c).
With this particular getinfo_fid2path struct content, we ended up
with generated code that crashed on each call of the
lmv_fid2path routine, at its first call to strlen().

Signed-off-by: Bruno Faccini <bfaccini@nvidia.com>
Change-Id: Id6f594779ca0ae86f0c2842535abccbf4df688d3
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55354
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-6142 contrib: update checkpatch.pl to 6.10-rc2 53/54153/4
Timothy Day [Sat, 24 Feb 2024 18:05:14 +0000 (18:05 +0000)]
LU-6142 contrib: update checkpatch.pl to 6.10-rc2

checkpatch.pl hasn't been updated in around 7 years. To make sure
we don't waste our efforts, update the script while porting a few
Lustre specific changes.

This updates checkpatch.pl to 6.10-rc2 while recording the changes
in contrib/.

This is mostly a simple port of the old changes to a new version of
checkpatch.pl. It is intended to silence a few false positives and
generate a few Lustre specific warnings.

One feature has been added: the ability to differentiate between
user space code and kernel code when doing spell checks.

I tested this by running the updated script across the entire tree,
ensuring that nothing failed:

contrib:         loc=192    errors/warnings=1
libcfs:          loc=25117  errors/warnings=1624
lnet:            loc=133290 errors/warnings=16381
lustre:          loc=645087 errors/warnings=81360

Also, add a userspace_spelling.txt file by copying spelling.txt. We
start by simply duplicating the old spelling.txt file while removing
a number of entries.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Ia39c7f04d2407786f158d7f6f95968e182e99ebf
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54153
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-6142 contrib: Lustre checkpatch.pl diff 54/54154/4
Timothy Day [Sat, 24 Feb 2024 02:14:53 +0000 (02:14 +0000)]
LU-6142 contrib: Lustre checkpatch.pl diff

Record the changes made to vanilla checkpatch.pl.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I095d435c66425476be34483661137a1359f77911
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54154
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-10391 lnet: Fault injection add/del ioctls to netlink 32/53732/8
Chris Horn [Sun, 16 Jun 2024 13:28:20 +0000 (09:28 -0400)]
LU-10391 lnet: Fault injection add/del ioctls to netlink

Convert the fault injection add/del ioctls to a netlink
implementation.

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I20f38d4e7c0215a1b19772c6253c617174c0b00c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53732
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-9680 lnet: Fault injection list ioctls to netlink 33/53733/11
Chris Horn [Wed, 12 Jun 2024 14:12:34 +0000 (10:12 -0400)]
LU-9680 lnet: Fault injection list ioctls to netlink

Convert the fault injection list ioctls to a netlink implementation.

sanity-lnet tests that use fault injection can now be enabled for
large NIDs.

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ieaf9c01401fc0841c1e5805667531ba3455e8110
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53733
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17525 llite: unaligned DIO interop page alignment 97/53997/38
Shaun Tancheff [Fri, 31 May 2024 15:58:30 +0000 (09:58 -0600)]
LU-17525 llite: unaligned DIO interop page alignment

Correctly size brw/ptlrpc bulk ops I/O between archs with differing
page sizes (ex: 64k client and 4k server).

Since the number of MDs needed for a bulk is not stable until all the
pages are added, the interop calculation has two parts.

In the first step, if this is an unaligned dio bulk with a 64k offset
greater than 4k in size, calculate the initial MD utilization
as if the first partial page is 64k in length.

In the second step, after the bulk is sized and split across 1 or more
MDs:
 - if the number of MDs is 1, clear the interop flag
 - if the number of MDs is 3 or more, keep the interop flag
 - if the number of MDs is 2 and the size with the 64k offset does
   not exceed the LNET_MTU then collapse the extra MD.
This is done by assuming the first page is 64k length.
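
The second step reads as a small decision table, sketched below. This is
an interpretation of the description above, not the patch itself:
second_step_action() and its return strings are hypothetical, and the
1 MiB LNET_MTU value is an assumption.

```python
LNET_MTU = 1 << 20  # assumed to be 1 MiB; check the LNet headers


def second_step_action(md_count, size_with_64k_offset):
    """Map the MD count after sizing to the action described above."""
    if md_count == 1:
        return "clear_interop_flag"
    if md_count >= 3:
        return "keep_interop_flag"
    if md_count == 2 and size_with_64k_offset <= LNET_MTU:
        return "collapse_extra_md"
    # The remaining cases are not described in the commit message.
    return None
```

The None branch is deliberate: the text does not say what happens for two
MDs whose size exceeds LNET_MTU, so the sketch does not guess.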

Additionally fixup OBD_CONNECT2_UNALIGNED_DIO and add a ZFS osd
heuristic check. No unaligned DIO should be performed with older
zfs-osd, however unaligned dio with unpatched ldiskfs servers is
allowed for most cases. I/O that would trigger the page
size interop issue now fails with -EINVAL, where previously
such I/O would fail when sending, with an error:

LNetError: 7386:0:(lib-ptl.c:189:lnet_try_match_md()) Matching packet
   from 12345-10.240.22.81@tcp, match 1789613636069888 length 1044481
   too big: 983041 left, 983041 allowed

triggering the MD to be resent; however, the size calculations remain
unchanged, resulting in a hang.

Test-Parameters: testlist=sanity clientarch=aarch64 clientdistro=el9.3
Test-Parameters: testlist=sanity clientarch=ppc64le clientdistro=el8.8 env=SANITY_EXCEPT="398c 411b"
Test-Parameters: testlist=sanity serverversion=2.15.4 serverdistro=el8.9 env=SANITY_EXCEPT="17n 24g 27R 56oc 56wc 162c 230b 230c 230x 230t 273c 300i"
Test-Parameters: testlist=sanity clientarch=aarch64 clientdistro=el9.3 serverversion=2.15.4 serverdistro=el8.9 env=SANITY_EXCEPT="27R 56oc 56wc 160a 160l 162c 230c 230m 230t 230x 273c 300i"
Test-Parameters: testlist=sanity clientarch=aarch64 clientdistro=el9.3 envdefinitions=ONLY="119e 119f 119g 119h 119i"
Test-Parameters: testlist=sanity clientarch=aarch64 clientdistro=el9.3 envdefinitions=ONLY="119m 119m 119n 119o 119p 119q"
Test-Parameters: testlist=sanity clientarch=x86_64 clientdistro=el9.3 envdefinitions=ONLY="119e 119f 119g 119h 119i"
Test-Parameters: testlist=sanity clientarch=x86_64 clientdistro=el9.3 envdefinitions=ONLY="119m 119m 119n 119o 119p 119q"
Fixes: 7194eb6431 ("LU-13805 clio: bounce buffer for unaligned DIO")
Fixes: 0e6e60b123 ("LU-13805 llite: Implement unaligned DIO connect flag")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ifb5152b7ebaba696e6f2cef3af43b0ecd5e53d94
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53997
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: xinliang <xinliang.liu@linaro.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Patrick Farrell <patrick.farrell@oracle.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 months agoLU-16692 tests: force_new_seq_all interop version checking 40/54840/4
Li Dongyang [Thu, 18 Apr 2024 11:10:39 +0000 (21:10 +1000)]
LU-16692 tests: force_new_seq_all interop version checking

force_new_seq_all is still needed in those test suites
when testing against servers that don't have v2_15_61-226-gf00d2467fc

Test-Parameters:trivial serverjob=lustre-master serverbuildno=4516 testlist=replay-single,replay-ost-single,replay-dual,recovery-small,replay-vbr,sanity-pfl

Change-Id: Iab963ac10308b56a60508774c1a63bcdfffdba85
Fixes: 9ef186b71b ("LU-16692 tests: remove force_new_seq from some test suites")
Fixes: f00d2467fc ("LU-16692 osp: do not assert on seq got over network")
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54840
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17888 osd-ldiskfs: osd_scrub_refresh_mapping deadlock 67/55267/4
Alexander Zarochentsev [Thu, 30 May 2024 16:23:25 +0000 (16:23 +0000)]
LU-17888 osd-ldiskfs: osd_scrub_refresh_mapping deadlock

After copying a lustre special file (last_rcvd for example)
to a new inode, lustre mount hangs with the following stack trace:

[root@testbed ~]# cat /proc/$(pidof mount.lustre)/stack
[<0>] rwsem_down_write_slowpath+0x32a/0x610
[<0>] osd_obj_update_entry.isra.22+0xb7/0x900 [osd_ldiskfs]
[<0>] osd_obj_spec_update+0x146/0x160 [osd_ldiskfs]
[<0>] osd_scrub_refresh_mapping+0x282/0x420 [osd_ldiskfs]
[<0>] osd_ios_scan_one+0x5df/0xe10 [osd_ldiskfs]
[<0>] osd_ios_root_fill+0x267/0x300 [osd_ldiskfs]
[<0>] call_filldir+0xb0/0x120 [ldiskfs]
[<0>] ldiskfs_readdir+0x7a7/0xac0 [ldiskfs]
[<0>] iterate_dir+0x13c/0x190
[<0>] osd_ios_general_scan+0x10e/0x250 [osd_ldiskfs]
[<0>] osd_initial_OI_scrub+0x72/0x920 [osd_ldiskfs]
[<0>] osd_scrub_setup+0x8ab/0x9e0 [osd_ldiskfs]
[<0>] osd_device_init0+0x447/0x810 [osd_ldiskfs]
[<0>] osd_device_alloc+0x186/0x220 [osd_ldiskfs]
[<0>] obd_setup+0x115/0x2d0 [obdclass]
[<0>] class_setup+0x57f/0x790 [obdclass]
[<0>] class_process_config+0x1104/0x2460 [obdclass]
[<0>] do_lcfg+0x21d/0x530 [obdclass]
[<0>] lustre_start_simple+0x77/0x1d0 [obdclass]
[<0>] osd_start+0x408/0x7f0 [obdclass]
[<0>] server_fill_super+0x382/0x10d0 [obdclass]
[<0>] lustre_fill_super+0x3a1/0x3f0 [lustre]
[<0>] mount_nodev+0x48/0xa0
[<0>] legacy_get_tree+0x27/0x40
[<0>] vfs_get_tree+0x25/0xb0
[<0>] do_mount+0x2e2/0x950
[<0>] ksys_mount+0xb6/0xd0
[<0>] __x64_sys_mount+0x21/0x30
[<0>] do_syscall_64+0x5b/0x1a0
[<0>] entry_SYSCALL_64_after_hwframe+0x65/0xca

An attempt is made to take the root inode lock twice:
once in iterate_dir() and again in
osd_obj_update_entry().
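
The self-deadlock can be illustrated with a user-space analogy. In this
sketch, a non-reentrant threading.Lock stands in for the inode rwsem and
the second acquisition is made non-blocking so that the example returns
instead of hanging. nested_acquire_would_block() is a hypothetical helper,
not kernel code.

```python
import threading


def nested_acquire_would_block(lock):
    """Hold lock, then try to take it again as a nested callee would."""
    with lock:
        got = lock.acquire(blocking=False)
        if got:
            lock.release()
        return not got
```

With a plain Lock the nested attempt fails, which corresponds to the hang
seen in the stack trace. With a reentrant RLock the nested attempt
succeeds.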

HPE-bug-id: LUS-12368
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: Idc5f9bd2a20d25dfb5eb4a044ddd00ff7eb4558b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55267
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17914 lnet: Fix erroneous net set error 44/55344/2
Chris Horn [Thu, 6 Jun 2024 21:43:09 +0000 (15:43 -0600)]
LU-17914 lnet: Fix erroneous net set error

lnetctl net set --health command reports a false error. This is
because in lnet_genl_parse_local_ni() the last value stored in rc is
from nla_strscpy() which can return a positive value.
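
The bug pattern, a positive "characters copied" result leaking out as the
function's return code, can be sketched as follows. copy_string() is a
hypothetical stand-in for a strscpy-style helper, and parse_buggy() and
parse_fixed() are illustrative names rather than the LNet functions.

```python
import errno


def copy_string(src, dst_size):
    """Return the number of characters copied, or -E2BIG if src won't fit."""
    if len(src) >= dst_size:
        return -errno.E2BIG
    return len(src)


def parse_buggy(src, dst_size):
    """Returns the raw copy result, so success looks like a non-zero rc."""
    rc = copy_string(src, dst_size)
    return rc


def parse_fixed(src, dst_size):
    """Only propagate real errors; a positive length means success."""
    rc = copy_string(src, dst_size)
    return rc if rc < 0 else 0
```

A caller that treats any non-zero return as failure reports a false error
with parse_buggy(), but not with parse_fixed().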

Test-Parameters: trivial testlist=sanity-lnet
Fixes: fff650726b ("LU-13642 lnet: Allow dynamic IP specification")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I86211a1083af4b225076f966d2e0c7793589a33a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55344
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17913 iokit: Fix lst.sh host mode 43/55343/2
Chris Horn [Thu, 6 Jun 2024 20:55:08 +0000 (14:55 -0600)]
LU-17913 iokit: Fix lst.sh host mode

The ssh command used to get primary NIDs of each test node uses
single quotes. This prevents the LCTL variable from being expanded,
so the command fails. Use double quotes around this command and
make sure we return an error (non-zero) status when we are unable
to determine primary NIDs of test nodes.

Test-Parameters: trivial testlist=sanity-lnet
Fixes: c265e1c7b0 ("LU-16861 obdfilter: Exclude quotes when getting NIDs")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ic17d8a22baf93e205c6d7e12a0079f93222013e3
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55343
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17894 lnet: Initialize common lnd params 41/55341/4
Chris Horn [Thu, 6 Jun 2024 19:35:29 +0000 (13:35 -0600)]
LU-17894 lnet: Initialize common lnd params

This ensures the correct default values are used when the user does
not specify a value for them.

Test-Parameters: trivial testlist=sanity-lnet
Fixes: 7e01787863 ("LU-12452 lnet: allow to set IP ToS value per-NI")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I201e9acbc3be8e27cd8957b722c8cea9a64de6c1
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55341
Reviewed-by: Etienne AUJAMES <eaujames@ddn.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 months agoLU-17895 lnet: Validate input for lnetctl import 40/55340/3
Chris Horn [Thu, 6 Jun 2024 16:55:39 +0000 (10:55 -0600)]
LU-17895 lnet: Validate input for lnetctl import

Add validation of the input being parsed by lnetctl import when it is
expecting the global parameters. This resolves segmentation faults
when it encounters bad or missing keys and values.

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I232daed773283ea89e9815bf6bd79526dfd5fd4f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55340
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-15644 llog: don't report warning in no error case 35/55335/2
Mikhail Pershin [Thu, 6 Jun 2024 11:12:00 +0000 (14:12 +0300)]
LU-15644 llog: don't report warning in no error case

Fix the check that wrongly reports a warning for the valid rc == 0 case.

Fixes: bd9839f7db (LU-15644 llog: don't replace llog error with -ENOTDIR)
Test-Parameters: trivial
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Id6e7b2cd42b4769765c67d418552a13f048ea050
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55335
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 months agoLU-17000 utils: Call yaml_parser_scan before using 'token' 30/55330/4
Arshad Hussain [Thu, 6 Jun 2024 04:38:30 +0000 (00:38 -0400)]
LU-17000 utils: Call yaml_parser_scan before using 'token'

Function 'yaml_parser_scan' is used to initialize and get
the next token. Call yaml_parser_scan() before using token.

Test-Parameters: trivial testlist=sanity-lnet
CoverityID: 403112 ("Uninitialized scalar variable")
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ieeb3d50a32690dc892a831fb60a9e85e4ec05113
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55330
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17000 gss: update init_channel initialization 22/55322/4
Sebastien Buisson [Wed, 5 Jun 2024 13:50:41 +0000 (15:50 +0200)]
LU-17000 gss: update init_channel initialization

Update 'init_channel' to match newer coding guidelines.

DDN-Bug-Id: EX-9705
Test-Parameters: trivial
Test-Parameters: testgroup=review-dne-selinux-ssk-part-2
Test-Parameters: testgroup=review-dne-part-2
Test-Parameters: kerberos=true testlist=sanity-krb5
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I6539ade1a9d815664f6659a5c1ee25e7f1f7df0e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55322
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
2 months agoLU-15565 utils: updated lfs getstripe yaml format 11/55311/6
Frederick Dilger [Fri, 31 May 2024 04:28:27 +0000 (00:28 -0400)]
LU-15565 utils: updated lfs getstripe yaml format

For composite files 'lfs getstripe --yaml' used a key for each
component rather than an array. This hindered the use of the
output with common YAML tooling. The leading spaces meant users
needed to pre-process the output for seemingly no reason. Each
component also had the id hardcoded into the name rather than
using a more flexible list ('components:').

'lfs setstripe --yaml YAML_FILE file|dir' is still compatible
with the previous YAML format in case there were stored files
being used.

The new YAML formatting has been checked with results from
"https://zhwt.github.io/yaml-to-go/".

Modified the argument parsing in 'lfs_migrate' to be more
robust as it was failing tests 56[w[a-c],x[b,c]].

Signed-off-by: Frederick Dilger <fdilger@whamcloud.com>
Change-Id: I50ff4ebd9413fb66f05647c11542f7ce9f1ba879
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55311
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17897 lfsck: don't assert on orphan existence 02/55302/2
Lai Siyao [Fri, 17 May 2024 09:40:23 +0000 (05:40 -0400)]
LU-17897 lfsck: don't assert on orphan existence

lfsck_namespace_create_orphan_dir() is called in several cases,
and the orphan may already exist in some of them, so change the
assertion to a check.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I28563aa60d0f345616fd30cd0899495e7c1ef8f0
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55302
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17000 lnet: Initilize lnet_nidlist local variable 90/55290/2
Arshad Hussain [Mon, 3 Jun 2024 07:37:33 +0000 (03:37 -0400)]
LU-17000 lnet: Initilize lnet_nidlist local variable

Initialize the variable lnet_nidlist in
lustre_lnet_modify_peer(). When "nids" is false,
lustre_lnet_mod_peer_nidlist() is called with
lnet_nidlist uninitialized.

Test-Parameters: trivial testlist=sanity-lnet
CoverityID 397609: ("Uninitialized scalar variable")
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I07de6a637c4567daed4ce7b35c54a1eed4d9bd2f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55290
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17892 lnet: Fix export with empty nets 79/55279/3
Chris Horn [Fri, 24 May 2024 17:32:17 +0000 (11:32 -0600)]
LU-17892 lnet: Fix export with empty nets

lnetctl export --backup should not print an error when there haven't
been any NIs added.

Test-Parameters: trivial
Fixes: 8f8f6e2f36 ("LU-10003 lnet: use Netlink to support old and new NI APIs.")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Id9e916d2d70d5dc01442e24449cb787c5a6a7e1d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55279
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 months agoLU-17416 utils: option for lctl get_param to skip links 36/55236/12
Frederick Dilger [Mon, 27 May 2024 21:38:51 +0000 (17:38 -0400)]
LU-17416 utils: option for lctl get_param to skip links

Added new --links and --no-links options for 'lctl get_param' and
'lctl list_param' to avoid following symlinks. Useful when combined
with a command like "lctl get_param -R '*'" which can dump a lot of
duplicate data due to symlinks under lov.*.target_obds and
lmv.*.target_obds pointing back to their respective osc.* and mdc.*
trees. By default --links is enabled so that lctl get_param
continues to operate as it did before this patch.

Additionally, long options have been added for all previous options
in {list, get, set}_param to update command options to current
standards. This will also facilitate adding new options in the future
as well as code maintenance and readability.

Signed-off-by: Frederick Dilger <fdilger@whamcloud.com>
Change-Id: I24115835f5045623f78fa2045dc3e0ce0b795316
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55236
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-12706 tests: sanity-quota 4a sync timeout fix 16/55216/3
Sergey Cheremencev [Mon, 27 May 2024 22:49:24 +0000 (01:49 +0300)]
LU-12706 tests: sanity-quota 4a sync timeout fix

Don't sync all OSTs in a system - this might take
too much time. Instead, set striping only on OST0000
and sync only MDTs and OST0000. This fix is against
the following failure:

  FAIL: Passed grace time 20, 15669105271566910563

Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Change-Id: I525e6c73c6d14a126a2bde7d92bc28f11f3c78c8
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55216
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17870 lu: delete lu_ref forever 82/55182/5
Timothy Day [Thu, 23 May 2024 03:28:51 +0000 (03:28 +0000)]
LU-17870 lu: delete lu_ref forever

Remove lu_ref infrastructure forever. This debugging infrastructure
is often broken and doesn't correspond with the actual reference
counting used to manage object lifetimes. Hence, when a real bug
is encountered (i.e. some thread isn't releasing a reference),
this code (assuming it happens to be working) can't actually help
debug the issue.

Recently, I was debugging an issue with ld_ref counting. Naturally,
I turned to the debugging code available already in Lustre. I was
dismayed to find that it was more broken than the code I was already
debugging. Rather than debug the debugging code, I think it's
better to cast it away.

Most compelling, the builds used by Maloo and Gerrit Janitor don't
enable this feature. So it can be broken for long periods of time
without anyone noticing.

Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I8eaa6d8518f642adebb612ec3fa780b584366f4f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55182
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17872 ldlm: switch to read_positive in reclaim_full 41/55141/7
Patrick Farrell [Wed, 29 May 2024 20:41:54 +0000 (16:41 -0400)]
LU-17872 ldlm: switch to read_positive in reclaim_full

Checking reclaim full for every lock request is expensive;
it requires taking a global spinlock and can completely
clog the MDS CPU on larger systems.

If we switch to read_positive rather than sum_positive for
our counter read, we avoid this spinlock at the cost of
being off by as much as NR_CPU*32.

Since the counter is for hundreds of thousands to millions
of items and just triggers memory reclaim, this level of
error is completely fine.
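
The trade-off can be illustrated with a small user-space model of a batched
per-CPU counter. This is a sketch under assumptions, not the kernel's
percpu_counter code: the struct, the function names, NR_CPUS = 4, and the
single-threaded simulation of "CPUs" are all made up for illustration, and the
batch of 32 is taken from the NR_CPU*32 bound quoted above.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 4	/* illustrative CPU count (assumption) */
#define BATCH   32	/* per-CPU slack before folding */

struct batched_counter {
	int64_t global;		/* folded total, cheap to read */
	int32_t local[NR_CPUS];	/* per-CPU deltas not yet folded */
};

/* Add on one CPU; fold into the global total once the local delta
 * reaches the batch size. */
static void counter_add(struct batched_counter *c, int cpu, int32_t n)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	c->local[cpu] += n;
	if (c->local[cpu] >= BATCH || c->local[cpu] <= -BATCH) {
		c->global += c->local[cpu];
		c->local[cpu] = 0;
	}
}

/* Cheap read: looks at the global total only, so it can be off by
 * less than NR_CPUS * BATCH. */
static int64_t counter_read_positive(const struct batched_counter *c)
{
	return c->global > 0 ? c->global : 0;
}

/* Exact read: walks every per-CPU slot. In the kernel this walk is
 * done under a lock, which is the cost the patch avoids. */
static int64_t counter_sum_positive(const struct batched_counter *c)
{
	int64_t sum = c->global;
	int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		sum += c->local[cpu];
	return sum > 0 ? sum : 0;
}
```

After 40 single increments on each of the 4 simulated CPUs, the cheap read
sees only the folded 128 while the exact sum is 160. The gap of 32 stays below
NR_CPUS * BATCH, which is negligible for a counter that tracks hundreds of
thousands of items.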

This resolves the contention issue, on an OCI system with
384 cores, here's our mdtest comparison:

Operation           | Without Patch | With Patch  | %Change
--------------------|---------------|-------------|--------
Directory creation  | 69481.994     | 64373.060   | -7%
Directory stat      | 87942.757     | 274670.454  | 212%
Directory rename    | 78127.922     | 92592.239   | 19%
Directory removal   | 69901.490     | 89560.415   | 28%
File creation       | 62789.774     | 107294.450  | 71%
File stat           | 88039.061     | 480469.711  | 446%
File read           | 82192.370     | 151117.380  | 84%
File removal        | 146690.828    | 127589.655  | -13%
Tree creation       | 46.549        | 56.992      | 22%
Tree removal        | 51.531        | 53.967      | 5%

Note the *446%* improvement in file stat and the 71% and 84%
improvements in file creation and read.

Note this issue is likely much worse on systems with higher
core counts since the cost of summing the counter scales
with the number of CPUs.  This may be why this has not been
seen before.

Signed-off-by: Patrick Farrell <patrick.farrell@oracle.com>
Change-Id: I01a39abf5e6f0829156b413b1f44001e2c504be2
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55141
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: wangdi <di.d.wang@oracle.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17844 ptlrpc: remove all LCONSOLE_ERROR_MSG() 36/55136/3
Timothy Day [Fri, 17 May 2024 00:30:57 +0000 (00:30 +0000)]
LU-17844 ptlrpc: remove all LCONSOLE_ERROR_MSG()

These magic numbers aren't so magical anymore. Just
use LCONSOLE_ERROR().

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Ica8ebd06b7e8ea8c7eb00181ab3c0b06de2481ca
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55136
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17844 obdclass: remove all LCONSOLE_ERROR_MSG() 34/55134/3
Timothy Day [Fri, 17 May 2024 00:15:25 +0000 (00:15 +0000)]
LU-17844 obdclass: remove all LCONSOLE_ERROR_MSG()

These magic numbers aren't so magical anymore. Just
use LCONSOLE_ERROR().

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Ib5310f65cda7a3537837e8a38801e6d1771d4759
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55134
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-10026 csdc: reserve OBD_BRW_SPECULATIVE_COMPR flag 04/55104/5
Artem Blagodarenko [Tue, 14 May 2024 12:25:19 +0000 (08:25 -0400)]
LU-10026 csdc: reserve OBD_BRW_SPECULATIVE_COMPR flag

DIO does not set KMS like buffered IO, and the KMS it sets
is not safe.  So this requires special handling for last
chunk compression.

Since we can't know when we're doing the last chunk with
DIO, the solution is as follows:
If a DIO write is chunk aligned at the start but not a full
chunk, we compress it but mark it 'speculative'.  Then the
server double checks that the write is beyond current file
size, and if it's not, it will ask the client to do a
resend, and the client will send the data back
uncompressed.
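
The decision described above can be sketched as two small predicates. This is
a simplified illustration, not the Lustre implementation: the function names
are hypothetical, the offset passed to the server check is assumed to be the
start of the partially covered chunk, and reading "beyond current file size"
as "at or past the file size" is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Client side: the write starts on a chunk boundary but does not
 * end on one, so the trailing chunk is only partly covered and the
 * client cannot know whether it is the last chunk of the file. */
static bool dio_write_is_speculative(uint64_t offset, uint64_t len,
				     uint64_t chunk_size)
{
	assert(chunk_size > 0);
	return len > 0 && offset % chunk_size == 0 &&
	       len % chunk_size != 0;
}

/* Server side: keep the compressed data only when the write starts
 * at or past the current file size (assumed reading of "beyond").
 * A false result means the client is asked to resend the data
 * uncompressed. */
static bool server_accepts_speculative(uint64_t offset, uint64_t file_size)
{
	return offset >= file_size;
}
```

For a 64 KiB chunk, a 4 KiB write at offset 0 would be marked speculative,
while a full 64 KiB write or a write starting at offset 512 would not.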

This makes it reasonable to fully enable DIO to compressed
files - previously we converted unaligned DIO to buffered
IO.

This patch reserves OBD_BRW_SPECULATIVE_COMPR flag.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Signed-off-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Change-Id: I679bc103bd2862115d94286e7c2ed43e1580b29e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55104
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Patrick Farrell <patrick.farrell@oracle.com>
2 months agoLU-17832 gni: build should not collapse extra symbols 52/55052/2
Shaun Tancheff [Wed, 8 May 2024 15:38:43 +0000 (22:38 +0700)]
LU-17832 gni: build should not collapse extra symbols

cray-obs spec files (ari,gem,dmp) define:
   KBUILD_EXTRA_SYMBOLS and GNICPPFLAGS

When building kgnilnd the environment variable needs to be passed
through to make.

HPE-bug-id: LUS-12269
Test-Parameters: trivial
Fixes: 8b1d2a72f1 ("LU-16967 build: Add in-kernel-ko2iblnd driver")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Icc7ac33138300bf3836082a014daf580a1632436
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55052
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Patrick Farrell <patrick.farrell@oracle.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17708 lnet: update kfi and o2ib to handle NULL lnet_msg 77/54677/7
Shaun Tancheff [Sun, 9 Jun 2024 23:59:43 +0000 (06:59 +0700)]
LU-17708 lnet: update kfi and o2ib to handle NULL lnet_msg

Handle the NULL lnet_msg cases in the lnd_recv() handlers
of kfi and o2ib lnds.

HPE-bug-id: LUS-12245
Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ia0a8957653353380ef77c9686a020284db0e460c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54677
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17573 lov: change default object size. 37/54137/3
Alexey Lyashkov [Thu, 22 Feb 2024 06:38:03 +0000 (09:38 +0300)]
LU-17573 lov: change default object size.

OSTs have not been able to use indirect blocks for a long time,
so switch the default object size to the extent-based one.

Test-Parameters: trivial
HPE-bug-id: LUS-11428
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Change-Id: I9759fc7122c41075ebc35d52ade342c37706b041
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54137
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17460 lnet: support IPv6 for link state 61/53761/11
James Simmons [Tue, 23 Apr 2024 13:30:04 +0000 (09:30 -0400)]
LU-17460 lnet: support IPv6 for link state

The LNet layer monitors the state of the underlying TCP
connection. Currently it only supports network interfaces
setup with IPv4 addresses. Update to handle IPv6 setups.

Test-Parameters: trivial testlist=sanity-lnet
Change-Id: I249e9591d5f637112f6bd862cd0f928a555af229
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53761
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 months agoLU-15553 test: mkdir_on_mdt0 in sanity-krb5 53/51653/8
Lai Siyao [Sat, 8 Jul 2023 22:32:29 +0000 (18:32 -0400)]
LU-15553 test: mkdir_on_mdt0 in sanity-krb5

test_8 in sanity-krb5 requires the test dir to be created on MDT0,
so replace mkdir with mkdir_on_mdt0. It was found by the script:
grep -C 10 -n "do_facet.*SINGLEMDS" lustre/tests/*.sh | grep -w mkdir

Fixes: b9c4dc3c33 ("LU-14792 llite: enable filesystem-wide default LMV")
Test-Parameters: trivial mdscount=2 mdtcount=4 testlist=sanity-krb5,sanity-krb5,sanity-krb5
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I09b1aec95bff84622accea91650887dffc1245f3
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51653
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 months agoLU-11085 lustre: remove interval-tree code 66/49166/11
Mr NeilBrown [Wed, 5 Jun 2024 19:59:31 +0000 (15:59 -0400)]
LU-11085 lustre: remove interval-tree code

The lustre interval tree is no longer used.  All users have been
changed to use the Linux interval tree implementation.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I7aaa79ebb5e672657dd96c79bd8f85cdf3ce5438
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49166
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17276 ldlm: use interval tree for searching in flock 51/53951/14
Mr NeilBrown [Fri, 26 Apr 2024 14:40:20 +0000 (10:40 -0400)]
LU-17276 ldlm: use interval tree for searching in flock

This patch converts ldlm_process_flock_lock() to use the new interval
tree to find flock locks more efficiently.

Previously all locks with the same owner were adjacent in the
lr_granted list.  This was used for the second stage of merging
overlapping locks once it was confirmed that there were no conflicts.
Now instead we build up a temporary list of locks in the target range
that have the same owner, and use that for the second stage.
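
The second stage can be pictured as a filter over the locks found in the
target range. The sketch below is only an illustration: struct flock_rec and
collect_same_owner() are hypothetical names, and a linear scan over an array
stands in for the interval tree range query used by the real code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct flock_rec {
	uint64_t owner;
	uint64_t start;	/* inclusive */
	uint64_t end;	/* inclusive */
};

/* Collect the locks that overlap [start, end] and belong to owner.
 * The result is the temporary list later used for merging. */
static size_t collect_same_owner(const struct flock_rec *locks, size_t nr,
				 uint64_t owner, uint64_t start,
				 uint64_t end,
				 const struct flock_rec **out,
				 size_t out_max)
{
	size_t i, found = 0;

	assert(start <= end);
	for (i = 0; i < nr && found < out_max; i++) {
		if (locks[i].end < start || locks[i].start > end)
			continue;	/* outside the target range */
		if (locks[i].owner != owner)
			continue;	/* different owner: not mergeable */
		out[found++] = &locks[i];
	}
	return found;
}
```

Because the list is built per request, the locks of one owner no longer need
to sit next to each other in any shared list.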

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: I0a4f1e833d8db36827c318a020de564a78b0adb5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53951
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17276 ldlm: convert flock locks to linux interval tree. 50/53950/17
Mr NeilBrown [Wed, 7 Feb 2024 05:21:48 +0000 (16:21 +1100)]
LU-17276 ldlm: convert flock locks to linux interval tree.

Convert to using the linux interval tree code.  When the range of a
lock is changed as part of adding or removing an overlapping range,
the lock is removed and readded to the tree.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I747b625af1e83210b12daac5102600a3de173a2a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53950
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
2 months agoLU-6142 utils: Fix style issues for lst.c 89/55289/6
Arshad Hussain [Mon, 3 Jun 2024 05:59:53 +0000 (01:59 -0400)]
LU-6142 utils: Fix style issues for lst.c

This patch fixes issues reported by checkpatch
for file lnet/utils/lst.c

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I9a10254fe3da725fcc88f656f944cfc2597ed8cc
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55289
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-6142 o2ib: SPDX for Infiniband driver 87/55287/2
Timothy Day [Sun, 2 Jun 2024 23:15:53 +0000 (23:15 +0000)]
LU-6142 o2ib: SPDX for Infiniband driver

Convert from verbose license text to SPDX.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I3205f7e0e2e4bbb8609320e32f0975f82882f5dc
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55287
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-6142 mgs: SPDX for management server 86/55286/3
Timothy Day [Sun, 2 Jun 2024 23:02:07 +0000 (23:02 +0000)]
LU-6142 mgs: SPDX for management server

Convert from verbose license text to SPDX.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I41c91276789bbadf9967ee18033620431d7561a0
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55286
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 months agoLU-17070 lov: retry layout refresh if got old layouts 61/55061/6
Bobi Jam [Thu, 9 May 2024 09:23:37 +0000 (17:23 +0800)]
LU-17070 lov: retry layout refresh if got old layouts

lov_layout_change() does not apply old layouts, which can get through
when the MDS doesn't take the layout lock. This patch retries getting
the layout and re-applies it once.

Fixes: 13557aa869 ("LU-15300 mdt: refresh LOVEA with LL granted")
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Id29ec4ada85060a20f730f92a6a9409d755a56a1
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55061
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 months agoLU-17809 osp: make disconnect asynchronous 95/54995/7
Alexander Boyko [Sat, 20 Apr 2024 22:02:54 +0000 (18:02 -0400)]
LU-17809 osp: make disconnect asynchronous

An MDT can have many OSP devices. During umount there is a problem
of cascading timeouts of disconnect requests. It can lead to an
unpredictably long umount time.

This patch adds the ability to disconnect OSP devices in parallel.
During LCFG_PRECLEANUP osp_disconnect() sends the disconnect requests,
and osp_shutdown() waits for them. So the cascading timeouts are
replaced by a single request wait.
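
The effect on umount time is simple arithmetic, sketched below with
hypothetical helper names and made-up wait times. Serial disconnects add their
waits together, while parallel disconnects are bounded by the slowest request.

```c
#include <assert.h>
#include <stddef.h>

/* Serial disconnect: each wait starts after the previous one ends,
 * so the waits accumulate. */
static unsigned int umount_wait_serial(const unsigned int *wait_sec,
				       size_t nr)
{
	unsigned int total = 0;
	size_t i;

	assert(wait_sec != NULL || nr == 0);
	for (i = 0; i < nr; i++)
		total += wait_sec[i];
	return total;
}

/* Parallel disconnect: all requests are in flight together, so the
 * total wait is bounded by the slowest one. */
static unsigned int umount_wait_parallel(const unsigned int *wait_sec,
					 size_t nr)
{
	unsigned int longest = 0;
	size_t i;

	assert(wait_sec != NULL || nr == 0);
	for (i = 0; i < nr; i++)
		if (wait_sec[i] > longest)
			longest = wait_sec[i];
	return longest;
}
```

For illustrative waits of 30, 30, 30 and 5 seconds, the serial total is 95
seconds while the parallel bound is 30 seconds.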

Don't drop obd_force flag from upper layers.

Adds replay-single test 201, which simulates delays of OSP disconnects.
This leads to a high cumulative umount time.

HPE-bug-id: LUS-12251
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: Id788b22c494147bdc7f0d36968629e7b7f660e01
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54995
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
2 months agoLU-9680 lnet: Convert lnetctl debug recovery to netlink 34/53734/12
Chris Horn [Wed, 5 Jun 2024 13:07:40 +0000 (09:07 -0400)]
LU-9680 lnet: Convert lnetctl debug recovery to netlink

Convert the lnetctl debug recovery command to netlink

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ic44cd93708b2e753e99901ba10334be17250a23c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53734
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
2 months agoLU-12998 lod: statfs upon nocreate check 37/53437/3
Lai Siyao [Tue, 12 Dec 2023 19:50:33 +0000 (14:50 -0500)]
LU-12998 lod: statfs upon nocreate check

lod_declare_create() checks whether the target MDT of a directory
create is the current MDT; a mismatch may happen if nocreate is set
on some MDT. Upon such a mismatch, call dt_statfs() to fetch the
latest statfs to know whether nocreate is set.

lmv_create() will choose another MDT if target MDT is set with
nocreate, but in case the flag is cleared, call obd_statfs() to fetch
cached statfs and check again.

Fixes: 1dbcd0bab88 (LU-12998 mds: add no_create parameter to stop creates)
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I2575d15416968554c66d40dcf18ecca2a06c7a37
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53437
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-17402 kernel: update dotdot patch path for RHEL 8.10 81/55381/2
Jian Yu [Mon, 10 Jun 2024 18:06:53 +0000 (11:06 -0700)]
LU-17402 kernel: update dotdot patch path for RHEL 8.10

After commit 0536b2a landed, the patch path for
ext4-hash-indexed-dir-dotdot-update.patch was
changed from ubuntu18/ to rhel8.7/.

Test-Parameters: trivial fstype=ldiskfs mdtcount=4 mdscount=2 \
  clientdistro=el8.10 serverdistro=el8.10 testlist=sanity

Test-Parameters: optional clientdistro=el8.10 serverdistro=el8.10 \
  testgroup=full-part-1

Test-Parameters: optional clientdistro=el8.10 serverdistro=el8.10 \
  testgroup=full-part-2

Test-Parameters: optional clientdistro=el8.10 serverdistro=el8.10 \
  testgroup=full-part-3

Change-Id: I323fe06cfd125ad57959782bb33a2af81b705788
Fixes: 0536b2a ("LU-17711 osd-ldiskfs: do not delete dotdot during rename")
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55381
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-17887 obd: do not update obd_memory from RCU 63/55263/2
Bruno Faccini [Thu, 30 May 2024 16:39:37 +0000 (18:39 +0200)]
LU-17887 obd: do not update obd_memory from RCU

OBD_FREE_PRE() should not be run from an RCU
callback as the obdclass module may have been
unloaded during the RCU grace period.

Signed-off-by: Bruno Faccini <bfaccini@nvidia.com>
Change-Id: I6f663b2aed2e60c15f2a1b9755b2c4050bd91ce2
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55263
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 months agoLU-17000 utils: Initialize var 'gw' and 'net' before using 16/55316/2
Arshad Hussain [Wed, 5 Jun 2024 06:46:13 +0000 (02:46 -0400)]
LU-17000 utils: Initialize var 'gw' and 'net' before using

Although this is called at "sequence end", when 'gw' and 'net'
will most likely already be populated, it is still good to be
defensive and initialize them.

Test-Parameters: trivial testlist=sanity-lnet
CoverityID: 410246 ("Uninitialized scalar variable")
CoverityID: 410240 ("Uninitialized scalar variable")
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I2f47df431eea0e0344043ac22806865e87435c6e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55316
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-17899 gss: lsvcgss service fix 93/55293/2
Sebastien Buisson [Mon, 3 Jun 2024 11:52:20 +0000 (13:52 +0200)]
LU-17899 gss: lsvcgss service fix

The lsvcgss service can fail to start if the daemon is invoked with
the '-k' option whereas no proper Kerberos configuration is in place
on the server. The daemon should ignore the '-k' option in such a case
and try to start the other provided modes if any (SSK, Null).
And in case the daemon is started with the '-s' option (SSK), it
spawns a temporary additional thread to compute the number of rounds
used for Miller-Rabin prime testing. So the lsvcgss_sysd script should
support that.

Fixes: c6878334a1 ("LU-17741 gss: fix lsvcgss service for systemd")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Iba632bd0ea9696ccea52bff5982a4d4e490597a7
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55293
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-17000 obdclass: Initialize var 'bufsize' before using 91/55291/2
Arshad Hussain [Mon, 3 Jun 2024 10:01:04 +0000 (06:01 -0400)]
LU-17000 obdclass: Initialize var 'bufsize' before using

This patch initializes the variable bufsize before use, because
bufsize is left uninitialized if the obd_page_dif_generate_buffer()
call fails. Once bufsize is initialized, calling
cfs_crypto_hash_final() becomes safe.

Test-Parameters: trivial
CoverityID: 397224 ("Uninitialized scalar variable")
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I933cc3746d107acb308bd0060b7648a82410711c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55291
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-17844 oss: remove all LCONSOLE_ERROR_MSG() 81/55281/2
Timothy Day [Sat, 1 Jun 2024 04:32:01 +0000 (04:32 +0000)]
LU-17844 oss: remove all LCONSOLE_ERROR_MSG()

These magic numbers aren't so magical anymore. Just
use LCONSOLE_ERROR().

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Id7ae3b50478c434203adfb375cb31f158d4b29d4
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55281
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-16946 utils: allow lfs find time within a range 71/55271/6
Frederick Dilger [Thu, 30 May 2024 22:00:57 +0000 (18:00 -0400)]
LU-16946 utils: allow lfs find time within a range

If multiple times are specified on the command-line like:

        lfs find -atime +60 -atime -90 ...

use those times as the upper and lower bounds of the margin.
This makes it easier to find files that were created within
a specific range of dates.

While working on this patch I noticed that the margin
bounds are a little odd; using the range:

(limit - margin, limit]

instead of what I intuitively thought,

(limit - margin, limit + margin)

The logic behind this is unknown to me, but it can be found in
'liblustreapi.c' in the function 'find_value_cmp()'.
However, for the time being, when using a time range, it will
simply shift the limit to the largest of the two and have
the margin cover the difference.
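
The window arithmetic described above can be sketched roughly as
follows. This is only an illustration under stated assumptions: the
names demo_in_window(), demo_merge_limits() and struct demo_window are
hypothetical and are not taken from liblustreapi.c, and plain integers
stand in for the time values.

```c
#include <stdbool.h>

/* Hypothetical pair describing one search window. */
struct demo_window {
	long long limit;
	long long margin;
};

/* True if t lies in the half-open window (limit - margin, limit]. */
static bool demo_in_window(long long t, long long limit, long long margin)
{
	return t > limit - margin && t <= limit;
}

/*
 * Fold two limits into one window: the larger value becomes the
 * limit and the margin covers the distance down to the smaller one.
 */
static struct demo_window demo_merge_limits(long long a, long long b)
{
	struct demo_window w;

	w.limit = a > b ? a : b;
	w.margin = w.limit - (a > b ? b : a);
	return w;
}
```

With limits 60 and 90 this sketch gives limit = 90 and margin = 30,
so 61 through 90 fall inside the window while 60 and 91 do not.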

Signed-off-by: Frederick Dilger <fdilger@whamcloud.com>
Change-Id: I2e5b856396472eab91e1d2c3214f304010601a41
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55271
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-7892 utils: removed deprecated create_iam.c 65/55265/4
Maximilian Dilger [Thu, 30 May 2024 16:18:48 +0000 (12:18 -0400)]
LU-7892 utils: removed deprecated create_iam.c

Removed create_iam.c and all found references. The OI is now created
by osd-ldiskfs, so it is safe to remove create_iam.c

Signed-off-by: Max Dilger <mdilger@whamcloud.com>
Change-Id: Ibbc89ecfbfbebf6f61d93d4a784b509977ccb3c2
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55265
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 months agoLU-17878 build: compatibility updates for kernel 6.9 01/55201/3
Shaun Tancheff [Sat, 25 May 2024 23:20:38 +0000 (17:20 -0600)]
LU-17878 build: compatibility updates for kernel 6.9

Linux v6.8-2-ga8922f79671f
  ceph: remove SLAB_MEM_SPREAD flag usage

Provide a replacement for older kernels when SLAB_MEM_SPREAD
is not defined.
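
One common way to provide such a replacement is a zero-valued
fallback define, sketched below. This is an assumption about the
approach, not a copy of the patch; DEMO_SLAB_ACCOUNT and
demo_cache_flags() are hypothetical names used only for illustration.

```c
/* Hypothetical stand-in for a cache flag that is still defined. */
#define DEMO_SLAB_ACCOUNT 0x04000000UL

/*
 * If the headers do not define SLAB_MEM_SPREAD, fall back to 0 so
 * that OR-ing it into a flags word has no effect.
 */
#ifndef SLAB_MEM_SPREAD
#define SLAB_MEM_SPREAD 0
#endif

/* Callers can keep passing the flag unconditionally. */
static unsigned long demo_cache_flags(void)
{
	return DEMO_SLAB_ACCOUNT | SLAB_MEM_SPREAD;
}
```

The benefit of this pattern is that call sites need no #ifdef of
their own.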

Linux v6.8-rc1-47-gc69ff4071935
  filelock: split leases out of struct file_lock

Provide abstractions for:
  flc_type, flc_pid, flc_file, flc_flags, and flc_owner

Test-Parameters: trivial
HPE-bug-id: LUS-12363
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ide457ba29fc2d3537f074fe9a66cf0c8567f7621
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55201
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-17516 utils: new --mdt and --ost options for lfs df 56/55156/7
Frederick Dilger [Fri, 17 May 2024 20:19:17 +0000 (14:19 -0600)]
LU-17516 utils: new --mdt and --ost options for lfs df

Added [--mdt | -m] and [--ost | -o] options for 'lfs df' to print
only usage of the respective MDT or OST devices in mntdf(). If both
"--mdt" and "--ost" are specified it will show both types of devices
which is identical to having neither specified.

Signed-off-by: Frederick Diger <fdilger@whamcloud.com>
Change-Id: I196b7c9c0c385850372331587936fa5cf6b71d93
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55156
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-17000 utils: Use correct printf specifier for lustre_rsync.c 54/55154/2
Arshad Hussain [Mon, 20 May 2024 07:17:58 +0000 (03:17 -0400)]
LU-17000 utils: Use correct printf specifier for lustre_rsync.c

In lr_copy_xattr() use "%s" for "char *" and "%zd"
for "ssize_t" data type.

Change 'struct lr_info' fields xsize and xvsize from size_t
to ssize_t as extended attribute functions can return
negative values
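
A minimal user-space sketch of the specifier pairing follows. The
helper demo_format_xattr() and its message text are hypothetical and
do not come from lustre_rsync.c; the sketch only shows "%s" matched
with "char *" and "%zd" matched with a signed "ssize_t" that may hold
a negative return value.

```c
#include <stdio.h>
#include <sys/types.h>

/*
 * Format an xattr name and a signed size into buf.
 * "%s" matches "const char *", "%zd" matches "ssize_t".
 */
static int demo_format_xattr(char *buf, size_t len, const char *name,
			     ssize_t size)
{
	return snprintf(buf, len, "xattr %s: size %zd", name, size);
}
```

Keeping the size signed lets an error return such as -1 be stored and
printed as is instead of wrapping to a large unsigned value.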

Test-Parameters: trivial testlist=lustre-rsync-test
CoverityID: 397866 ("Invalid type in argument to printf format specifier")
CoverityID: 397573 ("Invalid type in argument to printf format specifier")
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Idd7c4f81c1a1751c595c86b10493aab6f959059f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55154
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-17854 lnet: Router should not drop msg past deadline 31/55131/7
Chris Horn [Wed, 22 May 2024 19:34:25 +0000 (13:34 -0600)]
LU-17854 lnet: Router should not drop msg past deadline

It has been observed that messages can become queued in LNet on
router nodes so long that they exceed their message deadlines. These
messages will currently be dropped, even if the target peer is alive.
PtlRPC adaptive timeouts can dynamically increase to account for the
increased network latency, but if the RPCs are dropped on routers then
these operations will fail. Routers should only drop messages when
the router peer health feature determines the target is down. This
gives Lustre the best chance to complete operations during periods of
increased network latency.

A bug in sanity-lnet/do_route_del() is fixed. The lnetctl route show
output was stored in a variable named "output", but the variable
"lnetctl_text" was checked to determine if the route needed to be
deleted.

test_102() was also modified to call cleanup_router_test(). A
comment there indicated it was not needed because the routes were
already deleted, but cleanup_router_test() does more than just
delete the route entries. Namely, unloading modules on all nodes.

Test-Parameters: trivial testlist=sanity-lnet
HPE-bug-id: LUS-12153
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I1e6966d4a3a2b10dd7b99620774d5c32b7eccd1f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55131
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-17000 ptlrpc: Use matching deallocator for cfs_expr_list_values 43/55043/3
Arshad Hussain [Wed, 8 May 2024 10:05:30 +0000 (15:35 +0530)]
LU-17000 ptlrpc: Use matching deallocator for cfs_expr_list_values

For cfs_expr_list_values() allocator use cfs_expr_list_values_free()
as deallocator.

Coverity actually complained that kfree() should not
be called but free() should be called instead. It looks
like Coverity is checking the function cfs_expr_list_values()
in libcfs/libcfs/util/string.c, which calls calloc().
This cannot be correct under ptlrpc.

Test-Parameters: trivial
CoverityID: 424700 ("Incorrect deallocator used")
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Idfbb6be585b35f87a59ae92d0cffa85c8dff623a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55043
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 months agoLU-17566 mdt: improve new_init_ucred() for refactoring 25/55025/10
Aurelien Degremont [Wed, 6 Mar 2024 14:04:41 +0000 (15:04 +0100)]
LU-17566 mdt: improve new_init_ucred() for refactoring

In order to merge new_init_ucred() and old_init_ucred()
code eventually, move new_init_ucred() code around
for it to look even closer to old_init_ucred().

- Fill generic ucred fields at the beginning (similar to
what old_init_ucred() is doing).
- Move code for the bottom part to be closer to
old_init_ucred_common().

This code path is not used on most Lustre deployments,
so I'm enabling Kerberos testing to ensure some tests
will go through this code path.

Test-Parameters: kerberos=true testlist=sanity-krb5

Change-Id: I113fca6a104c1db66d9e0defd6fd91e378d7208c
Signed-off-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/55025
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>