Whamcloud - gitweb
fs/lustre-release.git
5 hours ago LU-17504 build: fix gcc-13 [-Werror=stringop-overread] error 34/54834/6 master
Shaun Tancheff [Thu, 25 Apr 2024 17:57:36 +0000 (00:57 +0700)]
LU-17504 build: fix gcc-13 [-Werror=stringop-overread] error

This patch fixes the following [-Werror=stringop-overread] and
[-Werror=attribute-warning] errors detected by gcc 13:

lustre/mgc/mgc_request.c:190:21: error: 'strcmp' reading 1 or
more bytes from a region of size 0 [-Werror=stringop-overread]
  190 | if (strcmp(logname, cld->cld_logname) == 0) {
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In function 'fortify_memcpy_chk',
    inlined from 'class_handle_ioctl' at
/root/lustre-release/lustre/obdclass/class_obd.c:381:3:
include/linux/fortify-string.h:528:25: error:
call to '__write_overflow_field' declared with attribute warning:
detected write beyond size of field (1st parameter);
maybe use struct_group()? [-Werror=attribute-warning]
  528 |  __write_overflow_field(p_size_field, size);
      |  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
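The second error is the kernel's fortified memcpy() rejecting a copy whose
length exceeds the size of the destination field. The sketch below is a
userspace illustration of that pattern only, not the actual fix in this
patch: the struct names and fields are hypothetical, and a plain named
sub-struct stands in for what the kernel's struct_group() macro provides.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout for illustration, not taken from the Lustre sources. */
struct hdr_fields {
	unsigned int len;
	unsigned int version;
};

struct ioctl_msg {
	struct hdr_fields hdr;	/* fields that are always copied together */
	char payload[16];
};

/*
 * Copy both header fields in one call. The destination is the whole
 * sub-struct, so its size matches the number of bytes written. Passing
 * &msg->hdr.len with the same length would write past the 4-byte 'len'
 * field, which is the kind of copy the fortify check complains about.
 */
static void msg_set_header(struct ioctl_msg *msg, const struct hdr_fields *src)
{
	assert(msg != NULL && src != NULL);
	memcpy(&msg->hdr, src, sizeof(msg->hdr));
}
```

Grouping the fields gives the compiler a destination object whose size
covers the whole copy, so no warning needs to be suppressed.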

Signed-off-by: Jian Yu <yujian@whamcloud.com>
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I59f5a88b4cd64c9f4e67e568546baada371543b1
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54834
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
5 hours ago LU-17587 build: use kernel version from dkms for client 30/54830/2
snehring [Wed, 17 Apr 2024 16:09:17 +0000 (11:09 -0500)]
LU-17587 build: use kernel version from dkms for client

The current behavior of the dkms build for clients is to only build
for the running kernel. This is fine if the other kernels are ABI
compatible with the running kernel because we tell dkms to run
weak-updates as part of the install process. However, if kernels that
are not ABI compatible with the running kernel are installed they
won't be targeted and weak-updates won't add in the modules. This
could be worked around by running 'dkms install' once booted into the
new kernel, but that's additional administrator overhead and not the
assumed behavior for a dkms module.

This modifies the dkms build script to accept the kernel version from
dkms and configure for that version. It also changes the behavior of
dkms wrt lustre to disable weak module updates since we're now
building for individual kernel versions. This will likely result in
longer times to install the client since we're building for each
installed version of the kernel, but it _should_ mean the client is
actually installed for each version.

Signed-off-by: snehring <snehring@iastate.edu>
Change-Id: I55fb1bb7159772d7ecd9d1837e870c7097c02d78
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54830
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 hours ago LU-17736 tests: Fix sanityn/73 for test machines with auditd 09/54809/3
Ellis Wilson [Fri, 8 Oct 2021 14:27:39 +0000 (10:27 -0400)]
LU-17736 tests: Fix sanityn/73 for test machines with auditd

getfattr performs one stat followed by two getxattr syscalls against
the provided file.  Normally, the stat results in no getxattr calls
internally (as it's not something stat is required to return).

However, if auditd is enabled AND one of the rules includes a
filesystem-specific rule such as watch directory X and record if it's
modified, then for every lookup (each of the three syscalls includes
one) an additional getxattr will be performed, resulting in 5 total
getxattrs.

Because there is significant fuzz here, revise the check to be
at minimum the two "expected" getxattrs but allow for more.
Comments have been added explaining this.

Signed-off-by: Ellis Wilson <elliswilson@microsoft.com>
Test-Parameters: trivial testlist=sanityn env=ONLY=73,ONLY_REPEAT=10
Change-Id: I0da5c2a5331f7dba4e65051a073e2bec05327a25
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54809
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 hours ago LU-17362 build: Update ZFS version to 2.1.15 69/54769/2
Jian Yu [Fri, 12 Apr 2024 15:46:44 +0000 (08:46 -0700)]
LU-17362 build: Update ZFS version to 2.1.15

Update ZFS version to 2.1.15. The changes are listed in:
https://github.com/openzfs/zfs/releases/tag/zfs-2.1.15

Test-Parameters: trivial fstype=zfs mdtcount=4 mdscount=2 \
  clientdistro=el8.9 serverdistro=el8.9 testlist=sanity

Test-Parameters: trivial fstype=zfs mdtcount=4 mdscount=2 \
  clientdistro=el9.3 serverdistro=el9.3 testlist=sanity

Test-Parameters: optional fstype=zfs testgroup=full-dne-zfs-part-1
Test-Parameters: optional fstype=zfs testgroup=full-dne-zfs-part-2
Test-Parameters: optional fstype=zfs testgroup=full-dne-zfs-part-3

Change-Id: I51532dbf9dbcadf64bb9dbd3b10e88d0cab38ffd
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54769
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Peter Jones <pjones@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 hours ago LU-17727 tests: add to auster --stop-on-error option 55/54755/3
Xiaolin (Charlene) Zang [Thu, 8 Jul 2021 04:32:40 +0000 (00:32 -0400)]
LU-17727 tests: add to auster --stop-on-error option

Add a --stop-on-error option to auster, which takes a comma-separated
list of tests.

If any such test fails, auster will exit immediately without any
cleanup, in order to make debugging particularly difficult and rare
bugs more tractable.
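As a rough illustration of the list matching this option implies, the
sketch below checks whether a failed test name appears in a
comma-separated list. auster itself is a shell script, so this C version
is only an assumption-laden sketch; test_in_list() is a hypothetical
helper, not part of the test framework.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true if 'name' is exactly one of the comma-separated items. */
static bool test_in_list(const char *list, const char *name)
{
	size_t name_len;

	assert(list != NULL && name != NULL);
	name_len = strlen(name);

	while (*list != '\0') {
		const char *end = strchr(list, ',');
		size_t len = end ? (size_t)(end - list) : strlen(list);

		/* Compare whole items so "sanit" does not match "sanity". */
		if (len == name_len && strncmp(list, name, len) == 0)
			return true;
		if (end == NULL)
			break;
		list = end + 1;
	}
	return false;
}
```

A runner would call this with the option value and the name of the test
that just failed, and skip cleanup when it returns true.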

Signed-off-by: Xiaolin (Charlene) Zang <xiaolinzang@microsoft.com>
Change-Id: Icd8d1eaf8ae799bd74f9147ac9080a0950977526
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54755
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Charlie Olmstead <charlie@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 hours ago LU-17497 tests: skip sanity-sec/69 for old MDS 82/54782/2
Andreas Dilger [Sun, 14 Apr 2024 07:43:08 +0000 (01:43 -0600)]
LU-17497 tests: skip sanity-sec/69 for old MDS

Older MDS versions do not have strict checking for identity_upcall
or rsi_upcall, so don't run the test with those servers.

Test-Parameters: trivial testlist=sanity-sec env=ONLY=69 serverversion=2.15
Fixes: 2153e86541 ("LU-17497 obdclass: check upcall incorrect values")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Icdfda82eca32c2de7e88991ead0d9723023ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54782
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Reviewed-by: Sarah Liu <sarah@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 hours ago LU-16741 lvm: rename ptlrpc_req_finished for component lvm 93/54693/2
Arshad Hussain [Mon, 8 Apr 2024 10:51:37 +0000 (06:51 -0400)]
LU-16741 lvm: rename ptlrpc_req_finished for component lvm

Patch renames ptlrpc_req_finished to ptlrpc_req_put for
lvm component

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I58dd90e4ae1a8834866491bf866cbacbd1c6e609
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54693
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 hours ago LU-17706 lnet: reserve TOFULND and EFALND 74/54674/2
Andreas Dilger [Thu, 4 Apr 2024 18:42:02 +0000 (12:42 -0600)]
LU-17706 lnet: reserve TOFULND and EFALND

Reserve network numbers for Fujitsu Torus Fusion LND and Amazon
Elastic Fabric Adapter LND to avoid hard-to-fix conflicts in the
future.

Add comments for the other LND numbers to provide some context.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Icea6cecf5a951c5a44527c937a2631c9cc3ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54674
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Shuichi Ihara <sihara@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 hours ago LU-17703 lod: check the inherited pool for conflicts 61/54661/5
Vitaly Fertman [Wed, 3 Apr 2024 20:33:20 +0000 (23:33 +0300)]
LU-17703 lod: check the inherited pool for conflicts

In addition to LU-15658, the start index could be inherited from
parent and the pool from root: drop the pool in case of conflict
as well.

Another case of problematic inheritance is saving the inherited LOVEA
to a subdir when all the parameters are inherited except the OST list.

HPE-bug-id: LUS-11330, LUS-11631
Signed-off-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Change-Id: Ief1dbd8c1ee0433bb625cbff1834b248d4fb2992
Reviewed-on: https://es-gerrit.hpc.amslabs.hpecorp.net/161800
Tested-by: Alexander Lezhoev <alexander.lezhoev@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54661
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 hours ago LU-17669 test: using uninitialized variable in sanity:160n 49/54549/2
Li Xi [Mon, 25 Mar 2024 02:20:35 +0000 (10:20 +0800)]
LU-17669 test: using uninitialized variable in sanity:160n

This patch fixes a simple typo that led to the use of an uninitialized
variable.

Fixes: d813c75df ("LU-14688 mdt: changelog purge deletes plain llog")

Test-Parameters: trivial testlist=sanity env=ONLY=160n
Change-Id: I2e29cce33733c925dfe9a53c06af7ac17b2c6be3
Signed-off-by: Li Xi <lixi@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54549
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 hours ago LU-930 ptlrpc: quiet idle import logging 40/54540/2
Andreas Dilger [Fri, 22 Mar 2024 23:20:38 +0000 (16:20 -0700)]
LU-930 ptlrpc: quiet idle import logging

Don't log a debug message for every idle import every 25s, as this
pushes out other more important messages from the logs.

Fixes: 5a6ceb664f ("LU-7236 ptlrpc: idle connections can disconnect")
Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Id98c2acad07cec62af0d705a437a4d2915ce9f62
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54540
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 hours ago LU-17431 utils: add 'dynamic' parameter to nodemap_cmd 03/54503/10
Sebastien Buisson [Wed, 20 Mar 2024 08:05:41 +0000 (09:05 +0100)]
LU-17431 utils: add 'dynamic' parameter to nodemap_cmd

Adding a 'dynamic' parameter to nodemap_cmd() will enable
'lctl nodemap_*' commands to handle dynamic nodemaps, i.e.
nodemaps created directly on MDS/OSS side, and stored in memory.

If both MDT and OST are running on the same node, the MDS device
is used for the ioctl.  It doesn't matter which one is actually
used, since it gets to the same place in ptlrpc anyway; the ioctl
just needs a valid OBD device to run on.

Test-Parameters: trivial
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Id58199e1ad6622aad896737604c0a8e1287ba34e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54503
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 hours ago LU-17431 nodemap: add function to know if nodemap is on MGS 06/54506/8
Sebastien Buisson [Wed, 20 Mar 2024 08:33:11 +0000 (09:33 +0100)]
LU-17431 nodemap: add function to know if nodemap is on MGS

Adding the nodemap_mgs() function makes it possible to know whether
nodemaps are defined on an MGS node (pointer to a nodemap config file)
or not.

Test-Parameters: trivial
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Id87e34dd8d13cd21c88c87ef9e8e91ff9ff142c8
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54506
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 hours ago LU-17627 build: fix new mofed version 36/54336/5
Minh Diep [Wed, 6 Mar 2024 02:26:58 +0000 (18:26 -0800)]
LU-17627 build: fix new mofed version

Allow multi-digit MOFED version numbers.
Fix the compare_version function to return what it should.
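To illustrate why multi-digit components matter, the sketch below compares
dotted version strings numerically, component by component. It is not the
build script's compare_version (which is shell); version_cmp() and the
sample version strings are assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Compare two version strings made of numeric components separated by a
 * single non-digit character (for example '.' or '-').
 * Returns -1, 0 or 1 when 'a' is older than, equal to or newer than 'b'.
 * Missing trailing components count as 0.
 */
static int version_cmp(const char *a, const char *b)
{
	assert(a != NULL && b != NULL);

	while (*a != '\0' || *b != '\0') {
		char *end_a, *end_b;
		unsigned long na = strtoul(a, &end_a, 10);
		unsigned long nb = strtoul(b, &end_b, 10);

		if (na != nb)
			return na < nb ? -1 : 1;

		/* Step over the separator, if there is one. */
		a = (*end_a != '\0') ? end_a + 1 : end_a;
		b = (*end_b != '\0') ? end_b + 1 : end_b;
	}
	return 0;
}
```

A plain string comparison would sort "23.10" before "5.8" and "23.10"
before "23.9"; comparing each component as a number avoids both mistakes.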

Test-Parameters: trivial
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Change-Id: I0f585cb355bb34270003ae1139688080c301186a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54336
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
5 hours ago LU-17592 build: kernel 6.8 -Werror=missing-prototypes 28/54228/13
Shaun Tancheff [Mon, 15 Apr 2024 18:29:30 +0000 (11:29 -0700)]
LU-17592 build: kernel 6.8 -Werror=missing-prototypes

Linux commit v6.7-rc4-156-g0fcb70851fbf
  Makefile.extrawarn: turn on missing-prototypes globally

With -Wmissing-prototypes and -Werror, clean up some additional
functions that are implicitly static and provide declarations
for those that are exported.

Add SERVER_ONLY and SERVER_ONLY_EXPORT_SYMBOL to wrap functions
that are only exported for and used by server components.
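The sketch below shows the two usual ways to satisfy
-Wmissing-prototypes: mark file-local functions static, and declare
externally visible ones before defining them. The DEMO_* macro is a
hypothetical stand-in showing how a wrapper such as SERVER_ONLY could
switch a function between the two; it is an assumption, not the
definition used in this patch.

```c
#include <assert.h>

/* Normally in a shared header: the prototype for an exported function. */
int demo_exported_sum(int a, int b);

#ifdef DEMO_SERVER_SUPPORT
#define DEMO_SERVER_ONLY		/* externally visible in server builds */
int demo_server_helper(int x);		/* prototype, normally in a header */
#else
#define DEMO_SERVER_ONLY static		/* file-local in client-only builds */
#endif

/* Used only in this file, so it is static and needs no prototype. */
static int demo_clamp(int v, int lo, int hi)
{
	assert(lo <= hi);
	return v < lo ? lo : (v > hi ? hi : v);
}

/* Either static or declared above, depending on the build type. */
DEMO_SERVER_ONLY int demo_server_helper(int x)
{
	return x * 2;
}

/* Defined after its prototype, so the warning does not trigger. */
int demo_exported_sum(int a, int b)
{
	return demo_clamp(demo_server_helper(a) + b, 0, 100);
}
```

Compiling this with -Wmissing-prototypes -Werror should pass both with
and without -DDEMO_SERVER_SUPPORT.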

Test-Parameters: trivial
HPE-bug-id: LUS-12181
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ice5219df5463effe964d2cd2114f003d185337da
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54228
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 hours ago LU-17379 lnet: parallelize peer discovery via LNetAddPeer 33/53933/10
Serguei Smirnov [Tue, 6 Feb 2024 03:24:01 +0000 (19:24 -0800)]
LU-17379 lnet: parallelize peer discovery via LNetAddPeer

Initiate peer discovery via its non-primary NIDs
as they are being added in LNetAddPeer by pretending
that they belong to different peers. This may be
useful if some of the comma-separated NIDs in the
mount command (including the first listed NID) are down.
If discovery is performed in the background and there's
at least one reachable NID in the list, the discovery
will succeed and peer records will get consolidated.

If primary NID locking is enabled, the first NID in the list
provided by Lustre to LNetAddPeer always gets locked as primary,
even if it doesn't get discovered.

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I449cb9898c0242db874555a62fe8099352e913e6
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53933
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
5 hours ago LU-10717 tests: minor fixes to conf-sanity 55/53455/5
Andreas Dilger [Thu, 14 Dec 2023 04:13:29 +0000 (21:13 -0700)]
LU-10717 tests: minor fixes to conf-sanity

Remove use of fancy quotation marks in conf-sanity test_102.
Quiet other minor shellcheck warnings in test_30a and test_84.
Fix incorrect variable in error message in test_133.

Test-Parameters: trivial testlist=conf-sanity env=ONLY="30a 84 102"
Fixes: aa9f9344fc ("LU-10717 tests: tests should not start mgs")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I9825157c6b72addc6883e8bc44aea53b483ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53455
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 hours ago LU-14361 statahead: wait inuse entry finished during cleanup 91/49291/18
Qian Yingjin [Thu, 1 Dec 2022 02:43:50 +0000 (21:43 -0500)]
LU-14361 statahead: wait inuse entry finished during cleanup

If an entry is being used by a user process when statahead is
cleaning up and quitting, statahead must wait for the in-use entry to
finish and then kill the locally cached entries in the statahead context.

Add sanity/test_123{k,l} to verify it.

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I747badd85bd44cb20f7d37ca3126ca308a632371
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49291
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
5 hours ago LU-14361 statahead: add support for mdtest shared dir workload 54/48954/18
Qian Yingjin [Wed, 26 Oct 2022 03:01:57 +0000 (23:01 -0400)]
LU-14361 statahead: add support for mdtest shared dir workload

This patch adds statahead support for shared-directory stat() workloads
whose file name pattern resembles the mdtest shared-directory stat()
access.

The performance improvements are shown as follows:
IO500 (KIOPS)       w/o patch   w/ patch
mdtest-easy-stat       740.01    1276.31
mdtest-hard-stat       514.36    1105.33
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I43983e91eb864bd317cfb883e35e2f4c1a8f788c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48954
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
5 hours ago LU-14361 statahead: return ENOENT for batched statahead 87/51587/21
Qian Yingjin [Thu, 6 Jul 2023 03:41:46 +0000 (23:41 -0400)]
LU-14361 statahead: return ENOENT for batched statahead

When stat() is called on a non-existent file in a batched statahead
context, the MDT should return -ENOENT immediately and stop the
statahead work.

Otherwise, the client may wrongly cache the parent inode with an UPDATE
lock and the non-existent dentry under the protection of the parent's
UPDATE lock.

Add sanity/test_123j to verify it.

Test-Parameters: clientdistro=el9.2 testlist=sanity env=ONLY=123i,ONLY_REPEAT=10
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Ia4618f605d2f38ce712e421bcd7b96688bbfbb32
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51587
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 hours ago LU-15277 utils: lfs quota/setquota improvements 15/46615/11
Andreas Dilger [Fri, 25 Feb 2022 08:23:02 +0000 (01:23 -0700)]
LU-15277 utils: lfs quota/setquota improvements

Add long options to "lfs quota" for ease of use.  Improve usage
message for "lfs quota" and "lfs setquota" to match current code.

Deprecate the "lfs quota -i MDT_IDX|-I OST_IDX" options to print one
target, since these arguments are backward from other lfs subcommands.
Add "-o" for the OST_IDX and "-m" for the MDT_IDX and long options
"--ost" and "--mdt" to match other lfs subcommands.  We may eventually
be able to liberate "-i" to use the OST_IDX, but not for a while yet.
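A minimal sketch of how the short and long forms of the new target
options could be parsed with getopt_long(). Only the option names
(-m/--mdt, -o/--ost) come from the description above; the struct and the
parse_target() helper are hypothetical and do not reflect the actual lfs
code.

```c
#include <assert.h>
#include <getopt.h>
#include <stdlib.h>

/* Hypothetical container for the selected target indices (-1 = unset). */
struct quota_target {
	int ost_idx;
	int mdt_idx;
};

/* Parse -m/--mdt MDT_IDX and -o/--ost OST_IDX; return 0 on success. */
static int parse_target(int argc, char *const argv[], struct quota_target *t)
{
	static const struct option long_opts[] = {
		{ "mdt", required_argument, NULL, 'm' },
		{ "ost", required_argument, NULL, 'o' },
		{ NULL, 0, NULL, 0 },
	};
	int c;

	assert(t != NULL);
	t->ost_idx = -1;
	t->mdt_idx = -1;
	optind = 1;	/* start scanning at the first argument */

	while ((c = getopt_long(argc, argv, "m:o:", long_opts, NULL)) != -1) {
		switch (c) {
		case 'm':
			t->mdt_idx = atoi(optarg);
			break;
		case 'o':
			t->ost_idx = atoi(optarg);
			break;
		default:
			return -1;	/* unknown option or missing argument */
		}
	}
	return 0;
}
```

Mapping each long option to the same value as its short option keeps a
single switch case per target type, so the two spellings cannot drift
apart.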

Fix "lfs setquota" handling of long --times option.  It was being
checked in lfs_setquota_times(), but not in has_times_option().

Sort arguments to be handled (as much as possible) in alphabetical
order for ease of use in the future.

Update lfs-quota.1 and lfs-setquota.1 man pages to describe all
options and add proper argument formatting.

Test-Parameters: trivial testlist=sanity-quota
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I049f22a526469ea1ed1da04beffda6bb683ebbe6
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/46615
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 hours ago LU-15277 quota: don't print extra default quota info 25/45725/14
Hongchao Zhang [Wed, 20 Mar 2024 14:14:27 +0000 (22:14 +0800)]
LU-15277 quota: don't print extra default quota info

While getting quota info by "lfs quota", it's better to include
default quota in the quota output of the specific quota ID.

Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: I6726888b8857f9a45a96c83db0a546b29507cf8a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/45725
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
5 hours ago LU-13577 wbc: reimplement mkdir() by using intent lock 47/38647/32
Qian Yingjin [Mon, 18 May 2020 07:18:08 +0000 (15:18 +0800)]
LU-13577 wbc: reimplement mkdir() by using intent lock

This patch reworks mkdir() to use an intent lock.
Instead of the reint mkdir implementation, which returns no lock,
an ibits lock (currently PR LOOKUP|PERM) is granted to the client and
cached in the client-side lock namespaces by the mkdir() intent
lock request.

This is also a basic requirement for the coming WBC feature, i.e.,
when a new directory is created and an EX WBC lock is returned from the
MDT in the intent lock request, this root WBC directory can be safely
cached on the client under the protection of the root WBC EX lock.

This patch also adds a tuning parameter "llite.*.intent_mkdir" to
enable or disable mkdir() by using intent lock. It is set to 0
by default, which disables intent mkdir().

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I94e4c2f8262d7ffb27d85b5569070049a47354d7
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/38647
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 hours ago LU-16915 tests: improve distro type checking 90/54790/5
Andreas Dilger [Fri, 12 Apr 2024 01:18:28 +0000 (19:18 -0600)]
LU-16915 tests: improve distro type checking

Improve lustre_os_release() infrastructure to reduce redundant
code and make it easier to use.

Test-Parameters: trivial
Test-Parameters: testlist=sanity-sec env=ONLY=51,HONOR_EXCEPT=y serverdistro=el9.3
Test-Parameters: testlist=sanity env=ONLY=906,HONOR_EXCEPT=y serverdistro=el9.3
Fixes: b881bd1051 ("LU-16915 tests: except sanity-sec test_51")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Id02223752df4eb3fd3b62b339e8c417eb33ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54790
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 hours ago LU-17025 llapi: restore 'pool=ignore' functionality 55/54355/8
Rajeev Mishra [Mon, 11 Mar 2024 18:53:11 +0000 (18:53 +0000)]
LU-17025 llapi: restore 'pool=ignore' functionality

Changes to llapi_stripe_param_verify() and related llapi file
creation functions to verify that the given pool name is valid
introduced a bug that disallowed the 'ignore' pool name, which
is used to create files without any pool name.

Allow the reserved pool names from lov_pool_is_reserved() to be
used even (especially!) if the named pool does not exist.
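The rule described above can be sketched as follows. This is not the
llapi code: the demo_* helpers are hypothetical, and the reserved list
here contains only 'ignore', the one name this message mentions; the real
set is whatever lov_pool_is_reserved() accepts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for lov_pool_is_reserved(). */
static bool demo_pool_is_reserved(const char *pool)
{
	static const char *const reserved[] = { "ignore" };
	size_t i;

	assert(pool != NULL);
	for (i = 0; i < sizeof(reserved) / sizeof(reserved[0]); i++)
		if (strcmp(pool, reserved[i]) == 0)
			return true;
	return false;
}

/*
 * Accept a pool name if none was given, if it is reserved, or if the
 * named pool exists. 'pool_exists' stands for the result of the real
 * pool lookup, which is not modelled here.
 */
static bool demo_pool_name_ok(const char *pool, bool pool_exists)
{
	if (pool == NULL || pool[0] == '\0')
		return true;	/* no pool requested, nothing to look up */
	return demo_pool_is_reserved(pool) || pool_exists;
}
```

Checking the reserved names before the existence test is what lets
'ignore' through even though no pool of that name has been created.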

Revert the changes to ost-pools.sh::test_32() that created the
'ignore_pool' pool, and go back to checking that 'ignore' will
create a file that does not use any pool.

Change the pool name validation to only do fsname lookup if the
pool name is actually specified, instead of looking up fsname
but not actually using it for anything.

Fixes: ee7dfc5ad1 ("LU-17025 llapi: Verify stripe pool name")
Signed-off-by: Rajeev Mishra <rajeevm@hpe.com>
Change-Id: I9368f28a41fd9af6b6f0e9468df0e7dfd728db1c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54355
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
5 hours ago LU-6142 osd-zfs: Fix style issues for osd_lproc.c 66/54266/3
Arshad Hussain [Mon, 4 Mar 2024 06:53:11 +0000 (01:53 -0500)]
LU-6142 osd-zfs: Fix style issues for osd_lproc.c

This patch fixes issues reported by checkpatch
for file lustre/osd-zfs/osd_lproc.c

Test-Parameters: trivial fstype=zfs
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Icb7b2a5805cddbd14458ed71835f5e12f14d18ea
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54266
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 hours ago LU-6142 osd-zfs: Fix style issues for osd_internal.h 63/54263/4
Arshad Hussain [Mon, 4 Mar 2024 08:04:40 +0000 (03:04 -0500)]
LU-6142 osd-zfs: Fix style issues for osd_internal.h

This patch fixes issues reported by checkpatch
for file lustre/osd-zfs/osd_internal.h

Test-Parameters: trivial fstype=zfs
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Iba857ae53a1a579dfc3ef6e422bcb3c47dd88cf1
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54263
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 hours ago LU-6142 osd-zfs: Fix style issues for osd_handler.c 62/54262/4
Arshad Hussain [Mon, 4 Mar 2024 09:00:15 +0000 (04:00 -0500)]
LU-6142 osd-zfs: Fix style issues for osd_handler.c

This patch fixes issues reported by checkpatch
for file lustre/osd-zfs/osd_handler.c

Test-Parameters: trivial fstype=zfs
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ibf1d954b8c1e3e64d3ae1661cfecbb09569ba955
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54262
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 hours ago LU-6142 osd: Fix style issues for osd_object.c 57/54257/3
Arshad Hussain [Sun, 3 Mar 2024 16:55:29 +0000 (22:25 +0530)]
LU-6142 osd: Fix style issues for osd_object.c

This patch fixes issues reported by checkpatch
for file lustre/osd-zfs/osd_object.c

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I32a91583b37752a722cf558dfa14f191163090b3
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54257
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 hours ago LU-6142 osd: Fix style issues for osd_xattr.c 56/54256/2
Arshad Hussain [Sun, 3 Mar 2024 17:20:40 +0000 (22:50 +0530)]
LU-6142 osd: Fix style issues for osd_xattr.c

This patch fixes issues reported by checkpatch
for file lustre/osd-zfs/osd_xattr.c

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I446e990ba4865943d17087beaf8e53082bae9131
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54256
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 hours ago LU-6142 llite: Fix style issues for rw.c 41/54141/2
Arshad Hussain [Thu, 22 Feb 2024 06:39:08 +0000 (12:09 +0530)]
LU-6142 llite: Fix style issues for rw.c

This patch fixes issues reported by checkpatch
for file lustre/llite/rw.c

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I7acdf52f598d26d7b54b5c63384c99ea14fa6e26
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54141
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 hours ago LU-17250 mgs: generate a new MDT configuration by copy 14/53614/11
Etienne AUJAMES [Mon, 8 Jan 2024 15:06:08 +0000 (16:06 +0100)]
LU-17250 mgs: generate a new MDT configuration by copy

The configuration for a new MDT is generated by reading the client
configuration. The MGS filters the existing mdc/osc records, interprets
them, and then creates the corresponding osp/osc device for the MDT.

The main idea of this patch is to first convert and copy the records
from the client configuration to create the new MDT, and then copy the
remaining record sections from an existing MDT, so that the new MDT
inherits OST pools and parameters from the existing one.

This avoids complex compatibility checks for IPv4/v6 NIDs because
add_uuid records are copied without needing to parse NIDs.
This also allows copying the "add failnid" section from the client.

This patch extends the usage to the "add failnid" section on MDT
configurations.

Here are the steps to copy an existing MDT configuration:

1/ read client configuration and generate osp MDT/OST records for the
   new MDT
2/ find an existing MDT configuration
3/ copy and convert the remaining configuration records from the
   existing MDT configuration (parameters and OST pools)

Add the regression test conf-sanity 137.

Test-Parameters: mdtcount=4 fstype=zfs testlist=conf-sanity
Test-Parameters: mdtcount=4 fstype=ldiskfs testlist=conf-sanity
Test-Parameters: mdtcount=4 fstype=zfs testlist=conf-sanity env=ONLY=137,ONLY_REPEAT=10
Test-Parameters: mdtcount=4 fstype=ldiskfs testlist=conf-sanity env=ONLY=137,ONLY_REPEAT=10
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: I4a99085b8930a0dd8002bde87d4e8c575aaccba0
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53614
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 hours agoLU-4341 tests: re-enable SLES sanity test_170 test_243 77/54777/6
Andreas Dilger [Sun, 14 Apr 2024 03:08:48 +0000 (21:08 -0600)]
LU-4341 tests: re-enable SLES sanity test_170 test_243

Re-enable tests on SLES that have been disabled since SLES11.
The SLES version check was broken and these were already
running on SLES15 without issues.

Test-Parameters: trivial
Test-Parameters: testlist=sanity env=ONLY="170 243",ONLY_REPEAT=20 clientdistro=sles15sp5
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I0f837ac5180d0754b67f349592503267aa2c5f52
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54777
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Sarah Liu <sarah@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
5 hours agoLU-16822 lnet: always initialize IPv6 at start up 18/51818/6
James Simmons [Mon, 22 Apr 2024 14:08:11 +0000 (10:08 -0400)]
LU-16822 lnet: always initialize IPv6 at start up

Currently lnet_inet_enumerate() has a bool parameter that optionally
enables collecting IPv6 addresses for selection. This patch changes
the behavior to always collect proper IPv6 addresses; the bool flag
now means that IPv6 addresses are preferred over any IPv4 addresses.
Update the userland applications lctl and lnetctl to send a flag
to select IPv6 or IPv4 at initialization of the LNet stack. This
is useful for testing IPv6 and other large NID types.

Test-Parameters: trivial testlist=sanity-lnet
Change-Id: Ib3f38de15b1295ec1f8e8607dbd971583541f06c
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51818
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 days agoLU-17744 ldiskfs: mballoc stats fixes 72/54772/3
Alexander Zarochentsev [Sun, 31 Mar 2024 20:21:56 +0000 (20:21 +0000)]
LU-17744 ldiskfs: mballoc stats fixes

Change mballoc statistics to use correct
allocation loop ids.

Fixes: 95f8ae56774 ("LU-12103 ldiskfs: don't search large block range if disk full")
HPE-bug-id: LUS-11936
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: I892ead5355865ec9c07fdc758e127c711b42cb1b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54772
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 days agoLU-17724 gss: fix bad use of user buffer in rsi upcall 30/54730/3
Sebastien Buisson [Thu, 11 Apr 2024 06:58:19 +0000 (08:58 +0200)]
LU-17724 gss: fix bad use of user buffer in rsi upcall

Use the proper kernel buffer to print message out when
upcall_cache_set_upcall() returns an error.

Fixes: 2153e86541 ("LU-17497 obdclass: check upcall incorrect values")
Test-Parameters: trivial
Test-Parameters: testgroup=review-dne-selinux-ssk-part-2
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ice781b4506822f1fd4ce0a062ce742f51e366525
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54730
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
6 days agoLU-17717 tests: skip sanity-lnet/252 for interop 07/54707/4
Alex Zhuravlev [Tue, 9 Apr 2024 10:14:01 +0000 (13:14 +0300)]
LU-17717 tests: skip sanity-lnet/252 for interop

Skip the subtest for interop, as it fails by finding a memory leak
that has only recently been fixed.

Test-Parameters: trivial
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ide80e0b39a053a2774804b025306ebdb1fc964a8
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54707
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
6 days agoLU-17713 mdd: validate the length of mdd append_pool name 91/54691/6
Emoly Liu [Wed, 10 Apr 2024 09:18:03 +0000 (09:18 +0000)]
LU-17713 mdd: validate the length of mdd append_pool name

Validate the length of mdd append_pool name (<= LOV_MAXPOOLNAME)
before saving it in function append_pool_store().
Also, sanity.sh test_27M is improved a little to verify this fix.
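
A minimal userspace sketch of this kind of length check is shown
below. It is not the code from append_pool_store(): struct pool_cfg
and pool_cfg_set_append_pool() are hypothetical, the value 15 for
LOV_MAXPOOLNAME and the -ENAMETOOLONG return code are assumptions,
and stripping a trailing newline is only a guess at how a sysfs
write would be handled.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Assumed limit; the real definition lives in the Lustre headers. */
#define LOV_MAXPOOLNAME 15

/* Hypothetical stand-in for the per-device pool name storage. */
struct pool_cfg {
	char append_pool[LOV_MAXPOOLNAME + 1];
};

/* Returns 0 on success, -ENAMETOOLONG if the name does not fit. */
int pool_cfg_set_append_pool(struct pool_cfg *cfg, const char *name,
			     size_t count)
{
	/* Ignore one trailing newline, as left by "echo name > file". */
	if (count > 0 && name[count - 1] == '\n')
		count--;

	/* Validate before touching the stored name. */
	if (count > LOV_MAXPOOLNAME)
		return -ENAMETOOLONG;

	memcpy(cfg->append_pool, name, count);
	cfg->append_pool[count] = '\0';
	return 0;
}
```

Rejecting the name before the copy means an over-long write leaves
the previously stored pool name untouched.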

Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Change-Id: Id7083fab60e9a18af4d8eedfa3d55f37544ba15d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54691
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
6 days agoLU-17297 tests: recovery-small_156 interop check 46/54646/3
Sergey Cheremencev [Mon, 1 Apr 2024 19:31:15 +0000 (22:31 +0300)]
LU-17297 tests: recovery-small_156 interop check

Don't start recovery-small_156 "tot_granted miscount
after client eviction" with OSTs older than 2.15.60.

Test-Parameters: trivial testlist=recovery-small env=ONLY=156 serverversion=2.15
Fixes: 9df01eee75 ("LU-17297 grant: move tgt_grant_sanity_check() calls")
Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Change-Id: I800ac435dcba267b9a60a919d007428bb8af7f90
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54646
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
6 days agoLU-17692 flock: get extra reference for lockd 22/54622/5
Yang Sheng [Thu, 28 Mar 2024 19:54:06 +0000 (03:54 +0800)]
LU-17692 flock: get extra reference for lockd

We should get the local lock first for GETLK. Otherwise,
the lock_owner could be released while working with
lockd.

Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: I56e4204e315c2bdbc496b7961519ae45ab1820fe
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54622
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 days agoLU-17688 ofd: access log to release chardev 06/54606/6
Alex Zhuravlev [Thu, 28 Mar 2024 11:35:59 +0000 (14:35 +0300)]
LU-17688 ofd: access log to release chardev

Due to a missing put_device(), the OFD access log leaks a number of
structures.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I36109738201b98025bbd2e6ed7c8830044e505c2
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54606
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
6 days agoLU-17684 mdt: lprocfs_mdt_open_files_seq_open() leaks op_data 91/54591/7
Alex Zhuravlev [Wed, 27 Mar 2024 18:54:01 +0000 (21:54 +0300)]
LU-17684 mdt: lprocfs_mdt_open_files_seq_open() leaks op_data

op_data is allocated in single_open() and the paired single_release()
is supposed to free it, but seq_release() was used instead.

The same applies to ldlm_granted_fops.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I91846ea7a2c896cb57b878905db4f3630939a652
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54591
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
6 days agoLU-17672 ldiskfs: release s_mb_prealloc_table 53/54553/10
Alex Zhuravlev [Mon, 25 Mar 2024 07:14:46 +0000 (10:14 +0300)]
LU-17672 ldiskfs: release s_mb_prealloc_table

Release s_mb_prealloc_table at umount.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I0a5cbf646c9bd73461691c49c6e7a509acd5a500
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54553
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
6 days agoLU-17650 gss: fix use out of bounds in ptlrpc_gss 52/54452/6
Oleg Drokin [Tue, 19 Mar 2024 03:10:13 +0000 (23:10 -0400)]
LU-17650 gss: fix use out of bounds in ptlrpc_gss

KASAN highlighted that the sockaddr_un struct is not big enough
for the kernel primitives we use, so we have to use the
bigger sockaddr_storage for allocation. Alas, the field
names inside are different, so we have to jump through some
hoops to make it actually work.
Also, for a 128-byte allocation a stack variable is fine and
cannot fail, so convert to that.
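
As a userspace analogy (not the ptlrpc_gss code), the idea of
holding a UNIX socket address in the larger sockaddr_storage and
filling it through a sockaddr_un view can be sketched as below.
fill_unix_addr() is a hypothetical helper, and the cast used here is
the usual userspace idiom, which may differ from the field handling
the kernel patch needs.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/*
 * Fill a caller-provided sockaddr_storage (typically a stack
 * variable) with a UNIX socket path. Returns the address length to
 * pass to bind()/connect(), or -1 if the path does not fit.
 */
int fill_unix_addr(struct sockaddr_storage *ss, const char *path)
{
	struct sockaddr_un *sun = (struct sockaddr_un *)ss;
	size_t len = strlen(path);

	if (len >= sizeof(sun->sun_path))
		return -1;

	memset(ss, 0, sizeof(*ss));
	sun->sun_family = AF_UNIX;
	memcpy(sun->sun_path, path, len + 1);	/* include the NUL */
	return (int)(offsetof(struct sockaddr_un, sun_path) + len + 1);
}
```

A caller would declare "struct sockaddr_storage ss;" on the stack
and pass &ss, so there is no allocation that can fail.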

Change-Id: I2292900b54756bf39530c96f7c5c228835562bef
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54452
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
6 days agoLU-17630 osc: add cond_resched() to osc_lru_shrink() 46/54346/6
Alex Zhuravlev [Mon, 11 Mar 2024 07:42:24 +0000 (10:42 +0300)]
LU-17630 osc: add cond_resched() to osc_lru_shrink()

osc_lru_shrink() may need to handle lots of pages and can thereby
block scheduling for a long time. Add a couple of cond_resched() calls
to prevent kernel warnings and starvation of other threads.
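
The pattern can be illustrated in userspace, with sched_yield()
standing in for cond_resched(). This is only an analogy: the function
shrink_items(), the batch size YIELD_EVERY and the item array are
hypothetical and unrelated to the real osc_lru_shrink() loop.

```c
#include <sched.h>
#include <stddef.h>

#define YIELD_EVERY 64	/* hypothetical batch size */

/* Walk n items, "freeing" the nonzero ones; returns the count freed. */
size_t shrink_items(const int *items, size_t n)
{
	size_t freed = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (items[i])
			freed++;

		/* Userspace stand-in for cond_resched(): after each
		 * batch, give other runnable threads a chance.
		 */
		if ((i + 1) % YIELD_EVERY == 0)
			sched_yield();
	}
	return freed;
}
```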

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I862c568ac777c0b929a1ffb61e246b079aee6718
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54346
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <patrick.farrell@oracle.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
6 days agoLU-17000 utils: Fix do_warn_interval resource leak 30/54330/8
Arshad Hussain [Fri, 8 Mar 2024 10:12:26 +0000 (15:42 +0530)]
LU-17000 utils: Fix do_warn_interval resource leak

In function do_warn_interval(), the opened 'fd' was not closed
if write() returned an error. This leak is fixed by
calling close() before returning.

This patch also checks the return value of futimens() and
logs an error if it fails.
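
The shape of such a fix, with a single exit path that always closes
the descriptor, might look like the sketch below. It is not the
do_warn_interval() code: touch_marker() and its arguments are
hypothetical, and only open(), write(), futimens() and close() are
real calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Write msg to path and refresh its timestamps; 0 or -errno. */
int touch_marker(const char *path, const char *msg)
{
	size_t len = strlen(msg);
	ssize_t written;
	int rc = 0;
	int fd;

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -errno;

	written = write(fd, msg, len);
	if (written < 0) {
		rc = -errno;
		goto out_close;		/* do not leak fd on error */
	}
	if ((size_t)written != len) {
		rc = -EIO;		/* short write */
		goto out_close;
	}

	/* NULL times means "set atime and mtime to now". */
	if (futimens(fd, NULL) < 0) {
		rc = -errno;
		fprintf(stderr, "futimens(%s) failed: %s\n",
			path, strerror(errno));
	}

out_close:
	close(fd);
	return rc;
}
```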

CoverityID: 415056 ("Resource leak")
Fixes: a454c9efd8 (LU-17137 utils: Deprecate l_getidentity 'files' alias)
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ice0269d524e237a4fc421b2a91d8f26b5e41b13f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54330
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
6 days agoLU-8130 nrs: for TBF nid handling using rhashtables 93/54193/7
James Simmons [Wed, 27 Mar 2024 17:21:37 +0000 (11:21 -0600)]
LU-8130 nrs: for TBF nid handling using rhashtables

While looking at the nrs code for lnet_nid_t I saw TBF was not
using struct lnet_nid. For the first step to support large NIDs
I moved the current use of cfs_hash to rhashtables. This doesn't
complete IPv6 support but it's a first step since the rhashtable
can use large NIDs. Next step will be updating tr_nids handling.

With this port I found the refcount handling to be incorrect.
Before this work I saw in the debug logs

Busy TBF object from client with NID 0@lo, with -1073741824 refs

and nrs_tbf_res_put() never cleans up struct nrs_tbf_client until
the filesystem is unmounted. With this patch we do cleanup
each nrs_tbf_client after we are done with policy. With this being
the case nrs_tbf_nid_hop_exit() should be called unless something
is wrong.

Test-Parameters: trivial testlist=sanityn
Change-Id: Iab69a16c12ed89f0694af7bcfe9158f468838ca4
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54193
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 days agoLU-17422 clio: use page pools for UDIO/hybrid 70/53670/17
Patrick Farrell [Wed, 27 Mar 2024 21:45:02 +0000 (17:45 -0400)]
LU-17422 clio: use page pools for UDIO/hybrid

This moves unaligned/hybrid IO to using page pools.  This
reduces the time spent in memory allocation while doing IO
to near zero, at least in simple tests.

This should close most of the performance gap between
udio/hybrid and regular DIO for reads, taking them from
~13 GiB/s to close to 20 GiB/s.  This should also scale as
DIO performance improves.

The improvement for writes is much more limited, because
UDIO writes do not have parallel data copy yet.  This will
improve UDIO write performance by perhaps 10-20%, so from
~2.5 GiB/s to ~3.0 GiB/s, very roughly.

Signed-off-by: Patrick Farrell <patrick.farrell@oracle.com>
Change-Id: I0cb8b5881bf2885a926383291f67fa252b56574f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53670
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
6 days agoLU-17422 osc: Clear PageChecked on bounce pages 65/53865/10
Patrick Farrell [Wed, 27 Mar 2024 21:44:47 +0000 (17:44 -0400)]
LU-17422 osc: Clear PageChecked on bounce pages

When we're finalizing a bounce page, we must clear
PageChecked.  Otherwise, if it's a page pool page, it will
be reused without the full wipe the kernel gives it, and we
will see PageChecked on pages which are not actually from
encryption and will handle them incorrectly.
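
The general hazard, per-use state surviving on a pooled object, can
be shown with a toy pool. Everything here is hypothetical (struct
pool_page, PG_CHECKED, pool_put(), pool_get()) and is not the osc or
page pool code; it only illustrates clearing the flag before reuse.

```c
#include <stddef.h>

#define PG_CHECKED 0x1u	/* hypothetical bit standing in for PageChecked */
#define POOL_CAP 4

struct pool_page {
	unsigned int flags;
};

struct page_pool {
	struct pool_page *slots[POOL_CAP];
	size_t count;
};

/* Return a page to the pool; 0 on success, -1 if the pool is full. */
int pool_put(struct page_pool *pool, struct pool_page *pg)
{
	if (pool->count == POOL_CAP)
		return -1;

	/* The pool does not wipe pages, so per-use state must be
	 * cleared here or the next user will see a stale flag.
	 */
	pg->flags &= ~PG_CHECKED;
	pool->slots[pool->count++] = pg;
	return 0;
}

/* Take a page from the pool, or NULL if it is empty. */
struct pool_page *pool_get(struct page_pool *pool)
{
	return pool->count ? pool->slots[--pool->count] : NULL;
}
```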

Fixes: f3fe144b85 ("LU-15003 sec: use enc pool for bounce pages")
Signed-off-by: Patrick Farrell <patrick.farrell@oracle.com>
Change-Id: I8b319e7ba55dd883d74db79a19bf93b6f125616a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53865
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
6 days agoLU-17422 obdclass: rename sptlrpc pool and move init 69/53669/14
Patrick Farrell [Wed, 27 Mar 2024 21:44:15 +0000 (17:44 -0400)]
LU-17422 obdclass: rename sptlrpc pool and move init

This patch completes the move of the pools code to obd by
renaming the sptlrpc pool to obd, and moves the pool init
and cleanup to obd.

Signed-off-by: Patrick Farrell <patrick.farrell@oracle.com>
Change-Id: I9164601745c8faf19559216f55ea5df4e2e226fe
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53669
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
6 days agoLU-17422 obdclass: rename ptlrpc_page_pool 68/53668/14
Patrick Farrell [Wed, 27 Mar 2024 21:41:21 +0000 (17:41 -0400)]
LU-17422 obdclass: rename ptlrpc_page_pool

This patch renames the ptlrpc page pool to reflect its new
place in obd.

Test-Parameters: trivial
Signed-off-by: Patrick Farrell <patrick.farrell@oracle.com>
Change-Id: I67aa5f3eef26b5fb890e62bced837bea9dd032c6
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53668
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
6 days agoLU-17422 obdclass: move page pools to obdclass 67/53667/14
Patrick Farrell [Wed, 27 Mar 2024 21:34:09 +0000 (17:34 -0400)]
LU-17422 obdclass: move page pools to obdclass

This patch starts the process of moving page pools to
obdclass by moving the file and making the changes necessary
to compile and run Lustre with the file moved.

This does not rename anything in the file yet; that will be
done in subsequent patches.

Signed-off-by: Patrick Farrell <patrick.farrell@oracle.com>
Change-Id: Iff39dd9ddfb105773f8eafa4754d32189067189b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53667
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 days agoLU-17550 lov: check for empty layout on DIO 77/54077/7
Patrick Farrell [Mon, 1 Apr 2024 15:28:49 +0000 (11:28 -0400)]
LU-17550 lov: check for empty layout on DIO

When a write crosses from an area of a file with a layout to
one without, the write should return ENODATA.  Due to layout
caching in the direct IO path, we need an extra check for
this to work correctly for DIO.

Fixes: 14db1faa0f ("LU-13799 lov: Cache stripe offset calculation")
Signed-off-by: Patrick Farrell <patrick.farrell@oracle.com>
Change-Id: Ib9a40dab7939d9420144ecaa7460625d6184aa0b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54077
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 days agoLU-17533 llite: call merge attr on all writes 16/54016/9
Patrick Farrell [Wed, 13 Mar 2024 14:45:44 +0000 (10:45 -0400)]
LU-17533 llite: call merge attr on all writes

Because DIO writes do not update the inode size during the
write, when a file is closed and the LSOM update is sent,
the file size provided by the client is incorrect.

DIO writes don't cause consistency problems because ls and
other things which check the file size will get the correct
size and update the inode size then, but that just means
this issue isn't fatal - DIO should still update the inode
size.

This is best done by calling ll_merge_attr on all writes at
the end of the write, rather than just for async writes in
vvp_io_write_commit.

Signed-off-by: Patrick Farrell <patrick.farrell@oracle.com>
Change-Id: I856b319254ad7093e69e41613120e06b71f656cc
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54016
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 days agoLU-16736 quota: set revoke time to avoid endless wait 26/50626/8
Hongchao Zhang [Wed, 20 Mar 2024 05:37:22 +0000 (13:37 +0800)]
LU-16736 quota: set revoke time to avoid endless wait

The revoke time of the lquota entry should be set when its qunit
reaches the least qunit, but in some rare cases it may not be set,
which could be related to a broken quota LDLM lock. Set it in
"qmt_acquire" to avoid an endless wait in QSD.

Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: Ib68c5dc881346e0e619d43553ee490847ae5e225
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50626
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
6 days agoLU-16623 tests: ignore sanity-pfl stripe-count off-by-1 78/54778/6
Andreas Dilger [Sun, 14 Apr 2024 05:54:24 +0000 (23:54 -0600)]
LU-16623 tests: ignore sanity-pfl stripe-count off-by-1

In some cases the MDS may not create all stripes on a file, if the
MDT-OST connection does not have precreated objects.  This is OK,
so the tests should not fail the stripe-count check if trying to
create a fully-striped file and one of the stripes is missing.

Test-Parameters: trivial testlist=sanity-flr
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ie482fdf86f82e7a2292c021761885249a6c551f1
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54778
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 days agoLU-17269 obdclass: fix locking for class_register/deregister 47/54747/7
Timothy Day [Thu, 11 Apr 2024 17:56:54 +0000 (17:56 +0000)]
LU-17269 obdclass: fix locking for class_register/deregister

Prevent registration and deregistration from racing with
each other. Otherwise, we could see a crash.
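
A minimal userspace sketch of serializing the two operations with
one lock follows. It does not reflect the actual obdclass data
structures or lock type: type_table, type_lock, type_register() and
type_deregister() are hypothetical, and a pthread mutex stands in for
whatever primitive the patch uses.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define MAX_TYPES 8

static const char *type_table[MAX_TYPES];
static pthread_mutex_t type_lock = PTHREAD_MUTEX_INITIALIZER;

/* Returns 0, -EEXIST if already registered, -ENOSPC if full. */
int type_register(const char *name)
{
	int free_slot = -1;
	int rc = -ENOSPC;
	int i;

	pthread_mutex_lock(&type_lock);
	for (i = 0; i < MAX_TYPES; i++) {
		if (type_table[i] == NULL) {
			if (free_slot < 0)
				free_slot = i;
		} else if (strcmp(type_table[i], name) == 0) {
			rc = -EEXIST;
			goto out;
		}
	}
	if (free_slot >= 0) {
		type_table[free_slot] = name;
		rc = 0;
	}
out:
	pthread_mutex_unlock(&type_lock);
	return rc;
}

/* Returns 0, or -ENOENT if the type was not registered. */
int type_deregister(const char *name)
{
	int rc = -ENOENT;
	int i;

	pthread_mutex_lock(&type_lock);
	for (i = 0; i < MAX_TYPES; i++) {
		if (type_table[i] && strcmp(type_table[i], name) == 0) {
			type_table[i] = NULL;
			rc = 0;
			break;
		}
	}
	pthread_mutex_unlock(&type_lock);
	return rc;
}
```

Because lookup and update happen under the same lock, a concurrent
deregister cannot remove an entry between the duplicate check and
the insertion.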

Test-Parameters: testlist=conf-sanity env=ONLY=41c,ONLY_REPEAT=30
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I4d512dcc8778c5116c1d6037ed2b7f486a7bc0dc
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54747
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
6 days agoLU-13374 mdd: fix close time update race with set-in-past 50/54450/8
Vitaly Fertman [Mon, 18 Mar 2024 21:33:25 +0000 (00:33 +0300)]
LU-13374 mdd: fix close time update race with set-in-past

Do not update mtime on close if ctime is not increased.

Save the time when atime was last changed, in case this is
set-in-past, to not lose it on a later LSOM update on close.

HPE-bug-id: LUS-12186
Fixes: d2f7cb7934a0 ("LU-12026 mdt: MDS stores atime|mtime|ctime")
Signed-off-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Change-Id: I070578a30f9bf548eec18a34ba6a06f1cb16909e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54450
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Nikitas Angelinas <nikitas.angelinas@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
6 days agoLU-6142 lnet: SPDX for lnet/include/ and misc files 51/54251/2
Timothy Day [Sat, 2 Mar 2024 22:00:47 +0000 (22:00 +0000)]
LU-6142 lnet: SPDX for lnet/include/ and misc files

Convert from verbose license text to SPDX.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Iad6b111df015cbe524ff0cad9f2a2efc446c2692
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54251
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 days agoLU-12452 lnet: allow to set IP ToS value per-NI 81/54081/6
Etienne AUJAMES [Fri, 16 Feb 2024 17:19:47 +0000 (18:19 +0100)]
LU-12452 lnet: allow to set IP ToS value per-NI

This patch allows setting the IP "Type of Service" value per network
interface in order to use IP QoS on TCP or RoCE networks.

e.g:
$ lnetctl add --net tcp2 --if eth1 --tos 104
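
For reference, on a plain IPv4 socket a ToS value such as 104 is
normally applied with setsockopt(IP_TOS). The sketch below shows only
that generic socket call; set_socket_tos() and get_socket_tos() are
hypothetical helpers, and how the LNDs apply the per-NI value is not
shown here.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Set the IPv4 ToS byte on fd; 0 on success, -1 with errno set. */
int set_socket_tos(int fd, int tos)
{
	return setsockopt(fd, IPPROTO_IP, IP_TOS, &tos, sizeof(tos));
}

/* Read back the ToS byte, or return -1 on failure. */
int get_socket_tos(int fd)
{
	int tos = 0;
	socklen_t len = sizeof(tos);

	if (getsockopt(fd, IPPROTO_IP, IP_TOS, &tos, &len) < 0)
		return -1;
	return tos;
}
```

The value 104 is 0x68, which leaves the two low-order ECN bits of
the byte clear.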

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: I99fad41f4b12951c0d09ad7460ff0ed107e7ce0a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54081
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 days agoLU-6142 mdd: Fix style issues for mdd_object.c 76/54076/3
Arshad Hussain [Fri, 16 Feb 2024 10:36:34 +0000 (16:06 +0530)]
LU-6142 mdd: Fix style issues for mdd_object.c

This patch fixes issues reported by checkpatch
for file lustre/mdd/mdd_object.c

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ie7ce393116ceb554e95c752739552fae29ada9a9
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54076
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 days agoLU-17440 lnet: prevent erroneous decref for asym route 96/53896/16
Gian-Carlo DeFazio [Thu, 29 Feb 2024 00:44:48 +0000 (16:44 -0800)]
LU-17440 lnet: prevent erroneous decref for asym route

The following stack trace was seen on a lustre server:
Call Trace TBD:
[<0>] libcfs_call_trace+0x6f/0xa0 [libcfs]
[<0>] lbug_with_loc+0x3f/0x70 [libcfs]
[<0>] lnet_destroy_peer_ni_locked+0x44d/0x4e0 [lnet]
[<0>] lnet_handle_find_routed_path+0x86c/0xee0 [lnet]
[<0>] lnet_select_pathway+0xb95/0x16c0 [lnet]
[<0>] lnet_send+0x6d/0x1e0 [lnet]
[<0>] lnet_parse_local+0x3ed/0xdd0 [lnet]
[<0>] lnet_parse+0xd7d/0x1490 [lnet]
[<0>] kiblnd_handle_rx+0x30e/0x900 [ko2iblnd]
[<0>] kiblnd_scheduler+0x104b/0x10d0 [ko2iblnd]
[<0>] kthread+0x14c/0x170
[<0>] ret_from_fork+0x1f/0x40

It was discovered that the lnet routes between the server
and a client cluster were misconfigured, so that the clients
had routes to the server through all 8 available routers,
but the server had routes to the clients through only 7 of
the routers.

The server was contacted by a client node through the
router with the missing route. It incremented the ref count
for the corresponding struct lnet_peer_ni for that router,
but then, because it had no route through that peer, changed
the value of the struct lnet_peer_ni to a peer with a route
back to the client. It then decremented the new
struct lnet_peer_ni which resulted in the ref count being
decremented to 0 which caused an LBUG.

Detect if the peer is a router to the appropriate net.
If so, decrement its ref count at the end of the function;
if not, decrement its ref count immediately.

Fixes: 2e27193 ("LU-17062 lnet: Update lnet_peer_*_decref_locked usage")
Test-Parameters: testlist=sanity-lnet mdscount=1 osscount=2 clientcount=1
Signed-off-by: Gian-Carlo DeFazio <defazio1@llnl.gov>
Change-Id: I2d00faef60ae8768afa7afbb1b00a62ba90535bb
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53896
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
6 days agoLU-17638 utils: break up Netlink error handling 12/54412/27
James Simmons [Tue, 9 Apr 2024 17:23:44 +0000 (13:23 -0400)]
LU-17638 utils: break up Netlink error handling

In the current code, when the function yaml_netlink_msg_complete()
calls yaml_netlink_msg_error(), the arg becomes NULL. So break
up yaml_netlink_msg_error() into two functions: one called by
the netlink err callback and the other called directly by
yaml_netlink_msg_complete(). Also change the libyaml
read_handler_data to the yaml_parser itself, since its life cycle
is managed outside the library, so there is no chance of it
disappearing on us while executing library code.

Fixes: d3ef8f6 ("LU-9680 lnet: add NLM_F_DUMP_FILTERED support")
Test-Parameters: trivial testlist=sanity-lnet
Change-Id: Iacb1e9c8929cd8a78a14580d909f94f2569fa5a3
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54412
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 days agoLU-17218 ofd: improve filter_fid upgrade compatibility 98/52798/8
Bobi Jam [Mon, 23 Oct 2023 07:29:07 +0000 (15:29 +0800)]
LU-17218 ofd: improve filter_fid upgrade compatibility

filter_fid could be expanded in a later Lustre version. With an
upgrade followed by a downgrade, the filter_fid EA on disk
could have been expanded during the upgrade and won't work after
the downgrade.

This patch improves this process by allocating a bigger buffer to
hold the expanded filter_fid EA and then trimming off the
unrecognizable fields.
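
The read-then-trim idea can be sketched as follows. The layout and
names are hypothetical (struct ff_known, FF_READ_BUF, ff_load()) and
do not match the real filter_fid definition; the sketch only shows
accepting a larger on-disk EA and keeping the leading fields that
this version understands.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical layout known to the older release. */
struct ff_known {
	unsigned long long parent_seq;
	unsigned int parent_oid;
	unsigned int stripe_idx;
};

#define FF_READ_BUF 128	/* hypothetical: larger than any expected EA */

/*
 * Load an on-disk EA of ea_len bytes. Returns the number of bytes
 * kept, or -1 if the EA is too short or larger than the buffer.
 */
int ff_load(const void *ea, size_t ea_len, struct ff_known *out)
{
	unsigned char buf[FF_READ_BUF];

	if (ea_len < sizeof(*out) || ea_len > sizeof(buf))
		return -1;

	/* Read the possibly expanded EA into the bigger buffer. */
	memcpy(buf, ea, ea_len);

	/* Keep the known prefix; trailing unknown fields are trimmed. */
	memcpy(out, buf, sizeof(*out));
	return (int)sizeof(*out);
}
```

This assumes new fields are only ever appended, so the known prefix
keeps its meaning across versions.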

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I4c99f1d9f3962d46ebf9e9b799988ff3dba4f919
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52798
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
6 days agoLU-13802 ptlrpc: correctly remove inflight request 99/54099/8
Patrick Farrell [Wed, 13 Mar 2024 14:46:12 +0000 (10:46 -0400)]
LU-13802 ptlrpc: correctly remove inflight request

When removing a request from the active set on error, we
must also remove it from "inflight" or we will not reduce
inflight as needed and hang on cleanup.

This bug has been latent for some time, but running sanity
414 with hybrid IO tends to trigger it.

Signed-off-by: Patrick Farrell <patrick.farrell@oracle.com>
Change-Id: Ib73980724f6e2f5a74400a39840df2e8835a6e23
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54099
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
6 days agoLU-13802 llite: add fail loc to force bio-dio switch 93/52593/25
Patrick Farrell [Mon, 1 Apr 2024 15:28:28 +0000 (11:28 -0400)]
LU-13802 llite: add fail loc to force bio-dio switch

This adds a fail loc to force switching from BIO to DIO.

Test-Parameters: trivial
Test-Parameters: testlist=sanity env=ONLY=119j,ONLY_REPEAT=50
Signed-off-by: Patrick Farrell <patrick.farrell@oracle.com>
Change-Id: Icba303c32a86170af08a78c6b306db08e8ed6047
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52593
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
6 days agoLU-13814 osc: add osc transient page ops 77/52077/19
Patrick Farrell [Fri, 23 Feb 2024 16:05:11 +0000 (11:05 -0500)]
LU-13814 osc: add osc transient page ops

As part of gradually removing cl_pages for transient pages,
create a special set of OSC page operations for them.

This makes it easier to see what's left for transient pages
and focus on removing that.

Signed-off-by: Patrick Farrell <patrick.farrell@oracle.com>
Change-Id: I20bbe7535e8df223ec1fff9b940b3063fcc3f8d7
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52077
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
6 days agoLU-13814 clio: Remove owner for transient pages 76/52076/20
Patrick Farrell [Thu, 28 Mar 2024 03:05:44 +0000 (23:05 -0400)]
LU-13814 clio: Remove owner for transient pages

Removing cl_page ownership is another step in removing
cl_page for transient/DIO pages.  This disables all of the
ownership related functionality for transient pages.

Signed-off-by: Patrick Farrell <patrick.farrell@oracle.com>
Change-Id: I7f1776284d7cd14bdab89290adcc27e3c73416ec
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52076
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
6 days agoLU-13814 clio: add io to cio_submit 75/52075/21
Patrick Farrell [Fri, 23 Feb 2024 16:02:35 +0000 (11:02 -0500)]
LU-13814 clio: add io to cio_submit

cp_owner is going away for transient pages, so we need to
remove its usage to find the top level IO.  Here that means
passing the IO down through a few layers.

Signed-off-by: Patrick Farrell <patrick.farrell@oracle.com>
Change-Id: I7fac0e53a7831247b846261c1c734c9d6e43a7d2
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52075
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
6 days agoLU-10215 tests: remove disk2_4 disk2_5 images 81/47281/4
Andreas Dilger [Thu, 4 Apr 2024 03:36:24 +0000 (21:36 -0600)]
LU-10215 tests: remove disk2_4 disk2_5 images

Remove the old disk2_4-*.tar.bz2 and disk2_5-ldiskfs.tar.bz2
images from the Git repo.  The disk2_5 image was never included into
testing due to an oversight in Makefile.am, and adding it to testing
is unlikely to be of any practical value as these releases are both
more than 10 years old and very unlikely to have any users that would
actually want to upgrade their systems at this point.

Test-Parameters: trivial
Test-Parameters: testlist=conf-sanity env=ONLY=32 mdscount=1 mdtcount=1
Test-Parameters: testlist=conf-sanity env=ONLY=32 mdscount=2 mdtcount=4
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I76a42eb90c3e1198d33783f3089ac30462429ac4
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/47281
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 weeks agoLU-17705 ptlrpc: replace synchronize_rcu() with rcu_barrier() 69/54669/5
Alex Zhuravlev [Thu, 4 Apr 2024 08:14:10 +0000 (11:14 +0300)]
LU-17705 ptlrpc: replace synchronize_rcu() with rcu_barrier()

synchronize_rcu() does not wait for in-flight rcu callback completion,
thus kmem_cache_free() can still race with kmem_cache_destroy().

Fixes: a9411a9856a ("LU-17076 nrs: wait for RCU completion")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I2da668c06b532a41c8ce2fe681ea17cf6f3013ef
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54669
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: James Simmons <jsimmons@infradead.org>
2 weeks agoLU-16904 tests: Fix sanity test 56a and 65a when PFL layout is used 44/54644/3
Wei Liu [Mon, 1 Apr 2024 17:51:56 +0000 (10:51 -0700)]
LU-16904 tests: Fix sanity test 56a and 65a when PFL layout is used

Fix sanity test_56a to use the correct operator order.
Skip sanity test_65a if a PFL layout is set, since it tests a
directory with no stripe info.

Test-Parameters: trivial testlist=sanity-compr env=ONLY="56a 65a",compr_STRIPEPARAMS="-E 1M -c1 -E eof"
Test-Parameters: testlist=sanity-compr env=ONLY="56a 65a",compr_STRIPEPARAMS="-E 64k -c 1 -E eof"
Test-Parameters: testlist=sanity-compr env=ONLY="56a 65a",compr_STRIPEPARAMS="-E 64k -c 1 -E eof -c 2"
Test-Parameters: testlist=sanity-compr env=ONLY="56a 65a",compr_STRIPEPARAMS="-E 64k -c 1 -E 1M -c 2 -E eof -c 4 -S 4M"

Signed-off-by: Wei Liu <sarah@whamcloud.com>
Change-Id: I0c17a0aceed7894f4eefa7336bd4a11e8fd7bc9e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54644
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 weeks agoLU-15552 tests: skip bad result in sanity-flr test_0d 23/54623/3
Alexandre Ioffe [Fri, 29 Mar 2024 05:18:55 +0000 (22:18 -0700)]
LU-15552 tests: skip bad result in sanity-flr test_0d

Ignore bad result of sanity-flr test_0d for MDS version older
than v2_14_57-72-gf468093cb6

Test-Parameters: trivial testlist=sanity-flr env=ONLY=0d
Test-Parameters: trivial testlist=sanity-flr env=ONLY=0d clientversion=2.14
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Change-Id: I0df94eea9fd11ca3f74a7df47b77de1de76c4066
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54623
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 weeks agoLU-16500 utils: 'lfs migrate' should select new OSTs 00/54600/4
Andreas Dilger [Thu, 28 Mar 2024 03:18:56 +0000 (21:18 -0600)]
LU-16500 utils: 'lfs migrate' should select new OSTs

When migrating a file using "lfs migrate FILE" without any arguments
to specify a new layout, this should migrate the file to the best
OSTs available at that time based on free space, instead of keeping
the file on the same OSTs (which is almost pointless otherwise).

Reset the starting OST index for all components of the copied file
layout so that this can happen properly.  Previously, only the last
component had the OST index reset, which was only partly helpful.

Add llapi_layout_ost_index_reset() to handle this, since it seems
likely that tools using llapi_layout_from_fd() and friends to copy
an existing layout will want to do the same.  Add the corresponding
man page and reference it from llapi_layout_get_from_fd().

Update sanity test_56xe to check that the starting OST index of each
component is not the same for all components.  This check might not
catch a broken "lfs migrate" every time since even before this patch
the last component would be allocated on a random OST, but it will still
fail in roughly 1/$OST_COUNT of runs.  Conversely, with this patch
it passes hundreds of iterations without a false positive, though a
small chance exists that it will have a false positive on occasion.

Add a "make utils" target to simplify building only user utilities.

Test-Parameters: testlist=sanity env=ONLY=56xe,ONLY_REPEAT=100
Fixes: 0568f4ca25 ("LU-16500 utils: set default ost index for lfs migrate")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ie4c68d4b2ff09560a7a13ae464723745cf968d36
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54600
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Stephane Thiell <sthiell@stanford.edu>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 weeks agoLU-17667 tests: Handle more than 1 IP returned by 'ip' cmd 47/54547/2
Arshad Hussain [Sun, 24 Mar 2024 10:27:22 +0000 (06:27 -0400)]
LU-17667 tests: Handle more than 1 IP returned by 'ip' cmd

An interface can have more than one IP address.  This is unusual
and a corner case.  This patch handles the case where the 'ip'
command returns more than one IP address, and also adds new
info/debug messages.

Corner Case:
ip -o -4 a s enp0s8 | awk '{print $4}' | sed 's/\/.*//'
192.168.50.188
192.168.1.12

Before patch:
sanity-lnet.sh line 1174: ((: 188 12: syntax error in expression
(error token is "12")

After patch:
...
IP for enp0s8 found [2]
Interface:IP are
enp0s8:192.168.50.188
enp0s8:192.168.1.12
Using GW_NID:192.168.50.189@tcp
...

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I783a6b67508a4497d18db94b5d2bdab616b4ade5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54547
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 weeks agoLU-17431 ost: add ioctl handler for oss 05/54505/7
Sebastien Buisson [Wed, 20 Mar 2024 08:24:30 +0000 (09:24 +0100)]
LU-17431 ost: add ioctl handler for oss

Adding ioctl handler for oss allows managing dynamic nodemaps
on OSS side.

Test-Parameters: trivial
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I90f4c6988bed2ba721e366ae088983958d484a2f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54505
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 weeks agoLU-17431 mdt: add ioctl handler for mds 04/54504/7
Sebastien Buisson [Wed, 20 Mar 2024 08:20:52 +0000 (09:20 +0100)]
LU-17431 mdt: add ioctl handler for mds

Adding ioctl handler for mds allows managing dynamic nodemaps
on MDS side.

Test-Parameters: trivial
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I6a68a17d3f12c799238a93242bbd385e6eeb1d0b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54504
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 weeks agoLU-17431 ptlrpc: move nodemap related ioctls to ptlrpc 02/54502/6
Sebastien Buisson [Tue, 19 Mar 2024 16:04:20 +0000 (17:04 +0100)]
LU-17431 ptlrpc: move nodemap related ioctls to ptlrpc

Move the functions that handle nodemap-specific ioctls to ptlrpc,
as they should not be accessible to the MGS only.

Test-Parameters: trivial
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I7a9651ea8484c540d18d6813ab96dc95a0871245
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54502
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 weeks agoLU-17643 gss: make a local copy of the sptlrpc llog 94/54394/8
Sebastien Buisson [Thu, 14 Mar 2024 17:15:29 +0000 (18:15 +0100)]
LU-17643 gss: make a local copy of the sptlrpc llog

Make a local copy on server side of the sptlrpc llog, so that
the targets that do not manage to connect to the MGS know at least
which security flavor to accept from clients.
This needs to pass the super_block to config_log_find_or_add().

Add sanity-sec test_70 to check that sptlrpc llog on MDS and OSS side
is equivalent to the one from the MGS.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I81f0136746e2df7cca1b34c4a17e4b7135a43c29
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54394
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 weeks agoLU-17504 build: fix array-index-out-of-bounds warning 65/54365/9
Jian Yu [Wed, 3 Apr 2024 07:38:47 +0000 (00:38 -0700)]
LU-17504 build: fix array-index-out-of-bounds warning

On Linux kernel 6.5, due to commit 2d47c6956ab3
("ubsan: Tighten UBSAN_BOUNDS on GCC"), flexible
trailing arrays declared like 'lc_array_sum[1];'
will generate warnings when CONFIG_UBSAN & co. is
enabled:

  UBSAN: array-index-out-of-bounds in lprocfs_status.c:1609:17
  index 1 is out of range for type '__s64 [1]'
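
For illustration only, and independent of the approach this patch
takes: the warning arises because an array declared with one element
is indexed past its declared bound. A C99 flexible array member has
no declared bound, so the same access pattern is well defined once
enough space is allocated. The struct and function names below are
hypothetical and are not taken from lprocfs_status.c.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical counter block using a C99 flexible array member. */
struct sketch_counters {
	unsigned int nr;
	int64_t sum[];		/* no fixed bound, unlike 'sum[1]' */
};

/* Allocate a zeroed block with room for 'nr' trailing sums. */
static struct sketch_counters *sketch_counters_alloc(unsigned int nr)
{
	struct sketch_counters *c;

	c = calloc(1, sizeof(*c) + nr * sizeof(c->sum[0]));
	if (c != NULL)
		c->nr = nr;
	return c;
}

/* Add up all trailing sums; every index is below the allocated count. */
static int64_t sketch_counters_total(const struct sketch_counters *c)
{
	int64_t total = 0;
	unsigned int i;

	assert(c != NULL);
	for (i = 0; i < c->nr; i++)
		total += c->sum[i];
	return total;
}
```

A caller would allocate with sketch_counters_alloc(n), fill sum[0]
through sum[n - 1], and release the block with free().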

Since LPROCFS_STATS_FLAG_IRQ_SAFE flag is only used
in one place - obd_memory() counter, we can just
remove it and change obd_memory over to a regular
percpu_counter. This would simplify the
lprocfs_counter() code, move over to using more
kernel functionality instead of libcfs, and
reduce overhead slightly for the memory accounting code.

Change-Id: Ic461c4b30317bfd2b1e9f5b6be84c4a7fb4e3eb9
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54365
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 weeks agoLU-17624 ssk: support FIPS mode on client 14/54314/8
Sebastien Buisson [Wed, 6 Mar 2024 15:33:25 +0000 (15:33 +0000)]
LU-17624 ssk: support FIPS mode on client

In FIPS mode, only certain crypto methods are allowed. This has an
impact on the DHKE mechanism implemented for SSK, as this relies on
a prime number generated for the client key. More specifically, FIPS
mode imposes that only certain safe, well-known primes be used.

OpenSSL prior to v1.1 just imposes a requirement on the prime length.
OpenSSL v1.1 requires the use of a specific primitive when FIPS mode
is on, to fetch a well-known prime based on a prime NID.
OpenSSL v3 is capable of detecting that FIPS mode is enforced, and picks
a well-known prime instead of generating one.

Because of this, primes used for the DHKE are identical on all clients
in FIPS mode. So urge admins to use a short expiration time on SSK
keys, one day instead of one week, so that security contexts are
re-negotiated more often.

The NIST-recommended primes are listed in Table 26 in Appendix D of:
https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-56Ar3.pdf

Test-Parameters: trivial
Test-Parameters: testgroup=review-dne-selinux-ssk-part-1
Test-Parameters: testgroup=review-dne-selinux-ssk-part-2
Test-Parameters: testgroup=review-dne-selinux-ssk-part-1 clientdistro=el9.2
Test-Parameters: testgroup=review-dne-selinux-ssk-part-2 clientdistro=el9.2
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I52b1926393e51fba6a9e92a837f86a38516ef6ad
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54314
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 weeks agoLU-17592 build: compatibility updates for kernel 6.8 29/54229/12
Shaun Tancheff [Sun, 24 Mar 2024 08:33:15 +0000 (15:33 +0700)]
LU-17592 build: compatibility updates for kernel 6.8

Linux commit v4.9-12227-g7b737965b331 introduced
  staging/lustre/libcfs: Convert to hotplug state machine
Linux commit v4.10-rc1-5-g4205e4786d0b
  cpu/hotplug: Provide dynamic range for prepare stage
Linux commit v6.7-rc2-1-g15bece7bec0d
  cpu/hotplug: Remove unused CPU hotplug states

CPUHP_LUSTRE_CFS_DEAD was introduced in 4.9 and removed in 6.8
CPUHP_BP_PREPARE_DYN was introduced in 4.10

With no distro kernels between 4.10 and 4.11, switch to
CPUHP_BP_PREPARE_DYN.

Linux commit v6.7-rc1-3-gda549bdd15c2
  dentry: switch the lists of children to hlist
Provide trivial wrappers to abstract the changed members.

Linux commit v6.7-rc4-79-gaf7628d6ec19
  fs: convert error_remove_page to error_remove_folio
Provide a generic_error_remove_folio() for older kernels.

HPE-bug-id: LUS-12181
Fixes: ce98bfe5f72 ("LU-10499 pcc: add readonly mode for PCC")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ib2e85c2acd3d0934e1c4712dad53b80f0ddb1b08
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54229
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 weeks agoLU-17592 build: kernel 6.8 removed strlcpy() 27/54227/13
Shaun Tancheff [Sun, 24 Mar 2024 06:55:08 +0000 (13:55 +0700)]
LU-17592 build: kernel 6.8 removed strlcpy()

Linux commit v6.7-11707-gd26270061ae6
  string: Remove strlcpy()

strlcpy() is removed, use strscpy() and provide a strscpy()
for kernels that do not have one.
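
As a rough userspace model of the semantics involved (not the compat
code added by this patch): strscpy() always NUL-terminates a
non-empty destination and returns the number of bytes copied, or
-E2BIG when the source had to be truncated. The name
sketch_strscpy() is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of strscpy() semantics: copy at most
 * size - 1 bytes, NUL-terminate when size > 0, and return the number
 * of bytes copied or -E2BIG if the source did not fit.
 */
static long sketch_strscpy(char *dst, const char *src, size_t size)
{
	size_t i;

	assert(dst != NULL && src != NULL);
	if (size == 0)
		return -E2BIG;

	for (i = 0; i < size; i++) {
		dst[i] = src[i];
		if (src[i] == '\0')
			return (long)i;	/* whole string fit */
	}

	/* ran out of room: terminate and report truncation */
	dst[size - 1] = '\0';
	return -E2BIG;
}
```

Unlike strlcpy(), which returns the length of the source, a negative
return here lets the caller detect truncation without a second
length comparison.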

HPE-bug-id: LUS-12181
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ieab872f20e08d17a4842bc944fa38f9867de81f9
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54227
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 weeks agoLU-17518 gss: do not trust supp groups from client with krb 87/53987/12
Sebastien Buisson [Fri, 9 Feb 2024 15:42:40 +0000 (16:42 +0100)]
LU-17518 gss: do not trust supp groups from client with krb

Thanks to Kerberos, Lustre does not have to trust clients anymore,
but relies on keytabs and tickets, cryptographically validated, to
recognize clients and users.
RPC-provided supplementary groups should not be trusted, but checked
via the identity upcall using the trusted UID from the ticket.

Add sanity-krb5 test_9 to exercise this.

Test-Parameters: kerberos=true testlist=sanity-krb5
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I4113ef654492e76fcd377b2c0cc74e484b27850b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53987
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 weeks agoLU-16724 ptlrpc: refactor page pools patch 3 63/52663/10
Patrick Farrell [Wed, 27 Mar 2024 21:28:11 +0000 (17:28 -0400)]
LU-16724 ptlrpc: refactor page pools patch 3

This is a combined series that refactors the page pools
code to make it more readable.  It used to be many
separate patches but has been combined into just three,
and this is the third of three.

LU-16724 ptlrpc: remove PAGES_PER_POOL macro

The page pool code *also* likes to refer to each page of
pointers it uses to track items in it as a "POOL", which is
incredibly confusing.

Start unwinding this by removing the PAGES_PER_POOL macro.

LU-16724 ptlrpc: change "pool" to "page_ptrs"

The page pool code *also* likes to refer to each page of
pointers it uses to track items in it as a "POOL", which is
incredibly confusing.

This patch works on renaming that to page_ptrs, but leaves
some steps for a future patch.

Change-Id-Was: I56ee54c7f39b52d7cceffec9e3decf71bd313ddc

LU-16724 ptlrpc: rename max_pools to max_ptr_pages

Continue removal of referring to page pointers as pools
with another rename.

Change-Id-Was: I14796f670a7f06fbec3b40ec23b4dd2e50f22d46

LU-16724 ptlrpc: rename npools to nptr_pages

Continue removal of 'pool' as a name for a page of pointers
to items in a pool.

Change-Id-Was: I97b320027a0a6b5870d246e1527fa3fbe15fccb5

LU-16724 ptlrpc: rename 'pools' to 'ptr_pages'

This finalizes the removal of the overloading of 'pools'
to also mean pages of pointers to items in each page pool.

Change-Id-Was: I0f4aba95f573f4afdc6f5d92f22fd67391fa6dab

Signed-off-by: Patrick Farrell <patrick.farrell@oracle.com>
Change-Id: Ie29434f53eeb945b8d35df7c1212ae3f51a2aafa
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52663
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
2 weeks agoLU-16724 ptlrpc: refactor page pools patch 2 45/52645/10
Patrick Farrell [Wed, 27 Mar 2024 21:24:18 +0000 (17:24 -0400)]
LU-16724 ptlrpc: refactor page pools patch 2

This is a combined series that refactors the page pools
code to make it more readable.  It used to be many
separate patches but has been combined into just three,
and this is the second.

LU-16724 ptlrpc: stop passing around pool_index

We pass pool_index around from function to function over
and over, but it's easier to just pass the pool around.

This does require the pool to know its own index, but
that seems better anyway.

LU-16724 ptlrpc: convert to void

Convert functions without meaningful return to void.

Change-Id-Was: I81f0baefd5b77b60ba699fa8749eaa83acadd8dd

LU-16724 ptlrpc: refactor pool growing code

This refactors the pool growing code, combining two
separate instances of it into a single function.

Change-Id-Was: I175abc7e61d55563e989f87207a8c59da852f5f9

LU-16724 ptlrpc: replace ELEMENT_SIZE

The ELEMENT_SIZE macro is fine, but it takes a pool index
and doesn't handle the pool of order 0.  Change it to a
function.  (This is marginally less efficient in one spot,
since it replaces a shift with a divide, but it should be
just fine.)

Change-Id-Was: I322037e50bbdb8e0274b37f82618b6907b6d2906

LU-16724 ptlrpc: simplify pool arrays

Currently, we do a fancy trick where we have a pool of
order 0, then subsequent pools start at
PPOOL_MIN_CHUNK_BITS (which is actually the minimum
compression size).

So pool index 1 isn't a pool of order 1 (2 pages), it's a
pool of order PPOOL_MIN_CHUNK_BITS.

All this saves us is the cost of the empty pools below
PPOOL_MIN_CHUNK_BITS, but it makes the code notably harder
to read.

With this change, the order of the pool and the pool index
are the same.  This simplification will be embraced more
in subsequent patches.

Change-Id-Was: I650e05d25727f10b0ca2d556cba17e9c4fccc309

LU-16724 ptlrpc: begin renaming pool_index to order

Replace local variables for pool_index with pool_order.

Other renames will be in a subsequent patch, to keep these
as simple as possible.

Change-Id-Was: If347ff39776f9a75c0f7d9d9981d01e19bc2cbc9

LU-16724 ptlrpc: rename ppp_index to ppp_order

Rename ppp_index to ppp_order.

Other renames will be in a subsequent patch, to keep these
as simple as possible.

Change-Id-Was: I96559e27a67b7cc4e56e06378e5686370438850c

LU-16724 ptlrpc: rename INDEX macros

Rename INDEX macros to ORDER.

Change-Id-Was: Ic1123d25bc855dc7671c9cb587a0d6680662b729

LU-16724 ptlrpc: remove PAGES_POOL macro

PAGES_POOL is just the order 0 pool now, so remove the
special naming, and adjust a few associated functions.

Change-Id-Was: I09e1debeadecbce33c7be43a8859815084623358

Signed-off-by: Patrick Farrell <patrick.farrell@oracle.com>
Change-Id: I42dc8b8094212c69b7a29cc3766bd0a10860f7af
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/52645
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
2 weeks agoLU-17700 lnet: properly calculate ping buffer size 73/54673/5
James Simmons [Fri, 5 Apr 2024 00:01:32 +0000 (20:01 -0400)]
LU-17700 lnet: properly calculate ping buffer size

Originally, lnet_ping() sized the ping buffer using
lnet_ping_sts_size(). The limitation of that approach is that if
the NID passed into lnet_ping_sts_size() is a smaller NID, such as
IPv4, the buffer could be too small. For example, if n_ids is 4 and
3 of the returned NIDs are IPv4 but one is IPv6, the buffer can
overflow. The solution is to allocate for the maximum possible NID
size. That can be done with LNET_ANY_NID, which fills in all the
fields. lnet_ping_sts_size() has to properly handle the size when
given LNET_ANY_NID. If struct lnet_nid ever grows in the future,
this code should still work.

Also cap the maximum size of the ping buffer to avoid o2iblnd
failures from using RDMA which sends data that doesn't support
large NIDs.

Fixes: d137e9823ca ("LU-10003 lnet: use Netlink to support LNet ping commands")
Test-Parameters: trivial testlist=sanity-lnet
Change-Id: I5b61add2b3701cad12074515f45773bbc9fbc583
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54673
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Stephane Thiell <sthiell@stanford.edu>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 weeks agoLU-17696 llite: remove LASSERT from ll_ddelete() 76/54676/2
Jian Yu [Thu, 4 Apr 2024 21:30:19 +0000 (14:30 -0700)]
LU-17696 llite: remove LASSERT from ll_ddelete()

On Linux kernel 6.8, the changes in commit 2f42f1eb9093
("Call retain_dentry() with refcount 0") cause d_delete()
instances to be called for dentries with ->d_lock held and
refcount equal to 0, which triggers the following assertion
failure on the Lustre client:

(dcache.c:136:ll_ddelete()) ASSERTION( d_count(de) == 1 ) failed

The value of d_count(de) became 0 instead of 1. Since
retain_dentry() was called either with refcount 0 or 1,
we can simply remove the LASSERT(ll_d_count(de) == 1)
from ll_ddelete() to avoid the above failure.

Change-Id: Ic4a39d9328326634190cd0719b4c0637e1bf315c
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54676
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 weeks agoLU-16915 tests: except sanity-sec test_51 51/54751/15
Andreas Dilger [Fri, 12 Apr 2024 01:18:28 +0000 (19:18 -0600)]
LU-16915 tests: except sanity-sec test_51

Skip sanity-sec test_51 since it has started failing recently with
the move to el9.3 servers.

Add common lustre_os_release infrastructure to make such checking
easier in the future.

Test-Parameters: trivial
Test-Parameters: testlist=sanity-sec env=ONLY=51,HONOR_EXCEPT=y serverdistro=el9.3
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Id02223752df4eb3fd3b62b339e8c417eb3e86a12
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54751
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 weeks agoLU-17721 tests: improve sanity-flr/210b 19/54719/3
Alex Zhuravlev [Wed, 10 Apr 2024 11:21:35 +0000 (14:21 +0300)]
LU-17721 tests: improve sanity-flr/210b

Do not use total space counters, as this has proven to be an
unreliable approach. Instead, check whether the specific object is
really removed (using debugfs).

Test-Parameters: trivial env=ONLY=210b,ONLY_REPEAT=20 testlist=sanity-flr
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I8d7af4ef5404115ce94040960b3cae9c05c9832f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54719
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 weeks agoLU-17675 tests: disable sanity-flr/61a on RHEL9.3 31/54731/2
Alex Zhuravlev [Thu, 11 Apr 2024 07:49:12 +0000 (10:49 +0300)]
LU-17675 tests: disable sanity-flr/61a on RHEL9.3

still failing a lot:
Error: 'f61a.sanity-flr: atime: old '1712732591' != new '1712732585''
Failure Rate: 32.61% of most recent 92 runs, 8 skipped (all branches)

Test-Parameters: trivial
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Id1b97d884f83b743a9ca6e72d21237689b5d19be
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54731
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 weeks agoLU-17269 tests: exclude conf-sanity/41c 44/54744/4
Andreas Dilger [Thu, 11 Apr 2024 18:04:14 +0000 (12:04 -0600)]
LU-17269 tests: exclude conf-sanity/41c

It appears that conf-sanity test_41c has started crashing recently,
likely due to running test sessions with multiple MDTs by default.
Exclude this subtest until it is fixed.

Test-Parameters: trivial testlist=conf-sanity env=ONLY=41c,HONOR_EXCEPT=y
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I3d88459d6c51d37fd3c489541077541d97384910
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54744
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 weeks agoNew tag 2.15.62 2.15.62 v2_15_62
Oleg Drokin [Mon, 8 Apr 2024 15:44:43 +0000 (11:44 -0400)]
New tag 2.15.62

Change-Id: I10e862a3b3f0bb23288c382d2341a09071157936
Signed-off-by: Oleg Drokin <green@whamcloud.com>
3 weeks agoLU-17690 quota: uninitialized var in qmt_lgd_extend_cb() 10/54610/2
Alex Zhuravlev [Thu, 28 Mar 2024 17:32:46 +0000 (20:32 +0300)]
LU-17690 quota: uninitialized var in qmt_lgd_extend_cb()

fix false warning

Test-Parameters: trivial
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I30e530119edd34c2a73487af4cbcc7ee9da46725
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54610
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 weeks agoLU-17683 lnet: ksocknal_startup() leaks iface table 90/54590/6
Alex Zhuravlev [Wed, 27 Mar 2024 17:19:03 +0000 (20:19 +0300)]
LU-17683 lnet: ksocknal_startup() leaks iface table

ksocknal_startup() leaks the iface table, which is allocated in
lnet_inet_enumerate().

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ib25402bb82a33c5f4838fc5bd9e9a22c806df89a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54590
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 weeks agoLU-17673 obdclass: properly free opts string 50/54650/2
James Simmons [Tue, 2 Apr 2024 22:24:59 +0000 (18:24 -0400)]
LU-17673 obdclass: properly free opts string

Because the lmd_parse() rework was based on the llite match_table
work, it inherited the same memory leak. Save the opts pointer so
that it can be passed to kfree() at the end.
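
The general pattern behind this kind of leak can be sketched in
userspace, assuming a tokenizer such as strsep() that advances its
cursor: once parsing finishes, the cursor no longer points at the
start of the allocation, so the original pointer must be kept for
the final free. The function name count_opts() is hypothetical and
this is not the lmd_parse() code itself.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch: count the non-empty comma-separated options.
 * strsep() advances 'cursor', so 'opts' keeps the pointer returned
 * by the allocator and is the one that gets freed.
 */
static int count_opts(const char *options)
{
	char *opts, *cursor, *tok;
	int count = 0;

	opts = strdup(options);		/* allocation to release later */
	if (opts == NULL)
		return -1;

	cursor = opts;			/* only this copy is advanced */
	while ((tok = strsep(&cursor, ",")) != NULL) {
		if (*tok != '\0')
			count++;
	}

	/* cursor is NULL now; freeing it would release nothing */
	assert(cursor == NULL);
	free(opts);
	return count;
}
```

Calling free() on the advanced cursor instead of the saved pointer
is the mistake this pattern avoids.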

Fixes: 415fa27540 ("LU-9325 obdclass: use match_table for server mount options")
Test-Parameters: trivial
Change-Id: I016f39f1512118486a0dc119ab075a1408b2a709
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54650
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 weeks agoLU-17685 utils: Allow nocompr flag in lfs mirror extend 40/54640/4
Alexandre Ioffe [Sat, 30 Mar 2024 20:55:24 +0000 (13:55 -0700)]
LU-17685 utils: Allow nocompr flag in lfs mirror extend

Extend the set of optional flags allowed by the
'lfs mirror extend' command with LCME_FL_NOCOMPR. Allowed syntax:
--flags=prefer
--flags=nocompr
--flags=prefer,nocompr
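
A minimal sketch of how such a comma-separated flag list could be
turned into a bitmask, assuming a strsep()-based parser. The
SKETCH_FL_* values and the function name are hypothetical; the real
LCME_FL_* constants and the lfs parsing code may differ.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical bit values; the real LCME_FL_* constants may differ. */
#define SKETCH_FL_PREFER	0x1u
#define SKETCH_FL_NOCOMPR	0x2u

/*
 * Parse "prefer", "nocompr" or "prefer,nocompr" into *flags.
 * Returns 0 on success, -1 on an unknown flag or allocation failure.
 */
static int sketch_parse_flags(const char *arg, unsigned int *flags)
{
	char *copy, *cursor, *tok;
	int rc = 0;

	assert(arg != NULL && flags != NULL);
	*flags = 0;

	copy = strdup(arg);
	if (copy == NULL)
		return -1;

	cursor = copy;
	while ((tok = strsep(&cursor, ",")) != NULL) {
		if (strcmp(tok, "prefer") == 0) {
			*flags |= SKETCH_FL_PREFER;
		} else if (strcmp(tok, "nocompr") == 0) {
			*flags |= SKETCH_FL_NOCOMPR;
		} else {
			rc = -1;	/* reject anything not allowed */
			break;
		}
	}

	free(copy);
	return rc;
}
```

Rejecting unknown tokens rather than ignoring them keeps a typo in
--flags from silently creating a mirror without the intended flag.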

Test-Parameters: trivial testlist=sanity-flr
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Change-Id: Id1538182eca0142464c19c0c4b1406592e615be1
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54640
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 weeks agoLU-930 ofd: improve orphan cleaning message 39/54639/4
Aurelien Degremont [Sat, 30 Mar 2024 20:32:02 +0000 (21:32 +0100)]
LU-930 ofd: improve orphan cleaning message

Reword the orphan cleaning message, which is printed at each MDT
connection after OST start, so that it is less scary and makes
clearer that this is normal behavior.

Test-Parameters: trivial

Change-Id: I7b4d726b1e96fe8d39872df5ed85453c99ccdc6a
Signed-off-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54639
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 weeks agoLU-8191 selftest: restore BUILD_BUGs 35/54635/4
Timothy Day [Fri, 29 Mar 2024 20:03:30 +0000 (20:03 +0000)]
LU-8191 selftest: restore BUILD_BUGs

It was wrong to remove lnet_selftest_structure_assertion()
since it contained BUILD_BUGs used to ensure different LNet
Selftest versions can interoperate.

Restore the BUILD_BUGs and add a dummy user in LNet Selftest
init. This should prevent analyzers from picking this up as
an unused function.
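
A rough user-space sketch of the pattern described above: compile-time
layout checks grouped in one function, plus a dummy call from an init
routine so the function is not reported as unused. The structure
demo_wire_msg and the functions demo_structure_assertion() and
demo_init() are hypothetical, and BUILD_BUG_ON() is approximated here
with C11 _Static_assert rather than taken from the kernel headers.

```c
#include <stddef.h>
#include <stdint.h>

/* User-space stand-in: the build fails if cond is true. */
#define BUILD_BUG_ON(cond) \
	_Static_assert(!(cond), "BUILD_BUG_ON failed: " #cond)

/* Hypothetical wire structure, not a real LNet Selftest message. */
struct demo_wire_msg {
	uint32_t magic;
	uint32_t version;
	uint64_t payload_len;
};

/* Generates no run-time code; it only pins the wire layout. */
static inline void demo_structure_assertion(void)
{
	BUILD_BUG_ON(sizeof(struct demo_wire_msg) != 16);
	BUILD_BUG_ON(offsetof(struct demo_wire_msg, magic) != 0);
	BUILD_BUG_ON(offsetof(struct demo_wire_msg, version) != 4);
	BUILD_BUG_ON(offsetof(struct demo_wire_msg, payload_len) != 8);
}

/* Hypothetical init routine acting as the dummy user. */
int demo_init(void)
{
	demo_structure_assertion();
	return 0;
}
```

If a later change altered a field size or offset, the build would stop
at the matching assertion instead of producing a module that silently
disagrees with its peers about the message layout.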

Fixes: 43cbc93 ("LU-8191 lnet: remove unused, fix non-static functions")
Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I5e498dc411048f0ae3ca69cc5d24a728a711d6f7
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54635
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 weeks agoLU-17678 quota: fix memleak in qmt_setup_lqe_gd() 75/54575/3
Alex Zhuravlev [Tue, 26 Mar 2024 14:41:18 +0000 (17:41 +0300)]
LU-17678 quota: fix memleak in qmt_setup_lqe_gd()

A race in qmt_setup_lqe_gd() can lead to a leaked lqe_glbl_data.
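
The commit message does not say how the race is closed. Purely as an
illustration of how such a leak can arise in general, the sketch below
shows two callers that may both allocate an object for the same slot;
only one allocation can be installed, so the loser has to free its own
copy. All names (demo_gd, demo_entry, demo_setup_gd) are hypothetical,
and the C11 atomic compare-and-exchange is an assumption of this
example, not the mechanism used in the quota code.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the per-entry global data. */
struct demo_gd {
	int nr;
};

struct demo_entry {
	_Atomic(struct demo_gd *) gd;
};

/*
 * Returns 1 if this caller installed its allocation, 0 if another
 * caller got there first, -1 if the allocation failed.
 */
int demo_setup_gd(struct demo_entry *e)
{
	struct demo_gd *expected = NULL;
	struct demo_gd *fresh = calloc(1, sizeof(*fresh));

	if (fresh == NULL)
		return -1;

	/* Install only if the slot is still empty. */
	if (atomic_compare_exchange_strong(&e->gd, &expected, fresh))
		return 1;

	/* Lost the race: without this free() the object would leak. */
	free(fresh);
	return 0;
}
```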

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I807aa276fa373cec493cae9d8182b28d996f5a8b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/54575
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>