Whamcloud - gitweb
fs/lustre-release.git
10 months ago LU-16846 nrs: Fix console messages 21/51121/6
Etienne AUJAMES [Wed, 24 May 2023 12:35:29 +0000 (14:35 +0200)]
LU-16846 nrs: Fix console messages

Fix the format of console messages and add the missing end-of-line characters:

CERROR("%s.%d NRS: ....", service_name, cpt, ...);
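As a hedged illustration of the message shape only (CERROR() itself is not reproduced), the sketch below builds a string of the form "<service>.<cpt> NRS: <text>\n" with snprintf(), always ending in a newline. The helper nrs_format_msg() and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format an NRS console message as
 * "<service>.<cpt> NRS: <text>\n", always terminated by a newline. */
static int nrs_format_msg(char *buf, size_t len, const char *service_name,
			  int cpt, const char *text)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "%s.%d NRS: %s\n", service_name, cpt, text);
}
```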

Test-Parameters: trivial
Fixes: c098c09 ("LU-14976 nrs: change nrs policies at run time")
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: Ib447673c69bcc853ebd1479463ca79bd5aa59964
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51121
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months ago LU-16842 fsx: tolerate delete last non-stale mirror error 90/51090/3
Bobi Jam [Tue, 23 May 2023 03:11:37 +0000 (11:11 +0800)]
LU-16842 fsx: tolerate delete last non-stale mirror error

The fsx mirror split test could try to delete the last non-stale mirror
of a file, which is a tolerable error scenario. Since the fsx FLR test
randomly chooses a mirror operation, this situation can happen.

Test-Parameters: trivial testlist=sanity-flr env=ONLY=70a
Fixes: 04ab0cc869c (LU-14156 utils: mirror split to check for last in-sync early)
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I80c294da80740b21e00ae72a092fd8883ec7d60e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51090
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months ago LU-12019 build: Recognize Debian Kernel and set KMP dir 66/51066/6
Thomas Stibor [Tue, 13 Jun 2023 17:34:39 +0000 (13:34 -0400)]
LU-12019 build: Recognize Debian Kernel and set KMP dir

Recognize the Debian kernel and make sure the kernel module package
(KMP) directory matches the KMP_MODDIR used by Ubuntu and by the Debian
package build system.

Change-Id: Ia3570500ed538c5d3c7a002eafddfc715efbf580
Test-Parameters: trivial clientdistro=ubuntu2204
Signed-off-by: Thomas Stibor <t.stibor@gsi.de>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51066
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Thomas Stibor <thomas@stibor.net>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months ago LU-16805 llite: improve readpage debug 92/50892/2
Patrick Farrell [Mon, 8 May 2023 21:44:11 +0000 (17:44 -0400)]
LU-16805 llite: improve readpage debug

LU-16412 (which is a workaround for a kernel bug) added a
debug message in ll_readpage(), but this message is printed
every time rather than only when the kernel bug is hit.

Let's fix this.
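A minimal sketch of the idea, assuming the workaround detects a page whose mapping was cleared by truncation: the debug message is emitted only inside the branch that handles that case, instead of on every call. The struct, counter, function names and return code are hypothetical stand-ins, not the ll_readpage() code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a page: a NULL mapping means it was truncated. */
struct page_stub {
	void *mapping;
	unsigned long index;
};

static int debug_messages; /* counts how often the debug message fires */

/* Returns -1 (hypothetical code) for a truncated page, 0 otherwise. */
static int readpage_stub(struct page_stub *page)
{
	assert(page != NULL);
	if (page->mapping == NULL) {
		/* Only the buggy-kernel path logs, not every readpage call. */
		fprintf(stderr, "readpage: page %lu truncated\n", page->index);
		debug_messages++;
		return -1;
	}
	return 0;
}
```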

Fixes: 209afbe28b "LU-16412 llite: check truncated page in ->readpage()"
Test-parameters: trivial
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ice02178eb9c07e03b58fb4e2d64ed3ea878cf137
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50892
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
10 months ago LU-16697 llite: Set BDI_CAP_* flags for lustre 97/50497/8
Shaun Tancheff [Sat, 1 Apr 2023 08:41:16 +0000 (03:41 -0500)]
LU-16697 llite: Set BDI_CAP_* flags for lustre

Lustre should set the BDI_CAP_* flags and the s_iflags
to indicate support for writeback and cgroup writeback.

HPE-bug-id: LUS-11553
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I49ce07fce8a9d153b9a71d8a0ba28b799354fc7f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50497
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
10 months ago LU-16691 ldiskfs: limit length of per-inode prealloc list 81/50481/15
Alex Zhuravlev [Fri, 31 Mar 2023 05:41:07 +0000 (08:41 +0300)]
LU-16691 ldiskfs: limit length of per-inode prealloc list

In the scenario of writing sparse files, the per-inode prealloc list may
be very long, resulting in high overhead for ext4_mb_use_preallocated().
To circumvent this problem, we limit the maximum length of the per-inode
prealloc list to 512 and allow users to modify this limit.
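The limiting idea can be sketched as a bounded list, assuming new preallocations go on the head and, once the count exceeds a tunable maximum (512 by default per the text above), the oldest entries are discarded. All names here are hypothetical; this is not the ext4/ldiskfs implementation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-inode preallocation record. */
struct pa {
	struct pa *next;
	unsigned long start;
};

struct pa_list {
	struct pa *head;     /* newest entry first */
	unsigned int count;
	unsigned int max;    /* tunable limit, e.g. 512 */
};

/* Add a new preallocation, then drop the oldest entries beyond max. */
static void pa_add(struct pa_list *l, unsigned long start)
{
	struct pa *p;

	assert(l->max >= 1);
	p = malloc(sizeof(*p));
	assert(p != NULL);
	p->start = start;
	p->next = l->head;
	l->head = p;
	l->count++;

	if (l->count > l->max) {
		/* Walk to the last entry to keep, then free the tail. */
		struct pa *keep = l->head;
		unsigned int i;

		for (i = 1; i < l->max; i++)
			keep = keep->next;
		while (keep->next != NULL) {
			struct pa *victim = keep->next;

			keep->next = victim->next;
			free(victim);
			l->count--;
		}
	}
}
```

Keeping the list bounded caps the cost of each scan over it, which is the overhead the commit attributes to ext4_mb_use_preallocated().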

After patching, we observed that the system CPU time ratio dropped and
system throughput increased significantly. We created a process that
writes a sparse file, and its running time on the fixed kernel was
significantly reduced, as follows:

Running time on unfixed kernel:
    # time taskset 0x01 ./sparse /data1/sparce.dat
    real    0m2.051s
    user    0m0.008s
    sys     0m2.026s

Running time on fixed kernel:
    # time taskset 0x01 ./sparse /data1/sparce.dat
    real    0m0.471s
    user    0m0.004s
    sys     0m0.395s

Link: https://lore.kernel.org/r/d7a98178-056b-6db5-6bce-4ead23f4a257@gmail.com
Signed-off-by: Chunguang Xu <brookxu@tencent.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I5e4ea3acfc07f6e69890690211bf6a34c1230979
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50481
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
10 months ago LU-16651 llite: hold invalidate_lock when invalidate cache pages 71/50371/4
Qian Yingjin [Tue, 21 Mar 2023 08:53:00 +0000 (04:53 -0400)]
LU-16651 llite: hold invalidate_lock when invalidate cache pages

Newer kernels (such as the one in Ubuntu 22.04) introduce a new member,
invalidate_lock, in struct address_space.
The filesystem must acquire invalidate_lock exclusively before
invalidating the page cache in truncate / hole punch (and thus calling
into ->invalidatepage) to block races between page cache
invalidation and page cache filling functions (fault, read, ...).

However, the current Lustre client does not hold this lock when
removing pages from the page cache due to the revocation of the
extent DLM lock protecting them.
If a client has two overlapping PR DLM extent locks, e.g.:
- L1 = <PR, [1M, 4M - 1]>
- L2 = <PR, [3M, 5M - 1]>
A reader process holds L1 and reads data in range [3M, 4M - 1].
L2 is being revoked due to a conflicting access.
The pages read in by the reader may then be invalidated and deleted
from the page cache by the revocation of L2 (in the lock blocking AST).

Older kernels check each page after the read to see whether it was
invalidated and deleted from the page cache; if so, they retry the
page read.

Newer kernels remove this check and retry.
Instead, they introduce a new rw_semaphore in the address_space,
invalidate_lock: the shared lock protects adding pages to the page
cache for page faults / reads / readahead, and the exclusive lock
protects invalidating pages and removing them from the page cache
for truncate / hole punch.

Thus, this patch holds invalidate_lock exclusively on newer kernels
when removing pages from the page cache due to the revocation of an
extent DLM lock protecting them. Otherwise, the newly added test case
sanity/833 hits -EIO errors or partial reads.

Test-parameters: clientdistro=ubuntu2204 testlist=sanity env=ONLY=833,ONLY_REPEAT=10
Change-Id: If3a27002b89636b9fd4d7b5ea50afa9aeac5d121
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50371
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
10 months ago LU-16594 build: get_random_u32_below, get_acl with dentry 93/50193/7
Shaun Tancheff [Sun, 2 Apr 2023 09:15:58 +0000 (04:15 -0500)]
LU-16594 build: get_random_u32_below, get_acl with dentry

Linux commit v6.1-13825-g3c202d14a9d7
  prandom: remove prandom_u32_max()

Use get_random_u32_below() and provide a replacement
when get_random_u32_below is not available.
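One possible shape for such a replacement, sketched under the assumption that a uniformly random 32-bit value is already available: map it into [0, ceil) with the multiply-and-shift scaling that prandom_u32_max() used in older kernels. The function name is hypothetical and this is not necessarily the compat code Lustre ships.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fallback: scale a uniformly random 32-bit value r into
 * [0, ceil) by multiplying and keeping the high 32 bits. */
static uint32_t u32_below_from(uint32_t r, uint32_t ceil)
{
	assert(ceil > 0);
	return (uint32_t)(((uint64_t)r * ceil) >> 32);
}
```

In real use r would come from the kernel's random source. This scaling has a small bias when ceil is not a power of two, which is acceptable for the kinds of callers that used prandom_u32_max().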

Linux commit v6.1-rc1-2-g138060ba92b3
  fs: pass dentry to set acl method
Linux commit v6.1-rc1-4-g7420332a6ff4
  fs: add new get acl method

get_acl() and set_acl() have new signatures

Test-Parameters: trivial
HPE-bug-id: LUS-11556
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I1de02f86fd2719fc75de4f014f51d73736d83c33
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50193
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months ago LU-13340 lustre: Support large nids in LCFG_ADD_UUID 96/50096/9
Mr NeilBrown [Thu, 12 May 2022 04:39:16 +0000 (14:39 +1000)]
LU-13340 lustre: Support large nids in LCFG_ADD_UUID

struct lustre_cfg contains lcfg_nid which is used only for
LCFG_ADD_UUID.  This is not sufficient for large nids.

The LCFG_ADD_UUID config record has room for 4 arbitrary strings, only
one of which is used ("node"), so we can use the second string to
store a larger nid.

Specifically: if lcfg_nid is zero, then the second config string
(named "nid") stores the nid in string format. A nid with a 4-byte
address is always stored in lcfg_nid, never in the config string.
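The encoding rule can be sketched with a simplified, hypothetical record: 4-byte-address nids go into lcfg_nid, larger ones leave lcfg_nid zero and use the "nid" string. The struct layout, buffer sizes and helper names are assumptions for illustration, not the real struct lustre_cfg.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified view of an LCFG_ADD_UUID record. */
struct add_uuid_rec {
	uint64_t lcfg_nid;  /* packed nid, only for 4-byte addresses */
	char node[64];      /* config string 1: "node" */
	char nid[128];      /* config string 2: "nid", large nids only */
};

/* Store a nid: small nids in lcfg_nid, large ones as a string. */
static void rec_set_nid(struct add_uuid_rec *rec, size_t addr_bytes,
			uint64_t packed_nid, const char *nid_str)
{
	assert(nid_str != NULL);
	if (addr_bytes == 4) {
		rec->lcfg_nid = packed_nid;
		rec->nid[0] = '\0';
	} else {
		rec->lcfg_nid = 0;  /* zero means "look at the nid string" */
		snprintf(rec->nid, sizeof(rec->nid), "%s", nid_str);
	}
}

/* A reader checks lcfg_nid first and falls back to the string. */
static int rec_uses_nid_string(const struct add_uuid_rec *rec)
{
	return rec->lcfg_nid == 0 && rec->nid[0] != '\0';
}
```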

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: If08477df677f26e0ff450803e79dde707becde0f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50096
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
10 months ago LU-16518 build: llvm/clang support 63/50063/16
Timothy Day [Sat, 18 Feb 2023 03:49:34 +0000 (03:49 +0000)]
LU-16518 build: llvm/clang support

Other projects, notably Linux, have build support for LLVM and
Clang via special environment variables. This is implemented
for Lustre, in the style of:

https://www.kernel.org/doc/html/latest/kbuild/llvm.html

Instances in which GCC is explicitly called are replaced by the
use of $CC. The proper environment variables are passed to make
invocations as needed.

All checks which influence global compiler and toolchain settings
are collected in 'config/lustre-toolchain.m4'.

A configure option is added to disable the strict error flags that
are passed to the C compiler by default. CFLAGS and EXTRA_CFLAGS
are made to work in the typical way. Having fine-grained control
over compiler options makes experimenting with Clang smoother.

Some compile checks in 'lustre-core.m4' have been improved by making
use of otherwise unused variables and by explicitly setting the
compile flag to be used during the test.

This also sets the execute bit on autogen.sh.

Tested with:
Linux (mainline) - 5.15.94
openZFS - 2.1.99
Lustre (latest master) - 2.15.55
CentOS - 8.5
Clang (default on CentOS) - 12.0.1

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Ia8654c22fa8fca7bfb96c545ac144a1d3737fa00
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50063
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months ago LU-13343 gss: no sec flavor on loopback connection 04/46704/10
Sebastien Buisson [Fri, 4 Mar 2022 15:45:59 +0000 (16:45 +0100)]
LU-13343 gss: no sec flavor on loopback connection

When using a local client, i.e. a client mounted on a server node,
there is no benefit from a security standpoint to enforce an SSK or
KRB flavor, since the data does not go over the network.
So force the 'null' security flavor for connections on 0@lo,
independently of the currently defined srpc flavor.
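The selection rule can be sketched as follows, assuming the peer is identified by its nid string: a loopback peer (0@lo) always gets the null flavor, and any other peer gets the configured one. The enum and function are hypothetical stand-ins, not the ptlrpc sec API.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical flavor identifiers. */
enum sec_flavor { FLVR_NULL, FLVR_SSK, FLVR_KRB };

/* Pick the flavor for a connection: loopback (0@lo) always gets null,
 * whatever the configured srpc flavor is. */
static enum sec_flavor choose_flavor(const char *peer_nid,
				     enum sec_flavor configured)
{
	assert(peer_nid != NULL);
	if (strcmp(peer_nid, "0@lo") == 0)
		return FLVR_NULL;
	return configured;
}
```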

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: If25d69bb1e67735cb0544ca954e49175f7471248
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/46704
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months ago LU-12511 build: include firewalld files for native Linux client 79/51379/3
James Simmons [Tue, 20 Jun 2023 13:36:27 +0000 (09:36 -0400)]
LU-12511 build: include firewalld files for native Linux client

Building RPMs for the native Linux client fails because the firewalld
XML files are not packaged. These files are useful for native client
support, so package them in that case.

Test-Parameters: trivial
Fixes: 9cb4b10c87d2 ("LU-14224 misc: add firewalld service configuration")
Change-Id: Id2887cef2c9b5e5d27fca3f77589775a31ee94b1
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51379
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
10 months ago LU-12511 llite: use mapping_set_error instead of opencoded set_bit 72/51372/4
Michal Hocko [Tue, 20 Jun 2023 13:34:09 +0000 (09:34 -0400)]
LU-12511 llite: use mapping_set_error instead of opencoded set_bit

The mapping_set_error() helper sets the correct AS_ flag for the mapping
so there is no reason to open code it.  Use the helper directly.
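A user-space sketch of what the helper records, modeled on the older kernel definition: -ENOSPC sets the ENOSPC flag and any other error (e.g. -ENXIO) is folded into the EIO flag. The struct and bit numbers are hypothetical stand-ins; newer kernels also track the error through errseq_t, which is not modeled here.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical bit numbers standing in for the kernel's AS_ flags. */
enum { AS_EIO_BIT = 0, AS_ENOSPC_BIT = 1 };

struct as_stub {
	unsigned long flags;
};

/* -ENOSPC is recorded as such; any other error becomes the EIO flag. */
static void mapping_set_error_stub(struct as_stub *m, int error)
{
	assert(error <= 0);
	if (error == 0)
		return;
	if (error == -ENOSPC)
		m->flags |= 1UL << AS_ENOSPC_BIT;
	else
		m->flags |= 1UL << AS_EIO_BIT;
}
```

This is why the upstream note mentions the conversion from -ENXIO to -EIO: only two error classes survive in the mapping flags.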

[akpm@linux-foundation.org: be honest about conversion from -ENXIO to -EIO]
Link: http://lkml.kernel.org/r/20160912111608.2588-2-mhocko@kernel.org
Linux-commit: 5114a97a8bce7f4ead29a32b67dee85438699b9e

Change-Id: I153bc04d4745a20013820ba81572cadb37ab8f39
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51372
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months ago LU-16847 ldiskfs: refactor t10 code. 45/51145/6
Alexey Lyashkov [Thu, 25 May 2023 07:47:38 +0000 (10:47 +0300)]
LU-16847 ldiskfs: refactor t10 code.

Always use a t10 private structure to make testing easier.
Allocate it from a slab so that allocation is fast and memory leaks
are found early. Move the t10 code into its own file.

HPe-bug-id: LUS-11645
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Change-Id: Iaf1cb9fd63db2af866003f4b1f81a5e2c3b8f540
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51145
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months ago LU-16847 ldiskfs: do not copy ldiskfs_chunk_trans_blocks 38/51138/7
Alexey Lyashkov [Thu, 25 May 2023 07:45:29 +0000 (10:45 +0300)]
LU-16847 ldiskfs: do not copy ldiskfs_chunk_trans_blocks

Do not make a copy of ldiskfs_chunk_trans_blocks() function.
Instead, export existing function from the ldiskfs module.

HPe-bug-id: LUS-11645
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Change-Id: Ic5d4a8e82b0284241b0e8e2a167271a6dc6fc297
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51138
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
10 months ago LU-16804 tests: rename 'complete' to 'complete_test' 83/51383/6
Andreas Dilger [Tue, 20 Jun 2023 18:57:56 +0000 (12:57 -0600)]
LU-16804 tests: rename 'complete' to 'complete_test'

The test-framework.sh "complete" function conflicts with "complete"
exported from bash_completion, and this causes lustre-initialization
to fail in some configurations now that the lustre test config
is loaded earlier during test-framework.sh init_test_env() setup.

Rename "complete" to "complete_test" to avoid this conflict.

Test-Parameters: trivial
Fixes: fdbb2bc849 ("LU-16804 tests: load CONFIG at beginning of init_test_env")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ic72d8d5cc4a65feec6bfb2a76ac5f9b9d78e3f75
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51383
Tested-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months ago LU-8191 ptlrpc: convert functions to static 53/51353/2
Timothy Day [Mon, 19 Jun 2023 03:44:02 +0000 (03:44 +0000)]
LU-8191 ptlrpc: convert functions to static

Static analysis shows that a number of functions
could be made static. This patch declares several
functions in ptlrpc static.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: If0d92f7f4e625c146833f360806ae80b8914cc20
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51353
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months ago LU-16905 tests: Fix 'timeout' value under sanity-quota/18 37/51337/5
Arshad Hussain [Thu, 15 Jun 2023 22:42:09 +0000 (04:12 +0530)]
LU-16905 tests: Fix 'timeout' value under sanity-quota/18

Presently, sanity-quota/18 obtains the timeout via the sysctl
command, which reads it from an incorrect location. This patch
fixes the issue by reading it via $LCTL directly.

sysctl output:
timeout=$(sysctl -n lustre.timeout)
sysctl: cannot stat /proc/sys/lustre/timeout: No such file or directory

Test-Parameters: trivial fstype=zfs testlist=sanity-quota env=ONLY=18
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I4604d31c615fa65bf79598cc09f05d9c7c7abf1b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51337
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
10 months ago LU-16894 lod: fix stripe_count limit in lod_qos_set_pool 14/51314/5
Etienne AUJAMES [Wed, 14 Jun 2023 08:02:01 +0000 (10:02 +0200)]
LU-16894 lod: fix stripe_count limit in lod_qos_set_pool

For conflicting pool name and stripe offset parameters, the MDS
should not set the pool on the file layout, in favor of the stripe
offset (LU-15658). But lod_qos_set_pool() still limits the
stripe_count to the number of OSTs in the specified pool.

Let's fix this.
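The intended rule can be sketched with hypothetical inputs: the stripe count is clamped to the pool's OST count only when the pool is actually applied; when the stripe offset conflicts with the pool, the pool is dropped and so is the clamp. The struct and function names are illustrative, not the lod code.

```c
#include <assert.h>

/* Hypothetical inputs to the stripe count decision. */
struct layout_req {
	int stripe_count;    /* requested stripes */
	int pool_ost_count;  /* OSTs in the named pool, 0 if no pool */
	int pool_conflicts;  /* nonzero if the stripe offset conflicts */
};

/* Clamp to the pool size only when the pool is really used. */
static int effective_stripe_count(const struct layout_req *req)
{
	assert(req->stripe_count > 0);
	if (req->pool_ost_count > 0 && !req->pool_conflicts &&
	    req->stripe_count > req->pool_ost_count)
		return req->pool_ost_count;
	return req->stripe_count;
}
```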

Fixes: 06dd5a4 ("LU-15658 lod: ost list and pool name conflict")
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: Ic47f9aadd8feea01367e526aaf0ea41a69ade9fa
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51314
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months ago LU-16778 tests: sanity-quota_75 fix 58/51158/4
Sergey Cheremencev [Tue, 30 May 2023 08:14:47 +0000 (11:14 +0300)]
LU-16778 tests: sanity-quota_75 fix

Change conv=fsync to oflag=direct to avoid
cached writes.

Test-Parameters: trivial testlist=sanity-quota env=ONLY=75,ONLY_REPEAT=100
Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Change-Id: Iff04bac63f772dc2d0d0ad765d210b2539fbe33e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51158
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months ago LU-16836 lnet: ensure dev notification on lnd startup 57/51057/6
Serguei Smirnov [Fri, 19 May 2023 02:12:19 +0000 (19:12 -0700)]
LU-16836 lnet: ensure dev notification on lnd startup

Look up the device and link state on LND startup so that
the initial NI state may be set properly.

Reduce code duplication by adding lnet_set_link_fatal_state() and
lnet_get_link_status() functions which are shared across LNDs.
LND-specific versions of these are removed.

This fixes an issue where adding an LNet NI on an interface with the
cable unplugged resulted in the NI state being initialized as "up".
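A hedged sketch of deriving the initial NI state from link status, using the Linux /sys/class/net/<dev>/carrier contents ("1" when a cable is connected) as an example input. Whether the LNDs read this file is not stated here; the enum and function are hypothetical and do not reproduce lnet_get_link_status().

```c
#include <assert.h>
#include <stddef.h>

enum ni_state { NI_STATE_DOWN, NI_STATE_UP };

/* Derive the initial NI state from a link-status string such as the
 * contents of /sys/class/net/<dev>/carrier. Anything other than "1",
 * including a failed read (NULL), is treated as down so an unplugged
 * interface never starts out as "up". */
static enum ni_state initial_ni_state(const char *carrier)
{
	if (carrier != NULL && carrier[0] == '1')
		return NI_STATE_UP;
	return NI_STATE_DOWN;
}
```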

Fixes: da230373bd ("LU-16563 lnet: use discovered ni status")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I16084092cc21a4e42dfef4624adfbf57eb4fdecb
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51057
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months ago LU-16709 lnet: fix locking multiple NIDs of the MR peer 30/50530/15
Serguei Smirnov [Tue, 4 Apr 2023 21:02:51 +0000 (14:02 -0700)]
LU-16709 lnet: fix locking multiple NIDs of the MR peer

If Lustre identifies the same peer with multiple NIDs,
then as a result of peer discovery the discovered peer
may be found to contain a NID which is locked
as primary by a different existing peer record.
In this case it is safe to merge the peer records,
but the NID which was locked earliest should be
kept as primary.

This allows the first of the two locked NIDs to remain
primary, as intended, for communicating with Lustre,
even if peer discovery succeeded using a different
NID of the MR peer.

Fixes: aacb16191a ("LU-14668 lnet: Lock primary NID logic")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: Iec9f8b70053fe24cddee552358500dfad0234b7f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50530
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months ago LU-16518 misc: fix clang build errors 32/50332/4
Timothy Day [Wed, 14 Jun 2023 02:19:41 +0000 (02:19 +0000)]
LU-16518 misc: fix clang build errors

Fix several format security errors by explicitly giving
the format to the affected functions.
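The pattern behind such fixes, sketched with a hypothetical helper: passing caller-supplied text directly as the format string is what -Wformat-security flags (and what -Werror turns into a build error), because any '%' in the text would be interpreted. Supplying an explicit "%s" format copies the text literally.

```c
#include <assert.h>
#include <stdio.h>

/* Unsafe form: snprintf(buf, len, msg); a '%' in msg is a directive.
 * Safe form below: the text is only ever an argument to "%s". */
static int copy_message(char *buf, size_t len, const char *msg)
{
	assert(buf != NULL && msg != NULL);
	return snprintf(buf, len, "%s", msg);
}
```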

A write test in badareaio attempts to write more than
2,147,479,552 bytes. A single write() on Linux never transfers
that much, so reduce the size of the write to make the test
useful.
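The 2,147,479,552-byte figure is the Linux per-call read/write cap, MAX_RW_COUNT, defined as INT_MAX rounded down to a page boundary. The arithmetic, sketched as a small helper (the function name is hypothetical; the page size is a parameter):

```c
#include <assert.h>
#include <limits.h>

/* INT_MAX rounded down to a multiple of the (power-of-two) page size.
 * With 4 KiB pages: 0x7fffffff & ~0xfff = 0x7ffff000 = 2147479552. */
static long max_rw_count(long page_size)
{
	assert(page_size > 0 && (page_size & (page_size - 1)) == 0);
	return (long)INT_MAX & ~(page_size - 1);
}
```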

Explicitly cast ll_nfs_get_name_filldir as a filldir_t
and NR_WRITEBACK as a zone_stat_item. This silences
some implicit cast errors. These casts can likely be
removed when older kernel support is dropped.

Refactor some code to avoid strncat, which was being
used incorrectly anyway.

Adjust some variables to use more appropriate types.

Inline some functions which are only sometimes used.

Remove a LASSERTF that will never trigger, since u32
is always smaller than IDIF_MAX_OID.

Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I3962611de7d012e544636248353c072c9f9c9830
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50332
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
10 months ago LU-12031 mdt: explicit data version of DoM files 39/47139/7
Mikhail Pershin [Mon, 25 Apr 2022 06:13:53 +0000 (09:13 +0300)]
LU-12031 mdt: explicit data version of DoM files

Use EA to store 'data_version' for DoM files explicitly.

Unlike OST objects, the 'inode_version' of a DoM file is also changed
by metadata operations, which leads to problems during HSM
operations: e.g. writing the HSM EA with the file data version
inside causes a DoM object version update, making this HSM EA
version obsolete; also, any metadata update on a restored file
makes it dirty and prevents a second release.

DoM files now have an explicitly updated 'data_version' in
addition to the ordinary 'inode_version'. The 'data_version'
is updated along with 'inode_version' upon write, truncate and
fallocate operations and is stored as the 'trusted.dataver' EA.
The layout swap procedure is updated to move the data version
between the files being swapped, along with the HSM attributes.
If a DoM file is migrated to a RAID0 file, then the 'dataver' EA
is deleted.
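The two-counter scheme can be sketched with a hypothetical object: every update advances inode_version, but only write, truncate and fallocate also advance data_version, so metadata-only changes (such as writing the HSM EA) leave the data version untouched. Names and the version source are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical DoM object with the two counters described above. */
struct dom_obj {
	uint64_t inode_version;  /* bumped by every update */
	uint64_t data_version;   /* bumped only by data changes */
};

enum dom_op { OP_WRITE, OP_TRUNCATE, OP_FALLOCATE, OP_SETXATTR, OP_RENAME };

static void dom_apply(struct dom_obj *o, enum dom_op op, uint64_t new_version)
{
	assert(new_version > o->inode_version);
	o->inode_version = new_version;
	switch (op) {
	case OP_WRITE:
	case OP_TRUNCATE:
	case OP_FALLOCATE:
		o->data_version = new_version;
		break;
	default:
		/* Metadata-only change: the data version stays the same. */
		break;
	}
}
```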

Corresponding test 1f is added to sanity-hsm.sh and
207j to sanity.sh.

Test-Parameters: clientversion=2.12.4 testlist=sanity-hsm
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I4689c56394c7323d32cd6f7dd86f58beb6e53353
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/47139
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
10 months ago LU-11912 tests: consume precreated objects in parallel 92/51292/4
Andreas Dilger [Tue, 13 Jun 2023 07:02:22 +0000 (01:02 -0600)]
LU-11912 tests: consume precreated objects in parallel

Run the force_new_seq_all() file creations in parallel, since
this can take a significant amount of time when there are multiple
MDTs and OSTs (up to 1000s for 4x MDTs and 8x OSTs).

Test-Parameters: trivial testlist=replay-dual mdscount=2 mdtcount=4
Test-Parameters: testlist=replay-ost-single mdscount=2 mdtcount=4
Test-Parameters: testlist=replay-single mdscount=2 mdtcount=4
Test-Parameters: testlist=sanity-pfl env=ONLY=0+1+16+27 mdscount=2 mdtcount=4
Fixes: 2fdb1f8d01b9f ("LU-11912 tests: SEQ rollover fixes")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I849370586fe320d1f7df069f0b83980449658d97
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51292
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
10 months ago LU-15934 tests: add a test case for update llog 08/51208/6
Yang Sheng [Sat, 3 Jun 2023 18:47:30 +0000 (02:47 +0800)]
LU-15934 tests: add a test case for update llog

Add a test case to simulate the update llog corruption
situation. It replaces the catalog file with random
data. Without the fix, MDT recovery would be
blocked.

Fixes: 814691bcff ("LU-15934 lod: renew the update llog")
Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: I0ade8d0ff33ddc06b622e5e67cf4b4775dfff129
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51208
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months ago LU-13805 osc: Add debug 88/49988/22
Patrick Farrell [Tue, 14 Feb 2023 18:24:15 +0000 (13:24 -0500)]
LU-13805 osc: Add debug

This adds some minor additional debug for unaligned IO.

The purpose here is just to shorten the length of the
main patch by pulling out supporting bits.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ia1e749788b66f1d6d3440eb99ff3e86c893fd3f3
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49988
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
10 months ago LU-13805 clio: Trivial DIO cleanups 87/49987/22
Patrick Farrell [Sat, 11 Feb 2023 18:12:14 +0000 (13:12 -0500)]
LU-13805 clio: Trivial DIO cleanups

This is some minor DIO refactoring and an additional debug
message discovered while working on this.  Extremely minor.

test-parameters: trivial

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ica2a7340ac02d3ba0d8f65662ff4b39026078e81
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49987
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
10 months ago LU-13805 osc: Don't include lock for srvlock 67/50067/16
Patrick Farrell [Mon, 20 Feb 2023 05:19:40 +0000 (00:19 -0500)]
LU-13805 osc: Don't include lock for srvlock

When doing server-side locking, it doesn't make sense to do
the 'search for a covering lock and send it to the server'
step when building an RPC, because we will not use that
lock.

This can disguise issues on the client: prolonging a lock
is supposed to let a client avoid eviction while it is
still doing IO under the lock, but with server-side locking
it is not. This can delay an eviction which should occur
because the client can't give the lock back.
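A minimal sketch of the change in RPC building, with hypothetical types: when server-side locking is in effect, the covering-lock lookup is skipped and no client lock is attached, so the server has nothing to prolong on the client's behalf.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock handle and RPC description. */
struct lock_stub { int id; };

struct rpc_desc {
	bool srvlock;            /* server-side locking requested */
	struct lock_stub *lock;  /* covering lock sent to the server */
};

/* Stand-in for the "search for a covering lock" step. */
static struct lock_stub *find_covering_lock(struct lock_stub *cached)
{
	return cached;
}

/* Only attach a client lock when IO is really done under it. */
static void build_rpc(struct rpc_desc *rpc, struct lock_stub *cached)
{
	assert(rpc != NULL);
	rpc->lock = rpc->srvlock ? NULL : find_covering_lock(cached);
}
```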

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I6957925bf2d8b7be2340469337906a94a758953d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50067
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
10 months ago LU-15404 ldiskfs: fix truncate during setxattr for el7.9 35/51335/3
Andreas Dilger [Thu, 15 Jun 2023 19:32:05 +0000 (13:32 -0600)]
LU-15404 ldiskfs: fix truncate during setxattr for el7.9

Backport the ext4-delayed-iput.patch to rhel7.9 kernels so the
delayed osd-ldiskfs truncate can use s_misc_wq consistently.

This moves the call to the final iput into a separate thread.
This way, setxattr transactions will never be split in two.
Since the setxattr code adds xattr inodes with nlink=0 to the
orphan list, old xattr inodes will be properly cleaned up in
any case.
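The deferral pattern can be sketched single-threaded, with a list standing in for work queued on s_misc_wq: dropping the last reference queues the inode instead of evicting it inline, and a worker evicts it later, outside the caller's transaction. All names are hypothetical; this is not the ext4-delayed-iput patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical inode with a reference count and a deferred-work link. */
struct inode_stub {
	int count;
	int evicted;
	struct inode_stub *next_deferred;
};

static struct inode_stub *deferred_head;  /* stands in for s_misc_wq */

/* Drop a reference; the final put is queued instead of evicting inline. */
static void delayed_iput(struct inode_stub *inode)
{
	assert(inode->count > 0);
	if (--inode->count > 0)
		return;
	inode->next_deferred = deferred_head;
	deferred_head = inode;
}

/* What the worker does later; returns how many inodes it evicted. */
static int run_deferred_iputs(void)
{
	int n = 0;

	while (deferred_head != NULL) {
		struct inode_stub *inode = deferred_head;

		deferred_head = inode->next_deferred;
		inode->evicted = 1;
		n++;
	}
	return n;
}
```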

Test-Parameters: trivial
Fixes: e239a14001 ("LU-15404 ldiskfs: truncate during setxattr leads to kernel panic")
Change-Id: Idd70befa6a83818ece06daccf9bb6256813ebbe5
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51335
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months ago LU-16399 tests: add subtests setup/cleanup records 82/49582/45
Alex Deiter [Mon, 9 Jan 2023 16:19:26 +0000 (20:19 +0400)]
LU-16399 tests: add subtests setup/cleanup records

* Added setup/cleanup records for subtests

Change-Id: Icb203a864fa8785e423a073b4ee0f02ea3d3ac77
Signed-off-by: Alex Deiter <adeiter@tintri.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49582
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-12610 cfs: add unlikely to CFS_ macros 91/51291/2
Timothy Day [Tue, 13 Jun 2023 03:33:37 +0000 (03:33 +0000)]
LU-12610 cfs: add unlikely to CFS_ macros

Fix the (hopefully) last few OBD_ users to use
CFS_ macros instead.

Add an 'unlikely()' to CFS_ macros. Some of the
OBD_ macros included this hint. Once those macros
are removed, the hint will be lost. Add it to the
CFS_ macros instead.
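A minimal sketch of what the unlikely() hint does inside a fail-check macro. The DEMO_* names and demo_fail_* helpers are hypothetical stand-ins loosely modelled on cfs_fail_loc and CFS_FAIL_CHECK(); only __builtin_expect() is a real GCC/Clang builtin. The hint affects code layout only, not the result of the check.

```c
#include <stdbool.h>

/* GCC/Clang branch-prediction hints, defined the same way as the
 * Linux kernel's likely()/unlikely(). */
#ifndef unlikely
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
#endif

/* Hypothetical fault-injection location, standing in for cfs_fail_loc. */
static unsigned long demo_fail_loc;

/* Hypothetical slow-path check: does the armed location match id? */
static bool demo_fail_check(unsigned long id)
{
	return demo_fail_loc == id;
}

/* Hypothetical CFS_FAIL_CHECK()-style macro. unlikely() tells the
 * compiler the injection branch is almost never taken, so the normal
 * path stays the fall-through; the boolean result is unchanged. */
#define DEMO_FAIL_CHECK(id) \
	unlikely(demo_fail_loc != 0 && demo_fail_check(id))
```

Putting the hint in the shared macro means every caller gets it, instead of relying on each OBD_ wrapper to add it.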

libcfs_fail.h only has a couple of style issues
left. Just fix them in this patch.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Ie06533b8b408cacf6f6fe2d29a1a8e727ca4280b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51291
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-16888 gss: fix ptlrpc_gss automatic loading 64/51264/4
Sebastien Buisson [Fri, 9 Jun 2023 12:50:25 +0000 (14:50 +0200)]
LU-16888 gss: fix ptlrpc_gss automatic loading

ptlrpc_gss kernel module is automatically loaded when a GSS security
flavor is enforced. Loading success is recorded in a static variable
in the ptlrpc module, which prevents further reloading in case
ptlrpc_gss is unloaded while keeping ptlrpc loaded.
Get rid of this static variable as it is not required in order to
avoid calling request_module("ptlrpc_gss") when not needed. Indeed,
once loaded, the static array policies[] has an entry at the
SPTLRPC_POLICY_GSS index, indicating that the ptlrpc_gss module is
loaded.
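A minimal sketch of the idea: instead of a static "already loaded" flag, the registration slot itself says whether the module is present, so a reload after an unload still happens. All demo_* names are hypothetical; demo_request_module() only simulates request_module("ptlrpc_gss"), and the locking the real code needs is omitted.

```c
#include <stddef.h>

enum demo_policy { DEMO_POLICY_NULL, DEMO_POLICY_PLAIN, DEMO_POLICY_GSS,
		   DEMO_POLICY_MAX };

struct demo_policy_ops { const char *name; };

/* Registered policies: a module fills its slot on load, clears it on unload. */
static const struct demo_policy_ops *policies[DEMO_POLICY_MAX];
static int demo_request_module_calls;

/* Simulated request_module(): loading the module registers the GSS policy. */
static void demo_request_module(void)
{
	static const struct demo_policy_ops gss_ops = { "gss" };

	demo_request_module_calls++;
	policies[DEMO_POLICY_GSS] = &gss_ops;
}

/* Simulated module exit path unregistering the policy. */
static void demo_unload_gss(void)
{
	policies[DEMO_POLICY_GSS] = NULL;
}

/* Only ask for the module when its slot is empty. */
static const struct demo_policy_ops *demo_get_policy(enum demo_policy p)
{
	if (p == DEMO_POLICY_GSS && policies[p] == NULL)
		demo_request_module();
	return policies[p];
}
```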

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I9bb100a202fe9c3fc455a2ffba6ee6398e19b9bf
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51264
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-16887 scrub: delete OI when inode missing 63/51263/3
Alexander Boyko [Thu, 1 Jun 2023 14:19:40 +0000 (10:19 -0400)]
LU-16887 scrub: delete OI when inode missing

The osd_iget_check() function cannot check the OI
when osd_iget() returns an error, because the inode is
lost on the error path. Restore the old logic.

Scrub doesn't check consistency between OI and inode
for items from the inconsistent list. When the OI points to
the wrong inode, the OI record should be deleted.

Fixes: 716de353b ("LU-15542 osd-ldiskfs: exclude EA inode from processing")
HPE-bug-id: LUS-11540, LUS-11585
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: Ic1618db1c8ee24bb307a9cf3f5ca98441a739b7f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51263
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-16807 llite: make lsm_sem nested for ll_update_dir_depth 92/51192/4
James Simmons [Fri, 9 Jun 2023 12:09:45 +0000 (08:09 -0400)]
LU-16807 llite: make lsm_sem nested for ll_update_dir_depth

Lockdep is reporting:

chmod/16751 is trying to acquire lock:
 (&lli->lli_lsm_sem){++++}-{3:3}, at: ll_update_dir_depth+0x8b/0x280

but task is already holding lock:
 (&lli->lli_lsm_sem){++++}-{3:3}, at: ll_update_dir_depth+0x7b/0x280

 other info that might help us debug this:
 Possible unsafe locking scenario:

        CPU0
        ----
 lock(&lli->lli_lsm_sem);
 lock(&lli->lli_lsm_sem);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

Lockdep treats acquiring more than one lock of the same lock class
as a potential deadlock. The exception is when the locks are used
for objects that belong to a hierarchy. The lsm_sem does have a
hierarchy, since the lsm of a child directory is related to the
parent directory lsm that is being protected as well. Create new
lock classes for the lsm_sem with proper
ordering.

Change-Id: I06c890d16816b83492cbeadabde3515ee0233424
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51192
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
10 months agoLU-15458 tests: double delay for sync/async test 83/51183/2
Patrick Farrell [Wed, 31 May 2023 16:28:14 +0000 (12:28 -0400)]
LU-15458 tests: double delay for sync/async test

This test verifies that an async request takes less than
half of the 'delay' time, but occasionally the async
request hits a delay on the file open due to some VM issue
on the MDT.  This causes a spurious failure.

This is already pretty rare, so the simplest thing is to
double the delay time, which should make it extremely rare.

Setting trivial so it just runs sanity.

test-parameters: trivial

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I3046303fcc4e10364de9f673fab2142c8cecff64
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51183
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-16751 conf: remove outdated config files 50/51150/4
Timothy Day [Fri, 26 May 2023 13:33:17 +0000 (13:33 +0000)]
LU-16751 conf: remove outdated config files

Remove the example modules.conf. This config file isn't really
specific to Lustre, and it is very outdated regardless.

The remaining conf files (lustre.dtd, lustre2ldif.xsl, top.ldif)
were added ~15 years ago. They don't appear to account for the
changes in Lustre in the interim. It would be better to remove
them.

Also, update .gitignore.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Ia96eaac0a57c4b65c98d4b7931c96b344d7abd56
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51150
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-16694 tests: remove deprecated sanity/62 97/51097/2
Timothy Day [Tue, 23 May 2023 14:11:24 +0000 (14:11 +0000)]
LU-16694 tests: remove deprecated sanity/62

This test was disabled many years ago. It doesn't seem useful
enough to reenable it. Hence, remove it.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I1a27a1dc271f76358136efb81e793c81e971f037
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51097
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-16574 udsp: lnetctl udsp improvements 87/51087/2
Chris Horn [Tue, 9 May 2023 15:58:15 +0000 (09:58 -0600)]
LU-16574 udsp: lnetctl udsp improvements

lnet_udsp_del_policy() did not previously return non-zero, but its
single caller would check for a non-zero and call
lnet_udsp_apply_policies(). This code is removed.
lnet_udsp_del_policy() will now return non-zero but only in the case
where there is no matching policy index. In this case the policies
are not modified and thus we needn't re-apply them.
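A minimal sketch of the return-value contract described above: deletion fails only when no policy matches the index, and in that case nothing is re-applied. The table and all demo_* names are hypothetical; where lnet actually triggers the re-apply is not stated here, so this sketch does it on the success path of the delete. The idx == -1 case mirrors the "delete all" behaviour described below.

```c
#include <errno.h>

#define DEMO_MAX_POLICIES 8

/* Hypothetical policy table: entries [0, demo_npolicies) are in use. */
static int demo_policies[DEMO_MAX_POLICIES];
static int demo_npolicies;
static int demo_apply_calls;

/* Stand-in for re-applying the remaining policies. */
static void demo_apply_policies(void)
{
	demo_apply_calls++;
}

/* Delete the policy at idx, or every policy when idx == -1.
 * Returns -ENOENT only when no policy matches idx; the table is then
 * untouched, so there is nothing to re-apply. */
static int demo_udsp_del_policy(int idx)
{
	if (idx == -1) {
		demo_npolicies = 0;
	} else if (idx < 0 || idx >= demo_npolicies) {
		return -ENOENT;
	} else {
		for (int i = idx; i < demo_npolicies - 1; i++)
			demo_policies[i] = demo_policies[i + 1];
		demo_npolicies--;
	}
	demo_apply_policies();
	return 0;
}
```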

Modify some error checking for lnetctl udsp commands to provide better
error messages.

Correct typos in lustre_lnet_add_udsp() error messages.

Correct lustre_lnet_del_udsp()'s handling of errno.

Update help text of udsp commands.

Use parse_long() in jt_show_udsp() to parse the idx argument.

Sanity check priority and idx arguments for udsp add/del rather than
silently modifying them when the user passes in bad values.
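A minimal sketch of "reject, don't silently fix" argument checking, built on the standard strtol(). demo_parse_long() and demo_check_idx() are hypothetical helpers, not the lnetctl parse_long(); the accepted range (-1 meaning "all", otherwise a non-negative int) is an assumption for illustration.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Strict parse: reject empty input, trailing junk and out-of-range values. */
static int demo_parse_long(const char *s, long *out)
{
	char *end;
	long v;

	if (s == NULL || *s == '\0')
		return -EINVAL;
	errno = 0;
	v = strtol(s, &end, 10);
	if (errno == ERANGE)
		return -ERANGE;
	if (*end != '\0')
		return -EINVAL;
	*out = v;
	return 0;
}

/* Hypothetical --idx check: -1 (all) or a non-negative index. Bad values
 * are reported to the caller instead of being clamped. */
static int demo_check_idx(const char *s, long *idx)
{
	int rc = demo_parse_long(s, idx);

	if (rc)
		return rc;
	if (*idx < -1 || *idx > INT_MAX)
		return -EINVAL;
	return 0;
}
```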

Implement 'lnetctl udsp del --all' to provide an easy way for admins to
delete all configured policies (this is equivalent to
'lnetctl udsp del --idx -1', but it is more user friendly). The
udsp del command now requires either --all or --idx be specified.

Test-Parameters: trivial
HPE-bug-id: LUS-11490
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ie5e91d8ac1c810473768566593e993e47070e14c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51087
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-16808 lfs: return invalid=-1 if no projid retrieved 16/50916/8
Bobi Jam [Wed, 10 May 2023 15:15:24 +0000 (23:15 +0800)]
LU-16808 lfs: return invalid=-1 if no projid retrieved

lfs find --printf "%LP" now prints -1 as an invalid projid when a
special file is found that does not have a projid.

Fix bug where "lfs find --printf %LP" always printed "0".
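A minimal sketch of the output rule: a project ID that could not be retrieved prints as -1 rather than a misleading 0. demo_format_projid() and its have_projid flag are hypothetical and only illustrate the behaviour described above, not the lfs find implementation.

```c
#include <stdio.h>

/* Format a "%LP"-style field into buf. When no projid was retrieved
 * (e.g. for a special file), print -1 instead of 0. */
static int demo_format_projid(char *buf, size_t len, int have_projid,
			      unsigned int projid)
{
	if (!have_projid)
		return snprintf(buf, len, "%d", -1);
	return snprintf(buf, len, "%u", projid);
}
```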

Fixes: 6b8e97b76c ("LU-10378 utils: add formatted printf to lfs find")
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Ic1b474145212e0ce091f97281440816d9a437a4f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50916
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Thomas Bertschinger <bertschinger@lanl.gov>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
10 months agoLU-16808 lfs: lfs find --printf won't hang on special files 05/50905/6
Bobi Jam [Wed, 10 May 2023 05:11:47 +0000 (13:11 +0800)]
LU-16808 lfs: lfs find --printf won't hang on special files

Add the O_NONBLOCK flag when opening special files so that open()
does not hang.

Since '--printf' requires gather_all, and both the check_projid and
gather_all cases open the special file to try to get its project ID,
we shouldn't error out in the gather_all-only case.
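A minimal sketch of the open-for-query pattern using standard POSIX calls. demo_open_for_query() is hypothetical; the choice to add O_NONBLOCK only for non-regular, non-directory files follows the description above, and the lstat() test is an assumption about how "special file" is detected. Opening a FIFO read-only with O_NONBLOCK returns immediately even with no writer, where a blocking open would wait.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Open path only to query metadata (e.g. its project ID). Special files
 * (FIFOs, devices, ...) get O_NONBLOCK so open() cannot block, for
 * example on a FIFO with no writer. Returns an fd, or -1 with errno set. */
static int demo_open_for_query(const char *path)
{
	struct stat st;
	int flags = O_RDONLY | O_NOCTTY;

	if (lstat(path, &st) < 0)
		return -1;
	if (!S_ISREG(st.st_mode) && !S_ISDIR(st.st_mode))
		flags |= O_NONBLOCK;
	return open(path, flags);
}
```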

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Ic2afff7bbccf73ff94c64c62799f9b37b18d10a1
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50905
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Thomas Bertschinger <bertschinger@lanl.gov>
10 months agoLU-3080 misc: remove checks for pre-3.10 kernels 14/50814/3
Andreas Dilger [Mon, 1 May 2023 03:05:15 +0000 (21:05 -0600)]
LU-3080 misc: remove checks for pre-3.10 kernels

The oldest supported client kernel is el7.9 3.10.x, and even that
is old, so no need to keep checks for 2.6.x kernel versions around.

Test-Parameters: trivial testlist=replay-dual
Test-Parameters: testlist=replay-single
Test-Parameters: testlist=sanity-flr env=ONLY=50d
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I0db781c048334e6ef6df102b100d29e13c64fd25
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50814
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-16733 tests: wait for recovery done 83/50683/5
Lei Feng [Wed, 19 Apr 2023 03:37:09 +0000 (11:37 +0800)]
LU-16733 tests: wait for recovery done

Wait for recovery to complete in recovery-small.sh test_153.

Test-Parameters: trivial testlist=recovery-small
Signed-off-by: Lei Feng <flei@whamcloud.com>
Change-Id: I163b15e1c38f28eea7051c1886d94a79428039ef
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50683
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-16747 llapi: fix race in get_root_path_slow() 82/50682/6
Etienne AUJAMES [Tue, 18 Apr 2023 13:34:23 +0000 (15:34 +0200)]
LU-16747 llapi: fix race in get_root_path_slow()

The patch bdf7788d ("LU-8585 llapi: use open_by_handle_at in
llapi_open_by_fid") caches the Lustre root fd to avoid re-opening
it each time an ioctl() is needed on the fs.

For now, only 1 entry is stored. If a llapi call is performed on
another mountpoint, llapi needs to close the old root fd and open a
new one.

A race condition exists at startup, when root_cached.fd is not
initialized yet. Several threads try to determine root information at
the same time (in get_root_path_slow()). Those threads will close(),
open() and update different "root_cached.fd".
Using a closed root fd returns EBADFD (e.g. in
llapi_open_by_fid(), llapi_hsm_request() or llapi_fid2path()).

This patch checks if the fs is the same before updating the root
entry. If so, the root entry (and cached root fd) will not be changed.
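A minimal sketch of the "check before replacing" rule for a single-entry cache: if another thread already cached the same filesystem, keep its fd and close the redundant one, instead of closing an fd other threads may be using. demo_root_cache, the mutex and all demo_* names are hypothetical; whether llapi uses a lock here is not stated above. As in the single-entry design, switching filesystems still replaces the entry.

```c
#include <pthread.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical single-entry root cache. */
struct demo_root_cache {
	char fsname[64];
	int fd;				/* -1 when nothing is cached */
};

static struct demo_root_cache demo_root = { "", -1 };
static pthread_mutex_t demo_root_lock = PTHREAD_MUTEX_INITIALIZER;

/* Publish new_fd as the cached root of fsname; return the fd to use. */
static int demo_root_cache_update(const char *fsname, int new_fd)
{
	int stale_fd, fd;

	pthread_mutex_lock(&demo_root_lock);
	if (demo_root.fd >= 0 && strcmp(demo_root.fsname, fsname) == 0) {
		fd = demo_root.fd;		/* same fs: keep the entry */
		stale_fd = new_fd;		/* our fd is redundant */
	} else {
		stale_fd = demo_root.fd;	/* other fs: replace entry */
		strncpy(demo_root.fsname, fsname, sizeof(demo_root.fsname) - 1);
		demo_root.fsname[sizeof(demo_root.fsname) - 1] = '\0';
		demo_root.fd = new_fd;
		fd = new_fd;
	}
	pthread_mutex_unlock(&demo_root_lock);

	if (stale_fd >= 0)
		close(stale_fd);
	return fd;
}
```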

Add the regression test sanityn 85 (llapi_root_test).

Test-Parameters: trivial testlist=sanityn env=ONLY=85,ONLY_REPEAT=20
Test-Parameters: testlist=sanityn
Test-Parameters: testlist=sanity
Fixes: bdf7788d ("LU-8585 llapi: use open_by_handle_at in llapi_open_by_fid")
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: I681aac7d5715022e700cdb092db94deaa6bf6a8f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50682
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Guillaume Courrier <guillaume.courrier@cea.fr>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-16168 tests: fix get_slave_nr in sanity-quota 59/50659/4
Sergey Cheremencev [Mon, 17 Apr 2023 14:44:22 +0000 (17:44 +0300)]
LU-16168 tests: fix get_slave_nr in sanity-quota

Redirect output of wait_update_facet in get_slave_nr
to /dev/null. Otherwise, if wait_update_cond prints
something while waiting, get_slave_nr returns this
output together with slave number causing test_68
to fail.

Test-Parameters: trivial testlist=sanity-quota
Test-Parameters: testlist=sanity-quota fstype=zfs
Fixes: 83dd308db5 ("LU-15460 test: wait quota pool to be prepared")
Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Change-Id: I813ed31db864897372c7eb6aab4e1f5a4b955f49
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50659
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
10 months agoLU-15873 osp: don't sync on RO device 97/48097/9
Wang Shilong [Thu, 3 Jun 2021 02:39:35 +0000 (10:39 +0800)]
LU-15873 osp: don't sync on RO device

If the lower layer device is mounted read-only, calling sync
will trigger a transaction start warning.

Change-Id: I31be379932f145093285cd87de1fc35a1b5cc305
Signed-off-by: Wang Shilong <wshilong@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48097
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
10 months agoLU-16837 csdc: reserve connect bits for compressed layout 08/51108/5
Bobi Jam [Wed, 24 May 2023 00:25:05 +0000 (08:25 +0800)]
LU-16837 csdc: reserve connect bits for compressed layout

Add connect data bit for compressed layout (OBD_CONNECT2_COMPRESS)
and another connect data bit to be used (OBD_CONNECT2_LARGE_NID).

Also reserve obd_connect_data::ocd_compr_type, which is a bitmask of
supported compression types to be negotiated between client and MDS.
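A minimal sketch of how a feature bit plus a type bitmask are typically negotiated: grant the intersection of both sides, and drop the type mask when the feature itself is not granted. The DEMO_* bit values and compression names are illustrative only and are not the real OBD_CONNECT2_* or ocd_compr_type values; this commit only reserves the fields.

```c
#include <stdint.h>

/* Illustrative bit values only, not the real OBD_CONNECT2_* values. */
#define DEMO_CONNECT2_COMPRESS   (1ULL << 0)
#define DEMO_CONNECT2_LARGE_NID  (1ULL << 1)

/* Illustrative compression-type bits for an ocd_compr_type-like mask. */
enum demo_compr_type {
	DEMO_COMPR_LZ4  = 1U << 0,
	DEMO_COMPR_GZIP = 1U << 1,
	DEMO_COMPR_ZSTD = 1U << 2,
};

struct demo_connect_data {
	uint64_t ocd_connect_flags2;
	uint32_t ocd_compr_type;
};

/* Grant only what both sides support; clear the compression mask when
 * the compression feature itself is not granted. */
static void demo_negotiate(const struct demo_connect_data *client,
			   const struct demo_connect_data *server,
			   struct demo_connect_data *reply)
{
	reply->ocd_connect_flags2 = client->ocd_connect_flags2 &
				    server->ocd_connect_flags2;
	if (reply->ocd_connect_flags2 & DEMO_CONNECT2_COMPRESS)
		reply->ocd_compr_type = client->ocd_compr_type &
					server->ocd_compr_type;
	else
		reply->ocd_compr_type = 0;
}
```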

Test-Parameters: trivial
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I21029c6c3e8a7e690ecc8d489bbb95aec3ab1fa8
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51108
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
10 months agoLU-6142 utils: use list_first/list_entry() on list heads 30/50830/7
Mr NeilBrown [Wed, 6 Nov 2019 22:49:03 +0000 (09:49 +1100)]
LU-6142 utils: use list_first/list_entry() on list heads

This patch changes
   list_entry(foo.next, ...)
to
   list_first_entry(&foo, ...)
and
   list_entry(foo.prev, ...)
to
   list_last_entry(&foo, ...)

in cases where 'foo' is a list head - not a list member.
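A self-contained userspace re-creation of the list helpers involved, following the Linux <linux/list.h> definitions, to show that list_first_entry(&head, ...) is exactly list_entry(head.next, ...) but states the intent. The struct item type is hypothetical. Both first/last helpers must only be used on non-empty lists, since an empty head's next/prev point back at the head itself.

```c
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define list_entry(ptr, type, member) container_of(ptr, type, member)
/* First/last element of the list headed by *head (non-empty lists only). */
#define list_first_entry(head, type, member) \
	list_entry((head)->next, type, member)
#define list_last_entry(head, type, member) \
	list_entry((head)->prev, type, member)

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Insert n just before head, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Hypothetical element type embedding a list member. */
struct item {
	int val;
	struct list_head link;
};
```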

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I9daaaed044af596f6407801259cfb672150bfc34
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50830
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-6142 ldlm: Fix style issues for dir.c 24/50724/3
Arshad Hussain [Mon, 24 Apr 2023 08:32:45 +0000 (14:02 +0530)]
LU-6142 ldlm: Fix style issues for dir.c

This patch fixes issues reported by checkpatch
for file lustre/llite/dir.c

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I7aa79bbf20271e3a86735260230599df32a50cad
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50724
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-15934 lod: clear up the message 28/49528/4
Yang Sheng [Thu, 29 Dec 2022 17:46:56 +0000 (01:46 +0800)]
LU-15934 lod: clear up the message

Print more precise information when an llog context error occurs.

Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: I492201cd3ae5eb39ad34f3a873d7bb346b52430f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49528
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 months agoLU-15671 mds: do not send OST_CREATE transno interop 56/51056/10
Andreas Dilger [Thu, 18 May 2023 21:41:47 +0000 (15:41 -0600)]
LU-15671 mds: do not send OST_CREATE transno interop

Send OST_CREATE RPCs from the MDS with no_resend and no_delay
when communicating with an old OST that does not support the
OBD_CONNECT2_REPLAY_RESEND.  Likewise, the OST should not reply
to the MDS RPC with rq_transno set, or this will trigger:

   osp_precreate_send() ASSERTION(req->rq_transno == 0) failed

This can be avoided if the MDS is upgraded before the OSS, but
will always be hit if OSS is upgraded first.

After 2.20.53 the MDS/OSS assume that this is always true, since
rolling upgrades are unsupported for larger version differences.
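A minimal sketch of both interop rules, each keyed on whether the peer negotiated the replay/resend feature. DEMO_CONNECT2_REPLAY_RESEND is an illustrative value, not the real OBD_CONNECT2_REPLAY_RESEND bit, and demo_req only mimics the rq_no_resend, rq_no_delay and rq_transno fields named above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative value, not the real OBD_CONNECT2_REPLAY_RESEND bit. */
#define DEMO_CONNECT2_REPLAY_RESEND (1ULL << 5)

struct demo_req {
	bool rq_no_resend;
	bool rq_no_delay;
	uint64_t rq_transno;
};

/* MDS side: an old OST cannot replay/resend precreate RPCs, so send
 * OST_CREATE with no_resend and no_delay set. */
static void demo_prep_ost_create(struct demo_req *req, uint64_t ost_flags2)
{
	bool replayable = ost_flags2 & DEMO_CONNECT2_REPLAY_RESEND;

	req->rq_no_resend = !replayable;
	req->rq_no_delay = !replayable;
}

/* OST side: only return a transno to an MDS that negotiated the feature;
 * an old MDS asserts rq_transno == 0 for precreate replies. */
static void demo_reply_ost_create(struct demo_req *req, uint64_t mds_flags2,
				  uint64_t transno)
{
	req->rq_transno = (mds_flags2 & DEMO_CONNECT2_REPLAY_RESEND) ?
			  transno : 0;
}
```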

Test-Parameters: testgroup=rolling-upgrade-oss
Fixes: 63e17799a3 ("LU-8367 osp: enable replay for precreation request")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Change-Id: I1ab601a2f55540dd75cf24838f7cdb7f823ed42c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51056
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months agoLU-9798 tests: split server recovery tests 61/43461/14
Alex Deiter [Fri, 14 Apr 2023 18:43:28 +0000 (22:43 +0400)]
LU-9798 tests: split server recovery tests

The Lustre test suite recovery-mds-scale has two tests; one
that fails over MDSs and a second test that fails over OSSs.
The default time for each of these tests to run is 24 hours.

Break up the two server recovery tests into separate test
suites so that they can be run independent of each other and
in parallel.

Collect common functions and global variables into a library
used by all the recovery-*scale tests.

Test-Parameters: trivial env=SLOW=no,FAILURE_MODE=HARD \
iscsi=1 clientcount=4 osscount=2 ostcount=7 mdscount=2 \
mdtcount=2 austeroptions=-R failover=true \
testlist=recovery-mds-scale,recovery-oss-scale,recovery-random-scale

Signed-off-by: Alex Deiter <adeiter@tintri.com>
Change-Id: I4eda5265742270821fa10ab24f0e0637c00059d7
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/43461
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months agoLU-16420 tests: move overstriping test to sanity-pfl 68/51268/2
Andreas Dilger [Sat, 10 Jun 2023 13:24:25 +0000 (07:24 -0600)]
LU-16420 tests: move overstriping test to sanity-pfl

The overstriping test does not really need to be run for
every test run, so sanity-pfl is good enough.  Also try
to see if moving this subtest out of sanity will stop
the test_51d failures seen since the test was landed.

Fixes: 97d29eb800 ("LU-13748 mdt: remove LASSERT in mdt_dump_lmm")
Test-Parameters: trivial testlist=sanity-pfl
Test-Parameters: testlist=sanity env=ONLY=0-51,HONOR_EXCEPT=true mdscount=2 mdtcount=4 ostcount=8
Test-Parameters: testlist=sanity env=ONLY=0-51,HONOR_EXCEPT=true mdscount=2 mdtcount=4 ostcount=8
Test-Parameters: testlist=sanity env=ONLY=0-51,HONOR_EXCEPT=true mdscount=2 mdtcount=4 ostcount=8
Test-Parameters: testlist=sanity env=ONLY=0-51,HONOR_EXCEPT=true mdscount=2 mdtcount=4 ostcount=8
Test-Parameters: testlist=sanity env=ONLY=0-51,HONOR_EXCEPT=true mdscount=2 mdtcount=4 ostcount=8
Test-Parameters: testlist=sanity env=ONLY=0-51,HONOR_EXCEPT=true mdscount=2 mdtcount=4 ostcount=8
Test-Parameters: testlist=sanity env=ONLY=0-51,HONOR_EXCEPT=true mdscount=2 mdtcount=4 ostcount=8
Test-Parameters: testlist=sanity env=ONLY=0-51,HONOR_EXCEPT=true mdscount=2 mdtcount=4 ostcount=8
Test-Parameters: testlist=sanity env=ONLY=0-51,HONOR_EXCEPT=true mdscount=2 mdtcount=4 ostcount=8
Test-Parameters: testlist=sanity env=ONLY=0-51,HONOR_EXCEPT=true mdscount=2 mdtcount=4 ostcount=8
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I71ebc76fdcceb2b22994c09c3574cda4602540e5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51268
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Colin Faber <cfaber@ddn.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months agoLU-16882 test: change ALWAYS_EXCEPT with function always_except 42/51242/4
Wei Liu [Tue, 6 Jun 2023 17:48:20 +0000 (10:48 -0700)]
LU-16882 test: change ALWAYS_EXCEPT with function always_except

Replace ALWAYS_EXCEPT with the always_except function in replay-single.sh.

Test-Parameters: trivial testlist=replay-single

Change-Id: I6fb4c350b1aa6f8c38920bf52e8ae42b64bdcd64
Signed-off-by: Wei Liu <sarah@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51242
Tested-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
11 months agoLU-16873 osd: update OI_Scrub file with new magic 26/51226/2
Alexander Zarochentsev [Sun, 28 May 2023 12:42:27 +0000 (08:42 -0400)]
LU-16873 osd: update OI_Scrub file with new magic

The fix for LUS-11542 detects the format change correctly
but does not write the new OI scrub file magic, so every new mount
triggers the "oi files counter reset" again and again.
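A minimal sketch of why the conversion must persist the new magic: detection alone is idempotent only if the result is written back. The magic values, demo_disk and demo_scrub_mount() are hypothetical and illustrative; what the real counter reset changes is not described here, so sf_oi_count is only a placeholder for it.

```c
#include <stdint.h>

/* Illustrative magics only, not the real OI scrub file values. */
#define DEMO_SCRUB_MAGIC_OLD 0x0000AAAAu
#define DEMO_SCRUB_MAGIC_NEW 0x0000BBBBu

struct demo_scrub_file {
	uint32_t sf_magic;
	uint32_t sf_oi_count;
};

/* Stand-in for the on-disk OI_scrub file, initially in the old format. */
static struct demo_scrub_file demo_disk = { DEMO_SCRUB_MAGIC_OLD, 0 };

/* Mount-time check; returns 1 if the one-time conversion ran. Writing
 * back the new magic is what makes it one-time: without it, every mount
 * sees the old magic and resets the counter again. */
static int demo_scrub_mount(uint32_t oi_count)
{
	struct demo_scrub_file sf = demo_disk;	/* "read" from disk */

	if (sf.sf_magic != DEMO_SCRUB_MAGIC_OLD)
		return 0;
	sf.sf_magic = DEMO_SCRUB_MAGIC_NEW;
	sf.sf_oi_count = oi_count;		/* the counter reset */
	demo_disk = sf;				/* persist the new magic */
	return 1;
}
```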

Fixes: 126275ba83 ("LU-16655 scrub: upgrade scrub_file from 2.12 format")
HPE-bug-id: LUS-11646
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: Ia13fcfaf0d8f2c4ee9331dd9fec0ff159d195186
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51226
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months agoLU-16848 gss: drop legacy 'lgssd' and pipefs code 32/51032/10
Aurelien Degremont [Wed, 3 May 2023 14:48:49 +0000 (16:48 +0200)]
LU-16848 gss: drop legacy 'lgssd' and pipefs code

Get rid of the old 'lgssd' code, which has been unused
for a while and is not even buildable.

Test-Parameters: trivial
Test-Parameters: testlist=sanity-sec
Test-Parameters: kerberos=true testlist=sanity-krb5

Change-Id: I821cc5b625e97c7f5fdc758b166bd54e5e4b379e
Signed-off-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51032
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months agoLU-16828 sanity: wrong argument for find in test_133g 94/50994/2
Li Xi [Mon, 15 May 2023 04:11:58 +0000 (12:11 +0800)]
LU-16828 sanity: wrong argument for find in test_133g

> proc_regexp="/{proc,sys}/{fs,sys,kernel/debug}/{lustre,lnet}/"
> proc_dirs=$(eval ls -d $proc_regexp)

After running the above commands, 'echo "$proc_dirs"' would print
multiple lines like what sanity test 401a would print:

proc_dirs='/proc/fs/lustre/
/sys/fs/lustre/
/sys/kernel/debug/lnet/
/sys/kernel/debug/lustre/'

This causes the wrong command to be executed remotely in test_133g,
without the "-name" argument and with the badarea_io pipe after it:

... |
xargs -n 1 find /proc/fs/lustre/
/sys/fs/lustre/
/sys/kernel/debug/lnet/
/sys/kernel/debug/lustre/ -name |
xargs -n 1 badarea_io

This patch fixes the problem.

Test-Parameters: trivial testlist=sanity env=ONLY=133g
Signed-off-by: Li Xi <lixi@ddn.com>
Change-Id: I93d611c34eecfa6d14d3f83a60faa3e637514128
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50994
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months agoLU-16772 quota: protect lqe_glbl_data in qmt_site_recalc_cb 48/50748/10
Sergey Cheremencev [Tue, 25 Apr 2023 18:10:21 +0000 (22:10 +0400)]
LU-16772 quota: protect lqe_glbl_data in qmt_site_recalc_cb

lqe_glbl_data should be protected with lqe_glbl_data_lock in
qmt_site_recalc_cb, as is done in other places. Otherwise it
may cause the following panic:

  BUG: unable to handle kernel NULL pointer at 00000000000000f8
  qmt_site_recalc_cb+0x2f8/0x790 [lquota]
  cfs_hash_for_each_tight+0x121/0x310 [libcfs]
  qmt_pool_recalc+0x372/0x9f0 [lquota]

Also protect lqe_glbl_data access with lqe_glbl_data_lock in
qmt_lvbo_free().
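A minimal sketch of the locking pattern: readers take the lock and re-check the pointer, and the free path detaches the pointer under the same lock before freeing it. demo_lqe, demo_glbl_data and both functions are hypothetical stand-ins for the lqe, lqe_glbl_data, qmt_site_recalc_cb() and qmt_lvbo_free(); using a pthread mutex is a choice for this sketch.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for an lqe and its lqe_glbl_data. */
struct demo_glbl_data {
	int nr_slaves;
};

struct demo_lqe {
	pthread_mutex_t glbl_data_lock;
	struct demo_glbl_data *glbl_data;	/* may be freed and cleared */
};

/* Reader: only dereference glbl_data while holding the lock. */
static int demo_site_recalc_cb(struct demo_lqe *lqe)
{
	int n = -1;

	pthread_mutex_lock(&lqe->glbl_data_lock);
	if (lqe->glbl_data != NULL)
		n = lqe->glbl_data->nr_slaves;
	pthread_mutex_unlock(&lqe->glbl_data_lock);
	return n;
}

/* Free path: detach under the lock, then free outside it. */
static void demo_lvbo_free(struct demo_lqe *lqe)
{
	struct demo_glbl_data *old;

	pthread_mutex_lock(&lqe->glbl_data_lock);
	old = lqe->glbl_data;
	lqe->glbl_data = NULL;
	pthread_mutex_unlock(&lqe->glbl_data_lock);
	free(old);
}
```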

Fixes: 1dbcbd70f8 ("LU-15021 quota: protect lqe_glbl_data in lqe")
Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Change-Id: I030f14b02062151f1708a03ac7414a9991f798f6
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50748
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months agoLU-16850 socklnd: remove ksnr_myiface from ksock_conn_cb 48/51148/8
Serguei Smirnov [Fri, 26 May 2023 17:42:23 +0000 (10:42 -0700)]
LU-16850 socklnd: remove ksnr_myiface from ksock_conn_cb

Drop ksnr_myiface: it is no longer needed since socklnd
TCP bonding got removed. There's one interface per
connection cb per peer_ni, and it can be accessed as
net->ksnn_interface.ksni_index.

Fix setting of ksni_nroutes accordingly. Duplication of
interface index in conn_cb and ksnn_interface was causing
the assertion
ASSERTION( net->ksnn_interface.ksni_nroutes == 0 )
in ksocknal_shutdown() to fail if the corresponding
device is deregistered before lnd shutdown.

Modify test_214 of sanity-lnet to create connections so that
the scenario of socklnd shutdown with NI on a deregistered
interface is covered.

Fixes: 3c9282a6 ("LU-16378 lnet: handles unregister/register events")
Test-Parameters: trivial
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I4de164c9e64aa770164a8320b9460fadce49aa06
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51148
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months agoLU-16623 lod: handle object allocation consistently 50/50250/13
Andreas Dilger [Wed, 8 Mar 2023 23:40:21 +0000 (16:40 -0700)]
LU-16623 lod: handle object allocation consistently

Consistently handle the various OS_STATFS_* flags that indicate
an OST or MDT is full or otherwise marked ineligible for use.

Fix lod_statfs_check() so it skips MDTs with OS_STATFS_ENOINO
for allocating dir stripes instead of only checking OST targets.

In the LOD code, ltd_active=0 indicates that the device is not
usable for new object allocations for a variety of reasons. That
includes out of space or inodes, read-only, max_create_count=0,
or disconnected export, not *only* that the OSP is disconnected
from the OST as with imp_deactive.  Targets marked ltd_active=0
will not be counted in ld_active_tgt_count, so these OSTs will
not count toward stripe_count for stripe_count=-1 files.

Set flags = LOD_USES_DEFAULT_STRIPE in lod_qos_prep_create() for
stripe_count = -1 layouts and pass it to lod_stripe_count_min()
to avoid use of *all* OSTs when free space is imbalanced or OSTs
are not available, and be happy with allocations on 3/4 of OSTs.
It looks like this functionality was missed when object allocations
transitioned from the LOV to LOD module.  Put the LOV_USES_* into
an enum and rename to LOD_USES_* for consistency with current code.

Apply the lod.*.max_stripe_count limits to PFL components as well
as plain file layouts in lod_comp_entry_stripe_count().
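A minimal sketch of the "be happy with 3/4 of OSTs" rule for default (stripe_count=-1) layouts. The enum and demo_stripe_count_min() are hypothetical and only assume the behaviour stated above; treating a zero count as one stripe is also an assumption. Integer arithmetic: 8 becomes 8 - 8/4 = 6, 4 becomes 3.

```c
#include <stdint.h>

/* Hypothetical flags mirroring the LOD_USES_* idea. */
enum demo_lod_uses {
	DEMO_USES_ASSIGNED_STRIPE = 0,
	DEMO_USES_DEFAULT_STRIPE  = 1 << 0,
};

/* Minimum acceptable stripe count: an explicit count must be met in
 * full, while a default request is satisfied by about 3/4 of targets. */
static uint32_t demo_stripe_count_min(uint32_t stripe_count,
				      unsigned int flags)
{
	if (stripe_count == 0)
		stripe_count = 1;
	if (flags & DEMO_USES_DEFAULT_STRIPE)
		stripe_count -= stripe_count >> 2;
	return stripe_count;
}
```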

Rename ltd_connecting to ltd_discon, since there is no guarantee
that this target is actually *connecting*, only that it is currently
disconnected.  Use ltd_discon in places that checked ltd_active to
decide if the OSP was disconnected from the OST, which shouldn't be
skipped just because the OST is full or has creates disabled.

Fixes: 7b124fef76 ("LU-4277 lod: handle os_state as a flag, check READONLY")
Fixes: 5b147e47de ("LU-11115 lod: skip max_create_count=0 OST in QoS and RR algorithms")
Fixes: c7f2e70a27 ("LU-1303 lod: QoS allocation policy")
Fixes: c1d0a355a6 ("LU-12624 lod: alloc dir stripes by QoS")
Fixes: 3c9580931d ("LU-9162 lod: option to set max stripe count per filesystem")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Change-Id: Ifb9443fe6c80b4d7f82b442060db7ac8423ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50250
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
11 months agoNew tag 2.15.56 2.15.56 v2_15_56
Oleg Drokin [Fri, 9 Jun 2023 05:30:28 +0000 (01:30 -0400)]
New tag 2.15.56

Change-Id: Ic2aee8234566ccc5b16e8c4fb1039365940965b8
Signed-off-by: Oleg Drokin <green@whamcloud.com>
11 months agoLU-12610 osc: remove OBD_ -> CFS_ macros 24/51124/2
Timothy Day [Wed, 24 May 2023 16:29:40 +0000 (16:29 +0000)]
LU-12610 osc: remove OBD_ -> CFS_ macros

Remove OBD macros that are simply redefinitions
of CFS macros.

Also, convert some spaces to tabs.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Signed-off-by: Ben Evans <beevans@whamcloud.com>
Change-Id: Icb4f25f51515d833fed2c05581288cde719c1d08
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51124
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
11 months ago LU-12610 target: remove OBD_ -> CFS_ macros 23/51123/2
Timothy Day [Wed, 24 May 2023 16:30:43 +0000 (16:30 +0000)]
LU-12610 target: remove OBD_ -> CFS_ macros

Remove OBD macros that are simply redefinitions
of CFS macros.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Signed-off-by: Ben Evans <beevans@whamcloud.com>
Change-Id: I97e3f74d72d41558f293567b4609fa37aaa3b13d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51123
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
11 months ago LU-16830 lod: improve rr allocation 96/50996/3
Alexander Boyko [Fri, 12 May 2023 21:32:20 +0000 (17:32 -0400)]
LU-16830 lod: improve rr allocation

Round-robin allocation uses atomic_inc() % ost_count to
generate the OST index. When some OSTs are unavailable and
many threads create objects concurrently, all attempts may
get the same OST index. For example, in a 4-OST
configuration where 2 OSTs are failing over, the estimated
probability is 0.5^12 = 0.024%. The result is ENOSPC for the
user application.

On the last speed loop, try the OSTs one by one.
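A minimal sketch of the two selection strategies described above, assuming a hypothetical availability array and a shared atomic counter (the names are illustrative, not the LOD code). With 2 of 4 OSTs unavailable, each fast-pass pick misses with probability 0.5, so 12 consecutive misses happen with probability 0.5^12 = 1/4096, about 0.024%; the sequential last pass visits every OST once and fails only if none is available.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define OST_COUNT 4	/* hypothetical 4-OST configuration */

/* Shared start index, advanced by every allocating thread. */
static atomic_uint rr_counter;

/*
 * Fast pass (illustrative): take the next counter value modulo ost_count.
 * Other threads advance the counter concurrently, so repeated attempts can
 * keep landing on unavailable OSTs. Returns the index, or -1 on a miss.
 */
static int rr_pick_fast(const bool avail[], unsigned int ost_count)
{
	unsigned int idx;

	assert(ost_count > 0);
	idx = atomic_fetch_add(&rr_counter, 1) % ost_count;
	return avail[idx] ? (int)idx : -1;
}

/*
 * Last pass (illustrative): start from the next counter value, then try
 * the OSTs one by one, so every OST is checked exactly once. Returns -1
 * only if no OST is available.
 */
static int rr_pick_sequential(const bool avail[], unsigned int ost_count)
{
	unsigned int start, i;

	assert(ost_count > 0);
	start = atomic_fetch_add(&rr_counter, 1) % ost_count;
	for (i = 0; i < ost_count; i++) {
		unsigned int idx = (start + i) % ost_count;

		if (avail[idx])
			return (int)idx;
	}
	return -1;
}
```

A real allocator also has to skip OSTs that are full or have creates disabled; that is left out of this sketch.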

HPE-bug-id: LUS-11265
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: I325cf4ad706c9b0df64cf53792e77c1fad6f7739
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50996
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
11 months ago LU-16799 tests: fix sanity-krb5 64/50864/6
Sebastien Buisson [Thu, 4 May 2023 23:10:50 +0000 (01:10 +0200)]
LU-16799 tests: fix sanity-krb5

sanity-krb5.sh needs to be fixed in several ways.
It cannot assume that the Kerberos credentials cache type is FILE,
nor expect ccache files to be under /tmp/krb5cc_*.
The lsvcgssd daemon must be launched with the -vvv flags for easier
debugging.
The keyring needs to be cleared appropriately after using 'lfs flushctx'.

Test-Parameters: trivial testlist=sanity-krb5 kerberos=true
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I31ca8d2d97e137c7ba9fa478d5432aeedb5135a8
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50864
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months ago LU-9859 libcfs: move percpt_lock into lnet 32/50832/5
Mr NeilBrown [Mon, 23 Nov 2020 04:41:06 +0000 (15:41 +1100)]
LU-9859 libcfs: move percpt_lock into lnet

lnet is the only user of percpt_lock, and there are only two such
locks!
So move the code into lnet, as part of deprecating libcfs.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Id7091e88cf61228aa031921747fb9c7b08214931
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50832
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months ago LU-16788 tests: sanity should remove temp files 19/50819/9
Alex Zhuravlev [Mon, 1 May 2023 15:05:35 +0000 (18:05 +0300)]
LU-16788 tests: sanity should remove temp files

Remove temporary files during the test so that it fits within OSTSIZE.

Test-Parameters: trivial
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I2f1cfe0511061794d81d0349cf36a50f40470553
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50819
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
11 months ago LU-12610 misc: remove OBD_ -> CFS_ macros 09/50809/4
Timothy Day [Wed, 24 May 2023 16:31:12 +0000 (16:31 +0000)]
LU-12610 misc: remove OBD_ -> CFS_ macros

Remove OBD macros that are simply redefinitions
of CFS macros.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Signed-off-by: Ben Evans <beevans@whamcloud.com>
Change-Id: I15fe8aa22cb0203bed102a35361f4854ddaabecb
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50809
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
11 months ago LU-12610 osd: remove OBD_ -> CFS_ macros 05/50805/3
Timothy Day [Wed, 19 Apr 2023 01:42:01 +0000 (01:42 +0000)]
LU-12610 osd: remove OBD_ -> CFS_ macros

Remove OBD macros that are simply redefinitions
of CFS macros.

The CFS macros are provided by libcfs.h, so add this
header where needed.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Signed-off-by: Ben Evans <beevans@whamcloud.com>
Change-Id: Ia7f7ba611b98500ecf06137159649949a621476f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50805
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months ago LU-16694 misc: remove, update old scripts 20/50720/3
Timothy Day [Mon, 24 Apr 2023 01:44:38 +0000 (01:44 +0000)]
LU-16694 misc: remove, update old scripts

There are two old copies of checkstack.pl in-tree. Remove both and
pull down a new one from upstream.

There's only one script in lustre/contrib (lustre_server.sh). It
is meant for ClusterLabs resource-agents, but it hasn't
been maintained, so remove it.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Id52ff11a7fa525b7ef20656df77c66a728e2b77a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50720
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
11 months ago LU-16763 tests: add more logging to run-llog.sh 19/50719/3
Timothy Day [Mon, 24 Apr 2023 00:56:48 +0000 (00:56 +0000)]
LU-16763 tests: add more logging to run-llog.sh

Add more logging to run-llog.sh. At the same time, add SPDX text
and fix some minor shellcheck warnings.

Test-Parameters: trivial testlist=sanity env=ONLY=60a,ONLY_REPEAT=10
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I317fac7b872be53a1094022cfcd7d130b4c79c0a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50719
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months ago LU-12610 ptlrpc: replace OBD_ -> CFS_ macros 84/50684/4
Timothy Day [Wed, 19 Apr 2023 01:40:23 +0000 (01:40 +0000)]
LU-12610 ptlrpc: replace OBD_ -> CFS_ macros

Replace OBD macros that are simply redefinitions
of CFS macros.

Signed-off-by: Timothy Day <timday@amazon.com>
Signed-off-by: Ben Evans <beevans@whamcloud.com>
Change-Id: I634f364d33ac56de678d273d87c9ac54d1f8c1ef
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50684
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
11 months ago LU-11388 tests: replay-single/131b to refresh grants 61/50661/7
Alex Zhuravlev [Mon, 17 Apr 2023 18:13:59 +0000 (21:13 +0300)]
LU-11388 tests: replay-single/131b to refresh grants

Refresh grants so that the write (to be replayed after replay-barrier)
doesn't become synchronous due to insufficient grant.

Test-Parameters: trivial testlist=replay-single env=ONLY=131b,ONLY_REPEAT=30
Fixes: cb3b2bb683 ("LU-11388 test: enable replay-single test_131b")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: If4656c1028b49c58eedd905abd0c329f3706f491
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50661
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
11 months ago LU-16548 lnet: report actual timeout used by lnd 20/50620/18
Frank Sehr [Wed, 12 Apr 2023 19:31:33 +0000 (12:31 -0700)]
LU-16548 lnet: report actual timeout used by lnd

The lnd_timeout value reported by lnetctl may differ
from the one actually used.
LNet calculates an lnd_timeout as a function of the transaction
timeout and retry_count; this is the value displayed by
"lnetctl global show". However, each LND may define its own
timeout by setting its timeout module parameter to a positive value,
which overrides the higher-level lnd_timeout defined by LNet.
"lnetctl net show -v" now shows the timeout value in the
lnd_tunables section.
This is implemented for socklnd, o2iblnd and gnilnd, and a test
for each of sock, ib and gni is included.
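A minimal sketch of the override rule, using hypothetical helper names. The LNet-level formula here (transaction timeout divided by retry_count + 1, with a floor of 1) is an assumption for illustration only; the point being shown is that a positive LND module parameter takes precedence over the value LNet derives.

```c
#include <assert.h>

/*
 * Hypothetical LNet-level lnd_timeout derived from the transaction timeout
 * and retry count. The exact formula is an assumption for illustration.
 */
static int lnet_level_lnd_timeout(int transaction_timeout, int retry_count)
{
	int t;

	assert(retry_count >= 0);
	t = transaction_timeout / (retry_count + 1);
	return t > 0 ? t : 1;
}

/* The timeout an LND actually uses: its own positive module parameter wins. */
static int effective_lnd_timeout(int lnd_module_timeout,
				 int transaction_timeout, int retry_count)
{
	if (lnd_module_timeout > 0)
		return lnd_module_timeout;
	return lnet_level_lnd_timeout(transaction_timeout, retry_count);
}
```

Reporting effective_lnd_timeout() rather than the LNet-level value is what makes the displayed number match the one in use.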

Test-Parameters: trivial
Signed-off-by: Frank Sehr <fsehr@whamcloud.com>
Change-Id: I85a107ba6f1259c577f74945b89fd695f191d514
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50620
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months ago LU-16517 build: pass extra configure options to "make debs" 64/50464/3
Jian Yu [Thu, 30 Mar 2023 17:43:48 +0000 (10:43 -0700)]
LU-16517 build: pass extra configure options to "make debs"

While running "make debs", the configure command in debian/rules
ignores some user-defined configure options. This patch fixes
the issue by adding detection of the extra options to
debian/rules.

Test-Parameters: trivial clientdistro=ubuntu2204

Change-Id: Ia9db4e05abf33834cb3c853f4f0829dadc8d7400
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50464
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: Shuichi Ihara <sihara@ddn.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Shuichi Ihara <sihara@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months ago LU-16505 tests: run_info() minor fix 71/49771/4
Elena Gryaznova [Wed, 25 Jan 2023 22:43:12 +0000 (23:43 +0100)]
LU-16505 tests: run_info() minor fix

Address the run_info() parameters correctly.

Fixes: a98728e4fd ("LU-15626 tests: Fix "error" reported by shellcheck for recovery-mds-scale")
Test-Parameters: trivial clientcount=5 mdtcount=2 mdscount=2 osscount=2 austeroptions=-R failover=true iscsi=1 env=SERVER_FAILOVER_PERIOD=1600,REQFAIL_PERCENT=100,SLOW=yes testlist=recovery-mds-scale
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-11449
Change-Id: Ice8f5b0a85d66708942c7665f028fd6de66165a9
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49771
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
11 months ago LU-15971 llite: implicit default LMV inherit 89/47789/19
Lai Siyao [Sun, 5 Mar 2023 13:43:08 +0000 (08:43 -0500)]
LU-15971 llite: implicit default LMV inherit

With implicit default LMV inheritance, the inherited default LMV is
not stored on disk, but maintained on the client side.

Benefits:
* a change to a directory's default LMV is propagated to all sub-levels
  at runtime.
* the default LMV is packed into the mkdir request, so the MDT doesn't
  need to read it from disk, which improves mkdir performance.

Caveats:
* to disable the inherited default LMV on a subdir, a default LMV needs
  to be set on that subdir explicitly, like this:
        "lfs setdirstripe -D -i <subdir_mdt_index> --max-inherit 0"

Changes on client side:
* update inherited default LMV after lookup/open/revalidate.
* pack default LMV in mkdir request.
* add a "--raw" option to "lfs getdirstripe -D" to print the default LMV
  stored in the inode; if the directory has no default LMV, or its
  default LMV is implicitly inherited, nothing is printed.

Changes on MDT side:
* use the default LMV from client in lod_ah_init() to mkdir.
* don't save inherited default LMV in mkdir.

Add sanityn 114.

Test-Parameters: clientversion=2.14 testlist=sanity mdtcount=4 mdscount=2 env=SANITY_EXCEPT="39l 39r 134b 150b 160a 205a 208 220 230e 230p 300g 807"
Test-Parameters: serverversion=2.14 testlist=sanity mdtcount=4 mdscount=2 env=SANITY_EXCEPT="27Cg 39r 65n 413a 413b 905"
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Iae109a0ef35a273175c70dd0b394e721a5ce0c45
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/47789
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months ago LU-6142 lnet: use list_first_entry() where appropriate. 27/50827/4
Mr NeilBrown [Wed, 6 Nov 2019 22:43:52 +0000 (09:43 +1100)]
LU-6142 lnet: use list_first_entry() where appropriate.

This patch changes
   list_entry(foo.next, ...)
to
   list_first_entry(&foo, ...)

in cases where 'foo' is a list head, rather than a list member.
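A userspace sketch of why the two forms are equivalent, using minimal stand-ins for the kernel list helpers (struct msg is hypothetical): list_first_entry(&foo, ...) expands to list_entry((&foo)->next, ...), but states the intent, taking the first element of a list head, directly.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace stand-ins for the kernel's <linux/list.h> helpers. */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define list_entry(ptr, type, member) container_of(ptr, type, member)

/* Same expansion as the kernel: the first entry follows the head. */
#define list_first_entry(ptr, type, member) \
	list_entry((ptr)->next, type, member)

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Append 'item' at the tail of the list headed by 'head'. */
static void list_add_tail(struct list_head *item, struct list_head *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

/* Hypothetical element type for the example. */
struct msg {
	int id;
	struct list_head link;
};
```

Both expressions compute the same pointer; the change is about readability and about making it explicit that the argument is a list head, not a list member.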

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ied04412bf976d8fb219bb3c14c56879d2cf83ae7
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50827
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months ago LU-10391 mdt: change md_perm.mp_nid to large nid 03/50103/7
Mr NeilBrown [Fri, 27 May 2022 00:14:05 +0000 (10:14 +1000)]
LU-10391 mdt: change md_perm.mp_nid to large nid

mp_nid in struct md_perm is now a large nid.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Id7502a5399191a36550162837cce37d3bfc9797e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50103
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months ago LU-16785 build: Cleanup test for IS_ENCRYPTED macro 18/50818/3
Shaun Tancheff [Tue, 2 May 2023 03:33:46 +0000 (22:33 -0500)]
LU-16785 build: Cleanup test for IS_ENCRYPTED macro

Suppress warning when -Wunused-value is enabled.

Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Iec387286fd8499bb13f43c89e671d7d7aa0de9e1
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50818
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months ago LU-16807 ksocklnd: ksocklnd_ni_get_eth_intf_speed() must use only rtnl lock 28/51028/8
James Simmons [Fri, 19 May 2023 12:41:11 +0000 (08:41 -0400)]
LU-16807 ksocklnd: ksocklnd_ni_get_eth_intf_speed() must use only rtnl lock

A kernel with debugging enabled reports the following:

include/linux/inetdevice.h:221 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
5 locks held by lctl/42289:
 #0: ffffffffa04263f8 ((libcfs_ioctl_list).rwsem){++++}-{3:3}, at: __blocking_notifier_call_chain+0x44/0xa0
 #1: ffffffffa04fa7b0 (lnet_config_mutex){+.+.}-{3:3}, at: lnet_configure+0x1d/0xc0 [lnet]
 #2: ffffffffa04f06e8 (the_lnet.ln_api_mutex){+.+.}-{3:3}, at: LNetNIInit+0x4c/0x960 [lnet]
 #3: ffffffffa04f0788 (&the_lnet.ln_lnd_mutex){+.+.}-{3:3}, at: lnet_startup_lndnet+0x11e/0xa90 [lnet]
 #4: ffffffff834a0cd0 (rtnl_mutex){+.+.}-{3:3}, at: rtnl_lock+0x1b/0x30

stack backtrace:
CPU: 2 PID: 42289 Comm: lctl Kdump: loaded Tainted: G        W  O     --------- -  - 4.18.0rh8.5-debug #2
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  Call Trace:
  ? dump_stack+0x119/0x18e
  ? lockdep_rcu_suspicious+0x141/0x14f
  ? ksocklnd_ni_get_eth_intf_speed.isra.1+0x308/0x360 [ksocklnd]
  ? mark_held_locks+0x6a/0xc0
  ? ktime_get_with_offset+0x233/0x2b0
  ? trace_hardirqs_on+0x4e/0x220
  ? ksocknal_tunables_setup+0xed/0x200 [ksocklnd]
  ? ksocknal_startup+0x4ff/0x1180 [ksocklnd]

The function __ethtool_get_link_ksettings() has a hard requirement
that the in_dev device be protected by the rtnl mutex. At the
same time, we acquire in_dev under the RCU lock. in_dev
needs to be stabilized by the same lock, so use the rtnl functions
instead.

Test-Parameters: trivial testlist=sanity-lnet
Change-Id: Iec795d62eb33002950cc962f29f9b93905b3bb3f
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51028
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Neil Brown <neilb@suse.de>
11 months ago LU-16751 gss: remove old patches for nfs-utils 1.0.* 03/51003/3
Timothy Day [Mon, 15 May 2023 20:34:01 +0000 (20:34 +0000)]
LU-16751 gss: remove old patches for nfs-utils 1.0.*

Remove patches for an old version of nfs-utils, and remove the README
that suggests using them. lustre/utils/gss already contains a full
fork of the nfs-utils utils/gssd directory.

Also, sk_utils.c shouldn't be executable.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Ia47193073d2403125043d51db889d0ded41ea9b7
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/51003
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
11 months ago LU-9329 utils: add large xattr support for lustre_rsync.c 97/50997/3
Li Xi [Mon, 15 May 2023 15:12:00 +0000 (23:12 +0800)]
LU-9329 utils: add large xattr support for lustre_rsync.c

lustre_rsync.c was unable to determine the required xattr
buffer size, and thus could not support large xattrs
(> PATH_MAX).

lustre-rsync-test:1A would fail if the /tmp file system on the
test client supports large xattrs.

The following lustre_rsync log shows that lgetxattr keeps
failing on the large xattr (user.foo):

(trusted.lma,14307984) rc=0x18
lsetxattr(), rc=0, errno=0
(trusted.lov,14307984) rc=0x38
lsetxattr(), rc=0, errno=0
(trusted.link,14307984) rc=0x2f
lsetxattr(), rc=0, errno=0
(trusted.som,14307984) rc=0x18
lsetxattr(), rc=0, errno=0
(user.foo,14307984) rc=0xffffffff
(lustre.lov,14307984) rc=0x38
lsetxattr(), rc=-1, errno=95
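A minimal sketch of the usual way to read an xattr of arbitrary size, not the actual lustre_rsync.c change: call lgetxattr() with a zero size to learn the value length, allocate that much, then read again, retrying if the value changed size in between. The helper name get_xattr_alloc() is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/xattr.h>

/*
 * Read xattr 'name' of 'path' (without following symlinks) into a
 * malloc()ed buffer sized from the value itself, so values larger than
 * PATH_MAX work. On success stores the length in *len and returns the
 * buffer (caller frees); on failure returns NULL with errno set.
 */
static void *get_xattr_alloc(const char *path, const char *name, ssize_t *len)
{
	assert(path != NULL && name != NULL && len != NULL);

	for (;;) {
		/* A zero size asks the kernel for the current value length. */
		ssize_t size = lgetxattr(path, name, NULL, 0);
		ssize_t rc;
		void *buf;
		int err;

		if (size < 0)
			return NULL;
		buf = malloc(size > 0 ? (size_t)size : 1);
		if (buf == NULL)
			return NULL;
		rc = lgetxattr(path, name, buf, (size_t)size);
		if (rc >= 0 && rc <= size) {
			*len = rc;
			return buf;
		}
		err = errno;
		free(buf);
		if (rc < 0 && err != ERANGE) {
			errno = err;
			return NULL;
		}
		/* The value changed size between the two calls; try again. */
	}
}
```

Sizing the buffer from the first call avoids any fixed limit such as PATH_MAX; the retry loop covers the race where another process changes the value between the two calls.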

Test-Parameters: trivial testlist=lustre-rsync-test
Signed-off-by: Li Xi <lixi@ddn.com>
Change-Id: I3ff49721b88dd31aa8af76da8932d5004c82ea09
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50997
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months ago LU-16807 libcfs: give the tcd_lock types different classes. 92/50992/3
Mr NeilBrown [Fri, 20 Nov 2020 02:19:49 +0000 (13:19 +1100)]
LU-16807 libcfs: give the tcd_lock types different classes.

There are three different trace contexts:
   process, softirq, irq.
Each has its own lock (tcd_lock) which is locked as appropriate for
that context.

lockdep currently doesn't see that they are different and so deduces
that the different uses might lead to deadlocks.

So use separate calls to spin_lock_init() so that they each get a
separate lock class, and lockdep sees no problem.

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Icca7706d8e0d8ae8add4c540d2da090b53d7e65c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50992
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months ago LU-16813 utils: move mirror_end initialization 19/50919/3
Timothy Day [Wed, 10 May 2023 07:32:51 +0000 (10:32 +0300)]
LU-16813 utils: move mirror_end initialization

Move the initialization of the mirror_end variable in
llapi_mirror_resync_many(); otherwise, 'lfs mirror resync'
may fail, since mirror_end gets reset on each pass of
the loop.

Fixes: b0297a1056 ("LU-16518 lnet: fix clang build errors")
Test-Parameters: testlist=sanity-flr env=ONLY=42,ONLY_REPEAT=100
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I4edc078cd6e30d7a0ad84383b55b63b885a34d4b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50919
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
11 months ago LU-16796 libcfs: Remove reference to LASSERT_ATOMIC_POS 81/50881/5
Arshad Hussain [Thu, 11 May 2023 06:56:14 +0000 (12:26 +0530)]
LU-16796 libcfs: Remove reference to LASSERT_ATOMIC_POS

This patch removes all references to the LASSERT_ATOMIC_POS macro.
Once all uses are removed, it will be easier to simply
switch the atomic_* API calls to refcount_* calls.

Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I2051de3707106532259e51ec3e4c890c65836b1a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50881
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Neil Brown <neilb@suse.de>
11 months ago LU-9859 libcfs: move cfs_expr_list_print to nidstrings.c 34/50834/3
Mr NeilBrown [Tue, 24 Nov 2020 03:02:50 +0000 (14:02 +1100)]
LU-9859 libcfs: move cfs_expr_list_print to nidstrings.c

cfs_expr_list_print() is only called from nidstrings.c
so move it there and make it static.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ie30aba19fd7935ba424c9212a81e7d0d91ad6f57
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50834
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months ago LU-12610 mdt: remove OBD_ -> CFS_ macros 06/50806/3
Timothy Day [Wed, 19 Apr 2023 01:49:00 +0000 (01:49 +0000)]
LU-12610 mdt: remove OBD_ -> CFS_ macros

Remove OBD macros that are simply redefinitions
of CFS macros.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Signed-off-by: Ben Evans <beevans@whamcloud.com>
Change-Id: I89edc38316bb121849b24f881a8bafaf78038aa1
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50806
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
11 months ago LU-16814 utils: Change llapi_root_path_open() signature 09/50909/3
Arshad Hussain [Wed, 10 May 2023 08:48:11 +0000 (14:18 +0530)]
LU-16814 utils: Change llapi_root_path_open() signature

This patch changes the first argument of llapi_root_path_open()
from char * to const char *, as the argument passed is
not modified.

This patch also fixes the llapi_rmfid and llapi_root_path_open
man pages.

Fixes: 5d93025240 ("LU-16427 lfs: rmfid does not print anything on error")
Reported-by: Thomas Bertschinger <bertschinger@lanl.gov>
Test-Parameters: trivial testlist=sanity
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I81de8cb9280af91af8a2de8dbb51f8e82807220d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50909
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Thomas Bertschinger <bertschinger@lanl.gov>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months ago LU-12610 ldlm: replace OBD_ -> CFS_ macros 85/50685/4
Timothy Day [Wed, 19 Apr 2023 01:55:47 +0000 (01:55 +0000)]
LU-12610 ldlm: replace OBD_ -> CFS_ macros

Replace OBD macros that are simply redefinitions
of CFS macros.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Signed-off-by: Ben Evans <beevans@whamcloud.com>
Change-Id: I4d903f286f138152cff22df5cba411d2c9fcb4a8
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50685
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
11 months ago LU-16748 llite: update comment of ll_swap_layouts_close 60/50660/5
Li Xi [Mon, 17 Apr 2023 15:41:39 +0000 (23:41 +0800)]
LU-16748 llite: update comment of ll_swap_layouts_close

mdc_close_layout_swap_pack() does not exist, and the lease lock
handle is released in mdc_close_intent_pack(). This patch updates
the comment.

Also, mdt_close_swap_layouts() does not exist. This patch removes
its declaration.

Test-Parameters: trivial
Signed-off-by: Li Xi <lixi@ddn.com>
Change-Id: Iecd1754f627803e85f54a91d648e87e235106bd7
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50660
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
11 months ago LU-16745 mdc: md_open_data should keep ref on close_req 56/50656/3
Li Dongyang [Mon, 17 Apr 2023 11:13:03 +0000 (21:13 +1000)]
LU-16745 mdc: md_open_data should keep ref on close_req

md_open_data should keep a ref on mod_close_req,
otherwise the mod_close_req could be freed before
we try to access mod_close_req via md_open_data.
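A minimal sketch of the ownership rule, using hypothetical struct request and struct open_data types rather than the real ptlrpc_request and md_open_data: the holder takes its own reference when it stores the pointer and drops it when it clears the pointer, so the sender dropping its reference cannot free the request underneath the holder.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted request (not the real ptlrpc_request). */
struct request {
	atomic_int refcount;
	int opc;
};

static struct request *req_alloc(int opc)
{
	struct request *req = malloc(sizeof(*req));

	if (req == NULL)
		return NULL;
	atomic_init(&req->refcount, 1);	/* reference owned by the creator */
	req->opc = opc;
	return req;
}

static struct request *req_get(struct request *req)
{
	atomic_fetch_add(&req->refcount, 1);
	return req;
}

/* Drop a reference; returns true if this call freed the request. */
static bool req_put(struct request *req)
{
	if (atomic_fetch_sub(&req->refcount, 1) == 1) {
		free(req);
		return true;
	}
	return false;
}

/* Hypothetical holder that keeps its own reference on the close request. */
struct open_data {
	struct request *close_req;
};

static void open_data_set_close_req(struct open_data *od, struct request *req)
{
	od->close_req = req_get(req);
}

static void open_data_clear(struct open_data *od)
{
	if (od->close_req != NULL) {
		req_put(od->close_req);
		od->close_req = NULL;
	}
}
```

Without the req_get() in open_data_set_close_req(), the sender's req_put() would free the request while od.close_req still points at it.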

Change-Id: I621f7db389854326db298d99957a0bce43024b6e
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50656
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months ago LU-16829 gss: quiet noisy warnings 13/50613/4
Aurelien Degremont [Wed, 12 Apr 2023 09:44:37 +0000 (11:44 +0200)]
LU-16829 gss: quiet noisy warnings

The GSS code emits many debugging messages as warnings.
Update the code to make them plain debugging messages.

Test-Parameters: trivial

Change-Id: I53c471d758b0309ae10bab000211fa0381cabf87
Signed-off-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50613
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months ago LU-16723 libcfs: refactor parser to be simpler 83/50583/19
Timothy Day [Sat, 8 Apr 2023 00:44:30 +0000 (00:44 +0000)]
LU-16723 libcfs: refactor parser to be simpler

The parser code has a lot of unused complexity that can be
streamlined. Refactoring the parser makes the interface simpler
from a development perspective and can provide a consistent
user experience for all of the Lustre utilities.

All functions which are not used outside of the parser have
been made static. Functions which don't appear to be used at
all have been removed.

The file headers have been standardized and the SPDX text
added. The function names have been changed to be more standard.

Test-Parameters: testlist=sanity env=ONLY=60a,ONLY_REPEAT=10
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I3354f213de424f51aef94c840083a4cb781d7aae
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50583
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months ago LU-9859 libcfs: use round_up directly 45/50545/4
James Simmons [Wed, 10 May 2023 13:15:34 +0000 (09:15 -0400)]
LU-9859 libcfs: use round_up directly

The macro cfs_size_round() is just round_up(val, 8). Replace
cfs_size_round() with the Linux standard round_up().
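A userspace check of the equivalence, copying the kernel's round_up() definition (valid for power-of-two alignments) and assuming the removed helper computed (val + 7) & ~7; that body is an assumption for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Copied from the Linux kernel: round x up to a power-of-two multiple y. */
#define __round_mask(x, y) ((__typeof__(x))((y) - 1))
#define round_up(x, y) ((((x) - 1) | __round_mask(x, y)) + 1)

/* Assumed body of the removed helper: round up to a multiple of 8. */
static size_t old_cfs_size_round(int val)
{
	assert(val >= 0);
	return (val + 7) & ~0x7;
}
```

For every non-negative int the two agree, which is why the wrapper can be dropped in favour of round_up(val, 8).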

Change-Id: I5a5ba4e663672c0b0bba5c99be9e0bece2dc3c87
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50545
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months ago LU-16619 build: Ubuntu jammy 5.19 client support 10/50210/10
Shaun Tancheff [Tue, 2 May 2023 05:34:39 +0000 (00:34 -0500)]
LU-16619 build: Ubuntu jammy 5.19 client support

The Ubuntu 5.19 kernel removed lsmcontext_init() and changed
security_dentry_init_security() to require a struct lsmcontext *.

Linux kernel linux-hwe-5.19
LSM: Removed scaffolding function lsmcontext_init

Linux kernel linux-hwe-5.19
LSM: security_dentry_init_security with struct lsmcontext

Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ib6479a2cd20df5e565ae6203e05df2afa3f3de31
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50210
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months ago LU-15668 osd-ldiskfs: fix osd_bio_private double free 79/46879/5
Li Dongyang [Tue, 22 Mar 2022 01:12:23 +0000 (12:12 +1100)]
LU-15668 osd-ldiskfs: fix osd_bio_private double free

In osd_do_bio(), if the IO is fragmented and bio_alloc()
fails to allocate a new bio, bio_private still holds the
osd_bio_private of the last bio, which is then freed twice:
in osd_do_bio() and in dio_integrity_complete_routine().
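
A minimal userspace sketch of the ownership problem described above, assuming a simplified model in which the private data is attached to the last submitted bio and released by that bio's completion callback. The types fake_bio and fake_private and all helper functions are hypothetical. The sketch counts releases instead of calling free(), and it does not reproduce the actual osd_do_bio() change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct bio and struct osd_bio_private. */
struct fake_private {
	int releases;	/* how many times this object was "freed" */
};

struct fake_bio {
	struct fake_private *bi_private;
};

/* Count a release instead of calling free(), so a double release is
 * recorded rather than corrupting the heap. */
static void release_private(struct fake_private *priv)
{
	assert(priv != NULL);
	priv->releases++;
}

/* Completion callback: releases whatever private data the bio carries. */
static void fake_bio_complete(struct fake_bio *bio)
{
	if (bio->bi_private != NULL) {
		release_private(bio->bi_private);
		bio->bi_private = NULL;
	}
}

/*
 * One fragment was submitted with priv attached, then allocating the
 * next bio fails. If the error path keeps its own pointer to priv
 * (clear_on_handoff == false), priv is released twice.
 */
static int releases_after_alloc_failure(bool clear_on_handoff)
{
	struct fake_private data = { 0 };
	struct fake_private *priv = &data;
	struct fake_bio last = { NULL };

	last.bi_private = priv;	/* ownership moves to the submitted bio */
	if (clear_on_handoff)
		priv = NULL;	/* error path no longer owns it */

	/* bio_alloc() failed: error path releases what it still holds */
	if (priv != NULL)
		release_private(priv);

	fake_bio_complete(&last);
	return data.releases;
}
```

With this model, releases_after_alloc_failure(false) reports 2 releases (the double free), while releases_after_alloc_failure(true) reports exactly 1.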

Test-Parameters: trivial
Change-Id: I42eaf95a85ec99a60359122054efb5beb0fb6104
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/46879
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
11 months ago LU-16784 tests: fix path to lgss_sk 25/50825/9
Sebastien Buisson [Mon, 1 May 2023 23:44:18 +0000 (16:44 -0700)]
LU-16784 tests: fix path to lgss_sk

Find the correct path to the lgss_sk utility by looking inside the
Lustre build tree if the command is not installed on the local node.

Test-Parameters: trivial
Test-Parameters: mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2 testlist=sanity-sec env=SHARED_KEY=true
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I23920bb2a44d2ec7e9662e75c23bd5302d8dfee2
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50825
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sarah Liu <sarah@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
11 months ago LU-16804 tests: load CONFIG at beginning of init_test_env 14/50914/4
Sebastien Buisson [Wed, 10 May 2023 12:13:54 +0000 (14:13 +0200)]
LU-16804 tests: load CONFIG at beginning of init_test_env

So that all environment variables are properly set, load CONFIG at
the beginning of init_test_env().

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I1c3caa3d582c4b317ff3d0d10fc0103e046ddf17
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/50914
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sarah Liu <sarah@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 months ago LU-15562 statahead: using try lock for batched RPCs 49/46549/16
Qian Yingjin [Fri, 18 Feb 2022 08:07:54 +0000 (03:07 -0500)]
LU-15562 statahead: using try lock for batched RPCs

To avoid a possible deadlock between a batched statahead RPC and a
rename()/migrate() operation, we use a trylock to obtain the DLM PR
ibits lock for file attributes in a batched statahead RPC.
In the current Lustre design, the server only grants read locks to
clients, and these are compatible with the PR lock for attributes
taken by stat()-ahead. A failed trylock therefore means that another
user may be modifying the directory concurrently.
When a trylock fails, the MDT reports the conflict with the error
code -EBUSY, and the client stops statahead immediately.

This patch sets "statahead_batch_max" to 64 to enable batched
statahead by default.
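
The trylock-and-back-off rule can be sketched in userspace with pthread_mutex_trylock(), which returns EBUSY instead of blocking when the mutex is already held. This is a simplified model, not Lustre code. A mutex stands in for the DLM PR ibits lock, and SA_BATCH_MAX mirrors the statahead_batch_max default of 64. The server-side trylock and the client-side stop are collapsed into one hypothetical function, sa_batch_fetch().

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical default, mirroring statahead_batch_max = 64. */
#define SA_BATCH_MAX 64

/* Hypothetical per-file entry; the mutex stands in for the PR ibits lock. */
struct sa_entry {
	pthread_mutex_t lock;
	int fetched;	/* set once attributes were "read" under the lock */
};

/*
 * Fetch attributes for up to SA_BATCH_MAX entries without ever blocking.
 * Returns the number of entries fetched, or -EBUSY as soon as one lock is
 * held by someone else, in which case the caller stops statahead.
 */
static int sa_batch_fetch(struct sa_entry *ents, size_t n)
{
	int done = 0;

	assert(ents != NULL || n == 0);
	if (n > SA_BATCH_MAX)
		n = SA_BATCH_MAX;

	for (size_t i = 0; i < n; i++) {
		int rc = pthread_mutex_trylock(&ents[i].lock);

		if (rc == EBUSY)
			return -EBUSY;	/* conflicting holder: stop statahead */
		if (rc != 0)
			return -rc;
		ents[i].fetched = 1;	/* attributes read under the lock */
		pthread_mutex_unlock(&ents[i].lock);
		done++;
	}
	return done;
}
```

Returning instead of waiting is what removes the possible lock-ordering cycle with rename()/migrate(). The cost is that statahead ends early whenever it meets contention.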

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I38394b1547e18ad156f94e49cd81aaef2f6fafb5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/46549
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>