Whamcloud - gitweb
fs/lustre-release.git
2 years ago LU-16390 tests: check Lustre filefrag in sanity-flr/49a
Andreas Dilger [Tue, 13 Dec 2022 07:01:06 +0000 (00:01 -0700)]
LU-16390 tests: check Lustre filefrag in sanity-flr/49a

Check that a Lustre-patched filefrag is installed when running
sanity-flr test_49a.

Lustre-change: https://review.whamcloud.com/49386
Lustre-commit: 37f18670e49b8150170f9b724b5f7089fa176c4e

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ic909ea4ca160d47480004f53a96ce7539ce5076c
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Colin Faber <cfaber@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49503
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago LU-16412 llite: check truncated page in ->readpage()
Qian Yingjin [Mon, 19 Dec 2022 06:57:39 +0000 (01:57 -0500)]
LU-16412 llite: check truncated page in ->readpage()

The page end offset calculation in filemap_get_read_batch() was
off by one. This bug was introduced in commit v5.11-10234-gcbd59c48ae
("mm/filemap: use head pages in generic_file_buffered_read").

When a read is submitted with end offset 1048575, it calculates
the end page index for the read as 256 where it should be 255. As a
result, readpage() is called for the page with index 256, which is
beyond the stripe boundary and may not be covered by a DLM extent lock.
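
The arithmetic can be illustrated with a minimal sketch. It assumes a
4096-byte page size and a 1 MiB stripe size (neither is stated in the
commit message), and the function names are hypothetical. This is not
the kernel code; it only shows how an exclusive page bound, if treated
as inclusive, yields index 256 for a read ending at offset 1048575.

```python
PAGE_SIZE = 4096            # assumed page size
STRIPE_SIZE = 1024 * 1024   # assumed stripe size (1 MiB)


def last_page_index(end_offset):
    """Index of the page that contains the last byte of the read (inclusive)."""
    return end_offset // PAGE_SIZE


def exclusive_page_bound(end_offset):
    """One past the last page index; off by one if used as an inclusive bound."""
    return (end_offset + 1 + PAGE_SIZE - 1) // PAGE_SIZE


def first_page_of_next_stripe(end_offset):
    """Index of the first page that belongs to the following stripe."""
    return (end_offset // STRIPE_SIZE + 1) * STRIPE_SIZE // PAGE_SIZE


end = 1048575
print(last_page_index(end))            # 255
print(exclusive_page_bound(end))       # 256
print(first_page_of_next_stripe(end))  # 256
```

Under these assumptions, page 256 is the first page of the next stripe,
which is why a lock covering only the first stripe would not protect it.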

This happens in a corner race case: filemap_get_read_batch()
batches the page with index 256 for read, but this page is later
removed from the page cache because the lock protecting it was revoked,
while it still holds a reference count due to the batch.  As a result,
this page in the read path is not covered by any DLM lock.

The solution is simple. In ->readpage(), we can check via the
address_space pointer of the page whether the page was truncated
and removed from the page cache. If it was truncated, return
AOP_TRUNCATED_PAGE to the caller.  This causes the
kernel to retry batching pages, and the truncated page will not
be added because it was already removed from the file's page cache.
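
The check described above can be modelled roughly as follows. This is a
simplified Python model, not the Lustre code: the Page class and the
readpage() signature are hypothetical, and comparing the page's mapping
against the file's mapping is an assumption about how the check is done.
The value of AOP_TRUNCATED_PAGE is the one defined by the Linux kernel.

```python
AOP_TRUNCATED_PAGE = 0x80001  # Linux kernel value for this return code


class Page:
    """Hypothetical stand-in for a page cache page."""

    def __init__(self, index, mapping):
        self.index = index
        self.mapping = mapping  # None once removed from the page cache


def readpage(page, file_mapping):
    """Refuse to read a page that no longer belongs to the file's cache."""
    if page.mapping is not file_mapping:
        return AOP_TRUNCATED_PAGE  # caller is expected to retry the batch
    return 0


mapping = object()            # stands in for the file's address_space
cached = Page(255, mapping)
truncated = Page(256, None)   # removed after the lock was revoked
print(readpage(cached, mapping) == 0)                       # True
print(readpage(truncated, mapping) == AOP_TRUNCATED_PAGE)   # True
```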

Add sanityn/test_95 to verify it.

Lustre-change: https://review.whamcloud.com/49433
Lustre-commit: TBD (from 02fe613db9517875c03e8a919e1b42cb1ba7c619)

Test-Parameters: testlist=sanityn env=ONLY=95 clientdistro=ubuntu2204
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I192df92b1d1b79057055430cc81cb7cc760cc9ed
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49434
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-15115 ptlrpc: recalc timer on EINPROGRESS reply
Alexander Zarochentsev [Fri, 15 Oct 2021 18:27:29 +0000 (21:27 +0300)]
LU-15115 ptlrpc: recalc timer on EINPROGRESS reply

ptlrpcd doesn't recalculate the wait queue timer after
getting an -EINPROGRESS reply. This may delay the request resend
until it times out.

Lustre-change: https://review.whamcloud.com/45266
Lustre-commit: 9a5bace55a5ddb8a928af2de1b199e968f3fbecd

HPE-bug-id: LUS-10366
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: Idc76c688a0f7ff8e110446fd1fe13dd83f636f3b
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49513
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago LU-16413 osd-ldiskfs: fix T10PI for CentOS 8.x
Li Dongyang [Mon, 19 Dec 2022 10:03:47 +0000 (21:03 +1100)]
LU-16413 osd-ldiskfs: fix T10PI for CentOS 8.x

Recreate the currently broken Lustre kernel patches
to allow using custom integrity functions for bio.
Note we don't need to save the generate_fn anymore;
it will be used once we call bio_integrity_prep_fn().

Add upstream fix
b13e0c718568 ("block: bio-integrity: Advance seed correctly
for larger interval sizes") for CentOS 8.0 to 8.6.

Handle the kernel API changes for the T10PI generate and
verify functions introduced in CentOS 8.x kernel,
mostly because of switching to blk_integrity_iter.

Update the custom generate and verify functions, to sync
with upstream versions.
- Add T10-DIF-TYPE2, currently only a placeholder,
  not used in upstream either.
- Use __be16 instead of __u16 for guard tags.
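
As a rough illustration of why the guard tag is typed as big-endian, the
sketch below computes a CRC16 with the T10-DIF polynomial (0x8BB7) and
packs it in big-endian byte order. It assumes a CRC-based guard and uses
hypothetical function names; it is an illustrative model, not the Lustre
or kernel implementation.

```python
import struct

T10DIF_POLY = 0x8BB7  # CRC16 polynomial used by T10-DIF


def crc16_t10dif(data):
    """Bitwise CRC16 with the T10-DIF polynomial, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ T10DIF_POLY) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def guard_tag_bytes(data):
    """Guard tag as stored: two bytes, most significant byte first."""
    return struct.pack(">H", crc16_t10dif(data))


print(hex(crc16_t10dif(b"\x01")))   # 0x8bb7
print(guard_tag_bytes(b"\x01"))     # b'\x8b\xb7'
```

Declaring the field as big-endian makes the byte order explicit, so the
same tag bytes are produced regardless of host endianness.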

Only reuse guard tags if the rpc checksum is the same
one supported on the target. We already have some protection
during checksum type negotiation, the server
will mark the target's T10PI type as the only
T10PI checksum type supported. But it's still good to
have the logic in place.

Do not call bio_integrity_prep() if the custom interface
bio_integrity_prep_fn() does not exist; submit_bio() will
do that for us.

On the servers, show the target's T10PI checksum as
the preferred checksum_type even if it's not the fastest.
Note this is only cosmetic and does not impact the checksum
type used, which is still done during negotiation.

Lustre-change: https://review.whamcloud.com/49441
Lustre-commit: TBD (from a0c96829a760a5cf199e5278bf2693f2618b77c9)

Change-Id: I2d0ba0b80ba9cde2977da24db08095671aa5373c
Test-Parameters: trivial
Fixes: 293844d132 ("LU-16222 kernel: RHEL 8.7 client and server support")
Fixes: f176efd183 ("LU-12269 kernel: RHEL 8.0 server support")
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49483
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago LU-16376 obdclass: NUL terminate long jobid strings
Andreas Dilger [Thu, 8 Dec 2022 18:43:57 +0000 (11:43 -0700)]
LU-16376 obdclass: NUL terminate long jobid strings

It appears that some jobid names can be sent that are using the full
32-byte size, rather than containing an embedded NUL terminator. This
caused errors in lprocfs_job_stats_log() when it overflowed.

If lustre_msg_get_jobid() does not find a NUL terminator within the
buffer, then add one, so that the rest of the code doesn't
have to deal with unterminated strings.
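
A minimal sketch of the idea, using a Python bytes buffer in place of
the C character array. The 32-byte size comes from the text above; the
function names are hypothetical, and overwriting the last byte (which
truncates the jobid to 31 characters) is an assumption about how the
terminator is added.

```python
JOBID_SIZE = 32  # buffer size mentioned in the commit message


def terminated_jobid(buf):
    """Return a JOBID_SIZE-byte buffer that always contains a NUL."""
    buf = bytes(buf[:JOBID_SIZE]).ljust(JOBID_SIZE, b"\0")
    if b"\0" not in buf:
        # No terminator found: overwrite the last byte.
        buf = buf[:JOBID_SIZE - 1] + b"\0"
    return buf


def jobid_string(buf):
    """Decode the jobid up to its first NUL terminator."""
    return terminated_jobid(buf).split(b"\0", 1)[0].decode("ascii", "replace")


full = b"a" * JOBID_SIZE                      # uses all 32 bytes, no NUL
print(len(jobid_string(full)))                # 31
print(jobid_string(b"job\0" + b"x" * 28))     # job
```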

This potentially exposes a larger issue that other places may not be
handling the unterminated string properly either, which needs to be
addressed separately on both the client and server.  Terminating the
jobid to 31 chars only on the client does not totally solve the issue,
since there will still be older clients that are not doing this, so
the server needs to handle this in any case.

Lustre-change: https://review.whamcloud.com/49351
Lustre-commit: 9eba5d57297f807fddf046356c846478bbf232f4

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I4c05fabdacb6a0bbf6477d3601a628fe1f3ebbe5
Reviewed-by: Feng Lei <flei@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49501
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago LU-14552 ptlrpc: NULL pointer dereference in ptlrpc_watchdog_fire
Andriy Skulysh [Mon, 1 Mar 2021 21:41:33 +0000 (23:41 +0200)]
LU-14552 ptlrpc: NULL pointer dereference in ptlrpc_watchdog_fire

thread->t_task isn't initialized by target_recovery_thread()

Lustre-change: https://review.whamcloud.com/43115
Lustre-commit: 14a1102268941d851ef5ef793923e39081b81ff4

Change-Id: Ia38d5ccaab6b9332a1fd60ebe5ed2461f7d5db84
HPE-bug-id: LUS-9748
Fixes: 0496cdf20 ("LU-13608 tgt: abort recovery while reading update llog")
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-by: Andrew Perepechko <c17827@cray.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49486
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago LU-15081 vfs: set_nlink() is not race-safe
Andrew Perepechko [Mon, 11 Oct 2021 19:11:05 +0000 (22:11 +0300)]
LU-15081 vfs: set_nlink() is not race-safe

set_nlink() is not atomic with respect to races with itself, and
the following warning may be triggered by VFS:

WARNING: CPU: 5 PID: 195090 at fs/inode.c:241 __destroy_inode+0xdb/0xf0

It does not seem important what exact nlink value is the result
of the race. However, we need to protect the superblock remove
counter.

Lustre-change: https://review.whamcloud.com/45191
Lustre-commit: 12b05772fdb6d080819b6c213fcd7f8705278412

Signed-off-by: Andrew Perepechko <andrew.perepechko@hpe.com>
HPE-bug-id: LUS-9825
Change-Id: I67bc345b9a9e43fb88d919a83246759d11604b03
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49452
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago EX-6169 lipe: sanity-lipe-find3 reformat to clean lost+found
Alexandre Ioffe [Tue, 13 Dec 2022 06:20:33 +0000 (22:20 -0800)]
EX-6169 lipe: sanity-lipe-find3 reformat to clean lost+found

Reformat the file system when .lustre/lost+found/ contains garbage.

Test-Parameters: trivial testlist=sanity-lipe-find3
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Change-Id: Ib78b06e685aaeabb8356662747285ed7a27dde15
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49385
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago EX-6187 lipe: lipe_find3 missing option ipath
Alexandre Ioffe [Tue, 20 Dec 2022 07:35:13 +0000 (23:35 -0800)]
EX-6187 lipe: lipe_find3 missing option ipath

Added missing lexical ipath

Test-Parameters: trivial testlist=sanity-lipe-find3
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Change-Id: I62260e054a9c514aa31d378322b6840f75edf221
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49455
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Colin Faber <cfaber@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago RM-620 build: New tag 2.14.0-ddn70
Andreas Dilger [Sat, 17 Dec 2022 02:30:27 +0000 (19:30 -0700)]
RM-620 build: New tag 2.14.0-ddn70

New tag 2.14.0-ddn70

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Iaf30877c3a4c88c20a64814670c7e97e4f0cc5e0

2 years ago LU-15935 tests: add version check to replay-dual test_33
Jian Yu [Wed, 14 Dec 2022 02:13:33 +0000 (18:13 -0800)]
LU-15935 tests: add version check to replay-dual test_33

This patch adds MDS version check to replay-dual test_33
to avoid interop test failure.

Lustre-change: https://review.whamcloud.com/49398
Lustre-commit: TBD (from 0027fba3d3f797407fad9f3995f839a431e49782)

Test-Parameters: trivial \
serverjob=lustre-b_es5_2 serverbuildno=539 \
env=ONLY=33 testlist=replay-dual

Test-Parameters: trivial env=ONLY=33 testlist=replay-dual

Change-Id: I3ec665302a431d3c0f07bc819a08237dbc5b4309
Fixes: 1a79d395dd ("LU-15935 target: keep track of multirpc slots in last_rcvd")
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49401
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-15234 lnet: add mechanism for dumping lnd peer debug info
Serguei Smirnov [Mon, 28 Feb 2022 19:04:00 +0000 (11:04 -0800)]
LU-15234 lnet: add mechanism for dumping lnd peer debug info

Add ability to dump lnd peer debug info:
lnetctl debug peer --nid=<nid>

The debug info is dumped to the log as D_CONSOLE by the respective
lnd and can be retrieved with "lctl dk" or seen in syslog.
This mechanism has been added for socklnd and o2iblnd peers.

Lustre-change: https://review.whamcloud.com/48566
Lustre-commit: 950e59ced18d49e9fdd31c1e9de43b89a0bc1c1d

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: Ia9c4d59143206bcb7ec43806594cf0cfaed5f0a9
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49038
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago EX-5924 lipe: lipe_scan3 ERROR replaced by WARNING
Alexandre Ioffe [Fri, 9 Dec 2022 22:52:55 +0000 (14:52 -0800)]
EX-5924 lipe: lipe_scan3 ERROR replaced by WARNING

Decrease severity of the message down to WARNING.

Test-Parameters: trivial testlist=sanity-lipe-find3
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Change-Id: I2f4b885248692e042ba9eb0f97736401e6d35de6
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49355
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Colin Faber <cfaber@ddn.com>
2 years ago LU-16378 lnet: handles unregister/register events
Cyril Bordage [Mon, 12 Dec 2022 10:49:11 +0000 (11:49 +0100)]
LU-16378 lnet: handles unregister/register events

When the network is restarted, devices are unregistered and then
registered again. When a device registers using an index that is
different from the previous one (before the network was restarted), LNet
ignores it. Consequently, this device remains with its link in a fatal state.

To fix that, we catch unregistering events to clear the saved index
value, and when a registering event comes, we save the new value.
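
The behaviour described above can be modelled with a small state
machine. Everything here is hypothetical (class name, method names, the
NO_INDEX sentinel, and matching a re-registered device by name); it is a
sketch of the idea rather than the LNet implementation.

```python
NO_INDEX = -1  # hypothetical sentinel meaning "no device index saved"


class NetInterface:
    """Simplified model of an interface tracked by its device index."""

    def __init__(self, name, ifindex):
        self.name = name
        self.saved_index = ifindex
        self.fatal = False

    def on_link_event(self, ifindex, link_up):
        # Events for an index we do not track are ignored.
        if ifindex != self.saved_index:
            return
        self.fatal = not link_up

    def on_unregister(self, ifindex):
        # Forget the index so that a later registration can replace it.
        if ifindex == self.saved_index:
            self.saved_index = NO_INDEX

    def on_register(self, name, ifindex):
        # Assumption: the re-registered device is recognised by its name.
        if self.saved_index == NO_INDEX and name == self.name:
            self.saved_index = ifindex


ni = NetInterface("eth0", 3)
ni.on_link_event(3, link_up=False)   # link goes down, fatal state set
ni.on_unregister(3)                  # network restart: device removed
ni.on_register("eth0", 7)            # device returns with a new index
ni.on_link_event(7, link_up=True)    # no longer ignored
print(ni.saved_index, ni.fatal)      # 7 False
```

Without the unregister and register handlers, the link-up event for
index 7 would be ignored and the fatal flag would stay set.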

Lustre-change: https://review.whamcloud.com/49375/
Lustre-commit: TBD (from 7442710a56a8f38453441c62253c0ad891fe9b8c)

Signed-off-by: Cyril Bordage <cbordage@whamcloud.com>
Change-Id: I17e93a1103d588f3e630a9c7446b345f4d472b97
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49376
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-16373 tests: failover mds1 back to the primary server
Jian Yu [Thu, 15 Dec 2022 19:38:56 +0000 (11:38 -0800)]
LU-16373 tests: failover mds1 back to the primary server

This patch fixes recovery-small test 144a to failover
mds1 back to the primary server so that stack_trap can
set timeout parameter on the correct mds node.

Lustre-change: https://review.whamcloud.com/49345
Lustre-commit: TBD (from 68c75d28fe86ac890d242c004c664f872204b660)

Test-Parameters: trivial \
env=SLOW=yes,FAILURE_MODE=HARD,ONLY=144a \
clientcount=4 mdtcount=1 mdscount=2 osscount=2 \
austeroptions=-R failover=true iscsi=1 \
testlist=recovery-small

Change-Id: Idbfdb7b084c7edac8784008e0455f76632aa685b
Test-Parameters: trivial testlist=recovery-small
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49419
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-16329 Revert "LU-8621 utils: cmd help to stdout or short cmd error"
Andreas Dilger [Thu, 15 Dec 2022 15:30:32 +0000 (08:30 -0700)]
LU-16329 Revert "LU-8621 utils: cmd help to stdout or short cmd error"

This reverts commit 608d763955d7e0a9c438c317e595f14825e9423b.
This breaks bash command completion.

Fixes: bc69a8d058 ("LU-8621 utils: cmd help to stdout, short cmd error")
Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I004ea5af499593b0f36ba17ff5f517548f0ea0f9
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49416
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago EX-6349 Revert "LU-14661 obdclass: Add peer/peer NI when processing llog"
Alex Zhuravlev [Wed, 14 Dec 2022 19:00:01 +0000 (22:00 +0300)]
EX-6349 Revert "LU-14661 obdclass: Add peer/peer NI when processing llog"

This reverts commit e8ddb2f550072cdd3489389c107af3e892a21f66.
It is causing problems with reconnection at failover.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I53594f8f93474666c4abd96291d58dadf8ac5969
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49411
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago LU-15643 osd-ldiskfs: don't trigger scrub on irreparable FIDs
Lai Siyao [Tue, 15 Mar 2022 19:43:14 +0000 (15:43 -0400)]
LU-15643 osd-ldiskfs: don't trigger scrub on irreparable FIDs

In osd_fid_lookup(), if the FID mapping found in the OI table is insane,
it will be added into a list called os_inconsistent_items, and OI
scrub will be triggered.

Later, if OI scrub can't fix this mapping, it should move this mapping
into a list called os_stale_items, and subsequent access of the same
FID should return -ESTALE immediately, rather than triggering OI
scrub repeatedly.
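
The two-list logic can be sketched as a small Python model. The list
names mirror the text above, but the class, the method names, and the
-EINPROGRESS return value for the "scrub triggered" case are assumptions
made for illustration; this is not the osd-ldiskfs code.

```python
import errno


class OsdScrubModel:
    """Simplified model of the inconsistent and stale FID lists."""

    def __init__(self):
        self.inconsistent_items = set()
        self.stale_items = set()
        self.scrub_triggers = 0

    def fid_lookup(self, fid, mapping_is_sane):
        # FIDs that scrub already failed to repair fail fast.
        if fid in self.stale_items:
            return -errno.ESTALE
        if not mapping_is_sane:
            self.inconsistent_items.add(fid)
            self.scrub_triggers += 1       # OI scrub is triggered
            return -errno.EINPROGRESS      # assumed return value
        return 0

    def scrub_failed(self, fid):
        # Scrub could not fix the mapping: remember it as stale.
        self.inconsistent_items.discard(fid)
        self.stale_items.add(fid)


osd = OsdScrubModel()
osd.fid_lookup("fid-1", mapping_is_sane=False)   # triggers scrub once
osd.scrub_failed("fid-1")
rc = osd.fid_lookup("fid-1", mapping_is_sane=False)
print(rc == -errno.ESTALE, osd.scrub_triggers)   # True 1
```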

Add sanity-scrub 20. Remove sanity-scrub 1d, which is not a sane test
because it altered FID in LMA, which is the last to trust for an
object, and it could pass just by chance.

Lustre-change: https://review.whamcloud.com/46852
Lustre-commit: 558784caad491be50e93ae60a31d4219a1e038bc

Test-Parameters: mdscount=2 mdtcount=4 testlist=sanity-scrub,sanity-scrub,sanity-scrub,sanity-scrub,sanity-scrub,sanity-scrub,sanity-scrub,sanity-scrub
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I3ed8928506551416b1008121adbe385dedda29bc
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49424
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago RM-620 build: New tag 2.14.0-ddn69
Andreas Dilger [Tue, 13 Dec 2022 19:12:09 +0000 (12:12 -0700)]
RM-620 build: New tag 2.14.0-ddn69

New tag 2.14.0-ddn69

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I592bd3a6fdb9db02bbe1a18c6e84d9b61a639f95

2 years ago EX-6497 lipe: Refine stats field name in lamigo
Alexandre Ioffe [Thu, 8 Dec 2022 06:45:35 +0000 (22:45 -0800)]
EX-6497 lipe: Refine stats field name in lamigo

Correct the "processed" INFO message that lamigo prints
periodically:
- Add two fields:
  "running" - number of currently running jobs, such as replication
  "delayed" - current number of failed and other (such as set flag)
  jobs that are waiting to be run on the next lamigo cycle
- Rename the "in queue" field to "awaiting". This is the current number
  of files in the internal cache. These files are waiting to be
  processed (replicated)

Test-Parameters: trivial testlist=hot-pools
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Change-Id: Iacf0199cfcf56edcbb8ad91e0e4b62c7451900f5
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49344
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Colin Faber <cfaber@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago EX-6298 lipe: decrease delay before ALR restart
Alexandre Ioffe [Sat, 19 Nov 2022 05:39:08 +0000 (21:39 -0800)]
EX-6298 lipe: decrease delay before ALR restart

- Decrease the delay before restarting the access log reader (ALR)
  and eliminate this delay when the read from the ALR fails
  due to a timeout. Increase the SSH poll/read timeout while
  the keep-alive message in ofd_access_log_reader is not
  implemented.
  This will decrease the probability of missing the ALR.
- Stop excluding hot-pools test_72

Test-Parameters: trivial testlist=hot-pools
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Change-Id: I36989e9c3fd877aee5ce1cfb8525db8604e666bd
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49196
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-16353 config: enable_foo variables mustn't contains space
Mr NeilBrown [Thu, 1 Dec 2022 17:53:01 +0000 (09:53 -0800)]
LU-16353 config: enable_foo variables mustn't contains space

$enable_crypto is in some circumstances set to "embedded llcrypt"
which contains a space.
When the code from lustre-build.m4 then tests the value with:

   if test x$enable_crypto = xyes

we get a syntax error from ./configure

We could add quotes to this test, but for consistency we would need
to add quotes to every other test for an enable_foo variable.

It is simpler just to ensure we don't add spaces.  So change the space
to a hyphen.
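
To see why the space matters, the sketch below mimics the shell's word
splitting of an unquoted variable using Python's str.split(). The
function name is hypothetical and this is only an approximation of
shell expansion, but it shows that the value with a space hands "test"
four words instead of the three it expects.

```python
def expanded_words(enable_crypto):
    """Words passed to `test` after unquoted expansion of x$enable_crypto."""
    return f"x{enable_crypto} = xyes".split()


print(expanded_words("yes"))                # ['xyes', '=', 'xyes']
print(expanded_words("embedded llcrypt"))   # ['xembedded', 'llcrypt', '=', 'xyes']
print(expanded_words("embedded-llcrypt"))   # ['xembedded-llcrypt', '=', 'xyes']
```

With the hyphenated value, the comparison again has exactly three
operands, so no quoting is required.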

Lustre-change: https://review.whamcloud.com/49282
Lustre-commit: c8a33e5322b0675680f8d737f04259799d30aa0e

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I097e857409d6ec48a765ccda1cc470d28b90e601
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49295
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-16051 o2iblnd: detect link state to set fatal error on ni
Serguei Smirnov [Fri, 23 Sep 2022 22:20:51 +0000 (15:20 -0700)]
LU-16051 o2iblnd: detect link state to set fatal error on ni

To avoid selecting lnet ni which corresponds to a downed link
for sending, add a mechanism for detecting ip-layer link events
in o2iblnd. On ip link up/down events, find corresponding
ni and toggle ni_fatal_error_on flag. This complements the
existing mechanism for ib-layer link event handling.

Lustre-change: https://review.whamcloud.com/48644
Lustre-commit: 30d73908087d5b2f0b18cce95826c4825c030ad4

Test-Parameters: trivial
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I4720cd0a7bc577a522c7d40b54f821a4c12b670f
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49315
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-14992 tests: add more mkdir_on_mdt0 calls
Mr NeilBrown [Tue, 29 Nov 2022 02:31:21 +0000 (18:31 -0800)]
LU-14992 tests: add more mkdir_on_mdt0 calls

A previous patch changed some mkdir calls in test_133a to
mkdir_on_mdt0. This allows stats collected from mdt0 to
reflect the mkdir.

However two mkdir calls were missed, so "crossdir_rename" stats can be
wrong.

Lustre-change: https://review.whamcloud.com/49252
Lustre-commit: d56ea0c80a959ebd9b393f2da048cc179cb16127

Test-Parameters: trivial mdscount=2 mdtcount=4 testlist=sanity env=ONLY=133a

Fixes: 543341afc3 ("LU-14992 tests: sanity/replay-vbr mkdir on MDT0")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I4e5c2e5504307462bff4012a13ef9deb24f8da8c
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49262
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-16308 llite: wake_up after cl_object_kill
Lai Siyao [Thu, 10 Nov 2022 13:15:51 +0000 (08:15 -0500)]
LU-16308 llite: wake_up after cl_object_kill

cl_inode_fini() calls cl_object_kill() to set LU_OBJECT_HEARD_BANSHEE,
and then calls cl_object_put_last() to wait for the object refcount to
become one. It should call wake_up() in between, in case someone is
waiting on the flag.

Lustre-change: https://review.whamcloud.com/49130
Lustre-commit: 3a0a6c7a88499a78c9bfc6ac514d05eba60312c9

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I244db71ee4ed9c39118e443b99c3b8a3a0aa4bc3
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49312
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago EX-6468 pcc: add threshold to determine direct I/O during attach
Qian Yingjin [Wed, 30 Nov 2022 14:29:47 +0000 (09:29 -0500)]
EX-6468 pcc: add threshold to determine direct I/O during attach

This patch adds a tunable threshold parameter that determines whether
direct I/O or buffered I/O is used for data copying during attach:
llite.*.pcc_dio_attach_threshold
The default value is the same as the direct I/O size: 32MiB.

The parameter "pcc_dio_attach_size_mb" is deprecated; use
"pcc_dio_attach_iosize_mb" instead.
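
A minimal sketch of how such a threshold might be applied. The 32 MiB
default comes from the text above; the function name is hypothetical,
and treating a file whose size equals the threshold as a direct I/O
case is an assumption, since the commit message does not say which side
of the comparison is inclusive.

```python
MIB = 1024 * 1024
DEFAULT_DIO_ATTACH_THRESHOLD = 32 * MIB  # default stated in the commit message


def attach_io_mode(file_size, threshold=DEFAULT_DIO_ATTACH_THRESHOLD):
    """Pick the I/O mode used to copy file data during attach."""
    # Assumption: files at or above the threshold use direct I/O.
    return "direct" if file_size >= threshold else "buffered"


print(attach_io_mode(1 * MIB))    # buffered
print(attach_io_mode(64 * MIB))   # direct
```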

Change-Id: I393d6a06523303e749192ba9978449c3d75886ae
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49286
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Feng Lei <flei@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago RM-620 build: New tag 2.14.0-ddn68
Andreas Dilger [Tue, 6 Dec 2022 05:15:41 +0000 (22:15 -0700)]
RM-620 build: New tag 2.14.0-ddn68

New tag 2.14.0-ddn68

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Id4e3d1a9f28afe251e55582c84acaf98ebfe9954

2 years ago LU-15852 lnet: Don't modify uptodate peer with temp NI
Chris Horn [Wed, 30 Mar 2022 18:35:23 +0000 (13:35 -0500)]
LU-15852 lnet: Don't modify uptodate peer with temp NI

When processing the config log, it is possible that we attempt to
add temp NIs after discovery has completed on a peer. These temp NIs
may not actually exist on the peer. Since discovery has already
completed, the peer is considered up-to-date and we can end up with
incorrect peer entries. We shouldn't add temp NIs to a peer that
is already up-to-date.

Lustre-change: https://review.whamcloud.com/47322
Lustre-commit: 8f718df474e453fbc69dfe90214e71565963f6db

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ia484713b1e6c9e1a46e525589b7c741c6478e417
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49303
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago LU-15938 llog: more checks in llog_reader
Mikhail Pershin [Tue, 2 Aug 2022 12:41:52 +0000 (15:41 +0300)]
LU-15938 llog: more checks in llog_reader

Add more correctness checks and reports in llog_reader:
- better report wrong record length and chunk skipping case
- add tail check: tail id and len should be the same as in head
- better report for gap in record indices
- test case with two corruption types:
  1) llog has bits set in bitmap beyond file end
  2) corruption in the middle

Lustre-change: https://review.whamcloud.com/48112
Lustre-commit: 386ffcdbb4c9b89f798de4c83a51a3f020542c8b

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I0c2af6ae2592c94e14e90ead12e28104409313b2
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49214
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
2 years ago LU-16317 build: dkms build requires flex, bison and libmount-devel
Jian Yu [Tue, 29 Nov 2022 17:14:22 +0000 (09:14 -0800)]
LU-16317 build: dkms build requires flex, bison and libmount-devel

This patch fixes lustre.spec.in and lustre-dkms.spec.in to add
requires for flex, bison, libmount and libmount-devel. The last
two have already been added into lustre.spec.in.

Lustre-change: https://review.whamcloud.com/49183
Lustre-commit: c74c630ff7596317d1b500fd385fca271b31708c

Test-Parameters: trivial

Fixes: 121a79651f ("LU-15967 build: configure script does not check for required build tools")
Fixes: f21b944127 ("LU-15940 build: add a required dependency for libmount")

Change-Id: I9923fc7eb09f974e8c38c3664138486a424e16d7
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49275
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago EX-6373 pcc: asynchronous PCCRO attach command support
Qian Yingjin [Fri, 11 Nov 2022 09:01:02 +0000 (04:01 -0500)]
EX-6373 pcc: asynchronous PCCRO attach command support

Currently, PCCRO attach via the command "lfs pcc attach" blocks
while the data is being copied.
There is a requirement that this command can also copy data
asynchronously. Thus we add an option "--async|-A" to the command,
which makes it not block while the file data is being fetched.

Add sanity-pcc/test_{103, 104} to verify that it works correctly.

Change-Id: I6f31190c8b9e9b9876b34f8e484c6c8b7f16b6db
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49133
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Feng Lei <flei@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-16313 pcc: use two bits to indicate pcc type for attach
Qian Yingjin [Tue, 15 Nov 2022 06:57:08 +0000 (01:57 -0500)]
LU-16313 pcc: use two bits to indicate pcc type for attach

PCC currently supports two types: readwrite and readonly.
The attach data structure @lu_pcc_attach uses a 32-bit value to
indicate the PCC type:
struct lu_pcc_attach {
	__u32 pcca_type;
	__u32 pcca_id;
};

This patch changes it to use 2 bits to represent the PCC type.
The remaining bits in @pcca_type can be used as attach flags, such
as a flag to indicate asynchronous attach via the
command "lfs pcc attach -A" for PCCRO.
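
The packing scheme can be sketched as follows. Only the 2-bit width of
the type field and the 32-bit width of @pcca_type come from the text
above; the constant names and their numeric values are hypothetical and
chosen purely for illustration.

```python
PCC_TYPE_BITS = 2
PCC_TYPE_MASK = (1 << PCC_TYPE_BITS) - 1   # low 2 bits hold the type
U32_MASK = 0xFFFFFFFF

# Hypothetical values, not taken from the Lustre headers.
PCC_TYPE_RW = 0x1
PCC_TYPE_RO = 0x2
PCC_FLAG_ASYNC = 1 << PCC_TYPE_BITS        # first bit above the type field


def pack_pcca_type(pcc_type, flags=0):
    """Combine a 2-bit PCC type with flag bits into one 32-bit value."""
    if pcc_type & ~PCC_TYPE_MASK:
        raise ValueError("PCC type does not fit in 2 bits")
    if flags & PCC_TYPE_MASK:
        raise ValueError("flags overlap the PCC type bits")
    return (flags | pcc_type) & U32_MASK


def unpack_pcca_type(value):
    """Split a packed value back into (type, flags)."""
    return value & PCC_TYPE_MASK, value & ~PCC_TYPE_MASK & U32_MASK


packed = pack_pcca_type(PCC_TYPE_RO, PCC_FLAG_ASYNC)
print(hex(packed))                 # 0x6
print(unpack_pcca_type(packed))    # (2, 4)
```

Because existing type values already fit in the low 2 bits, a value
with no flags set is unchanged by this encoding.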

Lustre-change: https://review.whamcloud.com/49160
Lustre-commit: 6e90974b1f4ac24c5a5d45ecc9bdb4d47018dab4

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Idee26018642a174b04d1d36a81952ea98a06514e
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49163
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago RM-620 build: New tag 2.14.0-ddn67
Andreas Dilger [Tue, 6 Dec 2022 02:05:39 +0000 (19:05 -0700)]
RM-620 build: New tag 2.14.0-ddn67

New tag 2.14.0-ddn67

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ia40ed3b7d185fa171586d5ca377714518fdc5e2e

2 years ago LU-8585 llapi: use open_by_handle_at in llapi_open_by_fid
Quentin Bouget [Sun, 2 Jan 2022 16:12:42 +0000 (11:12 -0500)]
LU-8585 llapi: use open_by_handle_at in llapi_open_by_fid

Reimplement llapi_open_by_fid() to use llapi_fid_to_handle() and
open_by_handle_at(2) rather than using ioctl().  This works for
opens on subdirectory mountpoints, unlike ".lustre/fid/<fid>".

This patch also adds llapi_open_by_fid_at() which is similar to
llapi_open_by_fid() except that it takes an open directory file
descriptor or AT_FDCWD rather than a path as its first argument.

[AD:
- Move get_root_*() functions over to a new liblustreapi_root.c
  file in expectation of further enhancements to that code.
- Cache an open file handle on the root directory so repeated
  calls to llapi_open_by_fid() and llapi_fid2path() do not need
  to search for and open the same root directory path many times.
- Add man pages for newly-added functions.

  This reduces the system calls for llapi_fid_test significantly:

      original     patched
         14511        4315   total opens
         64807       34067   total syscalls
]

There may still be a need to have a fallback from open_by_handle_at()
to using ".lustre/fid/<FID>" to open the fid (if available), but
that can be added if this initial patch does not test well.  The
open_by_handle_at() method avoids reopening the "fid/" directory
each time (though this fd could also be cached), but it has the
drawback that it reconnects dentries to the root directory each time.

Lustre-change: https://review.whamcloud.com/36603
Lustre-commit: bdf7788d19985bb7abf2385add15f1d67f3d01e4

Signed-off-by: Quentin Bouget <quentin.bouget@cea.fr>
Change-Id: I8a4904c996389da2b0894cd9fac639a398607535
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49202
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-15833 llapi: don't use realpath in llapi_search_fsname()
Etienne AUJAMES [Mon, 9 May 2022 13:44:29 +0000 (15:44 +0200)]
LU-15833 llapi: don't use realpath in llapi_search_fsname()

This patch uses the st_dev value to determine the fsname in
llapi_search_fsname().
The main purpose is to limit the number of lstat() calls
(made by realpath()) in this function.

get_root_path() is modified to search for a mountpoint by device.
The last result of get_root_path() is cached to avoid reading
/proc/mounts on each call.

A new API function llapi_search_rootpath_by_dev() is added to get
the path of the Lustre mountpoint using the specified device value.
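
The lookup by device described above can be sketched as follows. This is
only an illustration of the idea, not the liblustreapi code: the names
find_mount_by_dev() and read_mount_points() are hypothetical, and the
sketch assumes that every file on a filesystem reports the same st_dev
as its mountpoint.

```python
import os
from typing import Callable, Iterable, List, Optional


def find_mount_by_dev(dev: int,
                      mount_points: Iterable[str],
                      dev_of: Callable[[str], int] = lambda p: os.stat(p).st_dev
                      ) -> Optional[str]:
    """Return the first mount point whose device number equals `dev`."""
    for mnt in mount_points:
        try:
            if dev_of(mnt) == dev:
                return mnt
        except OSError:
            # Mount point vanished or is not accessible; skip it.
            continue
    return None


def read_mount_points(fstype: str = "lustre",
                      table: str = "/proc/mounts") -> List[str]:
    """List the mount points of the given type from a mounts table."""
    points = []
    with open(table) as f:
        for line in f:
            fields = line.split()
            # Fields: device, mount point, fs type, options, dump, pass.
            if len(fields) >= 3 and fields[2] == fstype:
                points.append(fields[1])
    return points
```

A caller would do one stat() on the path of interest and pass its st_dev
to find_mount_by_dev() together with read_mount_points(), instead of
resolving every path component with realpath().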

**Testing:**

*Environment:*
VMs: 1 client, 1 MDS (2MDT), 1 OSS (2 OST)
Lustre tree: test{001..100}/test{001..100}/test{01..10}/file{01..05}
(500000 files + 110100 folders)
OS: CentOS 7 (no statx)
Lustre: 2.15.50_15_g1116739

*Tests*
cd <rootfs>
strace lfs getstripe -r .
echo 3 > /proc/sys/vm/drop_caches
/usr/bin/time lfs getstripe -r . (2 iterations)

*Results*
times (s):

                 ______________________________
                | user | system | real | real% |
 _______________|______|________|______|_______|
|without patch: | 6.18 | 57.3   | 427  | 0%    |
|_______________|______|________|______|_______|
|with patch:    | 2.88 | 47.3   | 404  |-5.45% |
|_______________|______|________|______|_______|

strace (only significant changes are displayed):
(*stat = lstat + stat + fstat)
                 _____________________________________________
                | *stat  | mmap   | open   | read   | all     |
 _______________|________|________|________|________|_________|
|without patch: | 760545 | 110142 | 330379 | 330325 | 4742658 |
|_______________|________|________|________|________|_________|
|with patch:    | 440484 | 0      | 220277 | 19     | 3541739 |
|_______________|________|________|________|________|_________|

-25.32% syscalls after patching.
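
The syscall reduction can be re-derived from the "all" column of the
strace table. The helper name percent_change() is only for illustration:

```python
def percent_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# Totals taken from the "all" column of the strace table above.
syscalls_without_patch = 4742658
syscalls_with_patch = 3541739

change = percent_change(syscalls_without_patch, syscalls_with_patch)
print(f"{change:.2f}% syscalls after patching")
```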

Lustre-change: https://review.whamcloud.com/47258
Lustre-commit: 4fd7d5585d33240a658f57bf7399da4415a7eb6c

Signed-off-by: Etienne AUJAMES <etienne.aujames@cea.fr>
Change-Id: I3812d922d5b1d194d52132cba95d11820424c5d7
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49201
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
2 years agoDDN-3473 build: support kernel 3.10.0-693.el7
Jian Yu [Wed, 16 Nov 2022 05:54:26 +0000 (21:54 -0800)]
DDN-3473 build: support kernel 3.10.0-693.el7

This patch fixes the following build failures to support
kernel 3.10.0-693.el7 for Lustre client:

- error: implicit declaration of function 'idr_destroy'
- error: implicit declaration of function 'gfpflags_allow_blocking'
- error: implicit declaration of function 'cdev_device_add'
- error: passing argument 1 of 'init_wait_var_entry' from
  incompatible pointer type

Test-Parameters: trivial
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Change-Id: I4b5c5264fb102d3a825c92e7b1e92cf0c52540e5
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49197
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-12016 tests: skip sanity/156 in interop
Andreas Dilger [Fri, 25 Nov 2022 02:28:21 +0000 (19:28 -0700)]
LU-12016 tests: skip sanity/156 in interop

Since LU-12071 was backported to b_es5_2 the version check on b_es6_0
is incorrect and this part of the test_156 should be skipped.

Test-Parameters: trivial testlist=sanity env=ONLY=156 serverversion=EXA5
Fixes: 3043c6f189 ("LU-12071 osd-ldiskfs: bypass pagecache if requested")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I3fd96578e36675655fb265d83ba3f661950ab112
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49246
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-15139 osp: block reads until the object is created
Alex Zhuravlev [Sun, 13 Nov 2022 14:51:30 +0000 (17:51 +0300)]
LU-15139 osp: block reads until the object is created

It's possible that a remote llog can be read and written simultaneously
during recovery. For example, the dtx recovery thread may be fetching
updates while MDD's orphan cleanup procedure is removing orphans from
PENDING.

OSP can be asked to read an object that was just created in the OSP
cache while the actual object on the remote MDS hasn't been created
yet. OSP should block such reads until the creation is done.

Lustre-change: https://review.whamcloud.com/47003/
Lustre-commit: 4f2914537cc32fe89c4781bcfc87c38e3fe4419c

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I5596c791a758dd542746afd961eb1ed9c97845be
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49146
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16295 kernel: kernel update RHEL 7.9 [3.10.0-1160.80.1.el7]
Jian Yu [Fri, 18 Nov 2022 20:13:08 +0000 (12:13 -0800)]
LU-16295 kernel: kernel update RHEL 7.9 [3.10.0-1160.80.1.el7]

Update RHEL 7.9 kernel to 3.10.0-1160.80.1.el7.

Lustre-change: https://review.whamcloud.com/49045
Lustre-commit: TBD (from 636e97a22936a1fab8d9e5fde40f6e1f9a1c5bc5)

Test-Parameters: trivial clientdistro=el7.9 serverdistro=el7.9

Signed-off-by: Jian Yu <yujian@whamcloud.com>
Change-Id: I50a0ee572d24ddc73f8af6dc32ef701c260e45b7
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49194
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-6399 pcc: add tunable parameter for PCC attach thread
Qian Yingjin [Wed, 16 Nov 2022 09:26:33 +0000 (04:26 -0500)]
LU-6399 pcc: add tunable parameter for PCC attach thread

Currently the maximum number of kernel threads doing asynchronous
attach is a hard-coded value (1024 by default).
This patch makes it a tunable parameter:
llite.*.pcc_max_attach_thread_num

Change-Id: Ic59c15af935dd8dff586fa6be3939d4322c136d5
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49168
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoEX-6372 lipe: Remove colocation constraint from lamigo/lpurge resources
Gaurang Tapase [Fri, 11 Nov 2022 18:26:30 +0000 (23:56 +0530)]
EX-6372 lipe: Remove colocation constraint from lamigo/lpurge resources

We now rely on the node attribute *-recovered to start HP resources.
Hence, starting with ES 5.2.7, colocation constraints are not needed
to start resources. Moreover, with the rules added, base FS
target resources cannot start on the designated nodes because the
nodes get a -inf score. This prevents resource failback when the
original server comes back up after failover.

Test-Parameters: trivial

Signed-off-by: Gaurang Tapase <gtapase@ddn.com>
Change-Id: I890b12bf8a0d75d618a041be1eb27960dc62cc7e
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49179
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Artur Novik <anovik@ddn.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16160 llite: clear stale page's uptodate bit
Bobi Jam [Tue, 20 Sep 2022 16:27:04 +0000 (00:27 +0800)]
LU-16160 llite: clear stale page's uptodate bit

In the truncate_inode_page()->do_invalidatepage()->ll_invalidatepage()
call path, the vmpage is invalidated before it is deleted from the
page cache, so in that window it can still be picked up by
ll_read_ahead_page()->grab_cache_page_nowait().

If ll_invalidatepage()->cl_page_delete() does not clear the vmpage's
uptodate bit, readahead could pick the page up and wrongly treat it
as already uptodate.

In ll_fault()->vvp_io_fault_start()->vvp_io_kernel_fault(),
filemap_fault() calls ll_readpage() to read the vmpage and waits for
the vmpage to be unlocked. After ll_readpage() successfully reads and
unlocks the vmpage, memory pressure or truncate can step in and
delete the cl_page. filemap_fault() then finds that the vmpage is not
uptodate and returns VM_FAULT_SIGBUS. To fix this, the patch makes
vvp_io_kernel_fault() restart filemap_fault() to get an uptodate
vmpage again.

Lustre-change: https://review.whamcloud.com/48607
Lustre-commit: 5b911e03261c3de6b0c2934c86dd191f01af4f2f

Test-Parameters: testlist=sanityn env=ONLY="16f",ONLY_REPEAT=50
Test-Parameters: testlist=sanityn env=ONLY="16g",ONLY_REPEAT=50
Test-Parameters: testlist=sanityn env=ONLY="16f 16g",ONLY_REPEAT=50
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I369e1362ffb071ec0a4de3cd5bad27a87cff5e05
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49131
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-16304 kernel: kernel update RHEL8.7 [4.18.0-425.3.1.el8]
Jian Yu [Wed, 16 Nov 2022 19:56:58 +0000 (11:56 -0800)]
LU-16304 kernel: kernel update RHEL8.7 [4.18.0-425.3.1.el8]

Update RHEL8.7 kernel to 4.18.0-425.3.1.el8.

Lustre-change: https://review.whamcloud.com/49080
Lustre-commit: TBD (from 8900b469b4d521361d31ca96fed23c49a141fe93)

Test-Parameters: trivial fstype=ldiskfs \
clientdistro=el8.7 serverdistro=el8.7 testlist=sanity

Test-Parameters: trivial fstype=zfs \
clientdistro=el8.7 serverdistro=el8.7 testlist=sanity

Signed-off-by: Jian Yu <yujian@whamcloud.com>
Change-Id: I13e6d83ada1ec0c4da92f307bf56db5281c41892
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49173
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16294 kernel: kernel update SLES15 SP4 [5.14.21-150400.24.28.1]
Jian Yu [Thu, 10 Nov 2022 18:45:10 +0000 (10:45 -0800)]
LU-16294 kernel: kernel update SLES15 SP4 [5.14.21-150400.24.28.1]

Update SLES15 SP4 kernel to 5.14.21-150400.24.28.1 for Lustre client.

Lustre-change: https://review.whamcloud.com/49046
Lustre-commit: TBD (from 6573047b9b577a908ee3ea4ce0904d34cd867912)

Test-Parameters: trivial clientdistro=sles15sp4 \
env=SANITY_EXCEPT="27J 101j 244a" testlist=sanity

Signed-off-by: Jian Yu <yujian@whamcloud.com>
Change-Id: I651894274a09b6240f321e787736d298c5dc41ce
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49104
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoEX-6298 tests: add hot-pools test 72 into ALWAYS_EXCEPT list
Alexandre Ioffe [Thu, 17 Nov 2022 22:29:40 +0000 (14:29 -0800)]
EX-6298 tests: add hot-pools test 72 into ALWAYS_EXCEPT list

This patch adds hot-pools test 72 into ALWAYS_EXCEPT list before
it gets a real fix.

Test-Parameters: trivial testlist=hot-pools
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Change-Id: I5d73cb38d08533656c64b69f814f1d34e5e667ff
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49184
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Colin Faber <cfaber@ddn.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoEX-5758 lipe: complete recovery before hotpools start
Arthur Novik [Wed, 5 Oct 2022 05:16:48 +0000 (22:16 -0700)]
EX-5758 lipe: complete recovery before hotpools start

Added Pacemaker location rules for lamigo and lpurge which force
these resources to start only after OST/MDT recovery completes.
This is conditional on a newer Lustre Resource Agent being installed.

Lustre-change: https://review.whamcloud.com/48248
Lustre-commit: f093aef6cbc1a02f8a1b8795f79a4c6d10137a30

Test-Parameters: trivial testlist=hot-pools
Change-Id: Icb3405ca55d5ae940d978b16461d8d4bc2d4d623
Signed-off-by: Arthur Novik <artur_novik@epam.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49142
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoEX-6379 lipe: add dump_fids option in help
Alexandre Ioffe [Tue, 15 Nov 2022 07:16:25 +0000 (23:16 -0800)]
EX-6379 lipe: add dump_fids option in help

Added the missing dump_fids command line option to the
command line help.

Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Test-Parameters: trivial testlist=hot-pools
Change-Id: I197fb7beb3e8712736fa29bb49d2df1ee4517616
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49161
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Colin Faber <cfaber@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-13176 mdd: rename file with different project ID
Hongchao Zhang [Tue, 11 Jan 2022 15:12:55 +0000 (23:12 +0800)]
LU-13176 mdd: rename file with different project ID

This patch relaxes the limitation for rename between different
project IDs, and it will allow the normal file rename between
directories with different project IDs.

Lustre-change: https://review.whamcloud.com/45660
Lustre-commit: 88c26912a3237fb63923bbb7c7b09111f9f30bbe

Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: I4a2c21248d1e12ad1d00430e11e5dd50fe5eaf60
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49056
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-15435 ptlrpc: unregister reply buffer on rq_err
Alexander Zarochentsev [Fri, 14 Jan 2022 15:35:48 +0000 (10:35 -0500)]
LU-15435 ptlrpc: unregister reply buffer on rq_err

Unregister reply buffer on rq_err and prevent a late reply from
modifying request flags in INTERPRET state.

Fixes: cefabee52586 ("LU-15112 mgc: do not ignore target registration failure")
HPE-bug-id: LUS-10717

Lustre-change: https://review.whamcloud.com/46132
Lustre-commit: d8012811cc6ff9c7f0fb1ddfec9461e9ff963e54

Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Change-Id: I0106e3fd5443c1292c103247cdbf6122f91922e8
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49090
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16222 kernel: RHEL 8.7 client and server support
Jian Yu [Fri, 11 Nov 2022 23:17:19 +0000 (15:17 -0800)]
LU-16222 kernel: RHEL 8.7 client and server support

This patch makes changes to support RHEL 8.7 release
with kernel 4.18.0-423.el8 for Lustre client and server.

Lustre-change: https://review.whamcloud.com/48879
Lustre-commit: 293844d132b79a1d256ed4200d5dbd8bb790bfb4

Test-Parameters: trivial fstype=ldiskfs \
clientdistro=el8.7 serverdistro=el8.7 testlist=sanity

Test-Parameters: trivial fstype=zfs \
clientdistro=el8.7 serverdistro=el8.7 testlist=sanity

Change-Id: Ie97ff67c9a5fbd46bc145ab559665dcbc630b4a0
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Co-Authored-By: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49000
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoRM-620 build: New tag 2.14.0-ddn66
Andreas Dilger [Fri, 11 Nov 2022 09:46:43 +0000 (02:46 -0700)]
RM-620 build: New tag 2.14.0-ddn66

New tag 2.14.0-ddn66

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I04f5c407499930a1893d32c0c699c438264dcaf5

2 years agoEX-6298 lipe: Decrease wait time to reconnect to ALR
Alexandre Ioffe [Thu, 10 Nov 2022 19:42:33 +0000 (11:42 -0800)]
EX-6298 lipe: Decrease wait time to reconnect to ALR

1) Made the delay between reconnections to ALR increase gradually,
starting from as little as 5 seconds, when the ssh session
to ALR fails. This makes reconnection attempts more frequent
initially.
2) Re-enabled hot-pools test 72, which was previously excepted.
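
A gradually increasing delay of this kind can be sketched as below. Only
the 5 second starting value comes from this commit message; the doubling
factor, the 60 second cap, and the name reconnect_delays() are
assumptions made for illustration, not the values lamigo uses.

```python
from itertools import islice
from typing import Iterator


def reconnect_delays(first: float = 5.0, factor: float = 2.0,
                     cap: float = 60.0) -> Iterator[float]:
    """Yield the wait before each reconnect attempt, growing up to `cap`."""
    delay = first
    while True:
        yield min(delay, cap)
        # Grow the delay for the next attempt, but never beyond the cap.
        delay = min(delay * factor, cap)


# First few delays under the assumed factor and cap.
print(list(islice(reconnect_delays(), 6)))
```

Starting small keeps the first retries close together, which matters
when the ssh session drops only briefly, while the cap keeps a long
outage from producing unbounded waits.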

Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Test-Parameters: trivial testlist=hot-pools mdtcount=6 env=ONLY=72,ONLY_REPEAT=40
Change-Id: Iafae62d733390f92370f5d224830944f285da934
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49106
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoEX-6279 lipe: need python and pylint for all builds
Lei Feng [Tue, 1 Nov 2022 23:12:53 +0000 (07:12 +0800)]
EX-6279 lipe: need python and pylint for all builds

Check that python and pylint are available for all builds.

Signed-off-by: Lei Feng <flei@whamcloud.com>
Change-Id: I7e93ec3cdd51d96ed938f6fa85953b9e3f250877
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49012
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16293 kernel: kernel update RHEL9.0 [5.14.0-70.30.1.el9_0]
Jian Yu [Tue, 8 Nov 2022 18:40:24 +0000 (10:40 -0800)]
LU-16293 kernel: kernel update RHEL9.0 [5.14.0-70.30.1.el9_0]

Update RHEL9.0 kernel to 5.14.0-70.30.1.el9_0 for Lustre client.

Lustre-change: https://review.whamcloud.com/49044
Lustre-commit: TBD (from 247849f22a32e85eb8b718d18642f65ac7663a82)

Test-Parameters: trivial clientdistro=el9.0 \
env=SANITY_EXCEPT="101j 130 244a" testlist=sanity

Signed-off-by: Jian Yu <yujian@whamcloud.com>
Change-Id: Ide942f88242c80af1e103b226b65cfbce94bfb57
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49074
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-15935 target: keep track of multirpc slots in last_rcvd
Etienne AUJAMES [Fri, 29 Jul 2022 12:35:33 +0000 (14:35 +0200)]
LU-15935 target: keep track of multirpc slots in last_rcvd

OBD_INCOMPAT_MULTI_RPCS is cleared by tgt_boot_epoch_update() if the
recovery is aborted. This assumes that all the clients are evicted,
but that is not always true. Some clients could have successfully
finished their recovery. In that case, those clients will keep their
last_rcvd slot.

This patch modifies lut_num_client to keep track of multirpc
slots in last_rcvd.
For now the counter is used only by tgt_fini() to clear
OBD_INCOMPAT_MULTI_RPCS, so its use can be extended to
tgt_boot_epoch_update().

Add replay-dual test_33.

Lustre-change: https://review.whamcloud.com/48082
Lustre-commit: 1a79d395dd61ea2e21598bfaa5b39375e64ec22c

Test-Parameters: testlist=replay-dual env=ONLY=33,ONLY_REPEAT=30
Signed-off-by: Xing Huang <hxing@ddn.com>
Change-Id: I70791c9dcb7cc77f018b9e5c95568598d54f0322
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49040
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-15404 ldiskfs: truncate during setxattr leads to kernel panic
Andrew Perepechko [Thu, 10 Nov 2022 04:59:27 +0000 (20:59 -0800)]
LU-15404 ldiskfs: truncate during setxattr leads to kernel panic

When changing a large xattr value to a different large xattr value,
the old xattr inode is freed. Truncate during the final iput causes
the current transaction to restart. Eventually, the parent inode bh is
marked dirty and a kernel panic happens when jbd2 figures out that
this bh belongs to the committed transaction.

A possible fix is to call this final iput in a separate thread.
This way, setxattr transactions will never be split into two.
Since the setxattr code adds xattr inodes with nlink=0 into the
orphan list, old xattr inodes will be properly cleaned up in
any case.

Lustre-change: https://review.whamcloud.com/46358
Lustre-commit: e239a14001b62d96c186ae2c9f58402f73e63dcc

Change-Id: Idd70befa6a83818ece06daccf9bb6256812674b9
Signed-off-by: Andrew Perepechko <andrew.perepechko@hpe.com>
HPE-bug-id: LUS-10534
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48999
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-16251 obdclass: fill jobid in a safe way
Lei Feng [Wed, 19 Oct 2022 04:10:23 +0000 (12:10 +0800)]
LU-16251 obdclass: fill jobid in a safe way

jobid_interpret_string() does not fill jobid in an atomic way.
So lustre_get_jobid() now gives it a temporary buffer first, then
copies the buffer to jobid as a whole.

Lustre-change: https://review.whamcloud.com/48915
Lustre-commit: 9a0a89520e8b57bd63a9343fe3cdc56c61c41f6d

Signed-off-by: Lei Feng <flei@whamcloud.com>
Change-Id: Ib8f6aaa93df31867982a0d142f33d7374a27234f
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49081
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-16061 osd-ldiskfs: clear EXTENT_FL for symlink agent inode
Alexander Zarochentsev [Fri, 29 Jul 2022 19:38:09 +0000 (22:38 +0300)]
LU-16061 osd-ldiskfs: clear EXTENT_FL for symlink agent inode

The flag should be cleared for "fast" symlinks, otherwise
e2fsck complains about inode correctness.
New agent inodes of symlink type may have the EXT4_EXTENT_FL flag
set if the fs has the "extent" feature, and it is not cleared as in
other places where "fast" symlinks are created.

Lustre-change: https://review.whamcloud.com/48093
Lustre-commit: 73ac8e35e5d64d3fe4ca6c48514dc57058e3a7b8

HPE-bug-id: LUS-10237
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: Ib7b807bb1298cc3a9fd4fdba35747b4bda6fe034
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49016
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16258 llite: Explicitly support .splice_write
Shaun Tancheff [Fri, 21 Oct 2022 04:54:49 +0000 (23:54 -0500)]
LU-16258 llite: Explicitly support .splice_write

Linux commit v5.9-rc1-6-g36e2c7421f02
  fs: don't allow splice read/write without explicit ops

Lustre supports splice_write and previously provided handlers
for splice_read.
Explicitly use iter_file_splice_write, if it exists.

Lustre-change: https://review.whamcloud.com/48928
Lustre-commit: c619b6d6a54235cc0e34a65cf5916a632f4011c3

HPE-bug-id: LUS-11259
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I858688fc9b4dd370b6018c3b134f01e580477b25
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49047
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16207 build: add rpm-build BuildRequires for SLES15 SP3
Jian Yu [Tue, 4 Oct 2022 16:24:36 +0000 (09:24 -0700)]
LU-16207 build: add rpm-build BuildRequires for SLES15 SP3

SLES15 SP3 fails to build using rpm-build-4.14.1-29.46
from the main O/S repository with error message:

- Dependency tokens must begin with alpha-numeric,
  '_' or '/': BuildRequires: %kernel_module_package_buildreqs

Updating rpm-build to 4.14.3-150300.46.1 or higher
resolved the build issue.

Test-Parameters: trivial clientdistro=sles15sp3 \
testlist=sanity

Lustre-change: https://review.whamcloud.com/48760
Lustre-commit: 78c681d9f42cb56e30c8946e5d7b05f0bc6e86f2

Change-Id: I80099e7ba2d98e07b9877183879766f3dd7f3c1a
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Colin Faber <cfaber@ddn.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49079
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoEX-5473 tests: add version check for interop
Minh Diep [Wed, 9 Nov 2022 21:21:32 +0000 (13:21 -0800)]
EX-5473 tests: add version check for interop

sanity-quota test_75 on 2.12 servers

Test-Parameters: trivial testlist=sanity-quota

Change-Id: I57f5b6415017ec7cf81e3bcb43f289087a8621fd
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49089
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoEX-6331 lipe: lamigo --help causes Segmentation fault
Alexandre Ioffe [Tue, 8 Nov 2022 18:32:25 +0000 (10:32 -0800)]
EX-6331 lipe: lamigo --help causes Segmentation fault

Fixed a NULL string argument passed to printf, which caused the segfault.

Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Test-Parameters: trivial testlist=hot-pools
Change-Id: I0a9bc3cee308c8cd88d23674bb5127cddb1fdb41
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49073
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-15847 target: report multiple transno to debug log
Mikhail Pershin [Wed, 26 Oct 2022 08:17:11 +0000 (11:17 +0300)]
LU-15847 target: report multiple transno to debug log

Don't report multiple transaction cases to the console;
log them as debug messages instead.

Lustre-change: https://review.whamcloud.com/49027
Lustre-commit: TBD (1550da71c46f65b72951c0348f32835ed7f617fb)

Fixes: 4e2e8fd2fc0a ("LU-15847 tgt: reply always with the latest assigned transno")
Test-Parameters: trivial
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: If9b47dfedcaf67487954189e8a75d2029a502469
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49027
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoEX-6298 tests: add hot-pools test 72 into ALWAYS_EXCEPT list
Jian Yu [Wed, 9 Nov 2022 19:06:37 +0000 (11:06 -0800)]
EX-6298 tests: add hot-pools test 72 into ALWAYS_EXCEPT list

This patch adds hot-pools test 72 into ALWAYS_EXCEPT list before
it gets a real fix.

Test-Parameters: trivial testlist=hot-pools

Signed-off-by: Jian Yu <yujian@whamcloud.com>
Change-Id: If214f7285dfb96dee24e6c5968f1f19c81ce1ddf
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49085
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-15179 tests: add trap cleanup_quota_test
Sergey Cheremencev [Wed, 2 Nov 2022 10:08:50 +0000 (18:08 +0800)]
LU-15179 tests: add trap cleanup_quota_test

Add stack_trap cleanup_quota_test to the tests that
use setup_quota_test. If a test fails without calling
cleanup_quota_test, it may cause later tests to fail
due to used space > 0.

Remove ${tdir}_dom, if it exists, in cleanup_quota_test.
sanity-quota_75 doesn't remove the test_dom directory.

Lustre-change: https://review.whamcloud.com/#/c/45418/
Lustre-commit: c44b2bea1bacc3cb9173353037cf3a616f13669f

Test-Parameters: trivial  testlist=sanity-quota
Fixes: a4fbe734 ("LU-14739 quota: nodemap squashed root cannot bypass quota")
Change-Id: Ife4fd499b427bee79f74a5e172d233fe6a83e240
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48705
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-14958 kernel: use rhashtable for revoke records in jbd2
Alex Zhuravlev [Wed, 12 Oct 2022 07:32:36 +0000 (10:32 +0300)]
LU-14958 kernel: use rhashtable for revoke records in jbd2

A resizable hashtable should improve journal replay time when
the journal has millions of revoke records.

before:
1048576 records - 95 seconds
2097152 records - 580 seconds

after:
1048576 records - 2 seconds
2097152 records - 3 seconds
4194304 records - 7 seconds
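
The effect can be illustrated with a toy chained hash set that doubles
its bucket array as it fills. This is a sketch only, assuming the
earlier slowdown came from long chains in a table that does not grow; it
is not the kernel rhashtable API, and ResizableHashSet is a made-up
name.

```python
class ResizableHashSet:
    """Chained hash set that doubles its bucket array as it fills."""

    def __init__(self, initial_buckets: int = 8, max_load: float = 1.0):
        self._buckets = [[] for _ in range(initial_buckets)]
        self._count = 0
        self._max_load = max_load

    def _chain(self, key: int) -> list:
        return self._buckets[hash(key) % len(self._buckets)]

    def add(self, key: int) -> None:
        chain = self._chain(key)
        if key in chain:
            return
        chain.append(key)
        self._count += 1
        # Grow once the average chain length exceeds the load limit.
        if self._count > self._max_load * len(self._buckets):
            self._grow()

    def __contains__(self, key: int) -> bool:
        return key in self._chain(key)

    def _grow(self) -> None:
        old = self._buckets
        self._buckets = [[] for _ in range(2 * len(old))]
        for chain in old:
            for key in chain:
                self._chain(key).append(key)

    def longest_chain(self) -> int:
        return max(len(chain) for chain in self._buckets)
```

With a fixed number of buckets each chain grows in proportion to the
number of records, so inserting N records costs on the order of N*N
chain steps. With resizing, chains stay short and the total cost stays
roughly linear in N.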

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I8f54a51df5e3387277b976e046eea70c26d54dcd
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48522
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16232 scripts: changelog/updatelog emergency cleanup
Mikhail Pershin [Wed, 12 Oct 2022 09:22:14 +0000 (12:22 +0300)]
LU-16232 scripts: changelog/updatelog emergency cleanup

Emergency cleanup scripts for situations when llogs are
corrupted and can't be cleaned up in the normal way. In such
cases the recommendation is to remove/truncate those llogs.

The scripts perform all needed steps and have a debugging option
to collect llogs for further analysis.

Possible script actions are:
 - dry-run mode to check all actions and files affected
 - create archive with all llogs for analysis
 - remove llogs including all plain llogs

Lustre-change: https://review.whamcloud.com/48838
Lustre-commit: b533700add91fe4220f50d057a470e0b6f4893c9

Test-Parameters: trivial
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I3b197179bc54f451e3c5d7db36b6f1c56c076856
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49023
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16203 llog: skip bad records in llog
Mikhail Pershin [Mon, 3 Oct 2022 15:35:25 +0000 (18:35 +0300)]
LU-16203 llog: skip bad records in llog

This patch further develops the idea of skipping bad
(corrupted) llog data. If an llog has fixed-size records,
it is possible to skip a single record rather than the rest
of the llog block.

The patch also fixes skipping to the next chunk:
 - make sure to skip to the next block for a partial chunk,
   otherwise the same block is re-read
 - handle index == 0 as the goal for llog_next_block() as an
   expected exclusion and just return the requested block
 - after a block was skipped, set the new index to the first
   one in the block
 - don't create a fake padding record in llog_osd_next_block(),
   as the caller can handle it and would know about it
 - restore test_8 functionality to check corruption handling
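
The fixed-size case can be sketched as follows. This is an illustration
of the idea only, not the llog code: RECORD_SIZE, scan_fixed_records()
and the validity check are all hypothetical.

```python
from typing import Callable, List, Tuple

RECORD_SIZE = 64  # hypothetical fixed record size, in bytes


def scan_fixed_records(block: bytes,
                       is_valid: Callable[[bytes], bool],
                       record_size: int = RECORD_SIZE
                       ) -> Tuple[List[bytes], List[int]]:
    """Split `block` into fixed-size records, skipping invalid ones.

    Returns the valid records and the indices of the skipped ones.
    """
    good: List[bytes] = []
    skipped: List[int] = []
    for index in range(len(block) // record_size):
        start = index * record_size
        record = block[start:start + record_size]
        if is_valid(record):
            good.append(record)
        else:
            # The size is fixed, so the next record boundary is known even
            # though the contents of this record cannot be trusted.
            skipped.append(index)
    return good, skipped
```

With variable-size records the length field of a corrupted record cannot
be trusted, so the next boundary is unknown and the rest of the block
has to be skipped.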

Lustre-change: https://review.whamcloud.com/48776
Lustre-commit: TBD (from 5896c420d82507f90473414df3e6d342126cc21f)

Fixes: ec4194e4e78c ("LU-11591 llog: add synchronization for the last record")
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I6f88269e8626269268352f8bfd6d7950de438f3a
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48897
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-14661 obdclass: Add peer/peer NI when processing llog
Chris Horn [Thu, 3 Sep 2020 20:06:08 +0000 (15:06 -0500)]
LU-14661 obdclass: Add peer/peer NI when processing llog

Construct peers when processing the config log so that LNet has
complete information about peer info stored in the config log.

These are "temporary" peers which can be overwritten by discovery.

In client_import_add_nids_to_conn(), we do not need to hold the
import lock when adding NIDs to the obd_uuid, and LNet needs to take
the LNet API mutex when adding/modifying peers. We don't want to take
the mutex while a spin lock is already being held, so drop the spin
lock prior to calling class_add_nids_to_uuid().

Lustre-change: https://review.whamcloud.com/43510
Lustre-commit: 16321de596f6395153be6cbb6192250516963077

HPE-bug-id: LUS-9293
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ie0e35434c9b76f917c1448064c5217c821b1ad87
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48966
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-14661 lnet: Provide kernel API for adding peers
Chris Horn [Wed, 2 Sep 2020 20:07:25 +0000 (15:07 -0500)]
LU-14661 lnet: Provide kernel API for adding peers

Implement LNetAddPeer() API to allow other kernel modules to add
peers to LNet.

Peers created via this API are not marked as having been configured
by DLC. As such, they can be overwritten by discovery.

Lustre-change: https://review.whamcloud.com/43509
Lustre-commit: ac201366ad5700edc860c139955af8a09bf53a1a

Test-Parameters: trivial
HPE-bug-id: LUS-9293
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ibb057f702ea29d60233fbd1680d8caec98064d5d
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48965
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16269 kernel: kernel update RHEL8.6 [4.18.0-372.32.1.el8_6]
Jian Yu [Thu, 3 Nov 2022 20:09:15 +0000 (13:09 -0700)]
LU-16269 kernel: kernel update RHEL8.6 [4.18.0-372.32.1.el8_6]

Update RHEL8.6 kernel to 4.18.0-372.32.1.el8_6.

Lustre-change: https://review.whamcloud.com/48969
Lustre-commit: TBD (from c4a23690d3328447c7b4ddbb8f567b2de21457b6)

Test-Parameters: trivial fstype=ldiskfs \
clientdistro=el8.6 serverdistro=el8.6 testlist=sanity

Test-Parameters: trivial fstype=zfs \
clientdistro=el8.6 serverdistro=el8.6 testlist=sanity

Change-Id: I5576180ddf10ed2b0a5e2ef85b58fef993de65a4
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49033
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-15259 tests: use existing usernames for setfacl
Andreas Dilger [Tue, 13 Sep 2022 18:06:10 +0000 (02:06 +0800)]
LU-15259 tests: use existing usernames for setfacl

In SLES15.2 and Ubuntu 20 the "bin" and "daemon" users are not
defined in /etc/passwd, causing setfacl to print a cryptic error:

  setfacl -m u:bin:rw f -- failed
  ~     ? setfacl: Option -m: Invalid argument near character 3

Replace "bin" and "daemon" in ACL tests so they are run with user
and group names that exist on all distros currently being tested.
They can also be specified via ACLUSR1/ACLUSR2 in the test config.

The "permission_xattr" test also needs "nobody" user and group.

Also, the "getfacl" command prints users and groups in numerical
order, so the ACL tests will fail if "daemon" < "bin", or if either
group is higher than the "users" group.  Fix them as needed.
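
The ordering issue can be illustrated with a small sketch (Python, with
made-up numeric IDs; getfacl_order is a hypothetical helper, not part of
the test framework). It only models the "numerical order" behaviour
described above.

```python
from typing import Dict, List


def getfacl_order(ids: Dict[str, int]) -> List[str]:
    """Return names in the order of their numeric IDs, lowest first."""
    return sorted(ids, key=lambda name: ids[name])
```

A test that hard-codes the expected output order therefore only passes
when the chosen names happen to have IDs in the same relative order on
every distro.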

Lustre-change: https://review.whamcloud.com/45627
Lustre-commit: 60188994e24b95db5915b8e6802f7963ffb2fd9c

Test-Parameters: trivial testlist=sanity-quota,sanity-sec,pjdfstest
Test-Parameters: testlist=sanity env=ONLY=103-154 clientdistro=el7.9 serverdistro=el7.9
Test-Parameters: testlist=sanity env=ONLY=103-154 clientdistro=el8.6
Test-Parameters: testlist=sanity env=ONLY=103-154 clientdistro=ubuntu2004

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I7003e95577ab3a9314e8d4d29bb6b1784b9f8ae7
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48497
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-11787 test: Fix checkfilemap tests for 64K page
James Simmons [Mon, 31 Jan 2022 17:44:46 +0000 (12:44 -0500)]
LU-11787 test: Fix checkfilemap tests for 64K page

File mapping is page size aligned. Modify the tests to handle 64K
pages.

Lustre-change: https://review.whamcloud.com/45629
Lustre-commit: 7c88dfd28b5cc6114a85f187ecb2473657d42c9d

Test-Parameters: trivial clientdistro=el8.5 clientarch=aarch64 testlist=sanityn env=ONLY="71a 71b"
Change-Id: I316a197db8cdd0f9064431f8c572b43adf6110b8
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48945
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-15278 lod: distinguish DIR/REGULAR lod_object members
Bobi Jam [Sat, 25 Dec 2021 14:36:40 +0000 (22:36 +0800)]
LU-15278 lod: distinguish DIR/REGULAR lod_object members

In lod_striping_free_nolock(), we need to distinguish the lod_object
type. Since the DIR and REGULAR lod_object structures share the same
memory region, treating a DIR lod_object as a REGULAR one, or vice
versa, could accidentally free unintended memory.

Lustre-change: https://review.whamcloud.com/45710
Lustre-commit: 7a9c9ccabe93f2d96c80e90f8cbb786faca74835

Fixes: 6a20bdcc608b ("LU-11376 lov: new foreign LOV format")
Fixes: fdad38781ccc ("LU-11376 lmv: new foreign LMV format")
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I2d4c563725b35f7a75f0f1fbf9c1d35b1799eff4
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/45940
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
2 years agoEX-4147 tests: fix interop for sanity test_160h
Xing Huang [Thu, 27 Oct 2022 11:41:11 +0000 (19:41 +0800)]
EX-4147 tests: fix interop for sanity test_160h

Add a check to sanity test_160h for whether /sbin/umount.lustre is
installed on the MDS, since this subtest checks whether the MDS unmount
process has completed, and otherwise fails during interop testing.

Test-Parameters: testlist=sanity env=ONLY=160 serverversion=EXA5
Fixes: 6d62073950ac ("EX-3209 lipe: add lpcc util and service")
Signed-off-by: Xing Huang <hxing@ddn.com>
Change-Id: I6720b9e27a3a92e543ed877453802d23c0eef36d
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48970
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoRM-620 build: New tag 2.14.0-ddn65
Andreas Dilger [Mon, 31 Oct 2022 04:11:09 +0000 (22:11 -0600)]
RM-620 build: New tag 2.14.0-ddn65

New tag 2.14.0-ddn65

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I7bb4b45f5addc0c0d62dcf81c53cb114ad6454c1

2 years agoLU-15829 llite: don't use a kms if it invalid.
Alexey Lyashkov [Thu, 19 May 2022 17:35:18 +0000 (20:35 +0300)]
LU-15829 llite: don't use a kms if it invalid.

Lockless DIO doesn't update the KMS as other IO types do, so
the next read may not know the real file size to be read.
Avoid using an invalid KMS.

Lustre-change: https://review.whamcloud.com/47395
Lustre-commit: dc907414db16d99e77aecf6bfd41d82b8cf7c36e

Fixes: 6bce5367 (LU-4198 clio: turn on lockless for some kind of IO)
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Change-Id: Ie71d3f3cc24fc16c03ed07f9f5a3a17c7fdfa684
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48811
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoEX-4141 lipe: lamigo should detect dead OST and restart ALR
Alexandre Ioffe [Tue, 29 Mar 2022 07:48:35 +0000 (00:48 -0700)]
EX-4141 lipe: lamigo should detect dead OST and restart ALR

Use the '# keepalive' message and ssh read with timeout
to detect that an OST is down and restart ALR.
Add stats for the ALR last seen message.

To keep lamigo compatible with older versions of
ofd_access_log_reader, lamigo can work in two modes:
1. lamigo does not expect a '# keepalive' message.
In this case, after a timeout it restarts
ofd_access_log_reader silently.
2. lamigo expects a periodic '# keepalive'
message. If lamigo does not receive a keepalive message
or any other message from ofd_access_log_reader
within the timeout, it reports an error message and
restarts ofd_access_log_reader.
lamigo switches from mode 1 to mode 2 once it receives
a '# keepalive' message.
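
The two modes can be modelled roughly as follows (a Python sketch of the
behaviour described above, not the lamigo implementation; AlrWatcher and
on_read are hypothetical names, and a read timeout is represented by None).

```python
from typing import Optional, Tuple

KEEPALIVE = "# keepalive"


class AlrWatcher:
    """Tracks whether keepalive messages are expected from the reader."""

    def __init__(self) -> None:
        # Mode 1 until the first keepalive is seen, mode 2 afterwards.
        self.expects_keepalive = False

    def on_read(self, line: Optional[str]) -> Tuple[bool, bool]:
        """Handle one read result; return (restart_reader, report_error)."""
        if line is None:
            # Timeout: always restart, but only report an error in mode 2.
            return True, self.expects_keepalive
        if line.strip() == KEEPALIVE:
            self.expects_keepalive = True
        return False, False
```

Starting in mode 1 means an older reader that never sends a keepalive is
restarted quietly instead of producing an error on every timeout.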

Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Test-Parameters: trivial testlist=hot-pools
Change-Id: I55bc92b03ef5b45b72ff59ffd4b450cd1927cdb0
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48647
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-14719 lod: distributed transaction check space
Lai Siyao [Wed, 30 Mar 2022 21:50:22 +0000 (17:50 -0400)]
LU-14719 lod: distributed transaction check space

Distributed transaction failure may cause missing files or disconnected
directories. To avoid failure on a full disk, check the remote MDT free
space before the transaction starts.

The block/inode watermarks in obd_statfs_info are used to check
whether MDT has enough free blocks/inodes.
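
A minimal sketch of such a check, in Python and under stated assumptions:
the field names, the MdtSpace type and the strict comparison are all
hypothetical, chosen only to show the idea of testing free space against
watermarks before starting the transaction.

```python
from dataclasses import dataclass


@dataclass
class MdtSpace:
    """Free space reported by a remote MDT (hypothetical structure)."""
    free_blocks: int
    free_inodes: int


def has_enough_space(space: MdtSpace, block_watermark: int,
                     inode_watermark: int) -> bool:
    """True if both free blocks and free inodes are above their watermarks."""
    return (space.free_blocks > block_watermark and
            space.free_inodes > inode_watermark)
```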

Add sanity 230x.

Lustre-commit: 6aee406c84b6b8fddf08b560acfcdf7c13c97e63
Lustre-change: https://review.whamcloud.com/47039

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I0922e9c8668e8b842d313576bd68b52fa5d434ac
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/47867
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoEX-6193 pcc: dio attach failed on non-blksz-aligned file
Qian Yingjin [Fri, 21 Oct 2022 03:49:35 +0000 (23:49 -0400)]
EX-6193 pcc: dio attach failed on non-blksz-aligned file

PCC attach failed when doing a DIO copy of a file whose size is not
blksz-aligned.
The reason is that the copy tool ll_fid_path_copy fails on a
non-blksize-aligned file when the PCC backend (such as a local Ext4
file system) is accessed using direct I/O.
This patch fixes the bug by falling back from direct I/O to
buffered I/O mode when copying the non-blksize-aligned tail of
the file.
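
The split between the two I/O modes can be sketched as below (Python,
illustrative only; split_for_direct_io is a hypothetical helper and not a
function of the copy tool). The aligned part could be copied with direct
I/O and the remaining tail with buffered I/O.

```python
from typing import Tuple


def split_for_direct_io(file_size: int, block_size: int) -> Tuple[int, int]:
    """Return (aligned_bytes, tail_bytes) for a file of file_size bytes."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size > 0")
    # The tail is whatever does not fill a whole block.
    tail = file_size % block_size
    return file_size - tail, tail
```

For example, a 10000-byte file with a 4096-byte block size splits into
8192 aligned bytes and a 1808-byte tail.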

This patch also sets errno from the return code in the function
@get_root_path(), so that a call to @llaip_open_by_fid() with an
invalid mount path will see the correct errno.

Change-Id: I5287563029269032a91397c0094e2ccede73b9b1
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48927
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-15031 quota: reseed glbe in qmt_lvbo_udate
Sergey Cheremencev [Fri, 28 Oct 2022 10:29:03 +0000 (18:29 +0800)]
LU-15031 quota: reseed glbe in qmt_lvbo_udate

Reseed the glbe array in qmt_lvbo_update after changing edquot.
Without this fix the edquot flag wasn't set in the glbe array. Later,
when edquot was cleared, the need_update (nu) flag wasn't set
in the glbe array to notify OSTs of the new edquot.

The patch also adds test 80 to check that OST gets correct
edquot value after failover.

Lustre-change: https://review.whamcloud.com/45032
Lustre-commit: 61ec1e0f2ca8dc4c9f7ed41f782960e65cab0920

HPE-bug-id: LUS-10029
Change-Id: I5b7e1a553e3351c22649431860d51b5a671c6fd9
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/46655
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-15847 tgt: move tti_ transaction params to tsi_
Mikhail Pershin [Sat, 28 May 2022 18:16:11 +0000 (21:16 +0300)]
LU-15847 tgt: move tti_ transaction params to tsi_

Move tti_mult_trans and tti_has_trans to tgt_session_info so they
are available in all targets. This allows cleaning up old duplicated
MDT code and can be used for complex transaction
handling in MDT/OFD if needed.

Lustre-change: https://review.whamcloud.com/c/fs/lustre-release/+/47491
Lustre-commit: 0a317b171ebedcba8fc58e548991a884186c350c

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I3f0c15e283b9e21c04a009f6cf346afa278e7095
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48979
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-15847 tgt: reply always with the latest assigned transno
Mikhail Pershin [Tue, 31 May 2022 10:38:25 +0000 (13:38 +0300)]
LU-15847 tgt: reply always with the latest assigned transno

In tgt_txn_stop_cb() don't skip transno assignment in case
of unexpected multiple last_rcvd updates, so that the latest
transno is reported back in the reply rather than the first
one.

Reporting just the first transno might lead to data
loss at failover, because a partially committed operation would
be considered fully committed and the rest of the operation would
not be replayed.

The proposed approach of reporting the last assigned transno to
the client could cause replay failures in some cases, which
is still better than possible data loss. So the patch makes the
multiple transaction case less severe.
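
The reasoning can be illustrated with a simplified model (Python, not the
target code; both function names are hypothetical). It assumes a client
replays a request only when the transno in the reply is newer than the
last transno the server reported as committed.

```python
from typing import List


def reply_transno(assigned: List[int], report_latest: bool) -> int:
    """Pick the transno sent back when one request got several transnos."""
    return assigned[-1] if report_latest else assigned[0]


def client_will_replay(transno: int, last_committed: int) -> bool:
    """Simplified rule: replay only requests newer than last_committed."""
    return transno > last_committed
```

If a request was assigned transnos 10 and 11 and only 10 was committed
before failover, reporting 10 makes the client drop the request, while
reporting 11 keeps it for replay.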

Lustre-change: https://review.whamcloud.com/c/fs/lustre-release/+/47492
Lustre-commit: 4e2e8fd2fc0a9a30f47e70dc285a2101d2cbc4c2

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ia07e89576127a2fc1eb2ae706551ffe8ceaa93be
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48978
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-15447 tests: sanity-flr/208 reset rotational status
Alex Zhuravlev [Thu, 13 Jan 2022 07:27:21 +0000 (10:27 +0300)]
LU-15447 tests: sanity-flr/208 reset rotational status

New kernels (e.g. 4.18.0-305.25.1) declare loopback devices
in tmpfs as non-rotational. sanity-flr/208 wrongly assumes
that devices are non-rotational by default, so
sanity-flr/208 started to fail with new kernels.

Lustre-change: https://review.whamcloud.com/c/fs/lustre-release/+/46088
Lustre-commit: 78dddb423f0dc8571d3c7f8ccd8f77a1c2bc28ae

Fixes: 8507472dd37e ("LU-14996 lov: prefer mirrors on non-rotational OSTs")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ib5c42da39667227a6cff5d379e30d2cd6c1e2773
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48952
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16106 lnet: allow direct messages regardless of peer NI status
Serguei Smirnov [Sun, 28 Aug 2022 01:50:16 +0000 (18:50 -0700)]
LU-16106 lnet: allow direct messages regardless of peer NI status

If check_routers_before_use is enabled, the router needs to
be pinged before it is used, which is not possible because
its NIs are assumed to be down at start-up. Don't prevent
discovery of the router in this case.

This change allows non-routed traffic to peer NIs with "down"
status.

Lustre-commit: 3345a8a54e89c342a4ce2d8d4bcb04ee919bcd52
Lustre-change: https://review.whamcloud.com/c/48355

Test-Parameters: trivial
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I36fa60e37ef4f47c82c69855c9b0b80bad8a36f4
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48669
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-16025 llite: adjust read count as file got truncated
Bobi Jam [Thu, 7 Jul 2022 07:38:54 +0000 (15:38 +0800)]
LU-16025 llite: adjust read count as file got truncated

A file read will not notice a file size truncate done by another node,
and continues to read zero-filled pages beyond the new file size.

This patch limits the read count to prevent the issue and
adds a test case verifying the fix.
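
The idea of limiting the read to the current file size can be sketched as
follows (Python, illustrative only; clamp_read_count is a hypothetical
name and the real fix lives in the llite read path).

```python
def clamp_read_count(pos: int, count: int, file_size: int) -> int:
    """Return how many bytes may be read at pos without passing file_size."""
    if pos < 0 or count < 0 or file_size < 0:
        raise ValueError("pos, count and file_size must be non-negative")
    # Nothing can be read at or beyond the end of the file.
    return min(count, max(0, file_size - pos))
```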

Lustre-change: https://review.whamcloud.com/47896
Lustre-commit: 4468f6c9d92448cb72c5a616ec74653e83ee8e10

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Ie51ba09201a1ca1464c3a3892d367590e978ee34
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48848
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-14642 test: add fsx mirror file test mode
Bobi Jam [Thu, 2 Sep 2021 16:30:10 +0000 (00:30 +0800)]
LU-14642 test: add fsx mirror file test mode

- add fsx mirror file test mode with the "-M" option so that fsx can apply
its IO to an FLR file as well as extend/split/resync the FLR file.

- add sanity-flr test_70b() to test fsx with flrmode.

- fix a bug in "lfs mirror verify" to accommodate the max mirror count
instead of (max - 1) mirrors.

- improve "lfs mirror verify -v" to print the proper data range of its crc-32
checksum values.

Lustre-change: https://review.whamcloud.com/43473
Lustre-commit: 90ba8b4ac360b1987178445bd2ccd64f7958d912

Test-Parameters: testlist=sanity-flr env=ONLY=70a,ONLY_REPEAT=10
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Ib55c7b25dcd82fa0b197ad21268b16c82aab5da9
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48834
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-16249 sec: krb5_decrypt_bulk calls decryption primitive
Sebastien Buisson [Tue, 18 Oct 2022 15:19:01 +0000 (17:19 +0200)]
LU-16249 sec: krb5_decrypt_bulk calls decryption primitive

krb5_decrypt_bulk() was mistakenly calling an encryption primitive
instead of a decryption primitive for the confounder.

Lustre-change: https://review.whamcloud.com/48907
Lustre-commit: TBD (851f3915659941db00a0cda58867e68139e5e0d1)

Test-Parameters: trivial
Fixes: 0a65279121 ("LU-13344 gss: Update crypto to use sync_skcipher")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I9251172644ed6baa3bb06a59dbe7c1bab401d817
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48909
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-15097 quota: stop pool_recalc before killing pool
Sergey Cheremencev [Wed, 19 Oct 2022 11:18:04 +0000 (19:18 +0800)]
LU-15097 quota: stop pool_recalc before killing pool

qmt_start_pool_recalc holds a reference on a pool while
it is running. This thread should be stopped before
putting the last pool reference in qmt_pool_free to be
sure that the pool can finally be freed. The patch helps to avoid
the following ASSERTION:

    qmt_pool_fini() ASSERTION(list_empty(&qmt->qmt_pool_list)) failed

Lustre-change: https://review.whamcloud.com/45256
Lustre-commit: 862f0baa7c21cb631b98d3886ef9e938f4519573

Change-Id: If72042a620d9ded693fcb669bc9148d1f96126a4
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/46656
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoEX-4567 kernel: add extra field for snapshot in el8
Hongchao Zhang [Fri, 21 Oct 2022 07:43:11 +0000 (03:43 -0400)]
EX-4567 kernel: add extra field for snapshot in el8

Add extra fields, used by snapshot, to "struct jbd2_journal_handle" and
"struct journal_head". The new fields go into the
4-byte hole at the end of struct jbd2_journal_handle so
that they do not increase the structure size and memory
usage for this common allocation.

Use RH_KABI_EXTEND() and RH_KABI_FILL_HOLE() so that the
new fields do not affect the kernel ABI compatibility.

Change-Id: I84f52b18694e56d837d64c5c80076e45dde27eab
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48880
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoEX-6102 lipe: lipe_scan3 not intended for customer use
Alexandre Ioffe [Tue, 25 Oct 2022 03:06:08 +0000 (20:06 -0700)]
EX-6102 lipe: lipe_scan3 not intended for customer use

Print a warning that lipe_scan3 is not intended for customer use.

Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Test-Parameters: trivial
Change-Id: I92f775d77e1d4ffac304d3e46ed6af7c642a3bdd
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48939
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-11388 tests: exclude replay-single/131b for ldiskfs
Andreas Dilger [Fri, 14 Oct 2022 21:09:03 +0000 (15:09 -0600)]
LU-11388 tests: exclude replay-single/131b for ldiskfs

Test is failing about 1/10 of the test runs, even on ldiskfs.

Test-Parameters: trivial testlist=replay-single
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I9c36d026944876e066a1dc36877927b7a92c537e
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48876
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48946

2 years agoEX-5099 lipe: Made controllable ssh exec timeout
Alexandre Ioffe [Wed, 13 Apr 2022 05:34:18 +0000 (22:34 -0700)]
EX-5099 lipe: Made controllable ssh exec timeout

- Introduce new lipe ssh API: lipe_ssh_exec_timeout() and
lipe_ssh_start_cmd_timeout().
- Introduce new lamigo command option: --ssh-exec-timeout
to configure the ssh connection timeout for ssh exec cmd.
- Use lipe_ssh_start_cmd_timeout() to start the remote
access log reader with a timeout.
Use ssh_channel_read_timeout() with an infinite timeout
when reading access log records.
- Use lipe_ssh_start_cmd_timeout() to start remote "lfs ..."
commands with a long timeout to prevent a premature timeout
when an "lfs mirror extend ..." command for a big file
takes a long time.
- Use a default timeout of 600 seconds for ssh exec cmd.
Such a long timeout should allow long-lasting
replications to finish.
This fixes EX-5429.

Test-Parameters: trivial clientdistro=el8.5 serverdistro=el8.5 testlist=hot-pools env=FAIL_ON_ERROR=false,ONLY="56 59",ONLY_REPEAT=20
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Change-Id: I8de9b1db2014abd1e6f201cda73a0812128f6bb6
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/47057
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16173 kernel: kernel update SLES15 SP3 [5.3.18-150300.59.93.1]
Jian Yu [Fri, 21 Oct 2022 20:35:40 +0000 (13:35 -0700)]
LU-16173 kernel: kernel update SLES15 SP3 [5.3.18-150300.59.93.1]

Update SLES15 SP3 kernel to 5.3.18-150300.59.93.1 for Lustre client.

Test-Parameters: trivial clientdistro=sles15sp3 \
testlist=sanity

Lustre-change: https://review.whamcloud.com/48601
Lustre-commit: c3467db7e7d0652c09bdcef26e2b708ab51cba9e

Change-Id: I1e0afe6974567d13680dbb0d463fbbd873ef2e5f
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48864
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-16180 ptlrpc: reduce lock contention in ptlrpc_free_committed
Andreas Dilger [Thu, 6 Oct 2022 17:31:51 +0000 (10:31 -0700)]
LU-16180 ptlrpc: reduce lock contention in ptlrpc_free_committed

This patch breaks out of the loop in ptlrpc_free_committed()
if need_resched() is true or there are other threads waiting
on the imp_lock. This avoids the thread holding the
CPU for too long while freeing a large number of requests. The
remaining requests in the list will be processed the next
time this function is called. That also avoids delaying a
single thread too long if the list is long.
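
The early-exit pattern can be modelled roughly like this (a Python sketch,
not the kernel code; free_committed and should_yield are hypothetical
names, with should_yield standing in for the need_resched() and lock
waiter checks, and the list assumed to be ordered by transno).

```python
from collections import deque
from typing import Callable, Deque


def free_committed(replay_list: Deque[int], last_committed: int,
                   should_yield: Callable[[], bool]) -> int:
    """Free committed entries from the head of the list; return the count.

    Stops early when should_yield() says the caller ought to give up the
    CPU or the lock; the remaining entries are left for the next call.
    """
    freed = 0
    while replay_list and replay_list[0] <= last_committed:
        replay_list.popleft()
        freed += 1
        if should_yield():
            break
    return freed
```

Because unfinished work simply stays at the head of the list, a later call
resumes where the previous one stopped and no entry is lost.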

Lustre-change: https://review.whamcloud.com/48629
Lustre-commit: 9a3e111a2ebdfadec4b6efc65899856edc90ad18

Test-Parameters: testlist=sanity clientdistro=el8.6
Test-Parameters: testlist=sanity clientdistro=ubuntu2204 env=SANITY_EXCEPT="130 244a"
Change-Id: I50f56b87844e8b019053e569767b6c949d2a3f55
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48627
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-15009 ofd: continue precreate if LAST_ID is less on MDT
Lai Siyao [Thu, 16 Sep 2021 21:49:33 +0000 (17:49 -0400)]
LU-15009 ofd: continue precreate if LAST_ID is less on MDT

It's possible that precreate succeeded on the OST, but the MDT didn't get
the reply and assumed failure. In this case, the LAST_ID on the MDT is
smaller than that on the OST. Instead of reporting an error and stopping
precreate, it's better to move the precreate window forward.

Lustre-change: https://review.whamcloud.com/44984
Lustre-commit: 1711e26ae861c28829870c2433caf7ee232909cf

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ia6ca418ec0ea6797b7eccc1610879331307fad07
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48923
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16044 osd: discard pagecache in truncate's declaration
Alex Zhuravlev [Mon, 25 Jul 2022 13:26:40 +0000 (16:26 +0300)]
LU-16044 osd: discard pagecache in truncate's declaration

Discard the pagecache in the truncate declaration to avoid taking the
pagelock inside a transaction, which conflicts with the write path
where we take the pagelock before any other one.
This should be safe as the write path writes the pages out
synchronously, so they should be clean by truncate.

Lustre-change: https://review.whamcloud.com/48033
Lustre-commit: 0bb491b2ecf494c3f78fa08a101af8af7853a0fe

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: Iba555ace2ce9ef34ab5517375ecb5c176f738a02
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48885
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16076 utils: enhance 'lfs check' command
Lei Feng [Mon, 8 Aug 2022 02:59:25 +0000 (10:59 +0800)]
LU-16076 utils: enhance 'lfs check' command

Add an optional argument to the 'lfs check' command so that only the
servers related to the specified Lustre file system are checked.

Lustre-change: https://review.whamcloud.com/48155
Lustre-commit: f5ca6853b8d8b918b0228af31fa8249be49d3000

Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial testlist=sanityn env=ONLY=113
Change-Id: I826a8e822af0a290f06ffaadadf1bb7f86899d99
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48935
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-15305 obdclass: fix race in class_del_profile
Li Dongyang [Fri, 7 Oct 2022 12:09:10 +0000 (23:09 +1100)]
LU-15305 obdclass: fix race in class_del_profile

Move profile lookup and removal from lustre_profile_list
into the same critical section, otherwise we could race with
class_del_profiles or another class_del_profile.

Do not create duplicate mount opts in the client config,
otherwise we will add a duplicate lustre_profile to
lustre_profile_list for a single mount.
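
The race fix can be sketched in a few lines (Python, illustrative only;
ProfileRegistry is a hypothetical stand-in for the profile list and its
lock, and the duplicate check in add() is an assumption made for the
example rather than what the patch does).

```python
import threading
from typing import Dict, Optional


class ProfileRegistry:
    """Name-to-profile map protected by a single lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._profiles: Dict[str, dict] = {}

    def add(self, name: str, opts: dict) -> bool:
        """Add a profile; return False if the name is already present."""
        with self._lock:
            if name in self._profiles:
                return False
            self._profiles[name] = opts
            return True

    def delete(self, name: str) -> Optional[dict]:
        """Look up and remove in one critical section; None if absent."""
        with self._lock:
            return self._profiles.pop(name, None)
```

Doing the lookup and the removal under one lock acquisition means two
concurrent deleters cannot both find the same entry and both try to
remove it.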

Lustre-change: https://review.whamcloud.com/c/fs/lustre-release/+/48802
Lustre-commit: 83d3f42118579d7fb7c3002533c047badcf41e0d

Change-Id: I648aa206716213b064d045f546516b219337e0ed
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48956
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-15467 tests: fix sanity-hsm test_103a timeout issue
Etienne AUJAMES [Fri, 21 Jan 2022 14:49:18 +0000 (15:49 +0100)]
LU-15467 tests: fix sanity-hsm test_103a timeout issue

Add an MDS version check to "sanity-hsm test_103a" for interop testing.
Limit the number of parallel HSM restore requests to
max_rpcs_in_flight.

Lustre-change: https://review.whamcloud.com/46252
Lustre-commit: 98e1e41ce47c95155a8c8d452eef5074492d22f0

Fixes: b449f3d ("LU-15145 hsm: unlock the restore layout lock for a cancel")
Test-Parameters: trivial
Test-Parameters: testlist=sanity-hsm env=ONLY=103a,ONLY_REPEAT=20
Test-Parameters: testlist=sanity-hsm
Signed-off-by: Etienne AUJAMES <etienne.aujames@cea.fr>
Change-Id: I78098042d1316cdcc9d2e25860099a0ffdba2503
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48960
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>