Whamcloud - gitweb
Jian Yu [Tue, 8 Nov 2022 18:40:24 +0000 (10:40 -0800)]
LU-16293 kernel: kernel update RHEL9.0 [5.14.0-70.30.1.el9_0]
Update RHEL9.0 kernel to 5.14.0-70.30.1.el9_0 for Lustre client.
Lustre-change: https://review.whamcloud.com/49044
Lustre-commit: TBD (from
247849f22a32e85eb8b718d18642f65ac7663a82)
Test-Parameters: trivial clientdistro=el9.0 \
env=SANITY_EXCEPT="101j 130 244a" testlist=sanity
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Change-Id: Ide942f88242c80af1e103b226b65cfbce94bfb57
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49074
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Etienne AUJAMES [Fri, 29 Jul 2022 12:35:33 +0000 (14:35 +0200)]
LU-15935 target: keep track of multirpc slots in last_rcvd
OBD_INCOMPAT_MULTI_RPCS is cleared by tgt_boot_epoch_update() if the
recovery is aborted. This supposes that all the clients are evicted
but that is not true. Some clients could have successfully finished
their recovery. In that case, those clients will keep their last_rcvd
slot.
This patch modifies lut_num_client to keep track of multirpc
slots in last_rcvd.
For now the counter is used only by tgt_fini() to clear
OBD_INCOMPAT_MULTI_RPCS. So we can expand this use case for
tgt_boot_epoch_update().
Add replay-dual test_33.
Lustre-change: https://review.whamcloud.com/48082
Lustre-commit:
1a79d395dd61ea2e21598bfaa5b39375e64ec22c
Test-Parameters: testlist=replay-dual env=ONLY=33,ONLY_REPEAT=30
Signed-off-by: Xing Huang <hxing@ddn.com>
Change-Id: I70791c9dcb7cc77f018b9e5c95568598d54f0322
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49040
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Andrew Perepechko [Thu, 10 Nov 2022 04:59:27 +0000 (20:59 -0800)]
LU-15404 ldiskfs: truncate during setxattr leads to kernel panic
When changing a large xattr value to a different large xattr value,
the old xattr inode is freed. Truncate during the final iput causes
current transaction restart. Eventually, parent inode bh is marked
dirty and kernel panic happens when jbd2 figures out that this bh
belongs to the committed transaction.
A possible fix is to call this final iput in a separate thread.
This way, setxattr transactions will never be split into two.
Since the setxattr code adds xattr inodes with nlink=0 into the
orphan list, old xattr inodes will be properly cleaned up in
any case.
Lustre-change: https://review.whamcloud.com/46358
Lustre-commit:
e239a14001b62d96c186ae2c9f58402f73e63dcc
Change-Id: Idd70befa6a83818ece06daccf9bb6256812674b9
Signed-off-by: Andrew Perepechko <andrew.perepechko@hpe.com>
HPE-bug-id: LUS-10534
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48999
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Lei Feng [Wed, 19 Oct 2022 04:10:23 +0000 (12:10 +0800)]
LU-16251 obdclass: fill jobid in a safe way
jobid_interpret_string() does not fill jobid in an atomic way.
So in lustre_get_jobid() give it a buffer first, then copy the
buffer to jobid as a whole.
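
The ordering described above can be sketched as follows. This is a minimal
illustration in Python, not the Lustre code: the function names, the
JOBID_SIZE value and the %-codes are assumptions made for the example. The
formatter builds the string piece by piece in a private buffer, and the
shared buffer is then overwritten in one whole-buffer copy, so a reader
does not observe the intermediate pieces.

```python
JOBID_SIZE = 32  # hypothetical size of the shared jobid buffer


def interpret_jobid(template: str, fields: dict) -> bytes:
    """Expand %-codes one piece at a time into a private buffer."""
    out = bytearray()
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "%" and i + 1 < len(template):
            # Hypothetical codes: look up the character after '%'.
            out += str(fields.get(template[i + 1], "")).encode()
            i += 2
        else:
            out += ch.encode()
            i += 1
    # Leave room for a terminating NUL, as a C string would need.
    return bytes(out[:JOBID_SIZE - 1])


def get_jobid(shared: bytearray, template: str, fields: dict) -> None:
    """Fill a temporary buffer first, then copy it to `shared` as a whole."""
    tmp = interpret_jobid(template, fields)
    shared[:] = tmp.ljust(len(shared), b"\0")


shared_jobid = bytearray(JOBID_SIZE)
get_jobid(shared_jobid, "%e.%u", {"e": "dd", "u": 500})
print(shared_jobid.rstrip(b"\0").decode())
```

Python gives no real atomicity guarantee here; the sketch only shows the
"build privately, publish once" order of operations.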
Lustre-change: https://review.whamcloud.com/48915
Lustre-commit:
9a0a89520e8b57bd63a9343fe3cdc56c61c41f6d
Signed-off-by: Lei Feng <flei@whamcloud.com>
Change-Id: Ib8f6aaa93df31867982a0d142f33d7374a27234f
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49081
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Alexander Zarochentsev [Fri, 29 Jul 2022 19:38:09 +0000 (22:38 +0300)]
LU-16061 osd-ldiskfs: clear EXTENT_FL for symlink agent inode
The flag should be cleared for "fast" symlinks otherwise
e2fsck complains about inode correctness.
New agent inodes of symlink type may have EXT4_EXTENT_FL flag
set if the fs has "extent" feature and it is not cleared as in
other places where "fast" symlinks are created.
Lustre-change: https://review.whamcloud.com/48093
Lustre-commit:
73ac8e35e5d64d3fe4ca6c48514dc57058e3a7b8
HPE-bug-id: LUS-10237
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: Ib7b807bb1298cc3a9fd4fdba35747b4bda6fe034
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49016
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Shaun Tancheff [Fri, 21 Oct 2022 04:54:49 +0000 (23:54 -0500)]
LU-16258 llite: Explicitly support .splice_write
Linux commit v5.9-rc1-6-g36e2c7421f02
fs: don't allow splice read/write without explicit ops
Lustre supports splice_write and previously provided handlers
for splice_read.
Explicitly use iter_file_splice_write, if it exists.
Lustre-change: https://review.whamcloud.com/48928
Lustre-commit:
c619b6d6a54235cc0e34a65cf5916a632f4011c3
HPE-bug-id: LUS-11259
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I858688fc9b4dd370b6018c3b134f01e580477b25
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49047
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Jian Yu [Tue, 4 Oct 2022 16:24:36 +0000 (09:24 -0700)]
LU-16207 build: add rpm-build BuildRequires for SLES15 SP3
SLES15 SP3 fails to build using rpm-build-4.14.1-29.46
from the main O/S repository with error message:
- Dependency tokens must begin with alpha-numeric,
'_' or '/': BuildRequires: %kernel_module_package_buildreqs
Updating rpm-build to 4.14.3-150300.46.1 or higher
resolved the build issue.
Test-Parameters: trivial clientdistro=sles15sp3 \
testlist=sanity
Lustre-change: https://review.whamcloud.com/48760
Lustre-commit:
78c681d9f42cb56e30c8946e5d7b05f0bc6e86f2
Change-Id: I80099e7ba2d98e07b9877183879766f3dd7f3c1a
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Colin Faber <cfaber@ddn.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49079
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Minh Diep [Wed, 9 Nov 2022 21:21:32 +0000 (13:21 -0800)]
EX-5473 tests: add version check for interop
sanity-quota test_75 on 2.12 servers
Test-Parameters: trivial testlist=sanity-quota
Change-Id: I57f5b6415017ec7cf81e3bcb43f289087a8621fd
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49089
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Alexandre Ioffe [Tue, 8 Nov 2022 18:32:25 +0000 (10:32 -0800)]
EX-6331 lipe: lamigo --help causes Segmentation fault
Fixed printf NULL string argument which causes the seg fault
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Test-Parameters: trivial testlist=hot-pools
Change-Id: I0a9bc3cee308c8cd88d23674bb5127cddb1fdb41
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49073
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Mikhail Pershin [Wed, 26 Oct 2022 08:17:11 +0000 (11:17 +0300)]
LU-15847 target: report multiple transno to debug log
Don't report multiple transaction cases to the console;
log them as debug messages instead.
Lustre-change: https://review.whamcloud.com/49027
Lustre-commit: TBD (
1550da71c46f65b72951c0348f32835ed7f617fb)
Fixes:
4e2e8fd2fc0a ("LU-15847 tgt: reply always with the latest assigned transno")
Test-Parameters: trivial
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: If9b47dfedcaf67487954189e8a75d2029a502469
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49027
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Jian Yu [Wed, 9 Nov 2022 19:06:37 +0000 (11:06 -0800)]
EX-6298 tests: add hot-pools test 72 into ALWAYS_EXCEPT list
This patch adds hot-pools test 72 into ALWAYS_EXCEPT list before
it gets a real fix.
Test-Parameters: trivial testlist=hot-pools
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Change-Id: If214f7285dfb96dee24e6c5968f1f19c81ce1ddf
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49085
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Sergey Cheremencev [Wed, 2 Nov 2022 10:08:50 +0000 (18:08 +0800)]
LU-15179 tests: add trap cleanup_quota_test
Add stack_trap cleanup_quota_test to the tests that
use setup_quota_test. If a test fails without calling
cleanup_quota_test, it may cause later tests to fail
due to used space > 0.
Remove ${tdir}_dom, if exists, in cleanup_quota_test.
sanity-quota_75 doesn't remove test_dom directory.
Lustre-change: https://review.whamcloud.com/#/c/45418/
Lustre-commit:
c44b2bea1bacc3cb9173353037cf3a616f13669f
Test-Parameters: trivial testlist=sanity-quota
Fixes:
a4fbe734("LU-14739 quota: nodemap squashed root cannot bypass quota")
Change-Id: Ife4fd499b427bee79f74a5e172d233fe6a83e240
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48705
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Alex Zhuravlev [Wed, 12 Oct 2022 07:32:36 +0000 (10:32 +0300)]
LU-14958 kernel: use rhashtable for revoke records in jbd2
A resizable hashtable should improve journal replay time when
the journal holds millions of revoke records.
before:
1048576 records - 95 seconds
2097152 records - 580 seconds
after:
1048576 records - 2 seconds
2097152 records - 3 seconds
4194304 records - 7 seconds
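
A rough model of why the "before" timings grow faster than linearly,
assuming a fixed-size chained hash table. The 256-bucket figure is an
assumption made for illustration and is not stated in this commit. With a
fixed bucket count the average chain length grows with the number of
records, so one lookup per record costs roughly records * chain length.

```python
# Assumption for illustration: a fixed table of 256 buckets with chaining.
FIXED_BUCKETS = 256


def avg_chain_length(records: int, buckets: int = FIXED_BUCKETS) -> float:
    """Average number of entries per bucket in a fixed-size chained table."""
    return records / buckets


def relative_lookup_cost(records: int, buckets: int = FIXED_BUCKETS) -> float:
    """Rough cost of one lookup per record: records * average chain length."""
    return records * avg_chain_length(records, buckets)


for n in (1048576, 2097152):
    print(n, avg_chain_length(n), relative_lookup_cost(n))
```

In this model doubling the record count quadruples the work, which is in
the same direction as the measured jump from 95 to 580 seconds. A table
that resizes keeps chains short, which is consistent with the nearly
linear "after" numbers.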
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I8f54a51df5e3387277b976e046eea70c26d54dcd
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48522
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Mikhail Pershin [Wed, 12 Oct 2022 09:22:14 +0000 (12:22 +0300)]
LU-16232 scripts: changelog/updatelog emergency cleanup
Emergency cleanup scripts for situations when llogs are
corrupted and can't be cleaned up in a normal way. In such
cases the recommendation is to remove/truncate those llogs.
Scripts make all needed steps and have debugging option to
collect llogs for further analysis.
Scripts possible actions are:
- dry-run mode to check all actions and files affected
- create archive with all llogs for analysis
- remove llogs including all plain llogs
Lustre-change: https://review.whamcloud.com/48838
Lustre-commit:
b533700add91fe4220f50d057a470e0b6f4893c9
Test-Parameters: trivial
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I3b197179bc54f451e3c5d7db36b6f1c56c076856
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49023
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Mikhail Pershin [Mon, 3 Oct 2022 15:35:25 +0000 (18:35 +0300)]
LU-16203 llog: skip bad records in llog
This patch further develops the idea of skipping bad
(corrupted) llog data. If the llog has fixed-size records,
it is possible to skip a single record rather than the rest
of the llog block.
Patch also fixes the skipping to the next chunk:
- make sure to skip to the next block for partial chunk
or it causes the same block re-read.
- handle index == 0 as goal for the llog_next_block() as
expected exclusion and just return requested block
- set new index after block was skipped to the first one
in block
- don't create fake padding record in llog_osd_next_block()
as the caller can handle it and would know about
- restore test_8 functionality to check corruption handling
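
The fixed-size record case can be sketched as follows. The record layout,
magic value and helper names are hypothetical and chosen only for this
example; they are not the llog on-disk format. Because every record has
the same size, the offset of the next record is known even when the
current one is corrupted, so only the bad record is skipped.

```python
import struct

# Hypothetical layout for this sketch: magic, record length, index, payload.
REC_FMT = "<HHI8s"
REC_SIZE = struct.calcsize(REC_FMT)  # 16 bytes
REC_MAGIC = 0x10AD


def pack_record(index: int, payload: bytes) -> bytes:
    """Build one fixed-size record."""
    return struct.pack(REC_FMT, REC_MAGIC, REC_SIZE, index, payload)


def scan_block(block: bytes):
    """Return (indexes of good records, offsets of skipped records)."""
    good, skipped = [], []
    for off in range(0, len(block) - REC_SIZE + 1, REC_SIZE):
        magic, length, index, _payload = struct.unpack_from(REC_FMT, block, off)
        if magic != REC_MAGIC or length != REC_SIZE:
            # Corrupted record: skip exactly one slot, keep the rest.
            skipped.append(off)
            continue
        good.append(index)
    return good, skipped


block = pack_record(1, b"a") + b"\xff" * REC_SIZE + pack_record(3, b"c")
print(scan_block(block))
```

With variable-size records the length field of a corrupted record cannot
be trusted, which is why skipping to the next block is the fallback there.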
Lustre-change: https://review.whamcloud.com/48776
Lustre-commit: TBD (from
5896c420d82507f90473414df3e6d342126cc21f)
Fixes:
ec4194e4e78c ("LU-11591 llog: add synchronization for the last record")
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I6f88269e8626269268352f8bfd6d7950de438f3a
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48897
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Chris Horn [Thu, 3 Sep 2020 20:06:08 +0000 (15:06 -0500)]
LU-14661 obdclass: Add peer/peer NI when processing llog
Construct peers when processing the config log so that LNet has
complete information about peer info stored in the config log.
These are "temporary" peers which can be overwritten by discovery.
In client_import_add_nids_to_conn(), we do not need to hold the
import lock when adding NIDs to the obd_uuid, and LNet needs to take
the LNet API mutex when adding/modifying peers. We don't want to take
the mutex while a spin lock is already being held, so drop the spin
lock prior to calling class_add_nids_to_uuid().
Lustre-change: https://review.whamcloud.com/43510
Lustre-commit:
16321de596f6395153be6cbb6192250516963077
HPE-bug-id: LUS-9293
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ie0e35434c9b76f917c1448064c5217c821b1ad87
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48966
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Chris Horn [Wed, 2 Sep 2020 20:07:25 +0000 (15:07 -0500)]
LU-14661 lnet: Provide kernel API for adding peers
Implement LNetAddPeer() API to allow other kernel modules to add
peers to LNet.
Peers created via this API are not marked as having been configured
by DLC. As such, they can be overwritten by discovery.
Lustre-change: https://review.whamcloud.com/43509
Lustre-commit:
ac201366ad5700edc860c139955af8a09bf53a1a
Test-Parameters: trivial
HPE-bug-id: LUS-9293
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ibb057f702ea29d60233fbd1680d8caec98064d5d
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48965
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Jian Yu [Thu, 3 Nov 2022 20:09:15 +0000 (13:09 -0700)]
LU-16269 kernel: kernel update RHEL8.6 [4.18.0-372.32.1.el8_6]
Update RHEL8.6 kernel to 4.18.0-372.32.1.el8_6.
Lustre-change: https://review.whamcloud.com/48969
Lustre-commit: TBD (from
c4a23690d3328447c7b4ddbb8f567b2de21457b6)
Test-Parameters: trivial fstype=ldiskfs \
clientdistro=el8.6 serverdistro=el8.6 testlist=sanity
Test-Parameters: trivial fstype=zfs \
clientdistro=el8.6 serverdistro=el8.6 testlist=sanity
Change-Id: I5576180ddf10ed2b0a5e2ef85b58fef993de65a4
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49033
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Andreas Dilger [Tue, 13 Sep 2022 18:06:10 +0000 (02:06 +0800)]
LU-15259 tests: use existing usernames for setfacl
In SLES15.2 and Ubuntu 20 the "bin" and "daemon" users are not
defined in /etc/passwd, causing setfacl to print a cryptic error:
setfacl -m u:bin:rw f -- failed
~ ? setfacl: Option -m: Invalid argument near character 3
Replace "bin" and "daemon" in ACL tests so they are run with user
and group names that exist on all distros currently being tested.
They can also be specified via ACLUSR1/ACLUSR2 in the test config.
The "permission_xattr" test also needs "nobody" user and group.
Also, the "getfacl" command prints users and groups in numerical
order, so the ACL tests will fail if "daemon" < "bin", or if either
group is higher than the "users" group. Fix them as needed.
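
The ordering issue can be illustrated with a small sketch. The ID values
below are examples chosen for illustration, not taken from any particular
distro: if entries are printed in numerical ID order, the same two names
come out in a different order depending on which one has the lower ID, so
a test that compares against a fixed expected listing fails.

```python
def getfacl_order(names, name_to_id):
    """Order ACL entry names by numeric ID, as described above."""
    return sorted(names, key=lambda name: name_to_id[name])


# Example ID assignments (illustrative values only).
ids_a = {"bin": 1, "daemon": 2}
ids_b = {"daemon": 1, "bin": 2}

print(getfacl_order(["bin", "daemon"], ids_a))
print(getfacl_order(["bin", "daemon"], ids_b))
```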
Lustre-change: https://review.whamcloud.com/45627
Lustre-commit:
60188994e24b95db5915b8e6802f7963ffb2fd9c
Test-Parameters: trivial testlist=sanity-quota,sanity-sec,pjdfstest
Test-Parameters: testlist=sanity env=ONLY=103-154 clientdistro=el7.9 serverdistro=el7.9
Test-Parameters: testlist=sanity env=ONLY=103-154 clientdistro=el8.6
Test-Parameters: testlist=sanity env=ONLY=103-154 clientdistro=ubuntu2004
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I7003e95577ab3a9314e8d4d29bb6b1784b9f8ae7
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48497
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
James Simmons [Mon, 31 Jan 2022 17:44:46 +0000 (12:44 -0500)]
LU-11787 test: Fix checkfilemap tests for 64K page
File mapping is page size aligned. Modify the tests to handle 64K
page.
Lustre-change: https://review.whamcloud.com/45629
Lustre-commit:
7c88dfd28b5cc6114a85f187ecb2473657d42c9d
Test-Parameters: trivial clientdistro=el8.5 clientarch=aarch64 testlist=sanityn env=ONLY="71a 71b"
Change-Id: I316a197db8cdd0f9064431f8c572b43adf6110b8
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48945
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Bobi Jam [Sat, 25 Dec 2021 14:36:40 +0000 (22:36 +0800)]
LU-15278 lod: distinguish DIR/REGULAR lod_object members
In lod_striping_free_nolock(), we need to distinguish the lod_object
type: the DIR and REGULAR lod_object structures share the same memory
region, so treating a DIR lod_object as a REGULAR one, or vice versa,
could accidentally free unintended memory.
Lustre-change: https://review.whamcloud.com/45710
Lustre-commit:
7a9c9ccabe93f2d96c80e90f8cbb786faca74835
Fixes:
6a20bdcc608b ("LU-11376 lov: new foreign LOV format")
Fixes:
fdad38781ccc ("LU-11376 lmv: new foreign LMV format")
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I2d4c563725b35f7a75f0f1fbf9c1d35b1799eff4
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/45940
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Xing Huang [Thu, 27 Oct 2022 11:41:11 +0000 (19:41 +0800)]
EX-4147 tests: fix interop for sanity test_160h
Add a check to sanity test_160h for whether /sbin/umount.lustre is
installed on the MDS, since this subtest checks whether the MDS unmount
process has completed, and otherwise fails during interop testing.
Test-Parameters: testlist=sanity env=ONLY=160 serverversion=EXA5
Fixes:
6d62073950ac ("EX-3209 lipe: add lpcc util and service")
Signed-off-by: Xing Huang <hxing@ddn.com>
Change-Id: I6720b9e27a3a92e543ed877453802d23c0eef36d
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48970
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Andreas Dilger [Mon, 31 Oct 2022 04:11:09 +0000 (22:11 -0600)]
RM-620 build: New tag 2.14.0-ddn65
New tag 2.14.0-ddn65
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I7bb4b45f5addc0c0d62dcf81c53cb114ad6454c1
Alexey Lyashkov [Thu, 19 May 2022 17:35:18 +0000 (20:35 +0300)]
LU-15829 llite: don't use a kms if it invalid.
Lockless DIO does not update the KMS as other IO types do,
which causes a situation where the next read does not know the real
file size to be read. Let's avoid using an invalid KMS.
Lustre-change: https://review.whamcloud.com/47395
Lustre-commit:
dc907414db16d99e77aecf6bfd41d82b8cf7c36e
Fixes:
6bce5367 (LU-4198 clio: turn on lockless for some kind of IO)
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Change-Id: Ie71d3f3cc24fc16c03ed07f9f5a3a17c7fdfa684
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48811
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Alexandre Ioffe [Tue, 29 Mar 2022 07:48:35 +0000 (00:48 -0700)]
EX-4141 lipe: lamigo should detect dead OST and restart ALR
Use '# keepalive' message and ssh read with timeout
to detect OST is down and restart ALR.
Add stats for ALR last seen message
To make lamigo compatible with older
ofd_access_log_reader lamigo can work in two modes:
1. lamigo does not expect '# keepalive' message.
In this case after timeout it will restart
ofd_access_log_reader silently
2. lamigo expects periodical # keepalive
message. If lamigo does not receive keepalive message
or any other message from ofd_access_log_reader
within timeout it reports error message and
restarts ofd_access_log_reader.
lamigo switches from 1 to 2 once it receives
'# keepalive' message
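
The two modes above can be sketched as a small state machine. The class
and method names are hypothetical and the timing is passed in explicitly
to keep the example deterministic; this is not the lamigo implementation.

```python
class AlrWatch:
    """Track ofd_access_log_reader liveness as described in modes 1 and 2."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self.expect_keepalive = False  # mode 1 until a keepalive is seen
        self.last_seen = 0.0

    def on_message(self, line: str, now: float) -> None:
        """Record any message; the first keepalive switches to mode 2."""
        self.last_seen = now
        if line.startswith("# keepalive"):
            self.expect_keepalive = True

    def check(self, now: float):
        """Return None if alive, else which kind of restart to perform."""
        if now - self.last_seen <= self.timeout:
            return None
        self.last_seen = now  # the restart resets the timer
        return "restart-error" if self.expect_keepalive else "restart-silent"


watch = AlrWatch(timeout=10)
watch.on_message("some access log record", now=0)
print(watch.check(now=11))
watch.on_message("# keepalive", now=12)
print(watch.check(now=30))
```

Starting in mode 1 keeps an older ofd_access_log_reader, which never sends
keepalives, from being reported as failed on every quiet period.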
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Test-Parameters: trivial testlist=hot-pools
Change-Id: I55bc92b03ef5b45b72ff59ffd4b450cd1927cdb0
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48647
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Lai Siyao [Wed, 30 Mar 2022 21:50:22 +0000 (17:50 -0400)]
LU-14719 lod: distributed transaction check space
A distributed transaction failure may cause missing files or
disconnected directories. To avoid failure on a full disk, check
remote MDT free space before the transaction starts.
The block/inode watermarks in obd_statfs_info are used to check
whether MDT has enough free blocks/inodes.
Add sanity 230x.
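
A minimal sketch of the watermark check, with hypothetical field and
function names (the real check uses obd_statfs_info, whose layout is not
shown in this commit message). The transaction is started only if every
remote MDT is above both watermarks.

```python
def mdts_below_watermark(statfs_by_mdt, block_watermark, inode_watermark):
    """Return names of MDTs that lack enough free blocks or inodes."""
    return [
        name
        for name, st in statfs_by_mdt.items()
        if st["free_blocks"] < block_watermark
        or st["free_inodes"] < inode_watermark
    ]


def can_start_transaction(statfs_by_mdt, block_watermark, inode_watermark):
    """Allow the distributed transaction only if no MDT is short on space."""
    return not mdts_below_watermark(
        statfs_by_mdt, block_watermark, inode_watermark
    )


# Illustrative numbers only.
remote = {
    "MDT0001": {"free_blocks": 5000, "free_inodes": 900},
    "MDT0002": {"free_blocks": 10, "free_inodes": 900},
}
print(mdts_below_watermark(remote, block_watermark=100, inode_watermark=100))
```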
Lustre-commit:
6aee406c84b6b8fddf08b560acfcdf7c13c97e63
Lustre-change: https://review.whamcloud.com/47039
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I0922e9c8668e8b842d313576bd68b52fa5d434ac
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/47867
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Qian Yingjin [Fri, 21 Oct 2022 03:49:35 +0000 (23:49 -0400)]
EX-6193 pcc: dio attach failed on non-blksz-aligned file
PCC attach failed when doing a DIO copy of files whose size is not
blksz-aligned.
The reason is that the copy tool ll_fid_path_copy fails on
non-blksize-aligned file for PCC backend (such as a local Ext4
file system) using direct I/O.
This patch fixes the bug by falling back from direct I/O to
buffered I/O mode when copying the non-blksize-aligned tail of the
file.
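
The split between the two I/O modes can be sketched as simple arithmetic.
The function name is hypothetical and the sketch only computes the byte
ranges; it does not perform any I/O.

```python
def plan_copy(file_size: int, blksz: int):
    """Return (bytes to copy with direct I/O, bytes to copy buffered)."""
    aligned = file_size - file_size % blksz  # largest blksz multiple
    tail = file_size - aligned               # unaligned remainder
    return aligned, tail


print(plan_copy(10000, 4096))
print(plan_copy(8192, 4096))
```

For a file whose size is already a multiple of the block size the tail is
empty and no fallback is needed.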
This patch also sets the errno with return code in the function
@get_root_path(), thus the call for @llaip_open_by_fid() with
invalid mount path will see the correct errno.
Change-Id: I5287563029269032a91397c0094e2ccede73b9b1
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48927
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Sergey Cheremencev [Fri, 28 Oct 2022 10:29:03 +0000 (18:29 +0800)]
LU-15031 quota: reseed glbe in qmt_lvbo_update
Reseed glbe array in qmt_lvbo_update after changing edquot.
Without a fix edquot flag wasn't set in glbe array. Later,
when edquot was cleared, need_update(nu) flag wasn't set
in glbe array to notify OSTs with a new edquot.
The patch also adds test 80 to check that OST gets correct
edquot value after failover.
Lustre-change: https://review.whamcloud.com/45032
Lustre-commit:
61ec1e0f2ca8dc4c9f7ed41f782960e65cab0920
HPE-bug-id: LUS-10029
Change-Id: I5b7e1a553e3351c22649431860d51b5a671c6fd9
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/46655
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Mikhail Pershin [Sat, 28 May 2022 18:16:11 +0000 (21:16 +0300)]
LU-15847 tgt: move tti_ transaction params to tsi_
Move tti_mult_trans and tti_has_trans to tgt_session_info to
be available in all targets. This allows cleaning up old
duplicated MDT code and can be used for complex transaction
handling in MDT/OFD if needed.
Lustre-change: https://review.whamcloud.com/c/fs/lustre-release/+/47491
Lustre-commit:
0a317b171ebedcba8fc58e548991a884186c350c
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I3f0c15e283b9e21c04a009f6cf346afa278e7095
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48979
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Mikhail Pershin [Tue, 31 May 2022 10:38:25 +0000 (13:38 +0300)]
LU-15847 tgt: reply always with the latest assigned transno
In tgt_txn_stop_cb() don't skip transno assignment in case
of unexpected multiple last_rcvd updates. So the latest
transno will be reported back in reply but not the first
one.
The reporting of just the first transno might lead to data
loss at failover because partially committed operation will
be considered as fully committed and rest of operation will
not be replayed.
The proposed approach of reporting the last assigned transno to
the client could cause replay failures in some cases, which
is still better than possible data loss. So the patch makes the
multiple transaction case less severe.
Lustre-change: https://review.whamcloud.com/c/fs/lustre-release/+/47492
Lustre-commit:
4e2e8fd2fc0a9a30f47e70dc285a2101d2cbc4c2
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ia07e89576127a2fc1eb2ae706551ffe8ceaa93be
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48978
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Alex Zhuravlev [Thu, 13 Jan 2022 07:27:21 +0000 (10:27 +0300)]
LU-15447 tests: sanity-flr/208 reset rotational status
New kernels (e.g. 4.18.0-305.25.1) declare loopback devices
in tmpfs as non-rotational. sanity-flr/208 wrongly assumes
that devices are non-rotational by default, so
sanity-flr/208 started to fail with new kernels.
Lustre-change: https://review.whamcloud.com/c/fs/lustre-release/+/46088
Lustre-commit:
78dddb423f0dc8571d3c7f8ccd8f77a1c2bc28ae
Fixes:
8507472dd37e ("LU-14996 lov: prefer mirrors on non-rotational OSTs")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ib5c42da39667227a6cff5d379e30d2cd6c1e2773
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48952
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Serguei Smirnov [Sun, 28 Aug 2022 01:50:16 +0000 (18:50 -0700)]
LU-16106 lnet: allow direct messages regardless of peer NI status
If check_routers_before_use is enabled, the router needs to
be pinged before it is used, which is not possible because
its NIs are assumed to be down at start-up. Don't prevent
discovery of the router in this case.
This change allows non-routed traffic to peer NIs with "down"
status.
Lustre-commit:
3345a8a54e89c342a4ce2d8d4bcb04ee919bcd52
Lustre-change: https://review.whamcloud.com/c/48355
Test-Parameters: trivial
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I36fa60e37ef4f47c82c69855c9b0b80bad8a36f4
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48669
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Bobi Jam [Thu, 7 Jul 2022 07:38:54 +0000 (15:38 +0800)]
LU-16025 llite: adjust read count as file got truncated
A file read does not notice a file size truncate by another node,
and continues to read zero-filled pages beyond the new file size.
This patch confines the read to the new file size to prevent the
issue and adds a test case verifying the fix.
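
The adjustment amounts to clamping the read count against the current
file size. A minimal sketch with a hypothetical function name:

```python
def clamp_read_count(pos: int, count: int, file_size: int) -> int:
    """Limit a read at `pos` so it never extends past `file_size`."""
    if pos >= file_size:
        return 0
    return min(count, file_size - pos)


# A 4096-byte read at offset 0 after the file was truncated to 1000 bytes.
print(clamp_read_count(0, 4096, 1000))
```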
Lustre-change: https://review.whamcloud.com/47896
Lustre-commit:
4468f6c9d92448cb72c5a616ec74653e83ee8e10
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Ie51ba09201a1ca1464c3a3892d367590e978ee34
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48848
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Bobi Jam [Thu, 2 Sep 2021 16:30:10 +0000 (00:30 +0800)]
LU-14642 test: add fsx mirror file test mode
- add fsx mirror file test mode with "-M" option so that fsx can exert
its IO to FLR file as well as extend/split/resync the FLR file.
- add sanity-flr test_70b() to test fsx with flrmode.
- fix a bug in "lfs mirror verify" to accommodate max mirror count
instead of (max - 1) mirrors.
- improve "lfs mirror verify -v" to print the proper data range of its
crc-32 checksum values.
Lustre-change: https://review.whamcloud.com/43473
Lustre-commit:
90ba8b4ac360b1987178445bd2ccd64f7958d912
Test-Parameters: testlist=sanity-flr env=ONLY=70a,ONLY_REPEAT=10
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Ib55c7b25dcd82fa0b197ad21268b16c82aab5da9
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48834
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Sebastien Buisson [Tue, 18 Oct 2022 15:19:01 +0000 (17:19 +0200)]
LU-16249 sec: krb5_decrypt_bulk calls decryption primitive
krb5_decrypt_bulk() was mistakenly calling an encryption primitive
instead of a decryption primitive for the confounder.
Lustre-change: https://review.whamcloud.com/48907
Lustre-commit: TBD (
851f3915659941db00a0cda58867e68139e5e0d1)
Test-Parameters: trivial
Fixes:
0a65279121 ("LU-13344 gss: Update crypto to use sync_skcipher")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I9251172644ed6baa3bb06a59dbe7c1bab401d817
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48909
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Sergey Cheremencev [Wed, 19 Oct 2022 11:18:04 +0000 (19:18 +0800)]
LU-15097 quota: stop pool_recalc before killing pool
qmt_start_pool_recalc holds a reference on a pool while
it is running. This thread should be stopped before
putting the last pool reference in qmt_pool_free to be
sure that the pool can finally be freed. The patch helps to
avoid the following ASSERTION:
qmt_pool_fini() ASSERTION(list_empty(&qmt->qmt_pool_list)) failed
Lustre-change: https://review.whamcloud.com/45256
Lustre-commit:
862f0baa7c21cb631b98d3886ef9e938f4519573
Change-Id: If72042a620d9ded693fcb669bc9148d1f96126a4
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/46656
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Hongchao Zhang [Fri, 21 Oct 2022 07:43:11 +0000 (03:43 -0400)]
EX-4567 kernel: add extra field for snapshot in el8
Add extra fields used by snapshot to "struct jbd2_journal_handle"
and "struct journal_head", placing them in the
4-byte hole at the end of struct jbd2_journal_handle so
that they do not increase the structure size and memory
usage for this common allocation.
Use RH_KABI_EXTEND() and RH_KABI_FILL_HOLE() so that the
new fields do not affect the kernel ABI compatibility.
Change-Id: I84f52b18694e56d837d64c5c80076e45dde27eab
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48880
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Alexandre Ioffe [Tue, 25 Oct 2022 03:06:08 +0000 (20:06 -0700)]
EX-6102 lipe: lipe_scan3 not intended for customer use
Print a warning that lipe_scan3 is not intended for customer use.
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Test-Parameters: trivial
Change-Id: I92f775d77e1d4ffac304d3e46ed6af7c642a3bdd
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48939
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Andreas Dilger [Fri, 14 Oct 2022 21:09:03 +0000 (15:09 -0600)]
LU-11388 tests: exclude replay-single/131b for ldiskfs
Test is failing about 1/10 of the test runs, even on ldiskfs.
Test-Parameters: trivial testlist=replay-single
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I9c36d026944876e066a1dc36877927b7a92c537e
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48876
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48946
Alexandre Ioffe [Wed, 13 Apr 2022 05:34:18 +0000 (22:34 -0700)]
EX-5099 lipe: Made controllable ssh exec timeout
- Introduce new lipe ssh API: lipe_ssh_exec_timeout() and
lipe_ssh_start_cmd_timeout().
- Introduce new lamigo command option: --ssh-exec-timeout
to configure the ssh connection timeout for ssh exec cmd.
- Use lipe_ssh_start_cmd_timeout() to start the remote
access log reader with a timeout.
Use ssh_channel_read_timeout() with an infinite timeout
when reading access log records.
- Use lipe_ssh_start_cmd_timeout() to start remote "lfs ..."
commands with a long timeout to prevent a premature timeout
when an "lfs mirror extend ..." command for a big file
takes too long.
- Use a default timeout of 600 seconds for ssh exec cmd.
Such a long timeout should allow long-lasting
replications to finish.
This fixes EX-5429.
Test-Parameters: trivial clientdistro=el8.5 serverdistro=el8.5 testlist=hot-pools env=FAIL_ON_ERROR=false,ONLY="56 59",ONLY_REPEAT=20
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Change-Id: I8de9b1db2014abd1e6f201cda73a0812128f6bb6
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/47057
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Jian Yu [Fri, 21 Oct 2022 20:35:40 +0000 (13:35 -0700)]
LU-16173 kernel: kernel update SLES15 SP3 [5.3.18-150300.59.93.1]
Update SLES15 SP3 kernel to 5.3.18-150300.59.93.1 for Lustre client.
Test-Parameters: trivial clientdistro=sles15sp3 \
testlist=sanity
Lustre-change: https://review.whamcloud.com/48601
Lustre-commit:
c3467db7e7d0652c09bdcef26e2b708ab51cba9e
Change-Id: I1e0afe6974567d13680dbb0d463fbbd873ef2e5f
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48864
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Andreas Dilger [Thu, 6 Oct 2022 17:31:51 +0000 (10:31 -0700)]
LU-16180 ptlrpc: reduce lock contention in ptlrpc_free_committed
This patch breaks out of the loop in ptlrpc_free_committed()
if need_resched() is true or there are other threads waiting
on the imp_lock. This avoids the thread holding the
CPU for too long while freeing a large number of requests. The
remaining requests in the list will be processed the next
time this function is called. That also avoids delaying a
single thread too long if the list is long.
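The early-exit loop described above can be sketched in userspace C. This is only an illustration, not the ptlrpc code: `struct demo_req_list`, `demo_should_yield()` and the fixed per-call budget are hypothetical stand-ins for the import's request list and for the kernel's need_resched() and lock-contention checks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a request list: entries [head, count) are pending. */
struct demo_req_list {
    size_t head;   /* index of the first request not yet freed */
    size_t count;  /* total number of requests in the list */
};

/* Hypothetical yield check. In the kernel this role is played by
 * need_resched() and a test for waiters on the lock; here we simply
 * give up after a fixed budget of requests per call. */
static int demo_should_yield(size_t done, size_t budget)
{
    return done >= budget;
}

/* Free pending requests, but stop early when asked to yield.
 * Returns the number freed in this call; the rest stay queued and
 * are handled by the next call. */
static size_t demo_free_committed(struct demo_req_list *list, size_t budget)
{
    size_t done = 0;

    assert(list != NULL);
    while (list->head < list->count) {
        if (demo_should_yield(done, budget))
            break;
        list->head++;  /* "free" one request */
        done++;
    }
    return done;
}
```

With ten pending requests and a budget of four, three successive calls would free four, four and two requests, so no single call holds the CPU for the whole list.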
Lustre-change: https://review.whamcloud.com/48629
Lustre-commit:
9a3e111a2ebdfadec4b6efc65899856edc90ad18
Test-Parameters: testlist=sanity clientdistro=el8.6
Test-Parameters: testlist=sanity clientdistro=ubuntu2204 env=SANITY_EXCEPT="130 244a"
Change-Id: I50f56b87844e8b019053e569767b6c949d2a3f55
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48627
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Lai Siyao [Thu, 16 Sep 2021 21:49:33 +0000 (17:49 -0400)]
LU-15009 ofd: continue precreate if LAST_ID is less on MDT
It's possible that precreate succeeded on OST, but MDT didn't get the
reply, and assumed failure. In this case, the LAST_ID on MDT is
smaller than that on OST. Instead of reporting an error and stopping
precreate, it's better to move the precreate window forward.
Lustre-change: https://review.whamcloud.com/44984
Lustre-commit:
1711e26ae861c28829870c2433caf7ee232909cf
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ia6ca418ec0ea6797b7eccc1610879331307fad07
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48923
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Alex Zhuravlev [Mon, 25 Jul 2022 13:26:40 +0000 (16:26 +0300)]
LU-16044 osd: discard pagecache in truncate's declaration
to avoid taking pagelock inside a transaction, which conflicts
with the write path where we take pagelock before any other one.
This should be safe as the write path writes the pages out
synchronously, so they should be clean by truncate.
Lustre-change: https://review.whamcloud.com/48033
Lustre-commit:
0bb491b2ecf494c3f78fa08a101af8af7853a0fe
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: Iba555ace2ce9ef34ab5517375ecb5c176f738a02
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48885
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Lei Feng [Mon, 8 Aug 2022 02:59:25 +0000 (10:59 +0800)]
LU-16076 utils: enhance 'lfs check' command
Add an optional argument to the 'lfs check' command so that only the
servers related to the specified Lustre file system are checked.
lustre-change: https://review.whamcloud.com/48155
lustre-commit:
f5ca6853b8d8b918b0228af31fa8249be49d3000
Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial testlist=sanityn env=ONLY=113
Change-Id: I826a8e822af0a290f06ffaadadf1bb7f86899d99
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48935
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Li Dongyang [Fri, 7 Oct 2022 12:09:10 +0000 (23:09 +1100)]
LU-15305 obdclass: fix race in class_del_profile
Move profile lookup and remove from lustre_profile_list
into the same critical section, otherwise we could race with
class_del_profiles or another class_del_profile.
Do not create duplicate mount opts in the client config,
otherwise we will add duplicate lustre_profile to
lustre_profile_list for a single mount.
Lustre-change: https://review.whamcloud.com/c/fs/lustre-release/+/48802
Lustre-commit:
83d3f42118579d7fb7c3002533c047badcf41e0d
Change-Id: I648aa206716213b064d045f546516b219337e0ed
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48956
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Etienne AUJAMES [Fri, 21 Jan 2022 14:49:18 +0000 (15:49 +0100)]
LU-15467 tests: fix sanity-hsm test_103a timeout issue
Add an MDS version check in "sanity-hsm test_103a" for interop testing.
Limit the number of parallel hsm restore requests to
max_rpcs_in_flight.
Lustre-change: https://review.whamcloud.com/46252
Lustre-commit:
98e1e41ce47c95155a8c8d452eef5074492d22f0
Fixes: b449f3d ("LU-15145 hsm: unlock the restore layout lock for a cancel")
Test-Parameters: trivial
Test-Parameters: testlist=sanity-hsm env=ONLY=103a,ONLY_REPEAT=20
Test-Parameters: testlist=sanity-hsm
Signed-off-by: Etienne AUJAMES <etienne.aujames@cea.fr>
Change-Id: I78098042d1316cdcc9d2e25860099a0ffdba2503
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48960
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Mikhail Pershin [Mon, 17 Oct 2022 19:53:44 +0000 (12:53 -0700)]
LU-15646 llog: correct llog FID and path output
- fix wrong LLOG_ID-to-FID conversion to output llog FID by
introducing PLOGID macro to expand llog ID for DFID format
- stop printing lgl_ogen along with llog FID as it is always zero
since 2.3.51 and is not used anymore
- output correct path for update llog in llog_reader
- always print header info in llog_reader if available
- print llog flags in header info
Lustre-change: https://review.whamcloud.com/48430
Lustre-commit:
e28f3ee185b2ef7bad8046f46444772fac214a40
Fixes:
5a8e47d0a1a7 ("LU-9153 llog: update llog print format to use FIDs")
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I7ba49e8101a67d2d80c204a5fc629bfd0bce89ad
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48896
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Bruno Faccini [Mon, 17 Oct 2022 19:46:25 +0000 (12:46 -0700)]
LU-6612 utils: strengthen llog_reader vs wrong format/header
The following snippet shows that llog_reader can be confused by
an invalid 0 for the number of records when parsing an expected
LLOG file header:
root# dd if=/dev/zero bs=4096 count=1 of=/tmp/zeroes
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.000263962 s, 15.5 MB/s
root# llog_reader /tmp/zeroes
Memory Alloc for recs_buf error.
Could not pack buffer; rc=-12
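One way a reader could guard against this kind of input is to validate the header before sizing any buffer from it. The sketch below is an assumption for illustration only: `struct demo_llog_hdr` and `demo_alloc_recs()` are hypothetical and much simpler than the real llog header and llog_reader code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified header; the real llog header has more fields. */
struct demo_llog_hdr {
    uint32_t hdr_len;    /* header length in bytes */
    uint32_t rec_count;  /* number of records */
};

/* Reject obviously invalid headers before allocating from them.
 * Returns 0 and stores a zeroed array of record offsets in *offs_out,
 * or a negative errno value with *offs_out left NULL. */
static int demo_alloc_recs(const struct demo_llog_hdr *hdr, uint64_t **offs_out)
{
    assert(hdr != NULL && offs_out != NULL);

    *offs_out = NULL;
    if (hdr->hdr_len == 0 || hdr->rec_count == 0)
        return -EINVAL;  /* e.g. a file full of zeroes */

    *offs_out = calloc(hdr->rec_count, sizeof(**offs_out));
    if (*offs_out == NULL)
        return -ENOMEM;
    return 0;
}
```

With a check like this, a zero-filled file would be reported as an invalid format (-EINVAL) rather than as a memory allocation failure (-ENOMEM, the rc=-12 seen above).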
Lustre-change: https://review.whamcloud.com/15654
Lustre-commit:
45291b8c06eebf33d3654db3a7d3cfc5836004a6
Test-Parameters: trivial testlist=sanity,sanity-hsm
Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: I12be79e6c6a5da384a5fd81878a76a7ea8aa5834
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48895
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Etienne AUJAMES [Mon, 17 Oct 2022 19:37:39 +0000 (12:37 -0700)]
LU-15000 llog: read canceled records in llog_backup
llog_backup() does not reproduce index "holes" in the generated copy.
This could result in llog copy indexes that differ from the source.
It might then confuse the configuration update mechanism, which relies on
indexes between the MGS source and the target copy.
These index gaps can be caused by "lctl --device MGS llog_cancel".
This patch adds a "raw" read mode to llog_process* to read canceled
records, so llog_backup is now able to reproduce an exact copy of
the original.
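The effect of a raw read mode on the copy can be illustrated with a small model. Everything here is hypothetical (`struct demo_rec`, `demo_backup()` and the renumbering rule are assumptions, not the llog API); it only shows why skipping canceled records makes the copy's indexes drift from the source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record: an index plus a "canceled" flag. */
struct demo_rec {
    unsigned int idx;
    int canceled;
};

/* Copy records from src to dst. In raw mode canceled records are copied
 * too, so the indexes in dst match src exactly. Otherwise canceled
 * records are skipped and the surviving records are renumbered densely
 * from 1, as a copy that only appends live records would be.
 * Returns the number of records written to dst. */
static size_t demo_backup(const struct demo_rec *src, size_t n,
                          struct demo_rec *dst, int raw)
{
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++) {
        if (src[i].canceled && !raw)
            continue;  /* the hole is lost in the copy */
        dst[out] = src[i];
        if (!raw)
            dst[out].idx = (unsigned int)out + 1;
        out++;
    }
    return out;
}
```

For a source with indexes 1, 2 (canceled) and 3, the non-raw copy ends up with indexes 1 and 2, while the raw copy keeps 1, 2 and 3.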
Lustre-change: https://review.whamcloud.com/46552
Lustre-commit:
d8e2723b4e9409954846939026c599b0b1170e6e
Signed-off-by: Etienne AUJAMES <etienne.aujames@cea.fr>
Change-Id: I811e23de8f4545bed36a44fedc2638d7418830dd
Reviewed-by: Dominique Martinet <qhufhnrynczannqp.f@noclue.notk.org>
Reviewed-by: DELBARY Gael <gael.delbary@cea.fr>
Reviewed-by: Stephane Thiell <sthiell@stanford.edu>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48894
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Alex Zhuravlev [Mon, 17 Oct 2022 19:31:56 +0000 (12:31 -0700)]
LU-14098 obdclass: try to skip corrupted llog records
If an llog's header or record is found to be corrupted, then
ignore the remaining records and try with the next one.
Lustre-change: https://review.whamcloud.com/40754
Lustre-commit:
910eb97c1b43a44a9da2ae14c3b83e28ca6342fc
Fixes:
186f083722 ("LU-11924 osp: combine llog cancel operations")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: If47ec1fc1e2eaf64be7ba08d3aa9c2b93903c0cf
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48893
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Yang Sheng [Mon, 17 Oct 2022 18:53:47 +0000 (11:53 -0700)]
LU-14044 llog: check fid after convert
We should convert from llog_id and then check the fid. Also
change fid-lookup to an error check instead of LASSERT.
Lustre-change: https://review.whamcloud.com/40294
Lustre-commit:
6df76d3357fc5896b6902399ed7ce6d7c7835f58
Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: I673d8f16ff9e57a0482d6a3ec3ee3db33699f57f
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48892
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Andreas Dilger [Fri, 14 Oct 2022 22:04:53 +0000 (16:04 -0600)]
EX-5909 tests: clean up in sanity-quota/16a
Clean up the test file in sanity-quota test_16a. If test_16b is
run (DNE config) then the filesystem is reformatted, but in the
non-DNE config test_17 will fail if there is used quota.
Test-Parameters: trivial testlist=sanity-quota
Fixes:
b54b7ce43929 ("LU-14472 quota: skip non-exist or inact tgt for lfs_quota")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Id1faeab9df246d8010bf114582ab17a75846db68
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48899
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Andreas Dilger [Fri, 14 Oct 2022 20:06:26 +0000 (14:06 -0600)]
RM-620 build: New tag 2.14.0-ddn64
New tag 2.14.0-ddn64
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ia86edfc375e1dda7205db1a32c8c1933153a3e92
Hongchao Zhang [Fri, 22 Jul 2022 15:02:24 +0000 (23:02 +0800)]
LU-15738 test: check lfsck status before starting
If the LFSCK has been started before calling "lfsck_start"
to start it, the test shouldn't fail for starting LFSCK.
Lustre-change: https://review.whamcloud.com/48018/
Lustre-commit:
29aaf679afac89359e1b116b8de0480f24b4e8ac
Test-Parameters: trivial testlist=sanity-lfsck
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: I266d9e2b9c5f37eb9e08b489fab428268b90d895
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48841
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Alex Zhuravlev [Mon, 19 Sep 2022 16:00:15 +0000 (19:00 +0300)]
EX-5964 lamigo: disable idle disconnects
on the connections lamigo uses locally to avoid storms
of reconnects.
Test-Parameters: trivial testlist=hot-pools
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I3bc2742853e9636e38fbd8f7c2f238b3af55e0ba
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48840
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Alex Zhuravlev [Fri, 6 Aug 2021 06:34:31 +0000 (09:34 +0300)]
EX-3142 tests: changelog processing verification
add extra counter to lamigo stats to catch gaps in changelog
processing. add a new test (hot-pools/60) to verify that no
gaps happen (i.e. lamigo gets all changelog records), verify
that the changelog is purged properly.
Test-Parameters: trivial testlist=hot-pools mdscount=2 mdtcount=4
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I34d9d6f6f7f5766d945df43ae7d43dab7c70cef1
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48434
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
John L. Hammond [Wed, 8 Jun 2022 02:15:39 +0000 (19:15 -0700)]
LU-13578 test: sleep longer in sanity test_39
In sanity test_39r(), sleep for 2 * atime_diff rather than atime_diff + 1.
Lustre-change: https://review.whamcloud.com/47346
Lustre-commit:
be2525ffddb4bf55fde77e97b00d1c349119daed
Test-Parameters: trivial testlist=sanity env=ONLY=39r,ONLY_REPEAT=50
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: Ied508e12c848f6935d2317fb86bddc5341a6156e
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48831
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Andriy Skulysh [Fri, 5 Nov 2021 10:55:08 +0000 (12:55 +0200)]
LU-15472 ldlm: optimize flock reprocess
Resource reprocess on flock unlock can be done once
after all pending unlock requests.
This reduces spinlock contention.
Lustre-change: https://review.whamcloud.com/46257
Lustre-commit:
42f377db4a24cefa7a041fcd3106dd58771eb319
Change-Id: I2809070f27fe3af7e1fc34e2b4b22603931f3dff
HPE-bug-id: LUS-10471, LUS-10909
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48818
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Etienne AUJAMES [Mon, 2 May 2022 12:27:17 +0000 (14:27 +0200)]
LU-15132 mdc: Use early cancels for hsm requests
HSM RELEASE and RESTORE requests take EX layout lock on the MDT side.
So the client can use early cancel for its local lock on the resource
to limit the contention (mdt side).
This patch does not pack ldlm request inside the hsm request because
the field (RMF_DLM_REQ) does not exist in the request. Adding this
field inside the request would break compatibility with _old_ servers.
Lustre-change: https://review.whamcloud.com/47181
Lustre-commit:
60d2a4b0efa4a944b558bd9b63b6334f7e70419b
Signed-off-by: Xing Huang <hxing@ddn.com>
Change-Id: I30a57b4855c28eef9c55a9645d3b6c491f962b13
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48652
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Serguei Smirnov [Thu, 8 Sep 2022 22:27:12 +0000 (15:27 -0700)]
LU-15885 o2iblnd: fix handling of RDMA_CM_EVENT_UNREACHABLE
RDMA_CM_EVENT_UNREACHABLE may be received not only when connection
is being connected, but also when it is being closed. Fix handling
of this event accordingly.
Lustre-change: https://review.whamcloud.com/48492
Lustre-commit:
3925b1669d519e6c038ecce1287c1ced3de623d3
Test-Parameters: trivial
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I79428188c159b2d80d36326589b2977db065d4a7
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48827
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Alex Zhuravlev [Wed, 12 Oct 2022 06:35:42 +0000 (09:35 +0300)]
LU-14428 libcfs: discard cfs_trace_copyin_string()
Instead of cfs_trace_copyin_string(), use memdup_user_nul().
This combines the allocation with the copyin, and nul-terminates.
The resulting code is a lot simpler.
Lustre-change: https://review.whamcloud.com/41490
Lustre-commit:
67af976c806994cec27414d24b43f6519d72c240
LU-14788 lnet: check memdup_user_nul using IS_ERR
Crash in __proc_lnet_portal_rotor. memdup_user_nul returns an ERR_PTR
on error, not a NULL pointer. IS_ERR and PTR_ERR functions have to be
used to check and return the correct error code. The fix has been
applied in other locations having the wrong check.
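The difference between a NULL check and an error-pointer check can be shown with a userspace imitation of the kernel convention. The `demo_*` helpers below are hypothetical re-implementations written for this sketch; they mimic the behaviour of ERR_PTR(), IS_ERR(), PTR_ERR() and memdup_user_nul() but are not the kernel functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Userspace imitation of the kernel's error-pointer helpers. */
#define DEMO_MAX_ERRNO 4095

static void *demo_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

static long demo_ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

static int demo_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-DEMO_MAX_ERRNO;
}

/* Hypothetical stand-in for memdup_user_nul(): duplicates len bytes and
 * NUL-terminates, returning an error pointer (never NULL) on failure. */
static char *demo_memdup_nul(const char *src, size_t len)
{
    char *p;

    if (src == NULL)
        return demo_err_ptr(-EFAULT);
    p = malloc(len + 1);
    if (p == NULL)
        return demo_err_ptr(-ENOMEM);
    memcpy(p, src, len);
    p[len] = '\0';
    return p;
}

/* Correct caller: test with the IS_ERR-style check, not against NULL. */
static long demo_parse(const char *src, size_t len)
{
    char *buf = demo_memdup_nul(src, len);
    long rc;

    if (demo_is_err(buf))
        return demo_ptr_err(buf);
    assert(buf != NULL);  /* failures are error pointers, never NULL */
    rc = (long)strlen(buf);
    free(buf);
    return rc;
}
```

A caller that only tested `buf == NULL` would treat the error pointer as valid memory and dereference it, which matches the kind of crash described above.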
Lustre-change: https://review.whamcloud.com/44091
Lustre-commit:
449d046e55a42cc4d1c4ab0217551cded1864bc4
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I089c5da96b59ec62d177aea2f3d170bf751c6fec
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48835
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Alexander Boyko [Tue, 24 Nov 2020 09:05:36 +0000 (04:05 -0500)]
LU-13974 tests: update log corruption
The test case reproduces a missing object for a sub transaction during
set xattr operation.
The first setattr got -2, and the second has already started but has not
called llog_add yet. In this case the llog osp object is stale after
top_trans_start, so the declaration phase cannot refresh llogs. At
llog_osd_write_rec the osp object changes from stale state to
valid (dt_attr_get), but the llog handle and llog header are invalid.
A new record would be added to the update log with a wrong index.
In that case processing of the update log fails with
fs1-MDT0001-osp-MDT0003: [0x2:0x400024d0:0x2] Invalid record: index
112926 but expected 112925
lod_sub_recovery_thread()) fs1-MDT0001-osp-MDT0003 get update log
failed: rc = -34
Recovery aborted, and clients are evicted.
Lustre-change: https://review.whamcloud.com/40743
Lustre-commit:
562837124ec7bffeba7edb4b4b899bc271833374
HPE-bug-id: LUS-9030
Test-Parameters: testlist=sanity envdefinitions=ONLY="427"
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: I6a47fed1bc01f4be62216d1d0787adc413df0cf5
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48832
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Aleksei Alyaev [Thu, 23 Dec 2021 08:48:22 +0000 (11:48 +0300)]
LU-8621 utils: cmd help to stdout or short cmd error
- Changed to print command help to stdout
- Changed to output short error message for an unrecognized command
Lustre-change: https://review.whamcloud.com/47162/
Lustre-commit:
bc69a8d058f5bcdb75e062df57a6ccd23243d1e0
Test-Parameters: trivial
Signed-off-by: Aleksei Alyaev <aalyaev@ddn.com>
Change-Id: I67616ddb576e3347a2da130b3a731a6bf8730185
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48851
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Shaun Tancheff [Thu, 13 Oct 2022 19:19:47 +0000 (12:19 -0700)]
LU-16233 build: Add always target for SUSE15 SP3 LTSS
SUSE 15 SP3 LTSS kernel version 5.3.18-150300.59.93
(and later) breaks lustre build tests which expect
conftest.i to be generated.
Lustre-change: https://review.whamcloud.com/48833
Lustre-commit: TBD (from
274b34c4d3a20937ebb17d139dbde0eaaed503b2)
HPE-bug-id: LUS-11286
Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: If23e9b31b537878a43075ffff62a99906f47fd9a
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48863
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Jian Yu [Wed, 28 Sep 2022 07:00:22 +0000 (00:00 -0700)]
LU-16174 kernel: kernel update SLES15 SP4 [5.14.21-150400.24.21.2]
Update SLES15 SP4 kernel to 5.14.21-150400.24.21.2 for Lustre client.
Lustre-change: https://review.whamcloud.com/48604
Lustre-commit: TBD (from
896fd88c35b6685a586c1279c83c739b48cbe846)
Test-Parameters: trivial clientdistro=sles15sp4 \
env=SANITY_EXCEPT="27J 101j 244a" testlist=sanity
Change-Id: Ia68e1c960c79f40d0f725b0f440cd562b820a19f
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48689
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Jian Yu [Wed, 28 Sep 2022 06:46:30 +0000 (23:46 -0700)]
LU-16177 kernel: kernel update RHEL9.0 [5.14.0-70.26.1.el9_0]
Update RHEL9.0 kernel to 5.14.0-70.26.1.el9_0 for Lustre client.
Lustre-change: https://review.whamcloud.com/48676
Lustre-commit: TBD (from
9951a56c26b1ce6639cd2db350fdf6b81b3b4707)
Test-Parameters: trivial clientdistro=el9.0 \
env=SANITY_EXCEPT="101j 130 244a" testlist=sanity
Change-Id: I9da2ccdf419d6490fdba80199eda69f4f19361be
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48687
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Alexandre Ioffe [Wed, 12 Oct 2022 00:53:20 +0000 (17:53 -0700)]
EX-6130 lipe: s_volume_name not NUL terminated
The s_volume_name field stores a string, but the field may have no
terminating NUL if the string size equals the size of the field.
Therefore on some target systems the definition of
struct ext2_super_block s_volume_name in
/usr/include/ext2fs/ext2_fs.h may have the
attribute "nonstring". In such a case it conflicts with calls
which require a NUL-terminated string.
The fix replaces NUL-terminated string calls with calls that take a
limited string size (e.g. strlen() -> strnlen()).
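A minimal sketch of the strlen() -> strnlen() change, assuming a hypothetical 16-byte label field. `struct demo_sb` and `demo_get_label()` are illustrative names and do not come from ext2_fs.h or lipe.

```c
#define _POSIX_C_SOURCE 200809L  /* for strnlen() */
#include <assert.h>
#include <string.h>

/* Hypothetical superblock fragment: a fixed 16-byte label that is NOT
 * guaranteed to be NUL-terminated when all 16 bytes are used. */
struct demo_sb {
    char volume_name[16];
};

/* Copy the label into a NUL-terminated buffer without reading past the
 * end of the field. dst must hold at least sizeof(volume_name) + 1 bytes.
 * Returns the label length. */
static size_t demo_get_label(const struct demo_sb *sb, char *dst,
                             size_t dst_size)
{
    /* strnlen() stops at the field boundary; strlen() would keep
     * scanning into whatever follows the field. */
    size_t len = strnlen(sb->volume_name, sizeof(sb->volume_name));

    assert(dst_size > len);
    memcpy(dst, sb->volume_name, len);
    dst[len] = '\0';
    return len;
}
```

For a label that fills all 16 bytes the function returns 16 and never reads byte 17, which is the case where a plain strlen() would run past the field.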
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Change-Id: Ieb1921a289328a8f9bfae9bb658c6c74f8ec43b7
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48829
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Andreas Dilger [Tue, 11 Oct 2022 08:04:59 +0000 (02:04 -0600)]
RM-620 build: New tag 2.14.0-ddn63
New tag 2.14.0-ddn63
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I5e4b8d3d863cbd504fc7470b413d2083bb15e371
Etienne AUJAMES [Wed, 5 Oct 2022 07:10:05 +0000 (00:10 -0700)]
LU-15481 llog: Add LLOG_SKIP_PLAIN to skip llog plain
Add the catalog callback return LLOG_SKIP_PLAIN to conditionally skip
an entire llog plain.
This could speed up the catalog processing for specific usages when a
record needs to be accessed in the "middle" of the catalog. This could
be useful for changelog with several users or HSM.
This patch modifies chlg_read_cat_process_cb() to use LLOG_SKIP_PLAIN.
The main idea came from:
d813c75d ("LU-14688 mdt: changelog purge deletes plain llog")
**Performance test:**
* Environment:
2474195 changelog records stored on the mds0 (40 llog plain):
mds# lctl get_param -n mdd.lustrefs-MDT0000.changelog_users
current index: 2474195
ID index (idle seconds)
cl1 0 (3509)
* Test
Access to records at the end of the catalog (offset: 2474194):
client# time lfs changelog lustrefs-MDT0000 2474194 >/dev/null
* Results
- with the patch: real 0m0.592s
- without the patch: real 0m17.835s (x30)
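The skip idea behind these numbers can be modelled in a few lines. This is a simplified sketch under assumptions: the return codes, `struct demo_plain` and `demo_scan()` are hypothetical, and the real callback works on llog handles rather than index ranges.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback return codes; only the skip idea is from the text. */
enum { DEMO_PROCESS = 0, DEMO_SKIP_PLAIN = 1 };

/* Hypothetical summary of one plain llog referenced by the catalog. */
struct demo_plain {
    unsigned long first_idx;  /* first record index in this plain llog */
    unsigned long last_idx;   /* last record index in this plain llog */
};

/* Catalog callback: skip a whole plain llog if it ends before 'start'. */
static int demo_cat_cb(const struct demo_plain *p, unsigned long start)
{
    assert(p != NULL);
    return p->last_idx < start ? DEMO_SKIP_PLAIN : DEMO_PROCESS;
}

/* Walk the catalog and return how many records had to be visited to
 * serve a reader that wants records from index 'start' onwards. */
static unsigned long demo_scan(const struct demo_plain *cat, size_t n,
                               unsigned long start, int use_skip)
{
    unsigned long visited = 0;

    for (size_t i = 0; i < n; i++) {
        if (use_skip && demo_cat_cb(&cat[i], start) == DEMO_SKIP_PLAIN)
            continue;  /* the whole plain llog is bypassed */
        visited += cat[i].last_idx - cat[i].first_idx + 1;
    }
    return visited;
}
```

With three plain llogs of 100 records each and a start index of 250, the model visits 100 records with skipping and 300 without it.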
Lustre-change: https://review.whamcloud.com/46310
Lustre-commit:
aa22a6826ee521ab14994a4533b0dbffb529aab0
Signed-off-by: Etienne AUJAMES <etienne.aujames@cea.fr>
Change-Id: I887d5bef1f3a6a31c46bc58959e0f508266c53d2
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48774
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Gaurang Tapase [Thu, 29 Sep 2022 05:57:45 +0000 (23:57 -0600)]
EX-6033 lipe: Add note to developers for HP scripts
stratagem-hp-* scripts are moved to EMF repo as
they are tightly coupled with EXA release because of
HA configuration. They are kept in lustre repo so that
hotfixes should not delete them.
Test-Parameters: trivial
Change-Id: I33eecaa4ed0c9342a83973bac313322a007d72d0
Signed-off-by: Gaurang Tapase <gtapase@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48698
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Qian Yingjin [Fri, 23 Sep 2022 09:34:09 +0000 (05:34 -0400)]
EX-5936 pcc: dont take UPDATE lock when set lustre.pin xattr
In this patch, we do not take the UPDATE lock when setting the lustre.pin
XATTR during the PCC pin command.
The reason is that it may revoke the combined UPDATE|LAYOUT lock
cached on the client namespace, and invalidate the layout and PCC
cache.
As caching of the lustre.pin xattr in the client XATTR cache is disabled,
it does not cause a problem to skip the UPDATE lock bit when
setting the lustre.pin XATTR.
Add test case: sanity-pcc/204d.
Change-Id: I35a0e399294020efdb0e4710500e8f7b846c290f
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48638
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Alexander Boyko [Wed, 5 Oct 2022 07:06:59 +0000 (00:06 -0700)]
LU-14599 osp: limit allocation at osp_sync_process_committed
Sometimes osp cancels a very large cookie list with 64K elements.
In this case osp_sync_process_committed() tries to allocate 64 pages
and uses vmalloc.
The fix limits the memory allocation size to 4 pages with kmalloc, and
reuses it in a loop.
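The "small buffer reused in a loop" pattern can be sketched in userspace as follows. The names, the batch size and the counting sink are assumptions made for this example; the real code uses kmalloc and llog cancel operations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_CHUNK 512  /* hypothetical fixed batch size, in cookies */

/* Hypothetical sink that "cancels" one batch; here it only counts calls. */
static size_t demo_cancel_batch(const unsigned long *batch, size_t n,
                                size_t *calls)
{
    (void)batch;
    (*calls)++;
    return n;
}

/* Cancel 'total' cookies using one small buffer that is reused for every
 * batch, instead of one allocation sized for the whole list.
 * Returns the number cancelled, or 0 if the buffer cannot be allocated. */
static size_t demo_cancel_all(const unsigned long *cookies, size_t total,
                              size_t *calls)
{
    unsigned long *buf;
    size_t done = 0;

    assert(calls != NULL);
    assert(cookies != NULL || total == 0);

    buf = malloc(DEMO_CHUNK * sizeof(*buf));
    if (buf == NULL)
        return 0;
    while (done < total) {
        size_t n = total - done;

        if (n > DEMO_CHUNK)
            n = DEMO_CHUNK;
        memcpy(buf, cookies + done, n * sizeof(*buf));
        done += demo_cancel_batch(buf, n, calls);
    }
    free(buf);
    return done;
}
```

The allocation size stays constant however long the cookie list grows; a list of 1300 cookies is handled in three batches of 512, 512 and 276.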
Lustre-change: https://review.whamcloud.com/43250
Lustre-commit:
9b692e2e7d105f4926649ea46007ac65b24c4b6d
HPE-bug-id: LUS-9815
Fixes:
6d7332102 ("LU-11924 osp: combine llog cancel operations")
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: Ic875335a28f78494fdb3cbc4b0145e5a43831ee8
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48773
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Mikhail Pershin [Mon, 5 Sep 2022 07:41:37 +0000 (10:41 +0300)]
LU-16135 lod: prohibit DoM pattern in plain layout
DoM pattern can be set as default directory plain layout by
older LFS version. It misses DoM component sanity checks if
plain layout is used. Such layout is not allowed and causes
later crashes when a file is created under that directory.
LFS can prevent this, but not in all Lustre versions,
so LOD should do the check as well.
Lustre-change: https://review.whamcloud.com/48433
Lustre-commit:
a8272168e3888ec4ced18035182159a8ee56a51a
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ic58fdda2ab3e63083128cb6cf949fcb43ccd2c02
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48514
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Etienne AUJAMES [Thu, 21 Oct 2021 14:31:01 +0000 (16:31 +0200)]
LU-15132 hsm: Protect against parallel HSM restore requests
Multiple parallel accesses (read/write) to the same released file
could cause multiple HSM restore requests to be sent.
On the MDT side, each restore request waits for the first one to complete
before grabbing the MDS_INODELOCK_LAYOUT LCK_EX and registering the
llog record.
This could cause several MDT threads to hang for the same restore
request sent in parallel. In the worst case, all MDT threads can
hang and the MDS is no longer able to handle requests.
This patch checks if an HSM restore handle exists before taking the
lock.
Lustre-change: https://review.whamcloud.com/45367
Lustre-commit:
66b3e74bccf1451d135b7f331459b6af1c06431b
Test-Parameters: testlist=sanity-hsm,sanity-hsm
Test-Parameters: testlist=sanity-hsm env=ONLY=12s,ONLY_REPEAT=50
Signed-off-by: Xing Huang <hxing@ddn.com>
Change-Id: I9584edc2c7411aa41b2e318e55f57c117d1c3dfb
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48650
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Vitaly Fertman [Tue, 4 Oct 2022 17:30:08 +0000 (10:30 -0700)]
LU-16062 ldlm: improve bl_timeout for prolong
If there is a client's RPC in hand, we can do a better job for
calculating the lock callback timeout as RPC has the info what
client thinks about this RPC timeout. Let's use it.
Lustre-change: https://review.whamcloud.com/48094
Lustre-commit:
34b2246e4a6c8ce827c404cb4e52f7c6a0a1b90b
HPE-bug-id: LUS-8866, LUS-11074
Signed-off-by: Vitaly Fertman <c17818@cray.com>
Change-Id: Ibd67d37c1073d0d3cb2e08b532c801af0de116fe
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48762
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Vitaly Fertman [Tue, 4 Oct 2022 17:24:31 +0000 (10:24 -0700)]
LU-14183 ldlm: wrong ldlm_add_waiting_lock usage
exp_bl_lock_at originally accounted for the period from when the BLAST
was sent until the cancel RPC reached the server. LU-6032 started to
update l_blast_sent for expired locks which are still busy, i.e. locks
prolonged when the timeout expired. In fact, it is a good idea to cover
not the whole period but only the time until any involved RPC arrives:
this avoids excessively large lock callback timeouts, and the IO which
prolongs the lock is also able to restart the AT cycle by updating
l_blast_sent.
Unfortunately, the change seems to have been made by accident, as the
main prolong code was not adjusted accordingly.
Lustre-change: https://review.whamcloud.com/40868
Lustre-commit: af07c9a79e263f940fea06a911803097b57b55f4
Fixes: 292aa42e08 ("LU-6032 ldlm: don't disable softirq for exp_rpc_lock")
HPE-bug-id: LUS-9278
Signed-off-by: Vitaly Fertman <c17818@cray.com>
Change-Id: Idc598508fc13aa33ac9fce56f13310ca6fc819d4
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48761
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Lei Feng [Thu, 30 Jun 2022 02:46:31 +0000 (10:46 +0800)]
LU-15986 ptlrpc: protect rq_repmsg in ptlrpc_req_drop_rs()
There is a race condition on the server side: one thread has sent an
early reply and is deleting the reply message, while another is
searching for an existing request and prints some debug information
in _debug_req() if there is a duplicate request. They both operate on
req->rq_repmsg, but it is not protected in ptlrpc_req_drop_rs().
So protect it with req->rq_early_free_lock.
Lustre-change: https://review.whamcloud.com/47839
Lustre-commit: aaef545cff2dd958418ec9fb364d4bbe1408edb9
Signed-off-by: Lei Feng <flei@whamcloud.com>
Change-Id: Ied55427ee15c3ef84bdd2d579844eba398dbf010
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/47860
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Yang Sheng [Mon, 19 Sep 2022 05:46:27 +0000 (13:46 +0800)]
LU-16166 ptlrpc: lower the message level in no resend case
Don't report the wrong generation as an error message in
the rq_no_resend case.
Lustre-change: https://review.whamcloud.com/48585
Lustre-commit: d13cca56a5ae2ad44d8083025e37263e408b8f62
Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: I534cadc916fcd1eb6840439b6507e646d0e5d974
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48807
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Artem Blagodarenko [Wed, 28 Sep 2022 14:28:11 +0000 (10:28 -0400)]
EX-6069 ldiskfs: ext4-simple-blockalloc.patch small fixes
LU-14305 requires some cleanup.
The MB_DEFAULT_MAX_CX_BYTES #defines are no longer used
and should be removed.
Also, in the el8 version of the patch for b_es6_0,
the THRESHOLD_BLOCKS() function should explicitly take "sbi"
as a parameter.
Test-Parameters: trivial
Fixes: d5d5cfdde2 ("add persistent tuning for mb_c3_threshold")
Change-Id: Idcb93432fdfa7694b4e7cabbf46a0bf21a412f87
Signed-off-by: Artem Blagodarenko <ablagodarenko@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48714
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Serguei Smirnov [Fri, 23 Sep 2022 19:29:59 +0000 (12:29 -0700)]
LU-16184 o2iblnd: fix deadline for tx on peer queue
In o2iblnd, deadline is checked for txs on peer queue,
but not set prior to adding the tx to the queue. This
may cause the tx to be dropped unnecessarily with
"Timed out tx for ..." warning.
Fix it by setting the tx_deadline when adding tx to peer queue.
Lustre-change: https://review.whamcloud.com/48640
Lustre-commit: 4c89ee7d7b098c7f1e6566f49fa2940db577518d
Test-Parameters: trivial
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: Ie7cf5590b440b60f71527049953a64bb31d53578
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48641
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
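The LU-16184 fix above can be illustrated with a minimal user-space sketch. The names here (`sketch_tx`, `sketch_peer`, `sketch_queue_tx`, `sketch_peer_has_timed_out_tx`) are hypothetical and are not the o2iblnd structures; only the idea of stamping the deadline at the moment the tx joins the peer queue comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a transmit descriptor. */
struct sketch_tx {
	uint64_t tx_deadline_ns;   /* absolute time after which the tx is stale */
	struct sketch_tx *tx_next; /* next tx on the peer queue */
};

/* Hypothetical peer with a singly linked queue of pending txs. */
struct sketch_peer {
	struct sketch_tx *queue_head;
	struct sketch_tx *queue_tail;
};

/* Queue a tx on the peer and set its deadline in the same step. */
static void sketch_queue_tx(struct sketch_peer *peer, struct sketch_tx *tx,
			    uint64_t now_ns, uint64_t timeout_ns)
{
	assert(peer != NULL && tx != NULL);

	tx->tx_deadline_ns = now_ns + timeout_ns;
	tx->tx_next = NULL;
	if (peer->queue_tail != NULL)
		peer->queue_tail->tx_next = tx;
	else
		peer->queue_head = tx;
	peer->queue_tail = tx;
}

/* Return true if any queued tx has reached its deadline. */
static bool sketch_peer_has_timed_out_tx(const struct sketch_peer *peer,
					 uint64_t now_ns)
{
	for (const struct sketch_tx *tx = peer->queue_head; tx != NULL;
	     tx = tx->tx_next)
		if (now_ns >= tx->tx_deadline_ns)
			return true;
	return false;
}
```

If the deadline field were left at zero when the tx was queued, the check in `sketch_peer_has_timed_out_tx()` would report a timeout on the first scan, which mirrors the unnecessary drop the commit describes.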
Bobi Jam [Thu, 15 Sep 2022 06:46:34 +0000 (14:46 +0800)]
LU-16160 osc: take ldlm lock when queue sync pages
osc_queue_sync_pages() adds an osc_extent to the osc_object's IO extent
list without taking ldlm locks, and then it calls
osc_io_unplug_async() to queue the IO work for the client.
This patch makes sync page queuing take the ldlm lock in the
osc_extent.
Lustre-change: https://review.whamcloud.com/48557
Lustre-commit: 67aca1fcc6bed20794832decdba590a758d67d8f
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Idefa2981e62a2a6e10d8b8a7692c0337b61b9052
Reviewed-on: https://review.whamcloud.com/48597
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Alexandre Ioffe [Wed, 21 Sep 2022 19:15:03 +0000 (12:15 -0700)]
EX-5932 lipe: stratagem-hp-config.sh has wrong MDTLIST
stratagem-hp-config.sh doesn't pick up the proper MDTLIST
if snapshot agents are running. Fix MDTLIST, which is used
to configure lpurge.
Test-Parameters: trivial
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Change-Id: Ic1d58d56f1acae140122d0b582410c140759e89e
Reviewed-on: https://review.whamcloud.com/48619
Reviewed-by: Shuichi Ihara <sihara@ddn.com>
Reviewed-by: Colin Faber <cfaber@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Emoly Liu [Thu, 15 Sep 2022 01:42:47 +0000 (09:42 +0800)]
LU-16154 obdclass: free inst_name correctly
In function class_config_llog_handler(), inst_name should be freed
correctly before break.
Lustre-change: https://review.whamcloud.com/48542
Lustre-commit: e7f17c5e0c95dba3b80e192e4ca3628cc42e64b9
Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Change-Id: I6adc0ed62c3c637237834b799f25666d0e7e1ecb
Reviewed-on: https://review.whamcloud.com/48670
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
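The pattern behind LU-16154 above, releasing a temporary buffer on every path that leaves a `switch` case, can be sketched in user space as follows. This is not class_config_llog_handler(); the function names, the `sketch_live_allocs` counter and the error condition (`inst < 0`) are assumptions made purely for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static int sketch_live_allocs; /* buffers allocated but not yet freed */

/* Build "<base>-<inst>" in a freshly allocated buffer. */
static char *sketch_alloc_name(const char *base, int inst)
{
	size_t len = strlen(base) + 32;
	char *name = malloc(len);

	if (name != NULL) {
		snprintf(name, len, "%s-%d", base, inst);
		sketch_live_allocs++;
	}
	return name;
}

static void sketch_free_name(char *name)
{
	if (name != NULL) {
		free(name);
		sketch_live_allocs--;
	}
}

enum sketch_cmd { SKETCH_CMD_SETUP, SKETCH_CMD_SKIP };

/* Handle one record; every exit from the SETUP case frees inst_name. */
static int sketch_handle_record(enum sketch_cmd cmd, const char *base,
				int inst)
{
	char *inst_name = NULL;
	int rc = 0;

	switch (cmd) {
	case SKETCH_CMD_SETUP:
		inst_name = sketch_alloc_name(base, inst);
		if (inst_name == NULL) {
			rc = -ENOMEM;
			break;
		}
		if (inst < 0) {
			/* Error path: free the buffer before the break. */
			sketch_free_name(inst_name);
			rc = -EINVAL;
			break;
		}
		/* Normal processing would use inst_name here. */
		sketch_free_name(inst_name);
		break;
	case SKETCH_CMD_SKIP:
		break;
	}
	return rc;
}
```

Dropping the `sketch_free_name()` call on the error path would leave `sketch_live_allocs` non-zero after a failed record, which is the kind of leak the commit message refers to.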
Jian Yu [Mon, 26 Sep 2022 18:22:56 +0000 (11:22 -0700)]
LU-16050 build: replace ofed_info with dpkg/rpm
After installing MLNX_OFED by running the mlnxofedinstall command,
the mlnx-ofed-kernel-modules package is not listed by ofed_info,
which causes Lustre configure to fail as follows:
checking whether to use Compat RDMA... /usr/bin/ofed_info
dpkg-query: error: --listfiles needs at least one package name argument
This patch fixes the above issue by replacing ofed_info with
"dpkg -l" and "rpm -qa" commands to find OFED package.
Lustre-change: https://review.whamcloud.com/48047
Lustre-commit: 3a7930e63c15b0fbe51ac73db81a1186939115bb
Test-Parameters: trivial
Fixes: ec03c9628cae ("LU-15417 build: find the new path for MOFED 5.5")
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Change-Id: Ia3c2d6bf10e147ca2761221741eff6f93008556c
Reviewed-by: Gaurang Tapase <gtapase@ddn.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/48662
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Jian Yu [Wed, 28 Sep 2022 16:54:33 +0000 (09:54 -0700)]
EX-6014 tests: Revert "EX-4093 tests: hot-pools don't recreate pools"
This reverts commit 116cbacc52d8 to resolve the hot-pools
regression test failures.
After running sub-test 1, the OST pools were destroyed by
the following stack_trap in create_pool():
stack_trap "destroy_test_pools $fsname" EXIT
If the pools are not recreated in the successive sub-tests,
then they will fail. We have to revert commit 116cbacc52d8
until we find a way to avoid triggering the stack_trap
between sub-tests.
Test-Parameters: trivial mdscount=2 mdtcount=4 \
testlist=parallel-scale-nfsv4,hot-pools
Fixes: 116cbacc52d8 ("EX-4093 tests: hot-pools don't recreate pools")
Change-Id: I464a1f9f380c55e70b78a0dd7e52723d5b0a298d
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/48690
Reviewed-by: Alexandre Ioffe <aioffe@ddn.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Andreas Dilger [Fri, 23 Sep 2022 22:24:58 +0000 (16:24 -0600)]
RM-620 build: New tag 2.14.0-ddn62
New tag 2.14.0-ddn62
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I21b71b04905a70acbaada6d5a7fbab6c9184ca51
Andreas Dilger [Fri, 23 Sep 2022 19:36:53 +0000 (19:36 +0000)]
Revert "EX-4141 lipe: lamigo should detect dead OST and restart ALR"
This reverts commit 028bee14d2c6d8feb5eb418302f8751643e731c6 due to build error.
Change-Id: I6193f3e99192b618a3e6616524e28b230659fc0b
Reviewed-on: https://review.whamcloud.com/48639
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Andreas Dilger [Fri, 23 Sep 2022 17:19:23 +0000 (11:19 -0600)]
RM-620 build: New tag 2.14.0-ddn61
New tag 2.14.0-ddn61
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I34c78bc6ce2fbac65e4e8b017cad1da05c78d53a
Minh Diep [Thu, 15 Sep 2022 03:41:37 +0000 (20:41 -0700)]
LU-16183 tests: sanity-hsm/70 should detect python
Check for python2 and python3 explicitly, since the
generic python command does not exist in newer distros.
Test-Parameters: env=SLOW=yes,ENABLE_QUOTA=yes \
clientdistro=sles15sp3 testlist=sanity-hsm
Test-Parameters: env=SLOW=yes,ENABLE_QUOTA=yes \
clientdistro=el7.9 testlist=sanity-hsm
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Change-Id: I2251be461129310868868277bf9d46015545ffe2
Reviewed-on: https://review.whamcloud.com/48577
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Alexandre Ioffe [Tue, 29 Mar 2022 07:48:35 +0000 (00:48 -0700)]
EX-4141 lipe: lamigo should detect dead OST and restart ALR
Use #keepalive messages and ssh read with timeout
to detect that an OST is down and restart ALR.
Add stats for the ALR last seen message.
Duplicate ofd_access_log_reader from lustre/utils into
lipe/src/es_ofd_access_log_reader.
Use a common lamigo_hash.h for lamigo and
es_ofd_access_log_reader.
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Test-Parameters: trivial testlist=hot-pools
Change-Id: I26dc631a8663046821e049fc6e091108b2a62f87
Reviewed-on: https://review.whamcloud.com/46944
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John Hammond <jhammond@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Chris Horn [Tue, 24 Aug 2021 16:16:17 +0000 (11:16 -0500)]
LU-14962 lnet: Check for -ESHUTDOWN in lnet_parse
The fix for LU-8106, http://review.whamcloud.com/19993, no longer
works because rc does not have the return value from
lnet_nid2peerni_locked(). Use PTR_ERR to get the return value and
restore the LU-8106 fix.
Lustre-change: https://review.whamcloud.com/44743
Lustre-commit: cce82630cbf2c7badbbdd16a8ca9c8c0065ded13
Test-Parameters: trivial
HPE-bug-id: LUS-10333
Fixes: fa8b4e6357 ("LU-7734 lnet: peer/peer_ni handling adjustments")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I9cc2bc2d6e675d38cf06d99c524bdd95110bf0e9
Reviewed-on: https://review.whamcloud.com/48487
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
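LU-14962 above relies on the kernel convention of encoding a negative errno inside a returned pointer. The sketch below re-implements that convention in user space to show how the error code is recovered with a PTR_ERR-style helper. The `sketch_*` helpers mimic the kernel's ERR_PTR(), PTR_ERR() and IS_ERR(), while `sketch_lookup()` and `sketch_parse()` are hypothetical stand-ins and do not reproduce lnet_nid2peerni_locked() or lnet_parse().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

/* User-space imitations of the kernel's ERR_PTR/PTR_ERR/IS_ERR helpers. */
static inline void *sketch_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static inline long sketch_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline bool sketch_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

struct sketch_peer_ni {
	int id;
};

static struct sketch_peer_ni sketch_known_peer = { 7 };

/* Hypothetical lookup: fails with -ESHUTDOWN while shutting down. */
static struct sketch_peer_ni *sketch_lookup(bool shutting_down)
{
	if (shutting_down)
		return sketch_err_ptr(-ESHUTDOWN);
	return &sketch_known_peer;
}

/* Hypothetical caller: the error code must come from the pointer itself. */
static int sketch_parse(bool shutting_down)
{
	struct sketch_peer_ni *lpni = sketch_lookup(shutting_down);
	int rc = 0;

	if (sketch_is_err(lpni)) {
		/* rc was never assigned by the lookup; decode the pointer. */
		rc = (int)sketch_ptr_err(lpni);
		return rc;
	}
	return rc;
}
```

The point of the sketch is only that a separate `rc` variable holds nothing useful after such a lookup; the code has to be decoded from the returned pointer before it can be compared with `-ESHUTDOWN`.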
Chris Horn [Thu, 3 Mar 2022 07:12:32 +0000 (01:12 -0600)]
LU-15618 lnet: Return ESHUTDOWN in lnet_parse()
If the peer NI lookup in lnet_parse() fails with ESHUTDOWN then we
should return that value back to the LNDs so that they can treat the
failed call the same way as other lnet_parse() failures.
Returning zero results in at least one bug in socklnd where a
reference on a ksock_conn can be leaked which prevents socklnd from
shutting down.
Lustre-change: https://review.whamcloud.com/46711
Lustre-commit: 4fbd0705a3d25bbc85e953f81e697e5006b215ce
Fixes: 47b7b31978 ("LU-8106 lnet: Do not drop message when shutting down LNet")
Test-Parameters: trivial testlist=sanity-lnet
HPE-bug-id: LUS-15794
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ic403619c6dccf3921c46a674808c404adad7a30e
Reviewed-on: https://review.whamcloud.com/48485
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Chris Horn [Mon, 7 Mar 2022 17:03:50 +0000 (11:03 -0600)]
LU-15616 lnet: ln_api_mutex deadlocks
LNetNIFini() acquires the ln_api_mutex and holds onto it throughout
various shutdown routines. Meanwhile, LND threads (via
lnet_nid2peerni_locked()) or the discovery thread (via
lnet_peer_data_present()) may need to acquire this mutex in order to
progress.
Address these potential deadlocks by setting the_lnet.ln_state to
LNET_STATE_STOPPING earlier in LNetNIFini(), and release the mutex
prior to any call into LND module or before any wait.
LNetNIInit() is modified to return -ESHUTDOWN if it finds that there
is a concurrent shutdown in progress.
Lustre-change: https://review.whamcloud.com/46727
Lustre-commit: 22de0bd145b649768b16dd42559d326af3c13200
Test-Parameters: trivial
HPE-bug-id: LUS-10681
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ia8b28cc95ff71e66a0f99aed4f2c22ec9d44ce1e
Reviewed-on: https://review.whamcloud.com/48384
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Chris Horn [Fri, 11 Dec 2020 18:04:32 +0000 (12:04 -0600)]
LU-13806 lnet: Ensure proper peer, peer NI, peer net hierarchy
The MR design dictates that the peer nets and peer NIs are ordered
such that the peer net and peer NI for a peer's primary NID appears
first, followed by other peer NIs in the primary NID's peer net,
followed by other peer nets/NIs. This ordering is broken and it can
result in tripping an assertion if the primary NID of a peer is
deleted. Modify lnet_peer_attach_peer_ni() to check whether the
NI being attached is the peer's primary, and place it, and its
associated peer net, appropriately.
Modify lnet_peer_set_primary_nid() so that it updates the
lp_primary_nid before calling lnet_peer_add_nid() so that
lnet_peer_attach_peer_ni() can detect the situation where the
primary is changing and act appropriately.
Finally, modify lnet_peer_merge_data() to enforce the hierarchy
after it has finished merging the contents of the ping buffer. This
ensures we maintain the correct hierarchy in certain edge cases where
we've needed to reconcile two peers. e.g. if a peer adds a new
interface, the discovery push may arrive from that new interface
which will result in a second peer object being created which will
need to be reconciled with the original peer object.
Lustre-change: https://review.whamcloud.com/40985
Lustre-commit: 9eb9474c41c823c70f34e6bb102a8861ca21a3d1
HPE-bug-id: LUS-9630
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I8397a24ba1ba0bba33846e7e97b8d60a8f26a1be
Reviewed-on: https://review.whamcloud.com/48508
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Chris Horn [Sat, 5 Feb 2022 23:15:30 +0000 (23:15 +0000)]
LU-15538 lnet: DLC sets map_on_demand incorrectly
When any NET or LND tunable is specified via CLI or yaml, then the
whole tunables struct gets memset to 0, or in the case of yaml config,
0 gets assigned to any tunable that isn't specified in the yaml. This
causes a problem for map_on_demand because 0 is a valid value for that
parameter, and ko2iblnd cannot know whether the user specified that 0
should be used or if DLC is specifying that the parameter was unset.
Rather than setting this parameter to 0 in the LND tunables struct,
have DLC set it to UINT_MAX to indicate that ko2iblnd should use the
value of the kernel module parameter.
Lustre-change: https://review.whamcloud.com/46492
Lustre-commit: 896f4a082b93453f5e7168f685faff4fba594ff3
Test-Parameters: trivial
HPE-bug-id: LUS-10740
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I303e64d4d402ba61b5ae3e3910873f192a4a2845
Reviewed-on: https://review.whamcloud.com/48491
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
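The sentinel approach described in LU-15538 above can be sketched as follows. The struct, macro and function names are hypothetical and do not match the DLC or ko2iblnd sources; the only detail taken from the commit message is that UINT_MAX marks "not specified" so that 0 remains a usable explicit value.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Sentinel meaning "the user did not specify map_on_demand". */
#define SKETCH_MAP_ON_DEMAND_UNSET UINT_MAX

/* Hypothetical subset of the LND tunables passed down from user space. */
struct sketch_lnd_tunables {
	unsigned int map_on_demand;
};

/*
 * Pick the value to use: the explicit tunable if one was given
 * (including an explicit 0), otherwise the module parameter.
 */
static unsigned int
sketch_effective_map_on_demand(const struct sketch_lnd_tunables *tunables,
			       unsigned int module_param)
{
	assert(tunables != NULL);

	if (tunables->map_on_demand == SKETCH_MAP_ON_DEMAND_UNSET)
		return module_param;
	return tunables->map_on_demand;
}
```

With a zero-filled struct the two cases are indistinguishable; with the sentinel, an explicit 0 is honoured and only the unset case falls back to the module parameter.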
Alex Zhuravlev [Wed, 21 Sep 2022 00:40:46 +0000 (17:40 -0700)]
EX-4093 tests: hot-pools don't recreate pools
The test can save some time by skipping pool re-creation in every
subtest.
before: 1371 seconds
after: 1058 seconds
Test-Parameters: trivial testlist=hot-pools
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I9304e29b6fc59dd68626b44844dc81500009a80f
Reviewed-on: https://review.whamcloud.com/48614
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Alexandre Ioffe [Thu, 8 Sep 2022 08:37:31 +0000 (01:37 -0700)]
EX-5824 test: hot-pools test_57: data copy failed: mirror failed
Add debug prints in hot-pools test_57
Test-Parameters: trivial env=FAIL_ON_ERROR=false,ONLY=56-57 testlist=hot-pools
Change-Id: I863b580f5483c14c24c6f79ebdddbc782b65e945
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Reviewed-on: https://review.whamcloud.com/48477
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
James Nunez [Mon, 13 Sep 2021 16:35:30 +0000 (10:35 -0600)]
LU-14992 tests: sanity/replay-vbr mkdir on MDT0
Replace mkdir with mkdir_on_mdt0() for sanity test 133a
and replay-vbr test 7a. These tests expect the newly
created directory to be on MDT0.
Lustre-change: https://review.whamcloud.com/44902/
Lustre-commit: TBD
Test-Parameters: trivial mdscount=2 mdtcount=4 testlist=sanity
Test-Parameters: env=SLOW=yes mdscount=2 mdtcount=4 testlist=replay-vbr
Change-Id: Icea2923a8d8d3a3aa0ddf0401f0a025480b2f6f0
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/48606
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Alex Zhuravlev [Tue, 30 Mar 2021 05:57:14 +0000 (08:57 +0300)]
LU-13358 libcfs: add timeout to cfs_race() to fix race
There is no guarantee that the branches in cfs_race() are executed
in strict order, so it is possible that the second branch (with
cfs_race_state=1) is executed before the first branch, and then another
thread executing the first branch gets stuck.
This construction is used for testing only, so as a
workaround it is enough to time out.
Lustre-change: https://review.whamcloud.com/43161
Lustre-commit: 2d2d381f35ee004319a20f5d2d8e70d13480d6c7
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ie1cc0accedb3e1a198d4b17d1ab00ce298c560f2
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/48553
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>