fs/lustre-release.git
2 years ago LU-16293 kernel: kernel update RHEL9.0 [5.14.0-70.30.1.el9_0]
Jian Yu [Tue, 8 Nov 2022 18:40:24 +0000 (10:40 -0800)]
LU-16293 kernel: kernel update RHEL9.0 [5.14.0-70.30.1.el9_0]

Update RHEL9.0 kernel to 5.14.0-70.30.1.el9_0 for Lustre client.

Lustre-change: https://review.whamcloud.com/49044
Lustre-commit: TBD (from 247849f22a32e85eb8b718d18642f65ac7663a82)

Test-Parameters: trivial clientdistro=el9.0 \
env=SANITY_EXCEPT="101j 130 244a" testlist=sanity

Signed-off-by: Jian Yu <yujian@whamcloud.com>
Change-Id: Ide942f88242c80af1e103b226b65cfbce94bfb57
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49074
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-15935 target: keep track of multirpc slots in last_rcvd
Etienne AUJAMES [Fri, 29 Jul 2022 12:35:33 +0000 (14:35 +0200)]
LU-15935 target: keep track of multirpc slots in last_rcvd

OBD_INCOMPAT_MULTI_RPCS is cleared by tgt_boot_epoch_update() if the
recovery is aborted. This supposes that all the clients are evicted
but that is not true. Some clients could have successfully finished
their recovery. In that case, those clients will keep their last_rcvd
slot.

This patch modifies lut_num_client to keep track of multirpc
slots in last_rcvd.
For now the counter is used only by tgt_fini() to clear
OBD_INCOMPAT_MULTI_RPCS, so we can expand this use case to
tgt_boot_epoch_update().

Add replay-dual test_33.

Lustre-change: https://review.whamcloud.com/48082
Lustre-commit: 1a79d395dd61ea2e21598bfaa5b39375e64ec22c

Test-Parameters: testlist=replay-dual env=ONLY=33,ONLY_REPEAT=30
Signed-off-by: Xing Huang <hxing@ddn.com>
Change-Id: I70791c9dcb7cc77f018b9e5c95568598d54f0322
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49040
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-15404 ldiskfs: truncate during setxattr leads to kernel panic
Andrew Perepechko [Thu, 10 Nov 2022 04:59:27 +0000 (20:59 -0800)]
LU-15404 ldiskfs: truncate during setxattr leads to kernel panic

When changing a large xattr value to a different large xattr value,
the old xattr inode is freed. Truncate during the final iput causes
current transaction restart. Eventually, parent inode bh is marked
dirty and kernel panic happens when jbd2 figures out that this bh
belongs to the committed transaction.

A possible fix is to call this final iput in a separate thread.
This way, setxattr transactions will never be split into two.
Since the setxattr code adds xattr inodes with nlink=0 into the
orphan list, old xattr inodes will be properly cleaned up in
any case.

Lustre-change: https://review.whamcloud.com/46358
Lustre-commit: e239a14001b62d96c186ae2c9f58402f73e63dcc

Change-Id: Idd70befa6a83818ece06daccf9bb6256812674b9
Signed-off-by: Andrew Perepechko <andrew.perepechko@hpe.com>
HPE-bug-id: LUS-10534
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48999
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago LU-16251 obdclass: fill jobid in a safe way
Lei Feng [Wed, 19 Oct 2022 04:10:23 +0000 (12:10 +0800)]
LU-16251 obdclass: fill jobid in a safe way

jobid_interpret_string() does not fill jobid in an atomic way.
So in lustre_get_jobid() give it a buffer first, then copy the
buffer to jobid as a whole.
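
The "fill a scratch buffer, then copy it as a whole" pattern can be
sketched roughly as below. This is an illustration in Python, not the
Lustre code: interpret_into() and JobIdHolder are hypothetical names, and
the %e/%u codes are used only as sample format codes.

```python
def interpret_into(parts, fmt, values):
    """Expand %-codes piece by piece into `parts` (a non-atomic fill)."""
    i = 0
    while i < len(fmt):
        if fmt[i] == "%" and i + 1 < len(fmt):
            parts.append(str(values.get(fmt[i + 1], "")))
            i += 2
        else:
            parts.append(fmt[i])
            i += 1


class JobIdHolder:
    """Hypothetical holder for the published jobid."""

    def __init__(self):
        self.jobid = ""

    def update(self, fmt, values):
        scratch = []                       # private buffer, invisible to readers
        interpret_into(scratch, fmt, values)
        self.jobid = "".join(scratch)      # one assignment publishes the whole value


holder = JobIdHolder()
holder.update("%e.%u", {"e": "dd", "u": 1000})
```

Readers of holder.jobid see either the old value or the complete new one,
never a half-expanded string.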

Lustre-change: https://review.whamcloud.com/48915
Lustre-commit: 9a0a89520e8b57bd63a9343fe3cdc56c61c41f6d

Signed-off-by: Lei Feng <flei@whamcloud.com>
Change-Id: Ib8f6aaa93df31867982a0d142f33d7374a27234f
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49081
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago LU-16061 osd-ldiskfs: clear EXTENT_FL for symlink agent inode
Alexander Zarochentsev [Fri, 29 Jul 2022 19:38:09 +0000 (22:38 +0300)]
LU-16061 osd-ldiskfs: clear EXTENT_FL for symlink agent inode

The flag should be cleared for "fast" symlinks otherwise
e2fsck complains about inode correctness.
New agent inodes of symlink type may have EXT4_EXTENT_FL flag
set if the fs has "extent" feature and it is not cleared as in
other places where "fast" symlinks are created.

Lustre-change: https://review.whamcloud.com/48093
Lustre-commit: 73ac8e35e5d64d3fe4ca6c48514dc57058e3a7b8

HPE-bug-id: LUS-10237
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: Ib7b807bb1298cc3a9fd4fdba35747b4bda6fe034
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49016
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-16258 llite: Explicitly support .splice_write
Shaun Tancheff [Fri, 21 Oct 2022 04:54:49 +0000 (23:54 -0500)]
LU-16258 llite: Explicitly support .splice_write

Linux commit v5.9-rc1-6-g36e2c7421f02
  fs: don't allow splice read/write without explicit ops

Lustre supports splice_write and previously provided handlers
for splice_read.
Explicitly use iter_file_splice_write, if it exists.

Lustre-change: https://review.whamcloud.com/48928
Lustre-commit: c619b6d6a54235cc0e34a65cf5916a632f4011c3

HPE-bug-id: LUS-11259
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I858688fc9b4dd370b6018c3b134f01e580477b25
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49047
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-16207 build: add rpm-build BuildRequires for SLES15 SP3
Jian Yu [Tue, 4 Oct 2022 16:24:36 +0000 (09:24 -0700)]
LU-16207 build: add rpm-build BuildRequires for SLES15 SP3

SLES15 SP3 fails to build using rpm-build-4.14.1-29.46
from the main O/S repository with error message:

- Dependency tokens must begin with alpha-numeric,
  '_' or '/': BuildRequires: %kernel_module_package_buildreqs

Updating rpm-build to 4.14.3-150300.46.1 or higher
resolved the build issue.

Test-Parameters: trivial clientdistro=sles15sp3 \
testlist=sanity

Lustre-change: https://review.whamcloud.com/48760
Lustre-commit: 78c681d9f42cb56e30c8946e5d7b05f0bc6e86f2

Change-Id: I80099e7ba2d98e07b9877183879766f3dd7f3c1a
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Colin Faber <cfaber@ddn.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49079
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago EX-5473 tests: add version check for interop
Minh Diep [Wed, 9 Nov 2022 21:21:32 +0000 (13:21 -0800)]
EX-5473 tests: add version check for interop

sanity-quota test_75 on 2.12 servers

Test-Parameters: trivial testlist=sanity-quota

Change-Id: I57f5b6415017ec7cf81e3bcb43f289087a8621fd
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49089
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago EX-6331 lipe: lamigo --help causes Segmentation fault
Alexandre Ioffe [Tue, 8 Nov 2022 18:32:25 +0000 (10:32 -0800)]
EX-6331 lipe: lamigo --help causes Segmentation fault

Fixed printf NULL string argument which causes the seg fault

Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Test-Parameters: trivial testlist=hot-pools
Change-Id: I0a9bc3cee308c8cd88d23674bb5127cddb1fdb41
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49073
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-15847 target: report multiple transno to debug log
Mikhail Pershin [Wed, 26 Oct 2022 08:17:11 +0000 (11:17 +0300)]
LU-15847 target: report multiple transno to debug log

Don't report multiple transaction cases to the console;
log them as a debug message instead.

Lustre-change: https://review.whamcloud.com/49027
Lustre-commit: TBD (1550da71c46f65b72951c0348f32835ed7f617fb)

Fixes: 4e2e8fd2fc0a ("LU-15847 tgt: reply always with the latest assigned transno")
Test-Parameters: trivial
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: If9b47dfedcaf67487954189e8a75d2029a502469
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49027
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago EX-6298 tests: add hot-pools test 72 into ALWAYS_EXCEPT list
Jian Yu [Wed, 9 Nov 2022 19:06:37 +0000 (11:06 -0800)]
EX-6298 tests: add hot-pools test 72 into ALWAYS_EXCEPT list

This patch adds hot-pools test 72 into ALWAYS_EXCEPT list before
it gets a real fix.

Test-Parameters: trivial testlist=hot-pools

Signed-off-by: Jian Yu <yujian@whamcloud.com>
Change-Id: If214f7285dfb96dee24e6c5968f1f19c81ce1ddf
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49085
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-15179 tests: add trap cleanup_quota_test
Sergey Cheremencev [Wed, 2 Nov 2022 10:08:50 +0000 (18:08 +0800)]
LU-15179 tests: add trap cleanup_quota_test

Add stack_trap cleanup_quota_test to the tests that
use setup_quota_test. If a test fails without calling
cleanup_quota_test, it may cause later tests to fail
due to used space > 0.

Remove ${tdir}_dom, if it exists, in cleanup_quota_test;
sanity-quota test_75 doesn't remove its test_dom directory.

Lustre-change: https://review.whamcloud.com/#/c/45418/
Lustre-commit: c44b2bea1bacc3cb9173353037cf3a616f13669f

Test-Parameters: trivial  testlist=sanity-quota
Fixes: a4fbe734 ("LU-14739 quota: nodemap squashed root cannot bypass quota")
Change-Id: Ife4fd499b427bee79f74a5e172d233fe6a83e240
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48705
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-14958 kernel: use rhashtable for revoke records in jbd2
Alex Zhuravlev [Wed, 12 Oct 2022 07:32:36 +0000 (10:32 +0300)]
LU-14958 kernel: use rhashtable for revoke records in jbd2

A resizable hashtable should improve journal replay time when
the journal has millions of revoke records.

before:
1048576 records - 95 seconds
2097152 records - 580 seconds

after:
1048576 records - 2 seconds
2097152 records - 3 seconds
4194304 records - 7 seconds
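
As a quick worked check of the figures above, the speedup and the growth
when the record count doubles can be computed directly. The numbers are
copied from the lists above; the variable names are only illustrative.

```python
# Replay times in seconds, keyed by number of revoke records.
before = {1048576: 95, 2097152: 580}
after = {1048576: 2, 2097152: 3, 4194304: 7}

# Speedup for the record counts measured both before and after.
speedup = {n: before[n] / after[n] for n in before}

# How much the replay time grows when the record count doubles.
growth_before = before[2097152] / before[1048576]
growth_after = after[2097152] / after[1048576]

for n in sorted(speedup):
    print(f"{n} records: {speedup[n]:.1f}x faster")
print(f"doubling cost: {growth_before:.2f}x before, {growth_after:.2f}x after")
```

Doubling the record count multiplied replay time by about 6 before the
change and by 1.5 after it.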

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I8f54a51df5e3387277b976e046eea70c26d54dcd
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48522
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-16232 scripts: changelog/updatelog emergency cleanup
Mikhail Pershin [Wed, 12 Oct 2022 09:22:14 +0000 (12:22 +0300)]
LU-16232 scripts: changelog/updatelog emergency cleanup

Emergency cleanup scripts for situations when llogs are
corrupted and can't be cleaned up in a normal way. In such
cases the recommendation is to remove/truncate those llogs.

The scripts perform all needed steps and have a debugging option to
collect llogs for further analysis.

Possible script actions are:
 - dry-run mode to check all actions and files affected
 - create archive with all llogs for analysis
 - remove llogs including all plain llogs

Lustre-change: https://review.whamcloud.com/48838
Lustre-commit: b533700add91fe4220f50d057a470e0b6f4893c9

Test-Parameters: trivial
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I3b197179bc54f451e3c5d7db36b6f1c56c076856
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49023
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-16203 llog: skip bad records in llog
Mikhail Pershin [Mon, 3 Oct 2022 15:35:25 +0000 (18:35 +0300)]
LU-16203 llog: skip bad records in llog

This patch further develops the idea of skipping bad
(corrupted) llog data. If an llog has fixed-size records
then it is possible to skip a single record rather than the
rest of the llog block.

The patch also fixes skipping to the next chunk:
 - make sure to skip to the next block for a partial chunk,
   otherwise the same block is re-read
 - handle index == 0 as the goal for llog_next_block() as an
   expected exclusion and just return the requested block
 - set the new index to the first one in the block after a
   block was skipped
 - don't create a fake padding record in llog_osd_next_block(),
   as the caller can handle it and would know about it
 - restore test_8 functionality to check corruption handling
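
The idea of skipping a single fixed-size record instead of the rest of
the block can be sketched as follows. This is a simplified model, not the
llog code: RECORD_SIZE, is_valid() and the validity rule are assumptions
made up for the illustration.

```python
RECORD_SIZE = 8  # hypothetical fixed record size, in bytes


def is_valid(record):
    # Hypothetical check: the first and last bytes of a record must match.
    return record[0] == record[-1]


def read_block(block):
    """Return (good_records, skipped_indexes) for one block."""
    good, skipped = [], []
    for index in range(len(block) // RECORD_SIZE):
        record = block[index * RECORD_SIZE:(index + 1) * RECORD_SIZE]
        if is_valid(record):
            good.append(record)
        else:
            # With fixed-size records the next record boundary is known,
            # so only this record is dropped and processing continues.
            skipped.append(index)
    return good, skipped


block = b"A......A" + b"B......X" + b"C......C"
good, skipped = read_block(block)
```

Here only the middle record is skipped; the record after it is still
processed.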

Lustre-change: https://review.whamcloud.com/48776
Lustre-commit: TBD (from 5896c420d82507f90473414df3e6d342126cc21f)

Fixes: ec4194e4e78c ("LU-11591 llog: add synchronization for the last record")
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I6f88269e8626269268352f8bfd6d7950de438f3a
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48897
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-14661 obdclass: Add peer/peer NI when processing llog
Chris Horn [Thu, 3 Sep 2020 20:06:08 +0000 (15:06 -0500)]
LU-14661 obdclass: Add peer/peer NI when processing llog

Construct peers when processing the config log so that LNet has
complete information about peer info stored in the config log.

These are "temporary" peers which can be overwritten by discovery.

In client_import_add_nids_to_conn(), we do not need to hold the
import lock when adding NIDs to the obd_uuid, and LNet needs to take
the LNet API mutex when adding/modifying peers. We don't want to take
the mutex while a spin lock is already being held, so drop the spin
lock prior to calling class_add_nids_to_uuid().
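
The locking order described above can be sketched like this. It is a
Python model under stated assumptions: Import, api_mutex and peers are
hypothetical stand-ins, and taking a snapshot of the NIDs under the lock
is an addition for the example, not something the patch states.

```python
import threading


class Import:
    """Hypothetical stand-in for an import with its own lock."""

    def __init__(self, nids):
        self.lock = threading.Lock()   # models the import spin lock
        self.nids = list(nids)


api_mutex = threading.Lock()           # models the LNet API mutex
peers = {}


def add_nids_to_uuid(uuid, nids):
    # Takes a mutex and may sleep, so it must not run under the spin lock.
    with api_mutex:
        peers.setdefault(uuid, []).extend(nids)


def add_nids_to_conn(imp, uuid):
    with imp.lock:
        snapshot = list(imp.nids)      # copy what is needed under the lock
    # The import lock is released here, before the mutex is taken.
    add_nids_to_uuid(uuid, snapshot)


imp = Import(["10.0.0.1@tcp", "10.0.0.2@tcp"])
add_nids_to_conn(imp, "OST0000_UUID")
```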

Lustre-change: https://review.whamcloud.com/43510
Lustre-commit: 16321de596f6395153be6cbb6192250516963077

HPE-bug-id: LUS-9293
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ie0e35434c9b76f917c1448064c5217c821b1ad87
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48966
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-14661 lnet: Provide kernel API for adding peers
Chris Horn [Wed, 2 Sep 2020 20:07:25 +0000 (15:07 -0500)]
LU-14661 lnet: Provide kernel API for adding peers

Implement LNetAddPeer() API to allow other kernel modules to add
peers to LNet.

Peers created via this API are not marked as having been configured
by DLC. As such, they can be overwritten by discovery.

Lustre-change: https://review.whamcloud.com/43509
Lustre-commit: ac201366ad5700edc860c139955af8a09bf53a1a

Test-Parameters: trivial
HPE-bug-id: LUS-9293
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ibb057f702ea29d60233fbd1680d8caec98064d5d
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48965
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-16269 kernel: kernel update RHEL8.6 [4.18.0-372.32.1.el8_6]
Jian Yu [Thu, 3 Nov 2022 20:09:15 +0000 (13:09 -0700)]
LU-16269 kernel: kernel update RHEL8.6 [4.18.0-372.32.1.el8_6]

Update RHEL8.6 kernel to 4.18.0-372.32.1.el8_6.

Lustre-change: https://review.whamcloud.com/48969
Lustre-commit: TBD (from c4a23690d3328447c7b4ddbb8f567b2de21457b6)

Test-Parameters: trivial fstype=ldiskfs \
clientdistro=el8.6 serverdistro=el8.6 testlist=sanity

Test-Parameters: trivial fstype=zfs \
clientdistro=el8.6 serverdistro=el8.6 testlist=sanity

Change-Id: I5576180ddf10ed2b0a5e2ef85b58fef993de65a4
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49033
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-15259 tests: use existing usernames for setfacl
Andreas Dilger [Tue, 13 Sep 2022 18:06:10 +0000 (02:06 +0800)]
LU-15259 tests: use existing usernames for setfacl

In SLES15.2 and Ubuntu 20 the "bin" and "daemon" users are not
defined in /etc/passwd, causing setfacl to print a cryptic error:

  setfacl -m u:bin:rw f -- failed
  ~     ? setfacl: Option -m: Invalid argument near character 3

Replace "bin" and "daemon" in ACL tests so they are run with user
and group names that exist on all distros currently being tested.
They can also be specified via ACLUSR1/ACLUSR2 in the test config.

The "permission_xattr" test also needs "nobody" user and group.

Also, the "getfacl" command prints users and groups in numerical
order, so the ACL tests will fail if "daemon" < "bin", or if either
group is higher than the "users" group.  Fix them as needed.
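
The ordering issue can be illustrated with a small sketch. The id tables
are made up for the example (real ids come from /etc/passwd and
/etc/group), and acl_print_order() is a hypothetical helper, not part of
the test suite.

```python
# Hypothetical name-to-id table for the example.
uids = {"bin": 1, "daemon": 2, "nobody": 65534}


def acl_print_order(names, ids):
    """Order ACL entries by numeric id rather than by name."""
    return sorted(names, key=lambda name: ids[name])


order = acl_print_order(["nobody", "daemon", "bin"], uids)

# On a system where "daemon" has the lower id, the printed order flips,
# and a test that compares against a fixed expected output fails.
swapped = acl_print_order(["bin", "daemon"], {"bin": 2, "daemon": 1})
```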

Lustre-change: https://review.whamcloud.com/45627
Lustre-commit: 60188994e24b95db5915b8e6802f7963ffb2fd9c

Test-Parameters: trivial testlist=sanity-quota,sanity-sec,pjdfstest
Test-Parameters: testlist=sanity env=ONLY=103-154 clientdistro=el7.9 serverdistro=el7.9
Test-Parameters: testlist=sanity env=ONLY=103-154 clientdistro=el8.6
Test-Parameters: testlist=sanity env=ONLY=103-154 clientdistro=ubuntu2004

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I7003e95577ab3a9314e8d4d29bb6b1784b9f8ae7
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48497
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago LU-11787 test: Fix checkfilemap tests for 64K page
James Simmons [Mon, 31 Jan 2022 17:44:46 +0000 (12:44 -0500)]
LU-11787 test: Fix checkfilemap tests for 64K page

File mapping is page-size aligned. Modify the tests to handle 64K
pages.
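
Why the expected values depend on the page size can be seen from a small
rounding sketch. align_up() and the sample length are illustrative
assumptions, not taken from the tests.

```python
def align_up(length, page_size):
    """Round `length` up to the next multiple of `page_size`."""
    return (length + page_size - 1) // page_size * page_size


PAGE_4K = 4096
PAGE_64K = 65536
length = 10000  # hypothetical extent length in bytes

mapped_4k = align_up(length, PAGE_4K)
mapped_64k = align_up(length, PAGE_64K)
```

The same 10000-byte extent maps to 12288 bytes with 4K pages and to 65536
bytes with 64K pages, so offsets hard-coded for 4K no longer match.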

Lustre-change: https://review.whamcloud.com/45629
Lustre-commit: 7c88dfd28b5cc6114a85f187ecb2473657d42c9d

Test-Parameters: trivial clientdistro=el8.5 clientarch=aarch64 testlist=sanityn env=ONLY="71a 71b"
Change-Id: I316a197db8cdd0f9064431f8c572b43adf6110b8
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48945
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-15278 lod: distinguish DIR/REGULAR lod_object members
Bobi Jam [Sat, 25 Dec 2021 14:36:40 +0000 (22:36 +0800)]
LU-15278 lod: distinguish DIR/REGULAR lod_object members

In lod_striping_free_nolock(), we need to distinguish the lod_object
type. Since the DIR and REGULAR lod_object structures share the same
memory region, it could accidentally free unintended memory if it treats
a DIR lod_object as a REGULAR one, or vice versa.
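
The need to check the object type before freeing can be modelled as a
tagged slot. This is a conceptual sketch only: LodObject, striping_free()
and the payload names are hypothetical and do not mirror the real
structure layout.

```python
class LodObject:
    """Hypothetical stand-in: one payload slot shared by both object types."""

    def __init__(self, kind, payload):
        self.kind = kind          # "DIR" or "REGULAR"
        self.payload = payload    # meaning depends on `kind`


def striping_free(obj):
    # Check the type tag first; the same slot holds different data per type.
    if obj.kind == "DIR":
        released = ("dir_stripes", len(obj.payload))
    elif obj.kind == "REGULAR":
        released = ("layout_components", len(obj.payload))
    else:
        raise ValueError("unknown object kind: %r" % (obj.kind,))
    obj.payload = None
    return released


directory = LodObject("DIR", ["stripe0", "stripe1", "stripe2"])
regular = LodObject("REGULAR", ["comp0"])
```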

Lustre-change: https://review.whamcloud.com/45710
Lustre-commit: 7a9c9ccabe93f2d96c80e90f8cbb786faca74835

Fixes: 6a20bdcc608b ("LU-11376 lov: new foreign LOV format")
Fixes: fdad38781ccc ("LU-11376 lmv: new foreign LMV format")
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I2d4c563725b35f7a75f0f1fbf9c1d35b1799eff4
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/45940
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
2 years ago EX-4147 tests: fix interop for sanity test_160h
Xing Huang [Thu, 27 Oct 2022 11:41:11 +0000 (19:41 +0800)]
EX-4147 tests: fix interop for sanity test_160h

Add a check to sanity test_160h for whether /sbin/umount.lustre is
installed on the MDS, since this subtest checks whether the MDS unmount
process has completed and otherwise fails during interop testing.

Test-Parameters: testlist=sanity env=ONLY=160 serverversion=EXA5
Fixes: 6d62073950ac ("EX-3209 lipe: add lpcc util and service")
Signed-off-by: Xing Huang <hxing@ddn.com>
Change-Id: I6720b9e27a3a92e543ed877453802d23c0eef36d
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48970
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago RM-620 build: New tag 2.14.0-ddn65
Andreas Dilger [Mon, 31 Oct 2022 04:11:09 +0000 (22:11 -0600)]
RM-620 build: New tag 2.14.0-ddn65

New tag 2.14.0-ddn65

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I7bb4b45f5addc0c0d62dcf81c53cb114ad6454c1

2 years ago LU-15829 llite: don't use a kms if it invalid.
Alexey Lyashkov [Thu, 19 May 2022 17:35:18 +0000 (20:35 +0300)]
LU-15829 llite: don't use a kms if it invalid.

Lockless DIO doesn't update the KMS as other IO types do, which
caused a situation where the next read doesn't know the real file size
to be read. Let's avoid using an invalid KMS.

Lustre-change: https://review.whamcloud.com/47395
Lustre-commit: dc907414db16d99e77aecf6bfd41d82b8cf7c36e

Fixes: 6bce5367 (LU-4198 clio: turn on lockless for some kind of IO)
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Change-Id: Ie71d3f3cc24fc16c03ed07f9f5a3a17c7fdfa684
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48811
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago EX-4141 lipe: lamigo should detect dead OST and restart ALR
Alexandre Ioffe [Tue, 29 Mar 2022 07:48:35 +0000 (00:48 -0700)]
EX-4141 lipe: lamigo should detect dead OST and restart ALR

Use the '# keepalive' message and an ssh read with timeout
to detect that an OST is down and restart ALR.
Add stats for the ALR last-seen message.

To make lamigo compatible with older versions of
ofd_access_log_reader, lamigo can work in two modes:
1. lamigo does not expect the '# keepalive' message.
In this case, after a timeout it restarts
ofd_access_log_reader silently.
2. lamigo expects a periodic '# keepalive'
message. If lamigo does not receive a keepalive message
or any other message from ofd_access_log_reader
within the timeout, it reports an error message and
restarts ofd_access_log_reader.
lamigo switches from mode 1 to mode 2 once it receives
a '# keepalive' message.
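
The two modes and the switch between them can be sketched as a small
state machine. AlrWatcher and its log strings are hypothetical; the
sketch models only the decision logic, not the ssh read or the actual
restart.

```python
class AlrWatcher:
    """Hypothetical model of the keepalive handling for one reader."""

    def __init__(self):
        self.expect_keepalive = False   # mode 1 until a keepalive is seen
        self.log = []

    def on_message(self, line):
        # The first '# keepalive' line switches the watcher to mode 2.
        if line.startswith("# keepalive"):
            self.expect_keepalive = True

    def on_timeout(self):
        # Mode 2 reports an error; both modes restart the reader.
        if self.expect_keepalive:
            self.log.append("error: no message within timeout")
        self.log.append("restart")


watcher = AlrWatcher()
watcher.on_timeout()                    # older reader: silent restart
watcher.on_message("# keepalive")
watcher.on_timeout()                    # newer reader: error, then restart
```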

Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Test-Parameters: trivial testlist=hot-pools
Change-Id: I55bc92b03ef5b45b72ff59ffd4b450cd1927cdb0
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48647
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-14719 lod: distributed transaction check space
Lai Siyao [Wed, 30 Mar 2022 21:50:22 +0000 (17:50 -0400)]
LU-14719 lod: distributed transaction check space

A distributed transaction failure may cause missing files or
disconnected directories. To avoid failure when a disk is full, check
the remote MDT free space before the transaction starts.

The block/inode watermarks in obd_statfs_info are used to check
whether MDT has enough free blocks/inodes.
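
A watermark check of this kind could look roughly as follows. The field
names, the strict comparison and the returned strings are assumptions for
the illustration; they are not the obd_statfs_info layout.

```python
from dataclasses import dataclass


@dataclass
class StatfsInfo:
    """Hypothetical subset of the statfs data used by the check."""
    free_blocks: int
    free_inodes: int
    block_watermark: int
    inode_watermark: int


def has_enough_space(info):
    return (info.free_blocks > info.block_watermark and
            info.free_inodes > info.inode_watermark)


def start_distributed_txn(remote_mdts):
    # Refuse to start if any remote MDT is below a watermark.
    for name, info in remote_mdts.items():
        if not has_enough_space(info):
            return "ENOSPC on " + name
    return "started"


healthy = StatfsInfo(free_blocks=5000, free_inodes=900,
                     block_watermark=1000, inode_watermark=100)
nearly_full = StatfsInfo(free_blocks=500, free_inodes=900,
                         block_watermark=1000, inode_watermark=100)
```

Checking before the transaction starts means the operation fails as a
whole, rather than partway through on one MDT.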

Add sanity 230x.

Lustre-commit: 6aee406c84b6b8fddf08b560acfcdf7c13c97e63
Lustre-change: https://review.whamcloud.com/47039

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I0922e9c8668e8b842d313576bd68b52fa5d434ac
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/47867
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago EX-6193 pcc: dio attach failed on non-blksz-aligned file
Qian Yingjin [Fri, 21 Oct 2022 03:49:35 +0000 (23:49 -0400)]
EX-6193 pcc: dio attach failed on non-blksz-aligned file

PCC attach failed when doing a DIO copy on files whose size is not
blksz-aligned.
The reason is that the copy tool ll_fid_path_copy fails on a
non-blksize-aligned file for a PCC backend (such as a local Ext4
file system) using direct I/O.
This patch fixes the bug by falling back from direct I/O to
buffered I/O mode when copying the non-blksize-aligned tail of the
file.
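
The fallback can be pictured as splitting the copy into an aligned part
and an unaligned tail. plan_copy() is a hypothetical helper for the
illustration; it only plans the ranges and performs no I/O.

```python
def plan_copy(file_size, blksize):
    """Return a list of (mode, offset, length) steps for copying a file."""
    aligned = file_size - file_size % blksize
    plan = []
    if aligned:
        # The block-aligned prefix can use direct I/O.
        plan.append(("direct", 0, aligned))
    if file_size > aligned:
        # The unaligned tail falls back to buffered I/O.
        plan.append(("buffered", aligned, file_size - aligned))
    return plan


steps = plan_copy(10000, 4096)
```

For a 10000-byte file with a 4096-byte block size, the first 8192 bytes
go through direct I/O and the remaining 1808 bytes through buffered I/O.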

This patch also sets the errno with return code in the function
@get_root_path(), thus the call for @llaip_open_by_fid() with
invalid mount path will see the correct errno.

Change-Id: I5287563029269032a91397c0094e2ccede73b9b1
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48927
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-15031 quota: reseed glbe in qmt_lvbo_udate
Sergey Cheremencev [Fri, 28 Oct 2022 10:29:03 +0000 (18:29 +0800)]
LU-15031 quota: reseed glbe in qmt_lvbo_udate

Reseed the glbe array in qmt_lvbo_update after changing edquot.
Without the fix, the edquot flag wasn't set in the glbe array. Later,
when edquot was cleared, the need_update (nu) flag wasn't set
in the glbe array to notify OSTs of the new edquot.

The patch also adds test 80 to check that OST gets correct
edquot value after failover.

Lustre-change: https://review.whamcloud.com/45032
Lustre-commit: 61ec1e0f2ca8dc4c9f7ed41f782960e65cab0920

HPE-bug-id: LUS-10029
Change-Id: I5b7e1a553e3351c22649431860d51b5a671c6fd9
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/46655
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-15847 tgt: move tti_ transaction params to tsi_
Mikhail Pershin [Sat, 28 May 2022 18:16:11 +0000 (21:16 +0300)]
LU-15847 tgt: move tti_ transaction params to tsi_

Move tti_mult_trans and tti_has_trans to tgt_session_info so they
are available in all targets. This allows cleaning up old duplicated
MDT code and can be used for complex transaction
handling in MDT/OFD if needed.

Lustre-change: https://review.whamcloud.com/c/fs/lustre-release/+/47491
Lustre-commit: 0a317b171ebedcba8fc58e548991a884186c350c

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I3f0c15e283b9e21c04a009f6cf346afa278e7095
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48979
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-15847 tgt: reply always with the latest assigned transno
Mikhail Pershin [Tue, 31 May 2022 10:38:25 +0000 (13:38 +0300)]
LU-15847 tgt: reply always with the latest assigned transno

In tgt_txn_stop_cb() don't skip transno assignment in case
of unexpected multiple last_rcvd updates, so that the latest
transno is reported back in the reply rather than the first
one.

Reporting just the first transno might lead to data
loss at failover, because a partially committed operation would
be considered fully committed and the rest of the operation would
not be replayed.

The proposed approach of reporting the last assigned transno to
the client could cause replay failures in some cases, which
is still better than possible data loss. So the patch makes the
multiple transaction case less severe.
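
The difference between reporting the first and the latest transno can be
sketched as below. Both helpers are hypothetical and simplified: they
assume a client keeps a request for replay until the reported transno is
covered by the last committed transno.

```python
def reply_transno(assigned, use_latest=True):
    """Pick the transno to report for one request.

    `assigned` lists the transnos given to the request, in order.
    """
    if not assigned:
        return 0
    return assigned[-1] if use_latest else assigned[0]


def client_drops_from_replay(reported, last_committed):
    # A request is dropped from the replay list once its transno is committed.
    return reported <= last_committed


assigned = [101, 102]      # one request that got two transnos
last_committed = 101       # only the first part reached disk

dropped_if_first = client_drops_from_replay(
    reply_transno(assigned, use_latest=False), last_committed)
dropped_if_latest = client_drops_from_replay(
    reply_transno(assigned), last_committed)
```

With the first transno reported, the half-committed request is dropped
from replay; with the latest one, it is kept.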

Lustre-change: https://review.whamcloud.com/c/fs/lustre-release/+/47492
Lustre-commit: 4e2e8fd2fc0a9a30f47e70dc285a2101d2cbc4c2

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ia07e89576127a2fc1eb2ae706551ffe8ceaa93be
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48978
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-15447 tests: sanity-flr/208 reset rotational status
Alex Zhuravlev [Thu, 13 Jan 2022 07:27:21 +0000 (10:27 +0300)]
LU-15447 tests: sanity-flr/208 reset rotational status

New kernels (e.g. 4.18.0-305.25.1) declare loopback devices
in tmpfs as non-rotational. sanity-flr/208 wrongly
assumes that devices are non-rotational by default. Thus,
sanity-flr/208 started to fail with new kernels.

Lustre-change: https://review.whamcloud.com/c/fs/lustre-release/+/46088
Lustre-commit: 78dddb423f0dc8571d3c7f8ccd8f77a1c2bc28ae

Fixes: 8507472dd37e ("LU-14996 lov: prefer mirrors on non-rotational OSTs")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ib5c42da39667227a6cff5d379e30d2cd6c1e2773
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48952
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-16106 lnet: allow direct messages regardless of peer NI status
Serguei Smirnov [Sun, 28 Aug 2022 01:50:16 +0000 (18:50 -0700)]
LU-16106 lnet: allow direct messages regardless of peer NI status

If check_routers_before_use is enabled, the router needs to
be pinged before it is used, which is not possible because
its NIs are assumed to be down at start-up. Don't prevent
discovery of the router in this case.

This change allows non-routed traffic to peer NIs with "down"
status.

Lustre-commit: 3345a8a54e89c342a4ce2d8d4bcb04ee919bcd52
Lustre-change: https://review.whamcloud.com/c/48355

Test-Parameters: trivial
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I36fa60e37ef4f47c82c69855c9b0b80bad8a36f4
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48669
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago LU-16025 llite: adjust read count as file got truncated
Bobi Jam [Thu, 7 Jul 2022 07:38:54 +0000 (15:38 +0800)]
LU-16025 llite: adjust read count as file got truncated

A file read will not notice that the file was truncated by another
node, and continues to read zero-filled pages beyond the new file size.

This patch adds a limit on the read count to prevent the issue and
adds a test case verifying the fix.
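
Limiting the read count to the current file size can be sketched as
follows. clamp_read() is a hypothetical helper and the sizes are made up;
the real fix lives in the llite read path.

```python
def clamp_read(pos, count, file_size):
    """Return how many bytes a read at `pos` may return for `file_size`."""
    if pos >= file_size:
        return 0
    return min(count, file_size - pos)


# The file was 1 MiB, then another node truncated it to 4000 bytes.
new_size = 4000
partial = clamp_read(pos=0, count=8192, file_size=new_size)
beyond = clamp_read(pos=8192, count=8192, file_size=new_size)
```

A read that straddles the new end of file is shortened, and a read that
starts beyond it returns nothing instead of zero-filled pages.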

Lustre-change: https://review.whamcloud.com/47896
Lustre-commit: 4468f6c9d92448cb72c5a616ec74653e83ee8e10

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Ie51ba09201a1ca1464c3a3892d367590e978ee34
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48848
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago LU-14642 test: add fsx mirror file test mode
Bobi Jam [Thu, 2 Sep 2021 16:30:10 +0000 (00:30 +0800)]
LU-14642 test: add fsx mirror file test mode

- add fsx mirror file test mode with the "-M" option so that fsx can
direct its IO to an FLR file as well as extend/split/resync the FLR file.

- add sanity-flr test_70b() to test fsx with flrmode.

- fix a bug in "lfs mirror verify" to accommodate the max mirror count
instead of (max - 1) mirrors.

- improve "lfs mirror verify -v" to print the proper data range of its
crc-32 checksum values.

Lustre-change: https://review.whamcloud.com/43473
Lustre-commit: 90ba8b4ac360b1987178445bd2ccd64f7958d912

Test-Parameters: testlist=sanity-flr env=ONLY=70a,ONLY_REPEAT=10
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Ib55c7b25dcd82fa0b197ad21268b16c82aab5da9
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48834
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago LU-16249 sec: krb5_decrypt_bulk calls decryption primitive
Sebastien Buisson [Tue, 18 Oct 2022 15:19:01 +0000 (17:19 +0200)]
LU-16249 sec: krb5_decrypt_bulk calls decryption primitive

krb5_decrypt_bulk() was mistakenly calling an encryption primitive
instead of a decryption primitive for the confounder.

Lustre-change: https://review.whamcloud.com/48907
Lustre-commit: TBD (851f3915659941db00a0cda58867e68139e5e0d1)

Test-Parameters: trivial
Fixes: 0a65279121 ("LU-13344 gss: Update crypto to use sync_skcipher")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I9251172644ed6baa3bb06a59dbe7c1bab401d817
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48909
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-15097 quota: stop pool_recalc before killing pool
Sergey Cheremencev [Wed, 19 Oct 2022 11:18:04 +0000 (19:18 +0800)]
LU-15097 quota: stop pool_recalc before killing pool

qmt_start_pool_recalc holds a reference on a pool while
it is running. This thread should be stopped before
putting the last pool reference in qmt_pool_free to be
sure that the pool can finally be freed. The patch helps to
avoid the following ASSERTION:

    qmt_pool_fini() ASSERTION(list_empty(&qmt->qmt_pool_list)) failed

Lustre-change: https://review.whamcloud.com/45256
Lustre-commit: 862f0baa7c21cb631b98d3886ef9e938f4519573

Change-Id: If72042a620d9ded693fcb669bc9148d1f96126a4
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/46656
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago EX-4567 kernel: add extra field for snapshot in el8
Hongchao Zhang [Fri, 21 Oct 2022 07:43:11 +0000 (03:43 -0400)]
EX-4567 kernel: add extra field for snapshot in el8

Add the extra fields used by snapshot to
"struct jbd2_journal_handle" and "struct journal_head",
placing them into the 4-byte hole at the end of
struct jbd2_journal_handle so that they do not increase
the structure size and memory usage for this common allocation.

Use RH_KABI_EXTEND() and RH_KABI_FILL_HOLE() so that the
new fields do not affect the kernel ABI compatibility.
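
Why a field placed in a trailing hole costs no memory can be shown with
a userspace sketch. The two structures below are illustrative only, not
the real jbd2 layouts, and the RH_KABI_*() macros are kernel-only so
they are not used here; snapshot_data is a hypothetical field name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative layouts only; these are not the real jbd2 structures. */
struct handle_before {
	void     *transaction;   /* pointer-aligned member */
	uint32_t  flags;         /* followed by tail padding on LP64 */
};

struct handle_after {
	void     *transaction;
	uint32_t  flags;
	uint32_t  snapshot_data; /* hypothetical field placed in the hole */
};

/* Bytes of padding after the last member of struct handle_before. */
static size_t tail_hole_bytes(void)
{
	size_t end = offsetof(struct handle_before, flags) + sizeof(uint32_t);

	assert(end <= sizeof(struct handle_before));
	return sizeof(struct handle_before) - end;
}

/* Nonzero when the new field fits in the hole and the size is unchanged. */
static int fits_in_hole(void)
{
	return tail_hole_bytes() >= sizeof(uint32_t) &&
	       sizeof(struct handle_after) == sizeof(struct handle_before);
}
```

On a target with 8-byte pointers the hole is 4 bytes and both
structures have the same size; on a 32-bit target there is no hole and
fits_in_hole() returns 0.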

Change-Id: I84f52b18694e56d837d64c5c80076e45dde27eab
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48880
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago EX-6102 lipe: lipe_scan3 not intended for customer use
Alexandre Ioffe [Tue, 25 Oct 2022 03:06:08 +0000 (20:06 -0700)]
EX-6102 lipe: lipe_scan3 not intended for customer use

Print a warning that lipe_scan3 is not intended for customer use.

Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Test-Parameters: trivial
Change-Id: I92f775d77e1d4ffac304d3e46ed6af7c642a3bdd
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48939
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-11388 tests: exclude replay-single/131b for ldiskfs
Andreas Dilger [Fri, 14 Oct 2022 21:09:03 +0000 (15:09 -0600)]
LU-11388 tests: exclude replay-single/131b for ldiskfs

Test is failing about 1/10 of the test runs, even on ldiskfs.

Test-Parameters: trivial testlist=replay-single
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I9c36d026944876e066a1dc36877927b7a92c537e
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48876
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48946

2 years ago EX-5099 lipe: Made controllable ssh exec timeout
Alexandre Ioffe [Wed, 13 Apr 2022 05:34:18 +0000 (22:34 -0700)]
EX-5099 lipe: Made controllable ssh exec timeout

- Introduce new lipe ssh API: lipe_ssh_exec_timeout() and
lipe_ssh_start_cmd_timeout().
- Introduce new lamigo command option: --ssh-exec-timeout
to configure the ssh connection timeout for ssh exec cmd.
- Use lipe_ssh_start_cmd_timeout() to start the remote
access log reader with a timeout.
Use ssh_channel_read_timeout() with an infinite timeout
when reading access log records.
- Use lipe_ssh_start_cmd_timeout() to start remote "lfs ..."
commands with a long timeout to prevent a premature timeout
when an "lfs mirror extend ..." command for a big file
takes too long.
- Use a default timeout of 600 seconds for ssh exec cmd.
Such a long timeout should allow long-lasting
replications to finish.
This fixes EX-5429.

Test-Parameters: trivial clientdistro=el8.5 serverdistro=el8.5 testlist=hot-pools env=FAIL_ON_ERROR=false,ONLY="56 59",ONLY_REPEAT=20
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Change-Id: I8de9b1db2014abd1e6f201cda73a0812128f6bb6
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/47057
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-16173 kernel: kernel update SLES15 SP3 [5.3.18-150300.59.93.1]
Jian Yu [Fri, 21 Oct 2022 20:35:40 +0000 (13:35 -0700)]
LU-16173 kernel: kernel update SLES15 SP3 [5.3.18-150300.59.93.1]

Update SLES15 SP3 kernel to 5.3.18-150300.59.93.1 for Lustre client.

Test-Parameters: trivial clientdistro=sles15sp3 \
testlist=sanity

Lustre-change: https://review.whamcloud.com/48601
Lustre-commit: c3467db7e7d0652c09bdcef26e2b708ab51cba9e

Change-Id: I1e0afe6974567d13680dbb0d463fbbd873ef2e5f
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48864
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago LU-16180 ptlrpc: reduce lock contention in ptlrpc_free_committed
Andreas Dilger [Thu, 6 Oct 2022 17:31:51 +0000 (10:31 -0700)]
LU-16180 ptlrpc: reduce lock contention in ptlrpc_free_committed

This patch breaks out of the loop in ptlrpc_free_committed()
if need_resched() is true or there are other threads waiting
on the imp_lock. This avoids the thread holding the
CPU for too long while freeing a large number of requests. The
remaining requests in the list will be processed the next
time this function is called. That also avoids delaying a
single thread too long if the list is long.
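
The early-exit pattern can be sketched in userspace C. need_resched()
and the imp_lock waiter check have no userspace equivalent, so a simple
`budget` argument stands in for them here; struct req, free_committed()
and make_list() are hypothetical names for this sketch only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct req {
	struct req *next;
};

/*
 * Free at most `budget` requests from the head of the list and return
 * how many were freed. `budget` stands in for the kernel-side
 * "should I yield now?" check.
 */
static size_t free_committed(struct req **head, size_t budget)
{
	size_t freed = 0;

	assert(head != NULL);
	while (*head != NULL && freed < budget) {
		struct req *r = *head;

		*head = r->next;
		free(r);
		freed++;
	}
	return freed;	/* anything left on *head waits for the next call */
}

/* Build a list of n requests for the usage example. */
static struct req *make_list(size_t n)
{
	struct req *head = NULL;

	while (n-- > 0) {
		struct req *r = malloc(sizeof(*r));

		assert(r != NULL);
		r->next = head;
		head = r;
	}
	return head;
}
```

With a list of 10 requests and a budget of 4, the first call frees 4
and leaves 6 on the list; a later call with a larger budget frees the
rest.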

Lustre-change: https://review.whamcloud.com/48629
Lustre-commit: 9a3e111a2ebdfadec4b6efc65899856edc90ad18

Test-Parameters: testlist=sanity clientdistro=el8.6
Test-Parameters: testlist=sanity clientdistro=ubuntu2204 env=SANITY_EXCEPT="130 244a"
Change-Id: I50f56b87844e8b019053e569767b6c949d2a3f55
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48627
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago LU-15009 ofd: continue precreate if LAST_ID is less on MDT
Lai Siyao [Thu, 16 Sep 2021 21:49:33 +0000 (17:49 -0400)]
LU-15009 ofd: continue precreate if LAST_ID is less on MDT

It's possible that precreate succeeded on the OST, but the MDT didn't
get the reply and assumed failure. In this case, the LAST_ID on the MDT
is smaller than that on the OST. Instead of reporting an error and
stopping precreate, it's better to move the precreate window forward.

Lustre-change: https://review.whamcloud.com/44984
Lustre-commit: 1711e26ae861c28829870c2433caf7ee232909cf

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ia6ca418ec0ea6797b7eccc1610879331307fad07
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48923
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-16044 osd: discard pagecache in truncate's declaration
Alex Zhuravlev [Mon, 25 Jul 2022 13:26:40 +0000 (16:26 +0300)]
LU-16044 osd: discard pagecache in truncate's declaration

to avoid taking pagelock inside a transaction, which conflicts
with the write path where we take pagelock before any other lock.
This should be safe as the write path writes the pages out
synchronously, so they should be clean by the time of truncate.

Lustre-change: https://review.whamcloud.com/48033
Lustre-commit: 0bb491b2ecf494c3f78fa08a101af8af7853a0fe

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: Iba555ace2ce9ef34ab5517375ecb5c176f738a02
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48885
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-16076 utils: enhance 'lfs check' command
Lei Feng [Mon, 8 Aug 2022 02:59:25 +0000 (10:59 +0800)]
LU-16076 utils: enhance 'lfs check' command

Add an optional argument to the 'lfs check' command so that only the
servers related to the specified Lustre file system are checked.

lustre-change: https://review.whamcloud.com/48155
lustre-commit: f5ca6853b8d8b918b0228af31fa8249be49d3000

Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial testlist=sanityn env=ONLY=113
Change-Id: I826a8e822af0a290f06ffaadadf1bb7f86899d99
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48935
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-15305 obdclass: fix race in class_del_profile
Li Dongyang [Fri, 7 Oct 2022 12:09:10 +0000 (23:09 +1100)]
LU-15305 obdclass: fix race in class_del_profile

Move profile lookup and removal from lustre_profile_list
into the same critical section, otherwise we could race with
class_del_profiles or another class_del_profile.

Do not create duplicate mount opts in the client config,
otherwise we will add duplicate lustre_profile to
lustre_profile_list for a single mount.

Lustre-change: https://review.whamcloud.com/c/fs/lustre-release/+/48802
Lustre-commit: 83d3f42118579d7fb7c3002533c047badcf41e0d

Change-Id: I648aa206716213b064d045f546516b219337e0ed
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48956
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-15467 tests: fix sanity-hsm test_103a timeout issue
Etienne AUJAMES [Fri, 21 Jan 2022 14:49:18 +0000 (15:49 +0100)]
LU-15467 tests: fix sanity-hsm test_103a timeout issue

Add an MDS version check in "sanity-hsm test_103a" for interop testing.
Limit the number of parallel hsm restore requests to
max_rpcs_in_flight.

Lustre-change: https://review.whamcloud.com/46252
Lustre-commit: 98e1e41ce47c95155a8c8d452eef5074492d22f0

Fixes: b449f3d ("LU-15145 hsm: unlock the restore layout lock for a cancel")
Test-Parameters: trivial
Test-Parameters: testlist=sanity-hsm env=ONLY=103a,ONLY_REPEAT=20
Test-Parameters: testlist=sanity-hsm
Signed-off-by: Etienne AUJAMES <etienne.aujames@cea.fr>
Change-Id: I78098042d1316cdcc9d2e25860099a0ffdba2503
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48960
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-15646 llog: correct llog FID and path output
Mikhail Pershin [Mon, 17 Oct 2022 19:53:44 +0000 (12:53 -0700)]
LU-15646 llog: correct llog FID and path output

- fix wrong LLOG_ID-to-FID conversion to output llog FID by
  introducing PLOGID macro to expand llog ID for DFID format
- stop printing lgl_ogen along with llog FID as it is always zero
  since 2.3.51 and is not used anymore
- output correct path for update llog in llog_reader
- always print header info in llog_reader if available
- print llog flags in header info

Lustre-change: https://review.whamcloud.com/48430
Lustre-commit: e28f3ee185b2ef7bad8046f46444772fac214a40

Fixes: 5a8e47d0a1a7 ("LU-9153 llog: update llog print format to use FIDs")
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I7ba49e8101a67d2d80c204a5fc629bfd0bce89ad
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48896
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago LU-6612 utils: strengthen llog_reader vs wrong format/header
Bruno Faccini [Mon, 17 Oct 2022 19:46:25 +0000 (12:46 -0700)]
LU-6612 utils: strengthen llog_reader vs wrong format/header

The following snippet shows that llog_reader can be confused by
an invalid record count of 0 when parsing an expected
LLOG file header:
root# dd if=/dev/zero bs=4096 count=1 of=/tmp/zeroes
1+0 records in
1+0 records out
4096 bytes (4.1 kB) copied, 0.000263962 s, 15.5 MB/s
root# llog_reader /tmp/zeroes
Memory Alloc for recs_buf error.
Could not pack buffer; rc=-12
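
One way to guard against such input is to validate the header before
sizing any allocation from it, so that a zeroed file is reported as an
invalid log rather than as an allocation failure. The sketch below is
userspace C under stated assumptions: struct hdr, SKETCH_MAGIC and
alloc_recs() are hypothetical, and the actual checks added by the patch
are not listed in this message.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_MAGIC 0x4c4c4f47u	/* made-up value for this sketch */

/* Simplified stand-in for a log header; not the on-disk llog layout. */
struct hdr {
	uint32_t magic;
	uint32_t count;		/* number of records the header claims */
};

/*
 * Allocate the record pointer array only after the header looks sane.
 * On failure returns NULL and stores a negative errno in *rc.
 */
static void **alloc_recs(const struct hdr *h, int *rc)
{
	void **recs;

	assert(h != NULL && rc != NULL);
	if (h->magic != SKETCH_MAGIC || h->count == 0) {
		*rc = -EINVAL;	/* not a valid header: do not allocate */
		return NULL;
	}
	recs = calloc(h->count, sizeof(*recs));
	*rc = recs != NULL ? 0 : -ENOMEM;
	return recs;
}
```

With an all-zero header this returns NULL with rc set to -EINVAL,
instead of the misleading rc=-12 shown in the session above.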

Lustre-change: https://review.whamcloud.com/15654
Lustre-commit: 45291b8c06eebf33d3654db3a7d3cfc5836004a6

Test-Parameters: trivial testlist=sanity,sanity-hsm
Signed-off-by: Bruno Faccini <bruno.faccini@intel.com>
Change-Id: I12be79e6c6a5da384a5fd81878a76a7ea8aa5834
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48895
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago LU-15000 llog: read canceled records in llog_backup
Etienne AUJAMES [Mon, 17 Oct 2022 19:37:39 +0000 (12:37 -0700)]
LU-15000 llog: read canceled records in llog_backup

llog_backup() does not reproduce index "holes" in the generated copy.
This could result in a llog copy whose indexes differ from the source.
It might then confuse the configuration update mechanism, which relies
on indexes matching between the MGS source and the target copy.

These index gaps can be caused by "lctl --device MGS llog_cancel".

This patch adds a "raw" read mode to llog_process* to read canceled
records, so llog_backup is now able to reproduce an exact copy of
the original.
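
The effect of the index holes can be illustrated with a small userspace
sketch. It assumes that the copy appends each record at the next free
index of the destination; struct rec and backup() are hypothetical and
do not reflect the on-disk llog record layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified record; not the on-disk llog record layout. */
struct rec {
	uint32_t index;		/* position of the record in its log */
	int      canceled;	/* nonzero if the record was canceled */
};

/*
 * Copy src[0..n) into dst, appending each copied record at the next
 * free index of the destination. With raw == 0 canceled records are
 * skipped, which renumbers everything after a hole. With raw != 0
 * they are copied too, so destination indexes match the source.
 * Returns the number of records written.
 */
static size_t backup(const struct rec *src, size_t n, struct rec *dst,
		     int raw)
{
	size_t out = 0;

	assert(src != NULL && dst != NULL);
	for (size_t i = 0; i < n; i++) {
		if (src[i].canceled && !raw)
			continue;
		dst[out] = src[i];
		dst[out].index = (uint32_t)(out + 1);
		out++;
	}
	return out;
}
```

For a source with indexes 1, 2 (canceled) and 3, the non-raw copy holds
two records and the record that was at index 3 lands at index 2. The
raw copy keeps three records with unchanged indexes.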

Lustre-change: https://review.whamcloud.com/46552
Lustre-commit: d8e2723b4e9409954846939026c599b0b1170e6e

Signed-off-by: Etienne AUJAMES <etienne.aujames@cea.fr>
Change-Id: I811e23de8f4545bed36a44fedc2638d7418830dd
Reviewed-by: Dominique Martinet <qhufhnrynczannqp.f@noclue.notk.org>
Reviewed-by: DELBARY Gael <gael.delbary@cea.fr>
Reviewed-by: Stephane Thiell <sthiell@stanford.edu>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48894
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-14098 obdclass: try to skip corrupted llog records
Alex Zhuravlev [Mon, 17 Oct 2022 19:31:56 +0000 (12:31 -0700)]
LU-14098 obdclass: try to skip corrupted llog records

if llog's header or record is found corrupted, then
ignore the remaining records and try with the next one.

Lustre-change: https://review.whamcloud.com/40754
Lustre-commit: 910eb97c1b43a44a9da2ae14c3b83e28ca6342fc

Fixes: 186f083722 ("LU-11924 osp: combine llog cancel operations")

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: If47ec1fc1e2eaf64be7ba08d3aa9c2b93903c0cf
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48893
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago LU-14044 llog: check fid after convert
Yang Sheng [Mon, 17 Oct 2022 18:53:47 +0000 (11:53 -0700)]
LU-14044 llog: check fid after convert

We should convert from llog_id and then check the fid. Also
change fid-lookup to an error check instead of LASSERT.

Lustre-change: https://review.whamcloud.com/40294
Lustre-commit: 6df76d3357fc5896b6902399ed7ce6d7c7835f58

Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: I673d8f16ff9e57a0482d6a3ec3ee3db33699f57f
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48892
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago EX-5909 tests: clean up in sanity-quota/16a
Andreas Dilger [Fri, 14 Oct 2022 22:04:53 +0000 (16:04 -0600)]
EX-5909 tests: clean up in sanity-quota/16a

Clean up the test file in sanity-quota test_16a.  If test_16b is
run (DNE config) then the filesystem is reformatted, but in the
non-DNE config test_17 will fail if there is used quota.

Test-Parameters: trivial testlist=sanity-quota
Fixes: b54b7ce43929 ("LU-14472 quota: skip non-exist or inact tgt for lfs_quota")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Id1faeab9df246d8010bf114582ab17a75846db68
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48899
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago RM-620 build: New tag 2.14.0-ddn64
Andreas Dilger [Fri, 14 Oct 2022 20:06:26 +0000 (14:06 -0600)]
RM-620 build: New tag 2.14.0-ddn64

New tag 2.14.0-ddn64

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ia86edfc375e1dda7205db1a32c8c1933153a3e92

2 years ago LU-15738 test: check lfsck status before starting
Hongchao Zhang [Fri, 22 Jul 2022 15:02:24 +0000 (23:02 +0800)]
LU-15738 test: check lfsck status before starting

If the LFSCK has been started before calling "lfsck_start"
to start it, the test shouldn't fail for starting LFSCK.

Lustre-change: https://review.whamcloud.com/48018/
Lustre-commit: 29aaf679afac89359e1b116b8de0480f24b4e8ac

Test-Parameters: trivial testlist=sanity-lfsck
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: I266d9e2b9c5f37eb9e08b489fab428268b90d895
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48841
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago EX-5964 lamigo: disable idle disconnects
Alex Zhuravlev [Mon, 19 Sep 2022 16:00:15 +0000 (19:00 +0300)]
EX-5964 lamigo: disable idle disconnects

on the connections lamigo uses locally to avoid storms
of reconnects.

Test-Parameters: trivial testlist=hot-pools
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I3bc2742853e9636e38fbd8f7c2f238b3af55e0ba
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48840
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago EX-3142 tests: changelog processing verification
Alex Zhuravlev [Fri, 6 Aug 2021 06:34:31 +0000 (09:34 +0300)]
EX-3142 tests: changelog processing verification

add extra counter to lamigo stats to catch gaps in changelog
processing. add a new test (hot-pools/60) to verify that no
gaps happen (i.e. lamigo gets all changelog records), verify
that the changelog is purged properly.

Test-Parameters: trivial testlist=hot-pools mdscount=2 mdtcount=4
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I34d9d6f6f7f5766d945df43ae7d43dab7c70cef1
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48434
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-13578 test: sleep longer in sanity test_39
John L. Hammond [Wed, 8 Jun 2022 02:15:39 +0000 (19:15 -0700)]
LU-13578 test: sleep longer in sanity test_39

In sanity test_39r(), sleep for 2 * atime_diff rather than atime_diff + 1.

Lustre-change: https://review.whamcloud.com/47346
Lustre-commit: be2525ffddb4bf55fde77e97b00d1c349119daed

Test-Parameters: trivial testlist=sanity env=ONLY=39r,ONLY_REPEAT=50
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: Ied508e12c848f6935d2317fb86bddc5341a6156e
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48831
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
2 years ago LU-15472 ldlm: optimize flock reprocess
Andriy Skulysh [Fri, 5 Nov 2021 10:55:08 +0000 (12:55 +0200)]
LU-15472 ldlm: optimize flock reprocess

Resource reprocess on flock unlock can be done once
after all pending unlock requests.
This reduces spinlock contention.

Lustre-change: https://review.whamcloud.com/46257
Lustre-commit: 42f377db4a24cefa7a041fcd3106dd58771eb319

Change-Id: I2809070f27fe3af7e1fc34e2b4b22603931f3dff
HPE-bug-id: LUS-10471, LUS-10909
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48818
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-15132 mdc: Use early cancels for hsm requests
Etienne AUJAMES [Mon, 2 May 2022 12:27:17 +0000 (14:27 +0200)]
LU-15132 mdc: Use early cancels for hsm requests

HSM RELEASE and RESTORE requests take EX layout lock on the MDT side.
So the client can use early cancel for its local lock on the resource
to limit the contention (mdt side).

This patch does not pack ldlm request inside the hsm request because
the field (RMF_DLM_REQ) does not exist in the request. Adding this
field inside the request would break compatibility with _old_ servers.

Lustre-change: https://review.whamcloud.com/47181
Lustre-commit: 60d2a4b0efa4a944b558bd9b63b6334f7e70419b

Signed-off-by: Xing Huang <hxing@ddn.com>
Change-Id: I30a57b4855c28eef9c55a9645d3b6c491f962b13
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48652
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-15885 o2iblnd: fix handling of RDMA_CM_EVENT_UNREACHABLE
Serguei Smirnov [Thu, 8 Sep 2022 22:27:12 +0000 (15:27 -0700)]
LU-15885 o2iblnd: fix handling of RDMA_CM_EVENT_UNREACHABLE

RDMA_CM_EVENT_UNREACHABLE may be received not only when a connection
is being established, but also when it is being closed. Fix handling
of this event accordingly.

Lustre-change: https://review.whamcloud.com/48492
Lustre-commit: 3925b1669d519e6c038ecce1287c1ced3de623d3

Test-Parameters: trivial
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I79428188c159b2d80d36326589b2977db065d4a7
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48827
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-14428 libcfs: discard cfs_trace_copyin_string()
Alex Zhuravlev [Wed, 12 Oct 2022 06:35:42 +0000 (09:35 +0300)]
LU-14428 libcfs: discard cfs_trace_copyin_string()

Instead of cfs_trace_copyin_string(), use memdup_user_nul().
This combines the allocation with the copyin, and nul-terminates.

The resulting code is a lot simpler.

Lustre-change: https://review.whamcloud.com/41490
Lustre-commit: 67af976c806994cec27414d24b43f6519d72c240

LU-14788 lnet: check memdup_user_nul using IS_ERR

Crash in __proc_lnet_portal_rotor. memdup_user_nul returns an ERR_PTR
on error, not a NULL pointer. IS_ERR and PTR_ERR functions have to be
used to check and return the correct error code. The fix has been
applied in other locations having the wrong check.
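
The difference between the two checks can be shown in userspace C.
ERR_PTR(), PTR_ERR() and IS_ERR() below are simplified stand-ins for
the kernel helpers, and dup_nul() only imitates the error convention of
memdup_user_nul(); parse_value() is a hypothetical caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Userspace stand-ins for the kernel's ERR_PTR helpers (simplified). */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/*
 * Stand-in for memdup_user_nul(): copy len bytes and append a NUL.
 * Failure is reported with ERR_PTR(), never with NULL.
 */
static char *dup_nul(const char *src, size_t len)
{
	char *p;

	assert(len < SIZE_MAX);
	if (src == NULL)
		return ERR_PTR(-EFAULT);
	p = malloc(len + 1);
	if (p == NULL)
		return ERR_PTR(-ENOMEM);
	memcpy(p, src, len);
	p[len] = '\0';
	return p;
}

/* Correct caller: test with IS_ERR() and propagate PTR_ERR(). */
static long parse_value(const char *buf, size_t len)
{
	char *tmp = dup_nul(buf, len);
	long val;

	if (IS_ERR(tmp))	/* "if (!tmp)" would miss the error */
		return PTR_ERR(tmp);
	val = strtol(tmp, NULL, 10);
	free(tmp);
	return val;
}
```

A caller that tested only for NULL would go on to dereference the
encoded error value, which matches the kind of crash described above.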

Lustre-change: https://review.whamcloud.com/44091
Lustre-commit: 449d046e55a42cc4d1c4ab0217551cded1864bc4

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I089c5da96b59ec62d177aea2f3d170bf751c6fec
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48835
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-13974 tests: update log corruption
Alexander Boyko [Tue, 24 Nov 2020 09:05:36 +0000 (04:05 -0500)]
LU-13974 tests: update log corruption

The test case reproduces a missing object for a sub transaction during
a set xattr operation.
The first setattr got -2; the second had already started, but had not
made llog_add yet. In this case the llog osp object is stale after
top_trans_start, so the declaration phase cannot refresh llogs. Then
at llog_osd_write_rec the osp object changes from stale state to
valid (dt_attr_get), but the llog handle and llog header are invalid.
A new record would be added to the updatelog with a wrong index.
In that case processing of the update log fails with

fs1-MDT0001-osp-MDT0003: [0x2:0x400024d0:0x2] Invalid record: index
112926 but expected 112925
lod_sub_recovery_thread()) fs1-MDT0001-osp-MDT0003 get update log
failed: rc = -34
Recovery aborted, and clients are evicted.

Lustre-change: https://review.whamcloud.com/40743
Lustre-commit: 562837124ec7bffeba7edb4b4b899bc271833374

HPE-bug-id: LUS-9030
Test-Parameters: testlist=sanity  envdefinitions=ONLY="427"
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: I6a47fed1bc01f4be62216d1d0787adc413df0cf5
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48832
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago LU-8621 utils: cmd help to stdout or short cmd error
Aleksei Alyaev [Thu, 23 Dec 2021 08:48:22 +0000 (11:48 +0300)]
LU-8621 utils: cmd help to stdout or short cmd error

- Changed to print command help to stdout
- Changed to output short error message for an unrecognized command

Lustre-change: https://review.whamcloud.com/47162/
Lustre-commit: bc69a8d058f5bcdb75e062df57a6ccd23243d1e0

Test-Parameters: trivial
Signed-off-by: Aleksei Alyaev <aalyaev@ddn.com>
Change-Id: I67616ddb576e3347a2da130b3a731a6bf8730185
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48851
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-16233 build: Add always target for SUSE15 SP3 LTSS
Shaun Tancheff [Thu, 13 Oct 2022 19:19:47 +0000 (12:19 -0700)]
LU-16233 build: Add always target for SUSE15 SP3 LTSS

SUSE 15 SP3 LTSS kernel version 5.3.18-150300.59.93
(and later) breaks lustre build tests which expect
conftest.i to be generated.

Lustre-change: https://review.whamcloud.com/48833
Lustre-commit: TBD (from 274b34c4d3a20937ebb17d139dbde0eaaed503b2)

HPE-bug-id: LUS-11286
Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: If23e9b31b537878a43075ffff62a99906f47fd9a
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48863
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-16174 kernel: kernel update SLES15 SP4 [5.14.21-150400.24.21.2]
Jian Yu [Wed, 28 Sep 2022 07:00:22 +0000 (00:00 -0700)]
LU-16174 kernel: kernel update SLES15 SP4 [5.14.21-150400.24.21.2]

Update SLES15 SP4 kernel to 5.14.21-150400.24.21.2 for Lustre client.

Lustre-change: https://review.whamcloud.com/48604
Lustre-commit: TBD (from 896fd88c35b6685a586c1279c83c739b48cbe846)

Test-Parameters: trivial clientdistro=sles15sp4 \
env=SANITY_EXCEPT="27J 101j 244a" testlist=sanity

Change-Id: Ia68e1c960c79f40d0f725b0f440cd562b820a19f
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48689
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-16177 kernel: kernel update RHEL9.0 [5.14.0-70.26.1.el9_0]
Jian Yu [Wed, 28 Sep 2022 06:46:30 +0000 (23:46 -0700)]
LU-16177 kernel: kernel update RHEL9.0 [5.14.0-70.26.1.el9_0]

Update RHEL9.0 kernel to 5.14.0-70.26.1.el9_0 for Lustre client.

Lustre-change: https://review.whamcloud.com/48676
Lustre-commit: TBD (from 9951a56c26b1ce6639cd2db350fdf6b81b3b4707)

Test-Parameters: trivial clientdistro=el9.0 \
env=SANITY_EXCEPT="101j 130 244a" testlist=sanity

Change-Id: I9da2ccdf419d6490fdba80199eda69f4f19361be
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48687
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago EX-6130 lipe: s_volume_name not NUL terminated
Alexandre Ioffe [Wed, 12 Oct 2022 00:53:20 +0000 (17:53 -0700)]
EX-6130 lipe: s_volume_name not NUL terminated

The s_volume_name field stores a string, but the field may have no
terminating NUL if the string size equals the size of the field.
Therefore on some target systems the definition of
struct ext2_super_block s_volume_name in
/usr/include/ext2fs/ext2_fs.h may have the
attribute "nonstring". In such a case it conflicts with calls
which require a NUL-terminated string.
The fix replaces NUL-terminated string calls with calls that take a
limited string size (e.g. strlen() -> strnlen())
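
A minimal sketch of the bounded approach, in userspace C. VOLNAME_LEN
is an assumed field size for this sketch and copy_label() is a
hypothetical helper, not a lipe function.

```c
#define _POSIX_C_SOURCE 200809L	/* for strnlen() */
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define VOLNAME_LEN 16	/* field size assumed for this sketch */

/*
 * Copy a fixed-size, possibly unterminated label into dst and always
 * NUL-terminate the result. Returns the number of bytes copied.
 */
static size_t copy_label(char *dst, size_t dst_size,
			 const char field[VOLNAME_LEN])
{
	/* strnlen() never reads past the end of the field */
	size_t len = strnlen(field, VOLNAME_LEN);

	assert(dst != NULL && dst_size > 0);
	if (len >= dst_size)
		len = dst_size - 1;
	memcpy(dst, field, len);
	dst[len] = '\0';
	return len;
}
```

A label that fills all 16 bytes is copied in full into a 17-byte
buffer, where strlen() on the raw field would have read past its end.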

Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Change-Id: Ieb1921a289328a8f9bfae9bb658c6c74f8ec43b7
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48829
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago RM-620 build: New tag 2.14.0-ddn63
Andreas Dilger [Tue, 11 Oct 2022 08:04:59 +0000 (02:04 -0600)]
RM-620 build: New tag 2.14.0-ddn63

New tag 2.14.0-ddn63

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I5e4b8d3d863cbd504fc7470b413d2083bb15e371

2 years ago LU-15481 llog: Add LLOG_SKIP_PLAIN to skip llog plain
Etienne AUJAMES [Wed, 5 Oct 2022 07:10:05 +0000 (00:10 -0700)]
LU-15481 llog: Add LLOG_SKIP_PLAIN to skip llog plain

Add the catalog callback return value LLOG_SKIP_PLAIN to conditionally
skip an entire llog plain.

This could speed up the catalog processing for specific usages when a
record needs to be accessed in the "middle" of the catalog. This could
be useful for changelogs with several users or HSM.

This patch modifies chlg_read_cat_process_cb() to use LLOG_SKIP_PLAIN.
The main idea came from: d813c75d ("LU-14688 mdt: changelog purge
deletes plain llog")
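
The skipping idea can be sketched in userspace C. The return codes,
struct plain, cat_cb() and visit_from() are hypothetical and only model
a catalog as a list of index ranges; this is not the llog API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return codes for the sketch; the values are illustrative only. */
enum { CB_PROCESS = 0, CB_SKIP_PLAIN = 1 };

/* A plain log covering record indexes [first, last]. */
struct plain {
	uint64_t first;
	uint64_t last;
};

/* Catalog callback: skip a whole plain log that ends before `start`. */
static int cat_cb(const struct plain *p, uint64_t start)
{
	assert(p != NULL && p->first <= p->last);
	return p->last < start ? CB_SKIP_PLAIN : CB_PROCESS;
}

/* Walk the catalog and count how many records are actually examined. */
static uint64_t visit_from(const struct plain *cat, size_t n,
			   uint64_t start)
{
	uint64_t visited = 0;

	for (size_t i = 0; i < n; i++) {
		if (cat_cb(&cat[i], start) == CB_SKIP_PLAIN)
			continue;	/* no per-record work for this log */
		for (uint64_t idx = cat[i].first; idx <= cat[i].last; idx++)
			visited++;
	}
	return visited;
}
```

With three plain logs of 100 records each and a start index of 250,
only the last log is walked, so 100 records are examined instead of
300.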

**Performance test:**

* Environment:
2474195 changelog records stored on mds0 (40 llog plain):
mds# lctl get_param -n mdd.lustrefs-MDT0000.changelog_users
current index: 2474195
ID    index (idle seconds)
cl1   0 (3509)

* Test
Access to records at the end of the catalog (offset: 2474194):
client# time lfs changelog lustrefs-MDT0000 2474194 >/dev/null

* Results
- with the patch:  real    0m0.592s
- without the patch: real    0m17.835s (x30)

Lustre-change: https://review.whamcloud.com/46310
Lustre-commit: aa22a6826ee521ab14994a4533b0dbffb529aab0

Signed-off-by: Etienne AUJAMES <etienne.aujames@cea.fr>
Change-Id: I887d5bef1f3a6a31c46bc58959e0f508266c53d2
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48774
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago EX-6033 lipe: Add note to developers for HP scripts
Gaurang Tapase [Thu, 29 Sep 2022 05:57:45 +0000 (23:57 -0600)]
EX-6033 lipe: Add note to developers for HP scripts

The stratagem-hp-* scripts have been moved to the EMF repo because
they are tightly coupled with the EXA release through the
HA configuration. They are kept in the lustre repo so that
hotfixes do not delete them.

Test-Parameters: trivial
Change-Id: I33eecaa4ed0c9342a83973bac313322a007d72d0
Signed-off-by: Gaurang Tapase <gtapase@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48698
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoEX-5936 pcc: dont take UPDATE lock when set lustre.pin xattr
Qian Yingjin [Fri, 23 Sep 2022 09:34:09 +0000 (05:34 -0400)]
EX-5936 pcc: dont take UPDATE lock when set lustre.pin xattr

In this patch, we do not take the UPDATE lock when setting the
lustre.pin XATTR during the PCC pin command.
The reason is that it may revoke the combined UPDATE|LAYOUT lock
cached in the client namespace, and invalidate the layout and PCC
cache.

Since caching of the lustre.pin xattr in the client XATTR cache is
disabled, setting the lustre.pin XATTR without the UPDATE lock bit
does not cause any problem.

Add test case: sanity-pcc/204d.

Change-Id: I35a0e399294020efdb0e4710500e8f7b846c290f
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48638
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-14599 osp: limit allocation at osp_sync_process_committed
Alexander Boyko [Wed, 5 Oct 2022 07:06:59 +0000 (00:06 -0700)]
LU-14599 osp: limit allocation at osp_sync_process_committed

Sometimes osp cancels a very large cookie list with 64K elements.
In this case osp_sync_process_committed() tries to allocate 64 pages
and uses vmalloc.
The fix limits the allocation size to 4 pages with kmalloc, and
reuses the buffer in a loop.
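
The bounded-buffer loop can be sketched in user space as follows. This is an illustration only: `cancel_in_chunks()`, `cancel_batch` and the `CHUNK` capacity are hypothetical and not taken from the patch; the point is that one fixed-size buffer is allocated once and refilled, so peak memory no longer depends on the length of the cookie list.

```python
# Hypothetical model: cancel a large cookie list through one small, reused buffer.
CHUNK = 4096  # assumed buffer capacity in cookies; not a value from the patch


def cancel_in_chunks(cookies, cancel_batch, chunk=CHUNK):
    """Feed cookies to cancel_batch() in slices of at most `chunk` entries.

    Returns the number of batches that were submitted.
    """
    buf = [None] * chunk                      # single bounded allocation
    batches = 0
    for start in range(0, len(cookies), chunk):
        n = min(chunk, len(cookies) - start)
        buf[:n] = cookies[start:start + n]    # refill the same buffer
        cancel_batch(buf[:n])
        batches += 1
    return batches
```

For a list of 64K cookies and the assumed capacity above, the loop submits 16 batches instead of one very large one.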

Lustre-change: https://review.whamcloud.com/43250
Lustre-commit: 9b692e2e7d105f4926649ea46007ac65b24c4b6d

HPE-bug-id: LUS-9815
Fixes: 6d7332102 ("LU-11924 osp: combine llog cancel operations")
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: Ic875335a28f78494fdb3cbc4b0145e5a43831ee8
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48773
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16135 lod: prohibit DoM pattern in plain layout
Mikhail Pershin [Mon, 5 Sep 2022 07:41:37 +0000 (10:41 +0300)]
LU-16135 lod: prohibit DoM pattern in plain layout

The DoM pattern can be set as the default directory plain layout by
an older LFS version, which misses the DoM component sanity checks if
a plain layout is used. Such a layout is not allowed and causes
crashes later, when a file is created under that directory.

LFS can prevent this, but not in all Lustre versions,
so LOD should do the check as well.

Lustre-change: https://review.whamcloud.com/48433
Lustre-commit: a8272168e3888ec4ced18035182159a8ee56a51a

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ic58fdda2ab3e63083128cb6cf949fcb43ccd2c02
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48514
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-15132 hsm: Protect against parallel HSM restore requests
Etienne AUJAMES [Thu, 21 Oct 2021 14:31:01 +0000 (16:31 +0200)]
LU-15132 hsm: Protect against parallel HSM restore requests

Multiple parallel accesses (read/write) to the same released file
could cause multiple HSM restore requests to be sent.
On the MDT side, each restore request waits for the first one to complete
before grabbing the MDS_INODELOCK_LAYOUT LCK_EX and registering the
llog record.

This could cause several MDT threads to hang for the same restore
request sent in parallel. In the worst case, all MDT threads can
hang and the MDS is no longer able to handle requests.

This patch checks if an HSM restore handle exists before taking the
lock.
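
A minimal sketch of the "check before locking" idea, assuming a registry of in-flight restores keyed by FID. The `RestoreRegistry` class and its methods are hypothetical; in particular, what the real MDT returns to a duplicate requester is not stated here, so the model simply reports `False`.

```python
import threading


class RestoreRegistry:
    """Hypothetical model of an MDT tracking in-flight HSM restores by FID."""

    def __init__(self):
        self._guard = threading.Lock()   # protects the in-flight set only
        self._in_flight = set()
        self.layout_lock_taken = 0       # counts expensive lock acquisitions

    def request_restore(self, fid):
        """Return True if a new restore was registered, False if one exists."""
        with self._guard:
            if fid in self._in_flight:
                return False             # existing handle: do not queue on the lock
            self._in_flight.add(fid)
        # Only the first requester reaches the layout lock and the llog record.
        self.layout_lock_taken += 1
        return True

    def restore_done(self, fid):
        """Forget a restore once it has completed."""
        with self._guard:
            self._in_flight.discard(fid)
```

Five parallel requests for the same file then cost one lock acquisition instead of five threads waiting on the same lock.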

Lustre-change: https://review.whamcloud.com/45367
Lustre-commit: 66b3e74bccf1451d135b7f331459b6af1c06431b

Test-Parameters: testlist=sanity-hsm,sanity-hsm
Test-Parameters: testlist=sanity-hsm env=ONLY=12s,ONLY_REPEAT=50
Signed-off-by: Xing Huang <hxing@ddn.com>
Change-Id: I9584edc2c7411aa41b2e318e55f57c117d1c3dfb
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48650
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16062 ldlm: improve bl_timeout for prolong
Vitaly Fertman [Tue, 4 Oct 2022 17:30:08 +0000 (10:30 -0700)]
LU-16062 ldlm: improve bl_timeout for prolong

If a client RPC is in hand, we can do a better job of calculating
the lock callback timeout, as the RPC carries the client's own view
of this RPC's timeout. Let's use it.

Lustre-change: https://review.whamcloud.com/48094
Lustre-commit: 34b2246e4a6c8ce827c404cb4e52f7c6a0a1b90b

HPE-bug-id: LUS-8866, LUS-11074
Signed-off-by: Vitaly Fertman <c17818@cray.com>
Change-Id: Ibd67d37c1073d0d3cb2e08b532c801af0de116fe
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48762
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-14183 ldlm: wrong ldlm_add_waiting_lock usage
Vitaly Fertman [Tue, 4 Oct 2022 17:24:31 +0000 (10:24 -0700)]
LU-14183 ldlm: wrong ldlm_add_waiting_lock usage

Originally, exp_bl_lock_at accounted for the period from when the BLAST
was sent until the cancel RPC reached the server. LU-6032 started to
update l_blast_sent for expired locks which are still busy, i.e. locks
prolonged when the timeout expired. In fact, it is a good idea to cover
not the whole period but only the time until any involved RPC arrives:
it avoids excessively large lock callback timeouts, and the IO which
prolongs the lock is also able to restart the AT cycle by updating
l_blast_sent.

Unfortunately, the change seems to have been made accidentally, as the
main prolong code was not adjusted accordingly.

Lustre-change: https://review.whamcloud.com/40868
Lustre-commit: af07c9a79e263f940fea06a911803097b57b55f4

Fixes: 292aa42e08 ("LU-6032 ldlm: don't disable softirq for exp_rpc_lock")
HPE-bug-id: LUS-9278
Signed-off-by: Vitaly Fertman <c17818@cray.com>
Change-Id: Idc598508fc13aa33ac9fce56f13310ca6fc819d4
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48761
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-15986 ptlrpc: protect rq_repmsg in ptlrpc_req_drop_rs()
Lei Feng [Thu, 30 Jun 2022 02:46:31 +0000 (10:46 +0800)]
LU-15986 ptlrpc: protect rq_repmsg in ptlrpc_req_drop_rs()

There is a race condition on the server side: one thread has sent an
early reply and is deleting the reply message, while another is
searching for an existing request and prints some debug information
in _debug_req() if there is a duplicated request. They both operate on
req->rq_repmsg but it is not protected in ptlrpc_req_drop_rs().
So protect it with req->rq_early_free_lock.
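
The locking pattern can be modeled in user space as below. This is a sketch with hypothetical names (`Request`, `drop_reply()`, `debug_status()`); it only shows that the freeing path and the reading path take the same lock, so the reader never sees the reply message disappear between its check and its use.

```python
import threading


class Request:
    """Hypothetical stand-in for a server-side request with a reply buffer."""

    def __init__(self, repmsg):
        self.repmsg = repmsg
        self.early_free_lock = threading.Lock()

    def drop_reply(self):
        """Free the reply message while holding the lock."""
        with self.early_free_lock:
            self.repmsg = None

    def debug_status(self):
        """Read the reply message under the same lock as drop_reply()."""
        with self.early_free_lock:
            if self.repmsg is None:
                return "no reply"
            return "status=%d" % self.repmsg["status"]
```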

Lustre-change: https://review.whamcloud.com/47839
Lustre-commit: aaef545cff2dd958418ec9fb364d4bbe1408edb9

Signed-off-by: Lei Feng <flei@whamcloud.com>
Change-Id: Ied55427ee15c3ef84bdd2d579844eba398dbf010
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/47860
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-16166 ptlrpc: lower the message level in no resend case
Yang Sheng [Mon, 19 Sep 2022 05:46:27 +0000 (13:46 +0800)]
LU-16166 ptlrpc: lower the message level in no resend case

Don't report the wrong generation as an error message in the
rq_no_resend case.

Lustre-change: https://review.whamcloud.com/48585
Lustre-commit: d13cca56a5ae2ad44d8083025e37263e408b8f62

Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: I534cadc916fcd1eb6840439b6507e646d0e5d974
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48807
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoEX-6069 ldiskfs: ext4-simple-blockalloc.patch small fixes
Artem Blagodarenko [Wed, 28 Sep 2022 14:28:11 +0000 (10:28 -0400)]
EX-6069 ldiskfs: ext4-simple-blockalloc.patch small fixes

The LU-14305 change requires some cleanup.
The MB_DEFAULT_MAX_CX_BYTES #defines are not used anymore
and should be removed.

Also, in the el8 version of the patch for b_es6_0,
the THRESHOLD_BLOCKS() function should explicitly take "sbi"
as a parameter.

Test-Parameters: trivial
Fixes: d5d5cfdde2 ("add persistent tuning for mb_c3_threshold")
Change-Id: Idcb93432fdfa7694b4e7cabbf46a0bf21a412f87
Signed-off-by: Artem Blagodarenko <ablagodarenko@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48714
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-16184 o2iblnd: fix deadline for tx on peer queue
Serguei Smirnov [Fri, 23 Sep 2022 19:29:59 +0000 (12:29 -0700)]
LU-16184 o2iblnd: fix deadline for tx on peer queue

In o2iblnd, deadline is checked for txs on peer queue,
but not set prior to adding the tx to the queue. This
may cause the tx to be dropped unnecessarily with
"Timed out tx for ..." warning.

Fix it by setting the tx_deadline when adding tx to peer queue.
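
A small model of the fix, assuming a per-peer queue whose entries are checked against a deadline. `PeerQueue`, `queue_tx()` and `expired()` are hypothetical names; the only point is that the deadline is assigned before the tx becomes visible to the timeout check.

```python
import time
from collections import deque


class PeerQueue:
    """Hypothetical model of a per-peer tx queue with deadline checks."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.txs = deque()

    def queue_tx(self, tx, now=None):
        """Stamp the deadline, then add the tx to the queue."""
        now = time.monotonic() if now is None else now
        tx["deadline"] = now + self.timeout_s   # set before the tx is queued
        self.txs.append(tx)

    def expired(self, now=None):
        """Return the queued txs whose deadline has passed."""
        now = time.monotonic() if now is None else now
        return [tx for tx in self.txs if now >= tx["deadline"]]
```

Without the assignment in `queue_tx()`, a stale or zero deadline would make a freshly queued tx look expired on the first check.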

Lustre-change: https://review.whamcloud.com/48640
Lustre-commit: 4c89ee7d7b098c7f1e6566f49fa2940db577518d

Test-Parameters: trivial
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: Ie7cf5590b440b60f71527049953a64bb31d53578
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/48641
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
2 years agoLU-16160 osc: take ldlm lock when queue sync pages
Bobi Jam [Thu, 15 Sep 2022 06:46:34 +0000 (14:46 +0800)]
LU-16160 osc: take ldlm lock when queue sync pages

osc_queue_sync_pages() adds an osc_extent to the osc_object's IO extent
list without taking ldlm locks, and then it calls
osc_io_unplug_async() to queue the IO work for the client.

This patch makes sync page queuing take the ldlm lock in the
osc_extent.

Lustre-change: https://review.whamcloud.com/48557
Lustre-commit: 67aca1fcc6bed20794832decdba590a758d67d8f

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Idefa2981e62a2a6e10d8b8a7692c0337b61b9052
Reviewed-on: https://review.whamcloud.com/48597
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoEX-5932 lipe: stratagem-hp-config.sh has wrong MDTLIST
Alexandre Ioffe [Wed, 21 Sep 2022 19:15:03 +0000 (12:15 -0700)]
EX-5932 lipe: stratagem-hp-config.sh has wrong MDTLIST

stratagem-hp-config.sh doesn't pick up the proper MDTLIST
if snapshot agents are running. Fix the MDTLIST which is used
to configure lpurge.

Test-Parameters: trivial
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Change-Id: Ic1d58d56f1acae140122d0b582410c140759e89e
Reviewed-on: https://review.whamcloud.com/48619
Reviewed-by: Shuichi Ihara <sihara@ddn.com>
Reviewed-by: Colin Faber <cfaber@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16154 obdclass: free inst_name correctly
Emoly Liu [Thu, 15 Sep 2022 01:42:47 +0000 (09:42 +0800)]
LU-16154 obdclass: free inst_name correctly

In function class_config_llog_handler(), inst_name should be freed
correctly before break.

Lustre-change: https://review.whamcloud.com/48542
Lustre-commit: e7f17c5e0c95dba3b80e192e4ca3628cc42e64b9

Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Change-Id: I6adc0ed62c3c637237834b799f25666d0e7e1ecb
Reviewed-on: https://review.whamcloud.com/48670
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-16050 build: replace ofed_info with dpkg/rpm
Jian Yu [Mon, 26 Sep 2022 18:22:56 +0000 (11:22 -0700)]
LU-16050 build: replace ofed_info with dpkg/rpm

After installing MLNX_OFED by running the mlnxofedinstall command,
the mlnx-ofed-kernel-modules package is not listed by ofed_info,
which causes Lustre configure to fail as follows:

checking whether to use Compat RDMA... /usr/bin/ofed_info
dpkg-query: error: --listfiles needs at least one package name argument

This patch fixes the above issue by replacing ofed_info with the
"dpkg -l" and "rpm -qa" commands to find the OFED package.

Lustre-change: https://review.whamcloud.com/48047
Lustre-commit: 3a7930e63c15b0fbe51ac73db81a1186939115bb

Test-Parameters: trivial
Fixes: ec03c9628cae ("LU-15417 build: find the new path for MOFED 5.5")
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Change-Id: Ia3c2d6bf10e147ca2761221741eff6f93008556c
Reviewed-by: Gaurang Tapase <gtapase@ddn.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/48662
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoEX-6014 tests: Revert "EX-4093 tests: hot-pools don't recreate pools"
Jian Yu [Wed, 28 Sep 2022 16:54:33 +0000 (09:54 -0700)]
EX-6014 tests: Revert "EX-4093 tests: hot-pools don't recreate pools"

This reverts commit 116cbacc52d8 to resolve the hot-pools
regression test failures.

After running sub-test 1, the OST pools were destroyed by
the following stack_trap in create_pool():

  stack_trap "destroy_test_pools $fsname" EXIT

If the pools are not recreated in the successive sub-tests,
then they will fail. We have to revert commit 116cbacc52d8
until we find a way to avoid triggering the stack_trap
between sub-tests.

Test-Parameters: trivial mdscount=2 mdtcount=4 \
testlist=parallel-scale-nfsv4,hot-pools

Fixes: 116cbacc52d8 ("EX-4093 tests: hot-pools don't recreate pools")
Change-Id: I464a1f9f380c55e70b78a0dd7e52723d5b0a298d
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/48690
Reviewed-by: Alexandre Ioffe <aioffe@ddn.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoRM-620 build: New tag 2.14.0-ddn62
Andreas Dilger [Fri, 23 Sep 2022 22:24:58 +0000 (16:24 -0600)]
RM-620 build: New tag 2.14.0-ddn62

New tag 2.14.0-ddn62

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I21b71b04905a70acbaada6d5a7fbab6c9184ca51

2 years agoRevert "EX-4141 lipe: lamigo should detect dead OST and restart ALR"
Andreas Dilger [Fri, 23 Sep 2022 19:36:53 +0000 (19:36 +0000)]
Revert "EX-4141 lipe: lamigo should detect dead OST and restart ALR"

This reverts commit 028bee14d2c6d8feb5eb418302f8751643e731c6 due to build error.

Change-Id: I6193f3e99192b618a3e6616524e28b230659fc0b
Reviewed-on: https://review.whamcloud.com/48639
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoRM-620 build: New tag 2.14.0-ddn61
Andreas Dilger [Fri, 23 Sep 2022 17:19:23 +0000 (11:19 -0600)]
RM-620 build: New tag 2.14.0-ddn61

New tag 2.14.0-ddn61

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I34c78bc6ce2fbac65e4e8b017cad1da05c78d53a

2 years agoLU-16183 tests: sanity-hsm/70 should detect python
Minh Diep [Thu, 15 Sep 2022 03:41:37 +0000 (20:41 -0700)]
LU-16183 tests: sanity-hsm/70 should detect python

Check for python2 and python3 explicitly, since the
generic python command does not exist in newer distros.

Test-Parameters: env=SLOW=yes,ENABLE_QUOTA=yes \
clientdistro=sles15sp3 testlist=sanity-hsm
Test-Parameters: env=SLOW=yes,ENABLE_QUOTA=yes \
clientdistro=el7.9 testlist=sanity-hsm
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Change-Id: I2251be461129310868868277bf9d46015545ffe2
Reviewed-on: https://review.whamcloud.com/48577
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoEX-4141 lipe: lamigo should detect dead OST and restart ALR
Alexandre Ioffe [Tue, 29 Mar 2022 07:48:35 +0000 (00:48 -0700)]
EX-4141 lipe: lamigo should detect dead OST and restart ALR

Use the #keepalive message and ssh read with a timeout
to detect that an OST is down and restart ALR.
Add stats for the ALR last seen message.
Duplicate ofd_access_log_reader from lustre/utils into
lipe/src/es_ofd_access_log_reader.
Use a common lamigo_hash.h for lamigo and
es_ofd_access_log_reader.

Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Test-Parameters: trivial testlist=hot-pools
Change-Id: I26dc631a8663046821e049fc6e091108b2a62f87
Reviewed-on: https://review.whamcloud.com/46944
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John Hammond <jhammond@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
2 years agoLU-14962 lnet: Check for -ESHUTDOWN in lnet_parse
Chris Horn [Tue, 24 Aug 2021 16:16:17 +0000 (11:16 -0500)]
LU-14962 lnet: Check for -ESHUTDOWN in lnet_parse

The fix for LU-8106, http://review.whamcloud.com/19993, no longer
works because rc does not have the return value from
lnet_nid2peerni_locked(). Use PTR_ERR to get the return value and
restore the LU-8106 fix.

Lustre-change: https://review.whamcloud.com/44743
Lustre-commit: cce82630cbf2c7badbbdd16a8ca9c8c0065ded13

Test-Parameters: trivial
HPE-bug-id: LUS-10333
Fixes: fa8b4e6357 ("LU-7734 lnet: peer/peer_ni handling adjustments")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I9cc2bc2d6e675d38cf06d99c524bdd95110bf0e9
Reviewed-on: https://review.whamcloud.com/48487
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-15618 lnet: Return ESHUTDOWN in lnet_parse()
Chris Horn [Thu, 3 Mar 2022 07:12:32 +0000 (01:12 -0600)]
LU-15618 lnet: Return ESHUTDOWN in lnet_parse()

If the peer NI lookup in lnet_parse() fails with ESHUTDOWN then we
should return that value back to the LNDs so that they can treat the
failed call the same way as other lnet_parse() failures.

Returning zero results in at least one bug in socklnd where a
reference on a ksock_conn can be leaked which prevents socklnd from
shutting down.

Lustre-change: https://review.whamcloud.com/46711
Lustre-commit: 4fbd0705a3d25bbc85e953f81e697e5006b215ce

Fixes: 47b7b31978 ("LU-8106 lnet: Do not drop message when shutting down LNet")
Test-Parameters: trivial testlist=sanity-lnet
HPE-bug-id: LUS-15794
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ic403619c6dccf3921c46a674808c404adad7a30e
Reviewed-on: https://review.whamcloud.com/48485
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-15616 lnet: ln_api_mutex deadlocks
Chris Horn [Mon, 7 Mar 2022 17:03:50 +0000 (11:03 -0600)]
LU-15616 lnet: ln_api_mutex deadlocks

LNetNIFini() acquires the ln_api_mutex and holds onto it throughout
various shutdown routines. Meanwhile, LND threads (via
lnet_nid2peerni_locked()) or the discovery thread (via
lnet_peer_data_present()) may need to acquire this mutex in order to
progress.

Address these potential deadlocks by setting the_lnet.ln_state to
LNET_STATE_STOPPING earlier in LNetNIFini(), and releasing the mutex
prior to any call into an LND module or before any wait.

LNetNIInit() is modified to return -ESHUTDOWN if it finds that there
is a concurrent shutdown in progress.
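
The shutdown ordering can be sketched with a toy state machine. Everything here is a hypothetical model (`ToyNet`, `ni_init()`, `ni_fini()` and the state strings), assuming a platform where Python's `errno.ESHUTDOWN` is defined; it shows the state being flipped to "stopping" first and the mutex being dropped before calling into code that may itself need the mutex.

```python
import errno
import threading

RUNNING, STOPPING, SHUTDOWN = "running", "stopping", "shutdown"


class ToyNet:
    """Hypothetical model of the init/fini ordering described above."""

    def __init__(self):
        self.api_mutex = threading.Lock()
        self.state = RUNNING

    def ni_init(self):
        """Refuse to initialize while a shutdown is in progress."""
        with self.api_mutex:
            if self.state == STOPPING:
                return -errno.ESHUTDOWN
            self.state = RUNNING
            return 0

    def ni_fini(self, lnd_shutdown):
        """Mark the net as stopping, then call out without holding the mutex."""
        with self.api_mutex:
            self.state = STOPPING
        lnd_shutdown()            # may take api_mutex itself without deadlocking
        with self.api_mutex:
            self.state = SHUTDOWN
```

A callback passed as `lnd_shutdown` can acquire `api_mutex` freely, and a concurrent `ni_init()` observes the stopping state and backs off.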

Lustre-change: https://review.whamcloud.com/46727
Lustre-commit: 22de0bd145b649768b16dd42559d326af3c13200

Test-Parameters: trivial
HPE-bug-id: LUS-10681
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ia8b28cc95ff71e66a0f99aed4f2c22ec9d44ce1e
Reviewed-on: https://review.whamcloud.com/48384
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-13806 lnet: Ensure proper peer, peer NI, peer net hierarchy
Chris Horn [Fri, 11 Dec 2020 18:04:32 +0000 (12:04 -0600)]
LU-13806 lnet: Ensure proper peer, peer NI, peer net hierarchy

The MR design dictates that the peer nets and peer NIs are ordered
such that the peer net and peer NI for a peer's primary NID appear
first, followed by other peer NIs in the primary NID's peer net,
followed by other peer nets/NIs. This ordering is broken and it can
result in tripping an assertion if the primary NID of a peer is
deleted. Modify lnet_peer_attach_peer_ni() to check whether the
NI being attached is the peer's primary, and place it, and its
associated peer net, appropriately.

Modify lnet_peer_set_primary_nid() so that it updates the
lp_primary_nid before calling lnet_peer_add_nid() so that
lnet_peer_attach_peer_ni() can detect the situation where the
primary is changing and act appropriately.

Finally, modify lnet_peer_merge_data() to enforce the hierarchy
after it has finished merging the contents of the ping buffer. This
ensures we maintain the correct hierarchy in certain edge cases where
we've needed to reconcile two peers. e.g. if a peer adds a new
interface, the discovery push may arrive from that new interface
which will result in a second peer object being created which will
need to be reconciled with the original peer object.
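
The required ordering can be expressed as a small pure function. This is a sketch, not the kernel data structure: NIDs are modeled as "address@net" strings, and `order_peer_nis()` is a hypothetical helper that places the primary NID first, then the other NIDs on the primary's net, then the remaining nets in first-seen order.

```python
def net_of(nid):
    """Return the net part of an "address@net" NID string."""
    return nid.split("@", 1)[1]


def order_peer_nis(nids, primary):
    """Order NIDs: primary, rest of the primary's net, then the other nets."""
    if primary not in nids:
        raise ValueError("primary NID is not attached to this peer")
    by_net = {}
    for nid in nids:
        by_net.setdefault(net_of(nid), []).append(nid)
    # The primary's net moves to the front, with the primary NID leading it.
    primary_net = by_net.pop(net_of(primary))
    ordered = [primary] + [n for n in primary_net if n != primary]
    for members in by_net.values():     # dicts keep insertion order
        ordered.extend(members)
    return ordered
```

Re-running the function after the primary changes, or after two peer objects are merged, restores the hierarchy regardless of the order in which the NIDs were discovered.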

Lustre-change: https://review.whamcloud.com/40985
Lustre-commit: 9eb9474c41c823c70f34e6bb102a8861ca21a3d1

HPE-bug-id: LUS-9630
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I8397a24ba1ba0bba33846e7e97b8d60a8f26a1be
Reviewed-on: https://review.whamcloud.com/48508
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-15538 lnet: DLC sets map_on_demand incorrectly
Chris Horn [Sat, 5 Feb 2022 23:15:30 +0000 (23:15 +0000)]
LU-15538 lnet: DLC sets map_on_demand incorrectly

When any NET or LND tunable is specified via CLI or yaml, then the
whole tunables struct gets memset to 0, or in the case of yaml config,
0 gets assigned to any tunable that isn't specified in the yaml. This
causes a problem for map_on_demand because 0 is a valid value for that
parameter, and ko2iblnd cannot know whether the user specified that 0
should be used or if DLC is specifying that the parameter was unset.

Rather than setting this parameter to 0 in the LND tunables struct,
have DLC set it to UINT_MAX to indicate that ko2iblnd should use the
value of the kernel module parameter.
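
The sentinel approach can be illustrated as follows. The helper names and the module parameter default are hypothetical; only the use of UINT_MAX as the "unset" marker comes from the description above.

```python
UINT_MAX = 0xFFFFFFFF          # sentinel meaning "not specified by the user"
MODULE_PARAM_DEFAULT = 1       # hypothetical kernel module parameter value


def tunables_from_config(config):
    """Build a tunables dict, marking an unspecified map_on_demand as unset."""
    return {"map_on_demand": config.get("map_on_demand", UINT_MAX)}


def effective_map_on_demand(tunables, module_param=MODULE_PARAM_DEFAULT):
    """Resolve the value to use: the user's value, else the module parameter."""
    value = tunables["map_on_demand"]
    return module_param if value == UINT_MAX else value
```

An explicit 0 now survives as 0, while an omitted value falls back to the module parameter, which a zero-filled struct could not distinguish.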

Lustre-change: https://review.whamcloud.com/46492
Lustre-commit: 896f4a082b93453f5e7168f685faff4fba594ff3

Test-Parameters: trivial
HPE-bug-id: LUS-10740
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I303e64d4d402ba61b5ae3e3910873f192a4a2845
Reviewed-on: https://review.whamcloud.com/48491
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoEX-4093 tests: hot-pools don't recreate pools
Alex Zhuravlev [Wed, 21 Sep 2022 00:40:46 +0000 (17:40 -0700)]
EX-4093 tests: hot-pools don't recreate pools

The test can save some time by not recreating the pools in every
subtest.

before: 1371 seconds
after:  1058 seconds

Test-Parameters: trivial testlist=hot-pools

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I9304e29b6fc59dd68626b44844dc81500009a80f
Reviewed-on: https://review.whamcloud.com/48614
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoEX-5824 test: hot-pools test_57: data copy failed: mirror failed
Alexandre Ioffe [Thu, 8 Sep 2022 08:37:31 +0000 (01:37 -0700)]
EX-5824 test: hot-pools test_57: data copy failed: mirror failed

Add debug prints in hot-pools test_57

Test-Parameters: trivial env=FAIL_ON_ERROR=false,ONLY=56-57 testlist=hot-pools

Change-Id: I863b580f5483c14c24c6f79ebdddbc782b65e945
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Reviewed-on: https://review.whamcloud.com/48477
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
2 years agoLU-14992 tests: sanity/replay-vbr mkdir on MDT0
James Nunez [Mon, 13 Sep 2021 16:35:30 +0000 (10:35 -0600)]
LU-14992 tests: sanity/replay-vbr mkdir on MDT0

Replace mkdir with mkdir_on_mdt0() for sanity test 133a
and replay-vbr test 7a.  These tests expect the newly
created directory to be on MDT0.

Lustre-change: https://review.whamcloud.com/44902/
Lustre-commit: TBD

Test-Parameters: trivial mdscount=2 mdtcount=4 testlist=sanity
Test-Parameters: env=SLOW=yes mdscount=2 mdtcount=4 testlist=replay-vbr
Change-Id: Icea2923a8d8d3a3aa0ddf0401f0a025480b2f6f0
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/48606
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-13358 libcfs: add timeout to cfs_race() to fix race
Alex Zhuravlev [Tue, 30 Mar 2021 05:57:14 +0000 (08:57 +0300)]
LU-13358 libcfs: add timeout to cfs_race() to fix race

There is no guarantee that the branches in cfs_race() are executed
in strict order, thus it's possible that the second branch (with
cfs_race_state=1) is executed before the first branch and then another
thread executing the first branch gets stuck.

This construction is used for testing only, so as a
workaround it's enough to time out.
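
The failure mode and the workaround can be modeled with a `threading.Event`. This is a sketch under the assumption that the first branch resets the shared state before waiting and the second branch sets it and wakes the waiter; the `Race` class and its method names are hypothetical.

```python
import threading


class Race:
    """Hypothetical two-branch rendezvous used only by tests."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self._woken = threading.Event()

    def first_branch(self):
        """Reset the state and wait; return False if the wait timed out."""
        self._woken.clear()                 # models resetting the state to 0
        return self._woken.wait(self.timeout_s)

    def second_branch(self):
        """Set the state and wake up a waiter in the first branch."""
        self._woken.set()
```

If `second_branch()` happens to run first, its wake-up is lost when `first_branch()` resets the state; with the timeout the waiter returns `False` after `timeout_s` seconds instead of hanging.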

Lustre-change: https://review.whamcloud.com/43161
Lustre-commit: 2d2d381f35ee004319a20f5d2d8e70d13480d6c7

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ie1cc0accedb3e1a198d4b17d1ab00ce298c560f2
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/48553
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>