Whamcloud - gitweb
fs/lustre-release.git
18 months ago LU-16138 kernel: preserve RHEL8.x server kABI for block integrity 08/48608/2
Jian Yu [Tue, 20 Sep 2022 18:19:12 +0000 (11:19 -0700)]
LU-16138 kernel: preserve RHEL8.x server kABI for block integrity

Currently there are two kernel patches supporting the SCSI T10-PI
feature left in the RHEL8.x series:

- block-integrity-allow-optional-integrity-functions-rhel8.patch
- block-pass-bio-into-integrity_processing_fn-rhel8.patch

The changes in the patches modified "struct bio_integrity_payload"
and "struct blk_integrity_iter", which caused kABI breakage.

This patch fixes the patches to preserve kABI by using
RH-supplied compatibility macros.

Test-Parameters: trivial fstype=ldiskfs clientdistro=el8.5 serverdistro=el8.5
Test-Parameters: trivial fstype=ldiskfs clientdistro=el8.6 serverdistro=el8.6

Change-Id: If547e1cd4ae4ff1affd315bbfefaeeff4f1dea81
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48608
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months ago LU-9680 obdclass: user netlink to collect devices information 18/31618/80
James Simmons [Sat, 17 Sep 2022 20:19:48 +0000 (16:19 -0400)]
LU-9680 obdclass: user netlink to collect devices information

Our utilities can report a device list with various bits of data
to users through the debugfs file 'devices'. By default this debugfs
file is available only to root, which prevents regular users
from collecting the information. Enable non-root users to collect
the same information for lctl dl using netlink. Using netlink
also removes the 8K ioctl limit. Add the ability to present
this data in YAML format as well.

Change-Id: I5e6378765bd2f4c415cf29b2bc54adf0e54f308b
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/31618
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months ago LU-16166 ptlrpc: lower the message level in no resend case 85/48585/2
Yang Sheng [Mon, 19 Sep 2022 05:46:27 +0000 (13:46 +0800)]
LU-16166 ptlrpc: lower the message level in no resend case

Don't report the wrong generation as an error message in
the rq_no_resend case.

Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: I534cadc916fcd1eb6840439b6507e646d0e5d974
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48585
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months ago LU-15943 tests: Modify timing of sanity-lnet 210 and 211 80/48580/4
Chris Horn [Wed, 14 Sep 2022 00:47:58 +0000 (19:47 -0500)]
LU-15943 tests: Modify timing of sanity-lnet 210 and 211

The portions of test_210 and test_211 that test the
max_recovery_ping_interval parameter are a little racy because the
window where we can get an accurate ping count is small. This is due
to the tests only being able to sleep for whole seconds vs the more
fine-grained time keeping done in the kernel.

Increase the max interval from 2 to 4 and adjust the expected
ping counts accordingly.

Test-Parameters: trivial
Test-Parameters: testlist=sanity-lnet env=ONLY=210,ONLY_REPEAT=100
Test-Parameters: testlist=sanity-lnet env=ONLY=211,ONLY_REPEAT=100
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Idf8b2ff0d5745bdf4484e75f452bc4f06fbcf1a4
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48580
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months ago LU-16161 kernel: kernel update RHEL8.6 [4.18.0-372.26.1.el8_6] 64/48564/2
Jian Yu [Thu, 15 Sep 2022 18:43:02 +0000 (11:43 -0700)]
LU-16161 kernel: kernel update RHEL8.6 [4.18.0-372.26.1.el8_6]

Update RHEL8.6 kernel to 4.18.0-372.26.1.el8_6.

Test-Parameters: trivial fstype=ldiskfs \
clientdistro=el8.6 serverdistro=el8.6 testlist=sanity

Test-Parameters: trivial fstype=zfs \
clientdistro=el8.6 serverdistro=el8.6 testlist=sanity

Change-Id: I45bf6dbff5061407e1109732b6d466d0f7a8376c
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48564
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months ago LU-16144 nrs: implement force mode for nrs_tbf_req_get() 94/48494/5
Etienne AUJAMES [Fri, 9 Sep 2022 06:52:02 +0000 (08:52 +0200)]
LU-16144 nrs: implement force mode for nrs_tbf_req_get()

ptlrpc_service_purge_all() calls ptlrpc_server_request_get() with
"force=true" to purge all active requests before stopping an NRS
policy (when unregistering a service).

"force" mode should always return a request if a pending request is
present in the NRS policy.

nrs_tbf_req_get() does not implement such a mode and can return a
NULL pointer.
This can cause a crash when umounting a target if a TBF rule rate
threshold is reached:

BUG: unable to handle kernel NULL pointer dereference at
0000000000000114
IP: [<ffffffffc0d9e965>] ptlrpc_nrs_req_stop_nolock+0x5/0x150
.....
? ptlrpc_server_finish_active_request+0x2b/0x140 [ptlrpc]
ptlrpc_service_purge_all+0x137/0x920 [ptlrpc]
ptlrpc_unregister_service+0xe7/0x6f0 [ptlrpc]
ost_cleanup+0x52/0x1b0 [ost]
class_free_dev+0x21d/0x720 [obdclass]
class_export_put+0x1f0/0x2c0 [obdclass]
class_unlink_export+0x135/0x170 [obdclass]
class_decref+0x80/0x160 [obdclass]
class_detach+0x1b3/0x2e0 [obdclass]
class_process_config+0x1a38/0x2830 [obdclass]
? complete+0x4a/0x60
? list_del+0xd/0x30
? wait_for_completion+0x4e/0x140
class_manual_cleanup+0x1e0/0x710 [obdclass]
server_stop_servers+0xd5/0x160 [obdclass]
server_put_super+0x12d/0xd00 [obdclass]
generic_shutdown_super+0x6d/0x100
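
The contract described above can be illustrated with a small, self-contained sketch. This is not the Lustre NRS TBF code: the class TokenBucketQueue and the helper purge_all() are hypothetical names, and the token accounting is deliberately simplistic. It only shows why a "force" lookup must ignore the rate limit so that a purge loop never sees a missing request while requests are still pending.

```python
from collections import deque
from typing import Optional


class TokenBucketQueue:
    """Toy rate-limited queue; a stand-in, not the Lustre NRS TBF policy."""

    def __init__(self, tokens: int) -> None:
        self.tokens = tokens      # tokens currently available
        self.pending = deque()    # requests waiting to be served

    def enqueue(self, request: str) -> None:
        self.pending.append(request)

    def get(self, force: bool = False) -> Optional[str]:
        """Return the next request, or None if nothing can be handed out.

        Without force, an empty token bucket throttles the caller.
        With force, a pending request is returned regardless of tokens.
        """
        if not self.pending:
            return None
        if self.tokens > 0:
            self.tokens -= 1
            return self.pending.popleft()
        if force:
            return self.pending.popleft()
        return None


def purge_all(queue: TokenBucketQueue) -> list:
    """Drain every pending request, as a shutdown path would."""
    drained = []
    while queue.pending:
        request = queue.get(force=True)
        # With force=True this cannot be None while requests are pending.
        assert request is not None
        drained.append(request)
    return drained
```

If get() ignored the force argument, the loop in purge_all() would receive None as soon as the tokens ran out, which is the userspace analogue of the NULL pointer in the trace above.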

Signed-off-by: Etienne AUJAMES <etienne.aujames@cea.fr>
Change-Id: Ic4443700725d9308764fbf21cb7de6fa4ab41134
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48494
Reviewed-by: Nikitas Angelinas <nikitas.angelinas@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months ago LU-16072 utils: snapshot support to foreign host 26/48226/8
Akash B [Tue, 24 May 2022 05:49:41 +0000 (01:49 -0400)]
LU-16072 utils: snapshot support to foreign host

Currently the <foreign> host field in /etc/ldev.conf is ignored.
As a result, <lctl snapshot_*> commands do not work when the <local>
host is not accessible or when any of the targets have failed over
to the <foreign> host. This patch makes the
<lctl snapshot_{create, destroy, mount, umount, list, modify}>
commands work when the targets are present on the <foreign> host.
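
A minimal sketch of the fallback idea, assuming an ldev.conf line starts with the fields "local foreign label device" and that "-" means no foreign host. The function names, the hostnames in the usage note, and the is_reachable callback are all hypothetical and are not taken from the lctl sources.

```python
from typing import Callable, Optional


def parse_ldev_line(line: str) -> dict:
    """Split one ldev.conf-style line into its first four fields.

    Assumed field order: local host, foreign host (or "-"), label, device.
    """
    local, foreign, label, device = line.split()[:4]
    return {
        "local": local,
        "foreign": None if foreign == "-" else foreign,
        "label": label,
        "device": device,
    }


def candidate_hosts(entry: dict) -> list:
    """Hosts to try for a target: the local host first, then the foreign one."""
    hosts = [entry["local"]]
    if entry["foreign"] is not None:
        hosts.append(entry["foreign"])
    return hosts


def pick_host(entry: dict, is_reachable: Callable[[str], bool]) -> Optional[str]:
    """Return the first reachable candidate host, or None if there is none."""
    for host in candidate_hosts(entry):
        if is_reachable(host):
            return host
    return None
```

For a line such as "mds1 mds2 lustre-MDT0000 zfs:pool/mdt0", pick_host() returns "mds2" when only mds2 is reachable, which corresponds to the failed-over case the patch addresses.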

HPE-bug-id: LUS-10648
Test-Parameters: fstype=zfs testlist=sanity-lsnapshot
Signed-off-by: Akash B <akash-b@hpe.com>
Change-Id: I706c5e43755386eab4facd42ff7a127aa5c9254c
Reviewed-on: https://es-gerrit.dev.cray.com/160702
Tested-by: Alexander Lezhoev <alexander.lezhoev@hpe.com>
Tested-by: Siddarth Raj <siddarth.raj@hpe.com>
Reviewed-by: Dipak Ghosh <dipak.ghosh@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48226
Reviewed-by: Alexander <alexander.boyko@hpe.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months ago LU-16059 build: Installation of dkms server builds 83/48083/7
Shaun Tancheff [Wed, 24 Aug 2022 14:22:58 +0000 (21:22 +0700)]
LU-16059 build: Installation of dkms server builds

The linux-zfs-dkms package is passing the wrong paths
for zfs [and spl] causing the dkms build to fail.

ZFS_VERSION is not parsed correctly from 'dkms status'.

The splver and zfsver check can match against the wrong
package(s).

lustre-zfs-dkms provides: kmod-lustre-osd-zfs, and
                          lustre-osd-zfs-mount
lustre-ldiskfs-dkms provides: kmod-lustre-osd-ldiskfs and
                              lustre-osd-ldiskfs-mount

In the case of multiple zfs versions installed, build lustre
osd against the highest version number.
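
Picking the highest of several installed versions needs a numeric comparison, not a textual one. The sketch below shows this in isolation; it assumes purely numeric dotted versions such as 2.1.11 (real package versions may carry suffixes), and version_key() and highest_version() are hypothetical helper names, not part of the dkms scripts.

```python
def version_key(version: str) -> tuple:
    """Turn '2.1.11' into (2, 1, 11) so comparison is numeric, not textual."""
    return tuple(int(part) for part in version.split("."))


def highest_version(versions: list) -> str:
    """Return the version string that is numerically the highest."""
    return max(versions, key=version_key)
```

A plain string comparison would rank "2.1.5" above "2.1.11", which is why the key function converts each component to an integer first.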

HPE-bug-id: LUS-11113
Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ic154ca045427bf26cb7e6a44b8c467675e987aad
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48083
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months ago LU-16125 tests: make sanity-sec more robust with SSK 86/48386/4
Sebastien Buisson [Tue, 30 Aug 2022 09:22:34 +0000 (11:22 +0200)]
LU-16125 tests: make sanity-sec more robust with SSK

Encryption related tests in sanity-sec carry out unmount and mount of
clients in order to exercise code with and without the encryption key.
In case SSK is in use, we need to make sure flavors are properly
applied before carrying on.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I92e85dc6dcef43f70a7fe05db94cd18fe66a3a24
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48386
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months ago LU-15777 hsm: set changelog error for restore layout swap failure 21/47121/14
Nikitas Angelinas [Wed, 11 May 2022 22:54:08 +0000 (15:54 -0700)]
LU-15777 hsm: set changelog error for restore layout swap failure

Set the error code in the changelog record generated, if the layout swap
fails at the end of an HSM restore operation. Also, handle error code
overflow inside hsm_set_cl_error(), so that callers don't need to do
this themselves.
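
As a rough illustration of handling the overflow in one place, the sketch below stores an error code in a fixed-width bit field and substitutes a marker value when the code does not fit. The field position and width, the marker value, and the helper names set_error() and get_error() are assumptions made for this example; this is not the actual hsm_set_cl_error() code.

```python
ERR_SHIFT = 3                    # assumed position of the error field
ERR_MAX = 0x7E                   # assumed largest error value stored as-is
ERR_OVERFLOW = 0x7F              # assumed marker meaning "did not fit"
ERR_MASK = 0x7F << ERR_SHIFT     # assumed 7-bit field


def set_error(flags: int, error: int) -> int:
    """Store error in the bit field, replacing it by a marker on overflow."""
    error = abs(error)           # accept both 22 and -22 style codes
    if error > ERR_MAX:
        error = ERR_OVERFLOW
    return (flags & ~ERR_MASK) | (error << ERR_SHIFT)


def get_error(flags: int) -> int:
    """Read the error field back out of the flags word."""
    return (flags & ERR_MASK) >> ERR_SHIFT
```

Because the clamp lives inside set_error(), a caller can pass any error code without checking its range first, and the bits outside the field are left untouched.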

Suggested-by: Olaf Weber <olaf.weber@hpe.com>
Suggested-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Signed-off-by: Nikitas Angelinas <nikitas.angelinas@hpe.com>
Change-Id: I4ed2ebffa3bc1c6a0f87ea9f13734e344f77006f
HPE-bug-id: LUS-10863
Test-Parameters: testlist=sanity-hsm,sanity-pcc
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/47121
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Etienne AUJAMES <eaujames@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months ago LU-15626 tests: Fix "error" reported by shellcheck for functions.sh 34/46834/2
Arshad Hussain [Wed, 16 Mar 2022 08:04:10 +0000 (13:34 +0530)]
LU-15626 tests: Fix "error" reported by shellcheck for functions.sh

This patch fixes the "error" issues reported by shellcheck
for functions.sh. It also converts spaces to tabs.

Test-Parameters: trivial
Test-Parameters: testlist=sanity,sanityn
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Iec24ca81b16994c3bfbdc38d8106576a315e0bbd
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/46834
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months ago LU-15619 osc: Remove oap_magic 13/46713/5
Patrick Farrell [Wed, 2 Mar 2022 00:14:03 +0000 (19:14 -0500)]
LU-15619 osc: Remove oap_magic

oap_magic exists only to debug init and allocation
failures, but is allocated for every page of memory, which
wastes a lot of memory for something we don't need
dedicated debug for.

Remove it.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I360e09676f7ba8c3e5296bdf75a6e7f75e91eadb
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/46713
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months ago LU-14108 mount: prevent if --network and discovery 32/46632/6
Cyril Bordage [Fri, 7 Jan 2022 10:08:21 +0000 (11:08 +0100)]
LU-14108 mount: prevent if --network and discovery

The --network= option to mkfs.lustre allows restricting a target
(OST/MDT) to a given LNet network. This makes it register to the MGS
with the specified network only. However, dynamic discovery is unaware
of this restriction and this can create problems.
To avoid this, mounting a target that uses the mkfs "network" option
fails with an EINVAL error if discovery is enabled.
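
The rule amounts to a single guard. A minimal sketch, in which check_mount() and its two boolean parameters are hypothetical names rather than the mount code itself:

```python
import errno


def check_mount(network_restricted: bool, discovery_enabled: bool) -> int:
    """Return 0 if the mount may proceed, or a negative errno otherwise."""
    if network_restricted and discovery_enabled:
        # The two settings conflict, so refuse instead of mounting.
        return -errno.EINVAL
    return 0
```

Either setting on its own is accepted; only the combination is rejected.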

Test-Parameters: trivial
Signed-off-by: Cyril Bordage <cbordage@whamcloud.com>
Change-Id: I4b6da7804162192054d7b29a28fbe4cb015e6570
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/46632
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months ago LU-10973 lnet: Various test cleanups 15/45915/5
Amir Shehata [Wed, 22 Dec 2021 05:42:32 +0000 (21:42 -0800)]
LU-10973 lnet: Various test cleanups

Cleaning up some of the LUTF test failures

Test-Parameters: @lnet
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I529d3f171357255d04991293a5df4c7b41622d07
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/45915
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: jsimmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months ago LU-10973 lnet: LUTF UDSP test suite and routing test suite 77/39777/45
Serguei Smirnov [Mon, 31 Aug 2020 22:35:52 +0000 (18:35 -0400)]
LU-10973 lnet: LUTF UDSP test suite and routing test suite

Added the UDSP suite and routing suite to the LUTF test cases.

Updated some of the infrastructure scripts with methods needed
for the new test cases.

Test-Parameters: @lnet
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: Ibd74cea48982ccafc3b1d5034a409fd2df9e7b1c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/39777
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: jsimmons <jsimmons@infradead.org>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months ago LU-10973 lnet: LUTF Multi-Rail test suite 58/39458/54
Amir Shehata [Mon, 20 Jul 2020 21:04:32 +0000 (14:04 -0700)]
LU-10973 lnet: LUTF Multi-Rail test suite

Added a test suite which covers various Multi-Rail functionality.

Test-Parameters: @lnet
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I0480e59ebd97c943669194acbb1c80222e202a6e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/39458
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: jsimmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months ago LU-10973 lnet: LUTF dynamic discovery test suite 95/39195/58
Amir Shehata [Sat, 27 Jun 2020 04:15:04 +0000 (21:15 -0700)]
LU-10973 lnet: LUTF dynamic discovery test suite

Add the dynamic discovery test suite to the LUTF test cases.

Updated some of the infrastructure scripts with methods needed
for the DD test cases

Test-Parameters: @lnet
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I0cfef4ae6f88b4deca12f1a3d5ef3291137a6c04
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/39195
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: jsimmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months ago LU-10360 tests: test dynamic NIDs feature 11/39911/42
Amir Shehata [Tue, 15 Sep 2020 01:18:47 +0000 (18:18 -0700)]
LU-10360 tests: test dynamic NIDs feature

Add five LUTF test cases to test the following:
1. Enabling/Disabling dynamic_nids module parameter.
2. Allow clients to continue using servers which have changed
   their IP address during a boot cycle.
3. Verify feature is disabled if dynamic_nids module parameter
   is not set.
4. Verify feature is disabled if the dynamic_nids module parameter
   is asymmetrically set.

Test-Parameters: @lnet
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I481c2ae938d07398f6b40af2a1a1db039168ede7
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/39911
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: jsimmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months ago LU-10973 lnet: LUTF DLC test suite and sample test suite 08/40108/40
Serguei Smirnov [Wed, 30 Sep 2020 22:52:39 +0000 (18:52 -0400)]
LU-10973 lnet: LUTF DLC test suite and sample test suite

Add the DLC test suite and sample test suite to LUTF test cases.

Test-Parameters: @lnet
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ic7579023cfaf796fd40d6e12434137fb3ec5b0e4
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/40108
Reviewed-by: jsimmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months ago LU-10391 lnet: only use PUBLIC IP6 addresses for connections 71/48571/3
Mr NeilBrown [Fri, 16 Sep 2022 00:49:51 +0000 (10:49 +1000)]
LU-10391 lnet: only use PUBLIC IP6 addresses for connections

IPv6 can have temporary addresses.  These can be used for short-lived
outgoing connections to increase privacy.  They are not suitable for
long-term connections.

So request that only PUBLIC IPv6 addresses are used when making a
connection.

Test-Parameters: trivial testlist=sanity-lnet
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I1414d9ea11cd5873438a4c088884cefd7d933c8c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48571
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: jsimmons <jsimmons@infradead.org>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months ago LU-10391 socklnd: support IPv6 in ksocknal_ip2index() 70/48570/2
Mr NeilBrown [Thu, 15 Sep 2022 05:09:59 +0000 (15:09 +1000)]
LU-10391 socklnd: support IPv6 in ksocknal_ip2index()

ksocknal_ip2index() can now find the interface index for an IPv6
address.

Test-Parameters: trivial testlist=sanity-lnet
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Idd6bee5c9db417b05f8208ab5ab309f4c8404d54
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48570
Reviewed-by: jsimmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months ago LU-10391 lnet: add iface index to struct lnet_inetdev 69/48569/2
Mr NeilBrown [Thu, 15 Sep 2022 01:47:55 +0000 (11:47 +1000)]
LU-10391 lnet: add iface index to struct lnet_inetdev

When getting the list of interfaces, get the index as well, as this can be
useful and avoids searching the list of interfaces again to find it.

Test-Parameters: trivial testlist=sanity-lnet
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I9b3b2516fd4ec1b83e2ec31e1318326ed22cb31b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48569
Reviewed-by: jsimmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months ago LU-12511 utils: make kfilnd support a soft requirement 18/48518/5
James Simmons [Sat, 17 Sep 2022 15:45:12 +0000 (11:45 -0400)]
LU-12511 utils: make kfilnd support a soft requirement

The new kfilnd driver doesn't exist upstream, and it looks like it
will be missing upstream for some time. Make building the code
for this new LND optional, which is needed for the native Linux
Lustre client.

Test-Parameters: trivial
Change-Id: Ib17f78b12ffed95e4198d4524f5ca44aab01c010
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48518
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months ago LU-10391 lnet: track pinginfo size in bytes, not nis. 27/44627/17
Mr NeilBrown [Sun, 14 Aug 2022 21:37:23 +0000 (17:37 -0400)]
LU-10391 lnet: track pinginfo size in bytes, not nis.

When we extend the pinginfo to be able to store large-address nids,
there could be nids of different sizes in it.  So using the number of
nis to track the size won't work.  So change to using the number of
bytes.  i.e.  the total size of the 'struct lnet_ping_info'.

This affects pb_nnis in the ping_buffer, and the global
ln_push_target_nnis.

LNET_PING_INFO_SIZE is removed as size won't depend on number of nids
any more.

When determining the number of bytes expected in a received ping_info,
use a new macro lnet_ping_info_size() which can extract information
as required from the ping_info.
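
The reason a count stops being enough can be shown with a little arithmetic. In the sketch below the header size and the per-entry sizes are hypothetical numbers chosen for illustration, and size_from_count() and size_from_entries() are not LNet functions.

```python
HEADER_BYTES = 16   # hypothetical fixed header size


def size_from_count(num_entries: int, entry_bytes: int) -> int:
    """Count-based sizing: only correct when every entry has the same size."""
    return HEADER_BYTES + num_entries * entry_bytes


def size_from_entries(entry_sizes: list) -> int:
    """Byte-based sizing: sums each entry, so mixed sizes are handled."""
    return HEADER_BYTES + sum(entry_sizes)
```

With two 8-byte entries and one 24-byte entry, the byte-based total is 56, while multiplying the count by a single entry size gives 40, so a buffer sized from the count alone would be too small.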

Note that lnet_ping_target_create() now initializes pi_nis to 0.
Setting the initial size doesn't seem to be useful.

Test-Parameters: trivial testlist=sanity-lnet
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I7727b784ed9a7510959d5ec41f8df3851adb78ed
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/44627
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: jsimmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months ago LU-16135 lod: prohibit DoM pattern in plain layout 33/48433/3
Mikhail Pershin [Mon, 5 Sep 2022 07:41:37 +0000 (10:41 +0300)]
LU-16135 lod: prohibit DoM pattern in plain layout

The DoM pattern can be set as the default plain layout of a directory
by an older LFS version, which misses the DoM component sanity checks
when a plain layout is used. Such a layout is not allowed and causes
a later crash when a file is created under that directory.

LFS can prevent this, but not in all Lustre versions,
so LOD should do the check as well.

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ic58fdda2ab3e63083128cb6cf949fcb43ccd2c02
Reviewed-on: https://review.whamcloud.com/48433
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months ago LU-16160 osc: take ldlm lock when queue sync pages 57/48557/2
Bobi Jam [Thu, 15 Sep 2022 06:46:34 +0000 (14:46 +0800)]
LU-16160 osc: take ldlm lock when queue sync pages

osc_queue_sync_pages() adds an osc_extent to the osc_object's IO extent
list without taking ldlm locks, and then calls
osc_io_unplug_async() to queue the IO work for the client.

This patch makes sync page queuing take the ldlm lock in the
osc_extent.

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Idefa2981e62a2a6e10d8b8a7692c0337b61b9052
Reviewed-on: https://review.whamcloud.com/48557
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months ago LU-16123 checkpatch: Suppress false warning 75/48375/2
Arshad Hussain [Mon, 29 Aug 2022 10:51:45 +0000 (16:21 +0530)]
LU-16123 checkpatch: Suppress false warning

checkpatch throws a warning if it finds an "UPPERCASE" name
on the left-hand side of a comparison. According to the script's
code, this is to avoid cases like "foo + BAR < baz".

Example warning:
(style)  Comparisons should place the constant on \
the right side of the test

However, in our case the warning is a false
positive:

#if LUSTRE_VERSION_CODE < OBD_OCD_VERSION(3, 0, 53, 0)
...
#endif

This patch suppresses the warning thrown by the above
code only. It is not a generic suppressor for upper-case names
on the left-hand side, which can be a genuine error. It
only handles the case where the left side is the
LUSTRE_VERSION_CODE upper-case macro.
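
The shape of such a narrowly scoped exception can be sketched as follows. checkpatch itself is a Perl script with a more involved rule; the regular expression, the ALLOWED_LEFT set and the warns() helper below are simplified, hypothetical stand-ins written only to illustrate the idea.

```python
import re

# Simplified stand-in for the rule: an ALL_CAPS name directly on the
# left of a comparison operator.
CONSTANT_ON_LEFT = re.compile(r"\b([A-Z][A-Z0-9_]*)\s*(==|!=|<=|>=|<|>)\s*\S")

# Names that are allowed on the left side without a warning.
ALLOWED_LEFT = {"LUSTRE_VERSION_CODE"}


def warns(line: str) -> bool:
    """Return True if the line would trigger the constant-on-left warning."""
    match = CONSTANT_ON_LEFT.search(line)
    if match is None:
        return False
    return match.group(1) not in ALLOWED_LEFT
```

Keeping the exception as an explicit list of names means that any other upper-case name on the left side still produces the warning.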

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ic8d8fccae035ba6e2ea28099bea6f163ceb0da0a
Reviewed-on: https://review.whamcloud.com/48375
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months ago LU-16154 obdclass: free inst_name correctly 42/48542/4
Emoly Liu [Thu, 15 Sep 2022 01:42:47 +0000 (09:42 +0800)]
LU-16154 obdclass: free inst_name correctly

In the function class_config_llog_handler(), inst_name should be freed
correctly before the break.

Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Change-Id: I6adc0ed62c3c637237834b799f25666d0e7e1ecb
Reviewed-on: https://review.whamcloud.com/48542
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months ago LU-16153 tests: add version check to conf-sanity.sh test_133 41/48541/2
Emoly Liu [Wed, 14 Sep 2022 02:38:03 +0000 (10:38 +0800)]
LU-16153 tests: add version check to conf-sanity.sh test_133

conf-sanity.sh test_133 from the patch at
https://review.whamcloud.com/38136 has been landed since 2.15.51.
To avoid any interop failure, a version check is added there.
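
A version gate of this kind can be sketched as below. The packing of major, minor, patch and fix into one integer reflects my understanding of how Lustre version codes are laid out and should be read as an assumption; version_code() and should_run() here are illustrative helpers, not the functions from the test framework.

```python
def version_code(version: str) -> int:
    """Pack 'major.minor.patch[.fix]' into one comparable integer."""
    parts = [int(p) for p in version.split(".")]
    parts += [0] * (4 - len(parts))      # missing fields default to 0
    major, minor, patch, fix = parts[:4]
    return (major << 24) | (minor << 16) | (patch << 8) | fix


def should_run(server_version: str, minimum: str = "2.15.51") -> bool:
    """Run the test only if the server is at least the minimum version."""
    return version_code(server_version) >= version_code(minimum)
```

With this gate a 2.12 server skips the test, while 2.15.51 and later run it.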

Test-Parameters: trivial
Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Change-Id: Ic5c142faa6f61fe83ce86e67a7cee8d8b183cdaf
Reviewed-on: https://review.whamcloud.com/48541
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months ago LU-14992 tests: sanity/replay-vbr mkdir on MDT0 02/44902/9
James Nunez [Mon, 13 Sep 2021 16:35:30 +0000 (10:35 -0600)]
LU-14992 tests: sanity/replay-vbr mkdir on MDT0

Replace mkdir with mkdir_on_mdt0() for sanity test 133a
and replay-vbr test 7a.  These tests expect the newly
created directory to be on MDT0.

Test-Parameters: trivial mdscount=2 mdtcount=4 testlist=sanity
Test-Parameters: env=SLOW=yes mdscount=2 mdtcount=4 testlist=replay-vbr
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: Icea2923a8d8d3a3aa0ddf0401f0a025480b2f6f0
Reviewed-on: https://review.whamcloud.com/44902
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Kevin Zhao <kevin.zhao@linaro.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months ago LU-16057 obdclass: set OBD_MD_FLGROUP for ladvise RPC 80/48080/2
Li Dongyang [Fri, 29 Jul 2022 06:35:41 +0000 (16:35 +1000)]
LU-16057 obdclass: set OBD_MD_FLGROUP for ladvise RPC

The ladvise RPC doesn't have OBD_MD_FLGROUP set, so when the RPC
reaches the server, tgt_validate_obdo() will corrupt the FID
if its seq is in the FID_SEQ_NORMAL range.

Do not mess with seq in obdo_to_ioobj() and tgt_validate_obdo(),
since 2.0, all RPCs should have OBD_MD_FLGROUP set.

Add OBD_MD_FLGROUP to the ladvise RPC to fix new clients talking
to old servers.

Change-Id: I373b7f32458b18e29d9bb716a912fe4a54eccac5
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/48080
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months ago LU-15986 ptlrpc: protect rq_repmsg in ptlrpc_req_drop_rs() 39/47839/14
Lei Feng [Thu, 30 Jun 2022 02:46:31 +0000 (10:46 +0800)]
LU-15986 ptlrpc: protect rq_repmsg in ptlrpc_req_drop_rs()

There is a race condition on the server side: one thread has sent
the reply message and is deleting it, while another is
searching for an existing request and prints some debug information
in _debug_req() if there is a duplicate request. They both operate on
req->rq_repmsg, but it is not protected in ptlrpc_req_drop_rs().
So protect it with req->rq_early_free_lock.
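
The pattern can be sketched in userspace terms: the thread that clears a shared field and the thread that reads it take the same lock. The Request class below is a hypothetical stand-in, with a threading.Lock playing the role of rq_early_free_lock; it is not the ptlrpc code.

```python
import threading
from typing import Optional


class Request:
    """Toy request whose reply can be dropped while another thread reads it."""

    def __init__(self, reply: Optional[str]) -> None:
        self.reply = reply
        self.lock = threading.Lock()   # stands in for rq_early_free_lock

    def drop_reply(self) -> None:
        # Writer: clear the reply only while holding the lock.
        with self.lock:
            self.reply = None

    def describe(self) -> str:
        # Reader: take the same lock, so the reply cannot vanish between
        # the check and the use.
        with self.lock:
            if self.reply is None:
                return "reply: <none>"
            return "reply: " + self.reply
```

Without the lock in drop_reply(), the reader could pass the None check and then find the reply gone, which mirrors the unprotected access described above.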

Signed-off-by: Lei Feng <flei@whamcloud.com>
Change-Id: Ied55427ee15c3ef84bdd2d579844eba398dbf010
Reviewed-on: https://review.whamcloud.com/47839
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months ago New tag 2.15.52 2.15.52 v2_15_52
Oleg Drokin [Sat, 17 Sep 2022 06:27:08 +0000 (02:27 -0400)]
New tag 2.15.52

Change-Id: I7425fd5ea8f382a10ea2574933257fcd41407fa2
Signed-off-by: Oleg Drokin <green@whamcloud.com>
19 months ago LU-16145 lnet: Honor peer timeout of zero 89/48489/4
Chris Horn [Fri, 2 Sep 2022 16:47:02 +0000 (11:47 -0500)]
LU-16145 lnet: Honor peer timeout of zero

Zero is a valid value for the peer_timeout parameter (it is supposed
to disable the LNet Peer Health feature used on routers), but DLC
treats zero as uninitialized and assigns the default peer timeout
instead.
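
This class of bug is easy to reproduce in miniature: if "unset" and "zero" share a representation, an explicit zero is replaced by the default. In the sketch below, None marks an unset value; the default of 180 and both function names are hypothetical and do not come from the DLC code.

```python
from typing import Optional

DEFAULT_PEER_TIMEOUT = 180   # hypothetical default, in seconds


def resolve_timeout_buggy(configured: Optional[int]) -> int:
    # Truthiness test: 0 is falsy, so an explicit 0 is replaced by the default.
    return configured if configured else DEFAULT_PEER_TIMEOUT


def resolve_timeout_fixed(configured: Optional[int]) -> int:
    # Only a missing value (None) falls back to the default; 0 is honored.
    return DEFAULT_PEER_TIMEOUT if configured is None else configured
```

The fixed variant keeps 0 as a meaningful setting and applies the default only when nothing was configured at all.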

Test-Parameters: trivial testlist=sanity-lnet
HPE-bug-id: LUS-11233
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I66f45ddf282757f46c0169ae0e725e56234d3d89
Reviewed-on: https://review.whamcloud.com/48489
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months ago LU-16131 build: Do not depend on libmount during --enable-dist 07/48407/2
Shaun Tancheff [Thu, 1 Sep 2022 14:46:16 +0000 (21:46 +0700)]
LU-16131 build: Do not depend on libmount during --enable-dist

Defer the libmount requirement when using --enable-dist to
generate the lustre-src.rpm.

This allows mock and/or yum build-deps to resolve
dependencies and pick up the libmount requirement without changing
the existing minimal build.

Test-Parameters: trivial
HPE-bug-id: LUS-11091
Fixes: f21b944127 ("LU-15940 build: add a required dependency for libmount")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I20a7a097f9b651b6ea5519f79efda6c96b6f2199
Reviewed-on: https://review.whamcloud.com/48407
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-16089 kernel: kernel update RHEL 7.9 [3.10.0-1160.76.1.el7] 02/48202/3
Jian Yu [Fri, 12 Aug 2022 01:29:05 +0000 (18:29 -0700)]
LU-16089 kernel: kernel update RHEL 7.9 [3.10.0-1160.76.1.el7]

Update RHEL 7.9 kernel to 3.10.0-1160.76.1.el7.

Test-Parameters: trivial clientdistro=el7.9 serverdistro=el7.9

Change-Id: I97d087a5d5bb27996a5c0caf382c011928c651b4
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/48202
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-16029 utils: add options to lr_reader to parse raw files 88/47988/8
Etienne AUJAMES [Tue, 19 Jul 2022 20:21:52 +0000 (22:21 +0200)]
LU-16029 utils: add options to lr_reader to parse raw files

Add the following usages to lr_reader for post-mortem debugging:

debugfs -c -R "dump reply_data /tmp/reply_data" /dev/mapper/mds1
debugfs -c -R "dump last_rcvd /tmp/last_rcvd" /dev/mapper/mds1

lr_reader -cr -C /tmp/last_rcvd -R /tmp/reply_data
....

This patch also attempts to refactor the lr_reader code.

It allows longer device names to be used (by removing the 128-byte
buffer limit on the debugfs command).

Signed-off-by: Etienne AUJAMES <etienne.aujames@cea.fr>
Change-Id: I6a5f945134d4235ac467ba2274eb05f71b468cd8
Reviewed-on: https://review.whamcloud.com/47988
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: DELBARY Gael <gael.delbary@cea.fr>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-16106 lnet: allow direct messages regardless of peer NI status 55/48355/5
Serguei Smirnov [Sun, 28 Aug 2022 01:50:16 +0000 (18:50 -0700)]
LU-16106 lnet: allow direct messages regardless of peer NI status

If check_routers_before_use is enabled, the router needs to
be pinged before it is used, which is not possible because
its NIs are assumed to be down at start-up. Don't prevent
discovery of the router in this case.

This change allows non-routed traffic to peer NIs with "down"
status.

Test-Parameters: trivial
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I36fa60e37ef4f47c82c69855c9b0b80bad8a36f4
Reviewed-on: https://review.whamcloud.com/48355
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-16050 build: replace ofed_info with dpkg/rpm 47/48047/4
Jian Yu [Wed, 27 Jul 2022 22:45:49 +0000 (15:45 -0700)]
LU-16050 build: replace ofed_info with dpkg/rpm

After installing MLNX_OFED by running mlnxofedinstall command,
mlnx-ofed-kernel-modules package is not listed by ofed_info,
which causes Lustre configure to fail as follows:

checking whether to use Compat RDMA... /usr/bin/ofed_info
dpkg-query: error: --listfiles needs at least one package name argument

This patch fixes the above issue by replacing ofed_info with
"dpkg -l" and "rpm -qa" commands to find OFED package.

Test-Parameters: trivial
Fixes: ec03c9628cae ("LU-15417 build: find the new path for MOFED 5.5")
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Change-Id: Ia3c2d6bf10e147ca2761221741eff6f93008556c
Reviewed-on: https://review.whamcloud.com/48047
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Gaurang Tapase <gtapase@ddn.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-16002 ptlrpc: adds configurable ping interval 82/47982/3
Alexander Boyko [Sun, 10 Jul 2022 14:25:21 +0000 (10:25 -0400)]
LU-16002 ptlrpc: adds configurable ping interval

The patch adds the ability to change the ping interval and eviction
multiplier. The default values stay as before.
Example:
lctl set_param ping_interval=10
lctl set_param evict_multiplier=5
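
As a rough illustration of how the two tunables could combine, assuming
the eviction timeout is simply the product of the ping interval and the
multiplier (an assumption; the helper name is hypothetical):

```c
#include <assert.h>

/* Hypothetical helper: seconds of silence after which a client would be
 * considered for eviction, assuming the timeout is simply
 * ping_interval * evict_multiplier. */
static unsigned int evict_timeout(unsigned int ping_interval,
				  unsigned int evict_multiplier)
{
	assert(ping_interval > 0 && evict_multiplier > 0);
	return ping_interval * evict_multiplier;
}
```

Under that assumption, the example values above would give
10 * 5 = 50 seconds.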

HPE-bug-id: LUS-11054
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: I012dc7ba28ce9ff3edf0f145a403679bfaebbf55
Reviewed-on: https://review.whamcloud.com/47982
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-14719 osp: add inode watermark 28/47128/15
Lai Siyao [Fri, 1 Apr 2022 19:58:08 +0000 (15:58 -0400)]
LU-14719 osp: add inode watermark

* move block watermark from debugfs to sysfs.
* add inode watermark for OSP.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I7c768fa2ebfb4b8c2f75255f9e9c061d4c15cf66
Reviewed-on: https://review.whamcloud.com/47128
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-16082 ldiskfs: old-style EA inode fix for el8.5/el8.6 96/48496/4
Andreas Dilger [Fri, 9 Sep 2022 08:17:09 +0000 (08:17 +0000)]
LU-16082 ldiskfs: old-style EA inode fix for el8.5/el8.6

Add the rhel8/ext4-old_ea_inodes_handling_fix.patch to the ldiskfs
series for el8.5 and el8.6 kernels.

Test-Parameters: trivial testlist=sanity serverdistro=el8.6
Fixes: 76c3fa96dc30 ("LU-16082 ldiskfs: old-style EA inode handling fix")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ifb66a0b7d78e5153d7897bee45fbf1d0e58fbc5c
Reviewed-on: https://review.whamcloud.com/48496
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-14642 flr: allow layout version update from client/MDS 43/45443/21
Bobi Jam [Mon, 25 Oct 2021 08:45:29 +0000 (16:45 +0800)]
LU-14642 flr: allow layout version update from client/MDS

Client write request always carries its layout version so
that OFD can reject the request if the carried layout version
is a stale one.

This patch makes OFD allow layout version change request from
client as well as MDS. And during resync write, all OST objects
will get layout version updated.

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I655044f69a4509a2b0cfe99f86de2ce4ee846979
Reviewed-on: https://review.whamcloud.com/45443
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-16140 lnet: revert "LU-16011 lnet: use preallocate bulk for server" 57/48457/3
Andreas Dilger [Wed, 7 Sep 2022 19:13:11 +0000 (19:13 +0000)]
LU-16140 lnet: revert "LU-16011 lnet: use preallocate bulk for server"

This reverts commit 2447564e120cf622627a5ab81051657f6ce5ece2 due to OOM
on aarch64 clients.

Change-Id: Icfa7d520c36d497566f3e2d154a2065a9aab8da2
Test-Parameters: trivial testlist=lnet-selftest clientarch=aarch64
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/48457
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
19 months agoLU-16073 utils: double snapshot_mount fix 25/48225/5
Akash B [Thu, 11 Aug 2022 07:51:57 +0000 (03:51 -0400)]
LU-16073 utils: double snapshot_mount fix

Running lsnapshot_mount on an already mounted snapshot fs
results in umount of the snapshot fs, and the following
error is seen:

-> lsnapshot_mount -F testfs -n snap_test_fo
Can't mount the snapshot snap_test_fo: No such process

Add additional test to the existing sanity-lsnapshot.sh
(test_1b) to reproduce the above issue.

This is handled by returning -EALREADY if the snapshot fs is
already mounted.

HPE-bug-id: LUS-10650
Test-Parameters: fstype=zfs testlist=sanity-lsnapshot
Signed-off-by: Akash B <akash-b@hpe.com>
Change-Id: Ia13c3e1cf929ec7c53463a2ea74eb98fb46f8358
Reviewed-on: https://es-gerrit.dev.cray.com/160589
Reviewed-by: Dipak Ghosh <dipak.ghosh@hpe.com>
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-on: https://review.whamcloud.com/48225
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Alexander <alexander.boyko@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-16062 ldlm: improve bl_timeout for prolong 94/48094/2
Vitaly Fertman [Fri, 28 Aug 2020 19:17:58 +0000 (22:17 +0300)]
LU-16062 ldlm: improve bl_timeout for prolong

If a client RPC is at hand, we can do a better job of
calculating the lock callback timeout, since the RPC carries the
client's own view of its timeout. Let's use it.

HPE-bug-id: LUS-8866, LUS-11074
Signed-off-by: Vitaly Fertman <c17818@cray.com>
Change-Id: Ibd67d37c1073d0d3cb2e08b532c801af0de116fe
Reviewed-on: https://es-gerrit.dev.cray.com/157782
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Tested-by: Jenkins Build User <nssreleng@cray.com>
Reviewed-on: https://review.whamcloud.com/48094
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-15791 tests: Get health before removing drop rules 98/47998/3
Chris Horn [Wed, 20 Jul 2022 15:44:39 +0000 (09:44 -0600)]
LU-15791 tests: Get health before removing drop rules

lnet_health_post() can race with recovery pings, so we should
wait to delete the drop rules until after we've gathered the
health and resend values.

Test-Parameters: trivial testlist=sanity-lnet
Fixes: 79ab053562 ("LU-13569 lnet: Deprecate lnet_recovery_interval")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ia7595e015809f796cafcc40382d98ab66a708a49
Reviewed-on: https://review.whamcloud.com/47998
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-15833 llapi: don't use realpath in llapi_search_fsname() 58/47258/10
Etienne AUJAMES [Mon, 9 May 2022 13:44:29 +0000 (15:44 +0200)]
LU-15833 llapi: don't use realpath in llapi_search_fsname()

This patch uses the st_dev value to determine the fsname in
llapi_search_fsname().
The main purpose is to limit the number of lstat() calls
(from realpath()) in this function.

get_root_path() is modified to search for a mountpoint by device.
The last result of get_root_path() is cached to avoid reading
/proc/mounts for each call.

A new API function llapi_search_rootpath_by_dev() is added to get
the path of Lustre mountpoint using the specified device value.

**Testing:**

*Environment:*
VMs: 1 client, 1 MDS (2MDT), 1 OSS (2 OST)
Lustre tree: test{001..100}/test{001..100}/test{01..10}/file{01..05}
(500000 files + 110100 folders)
OS: Centos 7 (no statx)
Lustre: 2.15.50_15_g1116739

*Tests*
cd <rootfs>
strace lfs getstripe -r .
echo 3 > /proc/sys/vm/drop_caches
/usr/bin/time lfs getstripe -r . (2 iterations)

*Results*
times (s):

                 ______________________________
                | user | system | real | real% |
 _______________|______|________|______|_______|
|without patch: | 6.18 | 57.3   | 427  | 0%    |
|_______________|______|________|______|_______|
|with patch:    | 2.88 | 47.3   | 404  |-5.45% |
|_______________|______|________|______|_______|

strace (only significant changes are displayed):
(*stat = lstat + stat + fstat)
                 _____________________________________________
                | *stat  | mmap   | open   | read   | all     |
 _______________|________|________|________|________|_________|
|without patch: | 760545 | 110142 | 330379 | 330325 | 4742658 |
|_______________|________|________|________|________|_________|
|with patch:    | 440484 | 0      | 220277 | 19     | 3541739 |
|_______________|________|________|________|________|_________|

-25.32% syscalls after patching.
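
The syscall figure follows from the "all" column of the strace table.
A small helper (hypothetical name) shows the arithmetic:

```c
#include <assert.h>

/* Relative change from `before` to `after`, in percent. */
static double percent_change(double before, double after)
{
	assert(before != 0.0);
	return (after - before) / before * 100.0;
}
```

With before = 4742658 and after = 3541739, this gives
(3541739 - 4742658) / 4742658 * 100, which is about -25.32.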

Signed-off-by: Etienne AUJAMES <etienne.aujames@cea.fr>
Change-Id: I3812d922d5b1d194d52132cba95d11820424c5d7
Reviewed-on: https://review.whamcloud.com/47258
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-15911 enc: null encrypted names is embedded llcrypt only 20/47520/6
Sebastien Buisson [Fri, 3 Jun 2022 09:16:35 +0000 (11:16 +0200)]
LU-15911 enc: null encrypted names is embedded llcrypt only

enable_filename_encryption tunable only makes sense when Lustre client
is built against embedded llcrypt. When built against in-kernel
fscrypt, this tunable is silently ignored, as fscrypt always carries
out file name encryption.

So have the enable_filename_encryption tunable only when Lustre client
is built against embedded llcrypt. Also fix sanity-sec test_54 so that
it works for in-kernel fscrypt.

Fixes: e68d496ada ("LU-15858 sec: reinstate null encryption for file names")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ibe52feb670a00c9f421907ecd438bcccc62856f0
Reviewed-on: https://review.whamcloud.com/47520
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-15732 test: don't set RSYNC_SSH=rsh 32/47032/2
John L. Hammond [Mon, 11 Apr 2022 15:11:55 +0000 (10:11 -0500)]
LU-15732 test: don't set RSYNC_SSH=rsh

Let rsync use ssh (the default) in test-framework.sh.

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I31fafc72b476070f0a16c1578bc014cc68e21424
Reviewed-on: https://review.whamcloud.com/47032
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: James Casper <jcasper@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-15619 osc: Remove submit time 12/46712/5
Patrick Farrell [Wed, 2 Mar 2022 00:08:02 +0000 (19:08 -0500)]
LU-15619 osc: Remove submit time

The osc page submit time is an unused bit of debugging
information, but it's allocated for every page.  Let's
just remove it to save memory.
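
To give a feel for the saving, here is a sketch with hypothetical
per-page structures (they are not the real osc page layout, and the
field type is an assumption):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-page structures; not the real osc_async_page. */
struct page_with_time {
	void	*owner;
	int64_t	 submit_time;	/* debugging only, never read back */
};

struct page_without_time {
	void	*owner;
};

/* Bytes saved across npages pages by dropping the unused field. */
static size_t bytes_saved(size_t npages)
{
	size_t per_page = sizeof(struct page_with_time) -
			  sizeof(struct page_without_time);

	assert(npages <= SIZE_MAX / per_page);	/* guard against overflow */
	return npages * per_page;
}
```

Because the structure is allocated once per cached page, even a few
bytes per page add up on clients with large page caches.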

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I160d38039332cb17e07735b60ce7979626ed43dc
Reviewed-on: https://review.whamcloud.com/46712
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-10994 osc: remove oap_cli 03/47403/5
John L. Hammond [Thu, 19 May 2022 18:46:26 +0000 (13:46 -0500)]
LU-10994 osc: remove oap_cli

Remove the redundant oap_cli member from struct osc_async_page.

...:(cl_page.c:216:__cl_page_alloc()) slab-alloced 'cl_page': 256 at 000000009ab84b37.

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: Idd088f0906a10773568495933592ac5e755dc047
Reviewed-on: https://review.whamcloud.com/47403
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-10994 clio: remove cpl_obj 02/47402/5
John L. Hammond [Thu, 19 May 2022 18:29:58 +0000 (13:29 -0500)]
LU-10994 clio: remove cpl_obj

Remove cpl_obj from struct cl_page_slice. This member is only used in
the osc layer and struct osc_page already contains a pointer to the
osc_object.

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I6451aa50ff0e8db67f1c6f4f7edbde4fa8d36c5b
Reviewed-on: https://review.whamcloud.com/47402
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-10994 clio: remove unused convenience functions 01/47401/5
John L. Hammond [Thu, 19 May 2022 18:06:30 +0000 (13:06 -0500)]
LU-10994 clio: remove unused convenience functions

Remove the unused convenience functions cl_page_top(), cl_page_at(),
cl_page_at_trusted(), and cl2vm_page().

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I9c994d8f4c81bc93383a9eb46def514685a27690
Reviewed-on: https://review.whamcloud.com/47401
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-10994 clio: remove struct vvp_page 00/47400/5
John L. Hammond [Mon, 11 Jul 2022 14:04:12 +0000 (10:04 -0400)]
LU-10994 clio: remove struct vvp_page

Remove struct vvp_page and use struct cl_page_slice in its place. Use
cp_vmpage in place of vpg_page and cl_page_index() in place of
vvp_index().

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I2cd408f08e6ff9f7686b591c02ea95e31ad2b2ae
Reviewed-on: https://review.whamcloud.com/47400
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-10994 clio: remove cpo_prep and cpo_make_ready 99/47399/6
John L. Hammond [Mon, 22 Aug 2022 15:56:04 +0000 (11:56 -0400)]
LU-10994 clio: remove cpo_prep and cpo_make_ready

Remove the cpo_prep and cpo_make_ready methods from struct
cl_page_operations. These methods were only implemented by the vvp
layer and so they can be easily inlined into cl_page_prep() and
cl_page_make_ready().

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I177fd8d3c3832bcc8f06ed98cdf9d30f18d49e88
Reviewed-on: https://review.whamcloud.com/47399
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-10994 clio: remove vvp_page_print() 98/47398/6
John L. Hammond [Thu, 19 May 2022 16:07:01 +0000 (11:07 -0500)]
LU-10994 clio: remove vvp_page_print()

Remove vvp_page_print() by placing equivalent code in cl_page_print().

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I815c4f63dc6fe57eec0987f209a2f34d3ff58146
Reviewed-on: https://review.whamcloud.com/47398
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-15595 lnet: Always use ping reply to set route lr_alive 24/46624/11
Chris Horn [Wed, 27 Oct 2021 20:10:17 +0000 (20:10 +0000)]
LU-15595 lnet: Always use ping reply to set route lr_alive

We currently process discovery ping replies in different ways
depending on whether the gateway has discovery enabled or disabled
(or the local peer doing the processing has discovery enabled or
disabled).

When DD is disabled we process the ping reply to set the lr_alive
field of lnet_route because the peer objects for non-MR routers do
not contain all the information needed to calculate the route
aliveness when a message is being sent.

When DD is enabled then we don't do any special processing of the
ping reply. We simply let discovery update the NI status for the
GW's peer NIs and then we calculate the route aliveness on every
send.

We issue discovery pings to routers every alive_router_check_interval
seconds (default 60), but we calculate route aliveness on every send
to a remote network (1000s of times per second). Thus, it is better
to slightly duplicate the effort expended when we receive a discovery
reply so that we can avoid calculating route aliveness on every send.

Since both lr_alive and hop type are being set on each ping reply, for
both DD enabled and disabled cases, we can remove the code for
updating lr_alive and hop type from lnet_router_discovery_complete().

If discovery encounters a fatal error, we still set the status of each
peer NI, as well as all routes, to down in
lnet_router_discovery_complete().

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: If4838c269a89885ba3763f62847e294804edf62e
Reviewed-on: https://review.whamcloud.com/46624
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-15524 mdd: trigger changelog GC by free space 67/46467/12
Mikhail Pershin [Mon, 7 Feb 2022 10:12:29 +0000 (13:12 +0300)]
LU-15524 mdd: trigger changelog GC by free space

If the amount of space consumed by the changelog becomes comparable
to the free space on the system, then start emergency GC for the
changelog by purging the oldest user.

This behavior is enabled by default and can be disabled via the
mdd_changelog_free_space_gc parameter.
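
A sketch of such a trigger, assuming "comparable" means the changelog
uses at least as much space as remains free. The real threshold may
differ, and the struct is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of space usage on the MDT. */
struct mdd_space {
	uint64_t changelog_bytes;
	uint64_t free_bytes;
	bool	 free_space_gc;	/* mirrors mdd_changelog_free_space_gc */
};

/* Assumed policy: trigger once the changelog is at least as large as
 * the remaining free space, unless the feature is disabled. */
static bool changelog_needs_emergency_gc(const struct mdd_space *sp)
{
	assert(sp != NULL);
	if (!sp->free_space_gc)
		return false;
	return sp->changelog_bytes >= sp->free_bytes;
}
```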

Test 160t is added to sanity.sh

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ia63cc71e708b0f10cdf54f45f0809c0e86950101
Reviewed-on: https://review.whamcloud.com/46467
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-16052 llog: handle -EBADR for catalog processing 70/48070/3
Mikhail Pershin [Fri, 29 Jul 2022 08:24:15 +0000 (11:24 +0300)]
LU-16052 llog: handle -EBADR for catalog processing

Llog catalog processing might retry to get the last llog block
to check for new records if any. That might return -EBADR code
which should be considered as valid. Previously -EIO was
returned in all cases.

Run conf-sanity test_106 several times as specific test

Test-Parameters: testlist=conf-sanity env=ONLY=106,SLOW=yes,ONLY_REPEAT=10
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I30e04ba2c91c8bdce72c95675a1209639e9f0570
Reviewed-on: https://review.whamcloud.com/48070
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Etienne AUJAMES <eaujames@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-15259 tests: use existing usernames for setfacl 27/45627/34
Andreas Dilger [Wed, 31 Aug 2022 07:51:41 +0000 (00:51 -0700)]
LU-15259 tests: use existing usernames for setfacl

In SLES15.2 and Ubuntu 20 the "bin" and "daemon" users are not
defined in /etc/passwd, causing setfacl to print a cryptic error:

  setfacl -m u:bin:rw f -- failed
  ~     ? setfacl: Option -m: Invalid argument near character 3

Replace "bin" and "daemon" in ACL tests so they are run with user
and group names that exist on all distros currently being tested.
They can also be specified via ACLUSR1/ACLUSR2 in the test config.

The "permission_xattr" test also needs "nobody" user and group.

Also, the "getfacl" command prints users and groups in numerical
order, so the ACL tests will fail if "daemon" < "bin", or if either
group is higher than the "users" group.  Fix them as needed.

Test-Parameters: trivial testlist=sanity-quota,sanity-sec,pjdfstest
Test-Parameters: testlist=sanity env=ONLY=103-154 clientdistro=el7.9 serverdistro=el7.9
Test-Parameters: testlist=sanity env=ONLY=103-154 clientdistro=el8.6
Test-Parameters: testlist=sanity env=ONLY=103-154,SANITY_EXCEPT=130,HONOR_EXCEPT=y clientdistro=el9.0
Test-Parameters: testlist=sanity env=ONLY=103-154 clientdistro=sles15sp3
Test-Parameters: testlist=sanity env=ONLY=103-154 clientdistro=sles15sp4
Test-Parameters: testlist=sanity env=ONLY=103-154 clientdistro=ubuntu2004

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I7003e95577ab3a9314e8d4d29bb6b1784b9f8ae7
Reviewed-on: https://review.whamcloud.com/45627
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-14441 mdc: check/grab import before access 81/41681/19
Alex Zhuravlev [Mon, 13 Dec 2021 08:27:42 +0000 (11:27 +0300)]
LU-14441 mdc: check/grab import before access

to ensure the import doesn't disappear while being accessed
via procfs.
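
The check/grab pattern can be sketched as below. The types are
hypothetical and single-threaded; the real code also needs locking
around the check and the reference count:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified types; the real code also needs locking. */
struct import {
	int refcount;
};

struct client_dev {
	struct import *imp;	/* may be NULL during setup or cleanup */
};

/* Take a reference before use; returns NULL if there is no import. */
static struct import *import_grab(struct client_dev *dev)
{
	assert(dev != NULL);
	if (dev->imp == NULL)
		return NULL;
	dev->imp->refcount++;
	return dev->imp;
}

/* Drop the reference taken by import_grab(). */
static void import_release(struct import *imp)
{
	assert(imp != NULL && imp->refcount > 0);
	imp->refcount--;
}
```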

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I005c96b349e55646996fd0d265ab4dd1e2b9a1fa
Reviewed-on: https://review.whamcloud.com/41681
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-15829 llite: don't use a kms if it invalid. 95/47395/6
Alexey Lyashkov [Thu, 19 May 2022 17:35:18 +0000 (20:35 +0300)]
LU-15829 llite: don't use a kms if it invalid.

Lockless DIO doesn't update the KMS as other IO types do,
which leads to a situation where the next read doesn't know the real
file size to be read. Let's avoid using an invalid KMS.

Fixes: 6bce5367 (LU-4198 clio: turn on lockless for some kind of IO)
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Change-Id: Ie71d3f3cc24fc16c03ed07f9f5a3a17c7fdfa684
Reviewed-on: https://review.whamcloud.com/47395
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-10391 ptlrpc: lprocfs_exp_setup() to take struct lnet_nid 42/44642/5
Mr NeilBrown [Thu, 8 Jul 2021 01:32:48 +0000 (11:32 +1000)]
LU-10391 ptlrpc: lprocfs_exp_setup() to take struct lnet_nid

lprocfs_exp_setup() now takes 'struct lnet_nid *' as peer_nid.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: If779893d8b1c7b650d39182c121c1f611d058f0d
Reviewed-on: https://review.whamcloud.com/44642
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-10391 ptlrpc: pass lnet_nid for self to ptl_send_buf() 41/44641/5
Mr NeilBrown [Thu, 8 Jul 2021 00:53:30 +0000 (10:53 +1000)]
LU-10391 ptlrpc: pass lnet_nid for self to ptl_send_buf()

The 'self' arg to ptl_send_buf() is now a pointer to a
'struct lnet_nid', and can be NULL meaning "ANY NID".

LNetPut() already accepts NULL as the self pointer.
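
The NULL convention can be illustrated with a simplified, hypothetical
NID type. This is not the real struct lnet_nid, and the fallback choice
is an assumption:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified NID; not the real struct lnet_nid. */
struct nid {
	unsigned int net;
	unsigned int addr;
};

/* NULL means "ANY NID": fall back to whichever local NID is preferred. */
static const struct nid *pick_source(const struct nid *self,
				     const struct nid *preferred_local)
{
	assert(preferred_local != NULL);
	return self != NULL ? self : preferred_local;
}
```

Using a pointer lets "no preference" be expressed without reserving a
special NID value.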

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I859dfa10e2f5e50c029c6926fe25ac036fb4f494
Reviewed-on: https://review.whamcloud.com/44641
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-10391 ptlrpc: change bd_sender in ptlrpc_bulk_frag_ops 40/44640/5
Mr NeilBrown [Tue, 18 Jan 2022 18:12:50 +0000 (13:12 -0500)]
LU-10391 ptlrpc: change bd_sender in ptlrpc_bulk_frag_ops

bd_sender in struct ptlrpc_bulk_frag_ops is now 'struct lnet_nid'.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I43a6600dcc814a6a46b3a793641545123efaa6ab
Reviewed-on: https://review.whamcloud.com/44640
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-10391 ptlrpc: change rq_source to struct lnet_nid 39/44639/5
Mr NeilBrown [Sat, 20 Aug 2022 17:30:25 +0000 (13:30 -0400)]
LU-10391 ptlrpc: change rq_source to struct lnet_nid

rq_source in struct ptlrpc_request can now store large NIDs.
ptl_send_buf() now takes a struct lnet_processid for the peer.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I2fe7da2332955c69f6252d44fb3ae28d2ef4e517
Reviewed-on: https://review.whamcloud.com/44639
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-10391 ptlrpc: change rq_peer to struct lnet_nid 38/44638/4
Mr NeilBrown [Thu, 4 Aug 2022 01:43:26 +0000 (21:43 -0400)]
LU-10391 ptlrpc: change rq_peer to struct lnet_nid

rq_peer in struct ptlrpc_request can now store large NIDs.
ptlrpc_connection_get() and others now take a
struct lnet_processid

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I3bb419720434714301946d278413ce6090aa2cdd
Reviewed-on: https://review.whamcloud.com/44638
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-10391 ptlrpc: pass net num to ptlrpc_uuid_to_connection 37/44637/4
Mr NeilBrown [Thu, 8 Jul 2021 00:34:36 +0000 (10:34 +1000)]
LU-10391 ptlrpc: pass net num to ptlrpc_uuid_to_connection

Rather than passing a nid to indicate which net to choose,
pass just the net number.  This will make it easier to convert to
'struct lnet_nid'.

Also change ptlrpc_uuid_to_peer() to take the refnet as an explicit
argument, rather than embedding it in the peer pid.

This makes the refnet test more obvious, and removes the (strange)
need to test the address part against zero.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I0650760a59342f5ac245cc14011452e436ef8e4c
Reviewed-on: https://review.whamcloud.com/44637
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-10391 ptlrpc: change rq_self to struct lnet_nid 36/44636/4
Mr NeilBrown [Wed, 7 Jul 2021 05:55:06 +0000 (15:55 +1000)]
LU-10391 ptlrpc: change rq_self to struct lnet_nid

rq_self in struct ptlrpc_request can now store large NIDs.
ptlrpc_connection_get() is also changed to receive a
'struct lnet_nid'.

Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: If2ea7770e967e2f044f2b2300950b612463e130c
Reviewed-on: https://review.whamcloud.com/44636
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-8367 tests: cleanup_orphans hang reproducer 39/46939/7
Alexander Boyko [Thu, 12 May 2022 13:49:34 +0000 (09:49 -0400)]
LU-8367 tests: cleanup_orphans hang reproducer

The patch adds recovery-small test 144 to reproduce a hang in
osp_precreate_cleanup_orphans().

PID: 49938  TASK: ffff98c63a248000  CPU: 30  COMMAND: "osp-pre-3-1"
__schedule at ffffffff8e54e1d4
schedule at ffffffff8e54e648
osp_precreate_cleanup_orphans at ffffffffc17d00e9 [osp]
osp_precreate_thread at ffffffffc17d18da [osp]

Test-Parameters: trivial testlist=recovery-small env=ONLY=144b
HPE-bug-id: LUS-10793
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: I463c75e63043c71ed0de0c6d08294098099c67e5
Reviewed-on: https://review.whamcloud.com/46939
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-16075 kernel: kernel update RHEL8.6 [4.18.0-372.19.1.el8_6] 16/48116/5
Jian Yu [Tue, 23 Aug 2022 01:37:06 +0000 (18:37 -0700)]
LU-16075 kernel: kernel update RHEL8.6 [4.18.0-372.19.1.el8_6]

Update RHEL8.6 kernel to 4.18.0-372.19.1.el8_6.

Test-Parameters: trivial fstype=ldiskfs \
clientdistro=el8.6 serverdistro=el8.6 testlist=sanity

Test-Parameters: trivial fstype=zfs \
clientdistro=el8.6 serverdistro=el8.6 testlist=sanity

Change-Id: I8e0fbdab54d36512c4c4cbdbc97c580994ebcbd3
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/48116
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-16090 build: Module.symvers lookup by flavor on SUSE 95/48195/2
Shaun Tancheff [Thu, 11 Aug 2022 11:48:40 +0000 (18:48 +0700)]
LU-16090 build: Module.symvers lookup by flavor on SUSE

When multiple kernel flavors are found we need to select only
the Module.symvers for the flavor that is being built.

HPE-bug-id: LUS-11149
Test-Parameters: trivial
Fixes: 1f4aaefe1aae ("LU-15962 build: add in-kernel Module.symvers to symbol path")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I1c9af91108534d3a67f816077756fded4cd0b653
Reviewed-on: https://review.whamcloud.com/48195
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-16085 tests: fix sanityn test_106c 35/48435/2
Sebastien Buisson [Tue, 6 Sep 2022 06:57:04 +0000 (08:57 +0200)]
LU-16085 tests: fix sanityn test_106c

Fix sanityn test_106c after modification introduced when fixing
stat attributes_mask.

Test-Parameters: trivial testlist=sanityn env=ONLY=106c
Fixes: 0e48653c27 ("LU-16085 llite: fix stat attributes_mask")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I370813b9b1c22450577c390964a0cc410735b989
Reviewed-on: https://review.whamcloud.com/48435
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-16100 tests: fix sanity/51d divide-by-zero 93/48393/2
Andreas Dilger [Tue, 30 Aug 2022 19:14:16 +0000 (13:14 -0600)]
LU-16100 tests: fix sanity/51d divide-by-zero

Fix dirstripe count when testing on non-DNE configs.

Test-Parameters: trivial
Fixes: cf35c54224b3 ("LU-14745 tests: ensure sanity/51d has enough objects")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I787df4cfda9e62673e5f89d2b899154f636777fe
Reviewed-on: https://review.whamcloud.com/48393
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Feng, Lei <flei@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-9859 libcfs: remove Lustre specific bitmap handling 22/48222/3
James Simmons [Tue, 16 Aug 2022 13:46:36 +0000 (09:46 -0400)]
LU-9859 libcfs: remove Lustre specific bitmap handling

Only the NRS TBF handling uses the Lustre specific bitmap
handling. Convert to the Linux bitmap API and remove the
Lustre specific bitmap handling.

Test-Parameters: trivial testlist=sanityn env=ONLY=77
Change-Id: I58dcf869778d6cf6349c16e73d75e53735ffb97d
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/48222
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-16085 llite: fix stat attributes_mask 08/48208/3
Sebastien Buisson [Fri, 12 Aug 2022 07:59:02 +0000 (09:59 +0200)]
LU-16085 llite: fix stat attributes_mask

Fix stat attributes_mask to return STATX_ATTR_ENCRYPTED whenever it is
possible. Also fix sanityn test_106c to expect at least the 0x30 flag
for attributes_mask.

Test-Parameters: trivial
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Icd16beff058c42d77e9b04ad1a287ec2ac04dfed
Reviewed-on: https://review.whamcloud.com/48208
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-16093 kernel: kernel update SLES12 SP5 [4.12.14-122.130.1] 04/48204/2
Jian Yu [Fri, 12 Aug 2022 01:41:35 +0000 (18:41 -0700)]
LU-16093 kernel: kernel update SLES12 SP5 [4.12.14-122.130.1]

Update SLES12 SP5 kernel to 4.12.14-122.130.1 for Lustre client.

Test-Parameters: trivial clientdistro=sles12sp5 \
env=SANITY_EXCEPT="56oc 430c 817" testlist=sanity

Change-Id: Ib2180a056889d481a7b55c41cbcd98c8e0e272d8
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/48204
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Colin Faber <cfaber@ddn.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-16084 tests: fix lustre-patched filefrag check 88/48188/3
Andreas Dilger [Wed, 10 Aug 2022 18:27:56 +0000 (12:27 -0600)]
LU-16084 tests: fix lustre-patched filefrag check

Fix sanity test_130b thru test_130g to check for "filefrag -l"
instead of "filefrag -e", since the "-e" option has been in
upstream e2fsprogs since commit v1.42.6-50-g2508eaa7.  The "-l"
option (logical extent ordering) is really what is needed to
handle Lustre-striped files anyway.

While there, fix the code style in these subtests:
- use "local" and lower-case names for local variables
- use $(...) for subshells
- use (( ... )) for numeric comparisons
- use preferred "check || action" style checks
- use "skip_env" for environment configuration checks (e2fsprogs)
- use "skip" for test-related checks that can't be "fixed"
- use pre-defined $ost1_FSTYPE for checking OST filesystem type

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I8eb7f17a9532796ab0274247194dd52cbc8a141c
Reviewed-on: https://review.whamcloud.com/48188
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-15994 tests: add testing for io_uring via fio 67/48167/3
Qian Yingjin [Tue, 9 Aug 2022 07:56:23 +0000 (03:56 -0400)]
LU-15994 tests: add testing for io_uring via fio

This patch adds a test case for the io_uring I/O engine via fio.

Test-Parameters: trivial testlist=sanity
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I0f2e371f91c02dc76644f42e5d1055ec200597c6
Reviewed-on: https://review.whamcloud.com/48167
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-15548 tests: skip conf-sanity/131 for older servers 51/48151/3
Andreas Dilger [Fri, 5 Aug 2022 20:19:41 +0000 (14:19 -0600)]
LU-15548 tests: skip conf-sanity/131 for older servers

Skip conf-sanity.sh test_131 when running against older servers that
do not support the trusted.projid xattr.

Test-Parameters: trivial testlist=conf-sanity env=ONLY=131
Test-Parameters: testlist=conf-sanity env=ONLY=131 serverversion=2.14.0
Fixes: e4d07f2c30 ("LU-12056 ldiskfs: add trusted.projid virtual xattr")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: If1858502ab50ffd10e494eab793e3bc0f883fe9e
Reviewed-on: https://review.whamcloud.com/48151
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-15873 obd: skip checking read-only fs health 95/48095/5
Bobi Jam [Tue, 3 Nov 2020 09:04:01 +0000 (17:04 +0800)]
LU-15873 obd: skip checking read-only fs health

A health check on a read-only file system would fail, and STONITH
ensues.

Add obd_device::obd_read_only to record read-only flag of the
obd_device. And skip checking the health of read-only devices.

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Ica83b9c871f7bee62cef6504deb0dcc32dd20afb
Reviewed-by: Wang Shilong <wshilong@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-on: https://review.whamcloud.com/48095
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-1904 idl: add checks for OBD_CONNECT flags 53/48053/2
Andreas Dilger [Fri, 28 May 2021 08:49:19 +0000 (02:49 -0600)]
LU-1904 idl: add checks for OBD_CONNECT flags

Make it harder to accidentally declare OBD_CONNECT flags without
properly defining their names.  Otherwise, this can cause serious
compatibility problems if two features are using the same flag.

Add the definition lines into spelling.txt so there is *always*
a warning generated, since this always needs proper attention.

Make it clear whom to contact when reserving a new feature flag.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I9a5e2c97c40c39ea57d20979d4b130854edc785a
Reviewed-on: https://review.whamcloud.com/48053
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-16012 sec: fix detection of SELinux enforcement 49/48049/4
Sebastien Buisson [Wed, 27 Jul 2022 12:39:26 +0000 (12:39 +0000)]
LU-16012 sec: fix detection of SELinux enforcement

On newer distros (e.g. RHEL 9.0), on which selinux_is_enabled() does
not exist anymore, the only way to find out if SELinux is enforced
when initializing the security context is to fetch the length of the
security attribute name. If it is 0, we conclude SELinux is disabled.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ifcdcb8ffbb7f9ad50d16d7d3317e94d0d212fa42
Reviewed-on: https://review.whamcloud.com/48049
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-16048 build: Update ZFS version to 2.1.5 40/48040/3
Jian Yu [Tue, 26 Jul 2022 07:15:49 +0000 (00:15 -0700)]
LU-16048 build: Update ZFS version to 2.1.5

Update ZFS version to 2.1.5. The changes are listed in:
https://github.com/openzfs/zfs/releases/tag/zfs-2.1.5

Change-Id: I9f25aafe889f87fb80677e59dbe4679932d8b920
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/48040
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-16045 enc: force use of new enc xattr on new servers 35/48035/3
Sebastien Buisson [Mon, 25 Jul 2022 14:39:56 +0000 (16:39 +0200)]
LU-16045 enc: force use of new enc xattr on new servers

When an older client uses encryption with a newer server, the client
wants to see the encryption context in security.c xattr. But
internally on the server side, we force use of the newer encryption.c xattr
for consistency. When required, the encryption context is put
in the request to the client as usual, which interprets it as desired.

Fixes: 4231fab66e ("LU-13717 sec: make client encryption compatible with ext4")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I667e123bdff912acc270666e8c74ebda6f0534e7
Reviewed-on: https://review.whamcloud.com/48035
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-16035 kfilnd: Initial kfilnd implementation 09/48009/8
Doug Oucharek [Tue, 16 Oct 2018 22:51:21 +0000 (15:51 -0700)]
LU-16035 kfilnd: Initial kfilnd implementation

Initial implementation of the kfabric Lustre Network Driver.

Test-Parameters: trivial
HPE-bug-id: LUS-6565
Signed-off-by: Doug Oucharek <dougso@me.com>
Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I48a070ca0ba37e4923cd6dcb3327676ae6ddaae1
Reviewed-on: https://review.whamcloud.com/48009
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-16027 tests: sanity:test_66: specify blocksize explicitly 83/47983/3
Elena Gryaznova [Tue, 19 Jul 2022 13:15:42 +0000 (16:15 +0300)]
LU-16027 tests: sanity:test_66: specify blocksize explicitly

Fix test_66() to be independent from BLOCKSIZE environment
variable.

To reproduce the failure, just run:
  llmount.sh
  export BLOCKSIZE=4096; ONLY=66 sh sanity.sh
  == sanity test 66: update inode blocks count on client ==
  8+0 records in
  8+0 records out
  8192 bytes (8.2 kB, 8.0 KiB) copied, 0.00589935 s, 1.4 MB/s
  sanity test_66: @@@@@@ FAIL: /mnt/lustre/f66 blocks 2 < 8
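
The failure above follows from simple arithmetic, assuming the reported
block count is the file size divided by the reporting unit, rounded up.
A small sketch of that calculation; the helper name is hypothetical and
is not part of sanity.sh:

```c
#include <assert.h>
#include <stdint.h>

/* Number of reporting units needed to cover `bytes`, rounding up. */
static uint64_t blocks_in_units_sketch(uint64_t bytes, uint64_t unit)
{
	assert(unit > 0);
	return (bytes + unit - 1) / unit;
}
```

For the 8192-byte file in the log, a 1024-byte unit gives 8 blocks, while
BLOCKSIZE=4096 gives 2, which matches the "blocks 2 < 8" failure.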

Test-Parameters: trivial testlist=sanity env=ONLY=66
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-11014
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Change-Id: I0adca724518cb955e3664d33a36628ae19a1712d
Reviewed-on: https://review.whamcloud.com/47983
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-15999 tests: format journal with correct block size 30/47930/5
Elena Gryaznova [Mon, 11 Jul 2022 09:04:53 +0000 (12:04 +0300)]
LU-15999 tests: format journal with correct block size

Without "-b block-size" mke2fs calculates block size itself based upon
device size. As a result, the filesystem and journal may be formatted with
different block sizes. For example, a 32M device gets formatted as a journal
with block size 1K and llmount.sh fails to create Lustre with external
journal:
   mke2fs: Filesystem has unexpected block size while trying
           to open journal device /dev/vdb
because the target device itself is created with default
"Block size: 4096".
Let's make sure that journal gets formatted with correct block size.

Patch also adds the ability to create fs with different block
sizes if BLCKSIZE or <facet>_BLOCKSIZE are set.

Fixes: d01d4c697a ("LU-957 scrub: Proc interfaces and tests for OI scrub")
Test-Parameters: trivial
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-11008
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Change-Id: I0a82e34efc23d318bbd52946916ae8f2b7cd94eb
Reviewed-on: https://review.whamcloud.com/47930
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-15393 tests: check QoS hang with OST failover 15/47715/4
Alexander Boyko [Thu, 23 Jun 2022 13:33:47 +0000 (09:33 -0400)]
LU-15393 tests: check QoS hang with OST failover

Patch adds recovery-small test 152 to reproduce the situation
where MDT object allocation sleeps on OST failover in
lod_ost_alloc_rr while holding lq_rw_sem for read, and all other creation
threads hang in lod_ost_alloc_qos at down_write(lq_rw_sem).

HPE-bug-id: LUS-10388
Test-Parameters: trivial testlist=recovery-small env=ONLY=152
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: I7b9c5a5c9870a559e673a5fd253dcaea40d9fe63
Reviewed-on: https://review.whamcloud.com/47715
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-16081 lnet: Memory leak on adding existing interface 73/48173/7
Frank Sehr [Tue, 9 Aug 2022 17:10:54 +0000 (10:10 -0700)]
LU-16081 lnet: Memory leak on adding existing interface

In the function lnet_dyn_add_ni an lnet_ni structure is allocated.
In case of an error the function returns without freeing the memory of
the structure.
Added handling of possible lnet_net structure memory leaks.
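
A minimal userspace sketch of the error-path pattern described above,
assuming the failure is an "interface already exists" condition. The
types, the allocation counter, and the -EEXIST return code are
hypothetical stand-ins and are not taken from lnet_dyn_add_ni():

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Counts live allocations so the sketch can show nothing is leaked. */
static int live_allocs_sketch;

/* Hypothetical, simplified stand-in for struct lnet_ni. */
struct ni_sketch {
	int id;
};

static int add_ni_sketch(bool already_exists, struct ni_sketch **out)
{
	struct ni_sketch *ni = calloc(1, sizeof(*ni));
	int rc;

	if (ni == NULL)
		return -ENOMEM;
	live_allocs_sketch++;

	if (already_exists) {
		rc = -EEXIST;
		goto out_free;	/* do not return with ni still allocated */
	}

	*out = ni;		/* ownership passes to the caller */
	return 0;

out_free:
	free(ni);
	live_allocs_sketch--;
	return rc;
}
```

Routing every failure through a single cleanup label keeps the free next
to the allocation it undoes.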

Test-parameters: trivial testlist=sanity-lnet
Signed-off-by: Frank Sehr <fsehr@whamcloud.com>
Change-Id: I7544a9379093b99f77aaddb8d021b4a5bf221082
Reviewed-on: https://review.whamcloud.com/48173
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-15694 quota: keep grace time while setting default limits 35/46935/11
Hongchao Zhang [Thu, 28 Jul 2022 13:54:00 +0000 (21:54 +0800)]
LU-15694 quota: keep grace time while setting default limits

The quota grace time should only be changed by "lfs setquota -t",
and it should be kept while setting default quota limits.

This patch also fixes an issue of not saving the grace time while
writing the global quota record.

Signed-off-by: Hongchao Zhag <hongchao@whamcloud.com>
Change-Id: I89ca49d09dc41deffe4bc77e53721b5bb4f4be37
Reviewed-on: https://review.whamcloud.com/46935
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-15642 obdclass: use consistent stats units 33/46833/9
Andreas Dilger [Wed, 16 Mar 2022 04:51:55 +0000 (22:51 -0600)]
LU-15642 obdclass: use consistent stats units

Use consistent stats units, since some were "usec" and others "usecs".
Most stats already use LPROCFS_TYPE_* to encode the stats type, so
use this to provide units for those stats, and only explicitly provide
strings for the few stats that don't match the commonly-used units.
This also reduces the number of repeat static strings in the modules.
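
A rough sketch of deriving the unit string from the stats type, with an
explicit string only as an override. The enum, the table, and the unit
spellings below are assumptions made for illustration; they are not the
actual LPROCFS_TYPE_* definitions:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stats types; the real code encodes these as LPROCFS_TYPE_*. */
enum stat_type_sketch {
	STAT_TYPE_REQS_SKETCH,
	STAT_TYPE_BYTES_SKETCH,
	STAT_TYPE_PAGES_SKETCH,
	STAT_TYPE_USECS_SKETCH,
	STAT_TYPE_MAX_SKETCH
};

/* Return the explicit unit string if one is given, else the type default. */
static const char *stat_units_sketch(enum stat_type_sketch type,
				     const char *explicit_units)
{
	/* One shared string per type instead of one per counter. */
	static const char *const default_units[STAT_TYPE_MAX_SKETCH] = {
		[STAT_TYPE_REQS_SKETCH]  = "reqs",
		[STAT_TYPE_BYTES_SKETCH] = "bytes",
		[STAT_TYPE_PAGES_SKETCH] = "pages",
		[STAT_TYPE_USECS_SKETCH] = "usecs",
	};

	if (explicit_units != NULL && strlen(explicit_units) > 0)
		return explicit_units;

	assert(type < STAT_TYPE_MAX_SKETCH);
	return default_units[type];
}
```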

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I25f31478f238072ddbf9a3918cd43bb08c3ebbe5
Reviewed-on: https://review.whamcloud.com/46833
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Ben Evans <beevans@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-15930 lnet: Remove duplicate checks for peer sensitivity 26/46626/12
Chris Horn [Thu, 24 Feb 2022 20:30:59 +0000 (14:30 -0600)]
LU-15930 lnet: Remove duplicate checks for peer sensitivity

Callers of lnet_inc_lpni_healthv_locked() and
lnet_dec_healthv_locked() currently check whether the parent peer
has a peer specific sensitivity defined. To remove this code
duplication, this logic is rolled into
lnet_inc_lpni_healthv_locked() and lnet_dec_lpni_healthv_locked().
The latter is a new wrapper around lnet_dec_healthv_locked().

lnet_dec_healthv_locked() is changed to return a bool indicating
whether the health value was actually modified so that the peer
net health is only updated when the peer NI health actually changes.
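
A simplified sketch of a decrement that reports whether it changed
anything. It assumes the health value is clamped at zero and uses a
plain int rather than an atomic; the function name is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Decrement *healthv by sensitivity, clamping at zero.
 * Returns true only if the stored value actually changed.
 */
static bool dec_healthv_sketch(int *healthv, int sensitivity)
{
	int old;

	assert(healthv != NULL);
	assert(sensitivity >= 0);

	old = *healthv;
	*healthv = old > sensitivity ? old - sensitivity : 0;
	return *healthv != old;
}
```

A caller would then recompute the peer net health only when the function
returns true.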

Test-Parameters: trivial testlist=sanity-lnet
HPE-bug-id: LUS-11018
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I624561167392ad625ea7478689e9c5975cec3f2e
Reviewed-on: https://review.whamcloud.com/46626
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-15929 lnet: Correct net selection for router ping 27/47527/4
Chris Horn [Wed, 1 Jun 2022 02:19:07 +0000 (21:19 -0500)]
LU-15929 lnet: Correct net selection for router ping

lnet_find_best_ni_on_local_net() contains logic for restricting
the NI selection to a net specified by lnet_peer::lp_disc_net_id. The
purpose of this is to ensure that LNet peers ping every interface on
a router at a regular interval as part of the LNet router health
feature. However, this logic is flawed because lnet_msg_discovery()
is used to determine whether the message being sent is a discovery
message, but that function actually determines whether a given message
can _trigger_ discovery.

Introduce a new function, lnet_msg_is_ping(), which determines whether
a given lnet_msg is a GET on the LNET_RESERVED_PORTAL.
Modify lnet_find_best_ni_on_local_net() to restrict NI selection to
lp_disc_net_id iff:
1. lp_disc_net_id is non-zero
2. The peer has the LNET_PEER_RTR_DISCOVERY flag set.
3. lnet_msg_is_ping() returns true
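
The three conditions above can be sketched as a single predicate. The
structures, field names, and flag value here are simplified,
hypothetical stand-ins for the LNet types named in the text:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit standing in for LNET_PEER_RTR_DISCOVERY. */
#define PEER_RTR_DISCOVERY_SKETCH	(1U << 0)

struct peer_sketch {
	uint32_t disc_net_id;	/* stands in for lnet_peer::lp_disc_net_id */
	unsigned int state;	/* peer state flags */
};

struct msg_sketch {
	bool is_get;		/* message type is GET */
	bool reserved_portal;	/* message targets the reserved portal */
};

/* A ping is a GET on the reserved portal. */
static bool msg_is_ping_sketch(const struct msg_sketch *msg)
{
	return msg->is_get && msg->reserved_portal;
}

/* True if NI selection should be restricted to the peer's disc net. */
static bool restrict_to_disc_net_sketch(const struct peer_sketch *lp,
					const struct msg_sketch *msg)
{
	assert(lp != NULL && msg != NULL);

	return lp->disc_net_id != 0 &&
	       (lp->state & PEER_RTR_DISCOVERY_SKETCH) != 0 &&
	       msg_is_ping_sketch(msg);
}
```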

Test-Parameters: trivial testlist=sanity-lnet
HPE-bug-id: LUS-11017
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I3dbdfd5c44b6167d24b7b6e0b1097db0b3c5cb76
Reviewed-on: https://review.whamcloud.com/47527
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-15595 lnet: LNet peer aliveness broken 23/46623/11
Chris Horn [Mon, 14 Feb 2022 21:48:15 +0000 (21:48 +0000)]
LU-15595 lnet: LNet peer aliveness broken

The peer health feature used on LNet routers is intended to detect if
a peer is dead or alive by keeping track of the last time it received
a message from the peer. If the last alive value is outside of a
configurable interval then the peer is considered dead and the router
will drop messages to that peer rather than attempt to send to it.

This feature no longer works as intended because even if the
last alive value is outside the interval the router will still
consider the peer NI to be alive if the health value of the NI and
the cached status both indicate the peer NI is alive.

So even if a router has not received any messages from the client in
days, as long as the router thinks the peer's interfaces are healthy
then it will consider the peer alive. This doesn't make any sense as
peers are supposed to regularly ping the router, and if they don't do
so then they should not be considered alive.

Fix the issue by relying solely on the last alive value to determine
peer aliveness. Do not consider the health value or cached status
when determining whether to drop the message.

lnet_peer_alive_locked() has a single caller that only checks whether
zero was returned. We can convert lnet_peer_alive_locked() to return
bool rather than int.

Rename lnet_peer_alive_locked() to lnet_check_message_drop() to
better reflect the purpose of the function. The return value is
inverted to reflect the name change.
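
A minimal sketch of the revised decision, assuming timestamps in seconds
and a configurable timeout. The structure and names are hypothetical and
simplified; this is not the body of lnet_check_message_drop():

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a peer NI. */
struct peer_ni_sketch {
	int64_t last_alive;	/* time (s) a message was last received */
	int healthv;		/* health value, deliberately not consulted */
};

/*
 * Return true if a message to this peer should be dropped.
 * Only the last alive value is consulted.
 */
static bool check_message_drop_sketch(const struct peer_ni_sketch *lpni,
				      int64_t now, int64_t alive_timeout)
{
	assert(lpni != NULL);
	return now - lpni->last_alive > alive_timeout;
}
```

Note that a peer NI with a perfect health value is still dropped once its
last alive value falls outside the interval.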

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Iaabdf5109676ffd18bdba9627afea7e041ddc1e1
Reviewed-on: https://review.whamcloud.com/46623
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-15595 tests: Add various router tests 22/46622/12
Chris Horn [Mon, 7 Feb 2022 23:20:37 +0000 (23:20 +0000)]
LU-15595 tests: Add various router tests

Add test cases to exercise LNet routing.

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I4a077937b3e3b8b07707afeb0c5c23ec1c9074f4
Reviewed-on: https://review.whamcloud.com/46622
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-14955 lnet: Use fatal NI if none other available 46/44746/6
Serguei Smirnov [Tue, 24 Aug 2021 20:48:41 +0000 (13:48 -0700)]
LU-14955 lnet: Use fatal NI if none other available

Allow NI in fatal state to be selected for sending if there are no
NIs in non-fatal state.
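
A simplified sketch of the fallback, assuming selection otherwise picks
the first non-fatal NI. Real NI selection weighs more criteria; the type
and function names here are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified NI with only the state relevant here. */
struct ni_sketch {
	bool fatal;
};

/*
 * Return the index of the first non-fatal NI. If every NI is in the
 * fatal state, fall back to the first fatal one. Return -1 only when
 * the list is empty.
 */
static int select_ni_sketch(const struct ni_sketch *nis, size_t count)
{
	int fallback = -1;
	size_t i;

	assert(nis != NULL || count == 0);

	for (i = 0; i < count; i++) {
		if (!nis[i].fatal)
			return (int)i;
		if (fallback < 0)
			fallback = (int)i;
	}
	return fallback;
}
```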

Test-Parameters: trivial testlist=sanity-lnet
HPE-bug-id: LUS-11019
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Iab8ef6ee5c5f45896196dbd88a2f61e004278297
Reviewed-on: https://review.whamcloud.com/44746
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-16058 build: proc_ops check fails with SUBARCH undefined 01/48101/3
Shaun Tancheff [Mon, 1 Aug 2022 13:58:46 +0000 (20:58 +0700)]
LU-16058 build: proc_ops check fails with SUBARCH undefined

During configure with config.cache enabled SUBARCH may not
be defined.

Move the definition to a location that must be traversed.

Test-Parameters: trivial
Fixes: a5084c2f2e ("LU-14937 build: re-use config cache in 'make rpms/debs'")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I0a7b4de3ecccd41b1c55e8b2df29039517e0c416
Reviewed-on: https://review.whamcloud.com/48101
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
20 months agoLU-12514 target: move server mount code to target layer 60/47160/6
James Simmons [Mon, 6 Jun 2022 14:48:24 +0000 (10:48 -0400)]
LU-12514 target: move server mount code to target layer

Currently the server mount code for lustre_tgt is all in obdclass. If
we change the stack to initialize the LNet / ptlrpc layer after mounting
then we will end up with modular circular dependencies. To avoid this
move all the lustre_tgt mounting code to the target layer. This way the
mounting code can use both ptlrpc and obdclass module routines. Also include
MODULE_ALIAS("lustre_tgt") so mount -t lustre_tgt will load ptlrpc which
contains the target layer.

Change-Id: I392602e8fd18d001cb97b05b909c366ba5b8fa82
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/47160
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>