Whamcloud - gitweb
fs/lustre-release.git
18 months ago LU-15852 lnet: Don't modify uptodate peer with temp NI 22/47322/3
Chris Horn [Wed, 30 Mar 2022 18:35:23 +0000 (13:35 -0500)]
LU-15852 lnet: Don't modify uptodate peer with temp NI

When processing the config log it is possible that we attempt to
add temp NIs after discovery has completed on a peer. These temp NIs
may not actually exist on the peer. Since discovery has already
completed the peer is considered up-to-date and we can end up with
incorrect peer entries. We shouldn't add temp NIs to a peer that
is already up-to-date.
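
The rule can be illustrated with a small standalone sketch. The struct, the PEER_UPTODATE bit, and peer_add_ni() below are hypothetical stand-ins chosen for illustration; they are not LNet's actual definitions or the code in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical peer state bit, for illustration only. */
#define PEER_UPTODATE 0x1u

struct peer {
	unsigned int state;     /* PEER_* bits */
	int          ni_count;  /* number of NIs currently attached */
};

/*
 * Try to attach a NI to a peer. A temporary NI (one taken from the
 * config log rather than learned through discovery) is refused when
 * the peer is already up to date. Returns true if the NI was added.
 */
bool peer_add_ni(struct peer *p, bool temp)
{
	assert(p != NULL);
	if (temp && (p->state & PEER_UPTODATE))
		return false;
	p->ni_count++;
	return true;
}
```

With this guard, a peer whose discovery has completed keeps exactly the NIs that discovery reported.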

HPE-bug-id: LUS-10867
Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ia484713b1e6c9e1a46e525589b7c741c6478e417
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/47322
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months ago LU-15139 osp: block reads until the object is created 03/47003/24
Alex Zhuravlev [Wed, 6 Apr 2022 08:00:30 +0000 (11:00 +0300)]
LU-15139 osp: block reads until the object is created

It's possible for a remote llog to be read and written simultaneously
during recovery. For example, the DTX recovery thread may be fetching
updates while MDD's orphan cleanup procedure is removing orphans from
PENDING.

OSP can be asked to read an object that was just created in the OSP
cache while the actual object on the remote MDS hasn't been created
yet. OSP should block such reads until the creation is done.
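
A minimal userspace sketch of "block reads until the object is created", using a POSIX condition variable. The cached_obj type and the obj_* helpers are hypothetical names used for illustration; the actual OSP code uses kernel primitives and a different object layout.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cached object; not the OSP object layout. */
struct cached_obj {
	pthread_mutex_t lock;
	pthread_cond_t  created_cv;
	bool            created;  /* set once the remote create is done */
	int             data;
};

void obj_init(struct cached_obj *o)
{
	pthread_mutex_init(&o->lock, NULL);
	pthread_cond_init(&o->created_cv, NULL);
	o->created = false;
	o->data = 0;
}

/* Called when the remote creation completes; wakes blocked readers. */
void obj_mark_created(struct cached_obj *o, int data)
{
	pthread_mutex_lock(&o->lock);
	o->data = data;
	o->created = true;
	pthread_cond_broadcast(&o->created_cv);
	pthread_mutex_unlock(&o->lock);
}

/* Blocks until the object exists remotely, then returns its data. */
int obj_read(struct cached_obj *o)
{
	int v;

	assert(o != NULL);
	pthread_mutex_lock(&o->lock);
	while (!o->created)
		pthread_cond_wait(&o->created_cv, &o->lock);
	v = o->data;
	pthread_mutex_unlock(&o->lock);
	return v;
}
```

A reader that calls obj_read() before obj_mark_created() simply sleeps on the condition variable instead of seeing a half-created object.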

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Id0f52b90761839399102bed825569da6bfd17864
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/47003
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months ago LU-15626 tests: Fix "error" reported by shellcheck for replay-dual 35/46835/4
Arshad Hussain [Wed, 16 Mar 2022 08:24:32 +0000 (13:54 +0530)]
LU-15626 tests: Fix "error" reported by shellcheck for replay-dual

This patch fixes the "error" issues reported by shellcheck
for the file lustre/tests/replay-dual.sh. It also
converts spaces to tabs.

Test-Parameters: trivial testlist=replay-dual
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ie195dd39dd4789be660115b360b5b8bf6ebc1a57
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/46835
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months ago LU-15626 tests: Fix "error" reported by shellcheck 11/46811/5
Arshad Hussain [Sat, 12 Mar 2022 06:14:20 +0000 (11:44 +0530)]
LU-15626 tests: Fix "error" reported by shellcheck

This patch fixes the "error" issues reported by shellcheck
for *.sh files that each had only a single error
reported. The change in these files is:
init_test_env $@  ->  init_test_env "$@"

Test-Parameters: trivial
Test-Parameters: testlist=dom-performance,scrub-performance
Test-Parameters: testlist=replay-single,replay-ost-single,replay-vbr
Test-Parameters: testlist=sanity-pcc,sanity-pfl,sanity-selinux
Test-Parameters: testlist=sanity-benchmark,parallel-scale
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I21fc2f25eb67d724b9e30c586568d2501648a80a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/46811
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sarah Liu <sarah@whamcloud.com>
18 months ago LU-15619 osc: Remove oap lock 19/46719/7
Patrick Farrell [Fri, 4 Mar 2022 22:08:44 +0000 (17:08 -0500)]
LU-15619 osc: Remove oap lock

The OAP lock is taken around setting the oap flags, but not
any of the other fields in oap.  As far as I can tell, this
is just some cargo cult belief about locking - there's no
reason for it.

Remove it entirely.  (From the code, a queued spin lock
appears to be 12 bytes on x86_64.)

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ib61190d52c08d88c95a0c19b8ef7d114e26cfae2
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/46719
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
18 months ago LU-15046 osp: precreate thread vs connect race 99/45099/24
Alex Zhuravlev [Thu, 30 Sep 2021 12:16:57 +0000 (15:16 +0300)]
LU-15046 osp: precreate thread vs connect race

lcs_exp (required for the FID client) was initialized in osp_obd_connect(),
which races with osp_precreate_thread(). The latter can get stuck if
lcs_exp is not initialized, and then the whole precreation logic is
blocked until remount. Instead, the precreation thread can simply wait
until lcs_exp is properly initialized.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I7a42bf4b17ce5d46bc25bd548d81eb55f168804b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/45099
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months ago LU-6142 obdclass: make ccc_users in cl_client_cache a refcount_t 81/48881/2
Mr. NeilBrown [Fri, 7 Oct 2022 13:53:38 +0000 (09:53 -0400)]
LU-6142 obdclass: make ccc_users in cl_client_cache a refcount_t

As this is used as a refcount, it should be declared
as one.
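
For context, the kernel's refcount_t differs from a bare atomic counter in that it checks for misuse such as taking a reference on a count that has already dropped to zero. The following is a userspace sketch of that idea with C11 atomics; struct refcnt and its helpers are illustrative names, not the kernel implementation and not the code in this patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace stand-in for a checked reference counter. */
struct refcnt {
	atomic_uint v;
};

void refcnt_init(struct refcnt *r, unsigned int n)
{
	atomic_init(&r->v, n);
}

/*
 * Take a reference unless the count is already zero (the object is
 * being freed) or saturated at UINT_MAX. Returns true on success.
 */
bool refcnt_inc_not_zero(struct refcnt *r)
{
	unsigned int old = atomic_load(&r->v);

	do {
		if (old == 0 || old == UINT_MAX)
			return false;
		/* on failure the CAS reloads 'old' and the checks rerun */
	} while (!atomic_compare_exchange_weak(&r->v, &old, old + 1));
	return true;
}

/* Drop a reference; returns true when the caller dropped the last one. */
bool refcnt_dec_and_test(struct refcnt *r)
{
	unsigned int old = atomic_fetch_sub(&r->v, 1);

	assert(old > 0);  /* underflow would indicate a refcount bug */
	return old == 1;
}
```

Declaring a counter with a dedicated refcount type makes such checks available and documents the field's purpose.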

Change-Id: I5af513ccb2b706a398e647ce0427affa4516a9b5
Signed-off-by: Mr. NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48881
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months ago LU-16160 llite: clear stale page's uptodate bit 07/48607/18
Bobi Jam [Tue, 20 Sep 2022 16:27:04 +0000 (00:27 +0800)]
LU-16160 llite: clear stale page's uptodate bit

In the truncate_inode_page()->do_invalidatepage()->ll_invalidatepage()
call path, before the vmpage is deleted from the page cache, the page
can be picked up by ll_read_ahead_page()->grab_cache_page_nowait().

If ll_invalidatepage()->cl_page_delete() does not clear the vmpage's
uptodate bit, readahead could pick the page up and wrongly conclude
that it is already up to date.

In ll_fault()->vvp_io_fault_start()->vvp_io_kernel_fault(),
filemap_fault() calls ll_readpage() to read the vmpage and waits for
the vmpage to be unlocked. After ll_readpage() has successfully read
and unlocked the vmpage, memory pressure or truncate can get in and
delete the cl_page; filemap_fault() then finds that the vmpage is not
uptodate and returns VM_FAULT_SIGBUS. To fix this, the patch makes
vvp_io_kernel_fault() restart filemap_fault() to get an up-to-date
vmpage again.

Test-Parameters: testlist=sanityn env=ONLY="16f",ONLY_REPEAT=50
Test-Parameters: testlist=sanityn env=ONLY="16g",ONLY_REPEAT=50
Test-Parameters: testlist=sanityn env=ONLY="16f 16g",ONLY_REPEAT=50
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I369e1362ffb071ec0a4de3cd5bad27a87cff5e05
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48607
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months ago LU-15935 target: keep track of multirpc slots in last_rcvd 82/48082/11
Etienne AUJAMES [Fri, 29 Jul 2022 12:35:33 +0000 (14:35 +0200)]
LU-15935 target: keep track of multirpc slots in last_rcvd

OBD_INCOMPAT_MULTI_RPCS is cleared by tgt_boot_epoch_update() if
recovery is aborted. This assumes that all the clients are evicted,
but that is not true: some clients could have successfully finished
their recovery, and those clients keep their last_rcvd slot.

This patch modifies lut_num_client to keep track of multirpc
slots in last_rcvd.
For now the counter is used only by tgt_fini() to clear
OBD_INCOMPAT_MULTI_RPCS, so we can extend this use case to
tgt_boot_epoch_update().
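
The counting idea can be sketched as follows. The struct, the flag value, and the helper names are hypothetical and only model the logic described here; they are not the target code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define INCOMPAT_MULTI_RPCS 0x1u  /* hypothetical flag value */

struct target {
	unsigned int incompat;        /* incompat feature flags */
	int          multirpc_slots;  /* clients still holding a multi-RPC slot */
};

/* A client takes a multi-RPC slot: count it and mark the feature in use. */
void target_slot_get(struct target *t)
{
	assert(t != NULL);
	t->multirpc_slots++;
	t->incompat |= INCOMPAT_MULTI_RPCS;
}

/* A client releases its slot, for example on eviction. */
void target_slot_put(struct target *t)
{
	assert(t->multirpc_slots > 0);
	t->multirpc_slots--;
}

/* Clear the flag only if no client still owns a multi-RPC slot. */
bool target_try_clear_multi_rpcs(struct target *t)
{
	if (t->multirpc_slots > 0)
		return false;
	t->incompat &= ~INCOMPAT_MULTI_RPCS;
	return true;
}
```

After an aborted recovery in which one of two clients survived, the counter is still 1, so the flag stays set.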

Add replay-dual test_33.

Test-Parameters: testlist=replay-dual env=ONLY=33,ONLY_REPEAT=30
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: I70791c9dcb7cc77f018b9e5c95568598d54f0322
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48082
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months ago Revert "LU-16046 revert: "LU-9964 llite: prevent mulitple group locks""
Oleg Drokin [Tue, 1 Nov 2022 18:39:15 +0000 (14:39 -0400)]
Revert "LU-16046 revert: "LU-9964 llite: prevent mulitple group locks""

This reverts commit bc37f89a81ea0a2fae8668e21247552e8894bfd8.

Unreverting the revert, since the fix that replaced it was bad and
there are now better ideas on how to amend this fix than a
full-on revert.

Change-Id: I1ef28c13715e7ea98021e1f83331e5533c2a8868
Signed-off-by: Oleg Drokin <green@whamcloud.com>
18 months ago Revert "LU-16046 ldlm: group lock fix"
Oleg Drokin [Tue, 1 Nov 2022 18:38:37 +0000 (14:38 -0400)]
Revert "LU-16046 ldlm: group lock fix"

This reverts commit 3ffcb5b700ebfd68dba4daca4192fdacaf7fd541.
It introduced a sleep under spinlock that was missed in testing.

Change-Id: I133e704595e97c0c62f47c23b3996871daf4c0dd
Signed-off-by: Oleg Drokin <green@whamcloud.com>
18 months ago LU-14719 lod: distributed transaction check space 39/47039/8
Lai Siyao [Wed, 30 Mar 2022 21:50:22 +0000 (17:50 -0400)]
LU-14719 lod: distributed transaction check space

A distributed transaction failure may cause missing files or disconnected
directories. To avoid failing on a full disk, check the remote MDT's free
space before the transaction starts.

The block/inode watermarks in obd_statfs_info are used to check
whether MDT has enough free blocks/inodes.
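
A minimal sketch of such a pre-check, assuming a simple "free amount must be at least the watermark" comparison. The mdt_space struct and function names are hypothetical and do not reflect the real obd_statfs_info layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical statfs snapshot and thresholds for one remote MDT. */
struct mdt_space {
	uint64_t free_blocks;
	uint64_t free_inodes;
	uint64_t block_watermark;  /* minimum free blocks required */
	uint64_t inode_watermark;  /* minimum free inodes required */
};

/* True if this MDT has enough free blocks and inodes. */
bool mdt_has_space(const struct mdt_space *s)
{
	assert(s != NULL);
	return s->free_blocks >= s->block_watermark &&
	       s->free_inodes >= s->inode_watermark;
}

/* Check every remote MDT first; refuse to start if any is too full. */
bool can_start_distributed_txn(const struct mdt_space *mdts, int n)
{
	for (int i = 0; i < n; i++)
		if (!mdt_has_space(&mdts[i]))
			return false;
	return true;
}
```

Failing early, before any MDT has been modified, avoids leaving a half-applied operation behind.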

Add sanity 230x.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I0922e9c8668e8b842d313576bd68b52fa5d434ac
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/47039
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months ago LU-16187 tests: Fix is_project_quota_supported() 54/48654/2
Arshad Hussain [Mon, 26 Sep 2022 09:31:41 +0000 (15:01 +0530)]
LU-16187 tests: Fix is_project_quota_supported()

is_project_quota_supported() is called from sanity-quota.sh
to check whether $ENABLE_PROJECT_QUOTAS is true for the ldiskfs FS
and whether the current version of the lfs command supports
'project'.  To do this it calls 'lfs --help', which is
not supported. This patch replaces the 'lfs --help' call with an
'lfs --list-commands' call to verify that the present
version of lfs supports 'project'.

Test-Parameters: trivial testlist=sanity-quota
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Iba7e6696d3fa9e980088f448ae72b07a4b47f4f2
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48654
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months ago LU-16240 build: Use new AS_HELP_STRING 84/48884/4
James Simmons [Tue, 4 Oct 2022 13:36:23 +0000 (07:36 -0600)]
LU-16240 build: Use new AS_HELP_STRING

Starting with autoconf 2.70 AC_HELP_STRING has been replaced with
AS_HELP_STRING. Move to this new macro.

Test-Parameters: trivial
Change-Id: I1d4f69fb844f51f05a8f46751df8b79d93db78f8
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48884
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
18 months ago LU-15305 obdclass: fix race in class_del_profile 02/48802/4
Li Dongyang [Fri, 7 Oct 2022 12:09:10 +0000 (23:09 +1100)]
LU-15305 obdclass: fix race in class_del_profile

Move the profile lookup and the removal from lustre_profile_list
into the same critical section; otherwise we could race with
class_del_profiles() or another class_del_profile().

Do not create duplicate mount opts in the client config;
otherwise we would add a duplicate lustre_profile to
lustre_profile_list for a single mount.
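
Both points can be shown in a userspace sketch with a mutex-protected singly linked list. The names profile_add() and profile_del() and the list layout are illustrative assumptions, not the obdclass implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

struct profile {
	struct profile *next;
	char            name[32];
};

static struct profile *profile_list;
static pthread_mutex_t profile_lock = PTHREAD_MUTEX_INITIALIZER;

/* Add a profile unless one with the same name exists. 0 on success. */
int profile_add(const char *name)
{
	struct profile *p, *it;
	int rc = -1;

	assert(name != NULL);
	p = calloc(1, sizeof(*p));
	if (p == NULL)
		return -1;
	strncpy(p->name, name, sizeof(p->name) - 1);

	pthread_mutex_lock(&profile_lock);
	for (it = profile_list; it != NULL; it = it->next)
		if (strcmp(it->name, p->name) == 0)
			break;
	if (it == NULL) {  /* no duplicate found: link the new entry */
		p->next = profile_list;
		profile_list = p;
		rc = 0;
	}
	pthread_mutex_unlock(&profile_lock);
	if (rc != 0)
		free(p);
	return rc;
}

/* Look up and unlink under one lock hold, so two deleters cannot both win. */
int profile_del(const char *name)
{
	struct profile **pp, *victim = NULL;
	int rc;

	pthread_mutex_lock(&profile_lock);
	for (pp = &profile_list; *pp != NULL; pp = &(*pp)->next) {
		if (strcmp((*pp)->name, name) == 0) {
			victim = *pp;
			*pp = victim->next;
			break;
		}
	}
	pthread_mutex_unlock(&profile_lock);
	rc = victim != NULL ? 0 : -1;
	free(victim);
	return rc;
}
```

If the lookup and the unlink were done under separate lock holds, two concurrent deleters could both find the same entry and free it twice.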

Change-Id: I648aa206716213b064d045f546516b219337e0ed
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48802
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months ago LU-15807 ksocklnd: fix irq lock inversion while calling sk_data_ready() 15/48715/5
James Simmons [Sun, 2 Oct 2022 13:45:42 +0000 (09:45 -0400)]
LU-15807 ksocklnd: fix irq lock inversion while calling sk_data_ready()

sk->sk_data_ready() of an SCTP socket can be called from both BH and non-BH
contexts, but the ksocklnd version of sk_data_ready(), ksocknal_data_ready(),
does not handle the BH case. Change how ksnd_global_lock is taken in
this case.

Test-Parameters: trivial testlist=sanity-lnet
Test-Parameters: testgroup=review-ldiskfs-arm testlist=sanity-lnet
Change-Id: I07fade0da4cdfe095edc7a17e4f65012d6f92942
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48715
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
18 months ago LU-16177 kernel: kernel update RHEL9.0 [5.14.0-70.26.1.el9_0] 76/48676/3
Jian Yu [Thu, 6 Oct 2022 19:02:23 +0000 (12:02 -0700)]
LU-16177 kernel: kernel update RHEL9.0 [5.14.0-70.26.1.el9_0]

Update RHEL9.0 kernel to 5.14.0-70.26.1.el9_0 for Lustre client.

Test-Parameters: trivial clientdistro=el9.0 \
env=SANITY_EXCEPT="130 244a" testlist=sanity

Change-Id: I9da2ccdf419d6490fdba80199eda69f4f19361be
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48676
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Colin Faber <cfaber@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months ago LU-16175 kernel: kernel update SLES12 SP5 [4.12.14-122.133.1] 05/48605/2
Jian Yu [Tue, 20 Sep 2022 03:47:03 +0000 (20:47 -0700)]
LU-16175 kernel: kernel update SLES12 SP5 [4.12.14-122.133.1]

Update SLES12 SP5 kernel to 4.12.14-122.133.1 for Lustre client.

Test-Parameters: trivial clientdistro=sles12sp5 \
env=SANITY_EXCEPT="56oc 430c 817" testlist=sanity

Change-Id: I35596cdfa075a19b5b1d29bad96271cbe83491bb
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48605
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Colin Faber <cfaber@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months ago LU-16174 kernel: kernel update SLES15 SP4 [5.14.21-150400.24.21.2] 04/48604/2
Jian Yu [Tue, 20 Sep 2022 03:33:30 +0000 (20:33 -0700)]
LU-16174 kernel: kernel update SLES15 SP4 [5.14.21-150400.24.21.2]

Update SLES15 SP4 kernel to 5.14.21-150400.24.21.2 for Lustre client.

Test-Parameters: trivial clientdistro=sles15sp4 \
env=SANITY_EXCEPT="27J 101j 244a" testlist=sanity

Change-Id: Ia68e1c960c79f40d0f725b0f440cd562b820a19f
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48604
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Colin Faber <cfaber@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months ago LU-16173 kernel: kernel update SLES15 SP3 [5.3.18-150300.59.93.1] 01/48601/3
Jian Yu [Thu, 13 Oct 2022 01:18:15 +0000 (18:18 -0700)]
LU-16173 kernel: kernel update SLES15 SP3 [5.3.18-150300.59.93.1]

Update SLES15 SP3 kernel to 5.3.18-150300.59.93.1 for Lustre client.

Test-Parameters: trivial clientdistro=sles15sp3 \
testlist=sanity

Change-Id: I1e0afe6974567d13680dbb0d463fbbd873ef2e5f
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48601
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months ago LU-16233 build: Add always target for SUSE15 SP3 LTSS 33/48833/2
Shaun Tancheff [Wed, 12 Oct 2022 06:16:21 +0000 (13:16 +0700)]
LU-16233 build: Add always target for SUSE15 SP3 LTSS

SUSE 15 SP3 LTSS kernel version 5.3.18-150300.59.93
(and later) breaks lustre build tests which expect
conftest.i to be generated.

HPE-bug-id: LUS-11286
Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: If23e9b31b537878a43075ffff62a99906f47fd9a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48833
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months ago LU-15234 lnet: add mechanism for dumping lnd peer debug info 66/48566/6
Serguei Smirnov [Mon, 28 Feb 2022 19:04:00 +0000 (11:04 -0800)]
LU-15234 lnet: add mechanism for dumping lnd peer debug info

Add ability to dump lnd peer debug info:
lnetctl debug peer --nid=<nid>

The debug info is dumped to the log as D_CONSOLE by the respective
lnd and can be retrieved with "lctl dk" or seen in syslog.
This mechanism has been added for socklnd and o2iblnd peers.

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: Ia9c4d59143206bcb7ec43806594cf0cfaed5f0a9
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48566
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months ago LU-15795 lbuild: enable KABI 07/47507/5
Minh Diep [Tue, 20 Sep 2022 18:24:54 +0000 (11:24 -0700)]
LU-15795 lbuild: enable KABI

Enable the kABI build and clean up the kmodtool patch.

Test-Parameters: trivial fstype=ldiskfs clientdistro=el8.5 serverdistro=el8.5
Test-Parameters: trivial fstype=ldiskfs clientdistro=el8.6 serverdistro=el8.6

Change-Id: I16d54af0004c4ddc1cc5e6acca81e4aa89a1a1c1
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/47507
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months ago LU-12130 test: pool inheritance for mdt component 91/46391/4
Vitaly Fertman [Mon, 31 Jan 2022 15:43:14 +0000 (18:43 +0300)]
LU-12130 test: pool inheritance for mdt component

Test whether the pool info is inherited for the MDT component,
which is not supposed to happen.

Test-Parameters: testlist=sanity env=ONLY=65o
Change-Id: I07e15fe2979c2e8887024fb959af2926425d258a
HPE-bug-id: LUS-7180
Signed-off-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Signed-off-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/46391
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Reviewed-by: Colin Faber <cfaber@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months ago LU-15447 tests: sanity-flr/208 reset rotational status 88/46088/15
Alex Zhuravlev [Thu, 13 Jan 2022 07:27:21 +0000 (10:27 +0300)]
LU-15447 tests: sanity-flr/208 reset rotational status

New kernels (e.g. 4.18.0-305.25.1) declare loopback devices
in tmpfs as non-rotational. sanity-flr/208 wrongly
assumes that devices are non-rotational by default; thus,
sanity-flr/208 started to fail with new kernels.

Fixes: 8507472dd37e ("LU-14996 lov: prefer mirrors on non-rotational OSTs")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ib5c42da39667227a6cff5d379e30d2cd6c1e2773
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/46088
Reviewed-by: Colin Faber <cfaber@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months ago LU-16183 test: sanity-hsm/70 should detect python 37/48737/3
Minh Diep [Mon, 3 Oct 2022 18:22:47 +0000 (11:22 -0700)]
LU-16183 test: sanity-hsm/70 should detect python

Check for python2 and python3 explicitly, since the
generic python command does not exist in newer distros.

Test-Parameters: env=SLOW=yes,ENABLE_QUOTA=yes \
clientdistro=sles15sp3 testlist=sanity-hsm
Test-Parameters: env=SLOW=yes,ENABLE_QUOTA=yes \
clientdistro=el7.9 testlist=sanity-hsm

Change-Id: I35bbe15fd298341870ad4f1ab5976e82ccc84667
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48737
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Charlie Olmstead <charlie@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months ago LU-13175 tests: sanity/803 to sync MDTs for actual statfs 46/37346/9
Alex Zhuravlev [Tue, 28 Jan 2020 23:00:59 +0000 (02:00 +0300)]
LU-13175 tests: sanity/803 to sync MDTs for actual statfs

Sync the MDTs first, as the number of dnodes is updated at commit.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I037064419a4674fe8e269b68e41f97c0f3763332
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/37346
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Colin Faber <cfaber@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months ago LU-16139 statahead: avoid to block ptlrpcd interpret context 51/48451/5
Qian Yingjin [Wed, 7 Sep 2022 08:59:19 +0000 (04:59 -0400)]
LU-16139 statahead: avoid to block ptlrpcd interpret context

If a statahead entry is a striped directory or a regular file
with a layout change, it will generate a new RPC and block the ptlrpcd
interpret context for a long time.
However, blocking in a ptlrpcd thread is dangerous, as it may
result in deadlock.

The following is the stack trace for the timeout of replay-dual
test_26:
task:ptlrpcd_00_01   state:I stack:    0 pid: 8026 ppid:     2
osc_extent_wait+0x44d/0x560 [osc]
osc_cache_wait_range+0x2b8/0x930 [osc]
osc_io_fsync_end+0x67/0x80 [osc]
cl_io_end+0x58/0x130 [obdclass]
lov_io_end_wrapper+0xcf/0xe0 [lov]
lov_io_fsync_end+0x6f/0x1c0 [lov]
cl_io_end+0x58/0x130 [obdclass]
cl_io_loop+0xa7/0x200 [obdclass]
cl_sync_file_range+0x2c9/0x340 [lustre]
vvp_prune+0x5d/0x1e0 [lustre]
cl_object_prune+0x58/0x130 [obdclass]
lov_layout_change.isra.47+0x1ba/0x640 [lov]
lov_conf_set+0x38d/0x4e0 [lov]
cl_conf_set+0x60/0x140 [obdclass]
cl_file_inode_init+0xc8/0x380 [lustre]
ll_update_inode+0x432/0x6e0 [lustre]
ll_iget+0x227/0x320 [lustre]
ll_prep_inode+0x344/0xb60 [lustre]
ll_statahead_interpret_common.isra.26+0x69/0x830 [lustre]
ll_statahead_interpret+0x2c8/0x5b0 [lustre]
mdc_intent_getattr_async_interpret+0x14a/0x3e0 [mdc]
ptlrpc_check_set+0x5b8/0x1fe0 [ptlrpc]
ptlrpcd+0x6c6/0xa50 [ptlrpc]

In this patch, we use a work queue to handle the extra RPC and the long
wait in a separate thread for a striped directory and for a regular
file with a layout change:
(@ll_prep_inode->@lmv_revalidate_slaves);
(@ll_prep_inode->@lov_layout_change->osc_cache_wait_range)

Test-Parameters: testlist=replay-dual env=ONLY=26,ONLY_REPEAT=10 mdscount=2 mdtcount=4
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I404a320620c4ec4caa608e675ecf324fcd26f1e0
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48451
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months ago LU-16149 lnet: Discovery queue and deletion race 32/48532/3
Chris Horn [Mon, 12 Sep 2022 21:09:38 +0000 (15:09 -0600)]
LU-16149 lnet: Discovery queue and deletion race

lnet_peer_deletion() can race with another thread calling
lnet_peer_queue_for_discovery().

Discovery thread:
 - Calls lnet_peer_deletion():
 - LNET_PEER_DISCOVERING bit is cleared from lnet_peer::lp_state
 - releases lnet_peer::lp_lock

Another thread:
 - Acquires lnet_net_lock/EX
 - Calls lnet_peer_queue_for_discovery()
 - Takes lnet_peer::lp_lock
 - Sets LNET_PEER_DISCOVERING bit
 - Releases lnet_peer::lp_lock
 - Sees lnet_peer::lp_dc_list is not empty, so it does not add peer
   to dc request queue
 - lnet_peer_queue_for_discovery() returns, lnet_net_lock/EX releases

Discovery thread:
 - Acquires lnet_net_lock/EX
 - Deletes peer from ln_dc_working list
 - performs the peer deletion

At this point, the peer is not on any discovery list, and it has
LNET_PEER_DISCOVERING bit set. This peer is now stranded, and any
messages on the peer's lnet_peer::lp_dc_pendq are likewise stranded.

To solve this, we modify lnet_peer_deletion() so that it waits to
clear the LNET_PEER_DISCOVERING bit until it has completed deleting
the peer and re-acquired the lnet_peer::lp_lock. This ensures we
cannot race with any other thread that may add the
LNET_PEER_DISCOVERING bit back to the peer. We also avoid deleting
the peer from the ln_dc_working list in lnet_peer_deletion(). This is
already done by lnet_peer_discovery_complete().

There is another window where the LNET_PEER_DISCOVERING bit can be
added when the discovery thread drops the lp_lock just before
acquiring the net_lock/EX and calling lnet_peer_discovery_complete().
Have lnet_peer_discovery_complete() clear LNET_PEER_DISCOVERING to
deal with this (it already does this for the case where discovery hit
an error). Also move the deletion of lp_dc_list to after we clear the
DISCOVERING bit. This is to mirror the behavior of
lnet_peer_queue_for_discovery() which sets the DISCOVERING bit and
then manipulates the lp_dc_list.

Also tweak the logic in lnet_peer_deletion() to call
lnet_peer_del_locked() in order to avoid extra calls to
lnet_net_lock()/lnet_net_unlock().
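
The interleaving described above can be replayed in a deliberately simplified, single-threaded model. The peer_model struct and the replay functions are hypothetical; they reduce the peer to two booleans and ignore the real locking, so they only illustrate why the ordering matters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of a peer; not the real struct lnet_peer. */
struct peer_model {
	bool discovering;  /* stands in for LNET_PEER_DISCOVERING */
	bool on_dc_list;   /* stands in for a non-empty lp_dc_list */
};

/* Models queueing: set the bit, enqueue only if not already on a list. */
void queue_for_discovery(struct peer_model *p)
{
	assert(p != NULL);
	p->discovering = true;
	if (!p->on_dc_list)
		p->on_dc_list = true;
}

/* A peer is stranded if it is marked discovering but sits on no list. */
bool stranded(const struct peer_model *p)
{
	return p->discovering && !p->on_dc_list;
}

/* Old ordering: the bit is cleared before the peer leaves the working list. */
bool replay_old_order(void)
{
	struct peer_model p = { .discovering = true, .on_dc_list = true };

	p.discovering = false;    /* discovery thread: clear bit, drop lock */
	queue_for_discovery(&p);  /* other thread: sets bit, list not empty */
	p.on_dc_list = false;     /* discovery thread: remove from working list */
	return stranded(&p);
}

/* New ordering: the bit is cleared at completion, then the list entry goes. */
bool replay_new_order(void)
{
	struct peer_model p = { .discovering = true, .on_dc_list = true };

	queue_for_discovery(&p);  /* other thread: bit already set, still listed */
	p.discovering = false;    /* completion: clear the bit ... */
	p.on_dc_list = false;     /* ... then remove the peer from the list */
	return stranded(&p);
}
```

In this model replay_old_order() ends with the bit set and the peer on no list, while replay_new_order() ends with both cleared, so a later queueing attempt can start from a clean state.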

HPE-bug-id: LUS-11237
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ifcfef1d49f216af4ddfcdaf928024e8ee3952555
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48532
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months ago LU-15847 tgt: move tti_ transaction params to tsi_ 91/47491/5
Mikhail Pershin [Sat, 28 May 2022 18:16:11 +0000 (21:16 +0300)]
LU-15847 tgt: move tti_ transaction params to tsi_

Move tti_mult_trans and tti_has_trans to tgt_session_info so that
they are available in all targets. This allows cleaning up old
duplicated MDT code and can be used for complex transaction
handling in MDT/OFD if needed.

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I3f0c15e283b9e21c04a009f6cf346afa278e7095
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/47491
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: John Hammond <jhammond@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months ago LU-15847 tgt: reply always with the latest assigned transno 92/47492/3
Mikhail Pershin [Tue, 31 May 2022 10:38:25 +0000 (13:38 +0300)]
LU-15847 tgt: reply always with the latest assigned transno

In tgt_txn_stop_cb(), don't skip transno assignment in case
of unexpected multiple last_rcvd updates, so that the latest
transno is reported back in the reply rather than the first
one.

Reporting just the first transno might lead to data
loss at failover, because a partially committed operation would
be considered fully committed and the rest of the operation would
not be replayed.

The proposed approach of reporting the last assigned transno to
the client could cause replay failures in some cases, which
is still better than possible data loss. So the patch makes the
multiple-transaction case less severe.
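
The difference between the two policies can be sketched with a hypothetical session struct. The names below, and the assumption that transaction numbers only grow within one request, are illustrative and not taken from the target code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-request session state. */
struct session {
	uint64_t reply_transno;  /* transno that will be sent in the reply */
	int      updates;        /* last_rcvd updates seen for this request */
};

/* Old behaviour: only the first update sets the reply transno. */
void txn_stop_first(struct session *s, uint64_t transno)
{
	assert(s != NULL);
	if (s->updates++ == 0)
		s->reply_transno = transno;
}

/* New behaviour: every update overwrites it, so the latest one wins. */
void txn_stop_latest(struct session *s, uint64_t transno)
{
	assert(transno >= s->reply_transno);  /* assumed: transnos only grow */
	s->updates++;
	s->reply_transno = transno;
}
```

With two updates numbered 10 and 11, the old policy replies with 10 and the client may treat the operation as committed once 10 is on disk; the new policy replies with 11.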

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ia07e89576127a2fc1eb2ae706551ffe8ceaa93be
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/47492
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months ago LU-16025 llite: adjust read count as file got truncated 96/47896/21
Bobi Jam [Thu, 7 Jul 2022 07:38:54 +0000 (15:38 +0800)]
LU-16025 llite: adjust read count as file got truncated

A file read will not notice that the file was truncated by another node,
and will continue to read zero-filled pages beyond the new file size.

This patch adds a limit to the read to prevent the issue and
adds a test case verifying the fix.
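
The kind of limit involved can be sketched as a pure function that trims a read request to the current file size. clamp_read_count() is a hypothetical helper for illustration, not the llite code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Limit a read of 'count' bytes at offset 'pos' so that it never
 * extends past 'size', the current (possibly just truncated) file size.
 */
size_t clamp_read_count(uint64_t pos, size_t count, uint64_t size)
{
	size_t n = count;

	if (pos >= size)
		n = 0;                       /* nothing left to read */
	else if (count > size - pos)
		n = (size_t)(size - pos);    /* stop at end of file */
	assert(n <= count);
	return n;
}
```

For example, a 4096-byte read at offset 8192 of a file truncated to 10000 bytes is trimmed to 1808 bytes, and a read starting at or past the new size returns nothing.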

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Ie51ba09201a1ca1464c3a3892d367590e978ee34
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/47896
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <farr0186@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
18 months ago LU-16046 ldlm: group lock fix 38/48038/6
Vitaly Fertman [Wed, 8 Jun 2022 20:05:45 +0000 (23:05 +0300)]
LU-16046 ldlm: group lock fix

The original LU-9964 fix had a problem: with many pages in
memory, grouplock unlock takes 10+ seconds just to discard them.

The current patch makes grouplock unlock asynchronous. It introduces
logic similar to the original, but at the mdc/osc layer.

Add a new test similar to sanity_244b but for DOM layout files.

HPE-bug-id: LUS-10644, LUS-10906
Signed-off-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Change-Id: Ib6d6a3a41baff5b0161468abfd959f52e2a1b497
Reviewed-on: https://es-gerrit.dev.cray.com/159856
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Tested-by: Elena Gryaznova <c17455@cray.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48038
Reviewed-by: Alexander <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months agoLU-16046 revert: "LU-9964 llite: prevent mulitple group locks" 37/48037/5
Vitaly Fertman [Thu, 9 Jun 2022 22:00:50 +0000 (01:00 +0300)]
LU-16046 revert: "LU-9964 llite: prevent mulitple group locks"

This reverts commit aba68250a67a10104c534bd726f67b31a7f35692
since it makes group unlock synchronous, which leads to poor
performance on shared file IO under group lock.

Signed-off-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Change-Id: I4548986297c22e402acd051dbdf97fe58198d100
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48037
Reviewed-by: Alexander <alexander.boyko@hpe.com>
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months agoLU-15721 llite: only statfs for projid if PROJINHERIT set 52/47352/3
Andreas Dilger [Sat, 14 May 2022 14:10:20 +0000 (08:10 -0600)]
LU-15721 llite: only statfs for projid if PROJINHERIT set

If projid is set on a directory but PROJINHERIT is not, do not report
the project quota for statfs.  This matches how ext4_statfs() and
xfs_fs_statfs() behave, on which Lustre project quota is modelled.

Fixes: e5c8f6670f ("LU-9555 quota: df should return projid-specific values")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I27cb444c3dfabc0ec693cee6fe6f9cae6db8a77a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/47352
Reviewed-by: Wang Shilong <wangshilong1991@gmail.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
18 months agoLU-16219 tests: syntax error fix 95/48795/3
Elena Gryaznova [Thu, 6 Oct 2022 08:07:39 +0000 (11:07 +0300)]
LU-16219 tests: syntax error fix

scrub-performance:scrub_create() fix

Fixes: a20b78a81d ("LU-15357 iokit: fix the obsolete usage of cfg_device")
Test-Parameters: trivial testlist=scrub-performance
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-11277
Change-Id: Ib6e4354c2f399019ec2d6c33f9a7d544226c0392
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48795
Reviewed-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
18 months agoLU-16198 tests: increase margin for sanity/33hh 13/48713/3
Andreas Dilger [Fri, 30 Sep 2022 22:25:52 +0000 (16:25 -0600)]
LU-16198 tests: increase margin for sanity/33hh

The filenames created by sanity test_33hh are randomly generated by
"mktemp" and in some rare cases a larger number of filenames may
fail the CRUSH2 hash detection for 'random' suffixes (all-numeric,
all-uppercase, all-lowercase).  This appears to be failing about
1/200 tests, but since sanity is run frequently (~1400 times/month)
there are still occasional failures reported.

Increase the maximum filename mismatch rate from 20% to 23%, which
would have avoided all of the test failures in the past 3 months.
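
As a rough check on the numbers above, the failure rate and run
frequency quoted in this message give the expected number of failures
per month. The helper name below is hypothetical.

```python
def expected_failures(runs: int, one_in: int) -> float:
    """Expected number of failing runs when about 1 in `one_in` fails."""
    return runs / one_in


# ~1400 sanity runs per month at a failure rate of about 1/200
print(expected_failures(1400, 200))
```

That is about 7 failures per month, which matches the "occasional
failures" described above.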

Test-Parameters: trivial testlist=sanity mdscount=2 mdtcount=4 env=ONLY=33hh,ONLY_REPEAT=400
Fixes: 1ac4b9598a ("LU-15720 dne: add crush2 hash type")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: If63528c4281e543975454d1d84306b0dfcfc0fff
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48713
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months agoLU-16200 tests: test_32[f,g]: specify blocksize explicitly 08/48708/2
Elena Gryaznova [Fri, 30 Sep 2022 15:39:13 +0000 (18:39 +0300)]
LU-16200 tests: test_32[f,g]: specify blocksize explicitly

Fix conf-sanity:test_32f(), conf-sanity:test_32g() to be
independent from BLOCKSIZE environment variable.

To reproduce the failure, just run:
   BLOCKSIZE=4096 ONLY=32g sh conf-sanity.sh
  -total 36
  -total 64
  +total 16
  +total 9
   144115205289279502 -rw-r--r-- 1 0 0  1160 1550597702 README
   conf-sanity test_32g: @@@@@@ FAIL: list verification failed
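
The totals in the diff above are consistent with the same listing
counted in 1 KiB blocks versus 4 KiB blocks. A small sketch of that
arithmetic, assuming the expected output was recorded with a 1024-byte
block size (total_in_blocks is a hypothetical helper):

```python
def total_in_blocks(total_1k_blocks: int, blocksize: int) -> int:
    """Convert a total given in 1 KiB blocks to blocks of `blocksize` bytes."""
    return total_1k_blocks * 1024 // blocksize


# Expected totals (1 KiB blocks) versus what BLOCKSIZE=4096 reports.
for expected in (36, 64):
    print(expected, "->", total_in_blocks(expected, 4096))
```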

Fixes: 3c1c462399 ("LU-1943 tests: Refresh conf-sanity 32[ab]")
Test-Parameters: trivial testlist=conf-sanity
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-11013
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: Iaa07df8f5a9ba286ef5b3a5581b667cc7de63334
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48708
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months agoLU-16180 ptlrpc: reduce lock contention in ptlrpc_free_committed 29/48629/11
Andreas Dilger [Thu, 6 Oct 2022 17:31:51 +0000 (10:31 -0700)]
LU-16180 ptlrpc: reduce lock contention in ptlrpc_free_committed

This patch breaks out of the loop in ptlrpc_free_committed()
if need_resched() is true or there are other threads waiting
on the imp_lock. This avoids the thread holding the CPU for
too long while freeing a large number of requests. The
remaining requests in the list will be processed the next
time this function is called. That also avoids delaying a
single thread too long if the list is long.
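
The early-exit pattern can be sketched as follows. This is a userspace
illustration, not the ptlrpc code: free_committed, the deque of
transaction numbers, and the should_yield callback (standing in for
"need_resched() is true or another thread waits on imp_lock") are all
assumptions made for the example.

```python
from collections import deque


def free_committed(requests: deque, last_committed: int, should_yield) -> int:
    """Free committed requests from the front of an ordered list.

    Stops early when should_yield() returns True, leaving the rest of
    the list for the next call. Returns the number of requests freed.
    """
    freed = 0
    while requests and requests[0] <= last_committed:
        if should_yield():
            # Give up the CPU/lock; the remainder is handled next time.
            break
        requests.popleft()
        freed += 1
    return freed


pending = deque(range(1, 11))          # transaction numbers 1..10
budget = iter([False, False, True])    # yield on the third check
print(free_committed(pending, 8, lambda: next(budget)))  # first pass
print(free_committed(pending, 8, lambda: False))         # next call finishes
```

The second call picks up where the first one stopped, so no committed
request is lost by breaking out early.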

Test-Parameters: testlist=sanity clientdistro=el8.6
Test-Parameters: testlist=sanity clientdistro=ubuntu2204 env=SANITY_EXCEPT="130 244a"

Change-Id: I50f56b87844e8b019053e569767b6c949d2a3f55
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48629
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months agoLU-16076 utils: enhance 'lfs check' command 55/48155/13
Lei Feng [Mon, 8 Aug 2022 02:59:25 +0000 (10:59 +0800)]
LU-16076 utils: enhance 'lfs check' command

Add an optional argument to the 'lfs check' command so that only the
servers related to the specified Lustre file system are checked.

Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial testlist=sanityn env=ONLY=113
Change-Id: I826a8e822af0a290f06ffaadadf1bb7f86899d99
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48155
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months agoLU-16044 osd: discard pagecache in truncate's declaration 33/48033/14
Alex Zhuravlev [Mon, 25 Jul 2022 13:26:40 +0000 (16:26 +0300)]
LU-16044 osd: discard pagecache in truncate's declaration

Discard the pagecache in truncate's declaration to avoid taking the
pagelock inside a transaction, which conflicts with the write path
where we take the pagelock before any other lock.
This should be safe as the write path writes the pages out
synchronously, so they should be clean by the time of the truncate.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: Iba555ace2ce9ef34ab5517375ecb5c176f738a02
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48033
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months agoLU-15451 sec: retry ro mount if read-only flag set 90/47490/6
Sebastien Buisson [Wed, 25 May 2022 14:53:57 +0000 (16:53 +0200)]
LU-15451 sec: retry ro mount if read-only flag set

If a client mount fails with -EROFS because the read-only nodemap
flag is set and the ro mount option is not specified, just retry the
mount with ro internally. This avoids the need for users to manually
retry the mount with the ro option.
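
The retry logic can be sketched like this. It is only an illustration
of the control flow; mount_with_ro_fallback and the do_mount callback
are hypothetical and do not correspond to functions in the patch.

```python
import errno


def mount_with_ro_fallback(do_mount, read_only: bool = False) -> bool:
    """Call do_mount(read_only); on EROFS for a read-write attempt,
    retry once read-only. Returns True if the final mount is read-only."""
    try:
        do_mount(read_only)
        return read_only
    except OSError as err:
        if err.errno == errno.EROFS and not read_only:
            # Server refused a read-write mount: retry as read-only.
            do_mount(True)
            return True
        raise  # any other failure is reported to the caller unchanged


def demo_mount(ro: bool) -> None:
    """Pretend server whose nodemap has the read-only flag set."""
    if not ro:
        raise OSError(errno.EROFS, "read-only nodemap flag is set")


print(mount_with_ro_fallback(demo_mount))
```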

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I0dedd1394eeb6804f7fdde930275f6649b935bab
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/47490
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months agoLU-13364 utils: fix bad output for lnetctl import --show 22/43922/2
Cyril Bordage [Fri, 4 Jun 2021 03:40:07 +0000 (05:40 +0200)]
LU-13364 utils: fix bad output for lnetctl import --show

Read the right node from the yaml input ("net type" instead of "net")
to compare to what we find from ioctl when we filter results.

Signed-off-by: Cyril Bordage <cbordage@whamcloud.com>
Change-Id: I9fbbac882f26fd93299f37cca00fcbd4cb7e95d2
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/43922
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months agoLU-14165 utils: llog_reader: display changleog_user records 18/40818/4
Etienne AUJAMES [Tue, 1 Dec 2020 18:10:41 +0000 (19:10 +0100)]
LU-14165 utils: llog_reader: display changleog_user records

Add a function to print changelog_user information.

llog_reader output:

01 (080)changelog user record (v2) id:0x0 cur_id:3 cur_endrec:0
cur_time:1661258371 cur_mask:0x00000003 cur_name:"toto"
...
04 (080)changelog user record (v1) id:0x0 cur_id:6 cur_endrec:0
cur_time:1661261064

Test-Parameters: trivial testlist=sanity,sanity-hsm
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: I4e948f52a678127d70e8084e94fb89ec2677cc4b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/40818
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months agoLU-6142 obdclass: change some foo0() to __foo() 03/48803/2
Mr. NeilBrown [Fri, 7 Oct 2022 12:57:29 +0000 (08:57 -0400)]
LU-6142 obdclass: change some foo0() to __foo()

Change:
  cl_io_init0 -> __cl_io_init
  cl_lock_trace0 -> __cl_lock_trace
  cl_page_delete0 -> __cl_page_delete
  cl_page_state_set0 -> __cl_page_state_set
  cl_page_own0 -> __cl_page_own
  cl_page_disown0 -> __cl_page_disown

This is more consistent with Linux naming style.

Test-Parameters: trivial
Change-Id: If38b52465d42ac425d47c1e9ded62bd7f013e0eb
Signed-off-by: Mr. NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48803
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months agoLU-10391 lnet: support IPv6 in lnet_inet_enumerate() 72/48572/2
Mr NeilBrown [Fri, 16 Sep 2022 00:57:13 +0000 (10:57 +1000)]
LU-10391 lnet: support IPv6 in lnet_inet_enumerate()

lnet_inet_enumerate() can now optionally report IPv6 addresses on
interfaces.  We use this in socklnd to determine the address of the
interface.

Unlike IPv4, different IPv6 addresses associated with a single
interface cannot be associated with different labels (e.g. eth0:2).
This means that lnet_inet_enumerate() must report the same name for
each address.  For now, we only report the first non-temporary address
to avoid any confusion.

The network mask provided with IPv4 is only used for reporting
information for an ioctl.  It isn't clear this will be useful for
IPv6, so no netmask is collected.

To save a bit of space in struct lnet_inetdev{} which must now hold a
16byte address, we replace the 4byte flag with a 1byte bool as only the
IFF_MASTER flag is ever of interest.  Another bool is needed to report
if the address is IPv6.

Test-Parameters: trivial testlist=sanity-lnet
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I7a73033f40cc83a8993281696f17332a9101db1e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48572
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months agoLU-16002 ptlrpc: reduce pinger eviction time 28/47928/10
Alexander Boyko [Fri, 16 Sep 2022 08:00:38 +0000 (04:00 -0400)]
LU-16002 ptlrpc: reduce pinger eviction time

On the server side, eviction is based on PING_INTERVAL. A client
should be evicted after PING_EVICT_TIMEOUT, but the eviction logic
adds 3 more PING_INTERVALs to it. For a configuration with
obd_timeout equal to 300, the addition is 225 seconds.
The second-level timeout is needed when the network is down for
some time, and it prevents client evictions after the first
connection.
This patch adds logic to check whether an import is active,
and evicts the client faster without the second level. It reduces
the eviction timeout to PING_EVICT_TIMEOUT.
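
The 225-second figure can be reproduced with a short calculation. The
sketch assumes PING_INTERVAL is a quarter of obd_timeout (with a floor
of 1 second), which is consistent with the numbers quoted above; the
function names are hypothetical.

```python
def ping_interval(obd_timeout: int) -> int:
    """Assumed definition: a quarter of obd_timeout, at least 1 second."""
    return max(obd_timeout // 4, 1)


def second_level_delay(obd_timeout: int) -> int:
    """Extra delay added by the second-level timeout: 3 ping intervals."""
    return 3 * ping_interval(obd_timeout)


print(ping_interval(300), second_level_delay(300))
```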

replay_dual test_0a is based on a client eviction during recovery,
and its lfs df check could fail because of the eviction. So complete
the check in a way similar to recovery-small.sh.

Test-Parameters: testlist=recovery-small env=RECOVERY_SMALL_EXCEPT=144 serverversion=2.14
HPE-bug-id: LUS-11054
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: I4d60046ef4737f9cf95a16ac0ab63a36859b8adc
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/47928
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months agoLU-16211 o2iblnd: Avoid NULL md deref 77/48777/3
Chris Horn [Mon, 3 Oct 2022 21:34:11 +0000 (15:34 -0600)]
LU-16211 o2iblnd: Avoid NULL md deref

struct lnet_msg::msg_md is NULL when a router is forwarding a
REPLY. ko2iblnd attempts to access this pointer on the receive path.
This causes a panic.

Test-Parameters: trivial
Fixes: 959304eac7 ("LU-15189 lnet: fix memory mapping.")
HPE-bug-id: LUS-11269
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I0c1dbb1e0bcd3c17b278f358755d465f7bbbb2b0
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48777
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months agoLU-16199 ldiskfs: make ubuntu kernel version detection better 17/48717/2
Ake Sandgren [Mon, 3 Oct 2022 06:39:20 +0000 (08:39 +0200)]
LU-16199 ldiskfs: make ubuntu kernel version detection better

Ubuntu kernel version detection is not working correctly with the
official versioning scheme.  There are also a couple of errors in the
AS_VERSION_COMPARE sequences causing problems for 5.4.0 and later.

Signed-off-by: Ake Sandgren <ake.sandgren@hpc2n.umu.se>
Change-Id: Ie6e51de95ae1513b15ee0c2baa8c421f3cb954f5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48717
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months agoLU-16197 kfilnd: Convert NID num to host order 00/48700/2
Chris Horn [Mon, 26 Sep 2022 18:59:38 +0000 (12:59 -0600)]
LU-16197 kfilnd: Convert NID num to host order

The nid_num field in struct lnet_nid is stored in network byte order.
The nid_num field is used to generate the kfabric service string. The
underlying kfabric providers expect the service string to be in host
byte order not network byte order. This mismatch is preventing
multiple LNet NID indexes from being used.

Fix this by converting nid_num to host byte order.
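
To illustrate the mismatch, the sketch below interprets the same four
bytes in network order and in little-endian host order. It uses
Python's struct module as a stand-in for ntohl(); the helper names are
hypothetical and the format of the kfabric service string is not shown.

```python
import struct


def nid_num_to_host(raw: bytes) -> int:
    """Interpret 4 bytes stored in network (big-endian) byte order."""
    return struct.unpack("!I", raw)[0]


def misread_as_little_endian(raw: bytes) -> int:
    """What a little-endian host sees if it skips the conversion."""
    return struct.unpack("<I", raw)[0]


stored = struct.pack("!I", 1)          # NID index 1 in network order
print(nid_num_to_host(stored))         # intended value
print(misread_as_little_endian(stored))  # value without conversion
```

Without the conversion, index 1 is read as 16777216 on a little-endian
host, so distinct small indexes no longer map to the expected values.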

Test-Parameters: trivial
HPE-bug-id: LUS-11254
Change-Id: I804daa6d66d775212a83e3ed013310b383b94974
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Signed-off-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48700
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months agoLU-16191 socklnd: limit retries on conns_per_peer mismatch 64/48664/3
Serguei Smirnov [Mon, 26 Sep 2022 23:47:24 +0000 (16:47 -0700)]
LU-16191 socklnd: limit retries on conns_per_peer mismatch

If connection initiator has a higher conns-per-peer setting than
its peer, don't try to create extra connections forever as the
peer will keep rejecting them. A few retries should suffice to
resolve a valid race.
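
A bounded retry loop of this kind might look as follows. The retry
limit, the function name and the connect callback are assumptions made
for illustration; the actual limit used by the patch is not stated
here.

```python
MAX_EXTRA_CONN_RETRIES = 3  # hypothetical limit, not taken from the patch


def create_extra_conn(connect, max_retries: int = MAX_EXTRA_CONN_RETRIES):
    """Try to create an extra connection at most max_retries times.

    Returns the attempt number that succeeded, or None if the peer
    rejected every attempt (e.g. it has a lower conns-per-peer value).
    """
    for attempt in range(1, max_retries + 1):
        if connect():
            return attempt
    return None  # stop retrying instead of looping forever


# A peer that always rejects the extra connection.
print(create_extra_conn(lambda: False))
```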

Test-Parameters: trivial
Fixes: 71b2476e ("LU-12815 socklnd: add conns_per_peer parameter")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I7d04d4ac41e98a738b6c85c3d323608038f5c51e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48664
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months agoLU-15791 tests: Drop local traffic during health test 61/48661/2
Chris Horn [Mon, 26 Sep 2022 15:19:19 +0000 (09:19 -0600)]
LU-15791 tests: Drop local traffic during health test

Existing drop rules for health tests omit local nids for the
destination so it is possible for local NI health values to recover
while the tests execute. Add drop rules for local NIDs to prevent
their health from recovering.

Test-Parameters: trivial
Test-Parameters: testlist=sanity-lnet env=ONLY=205,ONLY_REPEAT=100
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I6a4a06b3fa76effd21e21449abf47cd0e14bbf18
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48661
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months agoLU-16051 o2iblnd: detect link state to set fatal error on ni 44/48644/3
Serguei Smirnov [Fri, 23 Sep 2022 22:20:51 +0000 (15:20 -0700)]
LU-16051 o2iblnd: detect link state to set fatal error on ni

To avoid selecting an LNet NI that corresponds to a downed link
for sending, add a mechanism for detecting ip-layer link events
in o2iblnd. On ip link up/down events, find the corresponding
NI and toggle the ni_fatal_error_on flag. This complements the
existing mechanism for ib-layer link event handling.

Test-Parameters: trivial
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I4720cd0a7bc577a522c7d40b54f821a4c12b670f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48644
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months agoLU-16184 o2iblnd: fix deadline for tx on peer queue 40/48640/2
Serguei Smirnov [Fri, 23 Sep 2022 19:29:59 +0000 (12:29 -0700)]
LU-16184 o2iblnd: fix deadline for tx on peer queue

In o2iblnd, deadline is checked for txs on peer queue,
but not set prior to adding the tx to the queue. This
may cause the tx to be dropped unnecessarily with
"Timed out tx for ..." warning.

Fix it by setting the tx_deadline when adding tx to peer queue.
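
The effect of the missing assignment can be sketched as below. The
sketch assumes an unset deadline reads as 0, so a deadline check sees
it as already expired; Tx, queue_tx_on_peer and tx_timed_out are
hypothetical names, not the o2iblnd structures.

```python
from dataclasses import dataclass


@dataclass
class Tx:
    deadline: float = 0.0  # assumption: an unset deadline reads as 0


def queue_tx_on_peer(peer_queue: list, tx: Tx, now: float, timeout: float) -> None:
    """Set the deadline before the tx becomes visible on the queue."""
    tx.deadline = now + timeout
    peer_queue.append(tx)


def tx_timed_out(tx: Tx, now: float) -> bool:
    """Deadline check as applied to txs waiting on the peer queue."""
    return now >= tx.deadline


queue = []
unset = Tx()
print(tx_timed_out(unset, now=100.0))   # deadline never set
tx = Tx()
queue_tx_on_peer(queue, tx, now=100.0, timeout=50.0)
print(tx_timed_out(tx, now=120.0))      # deadline set at enqueue time
```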

Test-Parameters: trivial
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: Ie7cf5590b440b60f71527049953a64bb31d53578
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48640
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months agoLU-15595 tests: Router test interop check and aarch fix 78/48578/9
Chris Horn [Wed, 14 Sep 2022 01:23:37 +0000 (20:23 -0500)]
LU-15595 tests: Router test interop check and aarch fix

setup_router_test() executes load_lnet() on remote nodes, but
this function was only added in 2.15. Add a version check for it.

Enabling routing may fail on nodes with a small amount of memory (like
aarch config). Define a small number of router buffers to work around
this issue. Modify the functions which calculate the number of buffers
to allow small sizes to be specified via parameters.

Test-Parameters: trivial testlist=sanity-lnet serverversion=2.12.9
Test-Parameters: testgroup=review-ldiskfs-arm testlist=sanity-lnet
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: If0b76747fe09e883546f18da9f3322c72263e29d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48578
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months agoLU-13641 socklnd: remove remnants of tcp bonding 68/48568/3
Mr NeilBrown [Thu, 15 Sep 2022 05:32:05 +0000 (15:32 +1000)]
LU-13641 socklnd: remove remnants of tcp bonding

->ksnp_n_passive_ips is now always zero, so remove it and all uses of
it.  ->ksnp_passive_ips is gone too, as is ksocknal_ip2iface().

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I5de6d027c545087c961673d8704f68c4f3dd5076
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48568
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months agoLU-16150 zfs: Fix ZFS(2.1.99-1) build error on CentOS (3.10) 36/48536/5
Arshad Hussain [Tue, 13 Sep 2022 07:31:25 +0000 (03:31 -0400)]
LU-16150 zfs: Fix ZFS(2.1.99-1) build error on CentOS (3.10)

ZFS: (2.1.99-1)
Lustre: 27723374a38 LU-16073 utils: double snapshot_mount fix
CentOS: 3.10.0-1160.15.2.el7.x86_64

This patch fixes build failures seen as below for the
above configuration:

First:
make[4]: Entering directory `/root/lustre01/lustre-release/lustre/utils'
gcc  -rdynamic -shared -export-dynamic -pthread \
-L/root/zfs/zfs_git_lustre_build/zfs//lib/libzfs/.libs/
-L/root/zfs/zfs_git_lustre_build/zfs//lib/libnvpair/.libs/
-L/root/zfs/zfs_git_lustre_build/zfs//lib/libzpool/.libs/ -o
mount_osd_zfs.so \
`ar -t libmount_utils_zfs.a` \
-ldl   -lzfs -lnvpair -lzpool
/usr/bin/ld: cannot find -lzfs
/usr/bin/ld: cannot find -lnvpair
/usr/bin/ld: cannot find -lzpool
collect2: error: ld returned 1 exit status

Test-Parameters: trivial fstype=zfs
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I32f270c7912379f7dce940e0aa2bceee5e49ad79
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48536
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months agoLU-15885 o2iblnd: fix handling of RDMA_CM_EVENT_UNREACHABLE 92/48492/2
Serguei Smirnov [Thu, 8 Sep 2022 22:27:12 +0000 (15:27 -0700)]
LU-15885 o2iblnd: fix handling of RDMA_CM_EVENT_UNREACHABLE

RDMA_CM_EVENT_UNREACHABLE may be received not only when a connection
is being established, but also when it is being closed. Fix handling
of this event accordingly.

Test-Parameters: trivial
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I79428188c159b2d80d36326589b2977db065d4a7
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48492
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months agoLU-15646 llog: correct llog FID and path output 30/48430/6
Mikhail Pershin [Sat, 3 Sep 2022 07:31:38 +0000 (10:31 +0300)]
LU-15646 llog: correct llog FID and path output

- fix wrong LLOG_ID-to-FID conversion to output llog FID by
  introducing PLOGID macro to expand llog ID for DFID format
- stop printing lgl_ogen along with llog FID as it is always zero
  since 2.3.51 and is not used anymore
- output correct path for update llog in llog_reader
- always print header info in llog_reader if available
- print llog flags in header info

Fixes: 5a8e47d0a1a7 ("LU-9153 llog: update llog print format to use FIDs")
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I7ba49e8101a67d2d80c204a5fc629bfd0bce89ad
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48430
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months agoLU-15738 test: check lfsck status before starting 18/48018/3
Hongchao Zhang [Fri, 22 Jul 2022 15:02:24 +0000 (23:02 +0800)]
LU-15738 test: check lfsck status before starting

If the LFSCK has been started before calling "lfsck_start"
to start it, the test shouldn't fail for starting LFSCK.

Test-Parameters: trivial testlist=sanity-lfsck
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: I266d9e2b9c5f37eb9e08b489fab428268b90d895
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48018
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months agoLU-15472 ldlm: optimize flock reprocess 57/46257/7
Andriy Skulysh [Fri, 5 Nov 2021 10:55:08 +0000 (12:55 +0200)]
LU-15472 ldlm: optimize flock reprocess

Resource reprocess on flock unlock can be done once
after all pending unlock requests.
This reduces spinlock contention.

Change-Id: I2809070f27fe3af7e1fc34e2b4b22603931f3dff
HPE-bug-id: LUS-10471, LUS-10909
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-by: Vitaly Fertman <c17818@cray.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/46257
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Reviewed-by: Alexander <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months agoLU-10391 lnet: use %pISc for formatting IP addresses 85/48685/2
Mr NeilBrown [Wed, 28 Sep 2022 04:41:47 +0000 (14:41 +1000)]
LU-10391 lnet: use %pISc for formatting IP addresses

The Linux kernel's printf functionality understands %pIS to mean that
the address in a 'struct sockaddr' should be formatted, either as
IPv4 or IPv6.  For IPv6, the verbose format showing all 16 bytes
whether zero or not is used.

To get the more familiar "compressed" format where strings of :0000:
are replaced with ::, we need to add the 'c' flag.  This is ignored
for IPv4.

When requesting the port as well ("%pISp"), the 'c' and 'p' can appear
in either order.

So this patch changes all %pIS to %pISc as we always want the
compressed format.
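
The difference between the verbose and compressed forms can be seen in
userspace with Python's ipaddress module. This is only an analogy for
the output style; it does not exercise the kernel's %pIS formatter, and
format_addr is a hypothetical helper.

```python
import ipaddress


def format_addr(text: str, compressed: bool = True) -> str:
    """Format an IPv4 or IPv6 address in compressed or verbose form."""
    addr = ipaddress.ip_address(text)
    return addr.compressed if compressed else addr.exploded


print(format_addr("fe80:0:0:0:0:0:0:1", compressed=False))  # verbose form
print(format_addr("fe80:0:0:0:0:0:0:1"))                    # compressed form
print(format_addr("192.168.0.1"))                           # IPv4 is unchanged
```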

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ida17f5008e06a00c5460cf7161ed07de8fa7a65d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48685
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months agoLU-16138 kernel: preserve RHEL8.x server kABI for block integrity 08/48608/2
Jian Yu [Tue, 20 Sep 2022 18:19:12 +0000 (11:19 -0700)]
LU-16138 kernel: preserve RHEL8.x server kABI for block integrity

Currently there are two kernel patches supporting SCSI T10-PI feature
left in the RHEL8.x series:

- block-integrity-allow-optional-integrity-functions-rhel8.patch
- block-pass-bio-into-integrity_processing_fn-rhel8.patch

The changes in the patches modified "struct bio_integrity_payload"
and "struct blk_integrity_iter", which caused kABI breakage.

This patch fixes the patches to preserve kABI by using
RH-supplied compatibility macros.

Test-Parameters: trivial fstype=ldiskfs clientdistro=el8.5 serverdistro=el8.5
Test-Parameters: trivial fstype=ldiskfs clientdistro=el8.6 serverdistro=el8.6

Change-Id: If547e1cd4ae4ff1affd315bbfefaeeff4f1dea81
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48608
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months agoLU-9680 obdclass: user netlink to collect devices information 18/31618/80
James Simmons [Sat, 17 Sep 2022 20:19:48 +0000 (16:19 -0400)]
LU-9680 obdclass: user netlink to collect devices information

Our utilities can report to users a device list with various bits
of data using the debugfs file 'devices'. By default this debugfs
file is available only to root, which prevents regular users
from collecting information. Enable non-root users to collect
the same information for lctl dl using netlink. The advantage of
using netlink is that it also removes the 8K ioctl limit. Add the
ability to present this data in YAML format as well.

Change-Id: I5e6378765bd2f4c415cf29b2bc54adf0e54f308b
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/31618
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months agoLU-16166 ptlrpc: lower the message level in no resend case 85/48585/2
Yang Sheng [Mon, 19 Sep 2022 05:46:27 +0000 (13:46 +0800)]
LU-16166 ptlrpc: lower the message level in no resend case

Don't report the wrong generation as an error message in
the rq_no_resend case.

Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: I534cadc916fcd1eb6840439b6507e646d0e5d974
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48585
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months agoLU-15943 tests: Modify timing of sanity-lnet 210 and 211 80/48580/4
Chris Horn [Wed, 14 Sep 2022 00:47:58 +0000 (19:47 -0500)]
LU-15943 tests: Modify timing of sanity-lnet 210 and 211

The portions of test_210 and test_211 that test the
max_recovery_ping_interval parameter are a little racy because the
window where we can get an accurate ping count is small. This is due
to the tests only being able to sleep for whole seconds vs the more
fine-grained time keeping done in the kernel.

Increase the max interval from 2 to 4 and adjust the expected
ping counts accordingly.

Test-Parameters: trivial
Test-Parameters: testlist=sanity-lnet env=ONLY=210,ONLY_REPEAT=100
Test-Parameters: testlist=sanity-lnet env=ONLY=211,ONLY_REPEAT=100
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Idf8b2ff0d5745bdf4484e75f452bc4f06fbcf1a4
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48580
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months agoLU-16161 kernel: kernel update RHEL8.6 [4.18.0-372.26.1.el8_6] 64/48564/2
Jian Yu [Thu, 15 Sep 2022 18:43:02 +0000 (11:43 -0700)]
LU-16161 kernel: kernel update RHEL8.6 [4.18.0-372.26.1.el8_6]

Update RHEL8.6 kernel to 4.18.0-372.26.1.el8_6.

Test-Parameters: trivial fstype=ldiskfs \
clientdistro=el8.6 serverdistro=el8.6 testlist=sanity

Test-Parameters: trivial fstype=zfs \
clientdistro=el8.6 serverdistro=el8.6 testlist=sanity

Change-Id: I45bf6dbff5061407e1109732b6d466d0f7a8376c
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48564
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months agoLU-16144 nrs: implement force mode for nrs_tbf_req_get() 94/48494/5
Etienne AUJAMES [Fri, 9 Sep 2022 06:52:02 +0000 (08:52 +0200)]
LU-16144 nrs: implement force mode for nrs_tbf_req_get()

ptlrpc_service_purge_all() calls ptlrpc_server_request_get() with
"force=true" to purge all active requests before stopping an NRS
policy (when unregistering a service).

"force" mode should always return a request if a pending request is
present in the NRS policy.

nrs_tbf_req_get() does not implement such a mode and can return a
NULL pointer.
This can cause a crash when umounting a target if a TBF rule rate
threshold is reached:

BUG: unable to handle kernel NULL pointer dereference at
0000000000000114
IP: [<ffffffffc0d9e965>] ptlrpc_nrs_req_stop_nolock+0x5/0x150
.....
? ptlrpc_server_finish_active_request+0x2b/0x140 [ptlrpc]
ptlrpc_service_purge_all+0x137/0x920 [ptlrpc]
ptlrpc_unregister_service+0xe7/0x6f0 [ptlrpc]
ost_cleanup+0x52/0x1b0 [ost]
class_free_dev+0x21d/0x720 [obdclass]
class_export_put+0x1f0/0x2c0 [obdclass]
class_unlink_export+0x135/0x170 [obdclass]
class_decref+0x80/0x160 [obdclass]
class_detach+0x1b3/0x2e0 [obdclass]
class_process_config+0x1a38/0x2830 [obdclass]
? complete+0x4a/0x60
? list_del+0xd/0x30
? wait_for_completion+0x4e/0x140
class_manual_cleanup+0x1e0/0x710 [obdclass]
server_stop_servers+0xd5/0x160 [obdclass]
server_put_super+0x12d/0xd00 [obdclass]
generic_shutdown_super+0x6d/0x100
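
The failure mode can be pictured with a small user-space sketch. It is
not the Lustre implementation: toy_req, toy_bucket and toy_req_get()
are hypothetical names, and the token accounting is a simplified
stand-in for a TBF rule. It only shows a rate-limited getter that may
return NULL while requests are pending, unless force is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request and token-bucket queue; not Lustre structures. */
struct toy_req {
	int id;
	struct toy_req *next;
};

struct toy_bucket {
	struct toy_req *head;	/* pending requests, FIFO order */
	unsigned int tokens;	/* requests still allowed in this interval */
};

/*
 * Return the next pending request.
 *
 * In normal mode the rate limit is honoured, so NULL may be returned even
 * though requests are pending. In force mode the limit is ignored, so NULL
 * is returned only when the queue is empty.
 */
struct toy_req *toy_req_get(struct toy_bucket *b, bool force)
{
	struct toy_req *req;

	assert(b != NULL);

	req = b->head;
	if (req == NULL)
		return NULL;		/* nothing pending */

	if (!force) {
		if (b->tokens == 0)
			return NULL;	/* rate threshold reached */
		b->tokens--;
	}

	b->head = req->next;
	req->next = NULL;
	return req;
}
```

A purge loop that calls the getter with force set can then treat NULL
as "queue empty" without dereferencing a missing request.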

Signed-off-by: Etienne AUJAMES <etienne.aujames@cea.fr>
Change-Id: Ic4443700725d9308764fbf21cb7de6fa4ab41134
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48494
Reviewed-by: Nikitas Angelinas <nikitas.angelinas@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months agoLU-16072 utils: snapshot support to foreign host 26/48226/8
Akash B [Tue, 24 May 2022 05:49:41 +0000 (01:49 -0400)]
LU-16072 utils: snapshot support to foreign host

Currently the <foreign> host field in /etc/ldev.conf is ignored.
As a result, <lctl snapshot_*> commands do not work when the <local>
host is not accessible or when any of the targets have failed over
to the <foreign> host. This patch addresses those cases so that
<lctl snapshot_{create, destroy, mount, umount, list, modify}>
commands work when the targets are present on the <foreign> host.

HPE-bug-id: LUS-10648
Test-Parameters: fstype=zfs testlist=sanity-lsnapshot
Signed-off-by: Akash B <akash-b@hpe.com>
Change-Id: I706c5e43755386eab4facd42ff7a127aa5c9254c
Reviewed-on: https://es-gerrit.dev.cray.com/160702
Tested-by: Alexander Lezhoev <alexander.lezhoev@hpe.com>
Tested-by: Siddarth Raj <siddarth.raj@hpe.com>
Reviewed-by: Dipak Ghosh <dipak.ghosh@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48226
Reviewed-by: Alexander <alexander.boyko@hpe.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months agoLU-16059 build: Installation of dkms server builds 83/48083/7
Shaun Tancheff [Wed, 24 Aug 2022 14:22:58 +0000 (21:22 +0700)]
LU-16059 build: Installation of dkms server builds

The linux-zfs-dkms package is passing the wrong paths
for zfs [and spl] causing the dkms build to fail.

ZFS_VERSION is not parsed correctly from 'dkms status'.

The splver and zfsver check can match against the wrong
package(s).

lustre-zfs-dkms provides: kmod-lustre-osd-zfs, and
                          lustre-osd-zfs-mount
lustre-ldiskfs-dkms provides: kmod-lustre-osd-ldiskfs and
                              lustre-osd-ldiskfs-mount

In the case of multiple zfs versions installed, build lustre
osd against the highest version number.
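
Picking the highest version needs a numeric comparison per component,
since a plain string comparison would rank "2.1.5" above "2.1.11".
The sketch below is only an illustration in C, assuming purely numeric
dotted versions; toy_version_cmp() and toy_highest_version() are
hypothetical names and are not part of the packaging scripts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Compare two dotted numeric version strings, e.g. "2.1.5" and "2.1.11".
 * Returns <0, 0 or >0 like strcmp(). Missing components count as 0.
 */
int toy_version_cmp(const char *a, const char *b)
{
	assert(a != NULL && b != NULL);

	while (*a != '\0' || *b != '\0') {
		char *end_a, *end_b;
		unsigned long na = strtoul(a, &end_a, 10);
		unsigned long nb = strtoul(b, &end_b, 10);

		if (na != nb)
			return na < nb ? -1 : 1;
		/* stop if neither string advanced (non-numeric data) */
		if (end_a == a && end_b == b)
			break;
		a = (*end_a == '.') ? end_a + 1 : end_a;
		b = (*end_b == '.') ? end_b + 1 : end_b;
	}
	return 0;
}

/* Return the entry with the highest version, or NULL if the list is empty. */
const char *toy_highest_version(const char *const *versions, size_t n)
{
	const char *best = NULL;
	size_t i;

	for (i = 0; i < n; i++)
		if (best == NULL || toy_version_cmp(versions[i], best) > 0)
			best = versions[i];
	return best;
}
```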

HPE-bug-id: LUS-11113
Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ic154ca045427bf26cb7e6a44b8c467675e987aad
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48083
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months agoLU-16125 tests: make sanity-sec more robust with SSK 86/48386/4
Sebastien Buisson [Tue, 30 Aug 2022 09:22:34 +0000 (11:22 +0200)]
LU-16125 tests: make sanity-sec more robust with SSK

Encryption related tests in sanity-sec carry out unmount and mount of
clients in order to exercise code with and without the encryption key.
In case SSK is in use, we need to make sure flavors are properly
applied before carrying on.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I92e85dc6dcef43f70a7fe05db94cd18fe66a3a24
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48386
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months agoLU-15777 hsm: set changelog error for restore layout swap failure 21/47121/14
Nikitas Angelinas [Wed, 11 May 2022 22:54:08 +0000 (15:54 -0700)]
LU-15777 hsm: set changelog error for restore layout swap failure

Set the error code in the changelog record generated, if the layout swap
fails at the end of an HSM restore operation. Also, handle error code
overflow inside hsm_set_cl_error(), so that callers don't need to do
this themselves.
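
Handling the overflow in one place can be pictured as clamping a value
into a fixed-width bit field. The sketch below is illustrative only:
the field position, width and sentinel are assumptions, and
toy_set_error() is a hypothetical stand-in, not hsm_set_cl_error().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: a 7-bit error field stored in bits 3..9. */
#define TOY_ERR_SHIFT		3
#define TOY_ERR_BITS		7
#define TOY_ERR_MASK		((1u << TOY_ERR_BITS) - 1)	/* 0x7f */
#define TOY_ERR_OVERFLOW	TOY_ERR_MASK	/* sentinel: value did not fit */
#define TOY_ERR_MAX		(TOY_ERR_MASK - 1)	/* largest real value */

/*
 * Store an error code in the flags word. Values that do not fit are
 * replaced by the overflow sentinel, so callers need no range check.
 */
void toy_set_error(uint32_t *flags, unsigned int error)
{
	assert(flags != NULL);

	if (error > TOY_ERR_MAX)
		error = TOY_ERR_OVERFLOW;

	*flags &= ~(TOY_ERR_MASK << TOY_ERR_SHIFT);
	*flags |= (uint32_t)error << TOY_ERR_SHIFT;
}

/* Read the error field back out of the flags word. */
unsigned int toy_get_error(uint32_t flags)
{
	return (flags >> TOY_ERR_SHIFT) & TOY_ERR_MASK;
}
```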

Suggested-by: Olaf Weber <olaf.weber@hpe.com>
Suggested-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Signed-off-by: Nikitas Angelinas <nikitas.angelinas@hpe.com>
Change-Id: I4ed2ebffa3bc1c6a0f87ea9f13734e344f77006f
HPE-bug-id: LUS-10863
Test-Parameters: testlist=sanity-hsm,sanity-pcc
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/47121
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Etienne AUJAMES <eaujames@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months agoLU-15626 tests: Fix "error" reported by shellcheck for functions.sh 34/46834/2
Arshad Hussain [Wed, 16 Mar 2022 08:04:10 +0000 (13:34 +0530)]
LU-15626 tests: Fix "error" reported by shellcheck for functions.sh

This patch fixes "error" issues reported by shellcheck
for functions.sh. This patch also converts spaces to tabs.

Test-Parameters: trivial
Test-Parameters: testlist=sanity,sanityn
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Iec24ca81b16994c3bfbdc38d8106576a315e0bbd
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/46834
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months agoLU-15619 osc: Remove oap_magic 13/46713/5
Patrick Farrell [Wed, 2 Mar 2022 00:14:03 +0000 (19:14 -0500)]
LU-15619 osc: Remove oap_magic

oap_magic exists only to debug init and allocation
failures, but is allocated for every page of memory, which
wastes a lot of memory for something we don't need
dedicated debug for.

Remove it.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I360e09676f7ba8c3e5296bdf75a6e7f75e91eadb
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/46713
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months agoLU-14108 mount: prevent if --network and discovery 32/46632/6
Cyril Bordage [Fri, 7 Jan 2022 10:08:21 +0000 (11:08 +0100)]
LU-14108 mount: prevent if --network and discovery

The --network= option to mkfs.lustre allows restricting a target
(OST/MDT) to a given LNet network. This makes it register to the MGS
with the specified network only. However, dynamic discovery is unaware
of this restriction and this can create problems.
We prevent mounting with mkfs "network" option if discovery is enabled
by returning an EINVAL error.
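
The rule amounts to a simple guard at mount time. The sketch below
shows its shape only; struct toy_mount_opts and toy_check_mount() are
hypothetical names and do not reflect how the mount code stores the
option or queries the discovery setting.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mount parameters; the field name is illustrative only. */
struct toy_mount_opts {
	const char *network;	/* value of the "network" option, or NULL */
};

/*
 * Refuse the mount when a network restriction is combined with dynamic
 * discovery, since discovery is unaware of the restriction.
 */
int toy_check_mount(const struct toy_mount_opts *opts, bool discovery_enabled)
{
	assert(opts != NULL);

	if (opts->network != NULL && discovery_enabled)
		return -EINVAL;
	return 0;
}
```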

Test-Parameters: trivial
Signed-off-by: Cyril Bordage <cbordage@whamcloud.com>
Change-Id: I4b6da7804162192054d7b29a28fbe4cb015e6570
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/46632
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months agoLU-10973 lnet: Various test cleanups 15/45915/5
Amir Shehata [Wed, 22 Dec 2021 05:42:32 +0000 (21:42 -0800)]
LU-10973 lnet: Various test cleanups

Cleaning up some of the LUTF test failures

Test-Parameters: @lnet
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I529d3f171357255d04991293a5df4c7b41622d07
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/45915
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: jsimmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months agoLU-10973 lnet: LUTF UDSP test suite and routing test suite 77/39777/45
Serguei Smirnov [Mon, 31 Aug 2020 22:35:52 +0000 (18:35 -0400)]
LU-10973 lnet: LUTF UDSP test suite and routing test suite

Added the UDSP suite and routing suite to the LUTF test cases.

Updated some of the infrastructure scripts with methods needed
for the new test cases.

Test-Parameters: @lnet
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: Ibd74cea48982ccafc3b1d5034a409fd2df9e7b1c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/39777
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: jsimmons <jsimmons@infradead.org>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months agoLU-10973 lnet: LUTF Multi-Rail test suite 58/39458/54
Amir Shehata [Mon, 20 Jul 2020 21:04:32 +0000 (14:04 -0700)]
LU-10973 lnet: LUTF Multi-Rail test suite

Added a test suite which covers various Multi-Rail functionality.

Test-Parameters: @lnet
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I0480e59ebd97c943669194acbb1c80222e202a6e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/39458
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: jsimmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months agoLU-10973 lnet: LUTF dynamic discovery test suite 95/39195/58
Amir Shehata [Sat, 27 Jun 2020 04:15:04 +0000 (21:15 -0700)]
LU-10973 lnet: LUTF dynamic discovery test suite

Add the dynamic discovery test suite to the LUTF test cases.

Updated some of the infrastructure scripts with methods needed
for the DD test cases

Test-Parameters: @lnet
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I0cfef4ae6f88b4deca12f1a3d5ef3291137a6c04
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/39195
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: jsimmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months agoLU-10360 tests: test dynamic NIDs feature 11/39911/42
Amir Shehata [Tue, 15 Sep 2020 01:18:47 +0000 (18:18 -0700)]
LU-10360 tests: test dynamic NIDs feature

Add five LUTF test cases to test the following:
1. Enabling/Disabling dynamic_nids module parameter.
2. Allow clients to continue using servers which have changed
   their IP address during a boot cycle.
3. Verify feature is disabled if dynamic_nids module parameter
   is not set.
4. Verify feature is disabled if the dynamic_nids module parameter
   is asymmetrically set.

Test-Parameters: @lnet
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I481c2ae938d07398f6b40af2a1a1db039168ede7
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/39911
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: jsimmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months agoLU-10973 lnet: LUTF DLC test suite and sample test suite 08/40108/40
Serguei Smirnov [Wed, 30 Sep 2020 22:52:39 +0000 (18:52 -0400)]
LU-10973 lnet: LUTF DLC test suite and sample test suite

Add the DLC test suite and sample test suite to LUTF test cases.

Test-Parameters: @lnet
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ic7579023cfaf796fd40d6e12434137fb3ec5b0e4
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/40108
Reviewed-by: jsimmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months agoLU-10391 lnet: only use PUBLIC IP6 addresses for connections 71/48571/3
Mr NeilBrown [Fri, 16 Sep 2022 00:49:51 +0000 (10:49 +1000)]
LU-10391 lnet: only use PUBLIC IP6 addresses for connections

IPv6 can have temporary addresses.  These can be used for short-lived
outgoing connections to increase privacy.  They are not suitable for
long-term connections.

So request that only PUBLIC IPv6 addresses are used when making a
connection.

Test-Parameters: trivial testlist=sanity-lnet
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I1414d9ea11cd5873438a4c088884cefd7d933c8c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48571
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: jsimmons <jsimmons@infradead.org>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
18 months agoLU-10391 socklnd: support IPv6 in ksocknal_ip2index() 70/48570/2
Mr NeilBrown [Thu, 15 Sep 2022 05:09:59 +0000 (15:09 +1000)]
LU-10391 socklnd: support IPv6 in ksocknal_ip2index()

ksocknal_ip2index() can now find the interface index for an IPv6
address.

Test-Parameters: trivial testlist=sanity-lnet
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Idd6bee5c9db417b05f8208ab5ab309f4c8404d54
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48570
Reviewed-by: jsimmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months agoLU-10391 lnet: add iface index to struct lnet_inetdev 69/48569/2
Mr NeilBrown [Thu, 15 Sep 2022 01:47:55 +0000 (11:47 +1000)]
LU-10391 lnet: add iface index to struct lnet_inetdev

When getting the list of interfaces, get the index as well, as this
can be useful and avoids searching the list of interfaces again to
find it.

Test-Parameters: trivial testlist=sanity-lnet
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I9b3b2516fd4ec1b83e2ec31e1318326ed22cb31b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48569
Reviewed-by: jsimmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months agoLU-12511 utils: make kfilnd support a soft requirement 18/48518/5
James Simmons [Sat, 17 Sep 2022 15:45:12 +0000 (11:45 -0400)]
LU-12511 utils: make kfilnd support a soft requirement

The new kfilnd driver doesn't exist upstream and looks like it
will be missing upstream for some time. Make building the code
for this new LND optional, which is needed for the native Linux
Lustre client.

Test-Parameters: trivial
Change-Id: Ib17f78b12ffed95e4198d4524f5ca44aab01c010
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48518
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months agoLU-10391 lnet: track pinginfo size in bytes, not nis. 27/44627/17
Mr NeilBrown [Sun, 14 Aug 2022 21:37:23 +0000 (17:37 -0400)]
LU-10391 lnet: track pinginfo size in bytes, not nis.

When we extend the pinginfo to be able to store large-address nids,
there could be nids of different sizes in it.  So using the number of
nis to track the size won't work.  So change to using the number of
bytes.  i.e.  the total size of the 'struct lnet_ping_info'.

This affects pb_nnis in the ping_buffer, and the global
ln_push_target_nnis.

LNET_PING_INFO_SIZE is removed as size won't depend on number of nids
any more.

When determining the number of bytes expected in a received ping_info,
use a new macro lnet_ping_info_size() which can extract information
as required from the ping_info.

Note that lnet_ping_target_create() now initializes pi_nis to 0.
Setting the initial size doesn't seem to be useful.
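
Why an entry count stops being enough can be shown with a small
sketch. The sizes and names below are assumptions made for
illustration; they are not the real struct lnet_ping_info layout or
the lnet_ping_info_size() macro.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes, chosen only to show the effect. */
#define TOY_HDR_SIZE	16	/* fixed header */
#define TOY_SMALL_NID	16	/* entry holding a small address */
#define TOY_LARGE_NID	32	/* entry holding a large address */

/* Count-based size: valid only while every entry has the same size. */
size_t toy_size_from_count(size_t nnis)
{
	return TOY_HDR_SIZE + nnis * TOY_SMALL_NID;
}

/* Byte-based size: sum the actual size of each entry. */
size_t toy_size_in_bytes(const size_t *entry_sizes, size_t nnis)
{
	size_t total = TOY_HDR_SIZE;
	size_t i;

	assert(entry_sizes != NULL || nnis == 0);

	for (i = 0; i < nnis; i++)
		total += entry_sizes[i];
	return total;
}
```

With two small entries and one large entry, the count-based formula
under-reports the buffer size, which is why the byte total is tracked
instead.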

Test-Parameters: trivial testlist=sanity-lnet
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I7727b784ed9a7510959d5ec41f8df3851adb78ed
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/44627
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: jsimmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-16135 lod: prohibit DoM pattern in plain layout 33/48433/3
Mikhail Pershin [Mon, 5 Sep 2022 07:41:37 +0000 (10:41 +0300)]
LU-16135 lod: prohibit DoM pattern in plain layout

A DoM pattern can be set as the default directory plain layout by
an older LFS version. It misses the DoM component sanity checks if
a plain layout is used. Such a layout is not allowed and causes
a later crash when a file is created under that directory.

LFS can prevent this, but not in all Lustre versions,
so LOD should do the check as well.

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ic58fdda2ab3e63083128cb6cf949fcb43ccd2c02
Reviewed-on: https://review.whamcloud.com/48433
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-16160 osc: take ldlm lock when queue sync pages 57/48557/2
Bobi Jam [Thu, 15 Sep 2022 06:46:34 +0000 (14:46 +0800)]
LU-16160 osc: take ldlm lock when queue sync pages

osc_queue_sync_pages() adds an osc_extent to the osc_object's IO
extent list without taking ldlm locks, and then it calls
osc_io_unplug_async() to queue the IO work for the client.

This patch makes sync page queuing take the ldlm lock in the
osc_extent.

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Idefa2981e62a2a6e10d8b8a7692c0337b61b9052
Reviewed-on: https://review.whamcloud.com/48557
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-16123 checkpatch: Suppress false warning 75/48375/2
Arshad Hussain [Mon, 29 Aug 2022 10:51:45 +0000 (16:21 +0530)]
LU-16123 checkpatch: Suppress false warning

checkpatch throws a warning if it finds an "UPPERCASE"
identifier on the left-hand side of a comparison. According to
the script/code it is to avoid cases like "foo + BAR < baz".

Example warning:
(style)  Comparisons should place the constant on \
the right side of the test

However, in our case the following code triggers the warning
as a false positive:

#if LUSTRE_VERSION_CODE < OBD_OCD_VERSION(3, 0, 53, 0)
...
#endif

This patch suppresses the warning for the above code only.
It is not a generic suppressor for upper-case names on the
left-hand side, which can indicate a genuine error. It only
handles the case where the left side is the
LUSTRE_VERSION_CODE macro.

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ic8d8fccae035ba6e2ea28099bea6f163ceb0da0a
Reviewed-on: https://review.whamcloud.com/48375
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-16154 obdclass: free inst_name correctly 42/48542/4
Emoly Liu [Thu, 15 Sep 2022 01:42:47 +0000 (09:42 +0800)]
LU-16154 obdclass: free inst_name correctly

In function class_config_llog_handler(), inst_name should be freed
correctly before break.
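
The general pattern is sketched below in user-space C. It is not the
class_config_llog_handler() code: toy_handle_record(), the record
types and the allocation counter are hypothetical, and the empty-name
check merely stands in for whatever error triggers the early break.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Counts outstanding allocations so a leak is observable in the example. */
int toy_live_allocs;

char *toy_make_name(const char *base, int inst)
{
	size_t len = strlen(base) + 16;	/* room for "-<int>" and NUL */
	char *name = malloc(len);

	if (name != NULL) {
		snprintf(name, len, "%s-%d", base, inst);
		toy_live_allocs++;
	}
	return name;
}

void toy_free_name(char *name)
{
	if (name != NULL) {
		free(name);
		toy_live_allocs--;
	}
}

/*
 * Hypothetical record handler: every path that leaves the switch,
 * including the early break on error, releases inst_name first.
 */
int toy_handle_record(int type, const char *base)
{
	char *inst_name = NULL;
	int rc = 0;

	assert(base != NULL);

	switch (type) {
	case 1:
		inst_name = toy_make_name(base, 0);
		if (inst_name == NULL) {
			rc = -1;
			break;
		}
		if (base[0] == '\0') {
			/* error path: free before leaving the switch */
			toy_free_name(inst_name);
			inst_name = NULL;
			rc = -2;
			break;
		}
		/* ... inst_name would be used here ... */
		toy_free_name(inst_name);
		inst_name = NULL;
		break;
	default:
		break;
	}
	return rc;
}
```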

Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Change-Id: I6adc0ed62c3c637237834b799f25666d0e7e1ecb
Reviewed-on: https://review.whamcloud.com/48542
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-16153 tests: add version check to conf-sanity.sh test_133 41/48541/2
Emoly Liu [Wed, 14 Sep 2022 02:38:03 +0000 (10:38 +0800)]
LU-16153 tests: add version check to conf-sanity.sh test_133

conf-sanity.sh test_133 from the patch at
https://review.whamcloud.com/38136 has been landed since 2.15.51.
To avoid any interop failure, a version check is added there.
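
A version gate of this kind compares packed version numbers. The test
itself is a shell script; the C sketch below only illustrates the
comparison, and toy_version_code() and toy_should_run() are
hypothetical helpers that assume one byte per version component.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Pack major.minor.patch.fix into one comparable 32-bit value. */
uint32_t toy_version_code(unsigned int major, unsigned int minor,
			  unsigned int patch, unsigned int fix)
{
	assert(major < 256 && minor < 256 && patch < 256 && fix < 256);

	return ((uint32_t)major << 24) | ((uint32_t)minor << 16) |
	       ((uint32_t)patch << 8) | (uint32_t)fix;
}

/* Run the test only against a peer at 2.15.51 or newer. */
bool toy_should_run(uint32_t peer_version)
{
	return peer_version >= toy_version_code(2, 15, 51, 0);
}
```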

Test-Parameters: trivial
Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Change-Id: Ic5c142faa6f61fe83ce86e67a7cee8d8b183cdaf
Reviewed-on: https://review.whamcloud.com/48541
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-14992 tests: sanity/replay-vbr mkdir on MDT0 02/44902/9
James Nunez [Mon, 13 Sep 2021 16:35:30 +0000 (10:35 -0600)]
LU-14992 tests: sanity/replay-vbr mkdir on MDT0

Replace mkdir with mkdir_on_mdt0() for sanity test 133a
and replay-vbr test 7a.  These tests expect the newly
created directory to be on MDT0.

Test-Parameters: trivial mdscount=2 mdtcount=4 testlist=sanity
Test-Parameters: env=SLOW=yes mdscount=2 mdtcount=4 testlist=replay-vbr
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: Icea2923a8d8d3a3aa0ddf0401f0a025480b2f6f0
Reviewed-on: https://review.whamcloud.com/44902
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Kevin Zhao <kevin.zhao@linaro.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-16057 obdclass: set OBD_MD_FLGROUP for ladvise RPC 80/48080/2
Li Dongyang [Fri, 29 Jul 2022 06:35:41 +0000 (16:35 +1000)]
LU-16057 obdclass: set OBD_MD_FLGROUP for ladvise RPC

The ladvise RPC doesn't have OBD_MD_FLGROUP set. When the RPC
reaches the server, tgt_validate_obdo() will corrupt the FID
if its seq is in the FID_SEQ_NORMAL range.

Do not mess with seq in obdo_to_ioobj() and tgt_validate_obdo();
since 2.0, all RPCs should have OBD_MD_FLGROUP set.

Add OBD_MD_FLGROUP to the ladvise RPC to fix new clients talking
to old servers.

Change-Id: I373b7f32458b18e29d9bb716a912fe4a54eccac5
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/48080
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-15986 ptlrpc: protect rq_repmsg in ptlrpc_req_drop_rs() 39/47839/14
Lei Feng [Thu, 30 Jun 2022 02:46:31 +0000 (10:46 +0800)]
LU-15986 ptlrpc: protect rq_repmsg in ptlrpc_req_drop_rs()

There is a race condition on the server side: one thread has sent
the reply message and is deleting it, while another thread is
searching for an existing request and prints some debug information
in _debug_req() if there is a duplicated request. They both operate on
req->rq_repmsg, but it is not protected in ptlrpc_req_drop_rs().
So protect it with req->rq_early_free_lock.

Signed-off-by: Lei Feng <flei@whamcloud.com>
Change-Id: Ied55427ee15c3ef84bdd2d579844eba398dbf010
Reviewed-on: https://review.whamcloud.com/47839
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoNew tag 2.15.52 2.15.52 v2_15_52
Oleg Drokin [Sat, 17 Sep 2022 06:27:08 +0000 (02:27 -0400)]
New tag 2.15.52

Change-Id: I7425fd5ea8f382a10ea2574933257fcd41407fa2
Signed-off-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-16145 lnet: Honor peer timeout of zero 89/48489/4
Chris Horn [Fri, 2 Sep 2022 16:47:02 +0000 (11:47 -0500)]
LU-16145 lnet: Honor peer timeout of zero

Zero is a valid value for the peer_timeout parameter (it is supposed
to disable the LNet Peer Health feature used on routers), but DLC
treats zero as uninitialized and assigns the default peer timeout
instead.
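
The underlying problem is using zero both as a real value and as
"not set". The sketch below shows the distinction with hypothetical
names and an assumed default; it does not reproduce the DLC code or
the actual default peer timeout.

```c
#include <assert.h>

#define TOY_TIMEOUT_UNSET	(-1)	/* hypothetical "not configured" marker */
#define TOY_TIMEOUT_DEFAULT	180	/* hypothetical default, in seconds */

/* Buggy variant: cannot tell "0 = disabled" apart from "not configured". */
int toy_effective_timeout_buggy(int configured)
{
	return configured == 0 ? TOY_TIMEOUT_DEFAULT : configured;
}

/* Fixed variant: only the explicit marker selects the default. */
int toy_effective_timeout(int configured)
{
	assert(configured >= TOY_TIMEOUT_UNSET);

	return configured == TOY_TIMEOUT_UNSET ? TOY_TIMEOUT_DEFAULT
					       : configured;
}
```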

Test-Parameters: trivial testlist=sanity-lnet
HPE-bug-id: LUS-11233
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I66f45ddf282757f46c0169ae0e725e56234d3d89
Reviewed-on: https://review.whamcloud.com/48489
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-16131 build: Do not depend on libmount during --enable-dist 07/48407/2
Shaun Tancheff [Thu, 1 Sep 2022 14:46:16 +0000 (21:46 +0700)]
LU-16131 build: Do not depend on libmount during --enable-dist

Defer the libmount requirement when using --enable-dist to
generate the lustre-src.rpm.

This allows mock and/or yum build-deps to resolve
dependencies and pick up the libmount requirement without changing
the existing minimal build.

Test-Parameters: trivial
HPE-bug-id: LUS-11091
Fixes: f21b944127 ("LU-15940 build: add a required dependency for libmount")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I20a7a097f9b651b6ea5519f79efda6c96b6f2199
Reviewed-on: https://review.whamcloud.com/48407
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-16089 kernel: kernel update RHEL 7.9 [3.10.0-1160.76.1.el7] 02/48202/3
Jian Yu [Fri, 12 Aug 2022 01:29:05 +0000 (18:29 -0700)]
LU-16089 kernel: kernel update RHEL 7.9 [3.10.0-1160.76.1.el7]

Update RHEL 7.9 kernel to 3.10.0-1160.76.1.el7.

Test-Parameters: trivial clientdistro=el7.9 serverdistro=el7.9

Change-Id: I97d087a5d5bb27996a5c0caf382c011928c651b4
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/48202
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-16029 utils: add options to lr_reader to parse raw files 88/47988/8
Etienne AUJAMES [Tue, 19 Jul 2022 20:21:52 +0000 (22:21 +0200)]
LU-16029 utils: add options to lr_reader to parse raw files

Add the following usages to lr_reader for post-mortem debugging:

debugfs -c -R "dump reply_data /tmp/reply_data" /dev/mapper/mds1
debugfs -c -R "dump last_rcvd /tmp/last_rcvd" /dev/mapper/mds1

lr_reader -cr -C /tmp/last_rcvd -R /tmp/reply_data
....

This patch also attempts to refactor the lr_reader code.

It allows longer device names to be used (by removing the
128-byte buffer limit on the debugfs command).

Signed-off-by: Etienne AUJAMES <etienne.aujames@cea.fr>
Change-Id: I6a5f945134d4235ac467ba2274eb05f71b468cd8
Reviewed-on: https://review.whamcloud.com/47988
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: DELBARY Gael <gael.delbary@cea.fr>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
19 months agoLU-16106 lnet: allow direct messages regardless of peer NI status 55/48355/5
Serguei Smirnov [Sun, 28 Aug 2022 01:50:16 +0000 (18:50 -0700)]
LU-16106 lnet: allow direct messages regardless of peer NI status

If check_routers_before_use is enabled, the router needs to
be pinged before it is used, which is not possible because
its NIs are assumed to be down at start-up. Don't prevent
discovery of the router in this case.

This change allows non-routed traffic to peer NIs with "down"
status.

Test-Parameters: trivial
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I36fa60e37ef4f47c82c69855c9b0b80bad8a36f4
Reviewed-on: https://review.whamcloud.com/48355
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>