Whamcloud - gitweb
fs/lustre-release.git
2 years ago LU-15097 quota: stop pool_recalc before killing pool 56/45256/8
Sergey Cheremencev [Mon, 20 Sep 2021 12:08:20 +0000 (15:08 +0300)]
LU-15097 quota: stop pool_recalc before killing pool

qmt_start_pool_recalc holds a reference on a pool while
it is running. This thread should be stopped before
putting the last pool reference in qmt_pool_free, to be
sure that the pool can finally be freed. The patch helps to
avoid the following ASSERTION:
qmt_pool_fini()) ASSERTION( list_empty(&qmt->qmt_pool_list) ) failed

HPE-bug-id: LUS-10294
Change-Id: If72042a620d9ded693fcb669bc9148d1f96126a4
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/45256
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15031 quota: reseed glbe in qmt_lvbo_update 32/45032/13
Sergey Cheremencev [Tue, 25 May 2021 22:44:48 +0000 (01:44 +0300)]
LU-15031 quota: reseed glbe in qmt_lvbo_update

Reseed the glbe array in qmt_lvbo_update after changing edquot.
Without the fix, the edquot flag wasn't set in the glbe array.
Later, when edquot was cleared, the need_update (nu) flag wasn't
set in the glbe array to notify OSTs of the new edquot.

The patch also adds test 80 to check that the OST gets the correct
edquot value after failover.

HPE-bug-id: LUS-10029
Change-Id: I5b7e1a553e3351c22649431860d51b5a671c6fd9
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/45032
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15018 o2iblnd: treat cmid->device == NULL as an error 81/44981/4
Serguei Smirnov [Fri, 17 Sep 2021 21:06:26 +0000 (14:06 -0700)]
LU-15018 o2iblnd: treat cmid->device == NULL as an error

Even if rdma_bind_addr is successful, kiblnd_dev_failover should
treat cmid->device == NULL as an error in order to later avoid
calling kiblnd_set_ni_fatal_on with possibly dev->ibd_hdev == NULL.

Test-Parameters: trivial
Fixes: 4668283cd1 ("LU-14806 o2iblnd: clear fatal error on successful failover")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: Iefbe030b25d2dc543461cf98afeacd734fd64cf8
Reviewed-on: https://review.whamcloud.com/44981
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-14707 tests: Bashify scripts for Ubuntu et al. 86/43786/8
Shaun Tancheff [Fri, 10 Dec 2021 17:45:46 +0000 (12:45 -0500)]
LU-14707 tests: Bashify scripts for Ubuntu et al.

Some scripts use bash-isms that are not present in
Bourne (sh) or Ubuntu's default dash shell.

Be explicit and prefer bash.

HPE-bug-id: LUS-8398
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I02f742e9787e1811b422b619e00911ee52673262
Reviewed-on: https://review.whamcloud.com/43786
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-14542 obd: tunable for sanity grant check 28/42128/7
Vladimir Saveliev [Mon, 13 Dec 2021 15:11:58 +0000 (18:11 +0300)]
LU-14542 obd: tunable for sanity grant check

Add control over the sanity grant check via lctl set_param
*.*.grant_check_threshold.  Setting it to 0 turns grant checking
on unconditionally.
By default, as before, the grant check is turned off when the number
of exports is more than 100.

Change-Id: Ib2505da74f6e3d541bce5def3e90597eda232c58
Signed-off-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
HPE-bug-id: LUS-9827
Reviewed-on: https://review.whamcloud.com/42128
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-14195 ldiskfs: update patches for Linux 5.10 13/40913/10
Mr NeilBrown [Tue, 14 Dec 2021 17:43:45 +0000 (12:43 -0500)]
LU-14195 ldiskfs: update patches for Linux 5.10

Mostly simple conflicts due to code movement, however:

ext4-data-in-dirent.patch now needs to patch fs/ext4/fast_commit.c as
well, because ext4_init_new_dir() is used in that file. Since fast
commit can break recovery, we prevent mounting with this option.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I59b10fdb6bb606b193472e3045ab7d9b1d0d36b5
Reviewed-on: https://review.whamcloud.com/40913
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-13799 lov: Cache stripe offset calculation 45/39445/25
Patrick Farrell [Thu, 10 Jun 2021 17:23:19 +0000 (13:23 -0400)]
LU-13799 lov: Cache stripe offset calculation

Calculating the page offset relative to the stripe (etc)
in a file is surprisingly expensive.  Because i/o has
already been split up to stripes by the cl_io code,
calculating the stripe each time is unnecessary.

We cache most of the values requiring calculation.

This improves AIO/DIO page submission significantly,
improving performance by a bit over 10%.

Also remove lpg_generation, which isn't doing anything
useful.  This suggests the possibility of removing
lov_page, but that's for another patch.

This patch reduces i/o time in ms/GiB by:
Write: 17 ms/GiB
Read: 22 ms/GiB

Totals:
Write: 119 ms/GiB
Read: 121 ms/GiB

mpirun -np 1  $IOR -w -r -t 64M -b 64G -o ./iorfile --posix.odirect

With previous patches in series:
write        7531 MiB/s
read         7179 MiB/s

Plus this patch:
write        8637 MiB/s
read         8488 MiB/s
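
The ms/GiB figures above follow from the throughput numbers: at R MiB/s,
one GiB (1024 MiB) takes 1024 * 1000 / R milliseconds. A minimal sketch of
that arithmetic in shell follows; the helper names ms_per_gib and saved_ms
are only for illustration and are not part of the patch:

```sh
# Convert a throughput in MiB/s to the time per GiB in milliseconds.
# ms_per_gib is an illustrative helper, not part of the patch.
ms_per_gib() {
	awk -v rate="$1" 'BEGIN { printf "%.0f", 1024 * 1000 / rate }'
}

# Difference in ms/GiB between two throughputs (before, after).
saved_ms() {
	awk -v before="$1" -v after="$2" \
		'BEGIN { printf "%.0f", 1024000 / before - 1024000 / after }'
}

write_before=$(ms_per_gib 7531)
write_after=$(ms_per_gib 8637)
write_saved=$(saved_ms 7531 8637)
read_before=$(ms_per_gib 7179)
read_after=$(ms_per_gib 8488)
read_saved=$(saved_ms 7179 8488)

echo "write: $write_before -> $write_after ms/GiB (saved $write_saved)"
echo "read:  $read_before -> $read_after ms/GiB (saved $read_saved)"
```

Rounded to whole milliseconds, this reproduces the totals (119 and 121
ms/GiB) and the savings (17 and 22 ms/GiB) quoted in the message.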

Signed-off-by: Patrick Farrell <farr0186@gmail.com>
Change-Id: I89e994592853d0fe93a034bfe8bdfb459bdaf584
Reviewed-on: https://review.whamcloud.com/39445
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-11558 tests: re-enable conf-sanity t32_verify_quota() 23/33423/11
Andreas Dilger [Mon, 22 Oct 2018 23:19:57 +0000 (07:19 +0800)]
LU-11558 tests: re-enable conf-sanity t32_verify_quota()

Since patch https://review.whamcloud.com/28020 "LU-3285 test: add
Data-on-MDT tests and fixes" landed, the call to t32_verify_quota()
from conf-sanity.sh t32_test() has been removed, and (I'd guess)
we are no longer verifying that quota is still working
correctly after an upgrade.

It is unclear why this check was removed. Restore it so that we
resume testing that quota is working after an upgrade from an old
disk format.

Test-Parameters: trivial testlist=conf-sanity
Test-Parameters: fstype=zfs testlist=conf-sanity
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ia6bab4907e7cb4e6e5581a6a072cc51ab53ebbe5
Reviewed-on: https://review.whamcloud.com/33423
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-9272 tests: fix for facet_failover mgs 35/26235/27
Elena Gryaznova [Thu, 11 Oct 2018 14:20:32 +0000 (17:20 +0300)]
LU-9272 tests: fix for facet_failover mgs

Make facet_failover work for the mgs facet and include failover
nodes in the list of nodes on which to load modules.

When mgs/mds are combined, assign the failover host of mds to
the failover host of mgs.

Assign mgsfailover_dev with mds1failover_dev when mgs/mds are combined
while mounting facets, as mds1failover_dev is already defined when
mgs/mds are combined.

Fix start() to export mgs_dev and mgsfailover_dev for
combined_mds_mgs.

Do not wait for recovery to complete on mgs.

Test-Parameters: trivial failover=true osscount=2 mdscount=2 mdtcount=1 austeroptions=-R iscsi=1 env="ONLY=121" testlist=conf-sanity
Test-Parameters: testlist=conf-sanity
Change-Id: Ie698814c530c8deb98aa0010f2a0fa8e261b4b69
HPE-bug-id: MRP-3374, LUS-4858, LUS-2361
Signed-off-by: Elena Gryaznova <c17455@cray.com>
Signed-off-by: Alexander Boyko <c17825@cray.com>
Signed-off-by: Noopur Maheshwari <noopur.maheshwari@seagate.com>
Signed-off-by: Vladimir Saveliev <c17830@cray.com>
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-on: https://review.whamcloud.com/26235
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-14651 libcfs: use namespace CRYPTO_INTERNAL 05/45805/3
Jian Yu [Thu, 9 Dec 2021 08:13:23 +0000 (00:13 -0800)]
LU-14651 libcfs: use namespace CRYPTO_INTERNAL

In kernel 5.12 commit 0eb76ba29d16df2951d37c54ca279c4e5630b071,
cipher routines are moved into include/crypto/internal/cipher.h,
and the symbol exports are moved into namespace CRYPTO_INTERNAL.

This patch accommodates the above changes and fixes the following
build errors:
ERROR: modpost: module libcfs uses symbol crypto_cipher_encrypt_one
from namespace CRYPTO_INTERNAL, but does not import it.
ERROR: modpost: module libcfs uses symbol crypto_cipher_setkey
from namespace CRYPTO_INTERNAL, but does not import it.

Test-Parameters: trivial

Change-Id: I908006f81ee632c2d02fe3dd6ac41fdd6296a4b0
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45805
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Neil Brown <neilb@suse.de>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15298 tests: set mdt_hash permanently 93/45693/3
Elena Gryaznova [Sun, 26 Dec 2021 09:53:33 +0000 (12:53 +0300)]
LU-15298 tests: set mdt_hash permanently

On a failover setup where <mdtN>_HOST != <mdtNfailover>_HOST,
"do_nodes $(comma_list $(mdts_nodes)) lctl set_param" fails with:
  set_param: param_path 'lod/*/mdt_hash': No such file or directory
if the mdtN facet is active and up on <mdtNfailover>_HOST.
Let's set this parameter permanently.

Fixes: 0a1cf8da80 ("LU-11025 dne: introduce new directory hash type: "crush"")
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-10601
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Change-Id: Ie36745cdc5fde4a33387baafe146e06ce8812eb4
Reviewed-on: https://review.whamcloud.com/45693
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-10824 llite: make foreign symlinks aware of mount namespaces 09/45609/4
James Simmons [Thu, 18 Nov 2021 16:39:08 +0000 (11:39 -0500)]
LU-10824 llite: make foreign symlinks aware of mount namespaces

Currently the foreign symlink code tests whether the mount namespace
is the same namespace as the one related to the sysfs tree. This
doesn't cover all cases. Linux supports limiting which mounts are
visible to a process with mount namespaces. Let's add this support
as well.

Change-Id: Ie87ed45b3c4439e8800c937eb27ed4931989c0f4
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/45609
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Faccini Bruno <bruno.faccini@intel.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15112 mgc: do not ignore target registration failure 59/45259/15
Alexander Zarochentsev [Wed, 15 Dec 2021 10:26:02 +0000 (13:26 +0300)]
LU-15112 mgc: do not ignore target registration failure

A serious target registration failure with the LDD_F_ERROR
flag set is ignored by the target, which makes it possible
to register a new target with an already used index.
The writeconf flag should be encoded in the fs label regardless
of the "first_time" flag, otherwise the target cannot be
registered after an initial registration failure.

HPE-bug-id: LUS-8752
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: If051199d3dbafc8f8102f3daf086de01bc5c5f98
Reviewed-on: https://review.whamcloud.com/45259
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15112 ptlrpc: make rq_replied flag always correct 71/45871/3
Alexander Zarochentsev [Wed, 15 Dec 2021 12:31:47 +0000 (15:31 +0300)]
LU-15112 ptlrpc: make rq_replied flag always correct

The rq_replied flag is cleared at ptl_rpc_send() only,
so the state of the flag may be incorrect for rpcs which
have timed out but have never been sent.

HPE-bug-id: LUS-8752
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: I0de996a4d775b8f1a1a6b27ff38d21645694f868
Reviewed-on: https://review.whamcloud.com/45871
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15056 nrs: length of a tbf rule should be checked 24/45124/6
Etienne AUJAMES [Mon, 4 Oct 2021 18:42:31 +0000 (20:42 +0200)]
LU-15056 nrs: length of a tbf rule should be checked

The maximum size of a tbf rule name is 16 bytes (MAX_TBF_NAME). This
length is not verified before applying the rule, which causes a buffer
overflow when the name is copied.

This patch adds a string length verification inside name_is_invalid().
The test sanityn 77p checks that an error is returned when a user tries
to register a rule with an invalid name.

Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: I93c73083b6e81ab9070a860e702e56b0cb498352
Reviewed-on: https://review.whamcloud.com/45124
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-13587 quota: protect qpi in proc 87/43987/15
Sergey Cheremencev [Thu, 15 Apr 2021 14:14:51 +0000 (17:14 +0300)]
LU-13587 quota: protect qpi in proc

Access pool info only when the pool is fully initialized.
This patch protects against the following panic:

[212010.467347] BUG: unable to handle kernel NULL pointer dereference at           (null)
[212010.468205] IP: [<ffffffffc0e55e46>] qpi_state_seq_show+0x86/0xe0 [lquota]
...
[212010.486786] Call Trace:
[212010.487344]  [<ffffffffbbc68b50>] seq_read+0x130/0x440
[212010.487741]  [<ffffffffbbcb8380>] proc_reg_read+0x40/0x80
[212010.488445]  [<ffffffffbbc4118f>] vfs_read+0x9f/0x170
[212010.489056]  [<ffffffffbbc4204f>] SyS_read+0x7f/0xf0
[212010.489920]  [<ffffffffbc176ddb>] system_call_fastpath+0x22/0x27
[212010.490861] Code: 5c a8 01 00 00 41 8b 8c 1c c0 01 00 00 48 c7 c6 18
[212010.493235] RIP  [<ffffffffc0e55e46>] qpi_state_seq_show+0x86/0xe0 [lquota]
[212010.493672] RSP <ffff908505747e28>
[212010.494161] CR2: 0000000000000000

Add test 79 to sanity-quota to check that a race between
access to /proc/.../dt-pool_name/info of a non-existent pool
and the creation of this pool doesn't cause a panic.

HPE-bug-id: LUS-9938
Change-Id: I8eff846c6c3881a8431a98efb54e660ecb9155bf
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/43987
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-14008 o2iblnd: cleanup 60/40260/8
Alexey Lyashkov [Fri, 7 Aug 2020 11:26:25 +0000 (14:26 +0300)]
LU-14008 o2iblnd: cleanup

Simplify kiblnd_send by avoiding code duplication.
Let's pick up an idle tx first.

Test-Parameters: trivial
HPE-bug-id: LUS-1796
Change-Id: Iaf71a9a3aeb3047a086d4cc0a3cf4f1dbe8944b4
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-on: https://review.whamcloud.com/40260
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-12056 ldiskfs: add trusted.projid virtual xattr 79/45679/10
Li Dongyang [Tue, 30 Nov 2021 01:13:03 +0000 (12:13 +1100)]
LU-12056 ldiskfs: add trusted.projid virtual xattr

Add trusted.projid virtual xattr in ldiskfs to export the
current project id, intended for ldiskfs level MDT backup.

When the project id is EXT4_DEF_PROJID/0,
the virtual xattr is hidden from listxattr(2).

It's also hidden on lustre client when parent has the
project inherit flag and the same project ID,
to stop mv from setting the virtual xattr on the dest with
the project id from src, which could be different from dest.

getxattr(2) on trusted.projid will report the current project id,
setxattr(2) will change the current project id, and
removexattr(2) will set the project id back to EXT4_DEF_PROJID/0.

Both get|setxattr(2) will work even when the virtual xattr is
hidden.

Invalidate client xattr cache for the inode when changing its
project id, so the virtual xattr can get the new value
for next getxattr(2)

Add test cases to verify the virtual projid xattr and that backup and
restore of an MDT using tar can now preserve the project id.

Change mds_backup_restore in the test framework to use
tar with the "--xattrs --xattrs-include='trusted.*'" options.

Change-Id: I29b1aa922ef72d734cdc87125401fa08fb13d4af
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/45679
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15219 lfs: migration to DoM layout fix 49/45549/4
Mikhail Pershin [Fri, 12 Nov 2021 16:00:22 +0000 (19:00 +0300)]
LU-15219 lfs: migration to DoM layout fix

Migration to a DoM layout from an OST-striped file can skip
data sync beyond the DoM component if it is not initialized.
The patch forces data copy prior to layout merge, so the new
layout is initialized and contains the needed data.

Tests 272e/272f in sanity.sh were modified to migrate data
for both MDT and OST parts.

Fixes: 44a721b8c1 ("LU-11421 dom: manual OST-to-DOM migration via mirroring")
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I206358e762780ab7cfaa7587888174a31bc7b196
Reviewed-on: https://review.whamcloud.com/45549
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15189 osc: don't have extra nvidia call 81/45481/5
Alexey Lyashkov [Mon, 8 Nov 2021 06:36:08 +0000 (09:36 +0300)]
LU-15189 osc: don't have extra nvidia call

osc doesn't need to call nvidia to check for a GPU page;
this information is in the oap_flags.

Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Change-Id: I124c328838ad9823361afef33d0732fa4ebbb696
Reviewed-on: https://review.whamcloud.com/45481
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15358 tests: Variable incorrectly defined under sanityn 19/45819/2
Arshad Hussain [Fri, 10 Dec 2021 04:34:26 +0000 (10:04 +0530)]
LU-15358 tests: Variable incorrectly defined under sanityn

Under sanityn.sh/print_jbd_stat() local variable
was incorrectly defined. This was exposed using
shellcheck.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In lustre/tests/sanityn.sh line 950:
local varcvs
      ^-- SC2034: varcvs appears unused. Verify it or export it.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Test-Parameters: trivial testlist=sanityn
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I7b2f62c15e420a4c6f5d71445a2e940816e20098
Reviewed-on: https://review.whamcloud.com/45819
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
2 years ago LU-15360 tests: Use saved value on EXIT/Restore 21/45821/2
Arshad Hussain [Fri, 10 Dec 2021 09:46:54 +0000 (15:16 +0530)]
LU-15360 tests: Use saved value on EXIT/Restore

This was originally reported by shellcheck as
unused variable. However, on closer inspection
it appears that the restore on "EXIT" was
hard-coded to 0 (mostly this should be correct)
instead of using the original value of $old

This patch resets 'enable_chprojid_gid' value
to original value captured in $old instead of
hard-coded value of 0
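
The save-and-restore pattern described above can be sketched with a plain
shell trap. This is only an illustration under stated assumptions: a
temporary file stands in for the tunable, and run_test is a made-up name;
the real test reads and writes the parameter through the test framework's
own helpers.

```sh
# Hypothetical stand-in for a tunable: a file holding the current value.
param=$(mktemp)
echo 1 > "$param"	# pretend the value was already 1 before the test

# The function body is a subshell, so its EXIT trap fires when it returns.
run_test() (
	old=$(cat "$param")			# capture the original value
	trap 'echo "$old" > "$param"' EXIT	# restore $old, not a fixed 0
	echo 5 > "$param"			# value the test itself needs
)

run_test
restored=$(cat "$param")
rm -f "$param"
echo "restored=$restored"
```

With a hard-coded 0 in the trap, the value would come back as 0 instead of
the 1 that was in place before the test ran.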

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In ./lustre/tests/sanity-quota.sh line 4150:
local old=$(do_facet mds1 $LCTL get_param -n \
      ^-- SC2034: old appears unused. Verify it or export it.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Test-Parameters: trivial testlist=sanity-quota
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I31e7a8a931d53a1fcb9d77ecf1759fce572bd52c
Reviewed-on: https://review.whamcloud.com/45821
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15220 utils: fix gcc-11 -Werror=format-truncation= error 15/45815/2
Jian Yu [Thu, 9 Dec 2021 20:00:36 +0000 (12:00 -0800)]
LU-15220 utils: fix gcc-11 -Werror=format-truncation= error

This patch fixes the following -Werror=format-truncation= error in
liblustreapi.c:

liblustreapi.c: In function ‘lov_dump_comp_v1’:
liblustreapi.c:3673:57: error: ‘snprintf’ output may be truncated
before the last format character [-Werror=format-truncation=]
 3673 |                 snprintf(pool_name, LOV_MAXPOOLNAME, "%s",
      |                                                         ^

Change-Id: I55c3e05a933ff3d2c33a71ed269fffe63797b528
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45815
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15356 tests: get rid of extra spaces in PERM_CMD 13/45813/2
Elena Gryaznova [Thu, 9 Dec 2021 18:50:37 +0000 (21:50 +0300)]
LU-15356 tests: get rid of extra spaces in PERM_CMD

Tests that use PERM_CMD set to "set_param  -P" (with an
extra space before "-P") fail because they do not expect
these allowable extra spaces:
   [[ $PERM_CMD = *"set_param -P"* ]]
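
A small sketch of why the glob match misses the double-space form. It uses
a POSIX case statement, which applies the same pattern as the [[ ... ]]
test above; the matches helper and the use of tr to squeeze spaces are
illustrative assumptions, not what the patch itself does (per its subject,
the patch removes the extra spaces from PERM_CMD).

```sh
# Same glob as the [[ $PERM_CMD = *"set_param -P"* ]] check, written
# with case so it also runs under plain sh. "matches" is a made-up name.
matches() {
	case $1 in
	*"set_param -P"*) return 0 ;;
	*) return 1 ;;
	esac
}

raw="lctl set_param  -P"			# two spaces before -P
squeezed=$(printf '%s' "$raw" | tr -s ' ')	# collapse runs of spaces

if matches "$raw"; then raw_ok=yes; else raw_ok=no; fi
if matches "$squeezed"; then squeezed_ok=yes; else squeezed_ok=no; fi

echo "raw=$raw_ok squeezed=$squeezed_ok"
```

The pattern contains exactly one space, so only the squeezed string
matches.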

Fixes: b9c359a70d ("LU-7004 tests: move from lctl conf_param to lctl set_param -P")
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Change-Id: Ia18e32baa56b7dac1f4e15777bfcc4b9ab1048fb
Reviewed-on: https://review.whamcloud.com/45813
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-14776 ldiskfs: support Ubuntu 5.8.0-63 94/45794/2
James Simmons [Sat, 4 Dec 2021 22:51:23 +0000 (15:51 -0700)]
LU-14776 ldiskfs: support Ubuntu 5.8.0-63

Handle small changes in ext4 for Ubuntu 5.8.0-63 release.

Change-Id: Ie81b64909a49e66af17b4dfc1b8fbaf538f9f29e
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/45794
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15342 tests: escape "|" 88/45788/3
Elena Gryaznova [Wed, 8 Dec 2021 11:19:44 +0000 (14:19 +0300)]
LU-15342 tests: escape "|"

Escape "|" in want="FULL|IDLE" to protect it from interpretation
by the shell:
  sh: IDLE: command not found
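
A sketch of the general failure mode, assuming the string is expanded into
a command line that a second shell parses again. Here a local "sh -c"
stands in for that second shell; how the test framework actually passes the
value along is not shown in this message.

```sh
# A local "sh -c" stands in for the shell that re-parses the command.
unescaped='FULL|IDLE'
escaped='FULL\|IDLE'

# Unescaped: the inner shell sees the pipeline "printf ... FULL | IDLE"
# and fails because there is no IDLE command.
if sh -c "printf '%s\n' $unescaped" >/dev/null 2>&1; then
	unescaped_ok=yes
else
	unescaped_ok=no
fi

# Escaped: the inner shell sees a literal "|" and prints the whole word.
escaped_out=$(sh -c "printf '%s\n' $escaped")

echo "unescaped_ok=$unescaped_ok escaped_out=$escaped_out"
```

The backslash survives the outer double quotes and is consumed by the
inner shell, which is why the escaped form arrives intact.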

Fixes: af666bef05 ("LU-12857 tests: allow clients to be IDLE after recovery")
Test-Parameters: trivial
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Change-Id: I2f885ea225ba43537f37b8dad1c2e0cd8f652a79
Reviewed-on: https://review.whamcloud.com/45788
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15339 tests: Increase timeout in sanity 208 79/45779/2
Patrick Farrell [Tue, 7 Dec 2021 21:54:20 +0000 (16:54 -0500)]
LU-15339 tests: Increase timeout in sanity 208

It's been observed that occasionally the initial request in
sanity 208 does not complete in 1 second, which invalidates
the test.  (And sometimes causes it to fail - but even if
it passes, the test is invalid.)

Increase the time to 2 seconds.

Using trivial testing because this just modifies sanity and
it's such a simple change.

test-parameters: trivial

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I70cf32813a9a2ced0cc388eb25eba29918ba7d03
Reviewed-on: https://review.whamcloud.com/45779
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
2 years ago LU-15338 tests: check whole jobid in sanity 205a 74/45774/3
Andreas Dilger [Tue, 7 Dec 2021 19:12:00 +0000 (12:12 -0700)]
LU-15338 tests: check whole jobid in sanity 205a

Check the whole jobid string in sanity test_205a to avoid matching
a substring of the jobid twice.  This could only currently happen
for the second "dd" test, at a rate about 1/8192, but might also
fail in the future if other tests are added.
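
The difference between a substring match and a whole-string match can be
sketched with grep. The "job_id: ..." lines below are a simplified,
made-up format for illustration, not the exact job_stats output, and the
jobid values are invented.

```sh
# Two hypothetical stats entries; the first jobid contains the second.
stats='job_id: dd.1234
job_id: dd.123'
jobid=dd.123

# Substring match: counts both lines, because dd.123 is a prefix of dd.1234.
loose=$(printf '%s\n' "$stats" | grep -c "$jobid")

# Whole-line, fixed-string match: counts only the exact jobid.
exact=$(printf '%s\n' "$stats" | grep -cFx "job_id: $jobid")

echo "loose=$loose exact=$exact"
```

Matching the whole string is what keeps one jobid from being counted
twice when another jobid happens to contain it.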

Test-Parameters: trivial testlist=sanity env=ONLY=205a,ONLY_REPEAT=200
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I34b7ed1a7825e3fbad9ea8666fccb2bdc53ebbe5
Reviewed-on: https://review.whamcloud.com/45774
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Reviewed-by: Etienne AUJAMES <eaujames@ddn.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15331 kernel: kernel update SLES15 SP2 [5.3.18-24.96.1] 64/45764/2
Jian Yu [Tue, 7 Dec 2021 07:11:51 +0000 (23:11 -0800)]
LU-15331 kernel: kernel update SLES15 SP2 [5.3.18-24.96.1]

Update SLES15 SP2 kernel to 5.3.18-24.96.1 for Lustre client.

Test-Parameters: trivial \
env=SANITY_EXCEPT="100 103 125 130 136 154 255 817" \
clientdistro=sles15sp2 \
testlist=sanity

Change-Id: Ia457af76060a96f574cb501af6456afdc7de6411
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45764
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15244 llite: set ra_pages of backing_dev_info with 0 12/45712/4
Qian Yingjin [Thu, 2 Dec 2021 13:16:18 +0000 (08:16 -0500)]
LU-15244 llite: set ra_pages of backing_dev_info with 0

The latest RHEL8.5 kernel sets the initial @ra_pages of
backing_dev_info to VM_READAHEAD_PAGES:
struct backing_dev_info *bdi_alloc(int node_id)
{
	...
	bdi->ra_pages = VM_READAHEAD_PAGES;
	bdi->io_pages = VM_READAHEAD_PAGES;
	...
}

As a result, @ra_pages of the file readahead state is set from
@bdi->ra_pages, which takes readahead out of Lustre's control
and wrongly triggers the readahead logic in the Linux kernel.
This causes sanity 101j to fail.

In this patch, we force @ra_pages of backing_dev_info to 0 after
setting up the backing device info. This disables kernel
readahead in the super block.

This patch also cleans up the unnecessary setting of @ra_pages in
llite "file.c" and "vvp_io.c".

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: If6468109620269c1e76abe3a1cd73c3b40a417a8
Reviewed-on: https://review.whamcloud.com/45712
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15293 build: Add build support for arm64 centos8 91/45691/4
Xinliang Liu [Mon, 22 Nov 2021 01:34:34 +0000 (01:34 +0000)]
LU-15293 build: Add build support for arm64 centos8

This patch adds lbuild support for the latest Arm64 CentOS 8.4, 8.5.

Also fix the build not using the Lustre-provided kernel config file
on CentOS8.

Test-Parameters: trivial

Test-Parameters: env=SANITY_EXCEPT="101j" \
 clientdistro=el8.5 serverdistro=el8.5 testlist=sanity

Change-Id: I95c7aa7e77ea1cc7a99fdaacc2220e14d2db6185
Signed-off-by: Xinliang Liu <xinliang.liu@linaro.org>
Reviewed-on: https://review.whamcloud.com/45691
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-12678 o2iblnd: convert ibp_refcount to a kref 85/45685/4
James Simmons [Tue, 30 Nov 2021 18:21:49 +0000 (13:21 -0500)]
LU-12678 o2iblnd: convert ibp_refcount to a kref

This refcount is used exactly like a kref, so change it to one.
kref uses refcount_t, which will warn on increment-from-zero and
similar problems (when enabled with a CONFIG option), so we don't
need the LASSERT calls.

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: James Simmons <jsimmons@infradead.org>
Change-Id: I23ade8c2f768c70a1fd330e8c173e0d18f5ff976
Reviewed-on: https://review.whamcloud.com/45685
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15234 lnet: Race on discovery queue 70/45670/8
Chris Horn [Mon, 29 Nov 2021 17:38:48 +0000 (11:38 -0600)]
LU-15234 lnet: Race on discovery queue

If the discovery thread clears the LNET_PEER_DISCOVERING bit then a
race window opens when the discovery thread drops the
lnet_peer.lp_lock spinlock and closes when the discovery thread
acquires the lnet_net_lock. If another thread queues the peer for
discovery during this window then the LNET_PEER_DISCOVERING bit is
added back to the peer state, but since the peer is already on the
lnet.ln_dc_working queue, it does not get added to the
lnet.ln_dc_request queue.

When the discovery thread acquires the lnet_net_lock/EX, it sees that
the LNET_PEER_DISCOVERING bit has not been cleared, so it does not
call lnet_peer_discovery_complete() which is responsible for sending
messages on the peer's discovery pending queue.

At this point, the peer is stuck on the lnet.ln_dc_working queue, and
messages may continue to accumulate on the peer's
lnet_peer.lp_dc_pendq.

Fix the issue by re-working the main discovery thread loop so that we
do not release the lnet_peer.lp_lock until after we've determined
whether we need to call lnet_peer_discovery_complete().
This ensures that the lnet_peer is correctly removed from the
discovery work queue and any messages on the peer's
lnet_peer.lp_dc_pendq are sent or finalized.

It is also possible for the lnet_peer.lp_dc_error to be cleared
during the aforementioned window, as well as during the time when
lnet_peer_discovery_complete() is processing the contents of the
lnet_peer.lp_dc_pendq. This could prevent messages on the
lnet_peer.lp_dc_pendq from being correctly finalized. To fix this
issue, the responsibilities of lnet_peer_discovery_error() were
incorporated into lnet_peer_discovery_complete().

HPE-bug-id: LUS-10615
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I3779a342de7108105c2fd2bc41373560e8e5ef14
Reviewed-on: https://review.whamcloud.com/45670
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Olaf Weber <olaf.weber@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15279 ptlrpc: use a cached value 61/45661/5
Alexey Lyashkov [Thu, 25 Nov 2021 18:12:21 +0000 (21:12 +0300)]
LU-15279 ptlrpc: use a cached value

Don't recalculate the early reply size on every use; use a cached
value instead, as it does not change after start.

Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Change-Id: I3a6bd5013d0646b6165db52d6a7fb38b263756e6
Reviewed-on: https://review.whamcloud.com/45661
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15175 tests: fix ldev tests 13/45613/6
Elena Gryaznova [Thu, 2 Dec 2021 11:31:04 +0000 (14:31 +0300)]
LU-15175 tests: fix ldev tests

generate_ldev_conf() and the tests that use this function do not
work on a setup where the OST targets are spread across more than
one OSS, nor on a failover setup where
  <facet>_HOST != <facet>failover_HOST.

Fixes: 0f17fc82a89a ("LU-7060 ldev: Added MGS NID substitution to ldev")
Test-Parameters: trivial testlist=conf-sanity
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-2495
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: Ief38df0b1a0ffa37a8e7a4545a69a453d6dba7bd
Reviewed-on: https://review.whamcloud.com/45613
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15095 target: lbug_on_grant_miscount module parameter 21/45521/5
Vladimir Saveliev [Wed, 10 Nov 2021 08:40:50 +0000 (11:40 +0300)]
LU-15095 target: lbug_on_grant_miscount module parameter

Some tests have hit "lctl: error invoking upcall" when setting the
lbug_on_grant_miscount tunable parameter.  Instead, define the
lbug_on_grant_miscount flag as a ptlrpc module parameter,
similar to how it is done for ldiskfs_track_declares_assert.

Change-Id: I9cd0f9fa75b37539b23443bbcbb3445c87318ab1
Fixes: bb5d81ea95 ("LU-14543 target: prevent overflowing of tgd->tgd_tot_granted")
Test-Parameters: trivial
Signed-off-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-on: https://review.whamcloud.com/45521
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15195 ofd: missing OST object 59/45459/6
Vitaly Fertman [Thu, 4 Nov 2021 14:28:49 +0000 (17:28 +0300)]
LU-15195 ofd: missing OST object

As the OST-MDT resync may not be finished by the end of recovery,
a new enqueue for a write op may fail due to an absent object.
Return EINPROGRESS so that the enqueue is resent until the object
gets resynced.

To avoid getting stuck forever in case of a disappeared MDT or a
double failure, return EINPROGRESS only during the hard failover
timeout.
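
The decision described above can be sketched as follows. This is an
illustration only: enqueue_missing_object_rc() and its parameters are
hypothetical, and the real check in ofd may be structured differently.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper: decide what an enqueue should return when the
 * object may still be missing because OST-MDT resync has not finished.
 * Times are in seconds; all names are assumptions for illustration.
 */
static int enqueue_missing_object_rc(bool object_exists, long now,
				     long recovery_start, long hard_timeout)
{
	if (object_exists)
		return 0;

	/* Inside the hard failover window: ask the client to resend. */
	if (now - recovery_start < hard_timeout)
		return -EINPROGRESS;

	/* Window expired: report the object as absent, do not loop. */
	return -ENOENT;
}
```

Bounding the retry by the timeout is what keeps a client from
resending forever when the MDT never comes back.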

Also, clean up replay-ost-single test 12:
- eliminate the need for the hard failover
- no need for a special obd_fail_loc, just use replay_barrier
- createmany is able to create files with unique names,
  no need for special steps

HPE-bug-id: LUS-10267
Signed-off-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Change-Id: I5f16b63454c51ad8d112770c15c7e6e7f41f3c40
Reviewed-by: Sergey Cheremencev <c17829@cray.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Tested-by: Alexander Lezhoev <c17454@cray.com>
Reviewed-on: https://review.whamcloud.com/45459
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15137 socklnd: expect two control connections maximum 61/45461/4
Serguei Smirnov [Thu, 4 Nov 2021 18:35:43 +0000 (11:35 -0700)]
LU-15137 socklnd: expect two control connections maximum

As a result of connecting to ourselves, e.g. pinging own nid,
two control type connections are established vs. just one
in case of connecting externally.
Fix the control connection counter to be able to handle that.

Test-Parameters: trivial testlist=sanity-lnet
Fixes: 71b2476e ("LU-12815 socklnd: add conns_per_peer parameter")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: Idce01d81e3924226b5b163d2472cbcd4f6eb5819
Reviewed-on: https://review.whamcloud.com/45461
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
2 years agoLU-15109 tests: different quota and usage relations 57/45257/6
Sergey Cheremencev [Fri, 15 Oct 2021 12:23:53 +0000 (15:23 +0300)]
LU-15109 tests: different quota and usage relations

Add sanity-quota_1i that covers the following cases:
- User is above PQ limit and the quota limit is cleared.
  User should now be able to write.
- User is below PQ limit and the quota limit is lowered
  below current usage. User should not be able to write.
- User is above PQ limit and the quota limit is raised
  above current usage. Should now be able to write.

Change-Id: Iad81c706aaf838cacfdf2971ee100950c47d1585
HPE-bug-id: LUS-9935
Test-Parameters: testlist=sanity-quota env=ONLY=1i
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/45257
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-9516 tests: fix sanity test_24v 72/45172/3
Andreas Dilger [Fri, 8 Oct 2021 23:01:00 +0000 (17:01 -0600)]
LU-9516 tests: fix sanity test_24v

The "lfs getdirstripe -c" command will return stripes=0 for
unstriped directories.  Handle this when calculating free_inodes
to avoid creating zero files for this test.

Speed cleanup of test_24v and other users of simple_cleanup_common()
by using unlinkmany to delete files if the file count is provided.

Use stack_trap consistently and don't do both manual and exit cleanup.

Test-Parameters: trivial testlist=sanity env=ONLY=24v
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I25105a5d0ab719d41bf41cff0aaea6d00a9c4059
Reviewed-on: https://review.whamcloud.com/45172
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14965 ldiskfs: rhel7.6 inode mutex for ldiskfs_orphan_add 65/45165/2
Bobi Jam [Fri, 8 Oct 2021 09:21:22 +0000 (16:21 +0700)]
LU-14965 ldiskfs: rhel7.6 inode mutex for ldiskfs_orphan_add

See following warning:

ldiskfs/namei.c:3331 ldiskfs_orphan_add+0x11e/0x290 [ldiskfs]
Call Trace:
dump_stack+0x19/0x1b
__warn+0xd8/0x100
warn_slowpath_null+0x1d/0x20
ldiskfs_orphan_add+0x11e/0x290 [ldiskfs]
ldiskfs_xattr_inode_orphan_add+0xbb/0x110 [ldiskfs]
ldiskfs_xattr_delete_inode+0x5c/0x350 [ldiskfs]
ldiskfs_evict_inode+0x1a8/0x630 [ldiskfs]
evict+0xb4/0x180
iput+0xfc/0x190
osd_object_delete+0x1f8/0x370 [osd_ldiskfs]
lu_object_free.isra.27+0xb8/0x1c0 [obdclass]
lu_object_put+0xa5/0x460 [obdclass]
mdt_object_put+0x30/0x110 [mdt]
mdt_reint_unlink+0x8e0/0x1890 [mdt]
mdt_reint_rec+0x83/0x210 [mdt]
mdt_reint_internal+0x720/0xaf0 [mdt]
mdt_reint+0x67/0x140 [mdt]
tgt_request_handle+0x7ea/0x1750 [ptlrpc]
ptlrpc_server_handle_request+0x256/0xb10 [ptlrpc]
ptlrpc_main+0xb3c/0x14e0 [ptlrpc]
kthread+0xd1/0xe0
ret_from_fork_nospec_begin+0x21/0x21

Need to hold inode mutex on the external EA for ldiskfs_orphan_add()
to soothe the warning.

This is a port of:

Lustre-commit: 7d3b5d9fdc766411eacaed27fb2fd9250800f096
Lustre-change: https://review.whamcloud.com/44754

Test-Parameters: trivial
Fixes: f64e9f19f68e ("LU-12977 ldiskfs: properly take inode_lock() for truncates")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I47a01862793afaac1d7c311f1b6d65d2cf4bb93f
Reviewed-on: https://review.whamcloud.com/45165
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13717 sec: fix handling of encrypted file with long name 63/45163/3
Sebastien Buisson [Tue, 5 Oct 2021 14:51:52 +0000 (16:51 +0200)]
LU-13717 sec: fix handling of encrypted file with long name

The ciphertext representation of the name of an encrypted file or
directory can be up to 256 bytes of binary data, if the cleartext
name is up to NAME_MAX. But then this ciphertext is encoded via
critical_encode() before being sent to servers. Once encoded, the
length can exceed NAME_MAX because of the escaped critical
characters.
So make sure ll_prep_md_op_data() accepts these over-long encoded names
if it is called for lookup or create of an encrypted file or
directory. In the other cases, the 'name' taken as input is the plain
text version, so it must conform to the NAME_MAX limit.

When carrying out operations on an encrypted file with long name, we
manipulate a digested form whose hash needs to be matched against the
content of the LinkEA. The name found in the LinkEA is not NUL
terminated, so this aspect must be taken care of.
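
A small sketch of the NUL-termination point only, not of the digest
matching itself. name_matches() is a hypothetical helper: it compares
a length-delimited name, as one would be stored in an EA, against an
ordinary C string without ever reading past the stored length.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Compare a length-delimited name (no NUL terminator) with a C string.
 * strcmp() must not be used on ea_name: it would read past ea_len.
 */
static bool name_matches(const char *ea_name, size_t ea_len,
			 const char *cstr)
{
	return strlen(cstr) == ea_len &&
	       memcmp(ea_name, cstr, ea_len) == 0;
}
```

Checking the lengths first also rejects the case where one name is
only a prefix of the other.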

Fixes: 4d38566a00 ("LU-13717 sec: filename encryption")
Fixes: ed4a625d88 ("LU-13717 sec: filename encryption - digest support")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I4b0e51eee5e549ab56292fe0fec3c1be1b487fc7
Reviewed-on: https://review.whamcloud.com/45163
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15009 ofd: continue precreate if LAST_ID is less on MDT 84/44984/7
Lai Siyao [Thu, 16 Sep 2021 21:49:33 +0000 (17:49 -0400)]
LU-15009 ofd: continue precreate if LAST_ID is less on MDT

It's possible that precreate succeeded on the OST, but the MDT didn't
get the reply and assumed failure. In this case, the LAST_ID on the MDT
is smaller than that on the OST. Instead of reporting an error and
stopping precreate, it's better to move the precreate window forward.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ia6ca418ec0ea6797b7eccc1610879331307fad07
Reviewed-on: https://review.whamcloud.com/44984
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14677 sec: remove MIGRATION_ compatibility defines 57/44957/11
Sebastien Buisson [Fri, 10 Sep 2021 12:03:03 +0000 (14:03 +0200)]
LU-14677 sec: remove MIGRATION_ compatibility defines

Remove the MIGRATION_* compatibility flags and use
LLAPI_MIGRATION_* everywhere.

Test-Parameters: trivial
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Iab2a2f6dfc435377e9db0d4963547841b2cbc403
Reviewed-on: https://review.whamcloud.com/44957
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14677 sec: no encryption key migrate/extend/resync/split 24/44024/24
Sebastien Buisson [Thu, 17 Jun 2021 13:31:44 +0000 (15:31 +0200)]
LU-14677 sec: no encryption key migrate/extend/resync/split

Allow some layout operations on encrypted files, even when the
encryption key is not available:
- lfs migrate
- lfs mirror extend
- lfs mirror resync
- lfs mirror verify
- lfs mirror split
We allow these access patterns to applications that know what they are
doing, by using the specific flags O_FILE_ENC and O_DIRECT.

Also add sanity-sec test_59a,b,c to exercise these access patterns.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ieaeee0e5bf7643f18d775fe6daa5e31c2f349f8c
Reviewed-on: https://review.whamcloud.com/44024
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13783 osd-ldiskfs: use alloc_file_pseudo to create fake files 76/43876/20
James Simmons [Wed, 8 Dec 2021 22:13:40 +0000 (17:13 -0500)]
LU-13783 osd-ldiskfs: use alloc_file_pseudo to create fake files

With kallsyms_lookup_name() no longer exported in 5.8+ kernels,
the workaround used to set up the security handling broke.
Currently osd-ldiskfs will crash due to security_alloc() never
being called. The solution is to use alloc_file_pseudo() instead
to create our fake file.

Change-Id: Ib417ebdda7d9829a231c568022618154c273f3e6
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/43876
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14704 tests: disable opencache for sanity/29 77/43777/14
Alex Zhuravlev [Tue, 25 May 2021 04:13:55 +0000 (07:13 +0300)]
LU-14704 tests: disable opencache for sanity/29

otherwise lock counting is not quite correct

Fixes: 41d99c4902 ("LU-10948 llite: Introduce inode open heat counter")

Test-Parameters: trivial

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ia73e8aa4a16b7ced29490d41c8eac4ee839a3406
Reviewed-on: https://review.whamcloud.com/43777
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-11388 test: enable replay-single test_131b 21/40421/7
Vikentsi Lapa [Tue, 27 Oct 2020 14:39:58 +0000 (14:39 +0000)]
LU-11388 test: enable replay-single test_131b

Issue is fixed, so this commit verifies fix.

Test-Parameters: trivial env=ONLY=131 testlist=replay-single fstype=zfs
Signed-off-by: Vikentsi Lapa <vlapa@whamcloud.com>
Change-Id: I609146172c1fee2a955d5c41f623c8b8c2ffaeaa
Reviewed-on: https://review.whamcloud.com/40421
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-15357 mdd: fix changelog context leak 31/45831/4
Mikhail Pershin [Sat, 11 Dec 2021 12:49:47 +0000 (15:49 +0300)]
LU-15357 mdd: fix changelog context leak

The mdd_changelog_clear() shouldn't skip llog_ctxt_put()
in case of error.
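
The error-path rule above can be sketched with a single exit label.
Everything here is hypothetical (struct ctxt, ctxt_get(), ctxt_put(),
changelog_clear_sketch() and the fail_step parameter); it is not the
mdd code, only the shape of a function whose error paths still drop
the reference they took.

```c
#include <errno.h>

/* Hypothetical stand-in for a reference-counted llog context. */
struct ctxt {
	int refcount;
};

static struct ctxt *ctxt_get(struct ctxt *c)
{
	c->refcount++;
	return c;
}

static void ctxt_put(struct ctxt *c)
{
	c->refcount--;
}

/* fail_step selects which (simulated) step fails; 0 means success. */
static int changelog_clear_sketch(struct ctxt *c, int fail_step)
{
	struct ctxt *ctxt = ctxt_get(c);
	int rc = 0;

	if (fail_step == 1) {
		rc = -EINVAL;
		goto out;	/* a bare "return rc" here would leak ctxt */
	}
	if (fail_step == 2) {
		rc = -EIO;
		goto out;
	}
out:
	ctxt_put(ctxt);		/* reached on success and on every error */
	return rc;
}
```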

Fixes: 6b183927e1 (LU-14553 changelog: eliminate mdd_changelog_clear warning)
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I9c9aa3ce0d11e8f67470b450d007f2a1081644c6
Reviewed-on: https://review.whamcloud.com/45831
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15252 mdt: reduce contention at mdt_lsom_update 09/45709/5
Alexander Boyko [Thu, 2 Dec 2021 09:43:54 +0000 (04:43 -0500)]
LU-15252 mdt: reduce contention at mdt_lsom_update

mot_som_mutex serializes all close requests with lsom updates for
the same mdt_object. For a massive open/read/close load on a single
shared file, this leads to a high load average because many threads
sleep on the mutex.
This patch introduces a cached lsom size, and uses the mutex for the
update part only. Close requests with an lsom size less than or equal
to the cached size do not take the mutex at all.
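
A userspace sketch of the idea, assuming C11 atomics and a pthread
mutex in place of the kernel primitives. struct som_obj, som_obj_init()
and som_update() are hypothetical names and do not come from the patch.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct som_obj {
	_Atomic uint64_t cached_size;	/* last size stored by an update */
	pthread_mutex_t  mutex;		/* serializes real updates only */
	unsigned int     locked_updates; /* how often the slow path ran */
};

static void som_obj_init(struct som_obj *o)
{
	atomic_init(&o->cached_size, 0);
	pthread_mutex_init(&o->mutex, NULL);
	o->locked_updates = 0;
}

/* Returns true if the mutex was taken and the stored size was raised. */
static bool som_update(struct som_obj *o, uint64_t size)
{
	bool updated = false;

	/* Fast path: size is not larger than the cache, skip the mutex. */
	if (size <= atomic_load(&o->cached_size))
		return false;

	pthread_mutex_lock(&o->mutex);
	/* Re-check under the lock: another thread may have got here first. */
	if (size > atomic_load(&o->cached_size)) {
		atomic_store(&o->cached_size, size);
		o->locked_updates++;
		updated = true;
	}
	pthread_mutex_unlock(&o->mutex);
	return updated;
}
```

With many closes reporting the same or a smaller size, almost all
callers return from the fast path and never contend on the mutex.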

Test results: MPI open/flock/funlock/close SSF,
10 iterations, 10 nodes, 100 threads each, 1000 file ops per thread

        close time (secs)     MDT load average
        master    patch       master    patch
avg     0.142     0.086       47.05     38.89
max     0.164     0.129       49.39     44.77
min     0.097     0.041       44.44     34.7

Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: I807b468b128295df9391b0467e74d4f10240662e
Reviewed-on: https://review.whamcloud.com/45709
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-7372 tests: re-enable replay-dual test_26 82/43982/2
Andreas Dilger [Fri, 11 Jun 2021 07:19:52 +0000 (01:19 -0600)]
LU-7372 tests: re-enable replay-dual test_26

Re-enable test_26 since it was just the unfortunate victim of
either test_24 or test_25 causing MDS unmount to hang.

Test-Parameters: trivial testgroup=review-dne-part-2
Test-Parameters: testgroup=review-dne-part-2
Test-Parameters: testgroup=review-dne-part-2
Test-Parameters: testgroup=review-dne-part-2
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ib944028e798488c425501f0c48bf812fc13ebbe5
Reviewed-on: https://review.whamcloud.com/43982
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Reviewed-by: Vikentsi Lapa <vlapa@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15262 osd: bio_integrity_prep_fn return value processing 46/45646/3
Alexey Lyashkov [Mon, 22 Nov 2021 13:32:23 +0000 (16:32 +0300)]
LU-15262 osd: bio_integrity_prep_fn return value processing

osd_bio_integrity_handle() in lustre/osd-ldiskfs/osd_io.c checks
the return code of bio_integrity_prep_fn(), but the kernel integrity
API changed between mainline Linux 4.12 and 4.13: in 4.13+ (as well
as in any RHEL8, including the first beta), bio_integrity_prep()
returns boolean true on success.
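
One way to handle both conventions is to normalize the result in a
single place. The sketch below is an assumption, not the patch itself:
PREP_RETURNS_BOOL, prep_ret_t and prep_result_to_rc() are hypothetical
names, the older convention is assumed to be 0 on success and a
negative errno on failure, and -EIO is an arbitrary choice for a false
return.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical build-time switch: 1 models the 4.13+ convention
 * (true on success), 0 models the assumed older convention
 * (0 on success, negative errno on failure).
 */
#ifndef PREP_RETURNS_BOOL
#define PREP_RETURNS_BOOL 1
#endif

#if PREP_RETURNS_BOOL
typedef bool prep_ret_t;
#else
typedef int prep_ret_t;
#endif

/* Normalize either convention to 0 on success, negative on failure. */
static int prep_result_to_rc(prep_ret_t ret)
{
#if PREP_RETURNS_BOOL
	return ret ? 0 : -EIO;	/* -EIO is an assumption for this sketch */
#else
	return ret;
#endif
}
```

Callers then test the normalized value the same way on every kernel,
instead of treating a boolean true as a non-zero error code.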

HPE-bug-id: LUS-10443
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Change-Id: I973aa8ccae024157ad863d26afc7b1264a5c7149
Reviewed-on: https://review.whamcloud.com/45646
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <artem.blagodarenko@hpe.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoNew tag 2.14.56 2.14.56 v2_14_56
Oleg Drokin [Mon, 13 Dec 2021 20:16:40 +0000 (15:16 -0500)]
New tag 2.14.56

Change-Id: I2491f69b4d4e4a7ae8ed39bef8c9806127c93d79
Signed-off-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15292 kernel: kernel update RHEL7.9 [3.10.0-1160.49.1.el7] 87/45687/2
Jian Yu [Tue, 30 Nov 2021 21:43:10 +0000 (13:43 -0800)]
LU-15292 kernel: kernel update RHEL7.9 [3.10.0-1160.49.1.el7]

Update RHEL7.9 kernel to 3.10.0-1160.49.1.el7.

Test-Parameters: trivial clientdistro=el7.9 serverdistro=el7.9

Change-Id: I356b8a8345a4a91d6d1c1a4a9b4eab4bb5afe75b
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45687
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15260 tests: numfailovers() fix 33/45633/6
Elena Gryaznova [Mon, 22 Nov 2021 15:13:07 +0000 (18:13 +0300)]
LU-15260 tests: numfailovers() fix

Fix numfailovers() to handle a comma-separated MDTS list
correctly. Without this fix, newer bash versions report the
following error:
  line 69: mds1,mds2,mds3,mds4_nums: bad substitution

Fixes: a7a2133bfa ("b=18696 new RECOVERY_RANDOM_SCALE test")
Fixes: b594948509 ("TT-59 remove . and - from the node name")
Test-Parameters: trivial testlist=recovery-random-scale
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-10619
Change-Id: I4c28e3c62cada60dc1241948dc4e969e0e10ce9a
Reviewed-on: https://review.whamcloud.com/45633
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15263 quota: fix bug in qmt_pool_recalc 32/45632/2
Sergey Cheremencev [Thu, 21 Oct 2021 20:28:01 +0000 (23:28 +0300)]
LU-15263 quota: fix bug in qmt_pool_recalc

env should be freed at the end of qmt_pool_recalc,
as it is still needed in qpi_putref. Freeing it earlier causes
a panic if the last pool reference is dropped there:
BUG: unable to handle kernel NULL pointer dereference at 00000000000000a0
IP: [<ffffffffc08de2d7>] lu_context_key_get+0x17/0x30 [obdclass]
...
Call Trace:
 [<ffffffffc08de358>] lu_object_free.isra.30+0x68/0x170 [obdclass]
 [<ffffffffc08e1a35>] lu_object_put+0xc5/0x3e0 [obdclass]
 [<ffffffffc100e56c>] qmt_pool_free+0x30c/0x590 [lquota]
 [<ffffffffc10100b5>] qmt_pool_recalc+0x365/0x1260 [lquota]
 [<ffffffff8bac1c31>] kthread+0xd1/0xe0
 [<ffffffff8c176c37>] ret_from_fork_nospec_begin+0x21/0x21

HPE-bug-id: LUS-10426
Change-Id: Ic23dcb858ff811757f38948aa572c936c076e21e
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Reviewed-on: https://review.whamcloud.com/45632
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15208 ldiskfs: add support for Ubuntu20 kernel 5.4.0.90 47/45547/4
Li Dongyang [Fri, 12 Nov 2021 12:30:43 +0000 (23:30 +1100)]
LU-15208 ldiskfs: add support for Ubuntu20 kernel 5.4.0.90

Also fix the lustre-build-ldiskfs.m4 to select correct series file.
We use -ge to check the kernel release version, so greater versions
must come first.
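
Why the order matters with a first-match "-ge" test can be shown with
a small table lookup. This is a C sketch of the selection logic only,
not the m4 code; struct series_entry, pick_series() and the series
names used with it are hypothetical.

```c
#include <stddef.h>

struct series_entry {
	int min_release;	/* entry applies if release >= this value */
	const char *series;	/* hypothetical series file name */
};

/*
 * First match wins, so the table must list the greatest min_release
 * first; otherwise a lower entry shadows every higher one.
 */
static const char *pick_series(const struct series_entry *tbl, size_t n,
			       int release)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (release >= tbl[i].min_release)
			return tbl[i].series;
	return NULL;
}
```

With entries ordered 90 then 26, release 90 selects the entry for 90.
With the order reversed, release 90 already satisfies ">= 26" and the
entry for 90 is never reached.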

Change-Id: Id6b599ef5b2ea823e203aaa6a40917e49f98f4d9
Test-Parameters: trivial
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/45547
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-930 doc: update lustre.7 man page 93/45493/2
Andreas Dilger [Mon, 8 Nov 2021 21:03:24 +0000 (14:03 -0700)]
LU-930 doc: update lustre.7 man page

Update the lustre.7 man page to better describe current functionality.

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I979841e597fcfa8448c708dd66d4d89d3018b1cc
Reviewed-on: https://review.whamcloud.com/45493
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Rick Mohr <mohrrf@ornl.gov>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15196 kernel: kernel update RHEL8.4 [4.18.0-305.25.1.el8_4] 60/45460/4
Jian Yu [Tue, 30 Nov 2021 22:07:40 +0000 (14:07 -0800)]
LU-15196 kernel: kernel update RHEL8.4 [4.18.0-305.25.1.el8_4]

Update RHEL8.4 kernel to 4.18.0-305.25.1.el8_4.

Test-Parameters: trivial fstype=ldiskfs \
clientdistro=el8.4 serverdistro=el8.4 testlist=sanity

Test-Parameters: trivial fstype=zfs \
clientdistro=el8.4 serverdistro=el8.4 testlist=sanity

Change-Id: Ic70f7330f90a36646bb36e0c6015ea22882b20b9
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45460
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15190 ptlrpc: fix duplication check 45/45445/5
Alex Zhuravlev [Wed, 3 Nov 2021 06:31:06 +0000 (09:31 +0300)]
LU-15190 ptlrpc: fix duplication check

ptlrpc_server_check_for_resend() skips duplication check if
current exp_rpc_count == 0 which is wrong as exp_rpc_count
is incremented for RPCs in progress.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I4ba1600341d916871f66aceb4d6a1043dd015e55
Reviewed-on: https://review.whamcloud.com/45445
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-12784 tests: fix large_xattr_enabled() for ZFS 64/45264/2
Andreas Dilger [Fri, 15 Oct 2021 17:31:57 +0000 (11:31 -0600)]
LU-12784 tests: fix large_xattr_enabled() for ZFS

Fix large_xattr_enabled() check for ZFS filesystems, since bash
functions return "0" for true.  Otherwise, all ZFS tests that
check large_xattr_enabled() will be skipped.

Fixes: 84097792f56c ("LU-12784 llite: limit max xattr size by kernel value")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ie566244c6b1f46b947a96331e7623b9b863ebbe5
Reviewed-on: https://review.whamcloud.com/45264
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15057 utils: pool quota man 21/45121/4
Sergey Cheremencev [Wed, 31 Mar 2021 12:13:53 +0000 (15:13 +0300)]
LU-15057 utils: pool quota man

Add pool quota documentation to the man pages for the
setquota and quota commands.
Remove [-o <obd_uuid>|-i <mdt_idx>|-I <ost_idx>]
from the case "lfs quota -t". Grace period
is stored only at quota master. Furthermore,
command lfs quota -t -I 0 /mnt/testfs fails
with EOPNOTSUPP.

Test-Parameters: trivial
HPE-bug-id: LUS-9869
Change-Id: I368e22b782bd3626f64907059ea329e94986535b
Reviewed-on: https://es-gerrit.dev.cray.com/158556
Reviewed-by: Alexander Boyko <c17825@cray.com>
Reviewed-by: Elena Gryaznova <c17455@cray.com>
Tested-by: Elena Gryaznova <c17455@cray.com>
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/45121
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13756 quota: up_read leak in qmt_pool_lookup 06/45106/7
Sergey Cheremencev [Thu, 30 Sep 2021 15:58:16 +0000 (18:58 +0300)]
LU-13756 quota: up_read leak in qmt_pool_lookup

qmt_pool_lock is not released if qti_pools_add fails in
qmt_pool_lookup.

Change-Id: Ic2adb44468d51af7aefcbb91279260ae6f85d67a
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/45106
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14975 dne: dir migration in non-recursive mode 02/44802/10
Lai Siyao [Thu, 26 Aug 2021 11:37:09 +0000 (07:37 -0400)]
LU-14975 dne: dir migration in non-recursive mode

Add a "-d|--directory" option for "lfs migrate -m" to
migrate the specified directory only, similar to "ls -d".

Add sanity 230w.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ib97949e3840a3b49f7074b16e259582a9bf16e3b
Reviewed-on: https://review.whamcloud.com/44802
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14956 fld: repeat failed FLDB lookup 23/44723/13
Alex Zhuravlev [Mon, 23 Aug 2021 07:29:18 +0000 (10:29 +0300)]
LU-14956 fld: repeat failed FLDB lookup

It's possible that LWP reconnection is in progress after a remote
MDS restart. If FLDB misses an entry, then the FLDB lookup can fail
with EAGAIN and the whole RPC processing (like MDS_REINT) can fail
as well. Retry the lookup a few times in case of EAGAIN.
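
A bounded retry of this kind can be sketched as below. The names
lookup_with_retry(), lookup_fn and flaky_lookup() are hypothetical,
and the sketch does not wait between attempts, which real code would
likely do.

```c
#include <errno.h>

typedef int (*lookup_fn)(void *arg);

/* Call lookup() until it stops returning -EAGAIN or tries run out. */
static int lookup_with_retry(lookup_fn lookup, void *arg, int max_tries)
{
	int rc = -EAGAIN;
	int i;

	for (i = 0; i < max_tries && rc == -EAGAIN; i++)
		rc = lookup(arg);
	return rc;
}

/* Demo lookup: fails with -EAGAIN until *remaining reaches zero. */
static int flaky_lookup(void *arg)
{
	int *remaining = arg;

	if (*remaining > 0) {
		(*remaining)--;
		return -EAGAIN;
	}
	return 0;
}
```

Any error other than -EAGAIN is returned immediately, and a lookup
that keeps failing still ends with -EAGAIN once max_tries is used up.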

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ib6aeaf7706a6465b0c8bee696d985bb440ed192e
Reviewed-on: https://review.whamcloud.com/44723
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-9859 mdd: unwind md_capable() 80/44580/8
James Simmons [Tue, 17 Aug 2021 14:16:41 +0000 (10:16 -0400)]
LU-9859 mdd: unwind md_capable()

The inline function md_capable() is just a wrapper
around cap_raised() which adds little benefit. Let's
just remove the use of this wrapper.

Change-Id: I1a5f4b2e34b4cf358b52b3fc4bdeff17fdab50c9
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/44580
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14754 tests: add Overstripe support to racer 64/43964/5
Elena Gryaznova [Thu, 10 Jun 2021 09:19:44 +0000 (12:19 +0300)]
LU-14754 tests: add Overstripe support to racer

The files are created with an overstripe layout if
RACER_ENABLE_OVERSTRIPE=true is set.

We would like to have the ability to use the "real"
layouts, i.e. to limit the number of stripes per OST
instead of allowing racer to achieve the max LOV_MAX_STRIPE_COUNT
value. Patch adds RACER_LOV_MAX_STRIPECOUNT equal to
LOV_MAX_STRIPE_COUNT by default.

Test-Parameters: trivial testlist=racer
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-9466, LUS-9608
Reviewed-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Reviewed-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Change-Id: I550922938438afa121af275fd1d6f60082db9b54
Reviewed-on: https://review.whamcloud.com/43964
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14392 gnilnd: re-enable large I/o buffers 73/41373/2
Shaun Tancheff [Sun, 31 Jan 2021 16:20:54 +0000 (10:20 -0600)]
LU-14392 gnilnd: re-enable large I/o buffers

DVS on gni breaks the LNet 1M handshake of LNET_MAX_IOV.

Introduce GNILND_MAX_IOV with a 4M i/o maximum and a hint
LNET_MD_GNILND so LNet can accept the large buffer w/o complaint.

Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I4e78c0022fdece0d6945bbcc47e2e64d4d181dca
Reviewed-on: https://review.whamcloud.com/41373
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-11915 tests: fix conf-sanity 115 test 49/38849/9
Artem Blagodarenko [Fri, 5 Jun 2020 15:45:45 +0000 (11:45 -0400)]
LU-11915 tests: fix conf-sanity 115 test

The test did not add enough xattrs to push them outside the inode.
Add one additional xattr. The test works only with FLAKEY=false.

Signed-off-by: Artem Blagodarenko <artem.blagodarenko@hpe.com>
Test-Parameters: testlist=conf-sanity env="ONLY=115"
HPE-bug-id: LUS-6966
Change-Id: Iab13ed3434effb03e1209755ac51eba2debe7387
Reviewed-on: https://review.whamcloud.com/38849
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13542 osd: brw stats are initialized too late 54/38554/6
Andrew Perepechko [Thu, 25 Nov 2021 17:28:01 +0000 (20:28 +0300)]
LU-13542 osd: brw stats are initialized too late

Lustre crashes with the following stack trace:

 [<ffffffffc113cbac>] lprocfs_oh_tally+0x2c/0x40 [obdclass]
 [<ffffffffc169719b>] record_start_io.part.14+0x2b/0x40 [osd_zfs]
 [<ffffffffc1698322>] osd_read+0xa2/0x180 [osd_zfs]
 [<ffffffffc1167dee>] dt_record_read+0x1e/0x70 [obdclass]
 [<ffffffffc1190997>] lustre_index_restore+0x527/0x1720 [obdclass]
 [<ffffffffc16b2564>] osd_initial_OI_scrub+0xa34/0xd50 [osd_zfs]
 [<ffffffffc16b34fd>] osd_scrub_setup+0x9ed/0xb90 [osd_zfs]
 [<ffffffffc168a97b>] osd_mount+0xf4b/0x1380 [osd_zfs]

osd_procfs_init()/osd_stats_init() are called *after*
osd_initial_OI_scrub(), so osd stats are not yet initialized
when osd_read() first tries to update them.

This patch separates osd stats initialization from procfs
initialization so that osd stats should become initialized
by the time scrub starts its own initialization.
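
The ordering issue can be sketched in user-space C. All names below (demo_osd, demo_stats_init, demo_mount and so on) are hypothetical stand-ins rather than the actual osd-zfs code; the sketch only assumes that the stats buffer must exist before the first read path tallies into it.

```c
#include <stdlib.h>

#define DEMO_BUCKETS 16

/* Hypothetical stand-in for an OSD device; not the real struct osd_device. */
struct demo_osd {
	unsigned long *brw_stats;	/* histogram buckets, NULL until initialized */
	int procfs_ready;		/* set once the procfs-like setup has run */
};

/* Step 1: allocate the stats only; no dependency on procfs setup. */
static int demo_stats_init(struct demo_osd *osd)
{
	osd->brw_stats = calloc(DEMO_BUCKETS, sizeof(*osd->brw_stats));
	return osd->brw_stats ? 0 : -1;
}

/* An I/O path that tallies into the histogram, as a read during scrub would. */
static int demo_read(struct demo_osd *osd, unsigned int bucket)
{
	if (!osd->brw_stats || bucket >= DEMO_BUCKETS)
		return -1;	/* uninitialized stats: the crash case in the trace */
	osd->brw_stats[bucket]++;
	return 0;
}

/* Mount order: stats first, then the scrub-like read, then procfs setup. */
static int demo_mount(struct demo_osd *osd)
{
	if (demo_stats_init(osd) != 0)
		return -1;
	if (demo_read(osd, 0) != 0)	/* stands in for the initial OI scrub */
		return -1;
	osd->procfs_ready = 1;		/* stands in for procfs registration */
	return 0;
}
```

Splitting the allocation from the registration step lets the early read succeed without requiring procfs to be available that early.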

Change-Id: I15ab03e77eaab76e3dea8067b849c891e89aa9a8
Signed-off-by: Andrew Perepechko <andrew.perepechko@hpe.com>
HPE-bug-id: LUS-8173
Reviewed-on: https://review.whamcloud.com/38554
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10640 tests: ha.sh script improvements 29/31229/9
Elena Gryaznova [Wed, 17 Nov 2021 10:13:26 +0000 (13:13 +0300)]
LU-10640 tests: ha.sh script improvements

In each load iteration check for all created directories
that ls' long format output does not contain question
marks ('?').
'?'s may be reported if
  stat(2)->getattr()->ll_glimpse_size()
fails, which is expected in case of failover. In that case the test
waits until recovery completes, repeats the check, and exits
with an error if '?' is still present after the second check.

Test-Parameters: trivial
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-4894
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Change-Id: I88495511797aaad53c923c90f88f92f1412380ce
Reviewed-on: https://review.whamcloud.com/31229
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15252 mdc: add client tunable to disable LSOM update 19/45619/3
Alexander Boyko [Fri, 19 Nov 2021 08:08:16 +0000 (03:08 -0500)]
LU-15252 mdc: add client tunable to disable LSOM update

It seems that mdt_lsom_update() has a serious issue with a single
shared file because of its mdt-level mutex for every close request.
The patch adds an mdc_lsom parameter to mdc; based on its state, the
client either sends LSOM updates to the MDT or does not. By default LSOM is on.

lctl set_param mdc.*.mdc_lsom=[on|off]

For configurations where LSOM is not used, the patch reduces
the MDT load average under workloads in which many threads
open/read/close a single file.

HPE-bug-id: LUS-10604
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: Iba0e745a94825641da6b0a1c09488b1e2f54658b
Reviewed-on: https://review.whamcloud.com/45619
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15167 quota: fallocate send UID/GID for quota 75/45475/5
Arshad Hussain [Sun, 7 Nov 2021 14:46:29 +0000 (09:46 -0500)]
LU-15167 quota: fallocate send UID/GID for quota

Calling fallocate() on a newly created file did not account quota
usage properly because the OST object did not have a UID/GID
assigned yet. Update the fallocate code in the OSC to always send
the file UID/GID/PROJID to the OST so that the object ownership
can be updated before space is allocated.

Test-case: sanity-quota/78 added

Fixes: 48457868a02a ("LU-3606 fallocate: Implement fallocate preallocate operation")
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I86d80a7f415a80100f7d2fb5f417cf47bf5b2900
Reviewed-on: https://review.whamcloud.com/45475
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14514 flr: mirror split should not make stale file 24/42024/23
Bobi Jam [Thu, 2 Sep 2021 16:27:34 +0000 (00:27 +0800)]
LU-14514 flr: mirror split should not make stale file

Mirror split could leave a file with only stale mirrors. This patch
prevents removing the last non-stale mirror from the file.
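
A minimal sketch of such a guard, assuming a hypothetical demo_mirror descriptor (the real FLR layout structures and the actual check in the patch are not shown here):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror descriptor; the real FLR layout structures differ. */
struct demo_mirror {
	unsigned int id;
	bool stale;
};

/*
 * Return true if mirror `victim` may be split off without leaving the
 * file with only stale mirrors.
 */
static bool demo_can_split(const struct demo_mirror *m, size_t count,
			   size_t victim)
{
	size_t i;

	if (victim >= count)
		return false;
	if (m[victim].stale)
		return true;	/* dropping a stale mirror loses no valid data */

	for (i = 0; i < count; i++)
		if (i != victim && !m[i].stale)
			return true;	/* another non-stale mirror remains */

	return false;		/* victim is the last non-stale mirror */
}
```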

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I63007784929a2cd18d2823e2250f7307ca7d8d45
Reviewed-on: https://review.whamcloud.com/42024
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-11952 mdt: fix reconstruct open 12/35112/19
Andriy Skulysh [Sun, 3 Mar 2019 18:10:31 +0000 (20:10 +0200)]
LU-11952 mdt: fix reconstruct open

We shouldn't start a new transaction on resend.

Store fid of an opened object and use it during
reconstruction of the resend.

Change-Id: I8c21e9661903d3d4090ad29e43480e2ba7e35c39
Cray-bug-id: LUS-6957, LUS-7286
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Alexey Lyashkov <c17817@cray.com>
Reviewed-by: Vitaly Fertman <c17818@cray.com>
Reviewed-on: https://review.whamcloud.com/35112
Reviewed-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15169 Revert "LU-14668 lnet: Lock primary NID logic" 86/45386/3
Chris Horn [Tue, 26 Oct 2021 20:23:37 +0000 (15:23 -0500)]
LU-15169 Revert "LU-14668 lnet: Lock primary NID logic"

This patch breaks client mounts under certain LNet configurations.

This reverts commit 024f9303bc6f32a3113357c864765c4f9c93ed03.

Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ic1f1d07694fe49df14c803a9434d673e61c7dd67
Reviewed-on: https://review.whamcloud.com/45386
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13601 llite: avoid needless large stats alloc 01/40901/13
Andreas Dilger [Tue, 8 Dec 2020 06:54:58 +0000 (23:54 -0700)]
LU-13601 llite: avoid needless large stats alloc

Allocate the ll_rw_extents_info (5896 bytes), ll_rw_offset_info
(6400 bytes), and ll_rw_process_info (640 bytes) structs only
when these stats are enabled, which is very rarely.
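
The allocate-on-enable pattern might look roughly like the following. The struct names, the bucket count and the free-on-disable step are assumptions made for illustration, not the llite implementation.

```c
#include <stdlib.h>

#define DEMO_EXT_BUCKETS 64

/* Hypothetical stand-ins; sizes and names are illustrative only. */
struct demo_extents_info {
	unsigned long buckets[DEMO_EXT_BUCKETS];
};

struct demo_sb_info {
	struct demo_extents_info *extents;	/* NULL while stats are disabled */
};

/* Allocate on first enable; enabling twice is a no-op. */
static int demo_stats_enable(struct demo_sb_info *sbi)
{
	if (sbi->extents)
		return 0;
	sbi->extents = calloc(1, sizeof(*sbi->extents));
	return sbi->extents ? 0 : -1;
}

/* Free on disable so the memory is held only while stats are in use. */
static void demo_stats_disable(struct demo_sb_info *sbi)
{
	free(sbi->extents);
	sbi->extents = NULL;
}

/* Hot path: skip the tally cheaply when stats are off. */
static void demo_stats_tally(struct demo_sb_info *sbi, unsigned int bucket)
{
	if (sbi->extents && bucket < DEMO_EXT_BUCKETS)
		sbi->extents->buckets[bucket]++;
}
```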

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I59bbfce8d7f2422d810617d5fa712a67333ebbe5
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-on: https://review.whamcloud.com/40901
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15181 osd-ldiskfs: a typo in osd_declare_write_commit() 23/45423/3
Andrew Perepechko [Sun, 31 Oct 2021 19:03:48 +0000 (22:03 +0300)]
LU-15181 osd-ldiskfs: a typo in osd_declare_write_commit()

A typo in osd_declare_write_commit() makes UBSAN emit warnings like:
UBSAN: Undefined behaviour in /lustre/osd-ldiskfs/osd_io.c:1404:2
shift exponent 4096 is too large for 32-bit type 'int'
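
The exponent of 4096 in the warning suggests that a byte count ended up where a bit count was expected. The sketch below illustrates that class of mistake with hypothetical DEMO_* names; it is not the actual line from osd_io.c.

```c
#include <limits.h>

#define DEMO_PAGE_SHIFT 12				/* log2 of the page size */
#define DEMO_PAGE_SIZE  (1UL << DEMO_PAGE_SHIFT)	/* 4096 bytes */

/*
 * Convert a page count to bytes. Shifting by DEMO_PAGE_SIZE (4096) instead
 * of DEMO_PAGE_SHIFT (12) would be undefined behaviour in C, because the
 * shift count must be smaller than the width of the promoted left operand.
 */
static unsigned long demo_pages_to_bytes(unsigned long pages)
{
	return pages << DEMO_PAGE_SHIFT;
}

/* Defensive helper: a shift count is valid only if below the type width. */
static int demo_shift_is_valid(unsigned int shift)
{
	return shift < sizeof(unsigned long) * CHAR_BIT;
}
```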

Change-Id: Ie612d9c6655211445c00bd17fa1bf7a836af3542
Fixes: f0f92773e ("LU-14187 osd-ldiskfs: fix locking in write commit")
Signed-off-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-on: https://review.whamcloud.com/45423
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
2 years agoLU-15182 git: Add .gitreview file 07/44907/4
Xinliang Liu [Tue, 14 Sep 2021 01:20:32 +0000 (01:20 +0000)]
LU-15182 git: Add .gitreview file

Add a .gitreview file, so that we can use "git review -s" to set up
the remote push URL and use "git review" to send a patch for review.

The git review command is a convenient tool for pushing patches to
the Gerrit review system. See more details here:
https://docs.opendev.org/opendev/git-review/latest/usage.html

Test-Parameters: trivial
Change-Id: Ic8223bdfcb7a696328f921159a63d625359e45a6
Signed-off-by: Xinliang Liu <xinliang.liu@linaro.org>
Reviewed-on: https://review.whamcloud.com/44907
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-9859 gss: replace cfs_size_roundXX macros. 84/45584/3
James Simmons [Tue, 16 Nov 2021 16:07:13 +0000 (11:07 -0500)]
LU-9859 gss: replace cfs_size_roundXX macros.

Many of the cfs_size_roundX() macros are not even used so delete
them. Replace cfs_size_round4() uses in the GSS layer with
round_up(var, 4);
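
For reference, rounding up to a power-of-two boundary can be sketched in user-space C as below. demo_round_up is a hypothetical helper written for this illustration, not the kernel macro itself, and it assumes the alignment is a power of two.

```c
#include <stddef.h>

/*
 * Round x up to the next multiple of align.
 * Valid only when align is a power of two (4 in the GSS case above).
 */
static size_t demo_round_up(size_t x, size_t align)
{
	return (x + align - 1) & ~(align - 1);
}
```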

Change-Id: Id35f0f7b60f8d00f425d9b15d2a76aa4d0fa5f2f
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/45584
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
2 years agoLU-15217 pcc: disable PCC for encrypted files 45/45545/4
Qian Yingjin [Fri, 30 Jul 2021 08:47:55 +0000 (16:47 +0800)]
LU-15217 pcc: disable PCC for encrypted files

When files are encrypted in Lustre using fscrypt, they should
normally not be accessible to users without the proper encryption
key. However, if a user has the encryption key loaded when they
read a file, it may be decrypted in memory and saved to the PCC
backend in unencrypted form.

Due to the above reason, we just disable PCC caching for encrypted
files.

DDN-bug-id: EX-3571
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I6c363dcad7a6bc8520350c0295f6e221bec3abb0
Reviewed-on: https://review.whamcloud.com/45545
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14325 tests: skip replay-single 134 for older servers 50/45450/3
James Nunez [Wed, 3 Nov 2021 19:39:06 +0000 (13:39 -0600)]
LU-14325 tests: skip replay-single 134 for older servers

The fix for a PFL file lost during recovery was landed to
Lustre 2.13.53.  Servers prior to 2.13.53 will fail the
replay-single test, test_134, added to the original patch
to check that PFL files are not lost.  Thus, we need to
skip this test for Lustre servers less than 2.13.53.
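
The test suite itself is shell, but the version gate can be sketched in C. The packing of a four-part version into one integer and both function names are assumptions made for this illustration.

```c
/*
 * Hypothetical packing of a four-part version into one integer so that
 * versions compare in the natural order (each part assumed to be < 256).
 */
static unsigned int demo_version_code(unsigned int major, unsigned int minor,
				      unsigned int patch, unsigned int fix)
{
	return (major << 24) | (minor << 16) | (patch << 8) | fix;
}

/* Return 1 if replay-single test_134 should be skipped for this server. */
static int demo_skip_test_134(unsigned int server_version)
{
	return server_version < demo_version_code(2, 13, 53, 0);
}
```

With this packing, servers at 2.12.7 and 2.13.0 (the two versions named in Test-Parameters) are skipped, while 2.13.53 and later run the test.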

Fixes: 72d45e1d344c ("LU-13809 mdc: fix lovea for replay")
Test-Parameters: trivial env=ONLY=134 testlist=replay-single
Test-Parameters: serverdistro=el7.9 serverversion=2.12.7 env=ONLY=134 testlist=replay-single
Test-Parameters: serverdistro=el7.7 serverversion=2.13.0 env=ONLY=134 testlist=replay-single
Signed-off-by: James Nunez <jnunez@whamcloud.com>
Change-Id: Id70f9e06f6221f88a54d696afce9de70cbcf1efa
Reviewed-on: https://review.whamcloud.com/45450
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alena Nikitenko <anikitenko@ddn.com>
Reviewed-by: Wei Liu <sarah@whamcloud.com>
2 years agoLU-15186 o2iblnd: Default map_on_demand to 1 31/45431/3
Chris Horn [Mon, 1 Nov 2021 20:06:31 +0000 (15:06 -0500)]
LU-15186 o2iblnd: Default map_on_demand to 1

On kernels that provide global MR we default to using that exclusively
even if FMR/FastReg is available. This causes an interop issue if the
active side of a connection request has a higher fragment count than
the passive side because FMR/FastReg may be needed to map the higher
fragment count. We should change the default map_on_demand to 1 so
that FMR/FastReg is used by default. map_on_demand can still be set
to 0 if needed.

Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I76010a905f151efbb0b109ae6f5fba6fb7ce1956
Reviewed-on: https://review.whamcloud.com/45431
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15166 tests: restore osp-syn threads after test_818 75/45375/4
Vladimir Saveliev [Tue, 26 Oct 2021 16:53:01 +0000 (19:53 +0300)]
LU-15166 tests: restore osp-syn threads after test_818

test_818() is supposed to leave osp-syn threads up after the test ends,
otherwise the following tests get "logging isn't available, run LFSCK".

Use fail $SINGLEMDS for that.

Test-Parameters: trivial testlist=sanity
HPE-bug-id: LUS-10495
Change-Id: Ib4876f4c4d39fc87f86788d8611838b8078e4aac
Signed-off-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-on: https://review.whamcloud.com/45375
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-12857 tests: allow clients to be IDLE after recovery 18/45318/3
Andreas Dilger [Thu, 21 Oct 2021 01:47:25 +0000 (19:47 -0600)]
LU-12857 tests: allow clients to be IDLE after recovery

If clients are not connected to an OST when it fails (connection
is IDLE), they do not need to be involved in recovery, so this
should not be considered an error when checking the client state.

Test-Parameters: trivial testlist=recovery-mds-scale env=SLOW=no
Test-Parameters: testlist=conf-sanity
Test-Parameters: testlist=replay-dual,replay-single
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I6cfeb718acd233378ed1608f22061bc15c3ebbe5
Reviewed-on: https://review.whamcloud.com/45318
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15126 kernel: RHEL 8.5 server support 06/45306/9
Jian Yu [Wed, 17 Nov 2021 20:11:43 +0000 (12:11 -0800)]
LU-15126 kernel: RHEL 8.5 server support

This patch makes changes to support RHEL 8.5 release
with kernel 4.18.0-348.2.1.el8_5 for Lustre server.

Test-Parameters: trivial fstype=ldiskfs \
env=SANITY_EXCEPT="101j" \
clientdistro=el8.5 serverdistro=el8.5 testlist=sanity

Test-Parameters: trivial fstype=zfs \
env=SANITY_EXCEPT="101j" \
clientdistro=el8.5 serverdistro=el8.5 testlist=sanity

Change-Id: Ie976d8fd3e6fcf8a564eff8a41ad0fd51b2c858c
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45306
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15222 build: Update ZFS version to 2.0.6 67/45567/4
Jian Yu [Mon, 15 Nov 2021 17:01:18 +0000 (09:01 -0800)]
LU-15222 build: Update ZFS version to 2.0.6

Update ZFS version to 2.0.6. The changes are listed in:
https://github.com/openzfs/zfs/releases/tag/zfs-2.0.6

Change-Id: I2a7df45b79f402c3d3bce8b137edd11b5224b576
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45567
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15126 kernel: new kernel [RHEL 8.5 4.18.0-348.2.1.el8_5] 85/45285/8
Jian Yu [Wed, 17 Nov 2021 19:58:14 +0000 (11:58 -0800)]
LU-15126 kernel: new kernel [RHEL 8.5 4.18.0-348.2.1.el8_5]

This patch makes changes to support new RHEL 8.5 release
for Lustre client.

Test-Parameters: trivial env=SANITY_EXCEPT="101j" \
clientdistro=el8.5

Change-Id: I068f091817126fffc14402254f45dcd75ba7f3fc
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45285
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
2 years agoLU-15184 llite: properly detect SELinux disabled case 01/45501/4
Sebastien Buisson [Tue, 9 Nov 2021 16:03:19 +0000 (17:03 +0100)]
LU-15184 llite: properly detect SELinux disabled case

Usually, security_dentry_init_security() returns -EOPNOTSUPP when
SELinux is disabled. But on some kernels (e.g. rhel 8.5) it returns
0 when SELinux is disabled, and in this case the security context is
empty.
So in both cases make sure the security context name is not set, which
means "SELinux is disabled" for the rest of the code.
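
A rough user-space sketch of the two-case check follows. The function name, its parameters and its return convention are hypothetical; rc, ctx and ctx_len merely model the result of the security hook.

```c
#include <errno.h>
#include <stddef.h>

/*
 * rc, ctx and ctx_len model the return value and outputs of a security
 * hook; all names here are hypothetical.
 *
 * Returns 1 if a usable security context was obtained, 0 if SELinux is
 * treated as disabled, or a negative errno for any other failure.
 * On the "disabled" path *name is cleared so later code sees no context.
 */
static int demo_check_secctx(int rc, const void *ctx, size_t ctx_len,
			     const char **name)
{
	if (rc == -EOPNOTSUPP ||
	    (rc == 0 && (ctx == NULL || ctx_len == 0))) {
		*name = NULL;	/* treated as "SELinux is disabled" */
		return 0;
	}
	if (rc < 0)
		return rc;
	return 1;
}
```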

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I3b9608f9768288de89570c158e8429560fa0213f
Reviewed-on: https://review.whamcloud.com/45501
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15119 tgt: BUG at tgt_brw_read+0x16bf/0x1d80 73/45273/3
Andriy Skulysh [Wed, 6 Oct 2021 10:25:12 +0000 (13:25 +0300)]
LU-15119 tgt: BUG at tgt_brw_read+0x16bf/0x1d80

struct tgt_thread_big_cache {
  local = {{
      lnb_file_offset = 0,
      lnb_page_offset = 0,
      lnb_len = 0,
      lnb_rc = 0,
      lnb_page = 0xffffddee74fae100,
so npages_read becomes 0

Change-Id: Ie2201c9fc6f0350b1c6dcb480cff52f44d5413db
HPE-bug-id: LUS-10510
Signed-off-by: Andriy Skulysh <c17819@cray.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Shaun Tancheff <stancheff@cray.com>
Reviewed-on: https://review.whamcloud.com/45273
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15110 quota: cosmetic changes in PQ 58/45258/3
Sergey Cheremencev [Fri, 15 Oct 2021 14:11:47 +0000 (17:11 +0300)]
LU-15110 quota: cosmetic changes in PQ

cosmetic changes in PQ:
- make tgt_pool_free and qmt_sarr_pool_free void
- remove outdated comment from qmt_pool_lqes_lookup
- replace tabs with spaces

HPE-bug-id: LUS-9547
Change-Id: If4918b647eed1d971d00c521d010d0c72d349207
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/45258
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15079 quota: include qsd_thread_info into mgs thread context 81/45181/2
Vladimir Saveliev [Tue, 24 Aug 2021 14:57:37 +0000 (17:57 +0300)]
LU-15079 quota: include qsd_thread_info into mgs thread context

mgs service thread envs do not get supplied with qsd_thread_info, which
may lead to the failure shown below:
(lu_object.h:1274:lu_env_info()) ASSERTION( info ) failed:
(lu_object.h:1274:lu_env_info()) LBUG
Pid: 146951, comm: ll_mgs_0003 3.10.0-957.1.3957.1.3.x4.3.25.x86_64 #1 SMP
Call Trace:
 libcfs_call_trace+0x8e/0xf0 [libcfs]
 lbug_with_loc+0x4c/0xa0 [libcfs]
 qsd_refresh_usage+0x25e/0x2f0 [lquota]
 qsd_op_adjust+0x2f1/0x730 [lquota]
 osd_object_delete+0x2b2/0x360 [osd_ldiskfs]
 lu_object_free.isra.32+0x68/0x170 [obdclass]
 lu_site_purge_objects+0x2fe/0x530 [obdclass]
 lu_object_find_at+0x371/0xa60 [obdclass]
 dt_locate_at+0x1d/0xb0 [obdclass]
 llog_osd_open+0x50e/0xf30 [obdclass]
 llog_open+0x15a/0x3e0 [obdclass]
 llog_origin_handle_open+0x334/0x720 [ptlrpc]
 tgt_llog_open+0x33/0xe0 [ptlrpc]
 mgs_llog_open+0x46/0x460 [mgs]
 tgt_request_handle+0x96a/0x1680 [ptlrpc]

Supply mgs service context with qsd_thread_info.

Change-Id: If8664b81e1f64df015dad46ba26c9c1d1e3f54bf
HPE-bug-id: LUS-10334
Signed-off-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-on: https://review.whamcloud.com/45181
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15059 nrs: do not overwrite "cmd" in nrs_tbf_rule 42/45142/4
Etienne AUJAMES [Wed, 6 Oct 2021 20:11:17 +0000 (22:11 +0200)]
LU-15059 nrs: do not overwrite "cmd" in nrs_tbf_rule

"cmd" pointer inside ptlrpc_lprocfs_nrs_tbf_rule_seq_write() and
nrs_tbf_parse_cmd are static. This could cause a double kfree call
because "cmd" could be overwritten by another "nrs_tbf_rule" write
instance.

Let's try to remove the "static" definition.
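
The hazard with a static local pointer can be illustrated in user-space C. demo_rule_write is a hypothetical function, not the ptlrpc code: it only shows that a plain local gives every call its own pointer, so each buffer is freed once by the call that allocated it.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Parse a rule by working on a private copy of the input.
 * If `cmd` were declared `static char *cmd`, all concurrent callers would
 * share one pointer: one caller could overwrite it and two callers could
 * then free the same buffer. A plain local avoids that sharing.
 *
 * Returns the length of the first token, or -1 on allocation failure.
 */
static int demo_rule_write(const char *buf)
{
	char *cmd;	/* deliberately not static */
	char *space;
	int len;

	cmd = malloc(strlen(buf) + 1);
	if (!cmd)
		return -1;
	strcpy(cmd, buf);

	space = strchr(cmd, ' ');
	if (space)
		*space = '\0';	/* cut the copy at the first token */
	len = (int)strlen(cmd);

	free(cmd);	/* freed exactly once, by the call that allocated it */
	return len;
}
```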

Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: I8cd7d9dd0483778c82bbf8711c07e49255983f4b
Reviewed-on: https://review.whamcloud.com/45142
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
2 years agoLU-14699 mdd: proactive changelog garbage collection 68/45068/9
Mikhail Pershin [Fri, 24 Sep 2021 15:47:44 +0000 (18:47 +0300)]
LU-14699 mdd: proactive changelog garbage collection

Currently the changelog starts garbage collection when a user
exceeds the maximum idle timeout. There is also a limit on the
number of idle records, but it applies only to old changelog
users that have no cur_time field, so it is effectively unused
nowadays. Another problem is that garbage collection starts
only when the changelog is almost full. As a result, the
changelog may often keep very old users well past the idle
timeout, with idle records above the maximum limit consuming
space for nothing.

Patch reworks changelog GC in the following way:
- GC starts when the changelog is almost full (old way), when
  either the idle time or idle records limit is exceeded, or
  when (idle_time * idle_records) exceeds its own limit.
  The last limit is calculated as:
  (idle_time * idle_records) / 84600 > (1 << 32) which is a
  reasonable heuristic for deciding if a user is "too idle",
  both when many records are created quickly and when the
  user has been idle for a very long time.
- to avoid processing all changelog users each time GC
  checks these conditions, both the lowest user record and
  the oldest user time are tracked when changelog users are
  initialized or purged/canceled. Both values are stored in the
  mdd_changelog fields mc_minrec and mc_mintime
- test 160g is changed to test the new approach when idle
  indexes are checked always along with idle time checks
- test 160s is added in sanity.sh to check heuristic approach
  with (idle_time * idle_records) value checking
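
The three idle conditions listed above can be sketched as a single predicate. The divisor and threshold are copied verbatim from this commit message; the demo_gc_limits struct, its field names and the overflow guard are assumptions added for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Divisor and threshold copied verbatim from the commit message. */
#define DEMO_GC_DIVISOR   84600ULL
#define DEMO_GC_THRESHOLD (1ULL << 32)

/* Hypothetical per-user limits; the real tunables are not named here. */
struct demo_gc_limits {
	uint64_t max_idle_time;		/* seconds */
	uint64_t max_idle_records;	/* number of records */
};

/* Return true if a changelog user counts as "too idle" under any rule. */
static bool demo_user_too_idle(const struct demo_gc_limits *lim,
			       uint64_t idle_time, uint64_t idle_records)
{
	if (idle_time > lim->max_idle_time)
		return true;
	if (idle_records > lim->max_idle_records)
		return true;

	/* A product that would wrap 64 bits is certainly over the limit. */
	if (idle_records != 0 && idle_time > UINT64_MAX / idle_records)
		return true;

	/* Combined heuristic: long idle time times many idle records. */
	return (idle_time * idle_records) / DEMO_GC_DIVISOR > DEMO_GC_THRESHOLD;
}
```

The combined rule can fire even when neither individual limit is exceeded, for example for a user idle for 30 days with a billion pending records.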

Fixes: 3442db6faf68 ("LU-7340 mdd: changelogs garbage collection")
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I6028f3164212a2377a4fc45b60a826c64f859099
Reviewed-on: https://review.whamcloud.com/45068
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14957 mdd: prepare xattrs before migration 41/44741/2
Lai Siyao [Thu, 5 Aug 2021 15:30:22 +0000 (11:30 -0400)]
LU-14957 mdd: prepare xattrs before migration

In directory migration, the xattrs should be prepared before starting
the transaction; otherwise, if the remote MDT is down, the local
MDT gets stuck as well.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I79279e7b0c051a7542a71066fffd4ad70f559368
Reviewed-on: https://review.whamcloud.com/44741
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14941 lnet: Fix source specified to routed destination 30/44730/5
Chris Horn [Thu, 12 Aug 2021 21:16:05 +0000 (16:16 -0500)]
LU-14941 lnet: Fix source specified to routed destination

If a source NI is specified for a send then we should not modify the
destination NID that was passed to lnet_send().

HPE-bug-id: LUS-10301
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ie47558d5bce97a0dca30ff7d072dcd39eb903324
Reviewed-on: https://review.whamcloud.com/44730
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14940 lnet: Fix source specified send to different net 28/44728/3
Chris Horn [Thu, 12 Aug 2021 21:08:44 +0000 (16:08 -0500)]
LU-14940 lnet: Fix source specified send to different net

The destination NI is fixed for all source-specified sends. Thus, in
order for a source-specified send to be considered "local", i.e. a
send that does not require a route, the destination NID must be on
the same net as the specified source.
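
The "same net" rule can be sketched as follows. The 64-bit NID layout (net in the upper 32 bits, address in the lower 32 bits) and every demo_* name are assumptions made for this illustration, not the LNet implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout for this sketch: upper 32 bits net, lower 32 bits address. */
typedef uint64_t demo_nid_t;

/* Build a NID from its net and address parts. */
static demo_nid_t demo_mk_nid(uint32_t net, uint32_t addr)
{
	return ((demo_nid_t)net << 32) | addr;
}

/* Extract the net part of a NID. */
static uint32_t demo_nid_net(demo_nid_t nid)
{
	return (uint32_t)(nid >> 32);
}

/* A source-specified send is local only if both NIDs share a net. */
static bool demo_src_send_is_local(demo_nid_t src, demo_nid_t dst)
{
	return demo_nid_net(src) == demo_nid_net(dst);
}
```

When the nets differ, the send is not local and needs a route, even if the destination peer also has an interface on the source's net.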

HPE-bug-id: LUS-10303
Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I4847db1d393bbc36def65123f260b928ebbf944e
Reviewed-on: https://review.whamcloud.com/44728
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14939 lnet: Allow specifying a source NID for lnetctl ping 27/44727/5
Chris Horn [Thu, 12 Aug 2021 16:26:07 +0000 (11:26 -0500)]
LU-14939 lnet: Allow specifying a source NID for lnetctl ping

Add a new --source option for lnetctl ping command. This allows the
user to specify a local NI from which to send the ping. This also
ensures that the specified destination NID is also used. Otherwise,
pings to multi-rail peers may end up going to a different peer NI
based on the multi-rail selection algorithm. The ability to specify
a source NI, and thus fix the destination NI, is a great help in
troubleshooting communication issues between multi-rail peers.

Add test to exercise lnetctl ping --source option.

HPE-bug-id: LUS-10296
Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I454217b30a92414de537880f076a11a693b1f0b3
Reviewed-on: https://review.whamcloud.com/44727
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14867 doc: a few words about an asterisk in lfs quota 57/44357/4
Sergey Cheremencev [Thu, 10 Jun 2021 10:08:28 +0000 (13:08 +0300)]
LU-14867 doc: a few words about an asterisk in lfs quota

Clarify the difference between an asterisk printed per OST
in verbose lfs quota output and an asterisk near the whole
filesystem usage.

Change-Id: I778fe1f7b1f6f8d55c311d81bd2b311d82463390
Test-Parameters: trivial
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/44357
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14437 gnilnd: use ktime_get_seconds() to get time 79/41679/4
Shaun Tancheff [Sat, 9 Oct 2021 04:33:23 +0000 (11:33 +0700)]
LU-14437 gnilnd: use ktime_get_seconds() to get time

Use ktime_get_seconds() to directly get the time instead of
getting a timespec and converting it.

Fixes: 4b0e495e3c ("LU-14080 gnilnd: updates for SUSE 15 SP2")
Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I256855ceb9e038a9960fa76fe6e3bfe63fb16580
Reviewed-on: https://review.whamcloud.com/41679
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>