Whamcloud - gitweb
fs/lustre-release.git
23 months ago LU-13578 test: sleep longer in sanity test_39 46/47346/2
John L. Hammond [Fri, 13 May 2022 14:10:59 +0000 (09:10 -0500)]
LU-13578 test: sleep longer in sanity test_39

In sanity test_39r(), sleep for 2 * atime_diff rather than atime_diff + 1.
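
A minimal sketch of the new margin, assuming a hypothetical helper name (atime_wait_secs) that is not part of the test suite. It only illustrates the arithmetic: the wait becomes twice atime_diff rather than atime_diff plus one second.

```shell
# Hypothetical helper: seconds to wait before re-checking atime.
# $1 = atime_diff, the atime update interval in seconds.
atime_wait_secs() {
	local atime_diff="$1"

	# The old margin was $((atime_diff + 1)); the new one doubles it.
	echo $((2 * atime_diff))
}

# Print the wait for a 60-second interval (a real test would sleep).
atime_wait_secs 60
```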

Test-Parameters: trivial testlist=sanity env=ONLY=39r,ONLY_REPEAT=50
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: Ied508e12c848f6935d2317fb86bddc5341a6156e
Reviewed-on: https://review.whamcloud.com/47346
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
23 months ago LU-11170 test: increase time limit in sanity test_415 41/47341/4
John L. Hammond [Fri, 13 May 2022 12:29:59 +0000 (07:29 -0500)]
LU-11170 test: increase time limit in sanity test_415

In sanity test_415() double the time limit on renaming 500
files. Really we should not be doing arbitrary performance tests using
VMs on oversubscribed test nodes but whatever.

Test-Parameters: trivial testlist=sanity env=ONLY=415,ONLY_REPEAT=10
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: Ie2677df00f13ffc85d21ceba5ba7115fff6f0980
Reviewed-on: https://review.whamcloud.com/47341
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Nathaniel Clark <nclark@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
23 months ago LU-15757 test: disable sanityn test_102() for zfs 11/47311/3
John L. Hammond [Thu, 12 May 2022 13:12:10 +0000 (08:12 -0500)]
LU-15757 test: disable sanityn test_102() for zfs

sanityn test_102() on zfs causes clients to crash on umount about half
the time. Disable this test until this can be sorted out.

Test-Parameters: trivial testlist=sanityn fstype=zfs mdtcount=4
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: If9f7c77f8c68eee88b3f050fc26c42c21828e2c9
Reviewed-on: https://review.whamcloud.com/47311
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
23 months ago LU-15830 mdt: mkdir to lookup target name 26/47226/4
Alex Zhuravlev [Fri, 6 May 2022 06:49:56 +0000 (09:49 +0300)]
LU-15830 mdt: mkdir to lookup target name

Distributed mkdir should look up the target name to
avoid rollback as much as possible, as the latter is
very expensive due to llog re-initialization.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: If28760e0afb804dca11e1e7501e0a53ff9067ca1
Reviewed-on: https://review.whamcloud.com/47226
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
23 months ago LU-15757 llite: check s_root ll_md_blocking_ast() 86/47086/2
Alex Zhuravlev [Tue, 19 Apr 2022 12:01:14 +0000 (15:01 +0300)]
LU-15757 llite: check s_root ll_md_blocking_ast()

ll_md_blocking_ast() can be called in the context of import
invalidation, which in turn can be caused by umount. This way
ll_md_blocking_ast() and umount can race, and ll_md_blocking_ast()
can find sb->s_root NULL, which should be checked before
calling into is_root_inode().

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I61c1d29a7de3084ad1dfd0c216cee628418b7038
Reviewed-on: https://review.whamcloud.com/47086
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
23 months ago LU-15827 osd: respect filldir buffer limits 24/47224/3
John L. Hammond [Thu, 5 May 2022 19:21:37 +0000 (14:21 -0500)]
LU-15827 osd: respect filldir buffer limits

In osd_ldiskfs_filldir() ensure that the encoded name also fits in the
buffer. In osd_ldiskfs_filldir() and obj_name2lu_name() remove a
superfluous and potentially incorrect check on names that appear to be
FIDs.

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I1540f058801b002474ceac1206317344cebb1084
Reviewed-on: https://review.whamcloud.com/47224
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
23 months ago Revert "LU-12019 build: Recognize Debian Kernel and set KMP dir" 38/47238/2
Minh Diep [Fri, 6 May 2022 17:14:19 +0000 (10:14 -0700)]
Revert "LU-12019 build: Recognize Debian Kernel and set KMP dir"

This reverts commit 230d4500d5a9dfada392199d77fc413382f24750.

Revert this commit because lustre failed to load on MOFED 5.5.
See LU-15831 for details.

Change-Id: I845431ad2126743c1ca9a59d1b56e1a35dbc9e38
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/47238
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: Oleg Drokin <green@whamcloud.com>
23 months ago LU-15810 sec: fix enc dir migration 01/47201/3
Sebastien Buisson [Tue, 3 May 2022 15:30:18 +0000 (17:30 +0200)]
LU-15810 sec: fix enc dir migration

Now that the encryption context is stored in an xattr named
"encryption.c" instead of "security.c", we need to fetch this xattr
explicitly in case of encrypted directory migration. Indeed, there is
no xattr handler in ldiskfs for this "encryption." xattr type, so it
is not returned when listing all xattrs to migrate.

Fixes: 4231fab66e ("LU-13717 sec: make client encryption compatible with ext4")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I628f9b253e86343db0b71f6a5b1ad2c5728ca38d
Reviewed-on: https://review.whamcloud.com/47201
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
23 months ago Post-branch-off of 2.15 master is now 2.16 [tags: 2.15.50, v2_15_50]
Oleg Drokin [Fri, 6 May 2022 19:13:11 +0000 (15:13 -0400)]
Post-branch-off of 2.15 master is now 2.16

Change-Id: I93ff129b038dff4f541ca9497ba21b77eaf82de8
Signed-off-by: Oleg Drokin <green@whamcloud.com>
23 months ago LU-15787 sec: document enc-unaware clients on enc files 82/47182/2
Sebastien Buisson [Mon, 2 May 2022 13:36:00 +0000 (15:36 +0200)]
LU-15787 sec: document enc-unaware clients on enc files

Document the behavior of encryption-unaware clients when they access
encrypted files.

Test-Parameters: trivial
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I0354e3051e10aa0542baeb8e34c6201d47e65710
Reviewed-on: https://review.whamcloud.com/47182
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
23 months ago New RC 2.15.0-RC4 [tags: 2.15.0-RC4, v2_15_0-RC4]
Oleg Drokin [Thu, 5 May 2022 18:52:44 +0000 (14:52 -0400)]
New RC 2.15.0-RC4

Change-Id: Iabd55b89a5551e5f42f7f6b112c7a7187da8ae7b
Signed-off-by: Oleg Drokin <green@whamcloud.com>
23 months ago LU-15815 llite: disable fast_read and workaround 04/47204/3
John L. Hammond [Tue, 3 May 2022 19:25:36 +0000 (14:25 -0500)]
LU-15815 llite: disable fast_read and workaround

Revert the fast_read stale data workaround from LU-14541 and disable
fast_read by default. The workaround causes applications to receive
spurious SIGBUS signals when reclaim is concurrent with mmap page fault
handlers. We disable fast read to avoid the stale data issue entirely.

This reverts commit f2a16793fa4316fc9ccdc46bcfe54f6b8d1e442b and
re-exposes us to the consistency issues described in LU-14541.

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I716b1fd6ab22242b9267b8883f0371a360aaecef
Reviewed-on: https://review.whamcloud.com/47204
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
23 months ago LU-15761 obdclass: fix locking in llog_cat_refresh() 85/47185/3
Alex Zhuravlev [Mon, 2 May 2022 18:38:47 +0000 (21:38 +0300)]
LU-15761 obdclass: fix locking in llog_cat_refresh()

the patch fixes two problems:
1) pairing up_write() should be used with cathandle
2) llog_read_header() manipulates loghandle's internal
   structures (header, last_idx, etc) which are supposed
   to stay consistent from another user's point of view
   (like llog_add_rec())

Fixes: 71f409c9b31b ("LU-11418 llog: refresh remote llog upon -ESTALE")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ib86e10a925b541d02c22d74e6ddbc4368345ac11
Reviewed-on: https://review.whamcloud.com/47185
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
23 months ago LU-15803 sec: correctly handle page lock in ll_io_zero_page 70/47170/5
Sebastien Buisson [Thu, 28 Apr 2022 13:34:57 +0000 (15:34 +0200)]
LU-15803 sec: correctly handle page lock in ll_io_zero_page

In ll_io_zero_page(), we need to make sure we have locked the page,
and it is up-to-date, before zeroing. So modify ll_io_read_page()
behavior to not disown the clpage for our use case. It avoids being
exposed to concurrent modifications.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I58e4cf80374a798c9c4302364cf2fb39da9033bb
Reviewed-on: https://review.whamcloud.com/47170
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
23 months ago LU-15787 sec: block enc unaware clients on enc files 56/47156/5
Sebastien Buisson [Wed, 27 Apr 2022 15:33:57 +0000 (17:33 +0200)]
LU-15787 sec: block enc unaware clients on enc files

Prevent encryption unaware clients from manipulating encrypted files
and directories. Those can be old clients, or clients built without
encryption support (intentionally or because they run on an old
kernel).
In the mdt layer, check that clients have the OBD_CONNECT2_ENCRYPT
connection flag, and if not, block access if they try to manipulate
a file or directory that has the LUSTRE_ENCRYPT_FL flag.
The forbidden operations from encryption unaware clients are:
- open
- create
- link
- rename
- migrate
Improve sanity-sec test_54 to test this use case.
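
The check described above can be sketched as a small decision function. This is only an illustration in shell, not the mdt code: the bit values are illustrative placeholders, the function name enc_op_allowed is hypothetical, and treating operations outside the listed five as allowed is an assumption.

```shell
# Illustrative bit values only; the real ones live in the Lustre headers.
OBD_CONNECT2_ENCRYPT=$((0x1000))
LUSTRE_ENCRYPT_FL=$((0x800))

# Hypothetical helper. Returns 0 if the operation is allowed, 1 if blocked.
#   $1 = operation name
#   $2 = client connect flags (second flags word)
#   $3 = flags of the target file or directory
enc_op_allowed() {
	local op="$1"
	local cflags2="$2"
	local fflags="$3"

	# Encryption-aware clients are never restricted by this check.
	[ $((cflags2 & OBD_CONNECT2_ENCRYPT)) -ne 0 ] && return 0
	# Unencrypted files and directories are not restricted either.
	[ $((fflags & LUSTRE_ENCRYPT_FL)) -eq 0 ] && return 0

	case "$op" in
	open|create|link|rename|migrate) return 1 ;;
	*) return 0 ;;	# assumption: other operations are not blocked
	esac
}

# An unaware client (no connect flag) touching an encrypted file.
for op in open create link rename migrate; do
	if enc_op_allowed "$op" 0 "$LUSTRE_ENCRYPT_FL"; then
		echo "$op: allowed"
	else
		echo "$op: blocked"
	fi
done
```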

Test-Parameters: testlist=sanity-sec mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2 serverdistro=el7.9
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ief0639e49c0a8e1a1a0cb19cb13c006edfdff6c4
Reviewed-on: https://review.whamcloud.com/47156
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: John L. Hammond <jhammond@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
23 months ago LU-15479 tests: clean up sanity test_316/test_319 59/46959/3
Andreas Dilger [Wed, 30 Mar 2022 18:30:13 +0000 (12:30 -0600)]
LU-15479 tests: clean up sanity test_316/test_319

These tests use the deprecated "lfs mv" instead of "lfs migrate".

Fix test_316 to specify which MDT to create 'd' on, so MDT space
balancing does not result in 'd' being created on MDT0001, which
defeats the purpose of the test trying to migrate it to MDT0001.

Fix test_319 to call "lfs migrate" on the parent directory and
not on the test file, since this returns an "ENOTDIR" error.
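
A dry-run sketch of the pattern described for test_316: create the directory on an explicit MDT, then migrate it to a different one. The helper names (run, migrate_dir_between_mdts) and the /mnt/lustre path are hypothetical; with DRY_RUN set the commands are only printed, so no Lustre mount is needed to try it.

```shell
# Print the command when DRY_RUN is set, execute it otherwise.
run() {
	if [ -n "${DRY_RUN:-}" ]; then
		echo "$*"
	else
		"$@"
	fi
}

# Hypothetical helper.
#   $1 = directory, $2 = MDT index to create on, $3 = MDT index to migrate to
migrate_dir_between_mdts() {
	local dir="$1"
	local src="$2"
	local dst="$3"

	# Pin the directory to a known MDT so space balancing cannot
	# place it on the migration target by accident.
	run lfs mkdir -i "$src" "$dir"
	run lfs migrate -m "$dst" "$dir"
}

DRY_RUN=1 migrate_dir_between_mdts /mnt/lustre/d 0 1
```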

Also fix some minor script style issues.

Test-Parameters: trivial testlist=sanity env=ONLY="316 319",ONLY_REPEAT=100 mdscount=2 mdtcount=4
Test-Parameters: fstype=zfs testlist=sanity env=ONLY="316 319",ONLY_REPEAT=100 mdscount=2 mdtcount=4
Fixes: ae7ca90713b4 ("LU-11926 ldlm: Lost lease lock on migrate error")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I08d55ce70cbd900c2f4d31b27baa1a33423ebbe5
Reviewed-on: https://review.whamcloud.com/46959
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
23 months ago LU-15740 tests: add more stats to runtests 65/47065/3
Andreas Dilger [Wed, 13 Apr 2022 20:18:32 +0000 (14:18 -0600)]
LU-15740 tests: add more stats to runtests

Print out the space usage at the start, middle, and end of
runtests, so that it is easier to see where the space is
going, and how much is used at peak consumption.  The goal
of the test is to avoid space leakage in object create and
destroy, but it is OK if there is some usage for internal
files like llogs, quotas, etc.

Move the "mkdirmany" call to the first phase, add a statmany
to the middle phase, and "rmdirmany" to the end so that it
is also checking the directory validity after a remount.

Update createmany and statmany to be a bit easier to use.

Test-Parameters: trivial testlist=runtests env=SLOW=yes mdscount=2 mdtcount=4
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Iec8cb56501c7e75b620951a2d669b0dd6bb0a36f
Reviewed-on: https://review.whamcloud.com/47065
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: cliff white <cwhite@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
23 months ago LU-15575 tests: interop skip sanity/103e before 2.14.52 25/47125/2
Andreas Dilger [Wed, 20 Apr 2022 19:13:42 +0000 (13:13 -0600)]
LU-15575 tests: interop skip sanity/103e before 2.14.52

Older MDSes can *set* default ACLs with many entries, but it
causes an error when trying to *create* files that inherit them.
Skip test_103e for older MDS versions.
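
A sketch of such a version gate, assuming a simplified stand-in for the version-to-integer helper (this version_code is written for the example and only handles "X.Y.Z" strings) and a hypothetical should_skip_103e function:

```shell
# Simplified stand-in: encode "X.Y.Z" as one comparable integer.
version_code() {
	local major="${1%%.*}"
	local rest="${1#*.}"
	local minor="${rest%%.*}"
	local patch="${rest#*.}"

	echo $(( (major << 16) | (minor << 8) | patch ))
}

# Hypothetical helper: succeed when the MDS is older than 2.14.52,
# meaning test_103e should be skipped.
should_skip_103e() {
	local mds_version="$1"

	[ "$(version_code "$mds_version")" -lt "$(version_code 2.14.52)" ]
}

for v in 2.14.0 2.14.52 2.15.0; do
	if should_skip_103e "$v"; then
		echo "MDS $v: skip test_103e"
	else
		echo "MDS $v: run test_103e"
	fi
done
```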

Test-Parameters: trivial testlist=sanity env=ONLY=103e serverversion=2.14.0
Fixes: aa92caa21fa2 ("LU-14430 mdt: fix maximum ACL handling")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I4fbed3f788020c6d158ba83496ebd5cd68ba57cb
Reviewed-on: https://review.whamcloud.com/47125
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
23 months ago LU-15358 tests: fix check in sanityn test_80b 06/46706/5
Andreas Dilger [Fri, 4 Mar 2022 19:04:10 +0000 (12:04 -0700)]
LU-15358 tests: fix check in sanityn test_80b

Shellcheck found checks in sanityn test_80b using bad logic:

        [ $rc -ne 0 -o $rc -ne 16 ] || {
                echo "touch file failed with $rc1"
                break;
        }

This can never be false, so the subtest will never detect errors.
Fix these checks, along with some related style issues.
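
To see why the quoted condition is always true, and what a working check might look like, here is a small sketch. The function names are hypothetical, and the corrected form assumes the intent was to accept 0 (success) or 16 (EBUSY) and treat anything else as an error; the actual fix in the patch may differ.

```shell
# Form quoted above: no rc equals both 0 and 16, so at least one of
# the two -ne tests always succeeds and the "||" branch never runs.
buggy_check() {
	local rc="$1"

	[ "$rc" -ne 0 -o "$rc" -ne 16 ]
}

# Assumed intent: rc is acceptable only if it is 0 or 16.
rc_is_expected() {
	local rc="$1"

	[ "$rc" -eq 0 ] || [ "$rc" -eq 16 ]
}

for rc in 0 1 16; do
	if buggy_check "$rc"; then buggy=true; else buggy=false; fi
	if rc_is_expected "$rc"; then fixed=ok; else fixed=error; fi
	echo "rc=$rc buggy_check=$buggy fixed=$fixed"
done
```

The buggy form reports true for all three values, while the second form flags only rc=1.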

Test-Parameters: trivial testlist=sanityn mdscount=2 mdtcount=4 env=ONLY=80,ONLY_REPEAT=50
Fixes: 220e6cbfa65c5 ("LU-6475 mdt: race between open and migrate")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I541fe6f3ae253ea1c4d7fa2bcfad9052e374e60c
Reviewed-on: https://review.whamcloud.com/46706
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
23 months ago LU-14826 tests: fix sanityn.sh test_102 version check 93/46693/3
Andreas Dilger [Thu, 3 Mar 2022 18:41:58 +0000 (11:41 -0700)]
LU-14826 tests: fix sanityn.sh test_102 version check

The MDS version check in test_102 had a syntax error, found
by 'shellcheck'.

Also quiet spurious error messages from check_fhandle_syscalls
if the "lctl" binary cannot be found.

Fixes: cbc62b0b829 ("LU-14826 mdt: getattr_name("..") under striped directory")
Test-Parameters: trivial testlist=sanityn mdscount=2 mdtcount=4 env=ONLY=102,ONLY_REPEAT=100
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Iee432bc147f63809fea095c4ce9e1694f7ce7057
Reviewed-on: https://review.whamcloud.com/46693
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
23 months ago LU-15645 obdclass: llog to handle gaps 37/46837/9
Alex Zhuravlev [Wed, 16 Mar 2022 09:10:38 +0000 (12:10 +0300)]
LU-15645 obdclass: llog to handle gaps

Due to old errors, an update llog can contain gaps in its index.
This shouldn't block llog processing and recovery. Actual
gaps in the transaction sequence should be caught by VBR.
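
The idea of tolerating index gaps can be illustrated with a toy loop. This is not the llog code: process_indexes is a hypothetical function that walks a list of record indexes, reports any gap, and keeps going instead of aborting.

```shell
# Hypothetical sketch: process record indexes in order, note gaps,
# and continue rather than stop at the first missing index.
process_indexes() {
	local prev=0
	local gaps=0
	local processed=0
	local idx

	for idx in "$@"; do
		# prev=0 marks "no record seen yet".
		if [ "$prev" -ne 0 ] && [ "$idx" -ne $((prev + 1)) ]; then
			echo "gap: expected $((prev + 1)), found $idx" >&2
			gaps=$((gaps + 1))
		fi
		processed=$((processed + 1))
		prev="$idx"
	done
	echo "processed=$processed gaps=$gaps"
}

process_indexes 1 2 3 5 6 9
```

Every record is still processed; the gaps are only counted and reported on stderr.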

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I11ec817e356f9658118c34706ef3a533e7faba83
Reviewed-on: https://review.whamcloud.com/46837
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
23 months ago LU-15748 osc: fallocate interop for 2.14 clients 98/47098/4
Arshad Hussain [Wed, 20 Apr 2022 09:07:53 +0000 (05:07 -0400)]
LU-15748 osc: fallocate interop for 2.14 clients

fallocate() start and end are passed in o_size and o_blocks
on the wire.  Clients 2.15.0 and newer should always set
the OBD_MD_FLSIZE and OBD_MD_FLBLOCKS valid flags, but some
older client versions did not.  We permit older clients to
not set these flags, checking their version by proxy using
the missing OBD_CONNECT_TRUNCLOCK to imply 2.14.0 or older.
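
A sketch of the acceptance rule described above, written in shell purely for illustration. The bit values are illustrative placeholders, fallocate_request_ok is a hypothetical name, and rejecting the remaining case is an assumption about what the server does.

```shell
# Illustrative bit values only; the real ones live in the Lustre headers.
OBD_MD_FLSIZE=$((0x10))
OBD_MD_FLBLOCKS=$((0x80))
OBD_CONNECT_TRUNCLOCK=$((0x40000))

# Hypothetical helper. Returns 0 to accept a fallocate request, 1 to reject.
#   $1 = valid flags sent with the request
#   $2 = connect flags of the client
fallocate_request_ok() {
	local valid="$1"
	local cflags="$2"
	local need=$((OBD_MD_FLSIZE | OBD_MD_FLBLOCKS))

	# Both valid flags present: acceptable from any client.
	[ $((valid & need)) -eq "$need" ] && return 0
	# Flags missing: tolerate only clients without TRUNCLOCK, which
	# is taken as a proxy for version 2.14.0 or older.
	[ $((cflags & OBD_CONNECT_TRUNCLOCK)) -eq 0 ]
}

if fallocate_request_ok 0 0; then
	echo "old client without valid flags: accepted"
fi
if ! fallocate_request_ok 0 "$OBD_CONNECT_TRUNCLOCK"; then
	echo "client with TRUNCLOCK but no valid flags: rejected"
fi
```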

Test-Parameters: serverversion=2.14.0 testlist=sanity env=SANITY_EXCEPT="64h 103e"
Fixes: 2f496148c31d ("LU-15551 Return EOPNOTSUPP instead of EPROTO")
Fixes: 163870abfb7c ("LU-14382 mdt: implement fallocate in MDC/MDT")
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I1ea47854f40d54297bceb03ad32b24737efa4ae7
Reviewed-on: https://review.whamcloud.com/47098
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
23 months ago LU-15719 doc: Update changelog to properly reflect supported kernels 93/46993/3
Oleg Drokin [Tue, 5 Apr 2022 16:37:32 +0000 (12:37 -0400)]
LU-15719 doc: Update changelog to properly reflect supported kernels

According to https://wiki.lustre.org/Release_2.15.0

Also update e2fsprogs version from the same document

Change-Id: Ife26db67b543bb83651b56df41beaf81fea97721
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46993
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Peter Jones <pjones@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years ago New RC 2.15.0-RC3 [tags: 2.15.0-RC3, v2_15_0-RC3]
Oleg Drokin [Sun, 3 Apr 2022 16:09:39 +0000 (12:09 -0400)]
New RC 2.15.0-RC3

Change-Id: If8b76076bedd13b39dac94ab582e1855feb6add4
Signed-off-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15670 clio: Disable lockless for DIO with O_APPEND 90/46890/4
Shaun Tancheff [Tue, 22 Mar 2022 13:08:35 +0000 (08:08 -0500)]
LU-15670 clio: Disable lockless for DIO with O_APPEND

Lockless O_DIRECT with O_APPEND can allow interleaved / racy
appends from concurrent I/O.

Disable lockless I/O when O_APPEND is set

HPE-bug-id: LUS-9776
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I5c56f92c90e631c295f56e5958985f516e1990f8
Reviewed-on: https://review.whamcloud.com/46890
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15616 lnet: ln_api_mutex deadlocks 27/46727/7
Chris Horn [Mon, 7 Mar 2022 17:03:50 +0000 (11:03 -0600)]
LU-15616 lnet: ln_api_mutex deadlocks

LNetNIFini() acquires the ln_api_mutex and holds onto it throughout
various shutdown routines. Meanwhile, LND threads (via
lnet_nid2peerni_locked()) or the discovery thread (via
lnet_peer_data_present()) may need to acquire this mutex in order to
progress.

Address these potential deadlocks by setting the_lnet.ln_state to
LNET_STATE_STOPPING earlier in LNetNIFini(), and release the mutex
prior to any call into LND module or before any wait.

LNetNIInit() is modified to return -ESHUTDOWN if it finds that there
is a concurrent shutdown in progress.

HPE-bug-id: LUS-10681
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ia8b28cc95ff71e66a0f99aed4f2c22ec9d44ce1e
Reviewed-on: https://review.whamcloud.com/46727
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-13714 lnet: only update gateway NI status on discovery 76/39176/6
Chris Horn [Mon, 14 Feb 2022 20:37:05 +0000 (20:37 +0000)]
LU-13714 lnet: only update gateway NI status on discovery

Move the NI status from DOWN to UP only when receiving
a discovery PING. The discovery PING should be the only
message which should update the NI status since it's used
as the gateway NI keep alive mechanism.

This is done to avoid the following scenario:

The gateway itself can push its updates to the peers which
have removed it from its routing table. The peers would
respond to the PUSH with an ACK, the ACK will bring the
gateway's NI status to up. Therefore other peers which have
avoid_asym_router_failure=1 will have their route status
remain up even though the symmetrical route is gone.

Note: there is no way for the gateway to differentiate between
a keep alive discovery and a manually triggered discovery or ping.
However, this is a narrow case which will not be handled.

net_last_alive converted to use ktime_get_seconds() instead of
ktime_get_real_seconds() since the NTP adjustment is not needed.

Test-Parameters: trivial
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ifd5b06d4cf783b68b36413ada63f0a1d0095fb5b
Reviewed-on: https://review.whamcloud.com/39176
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15363 tests: don't use lustre module to test lnet 34/45834/6
James Simmons [Wed, 30 Mar 2022 13:10:28 +0000 (09:10 -0400)]
LU-15363 tests: don't use lustre module to test lnet

Currently sanity-lnet.sh loads the lustre modules to properly
initialize the lnet modules. This doesn't work with the native
Linux client since it only starts up LNet after mounting the
file system. We shouldn't be using lustre to test lnet so
load lnet modules with config_on_load option to properly setup
the default LNet configuration.
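
A dry-run sketch of loading LNet on its own. The wrapper names (run, load_lnet_only) are hypothetical, and passing config_on_load=1 as the enabling value is an assumption; with DRY_RUN set the command is only printed, so root privileges are not needed to try it.

```shell
# Print the command when DRY_RUN is set, execute it otherwise.
run() {
	if [ -n "${DRY_RUN:-}" ]; then
		echo "$*"
	else
		"$@"
	fi
}

# Hypothetical helper: load lnet without going through the lustre module.
load_lnet_only() {
	# config_on_load asks LNet to set up its default configuration
	# as soon as the module is loaded.
	run modprobe lnet config_on_load=1
}

DRY_RUN=1 load_lnet_only
```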

Also fix ksocklnd-config to use bash so sanity-lnet.sh can
pass on Ubuntu.

Test-Parameters: trivial testlist=sanity-lnet
Change-Id: Ifffc51625f5c2ffbb3ab811b75739c0e6407a821
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/45834
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15545 utils: fix double free in lgss_sk 94/46594/2
Lei Feng [Wed, 23 Feb 2022 10:07:53 +0000 (18:07 +0800)]
LU-15545 utils: fix double free in lgss_sk

Fix double free issue in lgss_sk if write_config_file fails.

Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial
Change-Id: Icd1673b27d699fba78b01fe53f587291e6c36ed6
Reviewed-on: https://review.whamcloud.com/46594
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
2 years ago LU-15692 lmv: change default hash back to fnv_1a_64 50/46950/2
Andreas Dilger [Wed, 30 Mar 2022 04:04:45 +0000 (22:04 -0600)]
LU-15692 lmv: change default hash back to fnv_1a_64

Until the performance issue is resolved, change the default directory
hash type from 'crush' back to 'fnv_1a_64'.

Fixes: bb60caa1c6e7 ("LU-14459 lmv: change default hash type to crush")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: If2cef70298773f42dc1c62809eea98519b3ebbe5
Reviewed-on: https://review.whamcloud.com/46950
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Shuichi Ihara <sihara@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15702 lov: remove lo_trunc_stripeno 40/46940/2
John L. Hammond [Mon, 28 Mar 2022 17:24:54 +0000 (12:24 -0500)]
LU-15702 lov: remove lo_trunc_stripeno

Remove the lo_trunc_stripeno member of struct lov_layout_raid0 and add
an lis_trunc_stripe_index array to struct lov_io. This makes the
truncate stripe index information belong to the IO and not to the
concurrently accessed object. This is needed because we do not have
locking that protects it from its initialization in lov_io_iter_init()
to its use in lov_lock_sub_init(). Also remove the unused
lo_write_lock member of struct lov_object.

Fixes: 9801500451 ("LU-14128 lov: correctly set OST obj size")
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I799e07059212629fe9d84c5e58c349035a40da9e
Reviewed-on: https://review.whamcloud.com/46940
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15626 tests: Fix "error" reported by shellcheck for sanity.sh 41/46741/8
Arshad Hussain [Tue, 8 Mar 2022 15:08:18 +0000 (20:38 +0530)]
LU-15626 tests: Fix "error" reported by shellcheck for sanity.sh

This patch fixes "error" issues reported by shellcheck
for file lustre/tests/sanity.sh

Test-Parameters: trivial testlist=sanity
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I4223aff8bf740ba5710f58fae697768f3f704591
Reviewed-on: https://review.whamcloud.com/46741
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-15626 tests: Fix "error" reported by shellcheck for sanity-lnet 24/46824/2
Arshad Hussain [Tue, 15 Mar 2022 09:07:12 +0000 (14:37 +0530)]
LU-15626 tests: Fix "error" reported by shellcheck for sanity-lnet

This patch fixes "error" issues reported by shellcheck
for file lustre/tests/sanity-lnet.sh

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I55777e6b65a40c90225c40c29e16fbeb44c3411b
Reviewed-on: https://review.whamcloud.com/46824
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-15626 tests: Fix "error" reported by shellcheck for sanity-flr.sh 97/46797/3
Arshad Hussain [Wed, 9 Mar 2022 08:31:24 +0000 (14:01 +0530)]
LU-15626 tests: Fix "error" reported by shellcheck for sanity-flr.sh

This patch fixes "error" issues reported by shellcheck
for file lustre/tests/sanity-flr.sh

Test-Parameters: trivial testlist=sanity-flr
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I5ce498f9b373650885351ea105121139eb4ed35c
Reviewed-on: https://review.whamcloud.com/46797
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15626 tests: Fix "error" reported by shellcheck for recovery-small 41/46841/2
Arshad Hussain [Wed, 16 Mar 2022 12:01:36 +0000 (17:31 +0530)]
LU-15626 tests: Fix "error" reported by shellcheck for recovery-small

This patch fixes "error" issues reported by shellcheck
for file lustre/tests/recovery-small.sh. This patch also
moves spaces to tabs.

Test-Parameters: trivial testlist=recovery-small
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I2db37f0579770ab5a51baf5db832d1f316e7cb14
Reviewed-on: https://review.whamcloud.com/46841
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15626 tests: Fix "error" reported by shellcheck for setup-kerberos 22/46822/2
Arshad Hussain [Tue, 15 Mar 2022 08:28:23 +0000 (13:58 +0530)]
LU-15626 tests: Fix "error" reported by shellcheck for setup-kerberos

This patch fixes "error" issues reported by shellcheck
for file lustre/tests/setup_kerberos.sh. This patch also
moves spaces to tabs.

Change-Id: I803c35b5fc0470a9eeb9ef3c230a0a01adc5b16c
Test-Parameters: envdefinitions=SHARED_KEY=true testlist=sanity
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-on: https://review.whamcloud.com/46822
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years ago LU-15626 tests: Fix "error" reported by shellcheck for sanity-hsm 98/46798/2
Arshad Hussain [Thu, 10 Mar 2022 07:19:28 +0000 (12:49 +0530)]
LU-15626 tests: Fix "error" reported by shellcheck for sanity-hsm

This patch fixes "error" issues reported by shellcheck
for file lustre/tests/sanity-hsm

Test-Parameters: trivial testlist=sanity-hsm
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I14b12affb9ec8b8c943569cedf103d8d5c8ec207
Reviewed-on: https://review.whamcloud.com/46798
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15626 tests: Fix "error" reported by shellcheck for conf-sanity 23/46823/3
Arshad Hussain [Tue, 15 Mar 2022 06:19:06 +0000 (11:49 +0530)]
LU-15626 tests: Fix "error" reported by shellcheck for conf-sanity

This patch fixes "error" issues reported by shellcheck
for file lustre/tests/conf-sanity.sh. This patch also
moves spaces to tabs.

Test-Parameters: trivial testlist=conf-sanity
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I381af41adb5267e2fd879d4f8ef2c3ccdc10cdae
Reviewed-on: https://review.whamcloud.com/46823
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years ago LU-15626 tests: Fix "error" reported by shellcheck for recovery-double-scale 25/46825/5
Arshad Hussain [Tue, 15 Mar 2022 10:19:35 +0000 (15:49 +0530)]
LU-15626 tests: Fix "error" reported by shellcheck for recovery-double-scale

This patch fixes "error" issues reported by shellcheck
for file lustre/tests/recovery-double-scale. This patch
also moves spaces to tabs.

Test-Parameters: trivial clientcount=6 mdtcount=2 mdscount=2 osscount=2 austeroptions=-R failover=true iscsi=1 env=FAILOVER_PERIOD=180 testlist=recovery-double-scale env=SLOW=yes
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Iddb505f717bc87dd314d7bd988ba5db2271d8125
Reviewed-on: https://review.whamcloud.com/46825
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
2 years agoLU-15626 tests: Fix "error" reported by shellcheck for sanity-dom 96/46796/2
Arshad Hussain [Wed, 9 Mar 2022 07:27:25 +0000 (12:57 +0530)]
LU-15626 tests: Fix "error" reported by shellcheck for sanity-dom

This patch fixes "error" issues reported by shellcheck
for file lustre/tests/sanity-dom.sh

Test-Parameters: trivial testlist=sanity-dom
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I185622dcb30abbc43a2118cbfaa9c643f8ce52bc
Reviewed-on: https://review.whamcloud.com/46796
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15626 tests: Fix "error" reported by shellcheck for sanity-lfsck 95/46795/2
Arshad Hussain [Fri, 11 Mar 2022 05:46:19 +0000 (11:16 +0530)]
LU-15626 tests: Fix "error" reported by shellcheck for sanity-lfsck

This patch fixes "error" issues reported by shellcheck
for file lustre/tests/sanity-lfsck

Test-Parameters: trivial testlist=sanity-lfsck
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I3f85cb04f93125ed132831c14a8a3f3616c99a0d
Reviewed-on: https://review.whamcloud.com/46795
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15614 tests: Variable incorrectly used in sanity-flr/203 90/46690/2
Arshad Hussain [Thu, 3 Mar 2022 08:50:52 +0000 (14:20 +0530)]
LU-15614 tests: Variable incorrectly used in sanity-flr/203

In sanity-flr.sh test_203 the variables 'old_id' and
'new_id' are incorrectly referenced as 'oldid' and
'newid'. This was exposed using shellcheck.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In lustre/tests/sanity-flr.sh line 3258:
[[ x$oldid = x$newid ]] ||
^-- SC2154: oldid is referenced but not assigned (did you mean 'old_id'?).
^-- SC2154: newid is referenced but not assigned (did you mean 'new_id'?).
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
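
The effect of the misspelling can be reproduced in isolation. This is a
minimal sketch, not the code from sanity-flr.sh: the function names and
the IDs are hypothetical, and it assumes bash's default behaviour of
expanding unset variables to the empty string.

```shell
#!/bin/bash
# Illustrative only: function names and IDs are hypothetical.
set +u	# keep bash's default, where unset variables expand to ""

# Misspelled form: oldid/newid are never assigned, so this compares
# "x" with "x" and succeeds no matter what was passed in.
ids_match_buggy() {
	local old_id=$1 new_id=$2
	[[ x$oldid = x$newid ]]
}

# Corrected form: compare the variables that were actually assigned.
ids_match_fixed() {
	local old_id=$1 new_id=$2
	[[ x$old_id = x$new_id ]]
}

ids_match_buggy 1 2 && echo "buggy check: 1 and 2 'match'"
ids_match_fixed 1 2 || echo "fixed check: 1 and 2 differ"
```

Because the misspelled comparison always succeeds, a check written this
way can never report a mismatch.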

Test-Parameters: trivial testlist=sanity-flr
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I37949035f50dfdd33e5181ad888f68fcb204c385
Reviewed-on: https://review.whamcloud.com/46690
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-15612 tests: Replace unicode double quotes 87/46687/3
Arshad Hussain [Thu, 3 Mar 2022 04:02:54 +0000 (09:32 +0530)]
LU-15612 tests: Replace unicode double quotes

Under conf-sanity.sh replace unicode double
quotes with ASCII. This was exposed using
shellcheck.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In lustre/tests/conf-sanity.sh line 335:
umount_client $MOUNT -f || error ...
^-- SC1015: This is a unicode double quote. Delete and retype it.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Test-Parameters: trivial testlist=conf-sanity
Change-Id: I89e510a1d16059079378fc9399c1377017870477
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-on: https://review.whamcloud.com/46687
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
2 years agoLU-15548 osd-ldiskfs: hide virtual projid xattr 00/46900/3
Li Dongyang [Wed, 23 Mar 2022 09:42:51 +0000 (20:42 +1100)]
LU-15548 osd-ldiskfs: hide virtual projid xattr

Add tunable enable_projid_xattr to hide the virtual
project ID xattr by default.

Change-Id: I21263d91599f9e2d5850cb9d94a8b6df90c8443c
Test-Parameters: trivial testlist=conf-sanity env=ONLY=131
Test-Parameters: testlist=sanity env=ONLY=904
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/46900
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15637 llite: Fix use of uninitialized fields 76/46776/3
Patrick Farrell [Thu, 10 Mar 2022 03:16:50 +0000 (22:16 -0500)]
LU-15637 llite: Fix use of uninitialized fields

We use data from ci_rw to set io_start_index and
io_end_index, which is a problem for mmap because mmap does
not use ci_rw.

When ci_rand_read is set or readahead is disabled, we use
these values to decide how much data to read.

ci_rw is uninitialized, and if the values are non-zero,
we may try to read data beyond the locks we took for our
I/O.

If there is no lock (either because there was never one or
it was cancelled), this results in an LBUG in
osc_req_attr_set when it verifies the pages are covered by
a lock.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: If7c8d2eb87a28bf76a6f959e7be7bf636c887cfe
Reviewed-on: https://review.whamcloud.com/46776
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15435 ptlrpc: unregister reply buffer on rq_err 32/46132/11
Alexander Zarochentsev [Fri, 14 Jan 2022 15:35:48 +0000 (10:35 -0500)]
LU-15435 ptlrpc: unregister reply buffer on rq_err

Unregister reply buffer on rq_err and prevent a late reply from
modifying request flags in INTERPRET state.

Fixes: cefabee52586 ("LU-15112 mgc: do not ignore target registration failure")
HPE-bug-id: LUS-10717

Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Change-Id: I0106e3fd5443c1292c103247cdbf6122f91922e8
Reviewed-on: https://review.whamcloud.com/46132
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15571 tests: save/restore debug mask for interop 36/46636/5
Andreas Dilger [Sat, 26 Feb 2022 03:17:21 +0000 (20:17 -0700)]
LU-15571 tests: save/restore debug mask for interop

The "createmany" and "unlinkmany" wrappers were saving the debug
mask on the client and restoring it on the server, which fails in
interop testing if the client has a debug mask set that is unknown
on the server.

Use the debugsave() and debugrestore() helpers to do this properly.

Fixes: 40d286e11138 ("LU-15317 llite: Add D_IOTRACE")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I36fc06d5a62e34d63619fa977d9f80254bbc073e
Reviewed-on: https://review.whamcloud.com/46636
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15661 nodemap: fix map mode value for 'both' 70/46870/2
Sebastien Buisson [Fri, 18 Mar 2022 16:43:31 +0000 (01:43 +0900)]
LU-15661 nodemap: fix map mode value for 'both'

The patch that introduced the ability to map project IDs with
nodemap changed the value used for the "map both uid and gid"
case, from 0 to 3.
This poses a problem in case of upgrade from a previous Lustre
version, so re-introduce the value 0 as NODEMAP_MAP_BOTH_LEGACY.

Change-Id: I1f605de9c97faff32411da5052e8782a60645767
Fixes: 8a770616a5 ("LU-14797 sec: add projid to nodemap")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-on: https://review.whamcloud.com/46870
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15551 ofd: Return EOPNOTSUPP instead of EPROTO 16/46516/13
Arshad Hussain [Mon, 14 Feb 2022 10:02:08 +0000 (15:32 +0530)]
LU-15551 ofd: Return EOPNOTSUPP instead of EPROTO

Modify server to return -EOPNOTSUPP instead of
-EPROTO for unsupported fallocate modes

Test-Parameters: serverversion=2.14.0 testlist=sanity env=ONLY=150
Test-Parameters: serverversion=2.14.0 testlist=sanity-flr env=ONLY=50
Test-Parameters: serverversion=2.14.0 testlist=ost-pools env=ONLY="29 31"
Test-Parameters: serverversion=2.14.0 testlist=sanity-benchmark env=ONLY=fsx
Test-Parameters: serverversion=2.14.0 testlist=sanity-dom env=ONLY=fsx
Test-Parameters: serverversion=2.14.0 testlist=sanityn env=ONLY=16
Fixes: 7462e8cad730 ("LU-14160 fallocate: Add punch mode to fallocate")
Signed-off-by: arshad.hussain@aeoncomputing.com
Change-Id: Id203c0b9abbdd674af33f1f78e81ae7fe105e90f
Reviewed-on: https://review.whamcloud.com/46516
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15634 ptlrpc: Use after free of 'conn' in rhashtable retry 63/46763/2
Shaun Tancheff [Wed, 9 Mar 2022 08:53:24 +0000 (02:53 -0600)]
LU-15634 ptlrpc: Use after free of 'conn' in rhashtable retry

Fix a use after free of 'conn' in the uncommon case of
rhashtable_lookup_get_insert_fast() failing with -EBUSY or -ENOMEM.

Move OBD_FREE_PTR(conn) below the retry and set conn2 to NULL
on error, propagating to conn and returning to the caller.

HPE-bug-id: LUS-10776
Fixes: 37b29a8f70 ("LU-8130 ptlrpc: convert conn_hash to rhashtable")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I2fb27d4e8fa6a5324d0a8e06afe34a39fa622bc2
Reviewed-on: https://review.whamcloud.com/46763
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15506 tests: fix sha1sum error for disk images with multiple mdts 04/46404/5
Andreas Dilger [Tue, 1 Feb 2022 10:20:53 +0000 (03:20 -0700)]
LU-15506 tests: fix sha1sum error for disk images with multiple mdts

For new disk2_10-ldiskfs and disk2_12-ldiskfs images,
check remote_dir for sha1sum test. For dne image, check
striped_dir.

One minor ost2 replace_nids fix for DNE test images.

Add verbose debug messages to make test flow more clear.

Test-Parameters: trivial testlist=conf-sanity env=ONLY=32 mdscount=1 mdtcount=1
Test-Parameters: testlist=conf-sanity env=ONLY=32 mdscount=2 mdtcount=4
Fixes: f2143c0790bb ("LU-11643 tests: add new images and tests for upgrade")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Wei Liu <sarah@whamcloud.com>
Change-Id: I582bcdbf72d72e6da636559a24b1ecc89553c895
Reviewed-on: https://review.whamcloud.com/46404
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15546 mdt: mdt_reint_open lookup before locking 79/46679/4
Etienne AUJAMES [Wed, 2 Mar 2022 17:58:20 +0000 (18:58 +0100)]
LU-15546 mdt: mdt_reint_open lookup before locking

This patch is an optimization of 33dc40d ("LU-10262 mdt:
mdt_reint_open: check EEXIST without lock").

The current behavior is to take a LCK_PR on the parent to verify whether
the file exists and then take a LCK_PW to create the file.

Here we do a lookup to determine the mode before taking a lock.
This avoids re-locking each time for create cases.

Most of the time we have:
1. lookup the child in parent directory
2. take the parent lock: file_exist ? LCK_PR : LCK_PW
3. re-lookup the child

In a race scenario (create/unlink) we have:
1. lookup child in parent directory -> file exists
2. take a LCK_PR on the parent
3. re-lookup the child -> file doesn't exist
4. take a LCK_PW on the parent
5. re-lookup the child

This patch also fixes the "SKIP" condition for sanityn 41i/43k/45j and
clears the LRU locks cache for sanityn 43k/45j.

Fixes: 33dc40d ("LU-10262 mdt: mdt_reint_open: check EEXIST without lock")
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: I121abd4babfb516d7a64682b054a6443d38590ef
Reviewed-on: https://review.whamcloud.com/46679
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15608 sec: fix DIO for encrypted files 64/46664/6
Sebastien Buisson [Tue, 1 Mar 2022 16:26:09 +0000 (17:26 +0100)]
LU-15608 sec: fix DIO for encrypted files

With Direct IO, we do not have proper page cache pages. So we need to
retrieve by ourselves the page mapping and the page index of the page
to be encrypted/decrypted.

For the index, we need to use the offset of the page within the file,
and not the object.
So we rename cl_page's cp_osc_index to cp_page_index for that purpose.
cp_osc_index is redundant with osc_async_page's oap_obj_off and only
used by osc_index(), so we also adapt this function.
cp_page_index is initialized in cl_page_alloc(), and accessed in
the OSC layer where the llcrypt primitives are called.

For the mapping, the problem is that page->mapping is not set to NULL
on page allocation, so it cannot safely be used to see if a page is a
direct I/O page.
Use cl_page for direct I/O and page->mapping for buffered
I/O.  (clpage->cp_inode is only set for direct I/O and
cannot easily be always set.)
Without this, we sometimes get panics when page2inode is
used in the OSC layer.  (Note the remaining use in dom is
safe because ll_dom_readpage is a page cache helper and
will never see DIO pages.)

Fixes: a71e0dd7f7 ("LU-14306 sec: get rid of bad rss-counter state messages")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Icb53a4e45463b8d3febc2e6212b39dc25719d866
Reviewed-on: https://review.whamcloud.com/46664
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15021 quota: protect lqe_glbl_data in lqe 98/45098/9
Hongchao Zhang [Wed, 13 Oct 2021 09:02:23 +0000 (17:02 +0800)]
LU-15021 quota: protect lqe_glbl_data in lqe

The lqe_glbl_data in "struct lquota_entry" is allocated in
qmt_lvbo_init and freed in qmt_lvbo_free. It could be freed
during qmt_seed_glbe (called by qmt_set_id_notify), causing a
panic because of the use of freed memory.

Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: I274f07ee8609c83852572be51625cc929a9130ec
Reviewed-on: https://review.whamcloud.com/45098
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15589 tests: skip sanity test_230s for interop 98/46598/2
Andreas Dilger [Wed, 23 Feb 2022 19:24:12 +0000 (12:24 -0700)]
LU-15589 tests: skip sanity test_230s for interop

Sanity test_230s was added in 2.14.52 but incorrectly checked for
MDS version 2.13.57 for interop, likely because that was the version
present at the time the patch was originally written, but it was
only landed later.
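
The effect of the wrong threshold can be sketched as follows. The
helper names are stand-ins written for this illustration and are not
the test framework's own code; the versions are the ones named above
and in Test-Parameters.

```shell
#!/bin/bash
# Hypothetical stand-in: pack "a.b.c" into one comparable integer.
version_code() {
	local IFS=.
	local -a v=($1)
	echo $(( (${v[0]:-0} << 16) | (${v[1]:-0} << 8) | ${v[2]:-0} ))
}

# Succeeds when the MDS version is at least the required minimum.
mds_at_least() {
	local mds_version=$1 min_version=$2
	(( $(version_code "$mds_version") >= $(version_code "$min_version") ))
}

# Old check (2.13.57) lets the test run against a 2.14.0 server;
# the corrected check (2.14.52) skips it.
mds_at_least 2.14.0 2.13.57 && echo "old check: test runs on 2.14.0"
mds_at_least 2.14.0 2.14.52 || echo "fixed check: test skipped on 2.14.0"
```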

Test-Parameters: trivial serverversion=2.14.0 testlist=sanity env=ONLY=230s
Fixes: 65e3e4050ec5 ("LU-14366 mdt: lfs mkdir should return -EEXIST if exists")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I5d046ad224a29558866453972c9c33b5da3a9037
Reviewed-on: https://review.whamcloud.com/46598
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Alena Nikitenko <anikitenko@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15467 tests: fix sanity-hsm test_103a timeout issue 52/46252/2
Etienne AUJAMES [Fri, 21 Jan 2022 14:49:18 +0000 (15:49 +0100)]
LU-15467 tests: fix sanity-hsm test_103a timeout issue

Add an MDS version check to "sanity-hsm test_103a" for interop testing.
Limit the number of parallel HSM restore requests to
max_rpcs_in_flight.

Fixes: b449f3d ("LU-15145 hsm: unlock the restore layout lock for a cancel")
Test-Parameters: trivial
Test-Parameters: testlist=sanity-hsm env=ONLY=103a,ONLY_REPEAT=20
Test-Parameters: testlist=sanity-hsm
Signed-off-by: Etienne AUJAMES <etienne.aujames@cea.fr>
Change-Id: I78098042d1316cdcc9d2e25860099a0ffdba2503
Reviewed-on: https://review.whamcloud.com/46252
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alena Nikitenko <anikitenko@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15576 osp: Interop skip sanity test 823 for MDS < 2.14.56 67/46567/4
Shaun Tancheff [Tue, 22 Feb 2022 07:28:50 +0000 (01:28 -0600)]
LU-15576 osp: Interop skip sanity test 823 for MDS < 2.14.56

Prior to v2_14_55-29-g06e586016d, setting create_count greater
than the maximum returned -ERANGE.

During interop testing skip sanity/823 for MDS older than 2.14.56.

Test-Parameters: trivial serverversion=2.14.0 testlist=sanity env=ONLY=823
Fixes: 06e586016d3a ("LU-13941 osp: Silently lower requested create_count to maximum")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ie79617deea047b2a846f696473b9c2b5681953be
Reviewed-on: https://review.whamcloud.com/46567
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-15601 osd-ldiskfs: handle read_inode_bitmap() error 60/46660/3
Andreas Dilger [Tue, 1 Mar 2022 05:14:37 +0000 (22:14 -0700)]
LU-15601 osd-ldiskfs: handle read_inode_bitmap() error

Correctly handle a PTR_ERR() error return from read_inode_bitmap().
This changed in upstream kernel commit v4.3-rc2-17-g9008a58e5dce,
so handle this for both types of return value.

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I184c09b300ed69c29e4a7ef343f473b67080381f
Reviewed-on: https://review.whamcloud.com/46660
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15513 lod: skip uninit component in lod_fill_mirrors 35/46435/5
Andreas Dilger [Wed, 2 Feb 2022 22:05:18 +0000 (15:05 -0700)]
LU-15513 lod: skip uninit component in lod_fill_mirrors

Do not iterate over the "objects" in lod_fill_mirrors() to check
for non-rotational OSTs if the component is uninitialized.  In
cases where an OST is not present (e.g. sparse OST indexes used)
the lod_tgt_desc[] array has holes and OST_TGT() returns NULL.

Skip the loop entirely if the component is not initialized, but
also add some sanity checks to verify that the OST index values
are sane in case there are other problems in the future (e.g.
corrupt/invalid layout on disk).

Fixes: 8507472dd37e ("LU-14996 lov: prefer mirrors on non-rotational OSTs")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I8ec23367059a4ec9e483adb768095b24f03ebbe5
Reviewed-on: https://review.whamcloud.com/46435
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15316 tests: use integers in sanity test_255a 50/46350/2
Andreas Dilger [Fri, 28 Jan 2022 05:51:24 +0000 (22:51 -0700)]
LU-15316 tests: use integers in sanity test_255a

The [[ ... > ... ]] operator doesn't really compare floats, it
compares strings.  That works as expected if the strings are
the same length, but fails for comparisons like [[ 32 > 123 ]].
Use (( ... > ... )) for comparisons, and only use integer values.
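
The difference between the two operators can be seen with the values
from the example above. This is a minimal sketch; the helper names are
hypothetical and are not taken from sanity.sh.

```shell
#!/bin/bash
# [[ a > b ]] compares strings in collation order, character by character.
str_gt() { [[ "$1" > "$2" ]]; }

# (( a > b )) evaluates both sides as integers.
int_gt() { (( $1 > $2 )); }

str_gt 32 123 && echo "string compare: 32 > 123"
int_gt 32 123 || echo "integer compare: 32 <= 123"
```

The string form ranks "32" above "123" because '3' sorts after '1',
which is why only the arithmetic form is safe for numeric limits.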

This test has been failing intermittently forever, but the error
was ignored because of running in a VM.

Test-Parameters: trivial
Fixes: f3b8f3fad502 ("tests: fix float comparison in sanity test_255a")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I6787082cd579ae3f1bdd43222a739c939d3ebbe5
Reviewed-on: https://review.whamcloud.com/46350
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15010 tests: skip sanity test_64g for interop 65/46565/4
Andreas Dilger [Sun, 20 Feb 2022 18:43:33 +0000 (11:43 -0700)]
LU-15010 tests: skip sanity test_64g for interop

Sanity test_64g checks code that was only added in 2.14.56.

Test-Parameters: trivial serverversion=2.14.0 testlist=sanity env=ONLY=64g
Fixes: 6e116213e3fd ("LU-15010 mdc: add support for grant shrink")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I339231f1b7890e8fffe7e079a052b15f54d4a050
Reviewed-on: https://review.whamcloud.com/46565
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alena Nikitenko <anikitenko@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14514 tests: skip sanity-flr test_44b for interop 64/46564/3
Andreas Dilger [Sun, 20 Feb 2022 18:28:08 +0000 (11:28 -0700)]
LU-14514 tests: skip sanity-flr test_44b for interop

Sanity-flr test_44b checks code that was only added in 2.14.56.

Test-Parameters: trivial serverversion=2.14.0 testlist=sanity-flr env=ONLY=44b
Fixes: 83c790cbf2f8 ("LU-14514 flr: mirror split should not make stale file")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I65cc59dbde3f9711b915a56730e221e224e9b715
Reviewed-on: https://review.whamcloud.com/46564
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15060 tests: skip sanity-flr test_208 in interop 63/46563/4
Andreas Dilger [Sun, 20 Feb 2022 18:07:19 +0000 (11:07 -0700)]
LU-15060 tests: skip sanity-flr test_208 in interop

Sanity-flr test_208[ab] check a feature that only landed in 2.14.55.

Test-Parameters: trivial serverversion=2.14.0 testlist=sanity-flr env=ONLY=208
Fixes: 8507472dd37e ("LU-14996 lov: prefer mirrors on non-rotational OSTs")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I7735582ed9683d686396c14e2a4e254c648f7546
Reviewed-on: https://review.whamcloud.com/46563
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Alena Nikitenko <anikitenko@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15584 utils: ppc64le __le64_to_cpu type mismatch 88/46588/2
Gian-Carlo DeFazio [Tue, 22 Feb 2022 20:23:30 +0000 (12:23 -0800)]
LU-15584 utils: ppc64le __le64_to_cpu type mismatch

Cast values returned by __le64_to_cpu to
long long unsigned int. This is to match print format
strings that use %llx. This mismatch was resulting in a
build failure for ppc64le.

Build log message:
llog_reader.c:921:42: error: format '%llx' expects
argument of type 'long long unsigned int', but
argument 3 has type 'long unsigned int'

Fixes: 80447caf980 LU-14926 utils: print unlink and setattr recs in llog_reader
Signed-off-by: Gian-Carlo DeFazio <defazio1@llnl.gov>
Change-Id: I939b94626d2707b6ff644324c5c2798218331c4d
Reviewed-on: https://review.whamcloud.com/46588
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-15574 tests: Skip test sanity/77o in interop 68/46568/4
Arshad Hussain [Mon, 21 Feb 2022 04:38:30 +0000 (10:08 +0530)]
LU-15574 tests: Skip test sanity/77o in interop

Test sanity/77o (server checksum proc entries) was
introduced in 2.14.55.

During interop testing, skip sanity/77o for
MDS and OST versions older than 2.14.55.

Test-Parameters: trivial serverversion=2.14.0 testlist=sanity env=ONLY=77o
Fixes: c18d5d892b62 LU-14889 lproc: Add server checksum_type
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Idb634ca349a6be01331057a473cc15747325a075
Reviewed-on: https://review.whamcloud.com/46568
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-15577 tests: fix interop issue 66/46566/5
Alexander Zarochentsev [Sun, 20 Feb 2022 19:50:44 +0000 (19:50 +0000)]
LU-15577 tests: fix interop issue

Sanity test 831 expects the MDS to have the osp.*.max_sync_changes
tunable, which appeared in 2.14.56.
Add a check to skip the test on older MDSes.

Fixes: c226e70007 ("LU-15114 osp: changes queuing throttle")
Test-Parameters: trivial serverversion=2.12 serverdistro=el7.9 testlist=sanity env=ONLY=831
Test-Parameters: trivial testlist=sanity env=ONLY=831
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: I911f7b0d9dd606f08f544fce55bf8bcfe9fb69e3
Reviewed-on: https://review.whamcloud.com/46566
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 years agoLU-15572 util: mirror delete with old MDS 14/46614/5
Bobi Jam [Fri, 25 Feb 2022 08:34:07 +0000 (16:34 +0800)]
LU-15572 util: mirror delete with old MDS

An old MDS does not support mirror delete without a volatile file and
clobbers the intent close error as -EBUSY. This patch catches the
ambiguous error and retries the mirror delete using the old way.

Fixes: b2d73351e6 ("LU-14521 flr: delete mirror without volatile file")
Test-Parameters: trivial
Test-Parameters: serverversion=2.14.0 testlist=sanity env=ONLY="0 50 60 61 203"
Test-Parameters: clientversion=2.14.0 testlist=sanity env=ONLY="0 50 60 61 203"
Test-Parameters: serverversion=2.12.8 testlist=sanity env=ONLY="0 50 60 61 203" serverdistro=el7.9
Test-Parameters: clientversion=2.12.8 testlist=sanity env=ONLY="0 50 60 61 203"
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I497118cbb7da871268f0fdd6bdb88ad6bd831a26
Reviewed-on: https://review.whamcloud.com/46614
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15512 lnet: Stop discovery on deleted peer NI 29/46429/2
Chris Horn [Wed, 2 Feb 2022 18:37:00 +0000 (18:37 +0000)]
LU-15512 lnet: Stop discovery on deleted peer NI

lnet_discover_peer_locked() needs to check whether the peer NI that is
undergoing discovery has been deleted (i.e. its associated peer has
LNET_PEER_MARK_DELETED state). Otherwise, we may enter an infinite
loop because this peer will never be considered up to date.

Test-Parameters: trivial testlist=sanity-lnet
Fixes: fd32cd817c ("LU-13895 lnet: Prevent discovery on deleted peer")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I43d276fc460241c1724c8e30913bb6c5cbb7c8f4
Reviewed-on: https://review.whamcloud.com/46429
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15555 ldiskfs: large directory causes htree corruption 26/46526/2
Andrew Perepechko [Mon, 14 Feb 2022 13:35:10 +0000 (16:35 +0300)]
LU-15555 ldiskfs: large directory causes htree corruption

When creating a lot of files in a single directory, it can
get corrupted because of a typo in ext4-kill-dx-root.patch.

Change-Id: Ia36278580741e1eb905e24a3a6231ba7daaa882a
Fixes: 20a6d32 ("LU-12637 kernel: RHEL 8.1 server support")
HPE-bug-id: LUS-10730
Signed-off-by: Andrew Perepechko <c17827@cray.com>
Signed-off-by: Alexander Zarochentsev <c17826@cray.com>
Signed-off-by: Artem Blagodarenko <artem.blagodarenko@hpe.com>
Reviewed-on: https://review.whamcloud.com/46526
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoNew RC 2.15.0-RC2 2.15.0-RC2 v2_15_0-RC2
Oleg Drokin [Tue, 8 Feb 2022 00:12:39 +0000 (19:12 -0500)]
New RC 2.15.0-RC2

Change-Id: Idfbc2ff63d48e2b3ca4801905e1d6ee7667ac427
Signed-off-by: Oleg Drokin <green@whamcloud.com>
2 years agoNew release candidate 2.15.0-RC1 2.15.0-RC1 v2_15_0-RC1
Oleg Drokin [Mon, 7 Feb 2022 23:42:00 +0000 (18:42 -0500)]
New release candidate 2.15.0-RC1

Change-Id: I6a62dffa8d2a1159b9a0abfd0659f8544a0daeab
Signed-off-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15422 build: Update ZFS version to 2.0.7 06/46006/8
James Simmons [Thu, 3 Feb 2022 19:23:16 +0000 (14:23 -0500)]
LU-15422 build: Update ZFS version to 2.0.7

Update ZFS version to 2.0.7. The changes are listed in:
https://github.com/openzfs/zfs/releases/tag/zfs-2.0.7

Change-Id: I5dcff31af1458c5c9d2fe17256e31751535578d8
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/46006
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15404 ldiskfs: truncate during setxattr leads to kernel panic 58/46358/9
Andrew Perepechko [Mon, 31 Jan 2022 16:55:31 +0000 (19:55 +0300)]
LU-15404 ldiskfs: truncate during setxattr leads to kernel panic

When changing a large xattr value to a different large xattr value,
the old xattr inode is freed. Truncate during the final iput causes
the current transaction to restart. Eventually, the parent inode bh is
marked dirty and a kernel panic happens when jbd2 figures out that this
bh belongs to the committed transaction.

A possible fix is to call this final iput in a separate thread.
This way, setxattr transactions will never be split into two.
Since the setxattr code adds xattr inodes with nlink=0 into the
orphan list, old xattr inodes will be properly cleaned up in
any case.

Change-Id: Idd70befa6a83818ece06daccf9bb6256812674b9
Signed-off-by: Andrew Perepechko <andrew.perepechko@hpe.com>
HPE-bug-id: LUS-10534
Reviewed-on: https://review.whamcloud.com/46358
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-12585 mdt: Add read/write latency to MDT stats 29/46229/4
Patrick Farrell [Thu, 27 Jan 2022 21:42:23 +0000 (16:42 -0500)]
LU-12585 mdt: Add read/write latency to MDT stats

The MDT does not currently record latency stats for reads
and writes.

Add this, and change the naming to be the same as for the
OFD.

Note on this:
Existing naming on the MDT uses "read/write" instead of
"{read,write}_bytes", which is inconsistent with OFD and
also inconsistent within the MDT, since other ops without
the "_bytes" suffix are latency.

It's not ideal to change the names of existing stats, but I
decided this was less problematic than leaving them
inconsistent.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I7b7a5742678cbe0269086f37877833e877a5ca5f
Reviewed-on: https://review.whamcloud.com/46229
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-12585 obdfilter: Use actual I/O bytes in stats 75/46075/7
Patrick Farrell [Thu, 27 Jan 2022 21:37:46 +0000 (16:37 -0500)]
LU-12585 obdfilter: Use actual I/O bytes in stats

Currently the obdfilter stats note the number of bytes
requested by the client rather than the number of bytes
actually read or written.  This is particularly confusing
for reads because clients can request more data than
exists and some applications do this normally.

This results in statistics that can be off by almost any
amount from the actual number of bytes read.  This patch
moves the stats to be collected just before commit, which
allows the true number of bytes to be recorded but does not
include the commit time in the time stats.  (Since commit
time is not part of the operation latency as experienced by
the client.)
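
The gap between requested and actual bytes can be illustrated outside
Lustre with a minimal bash sketch. The temporary file and the 4096-byte
request size are assumptions chosen for illustration; nothing here
touches the obdfilter code itself.

```bash
#!/bin/bash
# Create a 100-byte file, then "request" 4096 bytes from it, the way a
# client may ask for more data than the object holds.
tmpfile=$(mktemp)
printf 'x%.0s' {1..100} > "$tmpfile"

requested=4096
actual=$(head -c "$requested" "$tmpfile" | wc -c)
actual=$(( actual ))	# drop any whitespace padding from wc
rm -f "$tmpfile"

# Counting $requested would overstate the read; $actual is the true size.
echo "requested=$requested actual=$actual"
```

Recording the second number is what the message above means by
"actual I/O bytes".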

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I81fe9a6afdad5b48e8421f4aa72b8ef10a0eee93
Reviewed-on: https://review.whamcloud.com/46075
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15503 quota: fix list entry usage 80/46380/5
Yang Sheng [Sat, 29 Jan 2022 14:24:17 +0000 (22:24 +0800)]
LU-15503 quota: fix list entry usage

Fetch next list entry.

Fixes: d527e81246 ("LU-15283 quota: deadlock between reint & lquota_wb")
Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: I86befdfaa96151a6fd61902ffbf43ee8e5cae8cb
Reviewed-on: https://review.whamcloud.com/46380
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15417 build: find the new path for MOFED 5.5 83/46383/3
Jian Yu [Sun, 30 Jan 2022 08:09:37 +0000 (00:09 -0800)]
LU-15417 build: find the new path for MOFED 5.5

The path of the MOFED header files has changed to
/usr/src/ofa_kernel/x86_64/<kernel>,
so we cannot assume it's /usr/src/ofa_kernel/default.

Besides updating lbuild, we also need to update
lustre-lnet.m4 and lustre.spec.in.

Test-Parameters: trivial

Change-Id: Iab42ce9e458f78b0dc0233ac6fd23a1760be5324
Fixes: 94a3f1bfa70 ("LU-15417 build: build MOFED 5.5")
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46383
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15502 llite: set default LMV hash type with 2.12 MDS 78/46378/8
Lai Siyao [Sat, 29 Jan 2022 05:14:40 +0000 (00:14 -0500)]
LU-15502 llite: set default LMV hash type with 2.12 MDS

If the default LMV hash type is CRUSH, or unset, it should be converted
to fnv_1a_64, because a 2.12 MDS doesn't understand CRUSH.

Fix LMV_HASH_FLAG_KNOWN to match actual known flags.

Test-Parameters: testlist=sanity env=ONLY=300 mdtcount=2 serverversion=2.12 serverdistro=el7.9
Fixes: 0a1cf8da8069 ("LU-11025 dne: introduce new directory hash type: "crush"")
Fixes: bb60caa1c6e7 ("LU-14459 lmv: change default hash type to crush")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ie2ad5a456040dcd01bc2c5ab96db52bf944abbd2
Reviewed-on: https://review.whamcloud.com/46378
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14008 o2iblnd: avoid memory copy for short msg 62/40262/14
Alexey Lyashkov [Wed, 12 Aug 2020 14:59:50 +0000 (17:59 +0300)]
LU-14008 o2iblnd: avoid memory copy for short msg

Modern cards can send data from kernel memory without mapping it
or copying it to the preallocated buffer.
This reduces lnet selftest CPU consumption by 3% for messages
smaller than 4k.

Test-Parameters: trivial
HPE-bug-id: LUS-1796
Change-Id: I96c31be680c8ea7ac289a755df7f1d4c1c7f9aef
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-on: https://review.whamcloud.com/40262
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15446 lnet: Don't use pref NI for reserved portal 78/46078/4
Chris Horn [Wed, 12 Jan 2022 19:19:21 +0000 (19:19 +0000)]
LU-15446 lnet: Don't use pref NI for reserved portal

Don't use the preferred NI when sending traffic on the LNet reserved
portal. This allows local recovery pings to utilize any local NI as
source in the case where we do not have a multi-rail peer entry for
the local host. This is typically the case when MR is not being
configured statically (i.e. when discovery is being used for MR
configuration).

lnet_get_best_ni() was modified to include health values of the NIs
being compared in its debug output.

HPE-bug-id: LUS-10658
Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I38f5760bf034f698b7f44ffa89aa91c4f5d4b9ea
Reviewed-on: https://review.whamcloud.com/46078
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15440 lnet: lnet_peer_data_present() memory leak 52/46052/3
Chris Horn [Tue, 11 Jan 2022 22:19:16 +0000 (16:19 -0600)]
LU-15440 lnet: lnet_peer_data_present() memory leak

If the ping buffer has nnis <= 1 then the ref on the ping buffer does
not get dropped. This causes a memory leak.

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I5e3c651ffecbe4f8860afb86770cecef23ebe862
Reviewed-on: https://review.whamcloud.com/46052
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15428 contrib: add branch_comm 31/46031/3
John L. Hammond [Mon, 10 Jan 2022 16:59:02 +0000 (10:59 -0600)]
LU-15428 contrib: add branch_comm

Add a branch comparison (branch_comm) to contrib/scripts.

Test-Parameters: trivial
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I13c0b90a48d6d3215bf9959242c5671e83d27d7a
Reviewed-on: https://review.whamcloud.com/46031
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Peter Jones <pjones@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15400 tests: sanity-lfsck MDT_DEVNAME fix 25/46025/8
Elena Gryaznova [Fri, 28 Jan 2022 14:34:59 +0000 (17:34 +0300)]
LU-15400 tests: sanity-lfsck MDT_DEVNAME fix

The global MDT_DEVNAME, set at the start of sanity-lfsck to a
device-mapper device, cannot be used after stop() because the
device-mapper device is removed and the facet device is
restored:
  stop () ->
     elif dm_flakey_supported $facet; then
        if [[ -n ${!failover_host} && ${!failover_host} != ${!host} ]]
           dm_cleanup_dev $facet ->
              unexport_dm_dev $facet

Without this fix the tests:
    1a, 1b, 1c, 2a, 2b, 2c, 2d, 4, 5, 7a, 7b, 8, 30
fail on failover setup with:
    losetup: /dev/mapper/mds1_flakey: failed to set up loop device

To reproduce the failure just run:
  sh llmountcleanup.sh
  sh sanity-lfsck.sh
on failover setup where mds1_HOST != mds1failover_HOST.
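
The stale-global problem can be modelled with a small bash sketch. All
names below (DM_ACTIVE, current_device, stop_facet, the /tmp path) are
hypothetical stand-ins and not test-framework.sh functions; the point is
only that a device name cached before the stop no longer matches what a
fresh lookup returns.

```bash
#!/bin/bash
# Hypothetical model: DM_ACTIVE records whether the dm-flakey mapping exists.
DM_ACTIVE=yes

# Hypothetical helper: resolve the device the facet uses right now.
current_device() {
	if [[ $DM_ACTIVE == yes ]]; then
		echo /dev/mapper/mds1_flakey
	else
		echo /tmp/lustre-mdt1
	fi
}

# Hypothetical stand-in for stop(): the device-mapper mapping is removed.
stop_facet() {
	DM_ACTIVE=no
}

cached_dev=$(current_device)	# set once, like a global at script start
stop_facet
fresh_dev=$(current_device)	# looked up again after the stop

echo "cached=$cached_dev fresh=$fresh_dev"
```

Anything that still uses the cached value after the stop points at a
device that no longer exists.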

Fixes: 54b9e3f ("LU-684 tests: replace dev_read_only patch with dm-flakey")
Test-Parameters: trivial testlist=sanity-lfsck
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-10667
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: I2736406161d67335f465cf70eb9f21347a8a798f
Reviewed-on: https://review.whamcloud.com/46025
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15334 tests: cleanup conf-sanity test_30a 11/45811/2
Elena Gryaznova [Thu, 9 Dec 2021 18:17:14 +0000 (21:17 +0300)]
LU-15334 tests: cleanup conf-sanity test_30a

Fix typo: use error() instead of the nonexistent fail(),
and localize some variables.

Fixes: 5e546603cb ("b=15253 add conf_param -d to remove permanent settings")
Test-Parameters: trivial testlist=conf-sanity env=ONLY=30a
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Change-Id: I970d8422ba8ba75aca922d8ac6bac09c7cfcd67d
Reviewed-on: https://review.whamcloud.com/45811
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
2 years agoLU-15506 tests: skip conf-sanity test_32b until fixed 03/46403/2
Andreas Dilger [Tue, 1 Feb 2022 10:04:19 +0000 (03:04 -0700)]
LU-15506 tests: skip conf-sanity test_32b until fixed

The new disk2_10-ldiskfs and disk2_12-ldiskfs images are failing
conf-sanity test_32b.  Rather than remove the images themselves,
which are large and would consume more space in Gerrit if removed
and re-added, instead skip them until the test can be fixed.

Test-Parameters: trivial testlist=conf-sanity env=ONLY=32
Fixes: f2143c0790bb ("LU-11643 tests: add new images and tests for upgrade tests")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I5e6af27669773f67b16786aeffc24e995e3ebbe5
Reviewed-on: https://review.whamcloud.com/46403
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
2 years agoLU-15268 mdt: reveal the real intent close error code 36/45636/6
Bobi Jam [Wed, 1 Dec 2021 12:05:49 +0000 (20:05 +0800)]
LU-15268 mdt: reveal the real intent close error code

mdt_mfd_close() clobbers the intent close error so that user space
tool only knows that the close intent hasn't finished and reports
-EBUSY instead of the real error code.

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I72f474a73e8b73cdc35ca38eaaec5af182f63ca7
Reviewed-on: https://review.whamcloud.com/45636
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Peter Jones <pjones@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15459 llite: clear async errors on write commit sync 78/46178/4
Vladimir Saveliev [Mon, 24 Jan 2022 17:13:59 +0000 (20:13 +0300)]
LU-15459 llite: clear async errors on write commit sync

Async errors should be cleared after vvp_io_commit_sync(). Otherwise,
that will be done in ll_flush() called from
linux/fs/open.c:filp_close() and close(2) will fail. ll_flush()
replaces any error code with EIO which is confusing.

A test illustrating the issue is added.
A 'P' mode is added to multiop. It is like 'w' but makes only one write
call regardless of how many bytes were written.

HPE-bug-id: LUS-7529
Signed-off-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Change-Id: I6b7a1465268999b48a50f3584f3821f4b088303d
Reviewed-on: https://review.whamcloud.com/46178
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14802 nodemap: return proper error code 26/45626/9
Andreas Dilger [Fri, 19 Nov 2021 21:51:09 +0000 (14:51 -0700)]
LU-14802 nodemap: return proper error code

In nodemap_add_range_helper() it was always returning -ENOMEM when
there was an error inserting a new range into the existing nodemap.

    nodemap_add_range_helper()) cannot insert nodemap range: rc = -17
    mgs_iocontrol_nodemap()) MGS: OBD_IOC_NODEMAP command: rc = -12

This was confusing because the error returned by range_insert() was
typically -EEXIST (i.e. the entry being inserted already was in the
nodemap).  Do not print an error to the console in this common case.

Return the actual error to the caller so that this is more clear
to the end user.  Have l_ioctl() always set errno on error, in
addition to returning the error, since many callers depend on this.

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I2c80a11dfdf9e6e1c9a8235b8f74f5bcea68c08e
Reviewed-on: https://review.whamcloud.com/45626
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Etienne AUJAMES <eaujames@ddn.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15478 lnet: Check LNET_NID_IS_ANY in LNET_NID_NET 92/46292/3
Chris Horn [Mon, 24 Jan 2022 22:02:25 +0000 (16:02 -0600)]
LU-15478 lnet: Check LNET_NID_IS_ANY in LNET_NID_NET

If LNET_NID_NET is passed the wildcard NID (LNET_ANY_NID) then we
should return the wildcard net (LNET_NET_ANY). This also allows NULL
to be used as an argument to LNET_NID_NET.

Fixes: 005bd7075c ("LU-10391 lnet: Change lnet_send() to take large-addr nids")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ic2a7c9af31dcba285c266a872462cf179ab603fa
Reviewed-on: https://review.whamcloud.com/46292
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15218 quota: delete unused quota ID 48/45548/14
Hongchao Zhang [Fri, 21 Jan 2022 00:43:56 +0000 (08:43 +0800)]
LU-15218 quota: delete unused quota ID

Add lfs option '--delete' to delete unused quota ID.

Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: I0d8e6b61dc23c7b22b6054bcced087b8dc94a277
Reviewed-on: https://review.whamcloud.com/45548
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13176 mdd: rename file with different project ID 60/45660/8
Hongchao Zhang [Tue, 11 Jan 2022 15:12:55 +0000 (23:12 +0800)]
LU-13176 mdd: rename file with different project ID

This patch relaxes the restriction on renames between different
project IDs, allowing normal file renames between
directories with different project IDs.

Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: I4a2c21248d1e12ad1d00430e11e5dd50fe5eaf60
Reviewed-on: https://review.whamcloud.com/45660
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15477 osc: osc_extent_wait() deadlock 81/46281/2
Andriy Skulysh [Tue, 11 Jan 2022 07:30:49 +0000 (09:30 +0200)]
LU-15477 osc: osc_extent_wait() deadlock

Thread 1:
vvp_io_write_commit
osc_io_commit_async
osc_page_cache_add
osc_extent_find
osc_extent_wait

Thread 2:
ptlrpcd_check
ptlrpc_check_set
brw_queue_work
osc_extent_make_ready
vvp_page_make_ready_start
__lock_page

We must not hold a page lock while we do osc_extent_find()

Change-Id: Idf669bc8d9c943f28e3f5986826b9637d66ecfca
HPE-bug-id: LUS-10414
Fixes: a7299cb012 ("LU-9920 vvp: dirty pages with pagevec")
Signed-off-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-on: https://review.whamcloud.com/46281
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15471 tests: use proper facet device 54/46254/2
Elena Gryaznova [Fri, 21 Jan 2022 15:59:14 +0000 (18:59 +0300)]
LU-15471 tests: use proper facet device

Tests that stop a facet must recalculate the facet device
after the stop, because it changes when device mapper is used:
the device-mapper device is removed and the facet device is
restored:
  stop () ->
     elif dm_flakey_supported $facet; then
        if [[ -n ${!failover_host} && ${!failover_host} != ${!host} ]]
           dm_cleanup_dev $facet ->
              unexport_dm_dev $facet

Without this fix, sanity tests 17m, 17n, and 804 fail on failover
setup with:
  Cannot resolve path /dev/mapper/mds1_flakey
  e2fsck: No such file or directory while trying
                     to open /dev/mapper/mds1_flakey
and sanity tests 228b and 256 fail because of:
  mount: /dev/mapper/mds1_flakey: failed to setup loop device:
                     No such file or directory
  losetup: /dev/mapper/mds1_flakey: failed to set up loop device

To reproduce the failures, just run:
  ONLY="17m 17n 228b 256 804" sh sanity.sh
on failover setup where mds1_HOST != mds1failover_HOST.

Fixes: 54b9e3f789 ("LU-684 tests: replace dev_read_only patch with dm-flakey")
Test-Parameters: trivial testlist=sanity
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-9808
Change-Id: I02ce9d7cb7cf804fe0596d9aad7f995242c4b3af
Reviewed-on: https://review.whamcloud.com/46254
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15465 tests: conf-sanity failed with code 95 36/46236/3
Elena Gryaznova [Fri, 21 Jan 2022 07:24:28 +0000 (10:24 +0300)]
LU-15465 tests: conf-sanity failed with code 95

conf-sanity tests 27b, 47 and 84 (which execute 'fail mds1' and
then 'cleanup' at the end of the test) failed with EOPNOTSUPP because,
under 'set -e', 'lfs df <non_lustre>' returns code 95.
The scenario:
test_27b () {
  facet_failover $SINGLEMDS
    change_active mds1
  ...
  cleanup -> umount_client $MOUNT
}
formatall
  stopall
    activemds=`facet_active mds1`
    if [ $activemds != "mds1" ]; then
       fail mds1
         clients_up
           lfs_df_check
             + local clients=fre0111,fre0112
             + local rc
             + [ -z fre0111,fre0112 ]
             + pdsh -S -w fre0111,fre0112
                 /usr/bin/lfs df /mnt/lustre << lustre not mounted
pdsh@fre0111: fre0111: ssh exited with exit code 95
pdsh@fre0111: fre0112: ssh exited with exit code 95

To reproduce the issue just run:
  ONLY="27b" sh conf-sanity.sh or:
  ONLY="47" sh conf-sanity.sh or:
  ONLY="84" sh conf-sanity.sh
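
The interaction with 'set -e' can be sketched in plain bash. Here
fake_lfs_df is a hypothetical stub standing in for 'lfs df' on a
non-Lustre path, and run_under_errexit is an illustrative helper; the
guarded form is one common way to tolerate the status and is not
necessarily what this patch does.

```bash
#!/bin/bash
# Run a snippet in a fresh bash with errexit enabled and print its
# exit status.
run_under_errexit() {
	local status=0
	bash -ec "$1" >/dev/null 2>&1 || status=$?
	echo "$status"
}

# Hypothetical stub: behaves like 'lfs df <non_lustre>' returning 95.
stub='fake_lfs_df() { return 95; }'

# Unguarded: the failing call aborts the script with status 95.
unguarded=$(run_under_errexit "$stub; fake_lfs_df; echo not reached")

# Guarded: capturing the status lets the script carry on.
guarded=$(run_under_errexit "$stub; rc=0; fake_lfs_df || rc=\$?; echo continuing")

echo "unguarded=$unguarded guarded=$guarded"
```

The unguarded run mirrors the "exit code 95" lines in the trace above:
the script dies at the first non-zero status rather than at a test
assertion.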

Fixes: 2d714041ba ("LU-8962 lfs: Handle non-lustre and multiple args")
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-10680
Change-Id: Ibbe8d624fe341282f55bf8e5140f6362432d64cf
Reviewed-on: https://review.whamcloud.com/46236
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14676 lnet: improve hash distribution across CPTs 33/46233/6
Serguei Smirnov [Thu, 20 Jan 2022 16:40:28 +0000 (08:40 -0800)]
LU-14676 lnet: improve hash distribution across CPTs

Change the nid-to-cpt allocation function to use
(sum-by-multiplication of nid bytes) mod (number of CPTs)
to match nid to a CPT. This patch only addresses IPV4 nids.

Make the matching change for the nid-to-cpt function
used by the 'lnetctl cpt-of-nid' utility.
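
One reading of "sum-by-multiplication" is that the bytes of the NID are
added up with a single multiply, and the sum is then reduced modulo the
CPT count. The bash sketch below follows that reading; the function
names and the sample NID value are assumptions for illustration, and the
kernel code may differ in detail.

```bash
#!/bin/bash
# Sum the eight bytes of a 64-bit NID value with one multiplication.
nid_byte_sum() {
	local nid=$(( $1 ))
	local pair_bits=$(( 0x0001000100010001 ))
	local mask=$(( pair_bits * 0xFF ))	# 0x00FF00FF00FF00FF
	# Add neighbouring bytes so each 16-bit lane holds one pair sum.
	local pair_sum=$(( (nid & mask) + ((nid >> 8) & mask) ))
	# The multiply accumulates all four lanes into the top lane; the
	# product wraps at 64 bits, which bash arithmetic does silently.
	echo $(( ((pair_sum * pair_bits) >> 48) & 0xFFFF ))
}

# Map a NID value to a CPT index in the range 0..ncpts-1.
nid_to_cpt() {
	local nid=$1 ncpts=$2
	echo $(( $(nid_byte_sum "$nid") % ncpts ))
}

# Hypothetical NID value: network bits in the high word, 10.0.0.1 below.
nid=0x000200000a000001
echo "byte sum: $(nid_byte_sum "$nid"), CPT of 4: $(nid_to_cpt "$nid" 4)"
```

Because every address byte contributes to the sum, NIDs that differ in
any octet can land on different CPTs.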

Test-Parameters: trivial testlist=sanity-lnet

Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I1052414947c4cae8c63993ffa21f67cb389bb463
Reviewed-on: https://review.whamcloud.com/46233
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15176 sec: present .fscrypt in subdir mount 67/46167/3
Sebastien Buisson [Wed, 12 Jan 2022 10:13:44 +0000 (11:13 +0100)]
LU-15176 sec: present .fscrypt in subdir mount

fscrypt userspace tool works with a .fscrypt directory at the root of
the file system. In case of subdirectory mount, we virtually present
this .fscrypt directory at the root of the mount point so that fscrypt
can be used. This makes it possible to even do a subdirectory mount of
an encrypted directory, making clients access encrypted content only.
Internally, the .fscrypt directory is always stored at the root of
Lustre.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I2a0ee360f724da1df49b2be0df986d52e06f45fd
Reviewed-on: https://review.whamcloud.com/46167
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15452 utils: support lctl getattr for osc 31/46131/3
John L. Hammond [Fri, 14 Jan 2022 16:58:59 +0000 (10:58 -0600)]
LU-15452 utils: support lctl getattr for osc

In lctl:jt_obd_getattr(), support FIDs in addition to OIDs and print
whatever valid attributes were returned. Add a supporting
OBD_IOC_GETATTR case to osc_iocontrol().

  # function lctl_osc_device() {
    # Find osc device name for file and index.
    # lctl_osc_device /mnt/lustre/... 42 => lustre-OST002a-osc-ffff89cca1555000
    local path="$1"
    local index="$2"
    local fsname=$(lfs getname --fsname "$path")
    local instance=$(lfs getname --instance "$path")

    printf '%s-OST%04x-osc-%s\n' "$fsname" "$index" "$instance"
  }
  # lfs getstripe /mnt/lustre/f0 | grep l_ost_idx
        - 0: { l_ost_idx: 1, l_fid: [0x100010000:0x2:0x0] }
        - 1: { l_ost_idx: 2, l_fid: [0x100020000:0x2:0x0] }
        - 0: { l_ost_idx: 3, l_fid: [0x100030000:0x2:0x0] }
        - 1: { l_ost_idx: 0, l_fid: [0x100000000:0x2:0x0] }
  # lctl --device $(lctl_osc_device /mnt/lustre 1) getattr '[0x100010000:0x2:0x0]'
  valid: 0x110000001008fff
  oi.oi.oi_id: 0x100020000
  oi.oi.oi_seq: 0x2
  oi.oi_fid: [0x100020000:0x2:0x0]
  atime: 0
  mtime: 1642178551
  ctime: 1642178551
  size: 0
  blocks: 0
  blksize: 4194304
  mode: 0107666
  uid: 0
  gid: 0
  flags: 2097152
  layout_version: 3
  projid: 0
  data_version: 4294967298
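
The device-name format built by lctl_osc_device can be checked without a
mounted file system. The sketch below feeds the printf format the values
from the comment in the function above (fsname "lustre", index 42, and
the sample instance string) instead of calling 'lfs getname'.

```bash
#!/bin/bash
# Values taken from the example comment; a real run would get fsname
# and instance from 'lfs getname'.
fsname=lustre
index=42
instance=ffff89cca1555000

# %04x prints the OST index as four zero-padded hex digits.
device=$(printf '%s-OST%04x-osc-%s' "$fsname" "$index" "$instance")
echo "$device"
```

The index is decimal on input and hexadecimal in the device name, so
index 42 appears as OST002a.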

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I57d5778e9ac39030ae9477a0979f20b7f7460fc8
Reviewed-on: https://review.whamcloud.com/46131
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15398 lnet: Avoid peer NI recovery for local interface 33/45933/10
Chris Horn [Thu, 23 Dec 2021 20:15:27 +0000 (14:15 -0600)]
LU-15398 lnet: Avoid peer NI recovery for local interface

If a MR peer has a MR peer entry for itself (can happen if manually
created or discovery is run on itself for some reason), then it is
possible for it to put its own interfaces into peer recovery. Problems
with local interfaces should be handled via local NI recovery.

Test-Parameters: trivial testlist=sanity-lnet
HPE-bug-id: LUS-10661
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I5b28195979a6113fa863b5795a4528b072610891
Reviewed-on: https://review.whamcloud.com/45933
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15398 tests: Use remote peers for health tests 75/45975/8
Chris Horn [Tue, 4 Jan 2022 20:42:26 +0000 (14:42 -0600)]
LU-15398 tests: Use remote peers for health tests

LNet health may take different action depending on whether a NID
belongs to the local host or a remote peer. As such, the test cases
need to be careful to use remote or local NIs appropriately.

Introduce helper functions to create and cleanup LNet peers that are
needed for these tests. Convert existing test cases to use the new
helpers.

New function, lnet_if_list(), is added to test-framework.sh to
facilitate configuration of remote interfaces. do_rpc_nodes() modified
to recognize '--quiet' flag to ease parsing of lnet_if_list() output.

Tests 204 and 206 were re-worked to check the health state after each
simulated error. lnet_health_post() modified to reset peer and local
NI health so they are at max value when each error condition is
simulated.

Tests 214, 215, and 250 were using hardcoded "eth0" names. These were
switched to use the INTERFACES variable.

The lnet_recovery_limit parameter is deprecated so remove lines that
were setting that parameter.

Test-Parameters: trivial testlist=sanity-lnet
HPE-bug-id: LUS-10661
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I685fda8a84bcce024a765ddfc81c085acf24607a
Reviewed-on: https://review.whamcloud.com/45975
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14935 tests: Use FAIL_CHECK_QUIET for fake i/o 51/44651/4
Patrick Farrell [Thu, 12 Aug 2021 20:28:29 +0000 (16:28 -0400)]
LU-14935 tests: Use FAIL_CHECK_QUIET for fake i/o

Logging to the console is relatively expensive and doing it
for fake i/o is very expensive in terms of CPU time.

If we use FAIL_CHECK_QUIET, a debug message is logged only once
to the console, and the rest at D_INFO level (probably not at all).

This should hugely reduce the CPU cost of the debugging.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I46a5042efd116a4f5c80eaf0d5dae7fe132f6a79
Reviewed-on: https://review.whamcloud.com/44651
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Etienne AUJAMES <eaujames@ddn.com>