Whamcloud - gitweb
fs/lustre-release.git
LU-15661 nodemap: fix map mode value for 'both' 70/46870/2
Sebastien Buisson [Fri, 18 Mar 2022 16:43:31 +0000 (01:43 +0900)]
LU-15661 nodemap: fix map mode value for 'both'

The patch that introduced the ability to map project IDs with
nodemap changed the value used for the "map both uid and gid"
case, from 0 to 3.
This poses a problem in case of upgrade from a previous Lustre
version, so re-introduce the value 0 as NODEMAP_MAP_BOTH_LEGACY.
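
For illustration, a minimal C sketch of how the legacy value can be accepted. Only the value 0 for NODEMAP_MAP_BOTH_LEGACY and the value 3 for "both" come from the message above; the individual uid (0x1), gid (0x2) and projid (0x4) bit values are assumed, and the demo_* helpers are hypothetical.

```c
/* Map-mode values: 0 and 3 are from the commit message; the individual
 * uid/gid/projid bit values are assumed for this sketch. */
enum nodemap_map_mode {
	NODEMAP_MAP_BOTH_LEGACY	= 0x0,	/* "both" before projid mapping */
	NODEMAP_MAP_UID		= 0x1,
	NODEMAP_MAP_GID		= 0x2,
	NODEMAP_MAP_BOTH	= 0x3,	/* NODEMAP_MAP_UID | NODEMAP_MAP_GID */
	NODEMAP_MAP_PROJID	= 0x4,
};

/* Hypothetical helper: treat a legacy 0 read from an older release's
 * configuration as "map uid and gid". */
static unsigned int demo_normalize_map_mode(unsigned int mode)
{
	if (mode == NODEMAP_MAP_BOTH_LEGACY)
		return NODEMAP_MAP_BOTH;
	return mode;
}

/* Hypothetical helper: does this mode map uids? */
static int demo_maps_uid(unsigned int mode)
{
	return (demo_normalize_map_mode(mode) & NODEMAP_MAP_UID) != 0;
}
```

In this sketch, without the normalization a stored 0 would match neither the uid nor the gid bit, so an upgraded server would stop mapping ids.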

Change-Id: I1f605de9c97faff32411da5052e8782a60645767
Fixes: 8a770616a5 ("LU-14797 sec: add projid to nodemap")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-on: https://review.whamcloud.com/46870
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-15551 ofd: Return EOPNOTSUPP instead of EPROTO 16/46516/13
Arshad Hussain [Mon, 14 Feb 2022 10:02:08 +0000 (15:32 +0530)]
LU-15551 ofd: Return EOPNOTSUPP instead of EPROTO

Modify server to return -EOPNOTSUPP instead of
-EPROTO for unsupported fallocate modes
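
A hypothetical sketch of the server-side mode check. The DEMO_FL_* constants copy the Linux FALLOC_FL_KEEP_SIZE (0x01) and FALLOC_FL_PUNCH_HOLE (0x02) values under different names, and the set of accepted modes is an assumption, not Lustre's actual list.

```c
#include <errno.h>

/* Same values as FALLOC_FL_KEEP_SIZE and FALLOC_FL_PUNCH_HOLE in
 * <linux/falloc.h>; renamed to avoid clashing with system headers. */
#define DEMO_FL_KEEP_SIZE	0x01
#define DEMO_FL_PUNCH_HOLE	0x02

/* Hypothetical check: accept plain preallocation and punch-hole (which
 * Linux requires to be combined with KEEP_SIZE); report anything else as
 * "operation not supported" instead of as a protocol error. */
static int demo_fallocate_mode_check(int mode)
{
	if (mode == 0 || mode == DEMO_FL_KEEP_SIZE)
		return 0;
	if (mode == (DEMO_FL_PUNCH_HOLE | DEMO_FL_KEEP_SIZE))
		return 0;
	return -EOPNOTSUPP;	/* the old code returned -EPROTO here */
}
```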

Test-Parameters: serverversion=2.14.0 testlist=sanity env=ONLY=150
Test-Parameters: serverversion=2.14.0 testlist=sanity-flr env=ONLY=50
Test-Parameters: serverversion=2.14.0 testlist=ost-pools env=ONLY="29 31"
Test-Parameters: serverversion=2.14.0 testlist=sanity-benchmark env=ONLY=fsx
Test-Parameters: serverversion=2.14.0 testlist=sanity-dom env=ONLY=fsx
Test-Parameters: serverversion=2.14.0 testlist=sanityn env=ONLY=16
Fixes: 7462e8cad730 ("LU-14160 fallocate: Add punch mode to fallocate")
Signed-off-by: arshad.hussain@aeoncomputing.com
Change-Id: Id203c0b9abbdd674af33f1f78e81ae7fe105e90f
Reviewed-on: https://review.whamcloud.com/46516
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: Oleg Drokin <green@whamcloud.com>
LU-15634 ptlrpc: Use after free of 'conn' in rhashtable retry 63/46763/2
Shaun Tancheff [Wed, 9 Mar 2022 08:53:24 +0000 (02:53 -0600)]
LU-15634 ptlrpc: Use after free of 'conn' in rhashtable retry

Use after free of 'conn' in the uncommon case of
rhashtable_lookup_get_insert_fast failing with -EBUSY or -ENOMEM.

Move OBD_FREE_PTR(conn) below the retry and set conn2 to NULL
on error, propagating to conn and returning to the caller.
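
The fixed ordering can be sketched in userspace C. demo_lookup_get_insert() is a hypothetical stand-in for rhashtable_lookup_get_insert_fast() that reports errors through an out parameter instead of ERR_PTR(), and it simulates a configurable number of -EBUSY failures. The point is only that the new conn is freed after the retry loop, never inside it.

```c
#include <errno.h>
#include <stdlib.h>

struct conn { int peer; };

static struct conn *table_slot;	/* one-entry "hash table" */
static int busy_left;		/* number of simulated -EBUSY failures */

/* Returns NULL if @new was inserted, or the entry already present;
 * *err is set to a negative errno on failure. */
static struct conn *demo_lookup_get_insert(struct conn *new, int *err)
{
	*err = 0;
	if (busy_left > 0) {
		busy_left--;
		*err = -EBUSY;
		return NULL;
	}
	if (table_slot)
		return table_slot;
	table_slot = new;
	return NULL;
}

static struct conn *demo_conn_get(int peer)
{
	struct conn *conn, *conn2;
	int err;

	conn = malloc(sizeof(*conn));
	if (!conn)
		return NULL;
	conn->peer = peer;

	do {	/* conn must stay allocated for every retry */
		conn2 = demo_lookup_get_insert(conn, &err);
	} while (err == -EBUSY);

	if (err)
		conn2 = NULL;		/* report the failure to the caller */
	else if (!conn2)
		return conn;		/* our new entry was inserted */

	/* An existing entry won or an error occurred: only now, after the
	 * retry loop, is it safe to free the unused allocation. */
	free(conn);
	return conn2;
}
```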

HPE-bug-id: LUS-10776
Fixes: 37b29a8f70 ("LU-8130 ptlrpc: convert conn_hash to rhashtable")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I2fb27d4e8fa6a5324d0a8e06afe34a39fa622bc2
Reviewed-on: https://review.whamcloud.com/46763
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-15506 tests: fix sha1sum error for disk images with multiple mdts 04/46404/5
Andreas Dilger [Tue, 1 Feb 2022 10:20:53 +0000 (03:20 -0700)]
LU-15506 tests: fix sha1sum error for disk images with multiple mdts

For new disk2_10-ldiskfs and disk2_12-ldiskfs images,
check remote_dir for sha1sum test. For dne image, check
striped_dir.

Also include a minor ost2 replace_nids fix for DNE test images.

Add verbose debug messages to make the test flow clearer.

Test-Parameters: trivial testlist=conf-sanity env=ONLY=32 mdscount=1 mdtcount=1
Test-Parameters: testlist=conf-sanity env=ONLY=32 mdscount=2 mdtcount=4
Fixes: f2143c0790bb ("LU-11643 tests: add new images and tests for upgrade")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Wei Liu <sarah@whamcloud.com>
Change-Id: I582bcdbf72d72e6da636559a24b1ecc89553c895
Reviewed-on: https://review.whamcloud.com/46404
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-15546 mdt: mdt_reint_open lookup before locking 79/46679/4
Etienne AUJAMES [Wed, 2 Mar 2022 17:58:20 +0000 (18:58 +0100)]
LU-15546 mdt: mdt_reint_open lookup before locking

This patch is an optimization of 33dc40d ("LU-10262 mdt:
mdt_reint_open: check EEXIST without lock").

The current behavior is to take a LCK_PR on the parent to verify whether
the file exists, and then take a LCK_PW to create the file.

Here we do a lookup to determine the mode before taking a lock.
This avoids re-locking every time in the create case.

Most of the time we have:
1. lookup the child in parent directory
2. take the parent lock: file_exist ? LCK_PR : LCK_PW
3. re-lookup the child

In a race scenario (create/unlink) we have:
1. lookup child in parent directory -> file exists
2. take a LCK_PR on the parent
3. re-lookup the child -> file doesn't exist
4. take a LCK_PW on the parent
5. re-lookup the child
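
The flow above can be sketched as follows. Locking itself is not modelled: the hypothetical demo_dir simply scripts the answers returned by successive lookups so that the create/unlink race can be replayed.

```c
#include <stdbool.h>

enum demo_lock_mode { DEMO_LCK_PR, DEMO_LCK_PW };

/* Hypothetical directory: answers of successive lookups, which may change
 * when another client races a create or unlink. */
struct demo_dir {
	bool child_exists[4];
	int nlookups;
};

static bool demo_lookup(struct demo_dir *dir)
{
	return dir->child_exists[dir->nlookups++];
}

/* Returns the parent lock mode finally held; *relocked counts retries. */
static enum demo_lock_mode demo_open_lock(struct demo_dir *dir, int *relocked)
{
	/* 1. unlocked lookup to pick the mode */
	enum demo_lock_mode mode = demo_lookup(dir) ? DEMO_LCK_PR : DEMO_LCK_PW;

	*relocked = 0;
	for (;;) {
		/* 2. take the parent lock in @mode (not modelled) */
		/* 3. re-lookup the child under the lock */
		bool exists = demo_lookup(dir);

		/* PR is enough to open an existing file; PW can create. */
		if (exists || mode == DEMO_LCK_PW)
			return mode;
		/* Raced with unlink: drop PR and retake the parent as PW. */
		mode = DEMO_LCK_PW;
		(*relocked)++;
	}
}
```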

This patch also fixes the "SKIP" condition for sanityn 41i/43k/45j and
clears the LRU lock cache for sanityn 43k/45j.

Fixes: 33dc40d ("LU-10262 mdt: mdt_reint_open: check EEXIST without lock")
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: I121abd4babfb516d7a64682b054a6443d38590ef
Reviewed-on: https://review.whamcloud.com/46679
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-15608 sec: fix DIO for encrypted files 64/46664/6
Sebastien Buisson [Tue, 1 Mar 2022 16:26:09 +0000 (17:26 +0100)]
LU-15608 sec: fix DIO for encrypted files

With Direct IO, we do not have proper page cache pages, so we need to
retrieve the page mapping and the page index of the page to be
encrypted/decrypted ourselves.

For the index, we need to use the offset of the page within the file,
and not the object.
So we rename cl_page's cp_osc_index to cp_page_index for that purpose.
cp_osc_index is redundant with osc_async_page's oap_obj_off and only
used by osc_index(), so we also adapt this function.
cp_page_index is initialized in cl_page_alloc(), and accessed in
the OSC layer where the llcrypt primitives are called.

For the mapping, the problem is that page->mapping is not set to NULL
on page allocation, so it cannot safely be used to tell whether a page
is a direct I/O page.
Use cl_page for direct I/O and page->mapping for buffered
I/O.  (clpage->cp_inode is only set for direct I/O and
cannot easily be always set.)
Without this, we sometimes get panics when page2inode is
used in the OSC layer.  (Note the remaining use in dom is
safe because ll_dom_readpage is a page cache helper and
will never see DIO pages.)
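
A simplified sketch of the two decisions described above. The structs are hypothetical stand-ins for struct page and cl_page, and a 4 KiB page size is assumed: the page index is derived from the offset within the file, and the inode comes from the cl_page for direct I/O but from the page mapping for buffered I/O.

```c
#include <stddef.h>

#define DEMO_PAGE_SHIFT 12	/* assumes 4 KiB pages */

struct demo_inode { unsigned long ino; };

/* Hypothetical, simplified views of a VM page and a Lustre cl_page. */
struct demo_page {
	struct demo_inode *mapping_host;	/* not reliable for DIO pages */
};
struct demo_cl_page {
	struct demo_page *cp_vmpage;
	struct demo_inode *cp_inode;		/* only set for direct I/O */
	unsigned long cp_page_index;		/* index within the file */
	int cp_dio;
};

/* Index by position in the file, not in the OST object. */
static unsigned long demo_file_page_index(unsigned long long file_off)
{
	return (unsigned long)(file_off >> DEMO_PAGE_SHIFT);
}

/* DIO pages carry the inode in the cl_page; buffered pages are in the
 * page cache and reach it through their mapping. */
static struct demo_inode *demo_page_inode(const struct demo_cl_page *clp)
{
	if (clp->cp_dio)
		return clp->cp_inode;
	return clp->cp_vmpage->mapping_host;
}
```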

Fixes: a71e0dd7f7 ("LU-14306 sec: get rid of bad rss-counter state messages")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Icb53a4e45463b8d3febc2e6212b39dc25719d866
Reviewed-on: https://review.whamcloud.com/46664
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-15021 quota: protect lqe_glbl_data in lqe 98/45098/9
Hongchao Zhang [Wed, 13 Oct 2021 09:02:23 +0000 (17:02 +0800)]
LU-15021 quota: protect lqe_glbl_data in lqe

The lqe_glbl_data in "struct lquota_entry" is allocated in
qmt_lvbo_init and freed in qmt_lvbo_free. It can be freed while
qmt_seed_glbe (called by qmt_set_id_notify) is still using it,
causing a panic due to the use of freed memory.
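
One generic way to close such a race is to detach and free the pointer under a lock that every user also holds. This sketch uses a pthread mutex and hypothetical demo_* names; the lock type and structure of the actual fix are not described in the message above.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical, simplified quota entry. */
struct demo_lqe {
	pthread_mutex_t lqe_lock;
	int *lqe_glbl_data;
};

/* Freeing path: detach the pointer under the lock, free it outside. */
static void demo_lvbo_free(struct demo_lqe *lqe)
{
	int *data;

	pthread_mutex_lock(&lqe->lqe_lock);
	data = lqe->lqe_glbl_data;
	lqe->lqe_glbl_data = NULL;
	pthread_mutex_unlock(&lqe->lqe_lock);
	free(data);
}

/* User path: touch the data only while holding the lock, and cope with
 * it already being gone. Returns -1 if nothing was seeded. */
static int demo_seed_glbe(struct demo_lqe *lqe, int value)
{
	int rc = -1;

	pthread_mutex_lock(&lqe->lqe_lock);
	if (lqe->lqe_glbl_data) {
		lqe->lqe_glbl_data[0] = value;
		rc = 0;
	}
	pthread_mutex_unlock(&lqe->lqe_lock);
	return rc;
}
```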

Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: I274f07ee8609c83852572be51625cc929a9130ec
Reviewed-on: https://review.whamcloud.com/45098
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-15589 tests: skip sanity test_230s for interop 98/46598/2
Andreas Dilger [Wed, 23 Feb 2022 19:24:12 +0000 (12:24 -0700)]
LU-15589 tests: skip sanity test_230s for interop

Sanity test_230s was added in 2.14.52 but incorrectly checked for
MDS version 2.13.57 for interop, likely because that was the version
present at the time the patch was originally written, but it was
only landed later.
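
Interop skips like this compare packed version numbers. The sketch below assumes the OBD_OCD_VERSION layout (major << 24 | minor << 16 | patch << 8 | fix); demo_version_code() and demo_should_skip() are hypothetical helpers, not test-framework functions.

```c
/* Pack a version so that versions compare with plain integer operators
 * (layout assumed to follow OBD_OCD_VERSION). */
static unsigned int demo_version_code(unsigned int major, unsigned int minor,
				      unsigned int patch, unsigned int fix)
{
	return (major << 24) | (minor << 16) | (patch << 8) | fix;
}

/* Skip when the MDS predates the release that added the feature. */
static int demo_should_skip(unsigned int mds_version, unsigned int required)
{
	return mds_version < required;
}
```

With the wrong threshold 2.13.57, a 2.14.0 MDS compares as new enough, so the test runs against a server that lacks the 2.14.52 change.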

Test-Parameters: trivial serverversion=2.14.0 testlist=sanity env=ONLY=230s
Fixes: 65e3e4050ec5 ("LU-14366 mdt: lfs mkdir should return -EEXIST if exists")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I5d046ad224a29558866453972c9c33b5da3a9037
Reviewed-on: https://review.whamcloud.com/46598
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Alena Nikitenko <anikitenko@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-15467 tests: fix sanity-hsm test_103a timeout issue 52/46252/2
Etienne AUJAMES [Fri, 21 Jan 2022 14:49:18 +0000 (15:49 +0100)]
LU-15467 tests: fix sanity-hsm test_103a timeout issue

Add an MDS version check to sanity-hsm test_103a for interop testing.
Limit the number of parallel HSM restore requests to
max_rpcs_in_flight.

Fixes: b449f3d ("LU-15145 hsm: unlock the restore layout lock for a cancel")
Test-Parameters: trivial
Test-Parameters: testlist=sanity-hsm env=ONLY=103a,ONLY_REPEAT=20
Test-Parameters: testlist=sanity-hsm
Signed-off-by: Etienne AUJAMES <etienne.aujames@cea.fr>
Change-Id: I78098042d1316cdcc9d2e25860099a0ffdba2503
Reviewed-on: https://review.whamcloud.com/46252
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alena Nikitenko <anikitenko@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-15576 osp: Interop skip sanity test 823 for MDS < 2.14.56 67/46567/4
Shaun Tancheff [Tue, 22 Feb 2022 07:28:50 +0000 (01:28 -0600)]
LU-15576 osp: Interop skip sanity test 823 for MDS < 2.14.56

Prior to v2_14_55-29-g06e586016d, setting create_count greater
than the maximum returned -ERANGE.
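
A hypothetical setter contrasting the two behaviours; the clamp flag and the numbers in the test are illustrative only.

```c
#include <errno.h>

/* Old behaviour (clamp == 0): reject an oversize request with -ERANGE.
 * New behaviour (clamp != 0): silently lower it to the maximum. */
static int demo_set_create_count(int *create_count, int requested,
				 int max_count, int clamp)
{
	if (requested > max_count) {
		if (!clamp)
			return -ERANGE;
		requested = max_count;
	}
	*create_count = requested;
	return 0;
}
```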

During interop testing skip sanity/823 for MDS older than 2.14.56.

Test-Parameters: trivial serverversion=2.14.0 testlist=sanity env=ONLY=823
Fixes: 06e586016d3a ("LU-13941 osp: Silently lower requested create_count to maximum")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ie79617deea047b2a846f696473b9c2b5681953be
Reviewed-on: https://review.whamcloud.com/46567
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Tested-by: Maloo <maloo@whamcloud.com>
LU-15601 osd-ldiskfs: handle read_inode_bitmap() error 60/46660/3
Andreas Dilger [Tue, 1 Mar 2022 05:14:37 +0000 (22:14 -0700)]
LU-15601 osd-ldiskfs: handle read_inode_bitmap() error

Correctly handle a PTR_ERR() error return from read_inode_bitmap().
This changed in upstream kernel commit v4.3-rc2-17-g9008a58e5dce,
so handle this for both types of return value.
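
A userspace sketch of accepting both return conventions. The demo_err_ptr(), demo_ptr_err() and demo_is_err() helpers copy the kernel's ERR_PTR convention (the top 4095 pointer values encode -errno); mapping a NULL return to -EIO is an assumption.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095

static inline void *demo_err_ptr(long err) { return (void *)(intptr_t)err; }
static inline long demo_ptr_err(const void *p) { return (long)(intptr_t)p; }
static inline int demo_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-DEMO_MAX_ERRNO;
}

struct demo_bh { int dummy; };

/* Older kernels return NULL on failure; newer ones return ERR_PTR(). */
static int demo_check_bitmap(struct demo_bh *bh, struct demo_bh **out)
{
	if (bh == NULL)
		return -EIO;	/* no errno available: assumed mapping */
	if (demo_is_err(bh))
		return (int)demo_ptr_err(bh);
	*out = bh;
	return 0;
}
```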

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I184c09b300ed69c29e4a7ef343f473b67080381f
Reviewed-on: https://review.whamcloud.com/46660
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-15513 lod: skip uninit component in lod_fill_mirrors 35/46435/5
Andreas Dilger [Wed, 2 Feb 2022 22:05:18 +0000 (15:05 -0700)]
LU-15513 lod: skip uninit component in lod_fill_mirrors

Do not iterate over the "objects" in lod_fill_mirrors() to check
for non-rotational OSTs if the component is uninitialized.  In
cases where an OST is not present (e.g. sparse OST indexes used)
the lod_tgt_desc[] array has holes and OST_TGT() returns NULL.

Skip the loop entirely if the component is not initialized, but
also add some sanity checks to verify that the OST index values
are sane in case there are other problems in the future (e.g.
corrupt/invalid layout on disk).
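
A simplified sketch of the guard, with hypothetical structures in place of the LOD layout and target table. An uninitialized component returns early, and each OST index is bounds-checked and tested for a hole before use.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_ost { bool nonrotational; };

/* Hypothetical, simplified layout component. */
struct demo_comp {
	bool initialized;
	unsigned int stripe_count;
	const unsigned int *ost_idx;	/* OST index per stripe */
};

/* Returns 1 if every stripe is on a non-rotational OST, 0 if not (or if
 * the component has no objects yet), -EINVAL for a bad index.  @osts may
 * contain NULL holes when sparse OST indexes are used. */
static int demo_comp_is_nonrot(const struct demo_comp *comp,
			       struct demo_ost *const *osts, size_t nosts)
{
	unsigned int i;

	if (!comp->initialized)
		return 0;	/* skip the loop entirely */

	for (i = 0; i < comp->stripe_count; i++) {
		unsigned int idx = comp->ost_idx[i];

		if (idx >= nosts || osts[idx] == NULL)
			return -EINVAL;	/* corrupt layout or missing OST */
		if (!osts[idx]->nonrotational)
			return 0;
	}
	return 1;
}
```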

Fixes: 8507472dd37e ("LU-14996 lov: prefer mirrors on non-rotational OSTs")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I8ec23367059a4ec9e483adb768095b24f03ebbe5
Reviewed-on: https://review.whamcloud.com/46435
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-15316 tests: use integers in sanity test_255a 50/46350/2
Andreas Dilger [Fri, 28 Jan 2022 05:51:24 +0000 (22:51 -0700)]
LU-15316 tests: use integers in sanity test_255a

The [[ ... > ... ]] operator doesn't really compare floats, it
compares strings.  That works as expected if the strings are
the same length, but fails for comparisons like [[ 32 > 123 ]].
Use (( ... > ... )) for comparisons, and only use integer values.

This test has been failing intermittently forever, but the error
was ignored because of running in a VM.
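
The difference is the same as comparing digit strings versus comparing numbers. This C sketch uses strcmp() for the string case; bash's [[ ]] actually sorts using the current locale, but for plain ASCII digits the ordering is the same.

```c
#include <string.h>

/* String comparison, like bash's [[ a > b ]]: "32" sorts after "123"
 * because '3' > '1'. */
static int demo_str_greater(const char *a, const char *b)
{
	return strcmp(a, b) > 0;
}

/* Integer comparison, like bash's (( a > b )). */
static int demo_int_greater(long a, long b)
{
	return a > b;
}
```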

Test-Parameters: trivial
Fixes: f3b8f3fad502 ("tests: fix float comparison in sanity test_255a")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I6787082cd579ae3f1bdd43222a739c939d3ebbe5
Reviewed-on: https://review.whamcloud.com/46350
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-15010 tests: skip sanity test_64g for interop 65/46565/4
Andreas Dilger [Sun, 20 Feb 2022 18:43:33 +0000 (11:43 -0700)]
LU-15010 tests: skip sanity test_64g for interop

Sanity test_64g checks code that was only added in 2.14.56.

Test-Parameters: trivial serverversion=2.14.0 testlist=sanity env=ONLY=64g
Fixes: 6e116213e3fd ("LU-15010 mdc: add support for grant shrink")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I339231f1b7890e8fffe7e079a052b15f54d4a050
Reviewed-on: https://review.whamcloud.com/46565
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alena Nikitenko <anikitenko@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-14514 tests: skip sanity-flr test_44b for interop 64/46564/3
Andreas Dilger [Sun, 20 Feb 2022 18:28:08 +0000 (11:28 -0700)]
LU-14514 tests: skip sanity-flr test_44b for interop

Sanity-flr test_44b checks code that was only added in 2.14.56.

Test-Parameters: trivial serverversion=2.14.0 testlist=sanity-flr env=ONLY=44b
Fixes: 83c790cbf2f8 ("LU-14514 flr: mirror split should not make stale file")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I65cc59dbde3f9711b915a56730e221e224e9b715
Reviewed-on: https://review.whamcloud.com/46564
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-15060 tests: skip sanity-flr test_208 in interop 63/46563/4
Andreas Dilger [Sun, 20 Feb 2022 18:07:19 +0000 (11:07 -0700)]
LU-15060 tests: skip sanity-flr test_208 in interop

Sanity-flr test_208[ab] check a feature that only landed in 2.14.55.

Test-Parameters: trivial serverversion=2.14.0 testlist=sanity-flr env=ONLY=208
Fixes: 8507472dd37e ("LU-14996 lov: prefer mirrors on non-rotational OSTs")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I7735582ed9683d686396c14e2a4e254c648f7546
Reviewed-on: https://review.whamcloud.com/46563
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Alena Nikitenko <anikitenko@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-15584 utils: ppc64le __le64_to_cpu type mismatch 88/46588/2
Gian-Carlo DeFazio [Tue, 22 Feb 2022 20:23:30 +0000 (12:23 -0800)]
LU-15584 utils: ppc64le __le64_to_cpu type mismatch

Cast values returned by __le64_to_cpu to
long long unsigned int. This is to match print format
strings that use %llx. This mismatch was resulting in a
build failure for ppc64le.

Build log message:
llog_reader.c:921:42: error: format '%llx' expects
argument of type 'long long unsigned int', but
argument 3 has type 'long unsigned int'
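
A minimal sketch of the cast. As the build log shows, on ppc64le the 64-bit value has type unsigned long, which the compiler rejects for %llx even though the widths match; casting to unsigned long long makes the argument match the format on every architecture.

```c
#include <stdint.h>
#include <stdio.h>

/* Format a 64-bit value with %llx portably by casting the argument. */
static int demo_format_u64(char *buf, size_t len, uint64_t v)
{
	return snprintf(buf, len, "%llx", (unsigned long long)v);
}
```

Printing a uint64_t with PRIx64 from <inttypes.h> would be an alternative, but the cast keeps the existing %llx format strings unchanged.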

Fixes: 80447caf980 ("LU-14926 utils: print unlink and setattr recs in llog_reader")
Signed-off-by: Gian-Carlo DeFazio <defazio1@llnl.gov>
Change-Id: I939b94626d2707b6ff644324c5c2798218331c4d
Reviewed-on: https://review.whamcloud.com/46588
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Tested-by: Maloo <maloo@whamcloud.com>
LU-15574 tests: Skip test sanity/77o in interop 68/46568/4
Arshad Hussain [Mon, 21 Feb 2022 04:38:30 +0000 (10:08 +0530)]
LU-15574 tests: Skip test sanity/77o in interop

Test sanity/77o (server checksum proc entries) was
introduced in 2.14.55.

During interop testing, skip sanity/77o for
MDS and OST versions older than 2.14.55.

Test-Parameters: trivial serverversion=2.14.0 testlist=sanity env=ONLY=77o
Fixes: c18d5d892b62 ("LU-14889 lproc: Add server checksum_type")
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Idb634ca349a6be01331057a473cc15747325a075
Reviewed-on: https://review.whamcloud.com/46568
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
LU-15577 tests: fix interop issue 66/46566/5
Alexander Zarochentsev [Sun, 20 Feb 2022 19:50:44 +0000 (19:50 +0000)]
LU-15577 tests: fix interop issue

Sanity test 831 expects the MDS to have the osp.*.max_sync_changes
tunable, which appeared in 2.14.56.
Add a check to skip older MDSes.

Fixes: c226e70007 ("LU-15114 osp: changes queuing throttle")
Test-Parameters: trivial serverversion=2.12 serverdistro=el7.9 testlist=sanity env=ONLY=831
Test-Parameters: trivial testlist=sanity env=ONLY=831
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: I911f7b0d9dd606f08f544fce55bf8bcfe9fb69e3
Reviewed-on: https://review.whamcloud.com/46566
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
LU-15572 util: mirror delete with old MDS 14/46614/5
Bobi Jam [Fri, 25 Feb 2022 08:34:07 +0000 (16:34 +0800)]
LU-15572 util: mirror delete with old MDS

An old MDS does not support mirror delete without a volatile file and
reports the intent close error as -EBUSY. This patch catches that
ambiguous error and retries the mirror delete the old way.
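
The fallback can be sketched as below. The callbacks are hypothetical stand-ins for the two deletion methods, and ctx counts how often the old method runs.

```c
#include <errno.h>

typedef int (*demo_delete_fn)(void *ctx);

/* Try the delete-without-volatile-file request first; an old MDS
 * reports it as -EBUSY, so on -EBUSY retry with the old method. */
static int demo_mirror_delete(void *ctx, demo_delete_fn new_way,
			      demo_delete_fn old_way)
{
	int rc = new_way(ctx);

	if (rc == -EBUSY)
		rc = old_way(ctx);
	return rc;
}

/* Stand-ins used only to exercise the logic above. */
static int demo_old_mds_new_way(void *ctx) { (void)ctx; return -EBUSY; }
static int demo_new_mds_new_way(void *ctx) { (void)ctx; return 0; }
static int demo_old_way(void *ctx) { ++*(int *)ctx; return 0; }
```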

Fixes: b2d73351e6 ("LU-14521 flr: delete mirror without volatile file")
Test-Parameters: trivial
Test-Parameters: serverversion=2.14.0 testlist=sanity env=ONLY="0 50 60 61 203"
Test-Parameters: clientversion=2.14.0 testlist=sanity env=ONLY="0 50 60 61 203"
Test-Parameters: serverversion=2.12.8 testlist=sanity env=ONLY="0 50 60 61 203" serverdistro=el7.9
Test-Parameters: clientversion=2.12.8 testlist=sanity env=ONLY="0 50 60 61 203"
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I497118cbb7da871268f0fdd6bdb88ad6bd831a26
Reviewed-on: https://review.whamcloud.com/46614
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-15512 lnet: Stop discovery on deleted peer NI 29/46429/2
Chris Horn [Wed, 2 Feb 2022 18:37:00 +0000 (18:37 +0000)]
LU-15512 lnet: Stop discovery on deleted peer NI

lnet_discover_peer_locked() needs to check whether the peer NI that is
undergoing discovery has been deleted (i.e. its associated peer has
LNET_PEER_MARK_DELETED state). Otherwise, we may enter an infinite
loop because this peer will never be considered up to date.
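
A minimal sketch of the loop guard. The DEMO_PEER_* flag values and the -ESHUTDOWN return code are assumptions, and demo_discovery_step() stands in for one round of discovery.

```c
#include <errno.h>

#define DEMO_PEER_MARK_DELETED	0x1	/* assumed flag values */
#define DEMO_PEER_UPTODATE	0x2

struct demo_peer { unsigned int state; };

/* One round of discovery: a deleted peer never becomes up to date. */
static void demo_discovery_step(struct demo_peer *lp)
{
	if (!(lp->state & DEMO_PEER_MARK_DELETED))
		lp->state |= DEMO_PEER_UPTODATE;
}

/* Without the deleted check this loop would never terminate for a
 * deleted peer. *steps counts discovery rounds. */
static int demo_discover_peer(struct demo_peer *lp, int *steps)
{
	*steps = 0;
	for (;;) {
		if (lp->state & DEMO_PEER_MARK_DELETED)
			return -ESHUTDOWN;
		if (lp->state & DEMO_PEER_UPTODATE)
			return 0;
		demo_discovery_step(lp);
		(*steps)++;
	}
}
```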

Test-Parameters: trivial testlist=sanity-lnet
Fixes: fd32cd817c ("LU-13895 lnet: Prevent discovery on deleted peer")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I43d276fc460241c1724c8e30913bb6c5cbb7c8f4
Reviewed-on: https://review.whamcloud.com/46429
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-15555 ldiskfs: large directory causes htree corruption 26/46526/2
Andrew Perepechko [Mon, 14 Feb 2022 13:35:10 +0000 (16:35 +0300)]
LU-15555 ldiskfs: large directory causes htree corruption

When creating a lot of files in a single directory, it can
get corrupted because of a typo in ext4-kill-dx-root.patch.

Change-Id: Ia36278580741e1eb905e24a3a6231ba7daaa882a
Fixes: 20a6d32 ("LU-12637 kernel: RHEL 8.1 server support")
HPE-bug-id: LUS-10730
Signed-off-by: Andrew Perepechko <c17827@cray.com>
Signed-off-by: Alexander Zarochentsev <c17826@cray.com>
Signed-off-by: Artem Blagodarenko <artem.blagodarenko@hpe.com>
Reviewed-on: https://review.whamcloud.com/46526
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
New RC 2.15.0-RC2 2.15.0-RC2 v2_15_0-RC2
Oleg Drokin [Tue, 8 Feb 2022 00:12:39 +0000 (19:12 -0500)]
New RC 2.15.0-RC2

Change-Id: Idfbc2ff63d48e2b3ca4801905e1d6ee7667ac427
Signed-off-by: Oleg Drokin <green@whamcloud.com>
New release candidate 2.15.0-RC1 2.15.0-RC1 v2_15_0-RC1
Oleg Drokin [Mon, 7 Feb 2022 23:42:00 +0000 (18:42 -0500)]
New release candidate 2.15.0-RC1

Change-Id: I6a62dffa8d2a1159b9a0abfd0659f8544a0daeab
Signed-off-by: Oleg Drokin <green@whamcloud.com>
LU-15422 build: Update ZFS version to 2.0.7 06/46006/8
James Simmons [Thu, 3 Feb 2022 19:23:16 +0000 (14:23 -0500)]
LU-15422 build: Update ZFS version to 2.0.7

Update ZFS version to 2.0.7. The changes are listed in:
https://github.com/openzfs/zfs/releases/tag/zfs-2.0.7

Change-Id: I5dcff31af1458c5c9d2fe17256e31751535578d8
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/46006
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Olaf Faaland-LLNL <faaland1@llnl.gov>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-15404 ldiskfs: truncate during setxattr leads to kernel panic 58/46358/9
Andrew Perepechko [Mon, 31 Jan 2022 16:55:31 +0000 (19:55 +0300)]
LU-15404 ldiskfs: truncate during setxattr leads to kernel panic

When changing a large xattr value to a different large xattr value,
the old xattr inode is freed. Truncate during the final iput causes
the current transaction to restart. Eventually, the parent inode's bh
is marked dirty, and a kernel panic happens when jbd2 finds that this
bh belongs to the committed transaction.

A possible fix is to call this final iput in a separate thread.
This way, setxattr transactions will never be split into two.
Since the setxattr code adds xattr inodes with nlink=0 into the
orphan list, old xattr inodes will be properly cleaned up in
any case.
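
A single-threaded sketch of the deferral idea: the last reference is queued instead of being dropped inline, and a worker later drains the queue outside the setxattr transaction. The names are hypothetical, and a real implementation would also need locking and an actual worker thread or workqueue.

```c
#include <stddef.h>

struct demo_xattr_inode {
	int freed;
	struct demo_xattr_inode *next_deferred;
};

struct demo_deferred_iputs { struct demo_xattr_inode *head; };

/* Called from setxattr: queue the inode instead of doing the final
 * release (truncate) inside the running transaction. */
static void demo_iput_deferred(struct demo_deferred_iputs *q,
			       struct demo_xattr_inode *inode)
{
	inode->next_deferred = q->head;
	q->head = inode;
}

/* Worker body, run outside any setxattr transaction. Returns the number
 * of inodes released. */
static int demo_drain_deferred(struct demo_deferred_iputs *q)
{
	int n = 0;

	while (q->head) {
		struct demo_xattr_inode *inode = q->head;

		q->head = inode->next_deferred;
		inode->freed = 1;	/* stands in for truncate + free */
		n++;
	}
	return n;
}
```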

Change-Id: Idd70befa6a83818ece06daccf9bb6256812674b9
Signed-off-by: Andrew Perepechko <andrew.perepechko@hpe.com>
HPE-bug-id: LUS-10534
Reviewed-on: https://review.whamcloud.com/46358
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-12585 mdt: Add read/write latency to MDT stats 29/46229/4
Patrick Farrell [Thu, 27 Jan 2022 21:42:23 +0000 (16:42 -0500)]
LU-12585 mdt: Add read/write latency to MDT stats

The MDT does not currently record latency stats for reads
and writes.

Add this, and change the naming to be the same as for the
OFD.

Note on this:
Existing naming on the MDT uses "read/write" instead of
"{read,write}_bytes", which is inconsistent with OFD and
also inconsistent within the MDT, since other ops without
the "_bytes" suffix are latency.

It's not ideal to change the names of existing stats, but I
decided this was less problematic than leaving them
inconsistent.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I7b7a5742678cbe0269086f37877833e877a5ca5f
Reviewed-on: https://review.whamcloud.com/46229
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-12585 obdfilter: Use actual I/O bytes in stats 75/46075/7
Patrick Farrell [Thu, 27 Jan 2022 21:37:46 +0000 (16:37 -0500)]
LU-12585 obdfilter: Use actual I/O bytes in stats

Currently the obdfilter stats note the number of bytes
requested by the client rather than the number of bytes
actually read or written.  This is particularly confusing
for reads because clients can request more data than
exists and some applications do this normally.

This results in statistics that can be off by almost any
amount from the actual number of bytes read.  This patch
moves the stats to be collected just before commit, which
allows the true number of bytes to be recorded but does not
include the commit time in the time stats.  (Since commit
time is not part of the operation latency as experienced by
the client.)
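
For reads, the bytes actually transferred are bounded by the file size. A hypothetical helper showing the difference between the requested and the actual byte count:

```c
/* Bytes a read can actually return, given the file size. */
static unsigned long long demo_read_bytes(unsigned long long offset,
					  unsigned long long requested,
					  unsigned long long file_size)
{
	if (offset >= file_size)
		return 0;
	if (requested > file_size - offset)
		return file_size - offset;
	return requested;
}
```

A client asking for 1 MiB from a 4 KiB file would previously be counted as 1 MiB read; counting the actual bytes records 4 KiB.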

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I81fe9a6afdad5b48e8421f4aa72b8ef10a0eee93
Reviewed-on: https://review.whamcloud.com/46075
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-15503 quota: fix list entry usage 80/46380/5
Yang Sheng [Sat, 29 Jan 2022 14:24:17 +0000 (22:24 +0800)]
LU-15503 quota: fix list entry usage

Fetch next list entry.

Fixes: d527e81246 ("LU-15283 quota: deadlock between reint & lquota_wb")
Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: I86befdfaa96151a6fd61902ffbf43ee8e5cae8cb
Reviewed-on: https://review.whamcloud.com/46380
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-15417 build: find the new path for MOFED 5.5 83/46383/3
Jian Yu [Sun, 30 Jan 2022 08:09:37 +0000 (00:09 -0800)]
LU-15417 build: find the new path for MOFED 5.5

The path of the MOFED header files has changed to
/usr/src/ofa_kernel/x86_64/<kernel>,
so we cannot assume it is /usr/src/ofa_kernel/default.

Besides updating lbuild, we also need to update
lustre-lnet.m4 and lustre.spec.in.

Test-Parameters: trivial

Change-Id: Iab42ce9e458f78b0dc0233ac6fd23a1760be5324
Fixes: 94a3f1bfa70 ("LU-15417 build: build MOFED 5.5")
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46383
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-15502 llite: set default LMV hash type with 2.12 MDS 78/46378/8
Lai Siyao [Sat, 29 Jan 2022 05:14:40 +0000 (00:14 -0500)]
LU-15502 llite: set default LMV hash type with 2.12 MDS

If the default LMV hash type is CRUSH, or unset, it should be converted
to fnv_1a_64, because a 2.12 MDS doesn't understand it.

Fix LMV_HASH_FLAG_KNOWN to match actual known flags.
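
A sketch of the conversion. The numeric hash-type values (unknown = 0, all_chars = 1, fnv_1a_64 = 2, crush = 3) are assumptions for illustration, and any flag bits carried in the real hash-type field are ignored here.

```c
/* Assumed hash-type values; check the Lustre headers for the real ones. */
enum demo_lmv_hash_type {
	DEMO_LMV_HASH_TYPE_UNKNOWN	= 0,
	DEMO_LMV_HASH_TYPE_ALL_CHARS	= 1,
	DEMO_LMV_HASH_TYPE_FNV_1A_64	= 2,
	DEMO_LMV_HASH_TYPE_CRUSH	= 3,
};

/* When talking to a 2.12 MDS, replace an unset or CRUSH default with a
 * hash type that server understands. */
static unsigned int demo_default_hash_for_212(unsigned int hash_type)
{
	if (hash_type == DEMO_LMV_HASH_TYPE_UNKNOWN ||
	    hash_type == DEMO_LMV_HASH_TYPE_CRUSH)
		return DEMO_LMV_HASH_TYPE_FNV_1A_64;
	return hash_type;
}
```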

Test-Parameters: testlist=sanity env=ONLY=300 mdtcount=2 serverversion=2.12 serverdistro=el7.9
Fixes: 0a1cf8da8069 ("LU-11025 dne: introduce new directory hash type: "crush")
Fixes: bb60caa1c6e7 ("LU-14459 lmv: change default hash type to crush")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ie2ad5a456040dcd01bc2c5ab96db52bf944abbd2
Reviewed-on: https://review.whamcloud.com/46378
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-14008 o2iblnd: avoid memory copy for short msg 62/40262/14
Alexey Lyashkov [Wed, 12 Aug 2020 14:59:50 +0000 (17:59 +0300)]
LU-14008 o2iblnd: avoid memory copy for short msg

Modern cards allow sending kernel memory data without mapping it
or copying it to the preallocated buffer.
This reduces LNet selftest CPU consumption by 3% for messages
smaller than 4k.

Test-Parameters: trivial
HPE-bug-id: LUS-1796
Change-Id: I96c31be680c8ea7ac289a755df7f1d4c1c7f9aef
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-on: https://review.whamcloud.com/40262
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-15446 lnet: Don't use pref NI for reserved portal 78/46078/4
Chris Horn [Wed, 12 Jan 2022 19:19:21 +0000 (19:19 +0000)]
LU-15446 lnet: Don't use pref NI for reserved portal

Don't use the preferred NI when sending traffic on the LNet reserved
portal. This allows local recovery pings to utilize any local NI as
source in the case where we do not have a multi-rail peer entry for
the local host. This is typically the case when MR is not being
configured statically (i.e. when discovery is being used for MR
configuration).
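
A hypothetical sketch of the selection rule. DEMO_RESERVED_PORTAL is assumed to be 0, and "best" is reduced to the highest single health value; the real lnet_get_best_ni() weighs more criteria.

```c
#include <stddef.h>

#define DEMO_RESERVED_PORTAL 0	/* assumed reserved-portal number */

struct demo_ni { int health; };

/* Honour the preferred NI for ordinary traffic, but for the reserved
 * portal (e.g. recovery pings) ignore it and use the healthiest NI. */
static struct demo_ni *demo_select_src_ni(int portal, struct demo_ni *pref,
					  struct demo_ni *nis, size_t nnis)
{
	struct demo_ni *best = NULL;
	size_t i;

	if (pref && portal != DEMO_RESERVED_PORTAL)
		return pref;
	for (i = 0; i < nnis; i++)
		if (!best || nis[i].health > best->health)
			best = &nis[i];
	return best;
}
```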

lnet_get_best_ni() was modified to include health values of the NIs
being compared in its debug output.

HPE-bug-id: LUS-10658
Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I38f5760bf034f698b7f44ffa89aa91c4f5d4b9ea
Reviewed-on: https://review.whamcloud.com/46078
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-15440 lnet: lnet_peer_data_present() memory leak 52/46052/3
Chris Horn [Tue, 11 Jan 2022 22:19:16 +0000 (16:19 -0600)]
LU-15440 lnet: lnet_peer_data_present() memory leak

If the ping buffer has nnis <= 1 then the ref on the ping buffer does
not get dropped. This causes a memory leak.
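
The usual shape of such a fix is a single exit path that always drops the reference. The sketch below is hypothetical: it takes its own reference on entry and uses a goto so that the early nnis <= 1 path releases it too.

```c
#include <stdlib.h>

struct demo_ping_buffer { int refcount; int nnis; };

static void demo_pbuf_decref(struct demo_ping_buffer *pbuf)
{
	if (--pbuf->refcount == 0)
		free(pbuf);
}

/* Every exit path drops the reference taken on entry. */
static int demo_peer_data_present(struct demo_ping_buffer *pbuf)
{
	int rc = 0;

	pbuf->refcount++;		/* held while the buffer is examined */
	if (pbuf->nnis <= 1)
		goto out;		/* the leaky version returned here */
	rc = 1;				/* ... process the NIs ... */
out:
	demo_pbuf_decref(pbuf);
	return rc;
}
```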

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I5e3c651ffecbe4f8860afb86770cecef23ebe862
Reviewed-on: https://review.whamcloud.com/46052
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
LU-15428 contrib: add branch_comm 31/46031/3
John L. Hammond [Mon, 10 Jan 2022 16:59:02 +0000 (10:59 -0600)]
LU-15428 contrib: add branch_comm

Add a branch comparison (branch_comm) to contrib/scripts.

Test-Parameters: trivial
Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I13c0b90a48d6d3215bf9959242c5671e83d27d7a
Reviewed-on: https://review.whamcloud.com/46031
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Peter Jones <pjones@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15400 tests: sanity-lfsck MDT_DEVNAME fix 25/46025/8
Elena Gryaznova [Fri, 28 Jan 2022 14:34:59 +0000 (17:34 +0300)]
LU-15400 tests: sanity-lfsck MDT_DEVNAME fix

The global MDT_DEVNAME, set at the start of sanity-lfsck to a
device-mapper device, cannot be used after stop() because the
device-mapper device is removed and the facet device is restored:
  stop () ->
     elif dm_flakey_supported $facet; then
        if [[ -n ${!failover_host} && ${!failover_host} != ${!host} ]]
           dm_cleanup_dev $facet ->
              unexport_dm_dev $facet

Without this fix the tests:
    1a, 1b, 1c, 2a, 2b, 2c, 2d, 4, 5, 7a, 7b, 8, 30
fail on a failover setup with:
    losetup: /dev/mapper/mds1_flakey: failed to set up loop device

To reproduce the failure just run:
  sh llmountcleanup.sh
  sh sanity-lfsck.sh
on a failover setup where mds1_HOST != mds1failover_HOST.

Fixes: 54b9e3f ("LU-684 tests: replace dev_read_only patch with dm-flakey")
Test-Parameters: trivial testlist=sanity-lfsck
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-10667
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: I2736406161d67335f465cf70eb9f21347a8a798f
Reviewed-on: https://review.whamcloud.com/46025
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15334 tests: cleanup conf-sanity test_30a 11/45811/2
Elena Gryaznova [Thu, 9 Dec 2021 18:17:14 +0000 (21:17 +0300)]
LU-15334 tests: cleanup conf-sanity test_30a

Fix typo: use error() instead of the nonexistent fail(), and
localize some variables.

Fixes: 5e546603cb ("b=15253 add conf_param -d to remove permanent settings")
Test-Parameters: trivial testlist=conf-sanity env=ONLY=30a
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Change-Id: I970d8422ba8ba75aca922d8ac6bac09c7cfcd67d
Reviewed-on: https://review.whamcloud.com/45811
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
2 years agoLU-15506 tests: skip conf-sanity test_32b until fixed 03/46403/2
Andreas Dilger [Tue, 1 Feb 2022 10:04:19 +0000 (03:04 -0700)]
LU-15506 tests: skip conf-sanity test_32b until fixed

The new disk2_10-ldiskfs and disk2_12-ldiskfs images are failing
conf-sanity test_32b.  Rather than remove the images themselves,
which are large and would consume more space in Gerrit if removed
and re-added, skip them until the test can be fixed.

Test-Parameters: trivial testlist=conf-sanity env=ONLY=32
Fixes: f2143c0790bb ("LU-11643 tests: add new images and tests for upgrade tests")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I5e6af27669773f67b16786aeffc24e995e3ebbe5
Reviewed-on: https://review.whamcloud.com/46403
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
2 years agoLU-15268 mdt: reveal the real intent close error code 36/45636/6
Bobi Jam [Wed, 1 Dec 2021 12:05:49 +0000 (20:05 +0800)]
LU-15268 mdt: reveal the real intent close error code

mdt_mfd_close() clobbers the intent close error, so the user-space
tool only knows that the close intent has not finished and reports
-EBUSY instead of the real error code.
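
The general rule is that a later step must not overwrite an earlier, more
specific error. A minimal sketch, where `keep_first_error()` is a
hypothetical helper and not the mdt code:

```c
#include <errno.h>

/*
 * Hypothetical helper: combine the intent-close status with the status
 * of the rest of the close. Once the intent close has failed, its error
 * is kept, so the caller sees the real cause rather than a later,
 * less useful result.
 */
static int keep_first_error(int intent_rc, int close_rc)
{
	return intent_rc != 0 ? intent_rc : close_rc;
}
```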

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I72f474a73e8b73cdc35ca38eaaec5af182f63ca7
Reviewed-on: https://review.whamcloud.com/45636
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Peter Jones <pjones@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15459 llite: clear async errors on write commit sync 78/46178/4
Vladimir Saveliev [Mon, 24 Jan 2022 17:13:59 +0000 (20:13 +0300)]
LU-15459 llite: clear async errors on write commit sync

Async errors should be cleared after vvp_io_commit_sync(). Otherwise,
that will be done in ll_flush() called from
linux/fs/open.c:filp_close(), and close(2) will fail. ll_flush()
replaces any error code with EIO, which is confusing.

A test to illustrate the issue is added.
'P' mode is added to multiop. It is like 'w' but does only one write
call, regardless of how many bytes were written.
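
The difference between the two modes can be sketched with plain write(2),
assuming (as the description implies) that 'w' retries short writes. Both
function names are hypothetical; this is not the multiop source:

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* 'w'-like: retry until every byte is written or an error occurs. */
static ssize_t write_all(int fd, const char *buf, size_t len)
{
	size_t done = 0;

	while (done < len) {
		ssize_t rc = write(fd, buf + done, len - done);

		if (rc < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, try again */
			return -1;
		}
		if (rc == 0)
			break;			/* no progress, give up */
		done += (size_t)rc;
	}
	return (ssize_t)done;
}

/* 'P'-like: exactly one write(2) call; a short write is not retried. */
static ssize_t write_once(int fd, const char *buf, size_t len)
{
	return write(fd, buf, len);
}
```

With a single call, the error from that one write(2) reaches the caller
directly instead of being folded into a retry loop.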

HPE-bug-id: LUS-7529
Signed-off-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Change-Id: I6b7a1465268999b48a50f3584f3821f4b088303d
Reviewed-on: https://review.whamcloud.com/46178
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14802 nodemap: return proper error code 26/45626/9
Andreas Dilger [Fri, 19 Nov 2021 21:51:09 +0000 (14:51 -0700)]
LU-14802 nodemap: return proper error code

nodemap_add_range_helper() always returned -ENOMEM when there was
an error inserting a new range into the existing nodemap.

    nodemap_add_range_helper()) cannot insert nodemap range: rc = -17
    mgs_iocontrol_nodemap()) MGS: OBD_IOC_NODEMAP command: rc = -12

This was confusing because the error returned by range_insert() was
typically -EEXIST (i.e. the entry being inserted was already in the
nodemap).  Do not print an error to the console in this common case.

Return the actual error to the caller so that this is clearer
to the end user.  Have l_ioctl() always set errno on error, in
addition to returning the error, since many callers depend on this.
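
A minimal sketch of the "return the error and also set errno" convention.
`ioctl_set_errno()` is hypothetical and does not have the real l_ioctl()
signature; the -ENODEV path stands in for any error detected before a
system call is made, where errno would otherwise be left stale:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/ioctl.h>

/*
 * Hypothetical ioctl wrapper: return a negative errno on failure and
 * also leave the positive value in errno, even for errors that were
 * detected before any system call was made.
 */
static int ioctl_set_errno(int fd, unsigned long request, void *arg)
{
	int rc;

	if (fd < 0) {
		rc = -ENODEV;		/* no device opened: no syscall made */
	} else {
		rc = ioctl(fd, request, arg);
		if (rc < 0)
			rc = -errno;
	}

	if (rc < 0)
		errno = -rc;		/* callers may check errno, not rc */
	return rc;
}
```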

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I2c80a11dfdf9e6e1c9a8235b8f74f5bcea68c08e
Reviewed-on: https://review.whamcloud.com/45626
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Etienne AUJAMES <eaujames@ddn.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15478 lnet: Check LNET_NID_IS_ANY in LNET_NID_NET 92/46292/3
Chris Horn [Mon, 24 Jan 2022 22:02:25 +0000 (16:02 -0600)]
LU-15478 lnet: Check LNET_NID_IS_ANY in LNET_NID_NET

If LNET_NID_NET is passed the wildcard NID (LNET_ANY_NID) then we
should return the wildcard net (LNET_NET_ANY). This also allows NULL
to be used as an argument to LNET_NID_NET.
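
The check can be sketched as a function over a simplified NID. The struct,
the wildcard values and both helpers below are hypothetical stand-ins; the
real LNET_NID_NET is a macro over LNet's own NID type:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified NID: a network number and an address. */
struct nid_sketch {
	uint32_t net;
	uint32_t addr;
};

#define NET_ANY_SKETCH	UINT32_MAX	/* hypothetical wildcard net */

/* Hypothetical wildcard NID: every field set to its "any" value. */
static const struct nid_sketch any_nid = {
	.net = NET_ANY_SKETCH,
	.addr = UINT32_MAX,
};

/* A NULL pointer is treated like the wildcard NID. */
static int nid_is_any(const struct nid_sketch *nid)
{
	return nid == NULL ||
	       (nid->net == any_nid.net && nid->addr == any_nid.addr);
}

/* Return the net of a NID, or the wildcard net for the wildcard NID. */
static uint32_t nid_net(const struct nid_sketch *nid)
{
	return nid_is_any(nid) ? NET_ANY_SKETCH : nid->net;
}
```

Testing for the wildcard first is what makes a NULL argument safe, since
the pointer is never dereferenced in that case.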

Fixes: 005bd7075c ("LU-10391 lnet: Change lnet_send() to take large-addr nids")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Ic2a7c9af31dcba285c266a872462cf179ab603fa
Reviewed-on: https://review.whamcloud.com/46292
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15218 quota: delete unused quota ID 48/45548/14
Hongchao Zhang [Fri, 21 Jan 2022 00:43:56 +0000 (08:43 +0800)]
LU-15218 quota: delete unused quota ID

Add the lfs option '--delete' to delete an unused quota ID.

Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: I0d8e6b61dc23c7b22b6054bcced087b8dc94a277
Reviewed-on: https://review.whamcloud.com/45548
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13176 mdd: rename file with different project ID 60/45660/8
Hongchao Zhang [Tue, 11 Jan 2022 15:12:55 +0000 (23:12 +0800)]
LU-13176 mdd: rename file with different project ID

This patch relaxes the restriction on renames between different
project IDs, allowing a regular file to be renamed between
directories with different project IDs.

Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: I4a2c21248d1e12ad1d00430e11e5dd50fe5eaf60
Reviewed-on: https://review.whamcloud.com/45660
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15477 osc: osc_extent_wait() deadlock 81/46281/2
Andriy Skulysh [Tue, 11 Jan 2022 07:30:49 +0000 (09:30 +0200)]
LU-15477 osc: osc_extent_wait() deadlock

Thread 1:
vvp_io_write_commit
osc_io_commit_async
osc_page_cache_add
osc_extent_find
osc_extent_wait

Thread 2:
ptlrpcd_check
ptlrpc_check_set
brw_queue_work
osc_extent_make_ready
vvp_page_make_ready_start
__lock_page

We must not hold a page lock while calling osc_extent_find().

Change-Id: Idf669bc8d9c943f28e3f5986826b9637d66ecfca
HPE-bug-id: LUS-10414
Fixes: a7299cb012 ("LU-9920 vvp: dirty pages with pagevec")
Signed-off-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-on: https://review.whamcloud.com/46281
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15471 tests: use proper facet device 54/46254/2
Elena Gryaznova [Fri, 21 Jan 2022 15:59:14 +0000 (18:59 +0300)]
LU-15471 tests: use proper facet device

Tests that stop a facet have to recalculate the facet device after
the stop, because it changes when device-mapper is used: the
device-mapper device is removed and the facet device is restored:
restored:
  stop () ->
     elif dm_flakey_supported $facet; then
        if [[ -n ${!failover_host} && ${!failover_host} != ${!host} ]]
           dm_cleanup_dev $facet ->
              unexport_dm_dev $facet

Without this fix, sanity tests 17m, 17n and 804 fail on a failover
setup with:
  Cannot resolve path /dev/mapper/mds1_flakey
  e2fsck: No such file or directory while trying
                     to open /dev/mapper/mds1_flakey
and sanity tests 228b and 256 fail because of:
  mount: /dev/mapper/mds1_flakey: failed to setup loop device:
                     No such file or directory
  losetup: /dev/mapper/mds1_flakey: failed to set up loop device

To reproduce the failures, just run:
  ONLY="17m 17n 228b 256 804" sh sanity.sh
on a failover setup where mds1_HOST != mds1failover_HOST.

Fixes: 54b9e3f789 ("LU-684 tests: replace dev_read_only patch with dm-flakey")
Test-Parameters: trivial testlist=sanity
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-9808
Change-Id: I02ce9d7cb7cf804fe0596d9aad7f995242c4b3af
Reviewed-on: https://review.whamcloud.com/46254
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15465 tests: conf-sanity failed with code 95 36/46236/3
Elena Gryaznova [Fri, 21 Jan 2022 07:24:28 +0000 (10:24 +0300)]
LU-15465 tests: conf-sanity failed with code 95

conf-sanity tests 27b, 47 and 84 (which execute 'fail mds1' and then
'cleanup' at the end of the test) failed with code 95 (EOPNOTSUPP)
because of 'set -e' and because 'lfs df <non_lustre>' returns 95.
The scenario:
test_27b () {
  facet_failover $SINGLEMDS
    change_active mds1
  ...
  cleanup -> umount_client $MOUNT
}
formatall
  stopall
    activemds=`facet_active mds1`
    if [ $activemds != "mds1" ]; then
       fail mds1
         clients_up
           lfs_df_check
             + local clients=fre0111,fre0112
             + local rc
             + [ -z fre0111,fre0112 ]
             + pdsh -S -w fre0111,fre0112
                 /usr/bin/lfs df /mnt/lustre << lustre not mounted
pdsh@fre0111: fre0111: ssh exited with exit code 95
pdsh@fre0111: fre0112: ssh exited with exit code 95

To reproduce the issue just run:
  ONLY="27b" sh conf-sanity.sh or:
  ONLY="47" sh conf-sanity.sh or:
  ONLY="84" sh conf-sanity.sh

Fixes: 2d714041ba ("LU-8962 lfs: Handle non-lustre and multiple args")
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-10680
Change-Id: Ibbe8d624fe341282f55bf8e5140f6362432d64cf
Reviewed-on: https://review.whamcloud.com/46236
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14676 lnet: improve hash distribution across CPTs 33/46233/6
Serguei Smirnov [Thu, 20 Jan 2022 16:40:28 +0000 (08:40 -0800)]
LU-14676 lnet: improve hash distribution across CPTs

Change the nid-to-cpt allocation function to use
(sum-by-multiplication of nid bytes) mod (number of CPTs)
to match a NID to a CPT. This patch only addresses IPv4 NIDs.
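
One reading of "(sum-by-multiplication of nid bytes) mod (number of CPTs)"
is: add up the eight bytes of a 64-bit NID, using a mask-and-multiply
trick instead of a loop, then reduce modulo the CPT count. A sketch under
that assumption; `nid_cpt_hash_sketch()` and the 64-bit NID layout are
assumptions, not a copy of the patch:

```c
#include <stdint.h>

/*
 * Sum the eight bytes of a 64-bit NID with two masks and one multiply,
 * then reduce modulo the number of CPTs.
 */
static unsigned int nid_cpt_hash_sketch(uint64_t nid, unsigned int ncpt)
{
	const uint64_t pair_bits = 0x0001000100010001ULL;
	const uint64_t mask = pair_bits * 0xFF;	/* 0x00FF00FF00FF00FF */
	uint64_t pair_sum;

	/* Add neighbouring bytes into four 16-bit lanes (each <= 510). */
	pair_sum = (nid & mask) + ((nid >> 8) & mask);
	/* Multiplying by pair_bits accumulates all lanes in the top 16 bits. */
	pair_sum = (pair_sum * pair_bits) >> 48;

	return (unsigned int)(pair_sum % ncpt);
}
```

For example, a NID whose bytes sum to 36 maps to CPT 36 % 7 = 1 when there
are 7 CPTs. No lane can overflow into its neighbour, because the largest
possible total (8 * 255 = 2040) fits in 16 bits.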

Make the matching change for the nid-to-cpt function
used by the 'lnetctl cpt-of-nid' utility.

Test-parameters: trivial testlist=sanity-lnet

Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I1052414947c4cae8c63993ffa21f67cb389bb463
Reviewed-on: https://review.whamcloud.com/46233
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15176 sec: present .fscrypt in subdir mount 67/46167/3
Sebastien Buisson [Wed, 12 Jan 2022 10:13:44 +0000 (11:13 +0100)]
LU-15176 sec: present .fscrypt in subdir mount

The fscrypt userspace tool works with a .fscrypt directory at the root
of the file system. In the case of a subdirectory mount, we virtually
present this .fscrypt directory at the root of the mount point so that
fscrypt can be used. This even makes it possible to do a subdirectory
mount of an encrypted directory, so that clients access only encrypted
content. Internally, the .fscrypt directory is always stored at the
root of Lustre.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I2a0ee360f724da1df49b2be0df986d52e06f45fd
Reviewed-on: https://review.whamcloud.com/46167
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15452 utils: support lctl getattr for osc 31/46131/3
John L. Hammond [Fri, 14 Jan 2022 16:58:59 +0000 (10:58 -0600)]
LU-15452 utils: support lctl getattr for osc

In lctl:jt_obd_getattr(), support FIDs in addition to OIDs and print
whatever valid attributes were returned. Add a supporting
OBD_IOC_GETATTR case to osc_iocontrol().

  # function lctl_osc_device() {
    # Find osc device name for file and index.
    # lctl_osc_device /mnt/lustre/... 42 => lustre-OST002a-osc-ffff89cca1555000
    local path="$1"
    local index="$2"
    local fsname=$(lfs getname --fsname "$path")
    local instance=$(lfs getname --instance "$path")

    printf '%s-OST%04x-osc-%s\n' "$fsname" "$index" "$instance"
  }
  # lfs getstripe /mnt/lustre/f0 | grep l_ost_idx
        - 0: { l_ost_idx: 1, l_fid: [0x100010000:0x2:0x0] }
        - 1: { l_ost_idx: 2, l_fid: [0x100020000:0x2:0x0] }
        - 0: { l_ost_idx: 3, l_fid: [0x100030000:0x2:0x0] }
        - 1: { l_ost_idx: 0, l_fid: [0x100000000:0x2:0x0] }
  # lctl --device $(lctl_osc_device /mnt/lustre 1) getattr '[0x100010000:0x2:0x0]'
  valid: 0x110000001008fff
  oi.oi.oi_id: 0x100020000
  oi.oi.oi_seq: 0x2
  oi.oi_fid: [0x100020000:0x2:0x0]
  atime: 0
  mtime: 1642178551
  ctime: 1642178551
  size: 0
  blocks: 0
  blksize: 4194304
  mode: 0107666
  uid: 0
  gid: 0
  flags: 2097152
  layout_version: 3
  projid: 0
  data_version: 4294967298
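
The osc device name built by the lctl_osc_device() shell helper can be
sketched in C with the same format string. `osc_device_name()` is a
hypothetical helper, not part of lctl:

```c
#include <stdio.h>

/*
 * Build "<fsname>-OST<index as 4 hex digits>-osc-<instance>", using the
 * same format as the lctl_osc_device() shell helper.
 */
static int osc_device_name(char *buf, size_t len, const char *fsname,
			   unsigned int index, const char *instance)
{
	return snprintf(buf, len, "%s-OST%04x-osc-%s", fsname, index,
			instance);
}
```

Index 42 becomes "002a" because the OST index is printed in hexadecimal,
which matches the lustre-OST002a-osc-ffff89cca1555000 example.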

Signed-off-by: John L. Hammond <jhammond@whamcloud.com>
Change-Id: I57d5778e9ac39030ae9477a0979f20b7f7460fc8
Reviewed-on: https://review.whamcloud.com/46131
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15398 lnet: Avoid peer NI recovery for local interface 33/45933/10
Chris Horn [Thu, 23 Dec 2021 20:15:27 +0000 (14:15 -0600)]
LU-15398 lnet: Avoid peer NI recovery for local interface

If an MR peer has an MR peer entry for itself (which can happen if the
entry is created manually or if discovery is run on itself for some
reason), then it is possible for it to put its own interfaces into
peer recovery. Problems with local interfaces should be handled via
local NI recovery.
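
The idea can be sketched as routing a failing NID by ownership: a NID that
belongs to this host goes to local NI recovery and never to peer NI
recovery. NIDs are reduced to integers here, and all names are
hypothetical, not the LNet code:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: NIDs reduced to integers, local NIs kept in an array. */
static bool nid_is_local(uint64_t nid, const uint64_t *local, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (local[i] == nid)
			return true;
	return false;
}

enum recovery_kind { RECOVERY_LOCAL_NI, RECOVERY_PEER_NI };

/* Route a failing NID to local NI recovery if it belongs to this host. */
static enum recovery_kind pick_recovery(uint64_t nid, const uint64_t *local,
					size_t n)
{
	return nid_is_local(nid, local, n) ? RECOVERY_LOCAL_NI :
					     RECOVERY_PEER_NI;
}
```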

Test-Parameters: trivial testlist=sanity-lnet
HPE-bug-id: LUS-10661
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I5b28195979a6113fa863b5795a4528b072610891
Reviewed-on: https://review.whamcloud.com/45933
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15398 tests: Use remote peers for health tests 75/45975/8
Chris Horn [Tue, 4 Jan 2022 20:42:26 +0000 (14:42 -0600)]
LU-15398 tests: Use remote peers for health tests

LNet health may take different action depending on whether a NID
belongs to the local host or a remote peer. As such, the test cases
need to be careful to use remote or local NIs appropriately.

Introduce helper functions to create and clean up the LNet peers that
are needed for these tests. Convert existing test cases to use the new
helpers.

A new function, lnet_if_list(), is added to test-framework.sh to
facilitate configuration of remote interfaces. do_rpc_nodes() is
modified to recognize a '--quiet' flag to ease parsing of
lnet_if_list() output.

Tests 204 and 206 were reworked to check the health state after each
simulated error. lnet_health_post() is modified to reset peer and local
NI health so that they are at their maximum value when each error
condition is simulated.

Tests 214, 215 and 250 were using the hardcoded interface name "eth0".
These were switched to use the INTERFACES variable.

The lnet_recovery_limit parameter is deprecated, so remove the lines
that were setting it.

Test-Parameters: trivial testlist=sanity-lnet
HPE-bug-id: LUS-10661
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I685fda8a84bcce024a765ddfc81c085acf24607a
Reviewed-on: https://review.whamcloud.com/45975
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14935 tests: Use FAIL_CHECK_QUIET for fake i/o 51/44651/4
Patrick Farrell [Thu, 12 Aug 2021 20:28:29 +0000 (16:28 -0400)]
LU-14935 tests: Use FAIL_CHECK_QUIET for fake i/o

Logging to the console is relatively expensive, and doing it
for fake i/o is very expensive in terms of CPU time.

If we use FAIL_CHECK_QUIET, a debug message is logged to the console
only once, and the rest at D_INFO level (which in practice probably
means not at all).

This should hugely reduce the CPU cost of the debugging.
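
A minimal sketch of the log-once-then-quiet behaviour, assuming a simple
static flag. `report_fail_check()` and the two counters are hypothetical;
they stand in for the console and the D_INFO debug log and are not the
libcfs implementation:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical counters standing in for the console and the debug log. */
static int console_msgs;
static int info_msgs;

/*
 * Report a fail-loc hit: the first hit goes to the console, every later
 * one only to a low-priority (D_INFO-like) debug level.
 */
static void report_fail_check(const char *what)
{
	static bool printed;

	if (!printed) {
		printed = true;
		console_msgs++;
		fprintf(stderr, "fail_loc hit: %s\n", what);
	} else {
		info_msgs++;	/* normally filtered out, so cheap */
	}
}
```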

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I46a5042efd116a4f5c80eaf0d5dae7fe132f6a79
Reviewed-on: https://review.whamcloud.com/44651
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Etienne AUJAMES <eaujames@ddn.com>
2 years agoLU-15286 build: only use baseonly option on el7 77/45677/9
Minh Diep [Mon, 29 Nov 2021 23:32:17 +0000 (15:32 -0800)]
LU-15286 build: only use baseonly option on el7

On el7, the baseonly option allows building the perf package,
while on el8 it does not.

Test-Parameters: trivial

Change-Id: Ie973c5cc816b4b98ef71ab7080bd11286bcd644a
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45677
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14879 ldiskfs: Support for SUSE 15 sp3 75/44375/9
Shaun Tancheff [Tue, 18 Jan 2022 16:14:26 +0000 (23:14 +0700)]
LU-14879 ldiskfs: Support for SUSE 15 sp3

Add a configure test and an updated series for sles15sp3 for the
updated ext4-data-in-dirent.patch.

Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ie56de51701ae903c515d9184e5e79e4cfaf76606
Reviewed-on: https://review.whamcloud.com/44375
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10973 lutf: use configured tmp directory for tar 43/44843/11
Amir Shehata [Fri, 3 Sep 2021 19:12:33 +0000 (12:12 -0700)]
LU-10973 lutf: use configured tmp directory for tar

After the LUTF run is done, the test results on all the agent
nodes need to be tarred in order to make them available for
later review. Don't assume the LUTF tmp files are in /tmp/lutf;
use the tmp-dir directory configured in the LUTF configuration.

Add the master to the agent list only once.

Pass PYTHONPATH to the agent.

Test-Parameters: trivial
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: Ic3effcb53f7d27bf31b6adfd8a22900767ff9524
Reviewed-on: https://review.whamcloud.com/44843
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13621 lnet: utility to print cpt number 13/39113/9
Amir Shehata [Fri, 19 Jun 2020 23:31:36 +0000 (16:31 -0700)]
LU-13621 lnet: utility to print cpt number

Added a command to lnetctl to print the CPT of a NID:
lnetctl cpt-of-nid --nid <nid> --ncpt <number of cpts>
ex:
lnetctl cpt-of-nid --nid 192.28.12.35@tcp9 --ncpt 7
This returns the CPT the NID hashes to, within the 0-6 range.
If the NI is bound to a specific set of CPTs, then ncpt refers
to the number of CPTs the NI is bound to, and the returned CPT
value is an index into the list of bound CPTs.

For example, if an NI is bound to [0,4,5,7], then ncpt should be
4, and the returned value is an index into that array:
ex:
lnetctl cpt-of-nid --nid 192.28.12.35@tcp9 --ncpt 4
cpt:
    value: 1
Therefore, the actual CPT the NID will be bound to is 4.
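
Turning the returned index back into a real CPT is a simple array lookup.
A sketch with a hypothetical helper, `bound_cpt()`, which is not part of
lnetctl:

```c
#include <stddef.h>

/*
 * Map the index printed by 'lnetctl cpt-of-nid --ncpt <n>' to an actual
 * CPT, given the list of CPTs the NI is bound to.
 */
static int bound_cpt(const int *bound, size_t nbound, unsigned int index)
{
	if (index >= nbound)
		return -1;	/* index out of range for this binding */
	return bound[index];
}
```

With the binding [0,4,5,7] and the returned value 1, the lookup gives
CPT 4, as in the example above.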

Test-parameters: trivial testlist=sanity-lnet

Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Change-Id: I3cb562842448bfb663c2d41007be65299a919300
Reviewed-on: https://review.whamcloud.com/39113
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10633 mdt: Convert MDS restoring RPC message to D_WARNING 14/31214/6
Chris Horn [Wed, 7 Feb 2018 21:26:04 +0000 (15:26 -0600)]
LU-10633 mdt: Convert MDS restoring RPC message to D_WARNING

Using D_WARNING instead of D_RPCTRACE causes the message to be logged
both in the Lustre DK logs and on the system console.  This patch
changes the MDS restore/replay debug message to use D_WARNING.

A restored/replayed metadata request indicates some sort of underlying
error, and even when handled correctly, should generate a warning.

Test-Parameters: trivial
Change-Id: Iff98521853323469fc5d6c7d546ca83477b1cb9f
HPE-bug-id: LUS-2578
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/31214
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-12056 ldiskfs: add ext4-projid-xattrs.patch for Linux 5.10 95/46295/2
Li Dongyang [Tue, 25 Jan 2022 02:12:49 +0000 (13:12 +1100)]
LU-12056 ldiskfs: add ext4-projid-xattrs.patch for Linux 5.10

ext4-projid-xattrs.patch was missed during the landing/review
process for ldiskfs-5.10.0-ml.series.
We also need a small change in base/ext4-projid-xattrs.patch to
make it apply on v5.10.

Change-Id: I2b7a6c957bd8b40cf78dbd9f4680b722e8d4418a
Test-Parameters: trivial
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/46295
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15340 llite: Delay dput in ll_dirty_page_discard_warn 84/45784/4
Oleg Drokin [Wed, 8 Dec 2021 04:30:06 +0000 (23:30 -0500)]
LU-15340 llite: Delay dput in ll_dirty_page_discard_warn

Otherwise this can be the final dput(), which needs to wait for
pages to clear. That is bad because this function is called from
ptlrpcd, which is not supposed to block, especially on network
traffic, as that can cause livelocks if ptlrpcd happens to be
needed to kill the very same RPC we are waiting on.

Additionally, pass in the inode from the IO, since the page we are
using might come from direct I/O, and the inode derived from such a
page is probably not even valid.

Fixes: 624a3ac23393 ("LU-921 llite: warning in case of discarding dirty pages")
Change-Id: Ie2f1a34047145202c11a4e1a0b18b2e01d9e4601
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45784
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
2 years agoLU-15282 lod: less spinlock on the alloc rr 94/45694/9
Alexey Lyashkov [Wed, 1 Dec 2021 13:38:45 +0000 (16:38 +0300)]
LU-15282 lod: less spinlock on the alloc rr

There is no need to hold the spinlock for so long; it is released
in the middle of the loop anyway, so RR cannot be perfect in the
multithreaded case.

Fix a small bug in RR precession for stripecount=4 with OSTCOUNT=6.

Fixes: 665e36b780f ("OST pools on HEAD")
HPE-bug-id: LUS-10627
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Change-Id: I66eded451c8256de0e5a9a0eb862af8b306da9e1
Reviewed-on: https://review.whamcloud.com/45694
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15285 mdt: fix same-dir racing rename deadlock 76/45676/9
Oleg Drokin [Mon, 29 Nov 2021 21:45:16 +0000 (16:45 -0500)]
LU-15285 mdt: fix same-dir racing rename deadlock

With LU-12125 lifting the BFL for same-directory rename, a deadlock
becomes possible: we lock the source and target of a rename in
source-then-target order, so if two renames race with their
arguments in reverse order:
mv a b &
mv b a

a lock inversion happens, and such a deadlock has been observed.

To avert this, impose an additional ordering requirement:
the lower PDO hash value is locked ahead of the higher one.
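
The ordering rule can be sketched as a function that picks which target to
lock first, so that "mv a b" and "mv b a" always take the locks in the
same order. The struct and function are hypothetical, and the name-based
tie-break for equal hashes is an assumption of this sketch, not
necessarily what the mdt code does:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical lock target: a name plus its PDO hash value. */
struct pdo_target {
	const char *name;
	uint32_t hash;
};

/*
 * Lower hash is locked first. On a hash tie, fall back to comparing the
 * names so the order is still the same whichever way round the
 * arguments are given.
 */
static void pdo_lock_order(const struct pdo_target *src,
			   const struct pdo_target *tgt,
			   const struct pdo_target **first,
			   const struct pdo_target **second)
{
	int src_first;

	if (src->hash != tgt->hash)
		src_first = src->hash < tgt->hash;
	else
		src_first = strcmp(src->name, tgt->name) <= 0;

	*first = src_first ? src : tgt;
	*second = src_first ? tgt : src;
}
```

Because the order depends only on the targets and not on which one is the
rename source, two racing renames can no longer wait on each other.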

Fixes: d76cc65d5d68 ("LU-12125 mds: allow parallel regular file rename")
Fixes: b50bb830f92e ("LU-3538 dne: Commit-on-Sharing for DNE")
Fixes: 9f1711f3d7d1 ("LU-12081 mdt: rename shouldn't PDO lock if parent is remote")
Change-Id: I88dd3aebb394ea40e97e6029d6dcc161116f982e
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45676
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
2 years agoLU-14645 utils: optimise setstripe 52/46152/5
Vitaly Fertman [Mon, 17 Jan 2022 19:54:12 +0000 (22:54 +0300)]
LU-14645 utils: optimise setstripe

Skip some excessive checks:
- do not check that the file is on a Lustre fs, the following ioctl
  does it;
- do not check that the stripe-index is valid, it is done on the MDS
  side;
- do not check that the pool exists for a !PFL file (align with
  setstripe for PFL files).

Signed-off-by: Vitaly Fertman <c17818@cray.com>
Change-Id: Ia21f85c3ab73a970bad8d11e175c0063ab3a307f
Reviewed-on: https://review.whamcloud.com/46152
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14645 utils: fix API for llapi_sanity_check 51/46151/3
Vitaly Fertman [Mon, 17 Jan 2022 18:49:52 +0000 (21:49 +0300)]
LU-14645 utils: fix API for llapi_sanity_check

Fix the previous patch, which introduced a change in the API.

Fixes: 149934fe28 ("LU-14645 utils: setstripe cleanup")
Signed-off-by: Vitaly Fertman <c17818@cray.com>
Change-Id: I43ae6822768ac70c9348af270c17830b13133f8c
Reviewed-on: https://review.whamcloud.com/46151
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13514 tests: replace nid in conf-sanity test_32 54/46354/2
Yang Sheng [Wed, 4 Nov 2020 18:36:43 +0000 (02:36 +0800)]
LU-13514 tests: replace nid in conf-sanity test_32

test_32a needs replace_nid. Otherwise the MDC cannot be
initialized, which makes the client mount hang.

Test-Parameters: trivial
Test-Parameters: testlist=conf-sanity env=ONLY=32a,ONLY_REPEAT=20
Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: I651f5728ad4ff96a309ed599490c9dd6ed9c5274
Reviewed-on: https://review.whamcloud.com/40537
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/46354
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-15220 utils: use 'fallthrough' pseudo keyword for switch 70/46270/4
Jian Yu [Sun, 23 Jan 2022 02:28:56 +0000 (18:28 -0800)]
LU-15220 utils: use 'fallthrough' pseudo keyword for switch

The '/* fallthrough */' comment hits an implicit-fallthrough error
with GCC 11.

This patch replaces the existing '/* fallthrough */' comments and
their variants with the 'fallthrough' pseudo keyword, which was added
by Linux kernel commit v5.4-rc2-141-g294f69e662d1.
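
For user-space code, where the macro is not predefined, a definition
modelled on the kernel's include/linux/compiler_attributes.h can be
sketched as follows. `stages_run()` is only an illustrative caller:

```c
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough	__attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough	do {} while (0)	/* fallthrough */
#endif

/* Count how many setup stages run when starting at stage 'first'. */
static int stages_run(int first)
{
	int n = 0;

	switch (first) {
	case 0:
		n++;
		fallthrough;
	case 1:
		n++;
		fallthrough;
	case 2:
		n++;
		break;
	default:
		break;
	}
	return n;
}
```

Unlike a comment, the attribute is seen by the compiler itself, so the
intentional fall-through is accepted under -Wimplicit-fallthrough without
relying on comment parsing.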

Test-Parameters: trivial
Change-Id: Icace4c9953950f86d3c48068d8c6bba7dd1160a7
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46270
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Peter Jones <pjones@whamcloud.com>
2 years agoLU-15220 lustre: use 'fallthrough' pseudo keyword for switch 69/46269/3
Jian Yu [Sun, 23 Jan 2022 02:21:04 +0000 (18:21 -0800)]
LU-15220 lustre: use 'fallthrough' pseudo keyword for switch

The '/* fallthrough */' comment hits an implicit-fallthrough error
with GCC 11.

This patch replaces the existing '/* fallthrough */' comments and
their variants with the 'fallthrough' pseudo keyword, which was added
by Linux kernel commit v5.4-rc2-141-g294f69e662d1.

Test-Parameters: trivial
Change-Id: Icace4c9953950f86d3c48068d8c6bba7dd1160a6
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46269
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Peter Jones <pjones@whamcloud.com>
2 years agoLU-15220 lnet: use 'fallthrough' pseudo keyword for switch 66/45566/12
Jian Yu [Thu, 20 Jan 2022 18:19:34 +0000 (10:19 -0800)]
LU-15220 lnet: use 'fallthrough' pseudo keyword for switch

'/* fallthrough */' hits an implicit-fallthrough error with GCC 11.

This patch replaces the existing '/* fallthrough */' comments and
their variants with the 'fallthrough' pseudo keyword, which was added
by Linux kernel commit v5.4-rc2-141-g294f69e662d1.

Test-Parameters: trivial
Change-Id: Icace4c9953950f86d3c48068d8c6bba7dd1160a5
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45566
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15220 tests: avoid gcc-11 -Werror=stringop-overread warning 77/45777/7
Jian Yu [Thu, 20 Jan 2022 19:06:42 +0000 (11:06 -0800)]
LU-15220 tests: avoid gcc-11 -Werror=stringop-overread warning

GCC 11 warns about string and memory operations on a fixed address:

In function 'memcpy', inlined from 'obd_uuid2str' at
lustre/include/uapi/linux/lustre/lustre_user.h:1222:3,
include/linux/fortify-string.h:20:33: error: '__builtin_memcpy'
reading 39 bytes from a region of size 0 [-Werror=stringop-overread]
   20 | #define __underlying_memcpy     __builtin_memcpy
      |                                 ^
include/linux/fortify-string.h:191:16: note:
in expansion of macro '__underlying_memcpy'
  191 |         return __underlying_memcpy(p, q, size);
      |                ^~~~~~~~~~~~~~~~~~~

The patch avoids the above warning by not using a fixed address.

badarea_io.c:47:14: error: 'write' reading 5 bytes from a region
of size 0 [-Werror=stringop-overread]
   47 |         rc = write(fd, (void *)0x4096000, 5);
      |              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The patch avoids the above warning by making the pointer volatile
as suggested in:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99578#c16
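A minimal sketch of the volatile-pointer workaround, assuming Linux, where write(2) from an unmapped user buffer fails with EFAULT instead of crashing. The address 0x4096000 is the one from badarea_io.c and is assumed to be unmapped; `write_from_bad_address()` is a hypothetical helper, not the test's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Write 5 bytes from a deliberately invalid address. The pointer is read
 * back through a volatile-qualified variable, so GCC cannot see the
 * constant address at the call site and has no zero-sized region to
 * warn about under -Wstringop-overread.
 */
ssize_t write_from_bad_address(int fd)
{
	const void *volatile bad = (const void *)0x4096000;

	return write(fd, bad, 5);
}
```

Writing to a pipe is a convenient way to exercise it, since the kernel must copy the bytes from the bad buffer and reports the fault.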

Change-Id: I90b936835c6236a0f47e744013e3e480442f682c
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45777
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15462 gnilnd: Fix syntax accessor to nid_addr 26/46226/2
Shaun Tancheff [Thu, 20 Jan 2022 03:23:58 +0000 (10:23 +0700)]
LU-15462 gnilnd: Fix syntax accessor to nid_addr

Fix a minor typo that breaks the gnilnd build.

Test-Parameters: trivial
Fixes: 57c03f307075 ("LU-10391 lnet: extend nids in struct lnet_msg")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ia8444d541c5ec175eacb2bf96d72e4b0fd80d19f
Reviewed-on: https://review.whamcloud.com/46226
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15455 tests: fix [: ==: unary operator expected 50/46150/2
Elena Gryaznova [Mon, 17 Jan 2022 17:47:22 +0000 (20:47 +0300)]
LU-15455 tests: fix [: ==: unary operator expected

PARALLEL is not initialized in recovery-small.
This patch fixes the resulting bash syntax error.

Fixes: 26e8f1137b ("LU-13116 mgc: do not lose sptlrpc config lock")
Fixes: 688d5da6a8 ("LU-12846 mdd: return error while delete failed")
Test-Parameters: trivial env=ONLY="141 143" testlist=recovery-small
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-10679
Change-Id: I495bfd077edf3d15f1d47ccb4723e1de46de94e7
Reviewed-on: https://review.whamcloud.com/46150
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-by: James Nunez <jnunez@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15420 libcfs: replace deprecated CPU-hotplug functions 85/46085/2
Jian Yu [Thu, 13 Jan 2022 01:04:12 +0000 (17:04 -0800)]
LU-15420 libcfs: replace deprecated CPU-hotplug functions

Kernel 5.15 commit 8c854303ce0e38e5bbedd725ff39da7e235865d8
removed deprecated CPU-hotplug functions get_online_cpus()
and put_online_cpus(). They map directly to cpus_read_lock()
and cpus_read_unlock().

Change-Id: I09d489cd3ca9a575b20ea25f24210702fbfdd725
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/46085
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
2 years agoLU-15445 tests: sanity test_160p() fix 73/46073/2
Elena Gryaznova [Wed, 12 Jan 2022 17:03:02 +0000 (20:03 +0300)]
LU-15445 tests: sanity test_160p() fix

start() requires the device as its 2nd parameter.
If start() is called without the 2nd parameter, an
empty mds1_dev is exported:
        eval export ${dev_alias}_dev=${device}
and the test fails on a failover setup with:
  CMD: lm0301 loop_dev=$(losetup -j  | cut -d : -f 1);
  lm0301: losetup: option requires an argument -- 'j'
dm_create_dev()
  local real_dev=<empty>
        -> setup_loop_device $facet <empty>
To reproduce the failure just run:
  ONLY=160p sh sanity.sh
on a failover setup where mds1_HOST != mds1failover_HOST.

Fixes: c7d8fe3106 ("LU-14731 mdd: clear orphans changelog entries")
Test-Parameters: trivial env=ONLY="160p" testlist=sanity
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-10674
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: I2661567672aa9c6e23b5f17500d81053cf9c9fdd
Reviewed-on: https://review.whamcloud.com/46073
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15429 tests: mount_mds_client() fix 43/46043/2
Elena Gryaznova [Tue, 11 Jan 2022 17:23:30 +0000 (20:23 +0300)]
LU-15429 tests: mount_mds_client() fix

The client mount/umount must be executed on the active facet/host,
not on mds1_HOST. Without this fix, test_140a() fails on a
failover setup:
  CMD: lm0101 umount /mnt/lustre2 2>&1
  CMD: lm0102 rmdir /mnt/lustre2
  lm0102: rmdir: failed to remove '/mnt/lustre2':
                 No such file or directory
  test_140a: FAIL: no clients with recovery disabled

To reproduce the failure just run:
  ONLY="107 140a" sh recovery-small.sh
on a failover setup where mds1_HOST != mds1failover_HOST.

Fixes: 8bd04b4e57 ("LU-12722 target: disable recovery for local clients")
Test-Parameters: trivial env=ONLY="140a 140b" testlist=recovery-small
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-10669
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: Ifbdedfda840e8421fa8a969f73131ca23982a28b
Reviewed-on: https://review.whamcloud.com/46043
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13639 tests: increase limit in sanity-quota t_2 43/45943/3
Sergey Cheremencev [Mon, 27 Dec 2021 16:45:51 +0000 (19:45 +0300)]
LU-13639 tests: increase limit in sanity-quota t_2

If the limit is equal to least_qunit, the slave target may
preacquire more quota while creating "limit number" of files,
causing EDQUOT. Change the limit from soft_qunit to 2 * soft_qunit.
The patch also changes the test behaviour: createmany is divided
into 2 parts. First, it creates (limit - least_qunit) nodes and
checks that there is no EDQUOT. Then it creates least_qunit
nodes and ignores the result, since hitting the limit there is a
valid case. Only after that does it check that nodes cannot be
created over quota.

Change-Id: Iad7c1cc05119c8d3e0f1cfc2adffb276d79f18c7
Test-Parameters: testgroup=review-dne-zfs-part-4
Test-Parameters: testlist=sanity-quota env=ONLY=2, ONLY_REPEAT=200
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/45943
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15381 hsm: update size upon completion of data version 35/45935/4
Qian Yingjin [Fri, 17 Dec 2021 08:53:37 +0000 (16:53 +0800)]
LU-15381 hsm: update size upon completion of data version

During testing we found that an HSM restore followed by an HSM
release wrongly sets the file size to 0.
The reason is that the file size and blocks information obtained
via @ll_merge_attr() is incorrect.
Because the data version operation flushes dirty pages from all
clients, the size and blocks information returned from the Lustre
OSTs is correct.
In this patch, we update the size and blocks attributes of the file
upon completion of the data version operation accordingly.
This way, HSM release sets the size and blocks information
correctly after the data version ioctl.

Add sanity-hsm test_261.

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Ifdbf6b58ecd00dc9677a2328438ef68529b72882
Reviewed-on: https://review.whamcloud.com/45935
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Bobi Jam <bobijam@hotmail.com>
Reviewed-by: Artem Blagodarenko <artem.blagodarenko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15366 nrs: increase maximum rate for tbf rule 38/45838/3
Etienne AUJAMES [Mon, 13 Dec 2021 15:30:35 +0000 (16:30 +0100)]
LU-15366 nrs: increase maximum rate for tbf rule

The maximum RPC rate for a TBF rule is 65535. This value can be
problematic for clusters with a large number of clients.

This patch consistently uses __u64 to store an RPC rate
and changes the maximum rate for a TBF rule to 1000000 (1 RPC/us).
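A back-of-the-envelope check of the new limit: at 1000000 RPC/s, a single rule admits one RPC every 1000 ns (1 us). The constant and helpers below are hypothetical, not the NRS TBF code, and treating a rate of 0 as invalid is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name for the new upper bound: 1000000 RPC/s == 1 RPC/us. */
#define TBF_RATE_MAX_SKETCH 1000000ULL

/* A rate is accepted if it is non-zero (assumption) and within the limit. */
bool tbf_rate_ok(uint64_t rate)
{
	return rate != 0 && rate <= TBF_RATE_MAX_SKETCH;
}

/* Nanoseconds between tokens for a rule with the given rate (rate > 0). */
uint64_t tbf_token_interval_ns(uint64_t rate)
{
	return 1000000000ULL / rate;
}
```

With a 64-bit rate, the old ceiling of 65535 RPC/s (about one RPC every 15.3 us) is no longer a storage limit, only a policy choice.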

Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: I91fd416b9d91bbb5d5674c66ec8ceb0d77a9f7e0
Reviewed-on: https://review.whamcloud.com/45838
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-11643 tests: add new images and tests for upgrade tests 27/45827/3
Wei Liu [Fri, 10 Dec 2021 20:41:11 +0000 (12:41 -0800)]
LU-11643 tests: add new images and tests for upgrade tests

Add new images for conf-sanity.sh 32

disk2_10-ldiskfs.tar.bz2
disk2_12-ldiskfs.tar.bz2

Test-Parameters: trivial
Test-Parameters: fstype=ldiskfs envdefinitions=ONLY="32f 32g" testlist=conf-sanity

Change-Id: I6682e247308d7cf3fb57eee595751d6d140a421f
Signed-off-by: Wei Liu <sarah@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45827
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15328 lnet: Set rc to ENOMEM in lnet_selftest_init on error 63/45763/2
Oleg Drokin [Tue, 7 Dec 2021 03:49:52 +0000 (22:49 -0500)]
LU-15328 lnet: Set rc to ENOMEM in lnet_selftest_init on error

Test-Parameters: trivial
Change-Id: I9d4eb7b830521ddd50f76544c38ebb0cd939800a
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/45763
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
2 years agoLU-15164 tests: remove wrong chattr -h check 97/45697/3
Elena Gryaznova [Wed, 1 Dec 2021 18:23:56 +0000 (21:23 +0300)]
LU-15164 tests: remove wrong chattr -h check

sanity-quota test_62() is always skipped because
of an incorrect chattr project ID support check.
chattr.c:
Usage: %s [-pRVf] [-+=aAcCdDeijPsStTuFx] [-v version] files...

Let's remove this check: e2fsprogs has supported project IDs in
chattr/lsattr since 2016. Anyone using an older chattr will be
forced to update by the error "root failed to clear
inherit".

Fixes: 2d3bbce0c9 ("LU-11101 quota: fix setattr project check")
Test-Parameters: trivial testlist=sanity-quota env=ONLY=62
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-9967
Change-Id: I0cfd98735e5e0b5956f3dd6385ce626584443bea
Reviewed-on: https://review.whamcloud.com/45697
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15200 llite: "lfs getdirstripe -D" shows inherit layout 70/45570/7
Lai Siyao [Sun, 7 Nov 2021 05:15:56 +0000 (01:15 -0400)]
LU-15200 llite: "lfs getdirstripe -D" shows inherit layout

Once system-wide default LMV is set, "lfs getdirstripe -D subdir"
should show inherited layout from it.

Add sanity 413e.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: If7354cb4093c58f6d56a6a4d449fb69a9deec7cc
Reviewed-on: https://review.whamcloud.com/45570
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10378 utils: add formatted printf to lfs find 36/45136/5
Anjus George [Wed, 12 Jan 2022 06:30:03 +0000 (01:30 -0500)]
LU-10378 utils: add formatted printf to lfs find

Introduce new --printf option with lfs find utility along with support for
the backslash escapes and format directives given below that allow users to
obtain metadata in formatted style.

List of backslash escapes supported by --printf option:
-------------------------------------
   Description               | Escape
-------------------------------------
   Newline character         | \n
   Tab character             | \t
   Literal backslash         | \\

List of format directives used with --printf option:
----------------------------------------------------------
   Description                                  | Directive
----------------------------------------------------------
   Literal % character                          | %%
   Access time (in ctime format)                | %a
   Access time (in secs since epoch)            | %A@
   File size (in 512B blocks)                   | %b
   Last change time (in ctime format)           | %c
   Last change time (in secs since epoch)       | %C@
   Numeric group ID of file/dir owner           | %G
   File size (in 1K blocks)                     | %k
   File mode (octal)                            | %m
   Path name of file                            | %p
   File size (in bytes)                         | %s
   Modification time (in ctime format)          | %t
   Modification time (in secs since epoch)      | %T@
   Numeric user ID of file/dir owner            | %U
   Birth time (in ctime format)                 | %w
   Birth time (in secs since epoch)             | %W@
   File type                                    | %y
   Stripe count                                 | %Lc
   Lustre FID                                   | %LF
   Directory hash type                          | %Lh
   Starting OST (file) or MDT (dir) index       | %Li
   List of all OST (file) or MDT (dir) indices  | %Lo
   OST pool name                                | %Lp
   Numeric project id assigned to file/dir      | %LP
   Stripe size in bytes                         | %LS
---------------------------------------------------------
Note: Stripe size and OST pool name are not defined for
directories whereas Hash type is not defined for files.
%Li gives starting OST index for files and starting MDT index
for directories. For composite files %Lo provides list of all
OST indices for all components whereas %Lc, %LS, %Li and %Lp
provide details for last initialized component only.

A usage example for --printf option and its output for a composite
file with three components are shown below.

   lfs find --printf '%a | %t | %c | %w | %W@ | %b | %s | %U | %G |
   %A@ | %T@ | %C@ | %LP | %Lc | %LS | %Li | %Lo | %Lp | %p\n'
   /lustre/lustre/composite.txt

   Tue Oct 26 16:06:18 2021 | Tue Oct 26 16:06:50 2021 | Tue Oct 26
   16:06:50 2021 | Tue Oct 26 16:06:18 2021 | 1635278778 | 204800 |
   104857600 | 0 | 0 | 1635278778 | 1635278810 | 1635278810 | 0 | 3 |
   2097152 | 2 | [1][2,0][2,0,1] | pool1 |
   /lustre/lustre/composite.txt
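To make the escape and directive handling concrete, here is a minimal formatter sketch that covers only \n, \t, \\, %%, %p and %s. The function name and signature are hypothetical and this is not the lfs implementation; copying unknown sequences through unchanged is an assumption of the sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append s to out, always leaving room for the terminating NUL. */
static void append(char *out, size_t outlen, size_t *pos, const char *s)
{
	while (*s != '\0' && *pos + 1 < outlen)
		out[(*pos)++] = *s++;
}

/* Expand a --printf-style format for one entry with the given path/size. */
void format_entry(char *out, size_t outlen, const char *fmt,
		  const char *path, unsigned long long size)
{
	size_t pos = 0;
	char tmp[32];

	if (outlen == 0)
		return;

	for (const char *f = fmt; *f != '\0'; f++) {
		if (*f == '\\' && f[1] != '\0') {
			f++;
			if (*f == 'n')
				append(out, outlen, &pos, "\n");
			else if (*f == 't')
				append(out, outlen, &pos, "\t");
			else if (*f == '\\')
				append(out, outlen, &pos, "\\");
			else {	/* unknown escape: copy verbatim */
				snprintf(tmp, sizeof(tmp), "\\%c", *f);
				append(out, outlen, &pos, tmp);
			}
		} else if (*f == '%' && f[1] != '\0') {
			f++;
			if (*f == '%')
				append(out, outlen, &pos, "%");
			else if (*f == 'p')
				append(out, outlen, &pos, path);
			else if (*f == 's') {	/* size in bytes */
				snprintf(tmp, sizeof(tmp), "%llu", size);
				append(out, outlen, &pos, tmp);
			} else {	/* unknown directive: copy verbatim */
				snprintf(tmp, sizeof(tmp), "%%%c", *f);
				append(out, outlen, &pos, tmp);
			}
		} else {
			tmp[0] = *f;
			tmp[1] = '\0';
			append(out, outlen, &pos, tmp);
		}
	}
	out[pos] = '\0';
}
```

For example, the format '%p\t%s\n' applied to composite.txt would print its path, a tab, 104857600 and a newline.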

Change-Id: I370c0978900a4837b0ea3060e08dabb1fcb6e115
Signed-off-by: Anjus George <georgea@ornl.gov>
Signed-off-by: Rick Mohr <mohrrf@ornl.gov>
Reviewed-on: https://review.whamcloud.com/45136
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14948 build: Warn about /usr/src/lustre.tar.bz2 77/44677/5
Shaun Tancheff [Tue, 4 Jan 2022 15:22:57 +0000 (22:22 +0700)]
LU-14948 build: Warn about /usr/src/lustre.tar.bz2

When /usr/src/lustre.tar.bz2 exists, make debs (and dkms-debs)
will fail with an error like:

  Extracting the package tarball, /usr/src/lustre.tar.bz2, ...
  ../../generic.sh: line 73: debian/rules: Permission denied
  BUILD FAILED!

Add the current git hash to the lustre tarball, and attempt to
remove the conflict from /usr/src.  Failing that, give a warning
asking the user to remove the conflicting file.

HPE-bug-id: LUS-10308
Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I4aaa803cb81c2ed8ffc0182bb49ea0bff5064df4
Reviewed-on: https://review.whamcloud.com/44677
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13799 llite: Remove unnecessary page get/put 93/44293/8
Patrick Farrell [Fri, 30 Jul 2021 16:15:20 +0000 (12:15 -0400)]
LU-13799 llite: Remove unnecessary page get/put

Part of the aio cleanup code has the slightly strange
behavior of doing get on every page before calling page
cleanup, then doing a put after.

This was required because we call cl_page_list_del before
calling cl_page_delete, and cl_page_list_del was holding
the last reference on the page struct.

If we reverse the order, then we don't need the extra
get/put to keep the pages live.  This should save
significant CPU time in the ptlrpcd threads when finishing
i/o, since this removes a get/put on every page.
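The ordering argument can be modelled with a toy reference count. The types and helpers below are hypothetical stand-ins, not the cl_page API, and "freed" is only a flag here. Because the list holds the last reference, deleting the page first and unlinking it second never touches a freed page, so no extra get/put is needed.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_page {
	int refs;	/* references held on the page */
	bool deleted;	/* cl_page_delete() analogue has run */
	bool freed;	/* last reference has been dropped */
};

static void toy_put(struct toy_page *pg)
{
	assert(pg->refs > 0);
	if (--pg->refs == 0)
		pg->freed = true;
}

/* Analogue of cl_page_list_del(): drops the reference the list holds. */
static void toy_list_del(struct toy_page *pg)
{
	toy_put(pg);
}

/* Analogue of cl_page_delete(): must never run on a freed page. */
static void toy_page_delete(struct toy_page *pg)
{
	assert(!pg->freed);
	pg->deleted = true;
}

/* New order: delete while the list still pins the page, then unlink. */
void toy_cleanup(struct toy_page *pg)
{
	toy_page_delete(pg);
	toy_list_del(pg);
}
```

With the old order (unlink, then delete) the same toy would trip the assertion in toy_page_delete() unless an extra get/put pair surrounded the two calls.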

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I3b1639061d775faa43c91e2d0a0f209f2d0df10c
Reviewed-on: https://review.whamcloud.com/44293
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13594 obdclass: Add OOM handler for obdclass 21/42121/7
Arshad Hussain [Sun, 21 Mar 2021 01:02:07 +0000 (06:32 +0530)]
LU-13594 obdclass: Add OOM handler for obdclass

This patch adds an OOM handler for obdclass. The handler
currently only prints the maximum memory used by obdclass,
along with the memory currently in use, before the kernel
attempts to kill a user process.

Currently, when the handler kicks in, the output in
dmesg looks like:

Output:
~~~~~~~~
...
Mar 21 07:02:02 devbox kernel: obd_memory max: 244859953, obd_memory current: 0
...

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I0259f800b1f219ff3427f1d2a17b6a874dd456d3
Reviewed-on: https://review.whamcloud.com/42121
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13911 tests: take osc max_rpcs_in_flight limit into account 87/39687/3
Vladimir Saveliev [Wed, 10 Oct 2018 12:08:49 +0000 (08:08 -0400)]
LU-13911 tests: take osc max_rpcs_in_flight limit into account

max_rpcs_in_flight for osc is limited to 256. sanity.sh:test_115()
tries to set it to ost.OSS.ost_io.threads_started * 4. The test should
make sure that it does not exceed 256.
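The clamp the test needs, expressed in C for illustration (the real test is a shell function in sanity.sh). The constant and function names are hypothetical.

```c
#include <assert.h>

/* Upper bound for osc max_rpcs_in_flight, per the commit message. */
#define OSC_MAX_RPCS_IN_FLIGHT_LIMIT 256

/* 4 RPCs per started ost_io thread, capped at the osc limit. */
int rpcs_in_flight_target(int threads_started)
{
	int want = threads_started * 4;

	return want > OSC_MAX_RPCS_IN_FLIGHT_LIMIT ?
	       OSC_MAX_RPCS_IN_FLIGHT_LIMIT : want;
}
```

So any OSS with more than 64 started ost_io threads ends up at the 256 cap instead of an out-of-range value.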

HPE-bug-id: LUS-5917
Signed-off-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Change-Id: I38929e1ed0fe7855e7e60ea43742740c01ae1bd8
Reviewed-on: https://review.whamcloud.com/39687
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13799 llite: Do not get/put DIO pages 38/39438/27
Patrick Farrell [Fri, 30 Jul 2021 16:14:16 +0000 (12:14 -0400)]
LU-13799 llite: Do not get/put DIO pages

We've already told the kernel we're working with these pages
using the get/put_user_pages functions, and userspace must
hold references on them throughout the i/o anyway.

So getting/putting these vmpages is unnecessary.  This
saves around 7% of the time in DIO page submission, netting
about that much of a performance improvement.

This patch reduces i/o time in ms/GiB by:
Write: 22 ms/GiB
Read: 19 ms/GiB

Totals:
Write: 135 ms/GiB
Read: 143 ms/GiB

mpirun -np 1  $IOR -w -r -t 64M -b 64G -o ./iorfile --posix.odirect

With previous patches in series:
write     6470 MiB/s
read      6354 MiB/s

Plus this patch:
write     7531 MiB/s
read      7179 MiB/s
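The ms/GiB figures follow from the MiB/s figures by simple arithmetic: 1 GiB is 1024 MiB, so ms/GiB = 1024 * 1000 / (MiB/s). For writes, 1024000 / 6470 is about 158.3 ms and 1024000 / 7531 is about 136.0 ms, a difference of about 22 ms/GiB, consistent with the numbers above. The helper name is illustrative only.

```c
#include <assert.h>

/* Milliseconds needed to move 1 GiB (1024 MiB) at mib_per_s MiB/s. */
double ms_per_gib(double mib_per_s)
{
	return 1024.0 * 1000.0 / mib_per_s;
}
```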

Signed-off-by: Patrick Farrel <pfarrell@whamcloud.com>
Change-Id: Ic457c21ebca9624da2422463da453b535dcfd10e
Reviewed-on: https://review.whamcloud.com/39438
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-13799 llite: Move free user pages 43/39443/29
Patrick Farrell [Fri, 30 Jul 2021 16:13:45 +0000 (12:13 -0400)]
LU-13799 llite: Move free user pages

It is incorrect to release our reference on the user pages
before we're done with them. We need to keep it until the
i/o is complete; otherwise we access them after releasing
our reference.  This has not caused any known bugs so far,
but it's still wrong.

So only drop these references when we free the aio struct,
which is only freed once i/o is complete.

Also rename free_user_pages to release_user_pages, because
it does not free them - it just releases our reference.

This also helps performance by moving free_user_pages to
the daemon threads.  This is a 5-10% boost.

This patch reduces i/o time in ms/GiB by:
Write: 18 ms/GiB
Read: 19 ms/GiB

Totals:
Write: 180 ms/GiB
Read: 178 ms/GiB

mpirun -np 1  $IOR -w -r -t 64M -b 64G -o ./iorfile --posix.odirect

With previous patches in series:
write     5183 MiB/s
read      5201 MiB/s

Plus this patch:
write     5702 MiB/s
read      5756 MiB/s

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ibfe808611bbe6743a1b5fe3aa6a8d42691256d22
Reviewed-on: https://review.whamcloud.com/39443
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yingjin Qian <qian@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-12130 lod: make pool inheritance policy more consistent 36/34536/7
Vladimir Saveliev [Fri, 24 Dec 2021 21:02:13 +0000 (00:02 +0300)]
LU-12130 lod: make pool inheritance policy more consistent

If a directory's striping includes pool info, setstripe behaves
differently with respect to pool inheritance:
- when setting a non-PFL layout, the pool is inherited
- otherwise, it is not

Make the inheritance policy consistent:
- when the specified PFL layout does not include pool information,
  embed the current pool specification into the new layout
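A minimal sketch of the policy, assuming "does not include pool information" means that no component of the new PFL layout names a pool. The struct, the bound and the function are hypothetical stand-ins, not the LOD code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define POOL_NAME_MAX 16	/* hypothetical bound for this sketch */

/* Hypothetical stand-in for one PFL component: only the pool matters. */
struct comp_sketch {
	char pool[POOL_NAME_MAX];	/* "" means no pool was specified */
};

/* If no component names a pool, embed the directory's pool in all of them. */
void embed_parent_pool(struct comp_sketch *comps, int count,
		       const char *parent_pool)
{
	bool any_pool = false;

	if (parent_pool == NULL || parent_pool[0] == '\0')
		return;

	for (int i = 0; i < count; i++)
		if (comps[i].pool[0] != '\0')
			any_pool = true;
	if (any_pool)
		return;	/* explicit pool information wins */

	for (int i = 0; i < count; i++) {
		strncpy(comps[i].pool, parent_pool, POOL_NAME_MAX - 1);
		comps[i].pool[POOL_NAME_MAX - 1] = '\0';
	}
}
```

This makes the PFL case behave like the non-PFL case: a layout that says nothing about pools keeps the directory's pool.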

sanity.sh:test_65n is modified to illustrate the case.

HPE-bug-id: LUS-7180
Signed-off-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Change-Id: I92b415e18ba7aadd2059da702878905249dd33c3
Reviewed-on: https://review.whamcloud.com/34536
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-14651 build: remove KALLSYMS build requirement 70/46070/2
James Simmons [Wed, 12 Jan 2022 16:55:46 +0000 (11:55 -0500)]
LU-14651 build: remove KALLSYMS build requirement

Now that kallsyms is no longer exported, some distro kernels
disable it by default. If kallsyms is disabled, Lustre fails to
configure. Remove this hard requirement.

Test-Parameters: trivial
Change-Id: I710433e99afd75eea6a3bf1d77878b97beaed605
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/46070
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-15403 tests: fix some false alarm 58/45958/2
Alexey Lyashkov [Thu, 30 Dec 2021 13:04:59 +0000 (16:04 +0300)]
LU-15403 tests: fix some false alarm

The current test implementation has two bugs.
1) Client MGC llog processing is asynchronous to mount,
so the lock check can start before all locks are processed,
causing a false alarm. The same applies to several client mounts.

2) Server lock counting is unsafe, as it includes other servers'
locks, so any server reconnect may cause a false alarm.

Let's fix it.

HPe-bug-id: LUS-8326
Test-Parameters: trivial testlist=sanityn
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Change-Id: I59d6e5deb79ca9f040385231738b8698a3309e8e
Reviewed-on: https://review.whamcloud.com/45958
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10391 lnet: change lnet_del_route() to take lnet_nid 15/43615/11
Mr NeilBrown [Tue, 31 Aug 2021 13:54:33 +0000 (09:54 -0400)]
LU-10391 lnet: change lnet_del_route() to take lnet_nid

The gateway NID passed to lnet_del_route is now a struct lnet_nid.
Instead of passing LNET_NID_ANY as a wildcard, we pass
a NULL pointer.
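A minimal sketch of the new wildcard convention, in which a NULL gateway pointer matches every route. `struct nid_sketch` is a simplified, hypothetical stand-in for struct lnet_nid (byte order is ignored), and `gw_matches()` is not a real LNet function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical stand-in for struct lnet_nid. */
struct nid_sketch {
	uint8_t  nid_type;
	uint16_t nid_num;
	uint32_t nid_addr[4];
};

/* NULL acts as the wildcard that LNET_NID_ANY used to provide. */
bool gw_matches(const struct nid_sketch *filter, const struct nid_sketch *gw)
{
	if (filter == NULL)
		return true;
	return filter->nid_type == gw->nid_type &&
	       filter->nid_num == gw->nid_num &&
	       memcmp(filter->nid_addr, gw->nid_addr,
		      sizeof(filter->nid_addr)) == 0;
}
```

A NULL pointer is a natural "any" value for a 16-byte structure, since there is no spare 64-bit sentinel as with LNET_NID_ANY.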

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I1243be20d9f40e4ac3ebc6ec5dd9bbcbae6653c3
Reviewed-on: https://review.whamcloud.com/43615
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10391 lnet: Fix NULL-deref in lnet_nidstr_r() 38/44838/7
Mr NeilBrown [Fri, 3 Sep 2021 03:22:17 +0000 (13:22 +1000)]
LU-10391 lnet: Fix NULL-deref in lnet_nidstr_r()

It is valid to pass NULL as the nid to lnet_nidstr_r(); it indicates
"any" nid.  LNET_NID_IS_ANY() tests for this, and the function exits
early.

However, 'lnd' is assigned from "nid->nid_type" and 'nnum' from
"nid->nid_num" before that check, causing a NULL-pointer dereference
when nid is NULL.

So move these assignments after the check.
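The fix pattern in miniature: test for the "any" NID before reading any field. The struct, function name, output format and the "<any>" placeholder are all hypothetical; in this sketch only a NULL pointer counts as "any".

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Simplified, hypothetical stand-in for struct lnet_nid. */
struct nid_sketch {
	uint8_t  nid_type;
	uint16_t nid_num;
	uint32_t nid_addr[4];
};

const char *nidstr_sketch(const struct nid_sketch *nid, char *buf, size_t len)
{
	unsigned int type, num;

	/* Handle the "any" case first: nid may legitimately be NULL. */
	if (nid == NULL) {
		snprintf(buf, len, "%s", "<any>");
		return buf;
	}

	/* Only after the NULL check is it safe to dereference nid. */
	type = nid->nid_type;
	num = nid->nid_num;
	snprintf(buf, len, "%#x@%u:%u", (unsigned int)nid->nid_addr[0],
		 type, num);
	return buf;
}
```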

Fixes: 82a17076f880 ("LU-10391 lnet: introduce struct lnet_nid")
Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ie29dd4d0ef7fac0f11c1ece714278a7dd9860602
Reviewed-on: https://review.whamcloud.com/44838
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10391 lnet: change src_nid arg to lnet_parse() to 16byte 14/43614/9
Mr NeilBrown [Fri, 7 Jan 2022 01:13:58 +0000 (20:13 -0500)]
LU-10391 lnet: change src_nid arg to lnet_parse() to 16byte

lnet_parse() now gets the source nid as 'struct lnet_nid *'.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I7afac71c97e4564e544695f057fd0b002d97afc9
Reviewed-on: https://review.whamcloud.com/43614
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10391 lnet: convert nids in lnet_parse to lnet_nid 13/43613/10
Mr NeilBrown [Sat, 11 Sep 2021 14:20:51 +0000 (10:20 -0400)]
LU-10391 lnet: convert nids in lnet_parse to lnet_nid

src_nid and dest_nid in lnet_parse() are changed to
struct lnet_nid, and this change propagates out to
affect a few support functions.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Ic6922d3f643e493f92f8f64974ad30f66457e842
Reviewed-on: https://review.whamcloud.com/43613
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10391 lnet: Convert ping to support 16-bytes address 12/43612/10
Mr NeilBrown [Thu, 9 Jul 2020 06:35:58 +0000 (16:35 +1000)]
LU-10391 lnet: Convert ping to support 16-bytes address

Now that ksocknal can send hello messages with 16-byte addresses, we can
change lnet_send_ping() to ping hosts with large-address nids.

Note that this doesn't change the addresses in the ping message sent,
only the sending and receiving of the message.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I6f591c2f053698876195575c71da42f64788637e
Reviewed-on: https://review.whamcloud.com/43612
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10391 socklnd: add hello message version 4 11/43611/10
Mr NeilBrown [Tue, 28 Apr 2020 04:57:09 +0000 (14:57 +1000)]
LU-10391 socklnd: add hello message version 4

KSOCK_PROTO_V4 uses a 'hello' message that contains
lnet_hdr_nid16 with 16-byte addresses.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I52a36739d3a84dc059537059a586ce3dab2b20f0
Reviewed-on: https://review.whamcloud.com/43611
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10391 socklnd: Change ksock_hello_msg to struct lnet_nid 10/43610/10
Mr NeilBrown [Tue, 7 Jul 2020 04:03:23 +0000 (14:03 +1000)]
LU-10391 socklnd: Change ksock_hello_msg to struct lnet_nid

'struct ksock_hello_msg' now stores 'struct lnet_nid', but it is
converted to 'struct ksock_hello_msg_nid4' - the old format - for
transmit, and converted back on receive.

This opens the way for a new version of the hello protocol
which will use 16-byte addresses.
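The store-large, transmit-small pattern described above can be
illustrated with a simplified model. It assumes the legacy 8-byte NID
layout (network number and type in the high 32 bits, address in the
low 32 bits) and a large-NID struct whose field names echo struct
lnet_nid, with everything kept in host byte order for clarity. The
types and functions (struct sketch_nid, sketch_nid4_t,
sketch_nid_to_nid4(), sketch_nid4_to_nid()) are hypothetical, not the
real LNet helpers, and treating nid_size == 0 as "fits in 4 bytes" is
an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified large NID. Fields are host-order here;
 * the real struct lnet_nid stores the address in network order. */
struct sketch_nid {
	uint8_t  nid_size;    /* assumed: 0 means the address fits in 4 bytes */
	uint8_t  nid_type;    /* network type */
	uint16_t nid_num;     /* network number */
	uint32_t nid_addr[4]; /* address words; only [0] used when nid_size == 0 */
};

/* Legacy 8-byte form: (type << 16 | num) in the high 32 bits,
 * address in the low 32 bits. */
typedef uint64_t sketch_nid4_t;

static bool sketch_nid_is_nid4(const struct sketch_nid *nid)
{
	return nid->nid_size == 0;
}

/* Transmit side: pack into the old format. Only valid for small NIDs. */
static sketch_nid4_t sketch_nid_to_nid4(const struct sketch_nid *nid)
{
	uint32_t net;

	assert(sketch_nid_is_nid4(nid));
	net = ((uint32_t)nid->nid_type << 16) | nid->nid_num;
	return ((uint64_t)net << 32) | nid->nid_addr[0];
}

/* Receive side: restore the in-memory large form. */
static void sketch_nid4_to_nid(sketch_nid4_t nid4, struct sketch_nid *nid)
{
	uint32_t net = (uint32_t)(nid4 >> 32);

	memset(nid, 0, sizeof(*nid));
	nid->nid_type = (uint8_t)(net >> 16);
	nid->nid_num = (uint16_t)(net & 0xffff);
	nid->nid_addr[0] = (uint32_t)nid4;
}
```

Because the round trip is lossless for 4-byte addresses, the in-memory
message can switch to the large type without changing the bytes that
older peers see on the wire.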

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I22e86f9088f6001f203f24f93ef292fcf2a8e69f
Reviewed-on: https://review.whamcloud.com/43610
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10391 socklnd: move lnet_hdr unpack into ->pro_unpack 09/43609/10
Mr NeilBrown [Mon, 6 Jul 2020 05:07:24 +0000 (15:07 +1000)]
LU-10391 socklnd: move lnet_hdr unpack into ->pro_unpack

Converting the lnet_hdr from network-format to host-format
is currently done in ksocknal_process_recv().
Move it to ->pro_unpack() so that a different protocol
can send it in a different format.
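Moving the decode behind ->pro_unpack means the generic receive path
just dispatches through the connection's protocol table, so each
protocol version can define its own wire layout. The sketch below
shows that dispatch with a deliberately tiny, hypothetical header (a
64-bit source NID followed by a 32-bit type, both little-endian on the
wire). struct sketch_hdr, struct sketch_proto, sketch_unpack_nid4()
and sketch_process_recv() are illustrative names, not socklnd's actual
definitions, and the real struct lnet_hdr has many more fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-format header. */
struct sketch_hdr {
	uint64_t src_nid;
	uint32_t type;
};

/* Decode little-endian integers independently of host byte order. */
static uint64_t sketch_get_le64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

static uint32_t sketch_get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Per-protocol operations; only the unpack hook is modelled here. */
struct sketch_proto {
	int pro_version;
	/* Convert a received wire header into host format; 0 on success. */
	int (*pro_unpack)(const uint8_t *wire, size_t len,
			  struct sketch_hdr *hdr);
};

/* Unpack for a hypothetical 8-byte-NID wire layout. */
static int sketch_unpack_nid4(const uint8_t *wire, size_t len,
			      struct sketch_hdr *hdr)
{
	if (len < 12)
		return -1;
	hdr->src_nid = sketch_get_le64(wire);
	hdr->type = sketch_get_le32(wire + 8);
	return 0;
}

static const struct sketch_proto sketch_proto_v3 = {
	.pro_version = 3,
	.pro_unpack  = sketch_unpack_nid4,
};

/* Generic receive path: it no longer knows the wire format itself. */
static int sketch_process_recv(const struct sketch_proto *proto,
			       const uint8_t *wire, size_t len,
			       struct sketch_hdr *hdr)
{
	assert(proto->pro_unpack != NULL);
	return proto->pro_unpack(wire, len, hdr);
}
```

A later protocol would add its own unpack function with a different
layout (for example one carrying 16-byte addresses) and plug it into
its own table, leaving sketch_process_recv() unchanged.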

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: Icc22f4b52c1391d382c28bad157795f5477f4d7c
Reviewed-on: https://review.whamcloud.com/43609
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 years agoLU-10391 lnet: alter lnd_notify_peer_down() to take lnet_nid 08/43608/10
Mr NeilBrown [Mon, 6 Jul 2020 01:47:56 +0000 (11:47 +1000)]
LU-10391 lnet: alter lnd_notify_peer_down() to take lnet_nid

The lnd_notify_peer_down() interface now takes a large nid.

Test-Parameters: trivial
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I9926caf0508ff257e9e64d5537597addbce657d7
Reviewed-on: https://review.whamcloud.com/43608
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>