Whamcloud - gitweb
fs/lustre-release.git
14 months ago New tag 2.15.54 2.15.54 v2_15_54
Oleg Drokin [Thu, 9 Feb 2023 17:38:57 +0000 (12:38 -0500)]
New tag 2.15.54

Change-Id: I592cabccefa9bbdf3d1d97fa313103b8b1b1eb3b
Signed-off-by: Oleg Drokin <green@whamcloud.com>
14 months ago LU-16500 utils: set default ost index for lfs migrate 19/49819/3
Jian Yu [Wed, 1 Feb 2023 07:11:56 +0000 (23:11 -0800)]
LU-16500 utils: set default ost index for lfs migrate

Running "lfs migrate <file>" without any SETSTRIPE arguments
to balance space usage keeps the PFL file layout, but preserves
the OST selection exactly, which makes the migration virtually
useless for space balancing.

This patch fixes the above issue by clearing the specific
OST indices from the source layout before using the layout to
create the volatile file in lfs_migrate().
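
The effect of clearing the OST indices can be illustrated with a small
Python model. This is only a sketch under stated assumptions: the real
change is in the C code of lfs_migrate() and works on Lustre layout
structures, whereas the dictionary keys and the function name
clear_ost_indices() used here are hypothetical.

```python
import copy


def clear_ost_indices(layout):
    """Return a copy of a layout with the explicit OST choices removed.

    `layout` is a hypothetical model: a list of components, each a dict
    holding an extent, a stripe count, and the OST indices in use.
    """
    cleared = copy.deepcopy(layout)
    for component in cleared:
        # Keep the shape of the component (extent, stripe count), but let
        # the allocator choose the OSTs for the new volatile file.
        component["ost_indices"] = None
    return cleared


source_layout = [
    {"extent": (0, 1 << 20), "stripe_count": 1, "ost_indices": [3]},
    {"extent": (1 << 20, None), "stripe_count": 4, "ost_indices": [0, 1, 2, 3]},
]
target_layout = clear_ost_indices(source_layout)
```

In this model the component extents and stripe counts survive, so the
PFL shape is kept, while the new file is free to land on other OSTs.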

Signed-off-by: Jian Yu <yujian@whamcloud.com>
Change-Id: I82e1dc0a11fdda7d555df994cf4e5f6e3dbdcb5c
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49819
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months ago LU-930 ptlrpc: clarify AT error message 48/49548/5
Aurelien Degremont [Tue, 18 Jan 2022 13:55:01 +0000 (13:55 +0000)]
LU-930 ptlrpc: clarify AT error message

Clarify the error message reported when the deadline for an
AT early reply has already passed. The message claimed that
the system was CPU bound, which is wrong most of the time:
the usual cause is a communication failure delaying RPC traffic.
This could mislead people into looking at CPU resource
consumption when the network traffic is more likely the
cause.

Also avoid cryptic keywords that make sense only to the
feature developer, and not to admins.

Test-Parameters: trivial
Signed-off-by: Aurelien Degremont <degremoa@amazon.com>
Change-Id: Icdff8f4c6fb9905233f6b8ed1b961b2fd1127667
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49548
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months ago LU-16367 utils: clean up ldiskfs feature handling 16/49316/2
Andreas Dilger [Mon, 5 Dec 2022 18:59:02 +0000 (11:59 -0700)]
LU-16367 utils: clean up ldiskfs feature handling

Update the default ldiskfs features used by mkfs.lustre:
- enable large_dir on OSTs as well as MDTs
- remove obsolete handling of "ext3" filesystems
- clean up handling of other features that have become a bit messy

Test-Parameters: trivial testlist=conf-sanity
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Id717c3ba939ccf9b2de34e868d4415e88429ef39
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49316
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months ago LU-16221 kernel: new kernel [RHEL 9.1 5.14.0-162.12.1.el9_1] 38/48938/9
Jian Yu [Fri, 27 Jan 2023 20:34:11 +0000 (12:34 -0800)]
LU-16221 kernel: new kernel [RHEL 9.1 5.14.0-162.12.1.el9_1]

This patch makes changes to support new RHEL 9.1 release
for Lustre client.

Test-Parameters: trivial clientdistro=el9.1 \
env=SANITY_EXCEPT="130 244a" testlist=sanity

Change-Id: I8af730f84c9ddf9dcb7e3ddfbd24a68173f51e8d
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48938
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
14 months ago LU-16510 build: fortified memcpy from linux 6.1 11/49811/7
Shaun Tancheff [Mon, 30 Jan 2023 04:17:12 +0000 (22:17 -0600)]
LU-16510 build: fortified memcpy from linux 6.1

The fortified memcpy() from Linux v5.11-11104-ga28a6e860c6c
through v5.18-rc5-1405-g43213daed6d6 reports a false-positive
out-of-bounds read:

In function 'memcpy' ...
  '__read_overflow2' declared with attribute error: detected
   read beyond size of object passed as 2nd parameter

Test-Parameters: trivial
HPE-bug-id: LUS-11459
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I3a59d8b647833c05ff4b51e327ed8bce894141fe
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49811
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
14 months ago LU-16292 llite: delete_from_page_cache not exported 69/49069/14
Shaun Tancheff [Thu, 19 Jan 2023 07:38:02 +0000 (01:38 -0600)]
LU-16292 llite: delete_from_page_cache not exported

Linux commit v5.16-rc4-44-g452e9e6992fe
filemap: Add filemap_remove_folio and __filemap_remove_folio

Directly removing a folio/page from the page cache is not
available.

Fall back to generic_error_remove_page for regular files,
and truncate_inode_pages_range as appropriate.

Test-Parameters: trivial
HPE-bug-id: LUS-11198
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I634e7d7719d497ce035a78b424be8e9e8c5a8104
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49069
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: jsimmons <jsimmons@infradead.org>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
14 months ago LU-16188 mdt: fix incompatible HSM request handling 58/48658/6
Aurelien Degremont [Mon, 26 Sep 2022 12:27:37 +0000 (12:27 +0000)]
LU-16188 mdt: fix incompatible HSM request handling

When the coordinator tries to send multiple HSM actions in
a single request and one of the requests fails the incompat checks,
all the requests are marked as STARTED but none of them is
sent to the agent.

Return EAGAIN from mdt_agent_hsm_send() so that the coordinator does
not mark the requests as STARTED and retries them later.
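
The intended control flow can be sketched in Python. This is a
simplified model, not the MDT code: agent_send() stands in for
mdt_agent_hsm_send(), while coordinator_dispatch(), the is_compatible
callback, and the "WAITING" state label are hypothetical.

```python
import errno


def agent_send(requests, is_compatible):
    """Model of sending a batch of HSM actions to an agent.

    Returns 0 on success, or -EAGAIN if any request fails the
    compatibility check, in which case nothing is sent.
    """
    if not all(is_compatible(r) for r in requests):
        return -errno.EAGAIN
    return 0


def coordinator_dispatch(requests, is_compatible):
    """Mark requests STARTED only if the batch was actually sent."""
    rc = agent_send(requests, is_compatible)
    if rc == -errno.EAGAIN:
        # Leave the requests untouched so they are retried later.
        return {r: "WAITING" for r in requests}
    return {r: "STARTED" for r in requests}


states = coordinator_dispatch(
    ["archive:a", "restore:b"],
    lambda r: not r.startswith("restore"),
)
```

Here one incompatible request keeps the whole batch out of the STARTED
state, so no request is recorded as started without having been sent.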

Add a sanity-hsm test.

Test-Parameters: trivial testlist=sanity-hsm
Change-Id: Id4fb858021be6dc6b0cbcf140c3f2051efce57ad
Signed-off-by: Jeya Ganesh Babu Jegatheesan <jeyaga@amazon.com>
Signed-off-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48658
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-by: Nikitas Angelinas <nikitas.angelinas@hpe.com>
Reviewed-by: Etienne AUJAMES <eaujames@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months ago LU-16118 build: Workaround __write_overflow_field errors 64/48364/23
Shaun Tancheff [Sun, 22 Jan 2023 17:43:29 +0000 (11:43 -0600)]
LU-16118 build: Workaround __write_overflow_field errors

Linux commit v5.17-rc3-1-gf68f2ff91512
   fortify: Detect struct member overflows in memcpy() at compile-time

memcpy and memset of collections of struct members
will trigger:

error: call to ‘__write_overflow_field’ declared with attribute
   warning: detected write beyond size of field (1st parameter);
   maybe use struct_group()?
   [-Werror] __write_overflow_field(p_size_field, size);

Test-Parameters: trivial
HPE-bug-id: LUS-11194
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Iacd1ab03d1b90ce62b5d7b65e1cd518a5f7981f2
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48364
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: jsimmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
14 months ago LU-16354 ldiskfs: RHEL9.1 server support 83/49283/9
Shaun Tancheff [Sat, 21 Jan 2023 06:16:25 +0000 (00:16 -0600)]
LU-16354 ldiskfs: RHEL9.1 server support

ldiskfs patch series for RHEL9.1

Test-Parameters: trivial
HPE-bug-id: LUS-11332
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ia0757995ac7200eb50fadf5e106fe1d7b3dc0443
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49283
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
14 months ago LU-16477 ldiskfs: Add ext4-enc-flag patch for RHEL9 35/49635/7
Shaun Tancheff [Fri, 20 Jan 2023 15:27:15 +0000 (09:27 -0600)]
LU-16477 ldiskfs: Add ext4-enc-flag patch for RHEL9

Update ext4-enc-flag for Linux 5.14 and include it in
the 5.14-based RHEL9 and SUSE 15 SP4 ldiskfs series.

Test-Parameters: trivial
HPE-bug-id: LUS-11442
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Iaf4ba914fafe6a9e4ad58b74ae63343bb2918a44
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49635
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: jsimmons <jsimmons@infradead.org>
14 months ago LU-15728 llite: fix relatime support 17/47017/16
Aurelien Degremont [Thu, 7 Apr 2022 12:58:00 +0000 (12:58 +0000)]
LU-15728 llite: fix relatime support

relatime behavior is properly managed by VFS, however
Lustre also stores acmtime on OST objects and atime
updates for OST objects should honor relatime behavior.

This patch updates 'ci_noatime' feature which was introduced to
properly honor noatime option for OST objects, to also support
'relatime'.
file_is_noatime() code already comes from upstream touch_atime().
Add missing parts from touch_atime() to also support relatime.

It also forces atime to disk on MDD if the on-disk atime is older than
the on-disk mtime/ctime, to match relatime (even if relatime is not
enabled).
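
For reference, the relatime rule applied by the upstream kernel can be
written as a short Python function. This is a sketch of the generic
rule as commonly documented (update atime if it is not newer than
mtime or ctime, or if it is at least a day old), not the Lustre code;
the function and constant names are chosen for illustration.

```python
RELATIME_WINDOW = 24 * 60 * 60  # one day, in seconds


def relatime_need_update(atime, mtime, ctime, now):
    """Return True if atime should be updated under relatime.

    All arguments are timestamps in seconds.
    """
    if mtime >= atime:
        return True  # file was modified since the last recorded access
    if ctime >= atime:
        return True  # inode was changed since the last recorded access
    if now - atime >= RELATIME_WINDOW:
        return True  # recorded atime is at least a day old
    return False
```

A read shortly after a previous read therefore leaves atime alone,
while the first read after a write always updates it.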

Add a new test for relatime feature.

Signed-off-by: Aurelien Degremont <degremoa@amazon.com>
Change-Id: I7a26f39841300a60c015944f9e544115b4446ead
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/47017
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months ago LU-6142 ldlm: minor list_entry improvements in ldlm_request.c 38/49738/3
Mr. NeilBrown [Mon, 23 Jan 2023 21:51:18 +0000 (16:51 -0500)]
LU-6142 ldlm: minor list_entry improvements in ldlm_request.c

Small clarity improvements, and one local variable avoided.

Linux-commit: cb830bef04f1bd80da7eca3d3edaea590f4b350b

Change-Id: I1a34849adca228a465a2b771fb0aa707a9283c7c
Signed-off-by: Mr. NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49738
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months ago LU-6142 ldlm: use list_for_each_entry in ldlm_lock.c 34/49734/4
Mr. NeilBrown [Mon, 23 Jan 2023 21:24:59 +0000 (16:24 -0500)]
LU-6142 ldlm: use list_for_each_entry in ldlm_lock.c

This makes some slightly-confusing code a bit clearer, and
avoids the need for 'tmp'.

Linux-commit: 557d001aa51fd6171d7a68dec21f8327fc824173

Change-Id: If9d070492e0016fa235fb38726f7c7a3b380d580
Signed-off-by: Mr. NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49734
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months ago LU-12275 tests: skip new nodemap params on old MGS 28/49828/3
Andreas Dilger [Mon, 30 Jan 2023 21:46:37 +0000 (14:46 -0700)]
LU-12275 tests: skip new nodemap params on old MGS

Skip setting forbid_encryption and readonly_mount parameters on old
MGSes that do not support these options.  Otherwise test_61 failures
are seen during interop testing.  Running test_36 would also fail in
this case, except that it is already skipped due to encryption checks.

Test-Parameters: trivial testlist=sanity-sec
Fixes: 598c48707c ("LU-12275 tests: exercise file content encryption/decryption")
Fixes: e7ce67de92 ("LU-15451 sec: read-only nodemap flag")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I94f2e2f609927fea618a3a22f103bd32ae3ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49828
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months ago LU-16412 llite: check read page past requested 23/49723/9
Qian Yingjin [Fri, 20 Jan 2023 17:30:27 +0000 (12:30 -0500)]
LU-16412 llite: check read page past requested

Due to a kernel bug introduced in 5.12 in commit:
cbd59c48ae2bcadc4a7599c29cf32fd3f9b78251
("mm/filemap: use head pages in generic_file_buffered_read")
if the page immediately after the current read is in cache,
the kernel will try to read it.

This attempts to read a page past the end of the read
requested from userspace, a page which has therefore not
been safely locked by Lustre.

For a page after the end of the current read, check whether
it is under the protection of a DLM lock. If so, we take a
reference on the DLM lock until the page read has finished
and then release the reference.  If the page is not covered
by a DLM lock, then we are racing with the page being
removed from Lustre.  In that case, we return
AOP_TRUNCATED_PAGE, which makes the kernel release its
reference on the page and retry the page read.  This allows
the page to be removed from cache, so the kernel will not
find it and incorrectly attempt to read it again.
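
The decision described above can be modelled in a few lines of Python.
This is a sketch only: page indices and lock extents are plain
integers, the function name and the "READ" and "READ_WITH_LOCK_REF"
labels are hypothetical, and AOP_TRUNCATED_PAGE is represented by a
string rather than the kernel constant.

```python
def check_page_past_read(page_index, read_last_index, dlm_extents):
    """Decide how to handle a page the kernel asks Lustre to read.

    `dlm_extents` is a list of (first, last) page-index ranges that are
    currently covered by DLM locks.
    """
    if page_index <= read_last_index:
        # Inside the requested read: already protected by the read path.
        return "READ"
    for first, last in dlm_extents:
        if first <= page_index <= last:
            # Covered: hold a lock reference until the page read finishes.
            return "READ_WITH_LOCK_REF"
    # Not covered: racing with removal, ask the kernel to drop and retry.
    return "AOP_TRUNCATED_PAGE"
```

For example, with a single lock covering pages 0 through 15, a request
for page 16 after a read ending at page 15 yields the retry result.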

NB: Earlier versions of this description refer to stripe
boundaries, but the locking issue can occur whether or
not the page is on a stripe boundary, because dlmlocks
can cover part of a stripe.  (This is rare, but is
allowed.)

Change-Id: Ib93bd0624fda0ed1c2b89f609d15208c86e21c29
Signed-off-by: Qian Yingjin <qian@ddn.com>
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49723
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months ago LU-16457 tests: wait for remote sleep in sanity-pcc/101a 87/49587/9
Andreas Dilger [Tue, 10 Jan 2023 15:37:03 +0000 (08:37 -0700)]
LU-16457 tests: wait for remote sleep in sanity-pcc/101a

Wait longer for the remote sleep command to start on the agent node.

Test-Parameters: trivial testlist=sanity-pcc env=ONLY=101a,ONLY_REPEAT=200
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I5dcbd6a7127b3e17aa658c87f5c75874432dc353
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49587
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Feng Lei <flei@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months ago LU-16159 osp: destroy should not overtake writes 87/49787/5
Alex Zhuravlev [Thu, 26 Jan 2023 07:34:25 +0000 (10:34 +0300)]
LU-16159 osp: destroy should not overtake writes

Use transaction versioning for object destroy so that
destroy does not overtake writes and writes do not hit
non-existent objects.

Test-Parameters: mdscount=2 mdtcount=4 testlist=replay-single env=ONLY="70b 71a 119",ONLY_REPEAT=10
Fixes: b054fcd785 ("LU-16159 lod: cancel update llogs upon recovery abort")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Iec2a5c72f27825820d36ebbe20d55fa303358982
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49787
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
14 months ago LU-16431 mds: Close request is dropped during replay 06/49506/4
Andriy Skulysh [Mon, 21 Mar 2022 12:00:59 +0000 (14:00 +0200)]
LU-16431 mds: Close request is dropped during replay

MDS_CLOSE can have the same transno as a SETATTR update,
but it still needs to be processed to close the file.

Change-Id: I44c8e10c5e30f2dca4fab4d49a74d147495640c2
HPE-bug-id: LUS-10838
Signed-off-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49506
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months ago LU-16392 utils: use --list-commands for bash completion 84/49484/9
Thomas Bertschinger [Wed, 21 Dec 2022 16:52:50 +0000 (11:52 -0500)]
LU-16392 utils: use --list-commands for bash completion

The CLI utils lctl and lfs currently use a pseudo option
--non-existent-option to generate a list of completions. However, this
was broken when the help output for an invalid command was changed.
Using --list-commands instead means that the format of the help output
can be kept succinct.

However, currently there are 2 issues that make --list-commands
unsuitable.

First, --list-commands truncates long commands. This commit resolves
this by not truncating long commands, and removing the fixed-length
char buffer and writing directly to stdout so that the line length
can overflow slightly if needed.

Second, --list-commands recursively displays sub-commands. For
example, for `lctl`, it will display `pcc add`, `pcc del`, etc in
addition to just `pcc`. The bash completion tools would view these
as separate tokens and thus would inappropriately suggest `add`,
`del`, etc. as completions for `lctl`. This commit removes the
recursive behavior.
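
The tokenization problem can be reproduced with a small Python model.
This is a sketch, not the lctl implementation: the COMMANDS table is a
hypothetical excerpt, and splitting on whitespace stands in for how the
completion tools break the listing into words.

```python
# Hypothetical excerpt of a command table: name -> list of sub-commands.
COMMANDS = {"pcc": ["add", "del"], "attach": [], "detach": []}


def list_recursive(table):
    """Old behaviour: list sub-commands together with their parent."""
    lines = []
    for name, subcommands in table.items():
        lines.append(name)
        lines.extend(f"{name} {sub}" for sub in subcommands)
    return lines


def list_top_level(table):
    """New behaviour: list only the top-level commands, untruncated."""
    return list(table)


def completion_tokens(lines):
    """Split a listing into words, as a completion tool would."""
    return [word for line in lines for word in line.split()]


old_tokens = completion_tokens(list_recursive(COMMANDS))
new_tokens = completion_tokens(list_top_level(COMMANDS))
```

In the recursive listing, `add` and `del` show up as candidate words at
the top level; in the flat listing only `pcc`, `attach`, and `detach`
remain.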

Removing the recursive behavior resolves an unrelated bug with the
recursion that can be observed for `lctl`, where a number of
top-level commands are skipped following recursion into a previous
sub-command, equal to the number of subcommands processed in the
recursive call. Specifically, the commands in the section "device
setup", e.g. `attach`, `detach`, were not displayed following the
recursive call into `pcc`.

Finally, this commit changes the command parser to recognize --help
and print the list of commands when this argument is seen.

Fixes: bc69a8d058 ("LU-8621 utils: cmd help to stdout or short cmd error")
Signed-off-by: Thomas Bertschinger <bertschinger@lanl.gov>
Change-Id: Ib6e139402b9cd18e5a54b8fd3d6a2652d301e736
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49484
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
14 months ago LU-16382 spec: Declare correct license 63/49363/4
Mr NeilBrown [Mon, 12 Dec 2022 04:20:08 +0000 (15:20 +1100)]
LU-16382 spec: Declare correct license

Lustre is primarily licensed under GPL-v2.  Some files claim v2+,
others claim v2-only, but all are consistent with v2.

liblustreapi is LGPL2.1+

So make that explicit in lustre.spec.  All 'kmp' packages are
GPL-v2-only, all the rest add "AND LGPL-2.1-or-later".

The Open Build Service complains that "GPL" is too vague.

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I4f10c50a39b5b48fed71b179bc888b0ae144444e
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49363
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: jsimmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months ago LU-16310 sec: Lustre/HSM on enc file with enc key 53/49153/7
Sebastien Buisson [Mon, 14 Nov 2022 16:28:36 +0000 (17:28 +0100)]
LU-16310 sec: Lustre/HSM on enc file with enc key

Support for Lustre/HSM on encrypted files when the encryption key is
available requires similar attention as with file migration.
The volatile file used for HSM restore must have the same encryption
context as the Lustre file being restored, so that file content
remains accessible after the layout swap at the end of the restore
procedure.

Please note that using Lustre/HSM with the encryption key creates
clear text copies of encrypted files on the HSM backend storage.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I99cba202cd2c7c747bbe5c4ec7d9208c7f6baf4b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49153
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: jsimmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Etienne AUJAMES <eaujames@ddn.com>
14 months ago LU-16205 sec: fid2path for encrypted files 30/48930/8
Sebastien Buisson [Thu, 3 Nov 2022 10:52:02 +0000 (11:52 +0100)]
LU-16205 sec: fid2path for encrypted files

Add support of fid2path for encrypted files. Server side returns raw
encrypted path name to client, which needs to process the returned
string. This is done from top to bottom, by iteratively decrypting
parent name and then doing a lookup on it, so that child can in turn
be decrypted.

For encrypted files that do not have their names encrypted, lookups
can be skipped. Indeed, name decryption is a no-op in this case, which
means it is not necessary to fetch the encryption key associated with
the parent inode.

Without the encryption key, lookups are skipped for the same reason.
But names have to be encoded and/or digested. So server needs to
insert FIDs of individual path components in the returned string.
These FIDs are interpreted by the client to build encoded/digested
names.

Add sanity-sec test_63 to exercise this new capability.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I165bf2e5657037ae2e25c9378e4713537ea94bec
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48930
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: jsimmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months ago LU-13703 utils: fix lfs_migrate with PFL arguments 45/39145/21
Andreas Dilger [Wed, 17 Jun 2020 10:14:39 +0000 (04:14 -0600)]
LU-13703 utils: fix lfs_migrate with PFL arguments

Pass the '-c', '-S', and '--pool' options to "lfs migrate" when
they are part of a PFL component (after -E), rather than using
them to set the stripe_count and stripe_size of the whole file.

This precludes using '-A' and '-R' with explicitly specified PFL
file layouts, but that didn't make sense in the first place.

Fix the handling of "--pool <pool>" to use "-p <pool>" since
the script later only strips "-p " from the pool name.

Test-Parameters: trivial
Fixes: 60c5bc25025 ("LU-8235 scripts: pass unrecognized options to lfs migrate")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ib7fb6e08d81dbae77e8348fc5f09837c612540e5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/39145
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months ago LU-13475 utils: disable lfs_migrate rsync and warning 14/40614/6
Andreas Dilger [Wed, 11 Nov 2020 21:53:22 +0000 (14:53 -0700)]
LU-13475 utils: disable lfs_migrate rsync and warning

Falling back to rsync when 'lfs migrate' fails is no longer enabled
by default; the --rsync option is now mandatory for rsync usage.
The warning message and "-y" option of lfs_migrate are not needed
when rsync is not used, so the warning is only shown if --rsync is used.

Remove the LFS_MIGRATE_RSYNC_MODE variable that was used for tests
and instead pass the "--rsync" option directly when needed.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I70b70d969f2dc8b4836c6c7692e6a73a0e2540e5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/40614
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months ago LU-6142 ldlm: use list_for_each_entry in ldlm_resource.c 39/49739/2
Mr. NeilBrown [Mon, 23 Jan 2023 21:55:55 +0000 (16:55 -0500)]
LU-6142 ldlm: use list_for_each_entry in ldlm_resource.c

Having a stand-alone "list_entry()" call is often a sign
that something like "list_for_each_entry()" would
make the code clearer.

Linux-commit: 5eb50608ed0fa076d2783898055fb20934a3828c

Change-Id: I5abd6cc7ec0abd31acc55f5af58f440c4f7609a7
Signed-off-by: Mr. NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49739
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months ago LU-6142 ldlm: use list_first_entry in ldlm_lockd.c 37/49737/2
Mr. NeilBrown [Mon, 23 Jan 2023 21:45:05 +0000 (16:45 -0500)]
LU-6142 ldlm: use list_first_entry in ldlm_lockd.c

This is only a small simplification, but it makes the code
a little clearer.

Linux-commit: 7378caf4fe5198ce572654c926437fba12fb2255

Change-Id: Ie65049e12a1b1bbe448baefc38a6657d831e0670
Signed-off-by: Mr. NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49737
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months ago LU-6142 lustre: obdclass: simplify cl_lock_fini() 36/49736/2
Mr. NeilBrown [Mon, 23 Jan 2023 21:33:34 +0000 (16:33 -0500)]
LU-6142 lustre: obdclass: simplify cl_lock_fini()

Using list_first_entry_or_null() makes this (slightly)
simpler.

Linux-commit: 988b9ea9129bc24baf36ee421feb823285f234c4

Change-Id: Ic2fe2bb58b67781c8bc7b4e81cbf6b61dcaa56fb
Signed-off-by: Mr. NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49736
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months ago LU-6142 lov: simplfy lov_finish_set() 35/49735/2
Mr. NeilBrown [Mon, 23 Jan 2023 21:29:57 +0000 (16:29 -0500)]
LU-6142 lov: simplfy lov_finish_set()

When deleting everything from a list, a while loop
is cleaner than list_for_each_safe().

Linux-commit: dff162689a4061ff30d3a05f9d790e375c06ab8f

Change-Id: I90d98ebf14f461796d6f9d31a2c62de1520034cc
Signed-off-by: Mr. NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49735
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months ago LU-16349 o2iblnd: Fix key mismatch issue 14/49714/3
Dean Luick [Thu, 19 Jan 2023 20:38:04 +0000 (21:38 +0100)]
LU-16349 o2iblnd: Fix key mismatch issue

If a pool memory region (mr) is mapped then unmapped without being
used, its key becomes out of sync with the RDMA subsystem.

At pool mr map time, the present code will create a local
invalidate work request (wr) using the mr's present key and then
change the mr's key.  When the mr is first used after being mapped,
the local invalidate wr will invalidate the original mr key, and
then a fast register wr is used with the modified key.  The fast
register will update the RDMA subsystem's key for the mr.

The error occurs when the mr is never used.  The next time the mr
is mapped, a local invalidate wr will again be created, but this
time it will use the mr's modified key.  The RDMA subsystem never
saw the original local invalidate, so now the RDMA subsystem's
key for the mr and o2iblnd's key for the mr are out of sync.

Fix the issue by tracking if the invalidate has been used.
Repurpose the boolean frd->frd_valid.  Presently, frd_valid is
always false.  Remove the code that used frd_valid to conditionally
split the invalidate from the fast register.  Instead, use frd_valid
to indicate when a new invalidate needs to be generated.  After a
post, evaluate if the invalidate was successfully used in the post.

These changes are only meaningful to the FRWR code path.  The failure
has only been observed when using Omni-Path Architecture.
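
The map/unmap sequence described above can be traced with a toy model.
This is only a sketch in Python under stated assumptions: the class
ToyMr, its fields, and the reuse_unused_invalidate switch are
hypothetical names for illustration, and the integer keys stand in for
real RDMA keys. It is not the o2iblnd code.

```python
class ToyMr:
    """Toy model of one pool memory region; not the o2iblnd data structure."""

    def __init__(self):
        self.driver_key = 0       # key tracked by the driver
        self.rdma_key = 0         # key known to the RDMA subsystem
        self.inv_key = None       # key carried by the pending local invalidate
        self.inv_pending = False  # hypothetical flag: an unused invalidate exists

    def map(self, reuse_unused_invalidate):
        # Build a new invalidate unless the fix is on and one is still unused.
        if not (reuse_unused_invalidate and self.inv_pending):
            self.inv_key = self.driver_key
            self.inv_pending = True
        self.driver_key += 1      # the driver changes the key at map time

    def post(self):
        """Post invalidate + fast register; return True if the keys matched."""
        matched = self.inv_key == self.rdma_key
        self.rdma_key = self.driver_key  # fast register installs the new key
        self.inv_pending = False         # the invalidate has now been used
        return matched


def map_unused_then_use(reuse_unused_invalidate):
    mr = ToyMr()
    mr.map(reuse_unused_invalidate)  # mapped, then unmapped without a post
    mr.map(reuse_unused_invalidate)  # mapped again
    return mr.post()                 # first real use


without_fix = map_unused_then_use(False)  # False: invalidate carries a stale key
with_fix = map_unused_then_use(True)      # True: the original invalidate is reused
```

In the model, the second map without the fix rebuilds the invalidate
from the already modified key, which the RDMA side has never seen.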

Signed-off-by: Cyril Bordage <cbordage@whamcloud.com>
Change-Id: I532a11f10ae6a5917a4c054f37747d08eb4d6331
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49714
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-13482 utils: bandwidth limit for lfs migrate 20/49620/4
Timothy Day [Tue, 10 Jan 2023 04:55:47 +0000 (04:55 +0000)]
LU-13482 utils: bandwidth limit for lfs migrate

Add an option -W to control how much bandwidth
an lfs migrate job can consume. The migrate job
will periodically sleep to meet the bandwidth
restrictions.

This patch also adds a --stats option. The option
produces regular log entries tracking the progress
of the migrate job. The logs are output in YAML
format. The frequency of the logs is controlled
by --stats-interval. This interval defaults to 5
seconds.

Also included are two tests, 56xh and 56xi. The
first verifies the functionality of the bandwidth
control. The second checks that the output is in
valid YAML and that the stats get printed without
using -W.
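
The periodic-sleep throttling described above can be sketched as
follows. This is a minimal illustration in Python, not the code in lfs:
the function throttle_delay() and its parameters are hypothetical, and
the unit of the limit (bytes per second) is an assumption.

```python
def throttle_delay(bytes_copied, elapsed_sec, limit_bytes_per_sec):
    """Return how long to sleep so the average rate stays within the limit.

    Hypothetical helper: the names and the exact policy are assumptions,
    not taken from lfs_migrate().
    """
    if limit_bytes_per_sec <= 0:
        return 0.0  # no limit requested
    # Time the copy should have taken at the limited rate.
    expected_sec = bytes_copied / limit_bytes_per_sec
    return max(0.0, expected_sec - elapsed_sec)


# 100 MiB copied in 2 s with a 10 MiB/s cap: sleep for the remaining 8 s.
delay = throttle_delay(100 * 2**20, 2.0, 10 * 2**20)
```

A copy loop would call such a helper after each chunk and sleep for the
returned time, so a job that is already slower than the limit never
sleeps.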

Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Ic71cceb2434a737e3ad8bd325f719e37a70b0047
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49620
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-6142 ldlm: Fix style issues for ldlm_extent.c 36/49536/6
Arshad Hussain [Fri, 23 Dec 2022 12:26:27 +0000 (17:56 +0530)]
LU-6142 ldlm: Fix style issues for ldlm_extent.c

This patch fixes issues reported by checkpatch
for file lustre/ldlm/ldlm_extent.c

Test-Parameters: trivial
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I9cecd1f377f33f3d4129cddcd7b59c3a7c003e04
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49536
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
14 months agoLU-14980 lfsck: lock object in __lfsck_layout_update_pfid() 23/44823/28
Alex Zhuravlev [Thu, 2 Sep 2021 15:50:19 +0000 (18:50 +0300)]
LU-14980 lfsck: lock object in __lfsck_layout_update_pfid()

once the transaction has been started

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ie43fe89009a123c88eb0e202ec961b52157e56c6
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/44823
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
14 months agoLU-14692 tests: restore sanity/312 to always_except 20/49720/2
Andreas Dilger [Fri, 20 Jan 2023 06:22:18 +0000 (23:22 -0700)]
LU-14692 tests: restore sanity/312 to always_except

The sanity test_312 was incorrectly removed from ALWAYS_EXCEPT.

Fixes: eaae465556 ("LU-14692 tests: allow FID_SEQ_NORMAL for MDT0000")
Test-Parameters: trivial testlist=sanity fstype=zfs
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I6e8ed42561809b28fd6d5b4f7ee1104080ebe756
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49720
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
14 months agoLU-16492 tests: sanity/398b variable used without assignment 87/49687/2
Arshad Hussain [Thu, 19 Jan 2023 02:41:43 +0000 (08:11 +0530)]
LU-16492 tests: sanity/398b variable used without assignment

This patch initializes the 'before' variable with a UNIX
timestamp. The variable was previously used without being
assigned any value.

Test-Parameters: trivial testlist=sanity env=ONLY=398b
Fixes: b4880f37582a ("LU-15483 tests: Improve test 398b")
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Iba9e361735272d9c640a115f520ee7c60ac41239
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49687
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-16425 tests: skip interop recovery-small/144a/144b 79/49679/6
Andreas Dilger [Wed, 18 Jan 2023 18:26:09 +0000 (11:26 -0700)]
LU-16425 tests: skip interop recovery-small/144a/144b

Skip recovery-small test_144a and test_144b on older MDS versions
that lack the fix each test exercises.

Fixes: 240938f7b1 ("LU-8367 tests: cleanup_orphans hang reproducer")
Fixes: aa6250b741 ("LU-15724 tests: MDT failover hang reproducer")
Test-Parameters: trivial testlist=recovery-small env=ONLY=144 serverversion=2.14.0
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I77bfdf55d0218aa9e252f742cc90f1c61216d506
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49679
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sarah Liu <sarah@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-15581 misc: update .gitignore files 32/49632/3
Timothy Day [Fri, 13 Jan 2023 19:33:15 +0000 (19:33 +0000)]
LU-15581 misc: update .gitignore files

Ignore the binary for check_iam utility
in lustre/utils.

Also, ignore more files for commit messages.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I5b11dc2d2f3761f778549a121ac940edeeb70980
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49632
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: jsimmons <jsimmons@infradead.org>
14 months agoLU-14707 tests: Prefer #!/bin/bash 79/49479/4
Timothy Day [Thu, 15 Dec 2022 06:19:01 +0000 (06:19 +0000)]
LU-14707 tests: Prefer #!/bin/bash

Change remaining #!/bin/sh to use bash.
Add a warning to the git-hook about using
sh in shebangs. Using bash allows scripts to
freely use bash-isms and lowers the risks
of bugs on Debian based platforms.

Also, change remaining callers to use bash
rather than sh.

Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I10f3e8f71435c38cfc1650dd13168d7ed5d3b31f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49479
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: jsimmons <jsimmons@infradead.org>
14 months agoLU-16412 llite: check truncated page in ->readpage() 33/49433/6
Qian Yingjin [Mon, 19 Dec 2022 06:57:39 +0000 (01:57 -0500)]
LU-16412 llite: check truncated page in ->readpage()

The page end offset calculation in filemap_get_read_batch() was
off by one. This bug was introduced in commit v5.11-10234-gcbd59c48ae
("mm/filemap: use head pages in generic_file_buffered_read")

When a read is submitted with end offset 1048575, it calculates
an end page index of 256 where it should be 255. As a result, the
readpage() call for the page with index 256 crosses the stripe
boundary and may not be covered by a DLM extent lock.

This happens in a corner-case race: filemap_get_read_batch()
batches the page with index 256 for read, but the page is later
removed from the page cache because the lock protecting it was
revoked, while the batch still holds a reference on it.  The page
in the read path is then not covered by any DLM lock.

The solution is simple: in ->readpage(), check the page's
address_space pointer to see whether the page was truncated and
removed from the page cache. If it was, return AOP_TRUNCATED_PAGE
to the caller.  The kernel then retries batching pages, and the
truncated page is not added again because it has already been
removed from the file's page cache.

Add sanityn/test_95 to verify it.
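
The index arithmetic in the example above can be checked with a short
sketch. It assumes 4 KiB pages, and the second function is only one
illustrative way to arrive at index 256 (treating the inclusive end
offset as exclusive); it is not the kernel's actual expression.

```python
PAGE_SHIFT = 12  # assumes 4 KiB pages


def last_page_index(end_offset):
    """Index of the page containing the last byte at end_offset (inclusive)."""
    return end_offset >> PAGE_SHIFT


def last_page_index_off_by_one(end_offset):
    """Illustrative only: treats the inclusive end offset as exclusive."""
    return (end_offset + 1) >> PAGE_SHIFT


end = 1048575                            # last byte of the first 1 MiB
correct = last_page_index(end)           # 255
wrong = last_page_index_off_by_one(end)  # 256
```

Page 256 starts at byte 1048576, which is the first byte past the 1 MiB
range that the read asked for.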

Test-Parameters: testlist=sanityn env=ONLY=95 clientdistro=ubuntu2204
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I192df92b1d1b79057055430cc81cb7cc760cc9ed
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49433
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-15880 quota: fix insane grant quota 81/48981/11
Hongchao Zhang [Mon, 16 Jan 2023 02:21:09 +0000 (21:21 -0500)]
LU-15880 quota: fix insane grant quota

Fix the insane grant value in the quota master/slave index.
The logs often contain messages similar to the following:

LustreError: 39815:0:(qmt_handler.c:527:qmt_dqacq0())
$$$ Release too much! uuid:work-MDT0000-lwp-MDT0002_UUID
release:18446744070274413724 granted:18446744070291193856,
total:4118877744 qmt:work-QMT0000 pool:0-dt id:40212 enforced:1
hard:128849018880 soft:12884901888 granted:4118877744 time:0
qunit: 16777216 edquot:0 may_rel:0 revoke:0 default:no

It could be caused by chgrp, which reserves quota on the MDT before
changing the GID of a file and releases the reserved quota after
the file GID has been changed on the corresponding OST (this issue
is tracked in LU-5152 and LU-11303).

In some cases, quota could be released even though it was never
reserved correctly, which makes the granted quota negative. Because
the grant is of type "__u64", the negative value is treated as an
insanely large one. Normal grant release then fails, and the grant
field of the affected quota ID in the quota file (both at QMT and
QSD) keeps the insane value and cannot be reset correctly.

This patch resets the affected quota by clearing the quota limits and
grant. Each QSD reports its grant when the quota ID is enforced again,
and the grant is then rebuilt at the QMT.
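
The "insane" numbers in the log above are what small negative values
look like when stored in an unsigned 64-bit field. The sketch below
only illustrates that arithmetic; the helper names as_u64() and
as_s64() are hypothetical and do not appear in the quota code.

```python
U64_MASK = (1 << 64) - 1


def as_u64(value):
    """Wrap an integer the way a __u64 field would store it."""
    return value & U64_MASK


def as_s64(value):
    """Reinterpret a 64-bit pattern as a signed value."""
    value &= U64_MASK
    return value - (1 << 64) if value >= (1 << 63) else value


logged_release = 18446744070274413724  # "release:" value from the log
signed_release = as_s64(logged_release)  # -3435137892
wrapped = as_u64(5 - 8)                  # 2**64 - 3
```

Read as signed, the logged release value is about -3.4 billion rather
than roughly 1.8e19.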

Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: I083afa3b6648db5a1ccca0235667da022ff27e65
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48981
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-16374 enc: align Base64 encoding with RFC 4648 base64url 81/49581/3
Sebastien Buisson [Sun, 18 Jul 2021 00:01:25 +0000 (19:01 -0500)]
LU-16374 enc: align Base64 encoding with RFC 4648 base64url

Lustre encryption uses a Base64 encoding to encode no-key filenames
(the filenames that are presented to userspace when a directory is
listed without its encryption key).
Make this Base64 encoding compliant with RFC 4648 base64url, and use
a leading '+' character to distinguish digested names.

This is adapted from kernel commit
ba47b515f594 fscrypt: align Base64 encoding with RFC 4648 base64url

To maintain compatibility with older clients, a new llite parameter
named 'filename_enc_use_old_base64' is introduced, set to 0 by
default. When set to 0, Lustre uses the new base64url encoding. When
set to 1, Lustre uses the old-style base64 encoding.

To set this parameter globally for all clients, do on the MGS:
mgs# lctl set_param -P llite.*.filename_enc_use_old_base64={0,1}
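
For reference, the difference between standard Base64 and the RFC 4648
base64url alphabet can be seen with Python's standard library. This is
a sketch of the encoding only, not of the Lustre or kernel
implementation; the helper encode_base64url_nopad() is hypothetical,
and dropping the '=' padding is an assumption made for the example.

```python
import base64


def encode_base64url_nopad(raw: bytes) -> str:
    """RFC 4648 section 5 (base64url) encoding with '=' padding removed."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


raw = bytes([0xFB, 0xFF, 0xFE])
standard = base64.b64encode(raw).decode("ascii")  # "+//+"
urlsafe = encode_base64url_nopad(raw)             # "-__-"
```

The base64url alphabet contains neither '/' nor '+', so an encoded name
cannot contain a path separator, and a leading '+' cannot be mistaken
for part of the encoded data.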

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Iaa2256da7fb591d842b5bb7aa474b2ee6de9899d
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49581
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jsimmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: jsimmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-16476 ldiskfs: Fix old ea inode handling 34/49634/5
Shaun Tancheff [Sat, 14 Jan 2023 07:13:01 +0000 (01:13 -0600)]
LU-16476 ldiskfs: Fix old ea inode handling

ext4-old_ea_inodes_handling_fix.patch is applicable
to all Linux versions 4.18 and higher.

Apply it to all of the current 5* series.

Test-Parameters: trivial
HPE-bug-id: LUS-11441
Fixes: 8da23f070c ("LU-15544 ldiskfs: SUSE 15 SP4 kernel 5.14.21 SUSE")
Fixes: 1819f6006f ("LU-15801 ldiskfs: Server support for RHEL9")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I1d22ed9d505e1bb407d9388cac9c881b366b96a8
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49634
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: jsimmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
14 months agoLU-16096 recovery: upgrade reply data after recovery finish 61/48261/15
Qian Yingjin [Fri, 19 Aug 2022 02:32:36 +0000 (22:32 -0400)]
LU-16096 recovery: upgrade reply data after recovery finish

The batched RPC protocol changes the on-disk format of the
client reply data "REPLY_DATA" used for recovery, so compatibility
must be handled carefully during upgrade.

The solution is as follows:
When client recovery has finished, the target truncates the
reply data file to zero size and rewrites the header to use the
new magic and reply data record size.
New reply data records are then written in the new format.

Enable test cases conf-sanity/32 and 108 now that the compatibility
issue is fixed.

This patch also fixes the usage of struct lsd_reply_data in
lustre/utils/lr_reader.c to support both struct versions.

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I26921d41915b8cad2d913e15f502f4543180c5c6
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48261
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-11170 tests: add debugging to sanity/415 24/49724/4
Andreas Dilger [Fri, 20 Jan 2023 20:34:42 +0000 (13:34 -0700)]
LU-11170 tests: add debugging to sanity/415

Add a loop of renames without the concurrent 'touch' operation to
measure the test time, and then a second loop that has the 'touch'
so that we can see whether slow renames are because of COS (which
would make the test failure a kernel bug to be fixed), or because
the test is running in a VM and the server/disk is slow (which is
something to be fixed in the test, e.g. by making "slow" relative
to the non-touch baseline time).

Test-Parameters: trivial testlist=sanity env=ONLY=415,ONLY_REPEAT=120 mdscount=2 mdtcount=4
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ic1a952be0b861005b46da3e673216e455f3ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49724
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-16464 osp: fix off-by-one errors in oxe_can_hold() 17/49617/4
Nikitas Angelinas [Fri, 6 Jan 2023 19:01:52 +0000 (21:01 +0200)]
LU-16464 osp: fix off-by-one errors in oxe_can_hold()

There are a couple of off-by-one errors when calculating the required
buffer size in oxe_can_hold(), which can cause the xattr entry to be
reallocated unnecessarily.

HPE-bug-id: LUS-11423
Fixes: a1c5adf7f466 ("LU-14607 osp: separate buffer for large XATTR")
Change-Id: I486963066d7f8783ad64f1ea110fb73db0a8274b
Signed-off-by: Nikitas Angelinas <nikitas.angelinas@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49617
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-16456 tests: skip conf-sanity test_129/132 in interop 01/49601/2
Andreas Dilger [Wed, 11 Jan 2023 19:02:04 +0000 (12:02 -0700)]
LU-16456 tests: skip conf-sanity test_129/132 in interop

test_129 was added in commit v2_14_56-40-gcefabee52
test_132 was added in commit v2_14_56-96-ge26d7cc39
They should be skipped for older MDS versions.

Test-Parameters: trivial testlist=conf-sanity env=ONLY=122-133 serverversion=2.14.0
Fixes: cefabee52 ("LU-15112 mgc: do not ignore target registration failure")
Fixes: e26d7cc399 ("LU-14399 hsm: process hsm_actions in coordinator")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: If1e276c816ecf2f30dc970f9b5afe85d722540e5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49601
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Sarah Liu <sarah@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-16461 kfilnd: Modify peer credits and RX buffers 92/49592/2
Chris Horn [Tue, 3 Jan 2023 17:03:38 +0000 (10:03 -0700)]
LU-16461 kfilnd: Modify peer credits and RX buffers

It's desirable to lower peer credits because smaller values allow us
to cancel outstanding traffic to down peers faster (because there is
less traffic in flight). Testing shows that peer_credits 16 does not
perform any worse than our current default. Let's make 16 the
new default.

In addition, testing shows a benefit for further increasing the
default number of immediate receive buffers. Increase this to 8.

HPE-bug-id: LUS-11421
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I877fe408b276071f33a99c8b3b50d13f597aaa29
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49592
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
14 months agoLU-16451 kfilnd: Improve CQ error logging 89/49589/2
Chris Horn [Tue, 1 Nov 2022 19:39:39 +0000 (13:39 -0600)]
LU-16451 kfilnd: Improve CQ error logging

Improve CQ error logging for send events by printing the errno from
the CQ event as well as the provider error. This should allow us to
better root cause TN failures.

Also remove an extra newline character.

HPE-bug-id: LUS-11314
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I79bbe0312a9124dd34285d43b6e83f9d897923c1
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49589
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-13530 build: Add kernel version to depmod 73/49573/4
Timothy Day [Fri, 6 Jan 2023 20:56:13 +0000 (20:56 +0000)]
LU-13530 build: Add kernel version to depmod

The depmod commands in the postrm and
postinst scripts should use the kernel
version the package is built against.
Otherwise, depmod will use the current
kernel version - which might be different.

This patch also adds a line indicating that
the file has been modified.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I355420a85ea0ed301433816588758197795b5ede
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49573
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-by: Thomas Stibor <thomas@stibor.net>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-16445 sec: make nodemap root squash independent of map_mode 61/49561/3
Sebastien Buisson [Thu, 5 Jan 2023 14:06:39 +0000 (15:06 +0100)]
LU-16445 sec: make nodemap root squash independent of map_mode

When the admin property is set to 0 on a nodemap, the root user must
be squashed, even if the map_mode property specifies to not map uids
or gids.

Enhance sanity-sec test_17 to exercise this use case.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I1b41caa1ccc6e544ce9fac45b47d0c4c129221f7
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49561
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Etienne AUJAMES <eaujames@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-16285 ldlm: send the cancel RPC asap 27/49527/8
Yang Sheng [Sat, 14 Jan 2023 17:56:14 +0000 (01:56 +0800)]
LU-16285 ldlm: send the cancel RPC asap

This patch tries to send the cancel RPC as soon as possible when a
bl_ast is received from the server. One existing problem is that
the lock may already have been added to the regular queue for
another reason before the bl_ast arrived, which prevents the lock
from being canceled in a timely manner. The other problem is
that many locks are collected into one RPC to save
network traffic, but this can take a long time
when dirty pages are being flushed.

 - Process the lock cancel even if the lock has already been added
   to the bl queue when the bl_ast arrives, unless the cancel RPC
   has already been sent.
 - Send the cancel RPC immediately for a bl_ast lock. Don't
   try to add more locks in that case.

Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: Ie5efff3f1ed4e46448371185a0c08968233e7644
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49527
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
14 months agoLU-16415 quota: enforce project quota for root 60/49460/7
Sergey Cheremencev [Sat, 17 Dec 2022 21:42:10 +0000 (01:42 +0400)]
LU-16415 quota: enforce project quota for root

This patch adds an option to enforce project quotas for root.
It is disabled by default. To enable it, set
osd-ldiskfs.*.quota_slave.root_prj_enable to 1
on each target where the option is needed.

The patch also adds sanity-quota_1j to test the new feature.

Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Change-Id: I978dc8442235149794f85110309f63bc560defdc
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49460
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Shuichi Ihara <sihara@ddn.com>
Tested-by: Maloo <maloo@whamcloud.com>
14 months agoLU-16367 misc: remove deprecated code 38/49338/4
Andreas Dilger [Sat, 26 Nov 2022 06:46:32 +0000 (23:46 -0700)]
LU-16367 misc: remove deprecated code

Remove code that is or will become deprecated in this release based
on the LUSTRE_VERSION_CODE checks.

Fixes: 53fa817657 ("LU-12514 llite: move client mounting from obdclass to llite")
Fixes: 3919a282ca ("LU-15106 ofd: quiet deprecated param warning")
Fixes: 115bba9ffb ("LU-11110 ofd: remove obdfilter.*.* symlinks in few releases")
Fixes: 73f15ad0f1 ("LU-9378 utils: split getstripe and find from lfs.1")
Fixes: 88d8f0f86b ("LU-11891 utils: getstripe use --mdt-index consistently")
Fixes: 78be823f33 ("LU-15218 quota: delete unused quota ID")
Fixes: 0c5fbd80f1 ("LU-5969 lustreapi: replace llapi_get_version()")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Id59371084a102d6d8257c45b55d68077f2ce7057
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49338
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: jsimmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-16345 ofd: ofd_commitrw_read() with non-existing object 55/49255/6
Alex Zhuravlev [Mon, 28 Nov 2022 09:17:25 +0000 (12:17 +0300)]
LU-16345 ofd: ofd_commitrw_read() with non-existing object

A client can get evicted during OST_READ's bulk transfer, so its LDLM
lock is cancelled and OST_DESTROY can remove the object.
ofd_commitrw_read() still needs to release the buffers and
ignore the fact that the object doesn't exist.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ibe9413de41c23b1b4f6d52e9b17a06590b3c0726
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49255
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <farr0186@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-16338 readahead: add stats for read-ahead page count 24/49224/4
Qian Yingjin [Wed, 23 Nov 2022 09:42:53 +0000 (04:42 -0500)]
LU-16338 readahead: add stats for read-ahead page count

This patch adds the stats for read-ahead page count:

lctl get_param llite.*.read_ahead_stats
llite.lustre-ffff938b7849d000.read_ahead_stats=
snapshot_time             4011.320890492 secs.nsecs
start_time                0.000000000 secs.nsecs
elapsed_time              4011.320890492 secs.nsecs
hits                      4 samples [pages]
misses                    1 samples [pages]
zero_size_window          4 samples [pages]
failed_to_reach_end       1 samples [pages]
failed_to_fast_read       1 samples [pages]
readahead_pages           1 samples [pages] 255 255 255

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Iada06eb7d78ab28cfcc7167e49d25da252da4009
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49224
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <farr0186@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-16333 osc: page fault in osc_release_bounce_pages() 10/49210/4
Andriy Skulysh [Fri, 3 Jun 2022 12:28:14 +0000 (15:28 +0300)]
LU-16333 osc: page fault in osc_release_bounce_pages()

pga[i] can be uninitialized. This happens after the following
code path in osc_build_rpc():

        OBD_SLAB_ALLOC_PTR_GFP(oa, osc_obdo_kmem, GFP_NOFS);
        if (oa == NULL)
                GOTO(out, rc = -ENOMEM);

Fixes: a9ed5b149646 ("LU-12275 sec: encryption for write path")
HPE-bug-id: LUS-10991
Signed-off-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Change-Id: I7e21cb9ab69f0bce9c1bdb53a4b0ac7a673887cc
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49210
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
14 months agoLU-15495 tests: fixed dbench test 88/49088/23
Alex Deiter [Wed, 9 Nov 2022 17:06:37 +0000 (21:06 +0400)]
LU-15495 tests: fixed dbench test

* Using awk to get the list of shared libraries
* Fixed shellcheck warnings

Test-Parameters: trivial testlist=sanity \
clientdistro=el7.9 clientarch=x86_64 \
env=SLOW=yes,ONLY=71

Test-Parameters: trivial testlist=sanity \
clientdistro=el8.6 clientarch=x86_64 \
env=SLOW=yes,ONLY=71

Test-Parameters: trivial testlist=sanity \
clientdistro=el8.6 clientarch=aarch64 \
env=SLOW=yes,ONLY=71

Test-Parameters: trivial testlist=sanity \
clientdistro=el8.6 clientarch=ppc64le \
env=SLOW=yes,ONLY=71

Test-Parameters: trivial testlist=sanity \
clientdistro=el9.0 clientarch=x86_64 \
env=SLOW=yes,ONLY=71

Test-Parameters: trivial testlist=sanity \
clientdistro=sles12sp5 clientarch=x86_64 \
env=SLOW=yes,ONLY=71

Test-Parameters: trivial testlist=sanity \
clientdistro=sles15sp4 clientarch=x86_64 \
env=SLOW=yes,ONLY=71

Test-Parameters: trivial testlist=sanity \
clientdistro=ubuntu2004 clientarch=x86_64 \
env=SLOW=yes,ONLY=71

Test-Parameters: trivial testlist=sanity \
clientdistro=ubuntu2204 clientarch=x86_64 \
env=SLOW=yes,ONLY=71

Test-Parameters: trivial env=SLOW=yes,ONLY=26 testlist=replay-dual

Test-Parameters: trivial env=SLOW=yes,ONLY=70b testlist=replay-single

Test-Parameters: trivial env=SLOW=yes,ONLY=8 testlist=sanity-pfl

Test-Parameters: trivial env=SLOW=yes,ENABLE_QUOTA=yes,ONLY=8 \
testlist=sanity-quota

Signed-off-by: Alex Deiter <alex.deiter@gmail.com>
Change-Id: Ic28bd67dcfb5ff24e65e33278ac867409a2c1cc6
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49088
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-16267 lnet: fix missing error check in LUTF 51/48951/2
Cyril Bordage [Tue, 25 Oct 2022 16:52:30 +0000 (18:52 +0200)]
LU-16267 lnet: fix missing error check in LUTF

In the find_replace_file function, the file is opened with the default
encoding. If the file has a different encoding, reading it fails.
The solution is to catch UnicodeDecodeError with try/except and skip
badly encoded files.
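
The try/except approach can be sketched as below. This is not the LUTF
code: replace_in_file() is a hypothetical stand-in for
find_replace_file, and the sketch pins the encoding to UTF-8 (an
assumption made so that the example behaves the same everywhere)
instead of relying on the platform default.

```python
import os
import tempfile


def replace_in_file(path, old, new):
    """Replace old with new in a text file; return True if it was rewritten.

    Files that cannot be decoded are skipped instead of raising.
    """
    try:
        with open(path, "r", encoding="utf-8") as f:
            text = f.read()
    except UnicodeDecodeError:
        return False  # badly encoded file: skip it
    if old not in text:
        return False
    with open(path, "w", encoding="utf-8") as f:
        f.write(text.replace(old, new))
    return True


with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.txt")
    bad = os.path.join(tmp, "bad.bin")
    with open(good, "w", encoding="utf-8") as f:
        f.write("old_name = 1\n")
    with open(bad, "wb") as f:
        f.write(b"\xff\xfe\x00\x01")  # not valid UTF-8
    results = (replace_in_file(good, "old_name", "new_name"),
               replace_in_file(bad, "old_name", "new_name"))
```

Here the text file is rewritten and the binary file is skipped without
aborting the whole run.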

Test-Parameters: @lnet
Signed-off-by: Cyril Bordage <cbordage@whamcloud.com>
Change-Id: I9115d39414d31b628d550e8289b3193d13787288
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48951
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-16228 utils: add lljobstat util 88/48888/28
Lei Feng [Mon, 17 Oct 2022 05:36:14 +0000 (13:36 +0800)]
LU-16228 utils: add lljobstat util

The lljobstat utility reads data from job_stats file(s),
parses and aggregates the data, and lists the top jobs.

For example:
$ ./lljobstat -n 1 -c 3
---
timestamp: 1665984678
top_jobs:
- ll_sa_3508505.0: {ops: 64, ga: 64}
- touch.500:       {ops: 6, op: 1, cl: 1, mn: 1, ga: 1, sa: 2}
- bash.0:          {ops: 3, ga: 3}
...
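
The aggregation step can be sketched like this. It is a simplified illustration, not the lljobstat implementation: the input is assumed to be already parsed into one dict per job_stats file, and top_jobs() and its arguments are hypothetical names. Full operation names are used here instead of the abbreviations shown in the output above.

```python
from collections import Counter, defaultdict


def top_jobs(per_file_stats, count):
    """Merge per-file job statistics and return the busiest jobs.

    per_file_stats: list of dicts, one per job_stats file, each mapping
                    job_id -> {operation_name: sample_count}.
    count:          number of top jobs to return.
    """
    totals = defaultdict(Counter)
    for stats in per_file_stats:
        for job_id, ops in stats.items():
            # Counter.update() adds the counts instead of replacing them.
            totals[job_id].update(ops)

    # Sort by total operation count (descending), then by job id for
    # a stable order between jobs with the same total.
    ranked = sorted(totals.items(),
                    key=lambda kv: (-sum(kv[1].values()), kv[0]))
    return [(job, sum(ops.values()), dict(ops)) for job, ops in ranked[:count]]
```

Calling top_jobs() with the parsed contents of every job_stats file gives one ranked list for all of them.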

Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial
Change-Id: I0c4ac619496c184a5aebbaf8674f5090ab722d72
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48888
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
14 months agoLU-16115 build: Linux 5.17 external module support 60/48360/9
Shaun Tancheff [Mon, 3 Oct 2022 07:51:16 +0000 (14:51 +0700)]
LU-16115 build: Linux 5.17 external module support

Linux commit v5.16-rc3-26-g129ab0d2d9f3 added quotes around
"$(CONFIG_CC_VERSION_TEXT)"; however, .config already stores
CONFIG_CC_VERSION_TEXT with quotes, which breaks the GNU make
Makefile for external modules.

We need to supply a non-quoted value and override the
value in .config before it is used.

Test-Parameters: trivial
HPE-bug-id: LUS-11190
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I997b6987ef37a8c5b9d8f0984e81d9402a2ea705
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48360
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: jsimmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-13485 ldiskfs: Parallel configure tests for ldiskfs 51/38351/30
Shaun Tancheff [Wed, 7 Dec 2022 02:42:33 +0000 (20:42 -0600)]
LU-13485 ldiskfs: Parallel configure tests for ldiskfs

Transform the compile tests in ldiskfs to run in parallel

Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I3a097ab5cd18b57e9311980d9aa708ed25f58464
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/38351
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-13485 kernel: Parallel core configure tests 61/38361/37
Shaun Tancheff [Sun, 15 Jan 2023 02:42:31 +0000 (20:42 -0600)]
LU-13485 kernel: Parallel core configure tests

Transform the compile tests in lustre-core to run in parallel

Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I46fa852659eb4db86a12ec4ad3efddd0fdd3a655
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/38361
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
14 months agoLU-13485 libcfs: Parallel configure tests for libcfs 49/38349/36
Shaun Tancheff [Sun, 15 Jan 2023 01:57:02 +0000 (19:57 -0600)]
LU-13485 libcfs: Parallel configure tests for libcfs

Transform the compile tests in libcfs to run in parallel

Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I76ab65558dd456dc08d6ef4a1985455ce1f17913
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/38349
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
14 months agoLU-16160 llite: SIGBUS is possible on a race with page reclaim 47/49647/2
Andrew Perepechko [Sun, 15 Jan 2023 16:55:58 +0000 (11:55 -0500)]
LU-16160 llite: SIGBUS is possible on a race with page reclaim

We can restart fault handling if page truncation happens
in parallel with the fault handler.

Change-Id: I6e60783e3334f87e799dc8b0e6e63d0bb358a236
Signed-off-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Signed-off-by: Patrick Farrell <farr0186@gmail.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49647
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-16480 lov: fiemap improperly handles fm_extent_count=0 45/49645/5
Andrew Perepechko [Mon, 16 Jan 2023 13:13:34 +0000 (08:13 -0500)]
LU-16480 lov: fiemap improperly handles fm_extent_count=0

FIEMAP calls with fm_extent_count=0 are supposed only to
return the number of extents.

lov_object_fiemap() attempts to initialize stripe_last
based on fiemap->fm_extents[0] which is not initialized
in userspace and not even allocated in kernelspace.

Eventually, the call exits with -EINVAL and "FIEMAP does
not init start entry" kernel log message.
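
The expected behaviour for fm_extent_count=0 can be modelled as below. This is a toy model of the FIEMAP contract, not the lov_object_fiemap() code; fiemap_query() and its list-based arguments are hypothetical.

```python
def fiemap_query(file_extents, fm_extent_count):
    """Toy model of a FIEMAP request.

    file_extents:    list of extents the file actually has.
    fm_extent_count: size of the caller's extent array.

    Returns (fm_mapped_extents, returned_extents).
    """
    if fm_extent_count == 0:
        # Count-only request: report how many extents exist and never
        # look at the (nonexistent) extent array.
        return len(file_extents), []

    returned = file_extents[:fm_extent_count]
    return len(returned), returned
```

The count-only branch returns before anything reads the extent array, which is the property the fix needs.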

Fixes: 409719608c ("LU-11848 lov: FIEMAP support for PFL and FLR file")
Signed-off-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Change-Id: I65e706b5dd5c8a6db90a539c2602af839b4da823
HPE-bug-id: LUS-11443
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49645
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
14 months agoLU-9680 utils: new llapi_param_display_value(). 91/49491/7
James Simmons [Thu, 12 Jan 2023 14:02:54 +0000 (09:02 -0500)]
LU-9680 utils: new llapi_param_display_value().

Currently the special YAML handling for params is done in
lustre_cfg.c. Other functionality internal to liblustreapi.so
will use this as well, so move the handling into liblustreapi.so.
For now the new llapi_param_display_value() function is visible
only to the liblustreapi internal code. Later, when we support
/sys access, we can make it available for general use.

The "lctl get_param" and "lctl list_param" generally worked
for non-root users, but not for parameters under
/sys/kernel/debug due to permission changes in the kernel.
We still lacked proper non-root access for lctl get_param and
lctl list_param. Implement full lctl get_param functionality
for non-root users. Also make lctl list_param work for
non-root users. These changes will also work with any
parameters implemented with Netlink.

Change-Id: Ifd9aad16decb0803a336314d4dea38664ff41aa4
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49491
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-14393 recovery: reply reconstruction for batched RPCs 28/48228/13
Qian Yingjin [Tue, 16 Aug 2022 07:57:47 +0000 (03:57 -0400)]
LU-14393 recovery: reply reconstruction for batched RPCs

Batched RPC can boost the metadata performance for Lustre
dramatically. However, it also increases the complexity of the
recovery, such as how to reconstruct the reply in case of the RPC
resend if the reply was lost.

This patch adds a new field @lrd_batch_idx to the data structure
@lsd_reply_data, which describes each slot of the "reply_data"
file:
struct lsd_reply_data {
	__u64 lrd_transno;	/* transaction number */
	__u64 lrd_xid;		/* transmission id */
	__u64 lrd_data;		/* per-operation data */
	__u32 lrd_result;	/* request result */
	__u32 lrd_client_gen;	/* client generation */
	__u32 lrd_batch_idx;	/* index in a batched RPC */
	__u32 lrd_padding[7];	/* unused fields */
};

When the server finds that a batched RPC is a resent request, and
the index of a sub request in the batched RPC is smaller than or
equal to @lrd_batch_idx in the reply data, the sub request has
already been executed, so the server reconstructs the reply for
it; if the index is larger than @lrd_batch_idx, the server
re-executes the sub request in the batched RPC.
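
The resend rule can be written down as a small sketch. It only models the index comparison; plan_resend() is a hypothetical helper, and zero-based sub request indices are an assumption.

```python
def plan_resend(sub_request_count, lrd_batch_idx):
    """Decide what to do with each sub request of a resent batched RPC.

    Sub requests with index <= lrd_batch_idx were already executed, so
    their replies are reconstructed; the others are executed again.
    Indices are assumed to start at 0.
    """
    return ["reconstruct" if idx <= lrd_batch_idx else "execute"
            for idx in range(sub_request_count)]
```

For example, with four sub requests and lrd_batch_idx equal to 1, the first two replies are reconstructed and the last two sub requests are executed.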

Disable conf-sanity/32{a,b,c,d,e,f,g}, 108{a,b} temporarily until
the compatibility issue during upgrade for new reply data format
is fixed.

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Id48ecc263002cb783f5032642d05e1f3f6673837
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48228
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoLU-14393 protocol: basic batching processing framework 78/41378/17
Qian Yingjin [Mon, 1 Feb 2021 03:51:08 +0000 (11:51 +0800)]
LU-14393 protocol: basic batching processing framework

Batch processing can boost performance. The larger the batch
size, the higher the latency for the entire batch. Although the
latency for the entire batch of operations is higher than the
latency of any single operation, the throughput of the batch of
operations is much higher.

This patch implements the basic batching processing framework for
Lustre. It could be used for the future batching statahead and
WBC.

A batched RPC does not require that the opcodes of sub requests in
a batch are the same. Each sub request has its own opcode. It allows
batching not only read-only requests but also multiple
modification updates with different opcodes, and even a mixed
workload which contains both read-only requests and modification
updates.

For recovery, only the batched RPC contains a client XID; there is
no separate client XID for each sub request. Although the server
generates a transno for each update sub request, the transno is
only stored in the batched RPC (in @ptlrpc_body) when the sub
update request is finished. Thus the batched RPC only stores the
transno of the last sub update request. Only the batched RPC
contains the @ptlrpc_body message field; the sub requests in a
batched RPC do not.

A new field named @lrd_batch_idx is added in the client reply data
@lsd_reply_data. It indicates the sub request index in a batched
RPC. When the server finished a sub update request, it will update
@lrd_batch_idx accordingly.
When the server finds that a batched RPC is a resent RPC, and the
index of a sub request in the batched RPC is smaller than or equal
to @lrd_batch_idx in the reply data, the sub request has already
been executed and committed, so the server reconstructs the reply
for it; if the index is larger than @lrd_batch_idx, the server
re-executes the sub request in the batched RPC.

To simplify the reply/resend of the batched RPCs, the batch
processing stops at the first failure in the current design.
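
A rough model of the server-side loop described in this message is shown below. It is a sketch under several assumptions, not the Lustre code: process_batch(), the handler table, the set of update opcodes, and the use of 0 and -1 as "nothing finished yet" markers are all made up for the illustration.

```python
def process_batch(sub_requests, handlers, update_opcodes, first_transno=1):
    """Process the sub requests of one batched RPC in order.

    sub_requests:   list of (opcode, argument) pairs; opcodes may differ.
    handlers:       dict mapping opcode -> callable returning 0 or -errno.
    update_opcodes: set of opcodes that modify state and get a transno.

    Returns (replies, last_transno, batch_idx).
    """
    replies = []
    last_transno = 0   # 0: no update sub request has finished yet
    batch_idx = -1     # -1: no update sub request has finished yet
    next_transno = first_transno

    for idx, (opcode, arg) in enumerate(sub_requests):
        rc = handlers[opcode](arg)
        replies.append(rc)
        if rc != 0:
            # Processing stops at the first failure.
            break
        if opcode in update_opcodes:
            # Only the transno of the last finished update is kept for
            # the batched RPC, together with its index in the batch.
            last_transno = next_transno
            next_transno += 1
            batch_idx = idx

    return replies, last_transno, batch_idx
```

The batch can mix read-only and update opcodes, and the returned batch_idx is the value a resend would later be compared against.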

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Idaa814e82c968811bdda1c750b18c878b2c2ca67
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/41378
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
15 months agoLU-16452 kfilnd: Check replay deadline before send 93/49593/2
Chris Horn [Sat, 29 Oct 2022 22:30:17 +0000 (16:30 -0600)]
LU-16452 kfilnd: Check replay deadline before send

The LND timeout needs to account for the total time needed for bulk
operations to complete. On cassini, this can be ~120 seconds due to
the CXI retry-handler timeout on both the sender and target. i.e. LND
timeout is really the max round trip time, and (LND timeout)/2 is the
max one-way trip time.

When we replay a transaction we want to at least ensure we have enough
time to deliver the message to the receiver, as this gives us a
chance at still completing transactions. We should ensure that we
still have (LND timeout)/2 seconds remaining before posting a new
transaction.

Introduce kfilnd_transaction::tn_replay_deadline,
which is set to the transaction deadline minus (LND timeout)/2.

Check the replay deadline in kfilnd_tn_state_idle() before attempting
to post the transaction. If we've exceeded that deadline then fail
the transaction with -ETIMEDOUT and set a NETWORK_TIMEOUT health
status.

Modify the throttle check in kfilnd_tn_state_idle() to check
kfilnd_transaction::tn_replay_deadline instead of
kfilnd_transaction::deadline to determine when we should timeout
a transaction that is being throttled. Note, this check is switched
to using ktime_before() rather than ktime_after() since the case
is about checking whether we are currently before the deadline rather
than after it. The current code isn't wrong. It is just grammatically
awkward.
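
The deadline arithmetic can be illustrated with a few lines. This is a sketch with plain numbers standing in for ktime values; replay_deadline() and check_before_post() are hypothetical names, and only the "deadline minus (LND timeout)/2" rule and the -ETIMEDOUT result come from the description above.

```python
import errno


def replay_deadline(deadline, lnd_timeout):
    """Latest time at which a transaction may still be posted."""
    return deadline - lnd_timeout / 2


def check_before_post(now, deadline, lnd_timeout):
    """Return 0 if posting is still allowed, else -ETIMEDOUT.

    Mirrors a "before the deadline" check: posting is allowed only
    while `now` is strictly earlier than the replay deadline.
    """
    if now < replay_deadline(deadline, lnd_timeout):
        return 0
    return -errno.ETIMEDOUT
```

With an LND timeout of 120 seconds and a transaction deadline at t=1000, posting is refused from t=940 onwards, which leaves the one-way trip time for delivery.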

HPE-bug-id: LUS-11304
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I1911d51cee4acea20577e3fc45c99b8948b79523
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49593
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
15 months agoLU-16451 kfilnd: Throttle traffic to down peers 91/49591/2
Chris Horn [Fri, 28 Oct 2022 22:27:17 +0000 (16:27 -0600)]
LU-16451 kfilnd: Throttle traffic to down peers

If a transaction fails with -EHOSTUNREACH then this suggests the
target is actually down. We want to avoid consuming resources on the
local NIC by trying to send messages to down peers, so we will
require a hello handshake before injecting other new messages to a
peer we suspect is down.

Introduce a new kfilnd_peer state, KP_STATE_DOWN. Peers in either
KP_STATE_UPTODATE or KP_STATE_STALE can transition to KP_STATE_DOWN.
We'll transition a peer to KP_STATE_DOWN if we fail a transaction with
them and the errno we get is either -EHOSTUNREACH or -ENOTCONN.
kfilnd_peer_down() transitions a peer to KP_STATE_DOWN as appropriate.

Similar to stale peers, if we continue to fail transactions with peers
that are down then we want to eventually purge them from the peer
cache. This logic in kfilnd_peer_stale() is moved to
kfilnd_peer_purge_old_peer(), and this new function is called by both
kfilnd_peer_stale() and kfilnd_peer_down().

Introduce kfilnd_peer_needs_throttle() that determines whether we
should queue a message for future replay pending a successful
handshake. Integrate this into kfilnd_tn_state_idle() so that we queue
messages for peers in KP_STATE_DOWN in addition to peers in
KP_STATE_NEW. Modify debug statements in this area to remove redundant
info and reflect that we can hit these conditions for down peers, not
just new peers.

Introduce kfilnd_peer_tn_failed() to interpret the errno for a
transaction failure and call kfilnd_peer_down() or kfilnd_peer_stale()
as appropriate. This function replaces all existing calls to
kfilnd_peer_stale(). kfilnd_peer_stale() is now a static function.
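
The state handling described in this message can be summarised in a small model. It is a sketch, not the kfilnd code: the Python enum and the two functions only mirror the names used above, and the purging of old peers is left out.

```python
import errno
from enum import Enum


class PeerState(Enum):
    NEW = "KP_STATE_NEW"
    UPTODATE = "KP_STATE_UPTODATE"
    STALE = "KP_STATE_STALE"
    DOWN = "KP_STATE_DOWN"


# Errors that suggest the target itself is down.
DOWN_ERRNOS = (-errno.EHOSTUNREACH, -errno.ENOTCONN)


def peer_tn_failed(state, error):
    """Return the peer state after a transaction failed with `error`."""
    if state in (PeerState.UPTODATE, PeerState.STALE) and error in DOWN_ERRNOS:
        return PeerState.DOWN
    if state is PeerState.UPTODATE:
        # Any other failure only makes an up-to-date peer stale.
        return PeerState.STALE
    return state


def peer_needs_throttle(state):
    """Messages to new or down peers wait for a successful handshake."""
    return state in (PeerState.NEW, PeerState.DOWN)
```

In this model a stale peer that fails with -ENOTCONN becomes down, and new messages to it are queued until a handshake succeeds.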

HPE-bug-id: LUS-11314
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I206075c3a1b2836715dc79b49b098dab51c6bb94
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49591
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
15 months agoLU-16450 kfilnd: Cancel TNs if handshake fails 90/49590/2
Chris Horn [Tue, 25 Oct 2022 19:21:17 +0000 (13:21 -0600)]
LU-16450 kfilnd: Cancel TNs if handshake fails

When sending a message to a new peer a HELLO is sent first and the
original message waits for the handshake to complete. If the HELLO
fails to be sent then the original message will continue to wait for
the full LND timeout. When we retry the original message we should
check whether there is actually an outstanding HELLO. If not, then
this indicates the HELLO failed and we should cancel the TN.

HPE-bug-id: LUS-11310
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I4ed07964d5af0bcc3bdca33c1ea46fd436af2e98
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49590
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
15 months agoLU-16455 tests: recovery-small test_139() fix 79/49579/5
Elena Gryaznova [Sun, 8 Jan 2023 18:05:19 +0000 (19:05 +0100)]
LU-16455 tests: recovery-small test_139() fix

The MDS device calculated before stop() cannot be used
after stop() because the device-mapper device is removed and
the facet device is restored:
  stop () ->
    elif dm_flakey_supported $facet; then
      if [[ -n ${!failover_host} &&
               ${!failover_host} != ${!host} ]]
         dm_cleanup_dev $facet ->
               unexport_dm_dev $facet

Without this fix, test_139 fails on a failover setup:
  losetup: /dev/mapper/mds1_flakey: failed to set up loop device:
    No such file or directory

To reproduce the failure just run:
  sh llmountcleanup.sh
  ONLY=139 sh recovery-small.sh
on failover setup where mds1_HOST != mds1failover_HOST

Fixes: 4597fa7d88 ("LU-13061 osp: check catlog FID after reading in")
Test-Parameters: trivial testlist="recovery-small" failover=true iscsi=1 \
  env=ONLY=139,SLOW=yes mdssizegb=10 clientcount=4 osscount=2 mdscount=2 \
  mdtcount=2 austeroptions=-R
Test-Parameters: trivial testlist="recovery-small" failover=true iscsi=1 \
  env=FAILURE_MODE=HARD,ONLY=139,SLOW=yes mdssizegb=10 clientcount=4 \
  osscount=2 mdscount=2 mdtcount=2 austeroptions=-R
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
HPE-bug-id: LUS-10912
Change-Id: I67d98f633de4023a4430b55c6b4d308c7f17d988
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49579
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
15 months agoLU-2771 ldlm: remove obsolete LDLM_FL_SERVER_LOCK 63/49563/2
Andreas Dilger [Thu, 5 Jan 2023 22:44:50 +0000 (15:44 -0700)]
LU-2771 ldlm: remove obsolete LDLM_FL_SERVER_LOCK

The LDLM_FL_SERVER_LOCK flag and accompanying accessor macros have
never been used since they were first introduced.  Remove them.
It looks like this may have been duplicated by LDLM_FL_NS_SRV.

Test-Parameters: trivial
Fixes: caa55aec4a ("LU-2771 dlmlock: compress out unused space")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Iffc9b126334a327a9054f9acae86f4a0d03ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49563
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
Reviewed-by: jsimmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
15 months agoLU-13642 lnet: modify lnet_inetdev to work with large NIDS 25/49525/5
James Simmons [Thu, 5 Jan 2023 17:59:24 +0000 (12:59 -0500)]
LU-13642 lnet: modify lnet_inetdev to work with large NIDS

Change the li_ipv6 field in struct lnet_inetdev to li_size, which
now represents the size of the NID address. This will work
with the GUID of InfiniBand as well. The second change is
to always store li_ipaddr in network format. This allows
direct comparison between li_ipaddr and the nid_addr of
struct lnet_nid. We will ensure AF_IB is also in the
same format as what will be stored in struct lnet_nid.
Implement setup with a NID address for the ko2iblnd LND driver.
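
Why a size plus a network-format address makes comparison simple can be shown with a short sketch. It is illustrative only: inetdev_addr() is a hypothetical helper, and Python byte strings stand in for the li_ipaddr and nid_addr fields.

```python
import socket


def inetdev_addr(text):
    """Return (size, address bytes in network byte order) for an IP string.

    The size plays the role of li_size: 4 for IPv4, 16 for IPv6.
    """
    family = socket.AF_INET6 if ":" in text else socket.AF_INET
    raw = socket.inet_pton(family, text)
    return len(raw), raw


def same_address(dev_addr, nid_addr):
    """Both sides are in network format, so a byte compare is enough."""
    return dev_addr == nid_addr
```

Because both values are kept in the same byte order, no per-family conversion is needed before comparing them.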

Test-Parameters: trivial testlist=sanity-lnet
Change-Id: I7c27edb67263dd5bda4728c536aee266d38a4592
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49525
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
15 months agoLU-16380 osd-ldiskfs: race in OI mapping 14/49514/2
Lai Siyao [Sat, 17 Dec 2022 13:06:16 +0000 (08:06 -0500)]
LU-16380 osd-ldiskfs: race in OI mapping

There is a race between the OI scrub thread and OI mapping entry
insertion, which may add an inconsistent OI mapping entry without
starting the OI scrub thread. This may lead to osd_fid_lookup()
always returning -EINPROGRESS.

To avoid this race, osd_fid_lookup() returns -EINPROGRESS only when
OI mapping is inconsistent, and OI scrub thread is not running.

Fixes: 558784caad ("LU-15643 osd-ldiskfs: don't trigger scrub on irreparable FIDs")
Test-Parameters: mdscount=2 mdtcount=4 testlist=conf-sanity env=ONLY=108b,ONLY_REPEAT=50
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I05114b6a33940c210e9952f6e24f6c36fd7f76a2
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49514
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
15 months agoLU-16335 mdt: skip target check for rm_entry 29/49329/6
Lai Siyao [Wed, 7 Dec 2022 02:53:25 +0000 (21:53 -0500)]
LU-16335 mdt: skip target check for rm_entry

For "lfs rm_entry", the target may not exist, so the sanity check on
it may fail and cause rm_entry to fail.

Add sanity 832.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I824c7581af05c7494cf03c0c9bc999ca1abfec01
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49329
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: jsimmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
15 months agoLU-16302 llite: Use alloc_inode_sb() to allocate inodes 70/49070/9
Shaun Tancheff [Fri, 2 Dec 2022 09:46:31 +0000 (03:46 -0600)]
LU-16302 llite: Use alloc_inode_sb() to allocate inodes

linux-commit: v5.17-49-g8b9f3ac5b01d
  fs: introduce alloc_inode_sb() to allocate filesystems specific
      inode

Filesystems are expected to use alloc_inode_sb to allocate inodes
for proper lru handling.

Test-Parameters: trivial
HPE-bug-id: LUS-11332
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ie6f091a01df33738ed2ef6f7fef9c1f9c1a51e03
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49070
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: jsimmons <jsimmons@infradead.org>
15 months agoLU-16459 tests: fix YAML verification function 84/49584/4
Lei Feng [Tue, 10 Jan 2023 08:51:27 +0000 (16:51 +0800)]
LU-16459 tests: fix YAML verification function

The YAML verification function in the tests is not correct.
Fix it and change the test case accordingly.

Fixes: bedb797c5d ("LU-16110 lprocfs: make job_stats and rename_stats valid YAML")
Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial
Change-Id: I109e2294aea3d1bffa08e6d2c39a5911fa8ef7df
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49584
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
15 months agoLU-16239 tests: do not cleanup clients dirs 70/48870/4
Elena Gryaznova [Fri, 14 Oct 2022 14:51:03 +0000 (17:51 +0300)]
LU-16239 tests: do not cleanup clients dirs

This patch adds the ability to keep the clients'
directories: they are renamed instead of removed if
CLEANUP is set to false.

Test-Parameters: trivial
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-11158
Reviewed-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: Ibc55d32ef4946a62b00dcbf745567c123650ced9
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48870
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
15 months agoLU-16214 kfilnd: Proactively handshake old peers 86/48786/4
Chris Horn [Mon, 22 Aug 2022 19:43:36 +0000 (13:43 -0600)]
LU-16214 kfilnd: Proactively handshake old peers

If asked to send a message to a peer that we haven't communicated with
for some time, then we run the risk of that peer having a stale
(or missing) peer entry for us. This can result in the target peer
silently dropping our message. To reduce the chance of this happening,
proactively handshake any peer we haven't talked to in the last 2x LND
timeouts.

Note, kfilnd_peer_needs_hello() is called on both the send and receive
path. We only want to proactively handshake on the send path, so an
argument is added to this function so it can distinguish between the
two situations.

HPE-bug-id: LUS-11125
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Iaacb48e5c45305869bd22335ce112b21cf67e848
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48786
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
15 months agoLU-16214 kfilnd: Keep stale peer entries 85/48785/4
Chris Horn [Fri, 19 Aug 2022 20:27:26 +0000 (14:27 -0600)]
LU-16214 kfilnd: Keep stale peer entries

A peer is currently removed from the cache whenever there is a network
failure associated with the peer. This leads to situations where
incoming messages from that peer will be dropped until a handshake
can be completed.

If we instead keep these stale peer entries then we at least have a
chance of completing future transactions with the peer.

To accomplish this, we introduce states to struct kfilnd_peer.

When a kfilnd_peer is newly allocated it is assigned a state of
KP_STATE_NEW. kfilnd_peer_is_new_peer() is modified to check for this
state rather than check if kp_version is set.

When a handshake is completed the peer is assigned a state of
KP_STATE_UPTODATE.

When a peer that is up-to-date experiences a failed network operation
then it is assigned a state of KP_STATE_STALE. kfilnd_peer_stale() is
introduced to set this state. Existing callers of kfilnd_peer_down()
are converted to call kfilnd_peer_stale(). kfilnd_peer_down() is
renamed to kfilnd_peer_del().

We will initiate a handshake to any peer that is in either
KP_STATE_NEW or KP_STATE_STALE. kfilnd_peer_needs_hello() is
modified accordingly.

struct kfilnd_peer::kp_last_alive is checked by kfilnd_peer_stale().
If we haven't heard from a stale peer within five LND timeout periods,
then that peer is deleted.

An additional kfilnd_peer_alive() call is added to
kfilnd_tn_state_idle() for the TN_EVENT_RX_HELLO case, so that
peer aliveness is updated when we receive a hello request or response.
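
The handshake and expiry rules for stale peers can be sketched as follows. This is an illustration only: the two helpers are hypothetical, states are plain strings, times are plain numbers, and treating "within five LND timeout periods" as a strict greater-than comparison is an assumption.

```python
def needs_hello(state):
    """A handshake is initiated to peers that are new or stale."""
    return state in ("KP_STATE_NEW", "KP_STATE_STALE")


def stale_peer_expired(now, kp_last_alive, lnd_timeout):
    """True if a stale peer has been silent for more than 5 LND timeouts."""
    return now - kp_last_alive > 5 * lnd_timeout
```

A stale peer therefore stays in the cache, and keeps receiving handshake attempts, until it has been silent for longer than five timeout periods.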

HPE-bug-id: LUS-11125
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Icfb722e58fa334d983df02742dc456a55ac2abc3
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48785
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
15 months agoLU-16213 kfilnd: Finalize replay TNs with deleted peer 84/48784/4
Chris Horn [Mon, 15 Aug 2022 21:06:25 +0000 (15:06 -0600)]
LU-16213 kfilnd: Finalize replay TNs with deleted peer

If there are transactions on the replay queue awaiting a hello
response, and the peer is marked for removal (e.g. because the hello
TN failed) then let's finalize those TNs right away rather than wait
for them to hit the timeout.

HPE-bug-id: LUS-11128
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I6dc77cadaf850ab9ec37bf50241074bc3f5650b5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48784
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
15 months agoLU-16213 kfilnd: Allow one HELLO in-flight per peer 83/48783/4
Chris Horn [Fri, 19 Aug 2022 19:48:37 +0000 (13:48 -0600)]
LU-16213 kfilnd: Allow one HELLO in-flight per peer

Allow one HELLO message to be in-flight, per peer, at one time.
This is accomplished by adding a flag to struct kfilnd_peer that
indicates whether a hello request has been sent to the peer. The flag
is cleared if the send fails or when the hello response is received.

To detect the situation where a hello response is never received, we
add kp_hello_ts to struct kfilnd_peer to record the timestamp of when
the hello request was sent. If this is more than LND timeout seconds in
the past, then we may send another hello.
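
The flag-plus-timestamp scheme can be modelled in a few lines. This is a sketch, not the kfilnd implementation: the Peer class and both functions are hypothetical, locking is ignored, and times are plain numbers in seconds.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    hello_pending: bool = False  # stands in for the kp_hello_pending flag
    hello_ts: float = 0.0        # stands in for kp_hello_ts


def try_start_hello(peer, now, lnd_timeout):
    """Return True if a hello request may be sent to this peer now."""
    if peer.hello_pending and now - peer.hello_ts <= lnd_timeout:
        # A hello is already in flight and has not timed out yet.
        return False
    peer.hello_pending = True
    peer.hello_ts = now
    return True


def hello_finished(peer):
    """Called when the send fails or the hello response arrives."""
    peer.hello_pending = False
```

The timestamp is what lets a second hello go out when the first response never arrives; without it the flag would block the peer forever.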

Fix return value of kfilnd_send() when we're unable to allocate a
kfilnd_tn for the hello.

There's some code duplication with updating a peer based on hello
request and response. Consolidate processing of these hello messages
into a single function.

A race exists where a peer can be marked for removal in between a call
to kfilnd_peer_needs_hello() and the call to kfilnd_tn_alloc() inside
kfilnd_send_hello_request(). This would cause a hello request to be
sent to a new peer, created by kfilnd_peer_get() inside
kfilnd_tn_alloc(), without properly setting the kp_hello_pending flag
on that new peer. To avoid this situation, introduce
kfilnd_tn_alloc_for_peer() which takes a struct kfilnd_peer pointer
as an argument to assign to kfilnd_transaction::tn_kp. Use this to
allocate the kfilnd_transaction for the hello request inside
kfilnd_send_hello_request().

HPE-bug-id: LUS-11128
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I6bb0928a629cb398c270366fae6d1040ad67df3f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48783
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
15 months agoLU-16213 kfilnd: Fail sends of particular message type 82/48782/4
Chris Horn [Fri, 19 Aug 2022 16:48:01 +0000 (10:48 -0600)]
LU-16213 kfilnd: Fail sends of particular message type

Add ability to use failure injection to specify a message type for
simulated failure.

For example, to simulate failure of all immediate messages:
 lctl set_param fail_loc=0xF114 fail_val=1

To simulate failure of a single hello request:
 lctl set_param fail_loc=0x8000F114 fail_val=4
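
The two fail_loc values above differ only in the top bit. The sketch below decodes them, assuming the usual libcfs convention that bit 31 means "fail once" and the low 16 bits select the fail location; the constant values and decode_fail_loc() are assumptions of this example, not something stated in this commit.

```python
# Assumed libcfs conventions (not taken from this commit message).
CFS_FAIL_ONCE = 0x80000000      # fail only the first matching operation
CFS_FAIL_MASK_LOC = 0x0000FFFF  # bits that identify the fail location


def decode_fail_loc(fail_loc):
    """Split a fail_loc value into (location, fail_once)."""
    location = fail_loc & CFS_FAIL_MASK_LOC
    fail_once = bool(fail_loc & CFS_FAIL_ONCE)
    return location, fail_once
```

Under that assumption both commands use location 0xF114, and fail_val then selects the message type to fail (1 for immediate messages, 4 for hello requests in the examples above).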

HPE-bug-id: LUS-11128
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I4a20e92826df75812ef5b81979944526e4b94d83
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48782
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
15 months agoLU-16213 kfilnd: Add peer info to some debug statements 81/48781/4
Chris Horn [Thu, 18 Aug 2022 19:02:18 +0000 (13:02 -0600)]
LU-16213 kfilnd: Add peer info to some debug statements

Add kfilnd_peer pointer address to some debug statements.

Use 0x%llx format consistently when printing kfilnd_peer::kp_addr

Also add the message type to the TN debug macro.

HPE-bug-id: LUS-11128
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I4410ca9215f9d0a6eb65e6d4f953234fa7fba5ea
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48781
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
15 months ago LU-16213 kfilnd: Rename struct kfilnd_peer members 80/48780/4
Chris Horn [Thu, 11 Aug 2022 19:03:04 +0000 (13:03 -0600)]
LU-16213 kfilnd: Rename struct kfilnd_peer members

Prefix members of struct kfilnd_peer with kp_ to make these variable
names easier to find.

Also use 'kp' as a standard name for pointers to struct kfilnd_peer
instead of 'peer' (again to make these pointers easier to find). As
such, struct kfilnd_transaction::peer is also renamed to
struct kfilnd_transaction::tn_kp.

HPE-bug-id: LUS-11128
Test-Parameters: trivial
Change-Id: Id535c7af28a5335026037a55920c706a4e16f947
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48780
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
15 months ago LU-15163 osd: osd_obj_map_recover() to restart transaction 68/45368/8
Alex Zhuravlev [Tue, 26 Oct 2021 08:38:50 +0000 (11:38 +0300)]
LU-15163 osd: osd_obj_map_recover() to restart transaction

osd_obj_map_recover() stops the transaction when it needs to call
vfs_link(), and it has to start a new transaction to modify the
filesystem.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I6efe5444ddc959b19092bebc6e3c7dc25a29cea1
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/45368
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
15 months ago LU-14692 tests: allow FID_SEQ_NORMAL for MDT0000 93/46293/27
Li Dongyang [Tue, 25 Jan 2022 00:53:33 +0000 (11:53 +1100)]
LU-14692 tests: allow FID_SEQ_NORMAL for MDT0000

Fix the tests assuming objects created for MDT0000
always have a seq number of 0, to prepare for
deprecating IDIF sequence.

Fix sanity test_312 on ZFS to properly identify which
OST the object was created on, and re-enable it.

Test-Parameters: testlist=sanity env=ONLY="39r 312"
Test-Parameters: testlist=sanity-scrub env=ONLY=19
Test-Parameters: testlist=sanity-sec env=ONLY=37
Change-Id: I4bffabe25a6f84cdba760aabea1da3429715a283
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/46293
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
15 months ago LU-16460 lnet: validate data sent from user land properly 88/49588/6
James Simmons [Thu, 12 Jan 2023 14:34:10 +0000 (09:34 -0500)]
LU-16460 lnet: validate data sent from user land properly

Testing with improper settings from user land exposed some bugs in
the kernel code's handling of these cases. For tunables sent from
user land we need to do proper range checking. An improper cast
in the new Netlink tunables code prevented setting the default
LND tunable settings. Also silently ignore attempts to set LND
tunables when it's not supported. We shouldn't stop NI setup in
this case. Lastly, set the NI tunables to -1 when user land
doesn't provide any input. This tells the LND driver to use its
default values for the tunables. Resolve a double free when
setting up an NI with a non-existing interface. Another fix is for
net locking in lnet_net_cmd().
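
The range-checking rule described above can be sketched in plain C. This
is only an illustration of the idea (check the wide value before narrowing
it, and let -1 mean "use the LND default"). The function name, macro name,
and the bounds passed by the caller are hypothetical, not the LNet code:

```c
#include <errno.h>

/* Per the text above, -1 tells the LND driver to use its default. */
#define TUNABLE_USE_DEFAULT (-1)

/*
 * Hypothetical helper: validate a tunable received from user land.
 * The value is checked while still wide, so the narrowing cast below
 * can never silently turn an out-of-range value into a valid one.
 */
static int tunable_from_user(long value, long min, long max, int *out)
{
	if (value == TUNABLE_USE_DEFAULT) {
		*out = TUNABLE_USE_DEFAULT;
		return 0;
	}

	if (value < min || value > max)
		return -ERANGE;	/* reject; *out is left untouched */

	*out = (int)value;
	return 0;
}
```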

For lnetctl fix the YAML handling when only conns_per_peer is
requested. I only tested conns_per_peer and NI tunables changes
together before which missed the mentioned case.

Fixes: 8f8f6e2f3 ("LU-10003 lnet: use Netlink to support old and new NI APIs.")
Test-Parameters: trivial testlist=sanity-lnet
Change-Id: I7c5e993de57e3d674ecb8e3cc1bd62506470d416
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49588
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
15 months ago LU-14598 tests: skip conf-sanity test_122b in interop 83/49583/3
Andreas Dilger [Mon, 9 Jan 2023 23:27:46 +0000 (16:27 -0700)]
LU-14598 tests: skip conf-sanity test_122b in interop

Code was fixed in 2.15.0.

Test-Parameters: trivial testlist=conf-sanity env=ONLY=122b serverversion=2.14.0
Fixes: 747fed818b ("LU-14598 ofd: fix for IDIF sequence at ofd_preprw_write")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I6d9480f4b43706b597df6bd74c65959776cf2b5b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49583
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sarah Liu <sarah@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
15 months ago LU-16160 revert: "llite: clear stale page's uptodate bit" 41/49541/4
Bobi Jam [Tue, 3 Jan 2023 05:57:24 +0000 (13:57 +0800)]
LU-16160 revert: "llite: clear stale page's uptodate bit"

This reverts commit 5b911e03261c3de6b0c2934c86dd191f01af4f2f,
which caused a cl_page_own() race with ll_releasepage()
and a cl_pagevec_put() assertion failure.

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Icdb8c60f4d992c9976670e1b06c5bab5ef3a3954
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49541
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
15 months ago LU-6142 osc: tidy up osc_init() 58/49458/2
Mr. NeilBrown [Tue, 20 Dec 2022 17:03:32 +0000 (12:03 -0500)]
LU-6142 osc: tidy up osc_init()

A module_init() function that registers the services
of the module should do that last, after all other
initialization has succeeded.
This patch moves the class_register_type() call to the
end and ensures everything else that might have been
set up, is cleaned up on error.
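
A minimal userspace sketch of the ordering described above: registration
is the last step, and every earlier step is unwound if a later one fails.
All names here (cache_setup(), quota_setup(), register_type(), fail_step)
are hypothetical stand-ins, not the osc code:

```c
#include <stdbool.h>

/* Hypothetical state flags standing in for real module resources. */
static bool cache_ready, quota_ready, type_registered;
static int fail_step;	/* 0 = no failure, 1..3 = force that step to fail */

static int cache_setup(void)
{
	if (fail_step == 1)
		return -1;
	cache_ready = true;
	return 0;
}

static void cache_cleanup(void)
{
	cache_ready = false;
}

static int quota_setup(void)
{
	if (fail_step == 2)
		return -1;
	quota_ready = true;
	return 0;
}

static void quota_cleanup(void)
{
	quota_ready = false;
}

static int register_type(void)
{
	if (fail_step == 3)
		return -1;
	type_registered = true;
	return 0;
}

static int demo_init(void)
{
	int rc;

	rc = cache_setup();
	if (rc)
		return rc;

	rc = quota_setup();
	if (rc)
		goto out_cache;

	/* Register last: after this succeeds the services are visible. */
	rc = register_type();
	if (rc)
		goto out_quota;

	return 0;

out_quota:
	quota_cleanup();
out_cache:
	cache_cleanup();
	return rc;
}
```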

Linux-commit: e67f133d02e ("staging: lustre: osc: tidy up osc_init()")

Change-Id: I2a5ffb116c6d7c33a4530bab6e89a5ffe6117cea
Signed-off-by: Mr. NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49458
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
15 months ago LU-10003 lnet: use Netlink to support LNet ping commands 60/49360/6
James Simmons [Wed, 4 Jan 2023 00:28:55 +0000 (19:28 -0500)]
LU-10003 lnet: use Netlink to support LNet ping commands

Completely replace the old pre-MR ping command ioctl using
Netlink which will also handle large NIDs. We do update
IOC_LIBCFS_PING_PEER, which supports only small NIDs,
so older tools will keep working.

Test-Parameters: trivial testlist=sanity-lnet
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Change-Id: Ic82a18dc38e4bd4e78bf61da766f7a847da509a8
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49360
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
15 months ago LU-6142 ldlm: use list_first_entry in ldlm_lock 59/49359/3
Mr. NeilBrown [Sat, 10 Dec 2022 14:27:08 +0000 (09:27 -0500)]
LU-6142 ldlm: use list_first_entry in ldlm_lock

This makes the code (slightly) more readable.
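
For illustration, a userspace sketch of the two spellings. The macros
mirror the well-known Linux list.h definitions, but struct demo_lock and
the helper functions are hypothetical and are not the ldlm code. As in
the kernel, both forms assume the list is not empty:

```c
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Userspace copies of the familiar kernel list macros. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define list_entry(ptr, type, member) container_of(ptr, type, member)
#define list_first_entry(head, type, member) \
	list_entry((head)->next, type, member)

/* Hypothetical object embedded in a list. */
struct demo_lock {
	int id;
	struct list_head link;
};

static void list_init(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static void list_add_tail(struct list_head *item, struct list_head *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

/* Old style: the reader has to know that head->next is the first entry. */
static struct demo_lock *first_lock_old(struct list_head *head)
{
	return list_entry(head->next, struct demo_lock, link);
}

/* New style: the intent is stated by the macro name. */
static struct demo_lock *first_lock_new(struct list_head *head)
{
	return list_first_entry(head, struct demo_lock, link);
}
```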

Linux-commit: ef7e70a ("staging: lustre: ldlm: use list_first_entry in ldlm_lock")

Change-Id: If9789fef1dec55d08dec25819aaf5152946819c5
Signed-off-by: Mr. NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49359
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: jsimmons <jsimmons@infradead.org>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
15 months ago LU-6142 ldlm: tidy list walking in ldlm_flock() 58/49358/2
Mr. NeilBrown [Sat, 10 Dec 2022 14:17:46 +0000 (09:17 -0500)]
LU-6142 ldlm: tidy list walking in ldlm_flock()

Use list_for_each_entry variants to
avoid the explicit list_entry() calls.
This allows us to use list_for_each_entry_safe_from()
instead of adding a local list-walking macro.

Also improve some comments so that it is more obvious
that the locks are sorted per-owner and that we need
to find the insertion point.

Linux-commit: 3ac5a67 ("staging: lustre: ldlm: tidy list walking in ldlm_flock()")

Change-Id: Ie9a756a898a9c58db1b4f446694603a4efa37352
Signed-off-by: Mr. NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49358
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
15 months ago LU-16369 ldiskfs: do not check enc context at lookup 24/49324/3
Sebastien Buisson [Tue, 6 Dec 2022 16:36:02 +0000 (17:36 +0100)]
LU-16369 ldiskfs: do not check enc context at lookup

On rhel8, ldiskfs should not check for encryption context of inodes
upon lookup. On these kernels, ext4 is not encryption aware, so just
assume context is fine when target is mounted as ldiskfs.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I9f9813d290ea24b34f710e2c8219e856ca8fbc58
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49324
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
15 months ago LU-8915 lnet: migrate LNet selftest group handling to Netlink 14/49314/5
James Simmons [Wed, 14 Dec 2022 19:31:26 +0000 (14:31 -0500)]
LU-8915 lnet: migrate LNet selftest group handling to Netlink

Replace the LSTIO_GROUP_LIST and LSTIO_GROUP_INFO ioctls with a Netlink
backend. Make this transition transparent to the user. Be aware this
newer version of lnet_selftest.ko doesn't support older versions of the
lst tool. While the old interface allows setting up only one group at
a time, the Netlink interface can be used to set up many groups at one
time. Currently we don't change the interface to handle larger NIDs, but
this new interface will allow us to use the new NID format in a follow-on
patch.

Change-Id: I18f07b380d353425c6e127e4fbd0f30e41f66944
Test-Parameters: trivial testlist=lnet-selftest
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49314
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
15 months ago LU-16444 enc: null-enc names cannot be digested form 50/49550/4
Sebastien Buisson [Wed, 4 Jan 2023 15:10:02 +0000 (16:10 +0100)]
LU-16444 enc: null-enc names cannot be digested form

When encrypted files have their names encrypted, long names are in
digested form in case access is done without the encryption key. The
digest is base64-encoded, and prepended with '_'.
With null encryption for file names, names are always plain text. In
this case, a legitimate '_' at the start of a name must not be
interpreted as a digested form.
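
The rule can be sketched as a small predicate. This is a simplified
illustration for the no-key access case only; it ignores name length and
base64 validation, and the function name and its null_name_enc parameter
are hypothetical, not the actual Lustre code:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: could a name seen without the encryption key be
 * a digested form? Only a leading '_' qualifies, and only when file
 * names are really encrypted.
 */
static bool name_may_be_digest(const char *name, bool null_name_enc)
{
	if (name == NULL || name[0] != '_')
		return false;

	/* With null name encryption names are plain text: '_' is literal. */
	return !null_name_enc;
}
```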

sanity-sec test_54 is improved to test the case of a file whose name
starts with '_'.

Fixes: f18c87cb53 ("LU-13717 sec: handle null algo for filename encryption")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Idaad186afd06cfbabbe1d13e78f083d12876c8ff
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49550
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: jsimmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
15 months ago LU-16026 llite: always enable remote subdir mount 35/48535/7
Lai Siyao [Sun, 28 Aug 2022 19:33:29 +0000 (15:33 -0400)]
LU-16026 llite: always enable remote subdir mount

For historical reasons, ROOT is revalidated with IT_LOOKUP in
.permission to ensure permission is up to date because ROOT is
never looked up. But the ROOT FID and layout are not changeable; it's
the PERM lock that should be revalidated, i.e., revalidate with
IT_GETATTR instead of IT_LOOKUP.

Since the PERM|UPDATE lock is on the MDT where the object is located,
the client can cache this lock, so a remote subdir mount doesn't need
to look up ROOT on each file access.

Deprecate mdt.*.enable_remote_subdir_mount.

Per http://review.whamcloud.com/19195, replace 'df' with 'lfs df' in
sanity 228b since the former doesn't support transparent recovery.

Add sanity 247h.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I66f8ee347f6c01a8a154245b10a1d93539ea13b8
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48535
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
15 months ago LU-14824 Revert "test: sanity 413a/b unlink timeout" 46/49646/4
Andreas Dilger [Mon, 16 Jan 2023 17:58:24 +0000 (17:58 +0000)]
LU-14824 Revert "test: sanity 413a/b unlink timeout"

This reverts commit 5ff3e400f1a74ea49b7eb9cf19715f0fae08c3f5.
The test_413a is timing out regularly for ldiskfs MDTs.

Change-Id: Iafd28ec648f0b30b3c9e48e8f8479979a8cb0d60
Test-Parameters: trivial
Test-Parameters: mdscount=2 mdtcount=4 fstype=ldiskfs testlist=sanity env=ONLY="413a 413b"
Test-Parameters: mdscount=2 mdtcount=4 fstype=zfs testlist=sanity env=ONLY="413a 413b"
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49646
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>