Whamcloud - gitweb
fs/lustre-release.git
17 months ago EX-7601 ofd: create read mapping for read-modify-write
Patrick Farrell [Sun, 5 Nov 2023 15:55:29 +0000 (10:55 -0500)]
EX-7601 ofd: create read mapping for read-modify-write

When we need to do a read-modify-write for unaligned writes
to a compressed file, it's important we read only the
portion of the file which is receiving unaligned IO.

This patch identifies these chunks in preprw_write and
creates a read lnb mapping from a subset of the pages for
write.  These pages we read up are then decompressed.

Note one issue this patch does not address is reading of
data past EOF.  If the final chunk is unaligned, we will
round the write to cover it.  This results in extending the
file inappropriately, writing zeroes where they aren't
needed.  The read side gives us the info to address this,
which we will do in a future patch.
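
As a rough illustration of the chunk identification described above, the
sketch below works out which chunks of a write are only partially covered
and would therefore need to be read.  The names rmw_read_chunks and
struct rmw_read are hypothetical and are not taken from the patch; only
the idea of reading just the unaligned chunks comes from this message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: start offsets of the chunks to read. */
struct rmw_read {
	int      nr;        /* number of chunks to read: 0, 1 or 2 */
	uint64_t start[2];  /* chunk-aligned start offset of each */
};

/*
 * Find the partially written chunks of the write [off, off + len).
 * Fully overwritten chunks need no read; only a leading or trailing
 * chunk that the write covers in part has to be read.
 */
struct rmw_read rmw_read_chunks(uint64_t off, uint64_t len,
				unsigned int chunk_bits)
{
	struct rmw_read r = { 0, { 0, 0 } };
	uint64_t mask, end, first, last;

	assert(chunk_bits < 64 && len > 0);
	mask = ((uint64_t)1 << chunk_bits) - 1;
	end = off + len;              /* exclusive end of the write */
	first = off & ~mask;          /* chunk holding the first byte */
	last = (end - 1) & ~mask;     /* chunk holding the last byte */

	if (off & mask)               /* write starts mid-chunk */
		r.start[r.nr++] = first;
	/* write ends mid-chunk and that chunk is not already listed */
	if ((end & mask) && (r.nr == 0 || last != first))
		r.start[r.nr++] = last;
	return r;
}
```

With 64 KiB chunks (chunk_bits = 16), a 128 KiB write starting at offset
4 KiB would need the chunks at offsets 0 and 128 KiB; the chunk in the
middle is fully overwritten and is skipped.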

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Iede43f12127cbb93e73c22a915192aa2f814a927
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52997
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
17 months ago EX-7601 ofd: distinguish nr_write and nr_read
Patrick Farrell [Fri, 3 Nov 2023 20:29:51 +0000 (16:29 -0400)]
EX-7601 ofd: distinguish nr_write and nr_read

We will have two counts of pages in lnbs, distinguish
between them.

Not actually used yet - will be calculated when the read
lnb mapping is created.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I709b8fd299163d348a196184152bb0294fcb650b
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52985
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
17 months ago EX-7601 ofd: add read lnb to ofd_preprw_write
Patrick Farrell [Fri, 3 Nov 2023 20:22:22 +0000 (16:22 -0400)]
EX-7601 ofd: add read lnb to ofd_preprw_write

The read phase of read-modify-write for compressed files
needs to read only a subset of the pages which will be
written, so it needs a separate set of lnb pointers for
tracking this subset.

This patch passes around the necessary argument but does
not set up or use the lnb yet.

Test-Parameters: trivial
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I7ec7101e65e73a6c9e67cea3c58d8cace38e70e0
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52984
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
17 months ago LU-17370 utils: simplify lfs help text
Alexandre Ioffe [Thu, 21 Dec 2023 06:53:42 +0000 (22:53 -0800)]
LU-17370 utils: simplify lfs help text

Simplify the help text for lfs getstripe and lfs setstripe.
Update the corresponding man pages, lfs-getstripe and lfs-setstripe.
In the man pages, left-adjust the text and disable hyphenation
('.ad l' and '.nh') so that keywords are not hyphenated.

Lustre-change: https://review.whamcloud.com/53564
Lustre-commit: TBD (from 6c3dae58eddc2e3c7caf35599733b2e59ebeb657)

Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Test-Parameters: trivial
Change-Id: Iae9d3534230ee7d325fbeffd78b5c12632a4a161
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53523
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
17 months ago LU-17186 utils: replace gethostby*() with get*info()
Sebastien Buisson [Fri, 8 Dec 2023 19:53:12 +0000 (11:53 -0800)]
LU-17186 utils: replace gethostby*() with get*info()

This patch replaces the deprecated gethostbyname() and
gethostbyaddr() functions with getaddrinfo() and getnameinfo()
functions respectively.

The getaddrinfo() function combines the functionality provided by the
gethostbyname() and getservbyname() functions into a single interface,
but unlike the latter functions, getaddrinfo() is reentrant and allows
programs to eliminate IPv4-versus-IPv6 dependencies.

The getnameinfo() function is the inverse of getaddrinfo(): it
converts a socket address to a corresponding host and service, in a
protocol-independent manner. It combines the functionality of
gethostbyaddr() and getservbyport(), but unlike those functions,
getnameinfo() is reentrant and allows programs to eliminate
IPv4-versus-IPv6 dependencies.
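
To make the pairing of the two functions concrete, here is a minimal
sketch that parses a numeric address with getaddrinfo() and prints it
back with getnameinfo().  The helper name numeric_roundtrip is
hypothetical and is not part of this patch; the numeric-only flags are
chosen here so the example performs no DNS lookups.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Parse a numeric IPv4 or IPv6 address with getaddrinfo() and convert
 * it back to text with getnameinfo().  Returns 0 on success or a
 * non-zero EAI_* code on failure.
 */
int numeric_roundtrip(const char *addr, char *out, size_t outlen)
{
	struct addrinfo hints, *res = NULL;
	int rc;

	assert(addr != NULL && out != NULL && outlen > 0);
	memset(&hints, 0, sizeof(hints));
	hints.ai_family = AF_UNSPEC;      /* accept IPv4 or IPv6 */
	hints.ai_socktype = SOCK_STREAM;
	hints.ai_flags = AI_NUMERICHOST;  /* no name resolution */

	rc = getaddrinfo(addr, NULL, &hints, &res);
	if (rc != 0)
		return rc;

	/* The caller never has to inspect the address family itself. */
	rc = getnameinfo(res->ai_addr, res->ai_addrlen, out,
			 (socklen_t)outlen, NULL, 0, NI_NUMERICHOST);
	freeaddrinfo(res);
	return rc;
}
```

The same call sequence handles both address families, which is the
protocol independence referred to above.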

Lustre-change: https://review.whamcloud.com/52632
Lustre-commit: TBD (from 99687573d33336a153c1a5b94a4b66ebbcc2d0f1)

Test-Parameters: kerberos=true testlist=sanity-krb5
Test-Parameters: testgroup=review-dne-selinux-ssk-part-2
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Iacb5583826cd2f7329455bc6cbb4477f9087f15a
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53386
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
17 months ago LU-17261 lov: ignore broken components
Alex Zhuravlev [Sun, 5 Nov 2023 13:51:29 +0000 (16:51 +0300)]
LU-17261 lov: ignore broken components

If some component of a mirrored file is broken, it makes sense
to try another (possibly valid) replica rather than give up
immediately.

Lustre-change: https://review.whamcloud.com/52996
Lustre-commit: 902fe290e51dccdee89380fb725ae6e3c1802e2b

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I32ea0efa90109f5159bf8b6c4e0efe1d543580c3
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53542
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
17 months ago RM-620 build: New tag 2.14.0-ddn126
Andreas Dilger [Fri, 29 Dec 2023 11:20:01 +0000 (04:20 -0700)]
RM-620 build: New tag 2.14.0-ddn126

New tag 2.14.0-ddn126

Signed-off-by: Andreas Dilger <adilger@dilger.ca>
Change-Id: I67599c5b0918be8761e0561cc9c60e39f171196f

17 months ago LU-16859 lnet: incorrect check for duplicate NI
Serguei Smirnov [Tue, 31 Oct 2023 21:11:54 +0000 (14:11 -0700)]
LU-16859 lnet: incorrect check for duplicate NI

When an NI is being added to an existing LNet, the check against
existing NI interface names currently fails if the new NI
happens to use an interface name which is a prefix of one used
by an existing NI.

The following example assumes ib0 and its alias ib0:1 are
configured:

lnetctl net add --net o2ib --if ib0:1
lnetctl net add --net o2ib --if ib0

Fix this by making sure interface strings are compared properly
regardless of relative length.
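
The kind of comparison problem described above can be sketched as
follows.  This assumes the faulty check compared only as many bytes as
the new name contains; the function names are hypothetical and the real
LNet code may differ.

```c
#include <assert.h>
#include <string.h>

/*
 * Prefix-style check (the assumed shape of the bug): only the first
 * strlen(new_if) bytes are compared, so "ib0" appears to match "ib0:1".
 */
int ifname_prefix_match(const char *new_if, const char *existing_if)
{
	assert(new_if != NULL && existing_if != NULL);
	return strncmp(new_if, existing_if, strlen(new_if)) == 0;
}

/* Full comparison: names match only if equal over their whole length. */
int ifname_equal(const char *new_if, const char *existing_if)
{
	assert(new_if != NULL && existing_if != NULL);
	return strcmp(new_if, existing_if) == 0;
}
```

With ib0:1 already configured, the prefix-style check reports ib0 as a
duplicate, while the full comparison does not.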

Lustre-change: https://review.whamcloud.com/52918
Lustre-commit: 7dcdb9eb0ded98e956fe417abbd835433a8de3f0

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I0d4047118e7d9982fa791a2e324a27aa5d4abaee
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53527
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
17 months ago LU-16552 test: add new lnet test for Multi-Rail setups
James Simmons [Sun, 13 Aug 2023 15:02:33 +0000 (11:02 -0400)]
LU-16552 test: add new lnet test for Multi-Rail setups

You can crash the lnet kernel module by setting up an interface with
lctl net up and then attempting to set up the interface with
the import function. This is due to improper clearing of the net_cpts
array.

Currently sanity-lnet.sh doesn't really test MR setups. Because of
this, a few bugs slipped in. Add two new tests to ensure MR setups
behave properly. Test 107 checks that deleting a second interface
in an MR setup doesn't crash a node. Test 108 creates a Multi-Rail
setup of a tcp LNet net with two interfaces, one real and the
other fake. A bug was preventing the second, fake interface from
being added.

Lustre-change: https://review.whamcloud.com/50302
Lustre-commit: 8785f25b053c69b4303e901c6c8dc5d0d4d6dfc1

Test-Parameters: trivial testlist=sanity-lnet
Change-Id: Ic69e14bd0617f4d6fe931140b5b6d43b795843cf
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53529
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
17 months ago LU-12031 mdt: explicit data version of DoM files
Mikhail Pershin [Mon, 25 Apr 2022 06:13:53 +0000 (09:13 +0300)]
LU-12031 mdt: explicit data version of DoM files

Use EA to store 'data_version' for DoM files explicitly.

Unlike OST objects, the 'inode_version' of a DoM file is also changed
by metadata operations, which leads to problems during HSM
operations. For example, writing the HSM EA with the file data
version inside updates the DoM object version, making the version
in the HSM EA obsolete. Also, any metadata update on a
restored file makes it dirty and prevents a second release.

DoM files now have an explicitly updated 'data_version' in
addition to the ordinary 'inode_version'. The 'data_version'
is updated along with 'inode_version' upon write, truncate, and
fallocate operations and is stored as the 'trusted.dataver' EA.
The layout swap procedure is updated to move the data version between
the files being swapped, along with the HSM attributes.
If a DoM file is migrated to a RAID0 file, the 'dataver' EA is
deleted.

Corresponding test 1f is added to sanity-hsm.sh and
207j to sanity.sh.

Lustre-change: https://review.whamcloud.com/47139
Lustre-commit: aae3289adb2bbc192870f195b78044484f717e16

Test-Parameters: clientversion=2.12.4 testlist=sanity-hsm
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I4689c56394c7323d32cd6f7dd86f58beb6e53353
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53214
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
17 months ago LU-12998 mds: add no_create parameter to stop creates
Andreas Dilger [Sat, 23 Apr 2022 00:10:36 +0000 (18:10 -0600)]
LU-12998 mds: add no_create parameter to stop creates

Add a target tunable parameter and mount option "no_create" to
disable new *directory* creation on an MDT.  This sends the
flag OS_STATFS_NOCREATE to the clients, and the DNE MDT space
balance will avoid selecting that MDT when creating a new
subdirectory, without disabling access to existing files/dirs.

This allows "soft disabling" an MDT in advance of storage
upgrades to minimize new directories and files created on that
MDT, reduce future migration, and/or backup/restore workload.

As yet it does not totally disable *file* creation on the MDT,
but it may be extended to do so in the future.

This is analogous to the "no_precreate" option that was added
on the OSTs, and "no_create" has been added to the OSTs for
consistency ("no_precreate" is kept for compatibility for now).

lod_declare_create() checks whether directory create target MDT is
current MDT, this may happen if nocreate is set on some MDT. Upon
such mismatch, call dt_statfs() to fetch latest statfs to know
whether nocreate is set.

lmv_create() will choose another MDT if target MDT is set with
nocreate, but in case the flag is cleared, call obd_statfs() to fetch
cached statfs and check again.

Lustre-change: https://review.whamcloud.com/47124
Lustre-commit: 1dbcd0bab881fac38d8a5e4ef1559f12618f8f0e
Lustre-change: https://review.whamcloud.com/53437
Lustre-commit: 066262a04cb8e0cbf49a20b7bf036d4484399afe (TBD)

Test-Parameters: testlist=conf-sanity env=ONLY=112b,ONLY_REPEAT=50
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I53cfb48ade2f844b18bfc630e7fcea6de9ce7057
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53189
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
17 months ago LU-17263 utils: 'lfs find -blocks' to use 512-byte units
Andreas Dilger [Sun, 5 Nov 2023 05:32:19 +0000 (23:32 -0600)]
LU-17263 utils: 'lfs find -blocks' to use 512-byte units

Change the default units for 'lfs find -blocks' from 1KiB blocks
to 512-byte blocks to better match the behavior of find(1).  This
also matches what "-printf %b" will print.

Change llapi_parse_size() to accept a 'c' argument to specify
characters, and accept a "B" or "iB" suffix if provided.
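
The effect of the unit change can be seen with a little arithmetic.
The helpers below are hypothetical and only illustrate the two block
units; they are not the lfs implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Space in 512-byte blocks, rounded up (the new default unit). */
uint64_t bytes_to_512_blocks(uint64_t bytes)
{
	assert(bytes <= UINT64_MAX - 511);
	return (bytes + 511) / 512;
}

/* The same space in 1 KiB blocks, rounded up (the previous unit). */
uint64_t bytes_to_kib_blocks(uint64_t bytes)
{
	assert(bytes <= UINT64_MAX - 1023);
	return (bytes + 1023) / 1024;
}
```

A file occupying 4096 bytes counts as 8 blocks in the new default unit
and as 4 blocks in the old one, so existing '-blocks' arguments may
need to be doubled to keep matching the same files.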

Lustre-change: https://review.whamcloud.com/52993
Lustre-commit: 869ea3211d2f15d7c674bc10e5f1a3272e44504e

Fixes: c043f46025 ("LU-10705 utils: add "lfs find --blocks"")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: If8345f15bf53912501cadc0fa7f981a9f787b767
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53522
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
17 months ago LU-17349 tests: sanity-quota_81 decrease timeout
Sergey Cheremencev [Sun, 3 Dec 2023 04:06:23 +0000 (07:06 +0300)]
LU-17349 tests: sanity-quota_81 decrease timeout

Decrease cfs fail timeout in sanity-quota_81 from 30
to 10 seconds to avoid soft lockup.

Lustre-change: https://review.whamcloud.com/53384
Lustre-commit: b58219ef1edebcb266cbe0dfede491ba5de491d1

Fixes: 862f0baa7c21 ("LU-15097 quota: stop pool_recalc before killing pool")
Test-Parameters: trivial testlist=sanity-quota
Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Change-Id: I8630db7b3948b335fef5d5349f960f79cb877fc3
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53516
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
17 months ago LU-17358 lprocfs: make job_stats job_id valid yaml
Nathaniel Clark [Tue, 12 Dec 2023 18:05:22 +0000 (13:05 -0500)]
LU-17358 lprocfs: make job_stats job_id valid yaml

Fix quoting job_id to account for leading '@' being reserved.
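
YAML reserves '@' as an indicator, so a plain (unquoted) scalar may not
begin with it.  A minimal sketch of quoting such a value follows; the
function name format_job_id is hypothetical, and the sketch does not
handle job IDs that themselves contain quotes or escapes, so it should
not be read as the lprocfs implementation.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write job_id into out, wrapping it in double quotes when it starts
 * with '@', which cannot begin a plain YAML scalar.
 * Returns the snprintf() result.
 */
int format_job_id(char *out, size_t outlen, const char *job_id)
{
	assert(out != NULL && job_id != NULL && outlen > 0);
	if (job_id[0] == '@')
		return snprintf(out, outlen, "\"%s\"", job_id);
	return snprintf(out, outlen, "%s", job_id);
}
```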

Test-Parameters: trivial
Signed-off-by: Nathaniel Clark <nclark@whamcloud.com>
Change-Id: Ifce3edc9b636db2f059ab9960488972a152d2e7a
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/53424
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Feng Lei <flei@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53519

17 months ago EX-8780 test: wait osts up after restart
Hongchao Zhang [Sat, 9 Dec 2023 20:43:49 +0000 (04:43 +0800)]
EX-8780 test: wait osts up after restart

In test_18e of sanity-lfsck, the OSTs may not yet be ready on all MDTs,
so the LFSCK status will be incorrect because the LFSCK notification
cannot be sent to all OSTs.

Change-Id: If1ed5d920d5c8b99d42f59f92a1e245a9e2a8267
Test-Parameters: trivial testlist=sanity-lfsck,sanity-lfsck,sanity-lfsck,sanity-lfsck,sanity-lfsck
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53531
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
17 months ago LU-17289 test: disable sanity/test_906 temporarily
Qian Yingjin [Thu, 7 Dec 2023 09:45:01 +0000 (04:45 -0500)]
LU-17289 test: disable sanity/test_906 temporarily

On RHEL 9.3, fio io_uring engine testing fails with the error
"Operation not permitted" on both local file systems (ext4 and
XFS) and Lustre:

    "fio: pid=4551, err=1/file:engines/io_uring.c:1047,
    func=io_queue_init, error=Operation not permitted"

This is a generic failure in RHEL9.3.  Thus we disable
sanity/test_906 temporarily until the bug is fixed in RHEL9.3.

Lustre-change: https://review.whamcloud.com/53362
Lustre-commit: TBD (from 0eef4b0818e7a1a42a54333fa713ef660c7e9404)

Test-Parameters: trivial
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I3805b475c5f3d0b62dc6c57c4cd93f2bc1b67b76
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53546
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
17 months ago LU-17000 gss: Fix Out-of-bounds access under svcgssd_proc.c
Arshad Hussain [Wed, 1 Nov 2023 06:50:53 +0000 (12:20 +0530)]
LU-17000 gss: Fix Out-of-bounds access under svcgssd_proc.c

The problem reported by Coverity was that a 32-bit type was passed
and then dereferenced as a larger 64-bit type in
handle_channel_request(). This patch addresses the issue.

Since this is a uapi, and to catch corner cases such as
kernel modules being updated separately from the user tools,
RSI_DOWNCALL_MAGIC is also changed from 0x6d6dd62a to
0x6d6dd63a.

This patch also changes the 32-bit member (sid_hash) of
'struct rsi_downcall_data' to 64 bits, which also requires
changing wiretest.c and wirecheck.c.
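
The role of the magic change can be sketched as below.  The struct is a
reduced, hypothetical layout (the real 'struct rsi_downcall_data' has
other members), and the macro and function names are invented for this
example; only the two magic values come from this message.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Magic values quoted in the commit message. */
#define OLD_RSI_DOWNCALL_MAGIC 0x6d6dd62aU
#define NEW_RSI_DOWNCALL_MAGIC 0x6d6dd63aU

/* Hypothetical, reduced layout for illustration only. */
struct sketch_downcall {
	uint32_t magic;
	uint32_t pad;
	uint64_t sid_hash;  /* widened from 32 to 64 bits */
};

/*
 * Accept a downcall buffer only if it is large enough and carries the
 * new magic, so a tool built against the old layout is rejected.
 * Returns 0 and stores the hash on success, -1 otherwise.
 */
int downcall_get_hash(const void *buf, size_t len, uint64_t *hash)
{
	struct sketch_downcall d;

	assert(buf != NULL && hash != NULL);
	if (len < sizeof(d))
		return -1;
	memcpy(&d, buf, sizeof(d));  /* never reads past the buffer */
	if (d.magic != NEW_RSI_DOWNCALL_MAGIC)
		return -1;
	*hash = d.sid_hash;
	return 0;
}
```

Because the field is declared with its full 64-bit width, reading it no
longer touches memory beyond the object that was passed in.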

Lustre-change: https://review.whamcloud.com/52920
Lustre-commit: 7d764f1f11be144ad26e33aa8cecedc5bb708793

CoverityID: 404758 ("Out-of-bounds access")
Fixes: 4daf43ac3c ("LU-17015 gss: support large kerberos token for rpc sec init")
Test-Parameters: kerberos=true testlist=sanity-krb5
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I8041cd4063f1b1cefdebf5681df426be61820f99
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53440
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
17 months ago LU-14918 osd: don't declare similar ldiskfs writes twice
Alex Zhuravlev [Tue, 7 Dec 2021 08:13:54 +0000 (11:13 +0300)]
LU-14918 osd: don't declare similar ldiskfs writes twice

In some cases (like overstriping) the same operations can be
declared multiple times (new llog records), and this leads to a
huge number of credits and performance degradation. We can
avoid this by checking for duplicate declarations.
As every declaration would need an allocation, limit the scope
of these checks to transactions likely to be large.

Percentage of "large" transactions in sanity-benchmark, depending on
the threshold:

  creates < 5 && writes < 5:
  0.58% (mds1) and 2.97% (mds2)

  creates < 7 && writes < 7:
  0.58% and 2.4%

  creates < 9 && writes < 9:
  0.6% and 1.85%

  creates < 10 && writes < 10:
  0.0004% and 0.000001%

Thus 10 creates or writes is selected as the threshold to enable this
logic.

Lustre-change: https://review.whamcloud.com/45765
Lustre-commit: 9e6225b2e7385cbb7be0474df01075fafc4966d5

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I7c893fe3b95646b4b813b999bc832659dfcf03ad
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53485
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
17 months ago EX-7601 ofd: add decompress_read to ofd_preprw_write
Patrick Farrell [Fri, 3 Nov 2023 18:20:37 +0000 (14:20 -0400)]
EX-7601 ofd: add decompress_read to ofd_preprw_write

We have read up the compressed data from disk, now we must
decompress it so we can rewrite it successfully.

This code still works on the whole lnbs rather than just on
the portion of it which is unaligned.  This is temporary
and will be resolved by a future patch.

With this patch, we have basic read-modify-write support,
so we can re-enable testing.  The next patch adds tests
for read-modify-write.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ib6503c15e9fb3d425a7bc295bcc61b41c089a1f0
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52983
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
17 months ago EX-7601 ofd: add read to write process
Patrick Farrell [Thu, 2 Nov 2023 22:03:15 +0000 (18:03 -0400)]
EX-7601 ofd: add read to write process

This adds a very simple read to the write process, which
just reads up the entire chunk-rounded write range.

This is a first step - the read will eventually be modified
to only read the unaligned portions which must be
decompressed for read-modify-write.  We will create a
special lnb mapping which contains only the pages which must
be read for decompression (similar to the tx lnb mapping).

For now, this read allows us to test decompression without
handling the mapping.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I169ddc2e161094aebdad1a60ec62e9c1d75cd6d8
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52966
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
17 months ago EX-7601 ofd: add chunk rounding to write
Patrick Farrell [Thu, 2 Nov 2023 21:40:38 +0000 (17:40 -0400)]
EX-7601 ofd: add chunk rounding to write

For compressed files, we need to round all niobufs to
chunk size in the write process, so we have buffers for
reading in and rewriting the complete chunks.

dt_bufs_get sets up the local niobuf for the write, so we
round before calling it.
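
The rounding itself amounts to the arithmetic sketched below.  The
struct sketch_range type stands in for a niobuf and, like the function
name, is hypothetical; the sketch also assumes the chunk size is a
power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a niobuf: a byte range in the file. */
struct sketch_range {
	uint64_t off;
	uint64_t len;
};

/* Expand a range so it starts and ends on chunk boundaries. */
struct sketch_range round_to_chunk(struct sketch_range r,
				   unsigned int chunk_bits)
{
	uint64_t mask, end;

	assert(chunk_bits < 64);
	mask = ((uint64_t)1 << chunk_bits) - 1;
	end = (r.off + r.len + mask) & ~mask;  /* round the end up */
	r.off &= ~mask;                        /* round the start down */
	r.len = end - r.off;
	return r;
}
```

For 64 KiB chunks, a 4 KiB write at offset 4 KiB becomes a 64 KiB range
at offset 0.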

Note this breaks writing to compressed files, which is not
fixed until a few patches later.  For this reason, we
disable the compression tests.  They will be reenabled
shortly - similar to how we handled the read series.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I413aaba9866dd7d6c4463fa620eadf1423379ba1
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52963
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
17 months ago EX-7601 vvp: remove unaligned write restriction
Patrick Farrell [Fri, 3 Nov 2023 16:29:58 +0000 (12:29 -0400)]
EX-7601 vvp: remove unaligned write restriction

This series will resolve the unaligned write issue for
compressed files, so we need to remove the restriction on
unaligned writes in order to test it.

This does not mean unaligned writes are working yet, but we
need to make this change so the subsequent patches can be
tested.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I4abcbdcd18b00718099483c8dfdb9a7aa41c3ce7
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52981
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
17 months ago EX-7601 ofd: switch preprw to chunk_bits
Patrick Farrell [Fri, 3 Nov 2023 16:33:20 +0000 (12:33 -0400)]
EX-7601 ofd: switch preprw to chunk_bits

The compression/decompression code requires chunk_bits
rather than chunk size.  Since we need to call this code
from ofd_preprw_write, we need chunk_bits there.

This modifies the functions so chunk_bits is available
there.
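
For readers unfamiliar with the two representations, the sketch below
converts between them.  It assumes chunk_bits is the base-2 logarithm of
the chunk size, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Chunk size in bytes for a given chunk_bits. */
uint64_t chunk_size_from_bits(unsigned int chunk_bits)
{
	assert(chunk_bits < 64);
	return (uint64_t)1 << chunk_bits;
}

/* chunk_bits for a chunk size that is a non-zero power of two. */
unsigned int chunk_bits_from_size(uint64_t chunk_size)
{
	unsigned int bits = 0;

	assert(chunk_size != 0 && (chunk_size & (chunk_size - 1)) == 0);
	while (((uint64_t)1 << bits) != chunk_size)
		bits++;
	return bits;
}
```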

Test-Parameters: trivial
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Id98e6d6364eeaaa7753a8aba059387e3e659d2a2
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52982
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
17 months ago EX-7601 tgt: add tx lnb for writes
Patrick Farrell [Thu, 2 Nov 2023 21:55:01 +0000 (17:55 -0400)]
EX-7601 tgt: add tx lnb for writes

With compression, the lnbs used for the disk IO on the
server can contain more data than the client requested,
due to reading up whole chunks for decompression.

This means the client is only going to write data into
a subset of the lnbs used for IO to storage.

We handle this the same way we do for reads:
We create a second set of lnbs just for the transfer, and
point these lnbs at the pages which will actually receive
data from the client.
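
The second set of lnbs can be pictured as an array of pointers into the
first.  The sketch below assumes 4 KiB pages and uses hypothetical names
(struct sketch_io_lnb, build_tx_lnbs); it is not the tgt code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size for the sketch */

/* Hypothetical stand-in for a local niobuf: one page of the IO. */
struct sketch_io_lnb {
	uint64_t file_off;  /* file offset this page covers */
	void    *page;      /* page data (unused in this sketch) */
};

/*
 * Fill tx[] with pointers to the entries of io[] that lie inside the
 * client's range [off, off + len).  io[] describes the chunk-rounded
 * range used for disk IO.  Returns the number of entries stored.
 */
int build_tx_lnbs(struct sketch_io_lnb *io, int nr_io, uint64_t off,
		  uint64_t len, struct sketch_io_lnb **tx, int tx_max)
{
	int i, nr_tx = 0;

	assert(io != NULL && tx != NULL);
	for (i = 0; i < nr_io && nr_tx < tx_max; i++) {
		uint64_t start = io[i].file_off;

		/* keep pages that overlap the client's byte range */
		if (start + SKETCH_PAGE_SIZE > off && start < off + len)
			tx[nr_tx++] = &io[i];
	}
	return nr_tx;
}
```

Since tx[] only points at entries of io[], data received from the client
lands directly in the pages that are later written to storage.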

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I5b668547537698309792daf309842866be79f0b6
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52965
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
17 months ago EX-7601 tgt: add remote_pages for writes
Patrick Farrell [Thu, 2 Nov 2023 21:49:00 +0000 (17:49 -0400)]
EX-7601 tgt: add remote_pages for writes

When we round a write to get all of the compressed chunks,
the number of local and the number of remote pages will
differ.  We need to make sure we do the checksum and data
transfer using the number of remote pages, not the number of
local pages.

This patch calculates the number of remote pages and uses it
accordingly.

Note that just like on the read side, this patch doesn't do
anything until we're actually rounding the chunks for IO in
a later patch.
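
The difference between the two page counts can be sketched as below,
assuming 4 KiB pages and power-of-two chunks; the helper names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12u  /* assumed 4 KiB pages */

/* Pages touched by the client's byte range [off, off + len). */
uint64_t pages_in_range(uint64_t off, uint64_t len)
{
	assert(len > 0);
	return ((off + len - 1) >> SKETCH_PAGE_SHIFT) -
	       (off >> SKETCH_PAGE_SHIFT) + 1;
}

/* Pages in the same range once it is rounded out to whole chunks. */
uint64_t pages_in_rounded_range(uint64_t off, uint64_t len,
				unsigned int chunk_bits)
{
	uint64_t mask, start, end;

	assert(chunk_bits >= SKETCH_PAGE_SHIFT && chunk_bits < 64);
	assert(len > 0);
	mask = ((uint64_t)1 << chunk_bits) - 1;
	start = off & ~mask;
	end = (off + len + mask) & ~mask;
	return (end - start) >> SKETCH_PAGE_SHIFT;
}
```

An 8 KiB write at offset 4 KiB touches 2 remote pages, while the rounded
local range for a 64 KiB chunk spans 16 pages; the checksum and transfer
would use the first number.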

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I38256070d68246613ce67b0bfe328f6443a95533
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52964
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
17 months ago EX-7601 ofd: round range locking
Patrick Farrell [Mon, 20 Nov 2023 00:18:37 +0000 (19:18 -0500)]
EX-7601 ofd: round range locking

The range locking in OFD needs to be rounded for
compressed chunks.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I530d7f655a1c09033b1a3668c009072874ab1d18
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53178
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
17 months ago EX-7601 tgt: round write lock to chunk
Patrick Farrell [Thu, 2 Nov 2023 21:31:41 +0000 (17:31 -0400)]
EX-7601 tgt: round write lock to chunk

For unaligned writes, we need to round the write locking to
cover any leading or trailing chunks.  We do this by
creating a local 'remote niobuf' to describe the rounded
range and doing the locking against that niobuf.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I2bdea620386ad229375647a0e2cc6180c9bd7aa6
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52961
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
17 months ago EX-7601 tgt: identify writes to round
Patrick Farrell [Thu, 2 Nov 2023 21:26:58 +0000 (17:26 -0400)]
EX-7601 tgt: identify writes to round

If the beginning or end of a client write is unaligned, we
must round the locking.  This patch identifies writes where
this is required; the next patch will do the locking.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Iec140c24423a0da478f6d42ff6fc620d7ad3ba4a
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52960
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
17 months ago EX-7601 ofd: clear pages in decompression
Patrick Farrell [Thu, 30 Nov 2023 19:59:07 +0000 (14:59 -0500)]
EX-7601 ofd: clear pages in decompression

Handling writes to compressed files requires a
read-modify-write cycle, which has implications for how we
handle reads.

Consider the case of a file with an 8 KiB write at offset 0,
which is compressed to 4 KiB.  Then there is another 4 KiB
write at offset 16 KiB.

Updating this correctly requires reading the first chunk,
then decompressing it.  However, this read will go past
EOF, because the write has not occurred yet.  The OSD read
code does not fill in these pages, because read past EOF is
not returned to the client (client gets a short read and
does not actually use the pages).

In our case, however, we must use these pages (from 8 KiB
to 16 KiB).  In the naive version without recompression, we
simply write out 0 - 16 KiB, so we must have zeroes in
those pages, and once we have recompression, we must
compress those pages so we need zeroes in that case too.

So we note if a page has data in it after decompression,
then if it does not, we clear the page.  Note we do NOT set
lnb_rc to 0 when we clear a page, because lnb_rc = 0 is
used to indicate EOF rather than a gap in the file.
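
The clearing step can be sketched as follows.  The page descriptor and
its has_data flag are hypothetical, and rc merely stands in for lnb_rc
to show that it is left alone; 4 KiB pages are assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size */

/* Hypothetical page descriptor for this sketch. */
struct sketch_page {
	unsigned char data[SKETCH_PAGE_SIZE];
	int has_data;  /* set when decompression wrote into this page */
	int rc;        /* stand-in for lnb_rc; not modified here */
};

/*
 * After decompression, zero every page that received no data so that a
 * later write or recompression sees zeroes rather than stale bytes.
 * rc is deliberately left unchanged.  Returns the pages cleared.
 */
int clear_empty_pages(struct sketch_page *pages, int nr)
{
	int i, cleared = 0;

	assert(pages != NULL && nr >= 0);
	for (i = 0; i < nr; i++) {
		if (pages[i].has_data)
			continue;
		memset(pages[i].data, 0, sizeof(pages[i].data));
		cleared++;
	}
	return cleared;
}
```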

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: If1d1360185eb087e821167a08e49c9427e29ffc4
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53302
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
17 months ago EX-7601 obd: do not decompress empty lnbs
Patrick Farrell [Tue, 12 Dec 2023 20:54:37 +0000 (15:54 -0500)]
EX-7601 obd: do not decompress empty lnbs

For reads which cross EOF, we may get lnbs with no data in
them (similarly for writes which cross EOF).

For these cases, it's important to only copy from the lnbs
where there is data, and only do decompression on the lnbs
if there's actually data in them.

Modify merge_chunk to do this.
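
A rough sketch of copying only from lnbs that hold data is given below.
The struct and function names are hypothetical and the real merge code
works on pages rather than plain buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an lnb: a buffer plus its valid length. */
struct sketch_data_lnb {
	const unsigned char *buf;
	size_t len;  /* bytes of data present; 0 means an empty lnb */
};

/*
 * Gather the data of a chunk into dst, copying only from lnbs that
 * hold data.  Returns the bytes copied; 0 tells the caller there is
 * nothing to decompress.  Stops early if dst would overflow.
 */
size_t merge_nonempty(const struct sketch_data_lnb *lnbs, int nr,
		      unsigned char *dst, size_t dst_len)
{
	size_t used = 0;
	int i;

	assert(lnbs != NULL && dst != NULL && nr >= 0);
	for (i = 0; i < nr; i++) {
		if (lnbs[i].len == 0)
			continue;  /* empty lnb, e.g. past EOF */
		if (lnbs[i].len > dst_len - used)
			break;     /* would overflow dst */
		memcpy(dst + used, lnbs[i].buf, lnbs[i].len);
		used += lnbs[i].len;
	}
	return used;
}
```

A caller would skip decompression entirely when the returned length is
zero.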

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I83fefcfa6d1396dcd97fad994334bf29438bb4bf
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53430
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
17 months ago EX-7601 obd: add error check in merge_chunk
Patrick Farrell [Wed, 29 Nov 2023 01:45:43 +0000 (20:45 -0500)]
EX-7601 obd: add error check in merge_chunk

If the lnbs we're trying to merge have an error recorded in
them, then they're not going to be valid input for
decompression, so return an error.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I1bf17131cb65106087eb5e72e2700db30c0cc975
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53274
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
17 months ago EX-7644 mmap: add mmap support for compression
Patrick Farrell [Wed, 25 Oct 2023 16:15:59 +0000 (12:15 -0400)]
EX-7644 mmap: add mmap support for compression

This removes the EOPNOTSUPP for compression with mmap and
adds an mmap sanity test for compression.  This patch
removes all the restrictions for mmap, but we actually only
have unaligned read support right now, so the test is
deliberately simplified to only test reads.

A more complicated version which also tests mmap writes
comes later in the series, once read-modify-write is
supported.

The test exercises mmap by copying data at several different
block sizes with several different compression chunk sizes.

Test-Parameters: testlist=sanity-compr env=ONLY="1003",ONLY_REPEAT=10
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I4a37b106831a903d90e8a8871e9a93baac4e201e
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52280
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
17 months agoLU-17317 gss: no cache flush for rsi and rsc
Sebastien Buisson [Tue, 5 Dec 2023 16:02:21 +0000 (17:02 +0100)]
LU-17317 gss: no cache flush for rsi and rsc

RPCSEC init and RPCSEC context caches hold gss-related information
of security contexts established between network peers. These cache
entries are tightly coupled with contexts handled in the sptlrpc layer
so they must not be purged directly. They are inserted into the cache
when sptlrpc security contexts are established, and removed when the
corresponding security contexts are destroyed.

Lustre-change: https://review.whamcloud.com/53377
Lustre-commit: 3615fa4a86be793652d53c94818c5aeb81e2257e

Test-Parameters: trivial
Test-Parameters: kerberos=true testlist=sanity-krb5
Test-Parameters: testgroup=review-dne-selinux-ssk-part-2
Fixes: 4daf43ac3c ("LU-17015 gss: support large kerberos token for rpc sec init")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I903f75a4b5229286fcaed3e9d96b5eee7f653f15
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53334
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
17 months agoLU-17015 gss: remove legacy sunrpc-cache based gss caches
Sebastien Buisson [Thu, 14 Sep 2023 12:23:07 +0000 (14:23 +0200)]
LU-17015 gss: remove legacy sunrpc-cache based gss caches

Now that GSS caches are based on Lustre's internal upcall cache
mechanism, we can remove the legacy ones based on the sunrpc cache
implementation, as this code is unused.

We can also remove support for updated get_expiry() in Linux 6.3, as
this function is no longer used.

Lustre-change: https://review.whamcloud.com/52376
Lustre-commit: 8665ba238412f407963724413e137b89d5cd384f

Test-Parameters: trivial
Test-Parameters: kerberos=true testlist=sanity-krb5
Test-Parameters: testgroup=review-dne-selinux-ssk-part-2
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I98d8777d225c723ae061ef360011abfc092e09d8
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Xing Huang <hxing@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53443
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
17 months agoLU-17015 gss: avoid request replay
Sebastien Buisson [Fri, 13 Oct 2023 15:19:16 +0000 (17:19 +0200)]
LU-17015 gss: avoid request replay

Lustre's upcall cache has a retry mechanism in case the upcall was
interrupted or failed and we timed out waiting. In this case we do our
best to retry and do the upcall again.
But when the upcall cache is used for GSS contexts, the upcall cannot
be done twice with the same data. The GSSAPI implements security measures
that forbid that kind of request replay, for instance to prevent
man-in-the-middle attacks.

Add a new uc_acquire_replay field to struct upcall_cache, so that
upcall cache users can tell if acquire upcall can be replayed.
For identity upcall, this replay is fine. But for GSS contexts we need
to avoid those replays.
And bump upcall cache timeout value from 20s to 30s for GSS context
init requests.
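
The replay rule can be illustrated with a small Python model. This is
only a sketch under stated assumptions: UpcallCache, acquire() and
do_upcall are hypothetical names and not the kernel code; only the
uc_acquire_replay field name comes from the description above.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class UpcallCache:
    """Toy stand-in for struct upcall_cache (hypothetical model)."""
    name: str
    uc_acquire_replay: bool  # may the acquire upcall be issued again?


def acquire(cache: UpcallCache, do_upcall: Callable[[], bool],
            max_attempts: int = 2) -> Tuple[bool, int]:
    """Run the upcall, retrying only if the cache allows replay.

    Returns (success, number_of_upcalls_issued).
    """
    attempts = 0
    while attempts < max_attempts:
        attempts += 1
        if do_upcall():
            return True, attempts
        if not cache.uc_acquire_replay:
            # GSS case: the same data must not be sent twice.
            break
    return False, attempts


identity = UpcallCache("identity", uc_acquire_replay=True)
gss = UpcallCache("gss", uc_acquire_replay=False)
```

In this model a failing identity upcall is issued again, while a
failing GSS upcall is issued exactly once.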

Also add more debug messages to gss code for both client and server
sides, and both kernel and userspace.

Lustre-change: https://review.whamcloud.com/52689
Lustre-commit: d0194a4b5f6efa26d5473c2793b525f5fdb77e67

Test-Parameters: kerberos=true testlist=sanity-krb5
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I56decc83a4f0d21be420e87cb0417826011932af
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53255
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
17 months agoLU-17015 gss: support large kerberos token for rpc sec ctxt
Sebastien Buisson [Thu, 7 Sep 2023 07:33:36 +0000 (09:33 +0200)]
LU-17015 gss: support large kerberos token for rpc sec ctxt

If the current Kerberos setup is using large tokens, as when the PAC
feature is enabled for Kerberos, authentication can fail because the
server side is unable to exchange the token between kernel and userspace.
This limitation is inherent to the sunrpc cache mechanism, which can
only handle tokens up to PAGE_SIZE.

For the RPC sec context phase, use Lustre's upcall cache mechanism
instead of the kernel's deprecated sunrpc cache. Note that this phase
does not involve a proper upcall; only the downcall part is relevant to
populate the context computed in userspace.

Lustre-change: https://review.whamcloud.com/52305
Lustre-commit: 473a41fec6fb600c9b6e26010d88772f5252d1e1

Test-Parameters: kerberos=true testlist=sanity-krb5
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I94e945a99cab60d5b6a4c40076c40fffede217ab
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53254
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
17 months agoLU-17317 gss: do not continue using expired reverse context
Sebastien Buisson [Fri, 8 Dec 2023 08:05:04 +0000 (09:05 +0100)]
LU-17317 gss: do not continue using expired reverse context

In case a server uses an expired gss context to send a callback
request to a client, it might be that the associated context on
the client has already expired, and been purged from the cache.
This results in a GSS_S_NO_CONTEXT reply.
In this specific scenario, the server must mark its reverse context
as dead. This will lead to destruction of the expired context, and
creation of a new context suitable for further callback requests.

Lustre-change: https://review.whamcloud.com/53375
Lustre-commit: TBD (65f91673262098aa6d97448f68a036b0f2cdfd98)

Test-Parameters: kerberos=true testlist=sanity-krb5
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I4af90cd70a3815851ec555ea85b49714c8da4202
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53369
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
17 months agoRM-620 build: New tag 2.14.0-ddn125
Andreas Dilger [Wed, 20 Dec 2023 08:55:47 +0000 (01:55 -0700)]
RM-620 build: New tag 2.14.0-ddn125

New tag 2.14.0-ddn125

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ic2e4b53c8540ffd039359565c06294645a62d328

17 months agoLU-13791 mdt: parameter to tune capabilities
Andreas Dilger [Tue, 19 Dec 2023 02:07:58 +0000 (19:07 -0700)]
LU-13791 mdt: parameter to tune capabilities

Add mdt.*.enable_cap_mask to allow specific capabilities to
be enabled and disabled individually.

Fixes: f05edf8e2b ("LU-13791 sec: enable FS capabilities")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I6fc0130a90693d673d8c2158e7e31c2de951553d
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53500
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
18 months agoRM-620 build: New tag 2.14.0-ddn124
Andreas Dilger [Tue, 19 Dec 2023 06:10:54 +0000 (23:10 -0700)]
RM-620 build: New tag 2.14.0-ddn124

New tag 2.14.0-ddn124

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ib274a425044bfbb22bc40bd51ccfda06ad6ba8b0

18 months agoLU-930 docs: fix whatis output
Timothy Day [Sun, 12 Mar 2023 15:19:54 +0000 (15:19 +0000)]
LU-930 docs: fix whatis output

The ".SH NAME" section has to be formatted in a certain
way for whatis and apropos to work correctly. Otherwise,
users will just see "(unknown subject)".

This patch fixes issues for all man pages.

Add a couple of one-line man page redirects.

Lustre-change: https://review.whamcloud.com/50264
Lustre-commit: 17bbf5bdd6f96f61dc0e39924dce540e91e1422c

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Ie11eb921c84ff9ad19b50973c616f6fb6df1f461
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53474
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months agoLU-12837 doc: add lfs-changelog* manpages
Etienne AUJAMES [Tue, 22 Nov 2022 12:39:25 +0000 (13:39 +0100)]
LU-12837 doc: add lfs-changelog* manpages

This patch moves the documentation for "lfs changelog" and "lfs
changelog_clear" utilities from "lfs.1" to the following manpages:
- lfs-changelog.1
- lfs-changelog_clear.1

Lustre-change: https://review.whamcloud.com/49209
Lustre-commit: 82e7ad348c77e5c164aa3e3155c9eb91872369d5

Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Signed-off-by: Xing Huang <hxing@ddn.com>
Test-Parameters: trivial
Change-Id: I6db2e687e506a6116fe4755358a9abbd5509c3bb
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53471
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months agoLU-14651 build: fix build for el7.9 kernels
Andrew Perepechko [Mon, 18 Dec 2023 18:19:26 +0000 (11:19 -0700)]
LU-14651 build: fix build for el7.9 kernels

Handle extra setattr_prepare() argument added in Linux 5.12 kernels
when building on older kernels.

Lustre-change: https://review.whamcloud.com/53503
Lustre-commit: TBD (from cc03199c61df217f7da249d9f9f3419e0333c671)

HPE-bug-id: LUS-12059
Signed-off-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Change-Id: Ie7fd1c4d51b7a9b086cfca0db941321cbcce7057
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53494
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months agoRM-620 build: New tag 2.14.0-ddn123
Andreas Dilger [Fri, 15 Dec 2023 03:52:43 +0000 (20:52 -0700)]
RM-620 build: New tag 2.14.0-ddn123

New tag 2.14.0-ddn123

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I33af32d0a44376aee90286496939c4bcb114abd8

18 months agoLU-17366 kernel: update SLES15 SP5 [5.14.21-150500.55.39.1]
Jian Yu [Thu, 14 Dec 2023 19:38:42 +0000 (11:38 -0800)]
LU-17366 kernel: update SLES15 SP5 [5.14.21-150500.55.39.1]

Update SLES15 SP5 kernel to 5.14.21-150500.55.39.1 for Lustre client.

Lustre-change: https://review.whamcloud.com/53467
Lustre-commit: TBD (from 7084f80ec256f6a7335fe4d5981db1e8bcbed440)

Test-Parameters: trivial mdtcount=4 mdscount=2 \
clientdistro=sles15sp5 testlist=sanity

Change-Id: Id9476e8726728b00d4079cdaf31b081f89190eb1
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53468
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoEX-8779 build: kernel-abi-whitelists is not required
Jian Yu [Thu, 14 Dec 2023 17:07:48 +0000 (09:07 -0800)]
EX-8779 build: kernel-abi-whitelists is not required

This patch fixes a build dependency issue with
kernel-abi-whitelists, which is not required.

Test-Parameters: trivial

Change-Id: I3f8ad51a0ccab5c994d472d62934670b497c1454
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53448
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoRM-620 build: New tag 2.14.0-ddn122
Andreas Dilger [Thu, 14 Dec 2023 14:04:42 +0000 (07:04 -0700)]
RM-620 build: New tag 2.14.0-ddn122

New tag 2.14.0-ddn122

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I95bfa9740603cb10e34dfa347c9112d67c29764a

18 months agoLU-16456 tests: skip conf-sanity test_129 in interop
Andreas Dilger [Thu, 14 Dec 2023 03:14:00 +0000 (20:14 -0700)]
LU-16456 tests: skip conf-sanity test_129 in interop

test_129 was added in commit v2_14_56-40-gcefabee52.
It should be skipped for older MDS versions.

Lustre-change: https://review.whamcloud.com/49601
Lustre-commit: 7e566c6a1f9d5324718ebc7149153f3272363b9c

Test-Parameters: trivial testlist=conf-sanity env=ONLY=129 serverversion=EXA6.2.0
Fixes: cefabee52 ("LU-15112 mgc: do not ignore target registration failure")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: If1e276c816ecf2f30dc970f9b5afe85d722540e5
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Sarah Liu <sarah@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53452
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
18 months agoEX-7601 tests: unaligned read tests
Patrick Farrell [Tue, 12 Dec 2023 15:00:41 +0000 (10:00 -0500)]
EX-7601 tests: unaligned read tests

This adds testing for handling unaligned reads and partial
chunk reads from compressed files.

Testing for writes and multi-client and racing tests will
be added later, but we put the checking function in
test-framework now so it's easy to use later.

Test-Parameters: trivial
Test-Parameters: testlist=sanity-compr env=ONLY="1001 1002",ONLY_REPEAT=10
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Signed-off-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Change-Id: I06217c8aeba75016aa4168f329026842dff1d979
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/51841
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoLU-17265 tests: allow margin for sanity/39r
Arshad Hussain [Wed, 8 Nov 2023 06:38:07 +0000 (12:08 +0530)]
LU-17265 tests: allow margin for sanity/39r

The timestamp may be a little outdated due to a gap between
writing a file and checking the timestamp, so take that into
consideration and allow 2 seconds of leniency when comparing
timestamps.

The on-disk inode may also not be flushed from the journal
immediately, so allow some time for it to be updated.

This patch also converts the hex value read via debugfs
to decimal.
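
The comparison can be sketched in Python as follows. This is an
illustration only, not the shell code in sanity/39r: hex_to_seconds()
and timestamps_match() are hypothetical helpers, and the hex input
format is an assumption about what debugfs prints.

```python
def hex_to_seconds(raw: str) -> int:
    """Convert a hex string such as '0x654b2c3f' to decimal seconds."""
    return int(raw.strip(), 16)


def timestamps_match(expected: int, actual: int, leniency: int = 2) -> bool:
    """Accept timestamps that differ by at most `leniency` seconds."""
    return abs(expected - actual) <= leniency
```

A caller would convert the on-disk value first, then compare it against
the timestamp seen by the client with the 2 second margin.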

Lustre-change: https://review.whamcloud.com/53035
Lustre-commit: c5aa16db172afc9cbf0d4fd2c85261fef1a40d7b

Test-Parameters: trivial testlist=sanity env=ONLY=39r,ONLY_REPEAT=100
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I9e765f9cd572fb25821f9a0401c34209b7c3f574
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53453
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
18 months agoLU-17360 kernel: update RHEL 9.3 [5.14.0-362.13.1.el9_3]
Jian Yu [Wed, 13 Dec 2023 08:32:22 +0000 (00:32 -0800)]
LU-17360 kernel: update RHEL 9.3 [5.14.0-362.13.1.el9_3]

Update RHEL 9.3 kernel to 5.14.0-362.13.1.el9_3 for Lustre client.

Lustre-change: https://review.whamcloud.com/53433
Lustre-commit: TBD (from 3662949bcd342a96f8dddcb6663872e870f9871b)

Test-Parameters: trivial env=SANITY_EXCEPT="906" \
  mdtcount=4 mdscount=2 clientdistro=el9.3 testlist=sanity
Test-Parameters: optional clientdistro=el9.3 testgroup=full-part-1
Test-Parameters: optional clientdistro=el9.3 testgroup=full-part-2
Test-Parameters: optional clientdistro=el9.3 testgroup=full-part-3

Change-Id: I35863d298a612d7913d39f9031e792808f204ad4
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53435
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoLU-16397 test: check quota setting on QSD
Hongchao Zhang [Tue, 12 Dec 2023 10:37:22 +0000 (18:37 +0800)]
LU-16397 test: check quota setting on QSD

In some cases, the quota setting at QMT could not be transferred to
QSD in time, which could cause the test to fail.
This patch adds a check on QSD after setting the quota limit by LFS.

Lustre-change: https://review.whamcloud.com/49533/
Lustre-commit: TBD (from 76a7ad75740639b9255c51277ff65ce261379af6)

Test-Parameters: trivial testlist=sanity-quota
Change-Id: Ia999317a36a0f97c1f66726cdc10e9edac3d8a53
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53402
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoLU-17057 tests: Fix sanity-sec/0
Arshad Hussain [Tue, 21 Nov 2023 15:10:51 +0000 (20:40 +0530)]
LU-17057 tests: Fix sanity-sec/0

A command executed through 'runas' breaks out of the running
test script on failure, even though this failure is
expected: the 'set -e' setting forces the script to
exit immediately. This patch fixes this by checking
the return value and then taking the appropriate
action.

This patch also fixes the 'touch' command on file f4 by
correctly calling it with the uid and gid that were set
a few lines above.

Lustre-change: https://review.whamcloud.com/53194
Lustre-commit: 0b5e252d973e00200660a81f1cdb440f8f4f1886

Test-Parameters: trivial testlist=sanity-sec env=ONLY=0,ONLY_REPEAT=100
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I06e6d22840e31add8c24cf90c31b98464d580ae7
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53439
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months agoLU-17203 libcfs: ignore remaining items
Alex Zhuravlev [Tue, 17 Oct 2023 11:48:58 +0000 (14:48 +0300)]
LU-17203 libcfs: ignore remaining items

Remove the assertion checking the libcfs hashtable for emptiness
in cfs_hash_for_each_empty(). The only user of this hashtable
is the per-export ldlm lock set. In this case it's legal that
some locks can't be removed from the hashtable while they are
in the process of enqueuing. The hashtable is destroyed from the
export destroy function, which in turn is called only when all
RPCs on this export are done (exp_rpc_count==0).

Lustre-change: https://review.whamcloud.com/52726
Lustre-commit: f2f8b6deaf54f1a264b31b44f6cf875fa1629ab2

Fixes: 306a9b666e ("LU-16272 libcfs: cfs_hash_for_each_empty optimization")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I2b853b017bb7247a0c60cc8f464c2e08d649f0eb
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53404
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months agoLU-16272 libcfs: cfs_hash_for_each_empty optimization
Alexander Zarochentsev [Thu, 20 Oct 2022 19:23:39 +0000 (22:23 +0300)]
LU-16272 libcfs: cfs_hash_for_each_empty optimization

Restarts from bucket 0 in cfs_hash_for_each_empty()
cause excessive CPU consumption by repeatedly checking the
leading empty buckets.
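
The cost of restarting from bucket 0 can be seen in a toy Python model.
This is a sketch only and not the libcfs code: drain() and its resume
flag are hypothetical, and the model assumes one item is removed per
pass before the scan restarts.

```python
from typing import List


def drain(buckets: List[List[int]], resume: bool) -> int:
    """Empty all buckets, removing one item per pass.

    After each removal the scan restarts, either from bucket 0
    (resume=False) or from the bucket just processed (resume=True).
    Returns the number of bucket inspections performed.
    """
    inspections = 0
    start = 0
    while True:
        removed = False
        for idx in range(start, len(buckets)):
            inspections += 1
            if buckets[idx]:
                buckets[idx].pop()
                removed = True
                start = idx if resume else 0
                break
        if not removed:
            if start == 0:
                return inspections
            start = 0  # one final full pass to confirm emptiness
```

With many empty buckets ahead of the populated ones, resuming from the
current bucket avoids re-inspecting the empty prefix on every pass.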

Lustre-change: https://review.whamcloud.com/48972
Lustre-commit: 306a9b666e5ea2882f704d93483355e7e147544f

HPE-bug-id: LUS-11311
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: Ic03875ea25101052468213043128912ac46daf32
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53379
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoLU-17015 sec: fix PTLRPC_CTX_STATUS_MASK
Sebastien Buisson [Wed, 11 Oct 2023 13:29:46 +0000 (15:29 +0200)]
LU-17015 sec: fix PTLRPC_CTX_STATUS_MASK

PTLRPC_CTX_STATUS_MASK should not include PTLRPC_CTX_NEW_BIT, which is
a bit index and not a value. Also, according to code in
sptlrpc_req_refresh_ctx():
if (unlikely(test_bit(PTLRPC_CTX_NEW_BIT, &ctx->cc_flags))) {
   if (ctx->cc_ops->refresh)
      ctx->cc_ops->refresh(ctx);
}
a context needs to be refreshed if it has the PTLRPC_CTX_NEW_BIT bit.
So the function that checks whether a context is refreshed,
cli_ctx_is_refreshed(), should not return true if the
PTLRPC_CTX_NEW_BIT bit is set.

In the end, do not replace PTLRPC_CTX_NEW_BIT with anything else in
PTLRPC_CTX_STATUS_MASK. Having PTLRPC_CTX_NEW_BIT was a no-op (bitwise
OR with 0), but this was working as expected. Just clean up the code to
avoid headaches.
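
The difference between a bit index and a bit value can be checked with
a short Python model. The numeric indices below are assumptions for
illustration, except that the text above implies PTLRPC_CTX_NEW_BIT is
0; is_refreshed() is a hypothetical stand-in for the kernel function.

```python
# Bit indices (assumed values, apart from NEW_BIT == 0).
PTLRPC_CTX_NEW_BIT = 0
PTLRPC_CTX_UPTODATE_BIT = 1
PTLRPC_CTX_DEAD_BIT = 2
PTLRPC_CTX_ERROR_BIT = 3

# Bit values derived from the indices.
PTLRPC_CTX_NEW = 1 << PTLRPC_CTX_NEW_BIT
PTLRPC_CTX_UPTODATE = 1 << PTLRPC_CTX_UPTODATE_BIT
PTLRPC_CTX_DEAD = 1 << PTLRPC_CTX_DEAD_BIT
PTLRPC_CTX_ERROR = 1 << PTLRPC_CTX_ERROR_BIT

# Old mask: ORs in the *index* 0, which changes nothing.
OLD_MASK = (PTLRPC_CTX_NEW_BIT | PTLRPC_CTX_UPTODATE |
            PTLRPC_CTX_DEAD | PTLRPC_CTX_ERROR)
# Cleaned-up mask: the no-op term is simply dropped.
NEW_MASK = PTLRPC_CTX_UPTODATE | PTLRPC_CTX_DEAD | PTLRPC_CTX_ERROR


def is_refreshed(flags: int) -> bool:
    """A context counts as refreshed once any status bit is set."""
    return (flags & NEW_MASK) != 0
```

Both masks have the same value, and a context carrying only the NEW
flag is not reported as refreshed, which matches the behavior the
message describes as already working.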

Lustre-change: https://review.whamcloud.com/52629
Lustre-commit: c744221a1fd55df33ca2b0e3e1b1ffd7ef3a986d

Test-Parameters: trivial
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ibc2ca9dfaa176b098080f7f2867338b62953b50e
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53441
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months agoLU-14104 tests: sanity/123* shouldn't fail performance checks
Alex Zhuravlev [Mon, 2 Nov 2020 07:13:44 +0000 (10:13 +0300)]
LU-14104 tests: sanity/123* shouldn't fail performance checks

Performance checks should not fail when running in VMs, as CPU
resources usually aren't strictly guaranteed there.

Lustre-change: https://review.whamcloud.com/40512
Lustre-commit: b1915f13e3b69c72e3e4c1f2a32d022b6a20d347

Test-Parameters: trivial
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ieec4a89b921f7ccc198eb10513d4980ad3a20b51
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53456
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months agoLU-8151 obd: Show correct shadow mountpoints for server
Arshad Hussain [Thu, 14 Dec 2023 08:04:25 +0000 (00:04 -0800)]
LU-8151 obd: Show correct shadow mountpoints for server

server_fill_super_common() preps the server for mounting
and forces the "read only" (SB_RDONLY) flag to restrict IO on
the server. As a result, the mount command always shows the
FS as "ro" even when it is "rw".

This patch double checks the obd statfs (FS) state for the
"read only" flag (OS_STATFS_READONLY) and, if the FS is not
really "read only", clears the SB_RDONLY flag.

The client output remains unchanged.

Output before patch:
/dev/.../mds1_flakey on /mnt/lustre-mds1 type lustre (ro,svname=...)
/dev/.../ost1_flakey on /mnt/lustre-ost1 type lustre (ro,svname=...)

Output after patch:
/dev/.../mds1_flakey on /mnt/lustre-mds1 type lustre (rw,svname=...)
/dev/.../ost1_flakey on /mnt/lustre-ost1 type lustre (rw,svname=...)

Test case conf-sanity/113 added.

Lustre-change: https://review.whamcloud.com/47131
Lustre-commit: 0171801df517988b0eb1023378c2c8c07a0a36f1

Test-Parameters: trivial testlist=conf-sanity
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ie92a686ae97dd62885f415b453bad6bdc0ed3d28
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53445
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
18 months agoLU-17347 debs: also move .ddeb files into debs/
Aurelien Degremont [Fri, 8 Dec 2023 12:34:09 +0000 (13:34 +0100)]
LU-17347 debs: also move .ddeb files into debs/

When building debian packages, the resulting packages are
moved into a 'debs/' subdir.

Don't miss the debug symbol packages 'dbgsym', which are
suffixed .ddeb.

Also add .buildinfo file.

Lustre-change: https://review.whamcloud.com/53378/
Lustre-commit: TBD

Test-Parameters: trivial
Change-Id: I52d0bddfaafc67c4a2a2dbc786d7f320c0b979f8
Signed-off-by: Aurelien Degremont <adegremont@nvidia.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53421
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoLU-17336 gss: fix __user pointer in rsi_upcall_seq_write
Sebastien Buisson [Wed, 6 Dec 2023 08:15:18 +0000 (09:15 +0100)]
LU-17336 gss: fix __user pointer in rsi_upcall_seq_write

rsi_upcall_seq_write() uses sscanf to get the string passed from
userspace, but this needs to be copied to a kernel buffer first.

Lustre-change: https://review.whamcloud.com/53342
Lustre-commit: TBD (from 523ffed1cb43eec5fac38c144967026308da9cad)

Test-Parameters: trivial
Test-Parameters: kerberos=true testlist=sanity-krb5
Test-Parameters: testgroup=review-dne-selinux-ssk-part-2
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I2ec875b7c6c158695857fe912ec1dd9f41ddc25d
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53434
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoRM-620 build: New tag 2.14.0-ddn121
Andreas Dilger [Tue, 12 Dec 2023 05:53:22 +0000 (22:53 -0700)]
RM-620 build: New tag 2.14.0-ddn121

New tag 2.14.0-ddn121

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I9925c0d0e75c01a5e0a97bf34f8386efa2da8fbe

18 months agoRM-620 build: New tag lipe-2.38
Andreas Dilger [Tue, 12 Dec 2023 05:53:04 +0000 (22:53 -0700)]
RM-620 build: New tag lipe-2.38

New tag lipe-2.38

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I62e6034c466f566015d82da9b9a3a4e4c50cc4fb

18 months agoEX-8590 lipe: Use SSH poll to read stdout/err unblocking
Alexandre Ioffe [Sat, 25 Nov 2023 08:36:42 +0000 (00:36 -0800)]
EX-8590 lipe: Use SSH poll to read stdout/err unblocking

Limit hot-pools tests 75* to use only one client machine.
Fix the skip condition for tests 75a,b,c when bandwidth limit
options are not available.
Use ssh poll and non-blocking reads to read stdout/err in a loop
to prevent losing the output when it is not ready.
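
The poll-and-read loop can be sketched in Python as below. This is not
the lipe code: it assumes a Unix host, uses plain file descriptors as a
stand-in for the SSH channel, and read_all() is a hypothetical helper.
The loop only ends once every stream reaches EOF.

```python
import os
import selectors
from typing import Dict, List


def read_all(fds: Dict[str, int], timeout: float = 1.0) -> Dict[str, bytes]:
    """Collect output from several descriptors until each hits EOF."""
    out: Dict[str, List[bytes]] = {name: [] for name in fds}
    sel = selectors.DefaultSelector()
    for name, fd in fds.items():
        os.set_blocking(fd, False)
        sel.register(fd, selectors.EVENT_READ, name)
    while sel.get_map():
        for key, _ in sel.select(timeout):
            try:
                chunk = os.read(key.fd, 4096)
            except BlockingIOError:
                continue  # not ready yet; poll again
            if chunk:
                out[key.data].append(chunk)
            else:
                sel.unregister(key.fd)  # EOF on this stream
    sel.close()
    return {name: b"".join(parts) for name, parts in out.items()}
```

Polling both streams in one loop means output that arrives late on
either of them is still collected rather than dropped.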

Test-Parameters: trivial testlist=hot-pools
Test-Parameters: testlist=hot-pools env=ONLY=75a,ONLY_REPEAT=82
Test-Parameters: testlist=hot-pools env=ONLY=75b,ONLY_REPEAT=82
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Change-Id: Ibe07cdd51197c1f3c048b7fcdab6caff850067e7
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53288
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoLU-17338 kernel: update RHEL 8.9 [4.18.0-513.9.1.el8_9]
Jian Yu [Thu, 7 Dec 2023 07:45:47 +0000 (23:45 -0800)]
LU-17338 kernel: update RHEL 8.9 [4.18.0-513.9.1.el8_9]

Update RHEL 8.9 kernel to 4.18.0-513.9.1.el8_9 for Lustre client.

Lustre-change: https://review.whamcloud.com/53357
Lustre-commit: TBD (from 5574088906d813c8a17237edc85e55c5d54f10f5)

Test-Parameters: trivial mdtcount=4 mdscount=2 \
clientdistro=el8.9 serverdistro=el8.8 testlist=sanity

Test-Parameters: optional clientdistro=el8.9 serverdistro=el8.8 \
testgroup=full-part-1

Test-Parameters: optional clientdistro=el8.9 serverdistro=el8.8 \
testgroup=full-part-2

Test-Parameters: optional clientdistro=el8.9 serverdistro=el8.8 \
testgroup=full-part-3

Change-Id: Ied0d2873974a3c8cc6e346373457c8ebc09740d6
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53360
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoLU-17307 mdt: get dirent count by request
Lai Siyao [Sat, 4 Nov 2023 13:32:59 +0000 (09:32 -0400)]
LU-17307 mdt: get dirent count by request

Add MA_DIRENT_CNT/LA_DIRENT_CNT to notify osd to get dirent count.
Set it in mdt_getattr_name_lock() and when auto-split is enabled so it
won't cause overhead when auto-split is disabled, and change
oo_dirent_count type to atomic_t so the result does not become
inaccurate over time from repeated addition/removal (which may
be used to know whether directory is empty or compare directories in
the future).

In osd_dirent_count(), set oo_dirent_count to 0 before iteration to
keep multiple threads from iterating at the same time. This means the
result may not be accurate in this case, but it will be eventually.

Lustre-change: https://review.whamcloud.com/53229
Lustre-commit: TBD (from 50080036674faecfe8a94ebcbb0bdbdbeddac53d)

Fixes: 03a4431dac ("LU-11025 osd: osd_attr_get() returns dirent count")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I2be6c0dcfda1c98995a269585c5d8d781a8a3b42
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53275
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoEX-8236 pcc: abort data copy when clear PCC backend
Qian Yingjin [Fri, 8 Dec 2023 09:15:17 +0000 (04:15 -0500)]
EX-8236 pcc: abort data copy when clear PCC backend

This patch adds an "--abort" option to the "lctl pcc del|clear"
commands.
With this option, removing a PCC backend from a client first sets
the ATTACH_ABORTING flag on all in-progress attaching files, and
then waits for them to abort the attach.

Add sanity-pcc/test_108 to verify it.

Change-Id: I4e2f3ec8866e9af45f4524a9f45ee418ef4cb5be
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53373
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoEX-8778 tests: clear "trace" in quota_fini
Sergey Cheremencev [Sun, 3 Dec 2023 04:11:29 +0000 (07:11 +0300)]
EX-8778 tests: clear "trace" in quota_fini

Clear trace debug level in quota_fini.

Fixes: ba4d37b9fc ("LU-13055 libcfs: allow comma-separated masks")
Test-Parameters: trivial testlist=sanity-quota
Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Change-Id: I480b9975bbf99403cedbfd18154f365ebf181c09
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53385
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoLU-17212 gss: survive improper obd or imp at ctx init
Sebastien Buisson [Thu, 19 Oct 2023 09:11:48 +0000 (11:11 +0200)]
LU-17212 gss: survive improper obd or imp at ctx init

GSS context init requests can happen even after a client has been
unmounted, because they are coming from userspace (request-key,
lgss_keyring).
In this case they must be ignored, and the code must be robust enough to
survive an improper, already or partially shut down obd device or import.

Lustre-change: https://review.whamcloud.com/52755
Lustre-commit: 3fcddf6dcdd92df6557c59913a61944f21d58615

Test-Parameters: kerberos=true testlist=sanity-krb5
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I541727165eadf1fcb7715e416da85d100976cf2f
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53291
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months agoLU-17306 ofd: return error for reconnection
Alexander Boyko [Thu, 16 Nov 2023 22:57:24 +0000 (17:57 -0500)]
LU-17306 ofd: return error for reconnection

During the cleanup orphan phase, reconnection leads to an unsynchronized
last id between MDT and OST. This means that the MDT could assign
nonexistent objects to a client for a file create operation.

ofd_create_hdl()) capstor-OST0087: dropping old orphan cleanup request
MDS LAST_ID [0x2540000400:0xb6941:0x0] (747841) is 352 behind OST
    LAST_ID [0x2540000400:0xb6aa1:0x0] (748193), trust the OST

recovery-small 144c reproduces the bug where the MDT loses
synchronization with the OST.
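
The numbers in the log message above can be rechecked with a few lines
of Python. fid_oid() is a hypothetical helper, and it assumes the
bracketed value has the form [sequence:object-id:version] in hex.

```python
def fid_oid(fid: str) -> int:
    """Return the object ID (second field) of a '[seq:oid:ver]' string."""
    _seq, oid, _ver = fid.strip("[]").split(":")
    return int(oid, 16)


mds_last_id = fid_oid("[0x2540000400:0xb6941:0x0]")
ost_last_id = fid_oid("[0x2540000400:0xb6aa1:0x0]")
gap = ost_last_id - mds_last_id
```

The decoded values are 747841 and 748193, so the MDS is 352 objects
behind the OST, as the log line reports.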

Lustre-change: https://review.whamcloud.com/53195
Lustre-commit: TBD (from 1f0deff150a3087a974adbac687a5019f6c0e39d)

Fixes: 63e17799a3 ("LU-8367 osp: enable replay for precreation request")
HPE-bug-id: LUS-11969
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: I22c3d3b3db2acc9ad8f1b978b234afe7d3eef51d
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53341
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoEX-7601 tgt: reorder tgt_brw_write decls
Patrick Farrell [Thu, 2 Nov 2023 21:23:01 +0000 (17:23 -0400)]
EX-7601 tgt: reorder tgt_brw_write decls

Reorder the declarations in tgt_brw_write.

This patch also serves as the series head for implementing
read-modify-write support for compressed chunks.

The process for read-modify-write is similar to that used
for unaligned reads.

At a high level, read-modify-write means we must read up,
decompress, then recompress and write back the data.  This
only applies when we're actually doing read-modify-write.

To know when to do this, we rely partly on the client.  If
the client is able to compress a chunk, either because it is
a complete chunk, or because the start is chunk aligned and
the write is past EOF, we know there is no read-modify-write
required.  Either there is no existing data (write past EOF)
or the data will be fully replaced.

So, when we see a write which is not fully chunk aligned and
not already compressed, we will do a read-modify-write.

For this, we round the IO lnbs and associated locking to
cover complete chunks, then we do a read of the unaligned
chunks.

For example, if we have a write which goes from 63 KiB to 257 KiB
with a chunk size of 64 KiB, we will read 0-64 KiB and
256-320 KiB, and decompress those chunks into the buffer.
64 KiB to 256 KiB is *NOT* read, because those are complete
chunks.
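
The rounding and partial-chunk selection described above can be
sketched as follows.  This is a minimal illustration, not the code in
this series: struct rmw_plan and rmw_plan_write() are hypothetical
names, offsets are in bytes, and the end offset is exclusive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical plan for one unaligned write; byte offsets, end exclusive. */
struct rmw_plan {
	uint64_t io_start;   /* write start rounded down to a chunk boundary */
	uint64_t io_end;     /* write end rounded up to a chunk boundary */
	bool     read_head;  /* leading chunk is partial: read it first */
	bool     read_tail;  /* trailing chunk is partial: read it first */
	uint64_t head_start; /* start of the leading chunk */
	uint64_t tail_start; /* start of the trailing chunk */
};

static inline struct rmw_plan rmw_plan_write(uint64_t start, uint64_t end,
					     uint64_t chunk_size)
{
	struct rmw_plan p;

	assert(chunk_size > 0 && start < end);

	p.io_start = start - start % chunk_size;
	p.io_end = (end + chunk_size - 1) / chunk_size * chunk_size;
	p.head_start = p.io_start;
	p.tail_start = p.io_end - chunk_size;
	p.read_head = start % chunk_size != 0;
	p.read_tail = end % chunk_size != 0;

	/* Both ends inside the same chunk: one read covers it. */
	if (p.read_head && p.read_tail && p.head_start == p.tail_start)
		p.read_tail = false;

	return p;
}
```

For the 63 KiB to 257 KiB write with 64 KiB chunks, this gives a
rounded IO of 0-320 KiB with reads of the chunks starting at 0 and
256 KiB; the complete chunks in between are not read.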

We then set up a transfer mapping - identical to the process
for unaligned reads - so the client data is written into
the correct lnbs.

Now we have a set of chunk aligned lnbs which contain data
updated with the client write.  In the initial version, we
write these to disk uncompressed.  This is sufficient for
correct operation, but it does mean read-modify-write will
leave those chunks uncompressed on disk.

There is code for recompression, but it is not working 100%
yet, and there are some complexities around managing holes
and EOF which still need to be resolved.

TBD if this will make our initial release - I am hopeful but
not sure yet.

Test-Parameters: trivial
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ia24583d4221f498928e99afa8c289b70e4d25f5b
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52959
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
18 months agoEX-7601 ofd: improve decompress_rnb debug
Patrick Farrell [Mon, 11 Dec 2023 15:49:19 +0000 (10:49 -0500)]
EX-7601 ofd: improve decompress_rnb debug

Since we're very close on landing the unaligned read
patches, this minor debug improvement is being placed later
in the series.

Test-Parameters: trivial
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I36ad243bd1f7025e358f9593f1008f0b851cc1bb
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53411
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months agoEX-7601 ofd: add decompress_rnb implementation
Patrick Farrell [Tue, 31 Oct 2023 20:11:41 +0000 (16:11 -0400)]
EX-7601 ofd: add decompress_rnb implementation

This implements decompress_rnb, which is the core code for
handling unaligned reads from the client.

Decompress rnb takes an unaligned remote niobuf and
identifies the unaligned portion(s) of the IO, then finds
the corresponding local niobufs (pages read from disk),
and passes them on for decompression in place.

decompress_chunk_in_lnb decompresses the data in a set of
lnbs and copies it back to the same location, replacing the
raw data from disk with decompressed data.  (If the chunk
was not compressed, it does nothing.)

With this patch, the implementation of unaligned reads is
complete and we can add the compression sanity tests back
safely.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ifd1d9b03d5d004bec3f5e456da359b8d10e005f9
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52916
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoEX-7601 ofd: lock pages from read to decompression
Patrick Farrell [Tue, 7 Nov 2023 21:35:51 +0000 (16:35 -0500)]
EX-7601 ofd: lock pages from read to decompression

When using the page cache on the server, for pages which
will be decompressed, we can't unlock them until they've
been decompressed.

Rather than only waiting to unlock the pages which will be
decompressed, we keep all of the read pages locked.  This
simplifies the code, at the cost of delaying other reads to
the aligned portion of an unaligned read, which shouldn't be
important in practice.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ie98920327979a5c9600e8c9e8627816461ea1a34
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53026
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoEX-7601 ofd: add ofd_decompress_read implementation
Patrick Farrell [Fri, 27 Oct 2023 20:50:30 +0000 (16:50 -0400)]
EX-7601 ofd: add ofd_decompress_read implementation

ofd_decompress_read is responsible for walking the
remote niobufs (rnbs) in the RPC and identifying if they
are chunk unaligned.  It then passes them on to the rnb
decompression code (not implemented yet, see next patch).

It also allocates the bounce buffers for decompression so
they can be reused for each remote niobuf.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I1f2f86ce3fc036ac5d79b060a5e44f6564e123aa
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52868
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoEX-7601 ofd: do not use page cache for compressed files
Patrick Farrell [Wed, 6 Dec 2023 15:55:24 +0000 (10:55 -0500)]
EX-7601 ofd: do not use page cache for compressed files

It is challenging for the server to safely use the page
cache with compressed files, because if data is
decompressed into the page cache, the data in cache now
differs from the data on disk.

This is a problem if *part* of the page cache is ever
evicted, because we can end up with a situation where a read
will be partially satisfied from cache and partially from
disk, but the data on disk is compressed and the data in
cache is not.

It is possible to deal with this by carefully ensuring the
page cache is not used just for decompressed data, but this
makes getting the buffers/lnbs for compressed files fairly
complicated.  Instead, we can just entirely block using the
server page cache for compressed files.

This must be done for both read and write, and only works
for ldiskfs - ZFS cannot easily be forced to not use its
page cache.  But that's OK because we do not support CSDC
with ZFS.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Iee73abb29ad5631bb2203c2133756d7ebf5b686d
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53348
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoEX-7601 ofd: add chunk_size to preprw_write
Patrick Farrell [Thu, 2 Nov 2023 21:37:43 +0000 (17:37 -0400)]
EX-7601 ofd: add chunk_size to preprw_write

preprw_write needs chunk size for rounding.  Add this in a
separate patch to keep things trivial; it will be used in
a subsequent patch.

This patch is really trivial on the write side, since the
read side already did most of this.  But it's being kept
separate for symmetry.

Test-Parameters: trivial
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Id25957dbc185b6e61b7f208cee8cf5f897f03944
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52962
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months agoEX-7601 ofd: add chunk rounding to read
Patrick Farrell [Fri, 27 Oct 2023 20:24:33 +0000 (16:24 -0400)]
EX-7601 ofd: add chunk rounding to read

We need to round all niobufs to chunk size in the read
process, so we read in the full chunk.

dt_bufs_get sets up the local niobuf for the read, so we
round before calling it.
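
As a rough sketch of the rounding step, assuming the read is described
by a byte offset and length (chunk_round_range() is a hypothetical
helper, not a function from this patch):

```c
#include <assert.h>
#include <stdint.h>

/* Round [*offset, *offset + *len) outward to chunk boundaries, in place. */
static inline void chunk_round_range(uint64_t *offset, uint64_t *len,
				     uint64_t chunk_size)
{
	uint64_t start, end;

	assert(chunk_size > 0 && *len > 0);

	start = *offset - *offset % chunk_size;
	end = *offset + *len;
	end = (end + chunk_size - 1) / chunk_size * chunk_size;

	*offset = start;
	*len = end - start;
}
```

A 10 KiB read at offset 100 KiB with 64 KiB chunks becomes a 64 KiB
read at offset 64 KiB; an already aligned range is left unchanged.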

This patch is a partial implementation of unaligned read
support, and breaks compression testing until the next few
patches are landed.  So this patch temporarily adds the
compression tests to ALWAYS_EXCEPT.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I587a519db4dae983db5db1d690e63e15bc010b7e
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52867
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoEX-7601 tgt: add io_lnb_to_tx_lnb
Patrick Farrell [Mon, 30 Oct 2023 03:39:53 +0000 (23:39 -0400)]
EX-7601 tgt: add io_lnb_to_tx_lnb

With compression, the lnbs used for the disk IO on the
server can contain more data than the client requested,
due to reading up whole chunks for decompression.

This means we need to transfer only a subset of the lnbs.
We do this by creating a second set of lnbs, and pointing
them at the pages in the local io lnb which need to be
transferred to the client.
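
One way to picture the second set of lnbs is as an array of pointers
into the IO set, selected by byte range.  The sketch below is only an
illustration under assumptions: struct sketch_lnb, io_to_tx_sketch()
and the 4 KiB page size are hypothetical and much simpler than the
real niobuf_local handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL /* assumed page size */

/* Hypothetical, simplified stand-in for a local niobuf (one page). */
struct sketch_lnb {
	uint64_t file_offset; /* page-aligned offset of this page */
	void    *page;        /* page data, unused in this sketch */
};

/*
 * Point tx[] at the io[] entries overlapping [tx_start, tx_end).
 * Returns the number of entries selected for transfer.
 */
static inline size_t io_to_tx_sketch(struct sketch_lnb *io, size_t io_count,
				     uint64_t tx_start, uint64_t tx_end,
				     struct sketch_lnb **tx, size_t tx_max)
{
	size_t n = 0;

	for (size_t i = 0; i < io_count; i++) {
		uint64_t pg_start = io[i].file_offset;
		uint64_t pg_end = pg_start + SKETCH_PAGE_SIZE;

		/* Skip pages read only to complete a chunk. */
		if (pg_end <= tx_start || pg_start >= tx_end)
			continue;

		assert(n < tx_max);
		tx[n++] = &io[i];
	}
	return n;
}
```

No page data is copied; the transfer set only references pages that
already exist in the IO set.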

This code doesn't do anything for now, but it will kick in
with the next patch when we start rounding chunks for read.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I0fe690718a3484578b139eaaec52c0c3b265da6a
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52884
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoEX-7601 tgt: add second tbc lnb for RDMA
Patrick Farrell [Sun, 29 Oct 2023 02:16:01 +0000 (22:16 -0400)]
EX-7601 tgt: add second tbc lnb for RDMA

Compression requires the server to do local IO which differs
from the IO requested by the client.  This means we cannot
directly use the IO niobufs for doing the transfer to the
client.

So we add a second set of lnb pointers, which are used to
point at a specific subset of the pages in the main
per-thread cache.  This subset will be used for doing the
transfer to the client.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I53aa46045aaf335da20a311900ac0bf425823b22
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52881
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoRM-620 build: New tag 2.14.0-ddn120
Andreas Dilger [Thu, 7 Dec 2023 11:13:42 +0000 (04:13 -0700)]
RM-620 build: New tag 2.14.0-ddn120

New tag 2.14.0-ddn120

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I59aeafd089ff479f3ff735a04a805ec99ecadfdb

18 months agoLU-16392 utils: use --list-commands for bash completion
Thomas Bertschinger [Wed, 21 Dec 2022 16:52:50 +0000 (11:52 -0500)]
LU-16392 utils: use --list-commands for bash completion

The CLI utils lctl and lfs currently use a pseudo option
--non-existent-option to generate a list of completions. However, this
was broken when the help output for an invalid command was changed.
Using --list-commands instead means that the format of the help output
can be kept succinct.

However, currently there are 2 issues that make --list-commands
unsuitable.

First, --list-commands truncates long commands. This commit resolves
this by not truncating long commands, and removing the fixed-length
char buffer and writing directly to stdout so that the line length
can overflow slightly if needed.

Second, --list-commands recursively displays sub-commands. For
example, for `lctl`, it will display `pcc add`, `pcc del`, etc in
addition to just `pcc`. The bash completion tools would view these
as separate tokens and thus would inappropriately suggest `add`,
`del`, etc. as completions for `lctl`. This commit removes the
recursive behavior.

Removing the recursive behavior resolves an unrelated bug with the
recursion that can be observed for `lctl`, where a number of
top-level commands are skipped following recursion into a previous
sub-command, equal to the number of subcommands processed in the
recursive call. Specifically, the commands in the section "device
setup", e.g. `attach`, `detach`, were not displayed following the
recursive call into `pcc`.
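
A non-recursive listing can be sketched as below.  This is an
assumption-laden illustration, not the parser code: struct sketch_cmd
and list_top_level() are hypothetical, and the one-name-per-line
output format is chosen only for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical command table entry; tables end with a NULL name. */
struct sketch_cmd {
	const char              *name;
	const struct sketch_cmd *subcmds; /* sub-command table, or NULL */
};

/*
 * Print only the top-level command names and return how many were
 * printed.  Sub-command tables are deliberately not followed.
 */
static inline int list_top_level(const struct sketch_cmd *cmds, FILE *out)
{
	int n = 0;

	assert(cmds != NULL && out != NULL);

	for (; cmds->name != NULL; cmds++) {
		fprintf(out, "%s\n", cmds->name);
		n++;
	}
	return n;
}
```

Because the loop never descends into subcmds, a table containing
`pcc`, `attach` and `detach` yields exactly those three names.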

Finally, this commit changes the command parser to recognize --help
and print the list of commands when this argument is seen.

Lustre-change: https://review.whamcloud.com/49484
Lustre-commit: b4cc570ad11c1c07a6e1d825787ccc62c1245ca1

Fixes: bc69a8d058 ("LU-8621 utils: cmd help to stdout or short cmd error")
Signed-off-by: Thomas Bertschinger <bertschinger@lanl.gov>
Change-Id: Ib6e139402b9cd18e5a54b8fd3d6a2652d301e736
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53337
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
18 months agoEX-7784 tests: reenable arm testing
Patrick Farrell [Wed, 22 Nov 2023 20:57:52 +0000 (15:57 -0500)]
EX-7784 tests: reenable arm testing

Previously, test 460a failed every time on ARM systems with
an issue with lnet/lnb transfers.

After a significant rework of the client compression code
for EX-7601, this no longer happens.

Test-Parameters: trivial testlist=sanity clientdistro=el8.8 clientarch=aarch64
Test-Parameters: trivial testlist=sanity clientdistro=el8.8 clientarch=aarch64
Test-Parameters: trivial testlist=sanity clientdistro=el8.8 clientarch=aarch64
Test-Parameters: trivial testlist=sanity clientdistro=el8.8 clientarch=aarch64
Test-Parameters: trivial testlist=sanity clientdistro=el8.8 clientarch=aarch64
Test-Parameters: trivial testlist=sanity clientdistro=el8.8 clientarch=aarch64
Test-Parameters: trivial testlist=sanity clientdistro=el8.8 clientarch=aarch64
Test-Parameters: trivial testlist=sanity clientdistro=el8.8 clientarch=aarch64
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I0490a2e7cbadb1492b58eb27c6bf8001b0704b5b
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53201
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
18 months agoLU-17278 ldlm: don't grant failed lock
Alex Zhuravlev [Thu, 9 Nov 2023 13:29:03 +0000 (16:29 +0300)]
LU-17278 ldlm: don't grant failed lock

Lock convert can re-grant a lock if it loses some bits. This
procedure can race with the import's invalidation. Thus the
lock can become invalid (l_granted_mode=LCK_MINMODE):
LustreError: 8637:0:(ldlm_lock.c:1095:ldlm_grant_lock_with_skiplist())
ASSERTION( ldlm_is_granted(lock) )

Lustre-change: https://review.whamcloud.com/53051
Lustre-commit: f3b45a05475d8c65f06c81f41176b5a7f7d1acaa

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I7bb20d62948224647d7632f2822fba44d39a7713
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53286
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months agoLU-17325 o2iblnd: CM_EVENT_UNREACHABLE on established conn
Serguei Smirnov [Thu, 30 Nov 2023 18:55:11 +0000 (10:55 -0800)]
LU-17325 o2iblnd: CM_EVENT_UNREACHABLE on established conn

There were examples in the field with RoCE setups which demonstrate
that CM_EVENT_UNREACHABLE may be received when connection is already
in ESTABLISHED state. This causes an assert in kiblnd_cm_callback to
fail.

Handle this in a more graceful manner: report the event as unexpected
and allow the flow to continue. If there are indeed issues on
the connection, it is expected to report transaction errors later
and get cleaned up without crashing the whole system.

Lustre-change: https://review.whamcloud.com/53298
Lustre-commit: TBD (from cbde71bf893dba0de752a190c3b16d653ef75085)

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: If32166fe9fc59e025609c2035cb1c03d3bed22f2
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53301
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoLU-14928 mgs: allow md target re-register
Alexander Zarochentsev [Sun, 30 May 2021 13:43:05 +0000 (16:43 +0300)]
LU-14928 mgs: allow md target re-register

In a DNE system, it is not safe to do writeconf of
an MD target and attempt to mount (and re-register) it again,
as it creates weird MDT-MDT osp devices like
"fsname-MDT0001-osp-MDT0001" and makes the system non-functional.
The fix doesn't allow creation of illegal devices.

Lustre-change: https://review.whamcloud.com/44594
Lustre-commit: e4f3f47f04c762770bc36c1e3fa7e92e94a36704

HPE-bug-id: LUS-10098
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: I698ee6d70ac96f54eaec57b5c5fe553d130ba011
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Artem Blagodarenko <artem.blagodarenko@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53328
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoLU-15112 mgc: do not ignore target registration failure
Alexander Zarochentsev [Wed, 15 Dec 2021 10:26:02 +0000 (13:26 +0300)]
LU-15112 mgc: do not ignore target registration failure

A serious target registration failure with the LDD_F_ERROR
flag set is ignored by the target, which makes it possible
to register a new target with an already used index.
The writeconf flag should be encoded in the fs label regardless
of the "first_time" flag, otherwise the target cannot be
registered after an initial registration failure.

Lustre-change: https://review.whamcloud.com/45259
Lustre-commit: cefabee52586f443bfd5163f6ac0b5e1b56a9db7

HPE-bug-id: LUS-8752
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: If051199d3dbafc8f8102f3daf086de01bc5c5f98
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53340
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoLU-15112 ptlrpc: make rq_replied flag always correct
Alexander Zarochentsev [Wed, 15 Dec 2021 12:31:47 +0000 (15:31 +0300)]
LU-15112 ptlrpc: make rq_replied flag always correct

The rq_replied flag is cleared at ptl_rpc_send() only,
so the state of the flag may be incorrect for RPCs which
have timed out but have never been sent.

Lustre-change: https://review.whamcloud.com/45871
Lustre-commit: 94f3f1b511609fa190cee64c7e8244f21ef70792

HPE-bug-id: LUS-8752
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: I0de996a4d775b8f1a1a6b27ff38d21645694f868
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53329
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoEX-8498 osd: const create in osd_ldiskfs_map_inode_pages()
Alex Zhuravlev [Mon, 30 Oct 2023 08:08:57 +0000 (11:08 +0300)]
EX-8498 osd: const create in osd_ldiskfs_map_inode_pages()

create flag is used to skip reads of unwritten blocks so don't
use/modify it to enable dense writes.

Fixes: f36eda6a1e ("LU-10026 osd-ldiskfs: use preallocation for dense writes")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I63a08ae2b8ed30d8a8ef4c5570f05d300a2b430b
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52887
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoLU-17334 lov: handle object created on newly added OST
Andreas Dilger [Wed, 6 Dec 2023 18:32:57 +0000 (10:32 -0800)]
LU-17334 lov: handle object created on newly added OST

When a new OST is added to a filesystem without no_create,
a new object may be created on the OST relatively quickly
after it is added to the filesystem, in particular because
the new OST would be preferred by QOS space balancing
due to lots of free space. However, it might take a few
seconds for the addition of the new OST to be propagated
across all of the clients, so there is a risk that the MDS
creates file objects on OSTs that a client is not yet aware of,
which returns an error to the application immediately.

This patch fixes the issue by adding a loop in lsme_unpack()
that waits and retries for some number of seconds for
the filesystem layout to be updated if either the
"loi->loi_ost_idx >= lov->desc.ld_tgt_count" or "!ltd"
condition is hit.
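
The shape of such a bounded wait-and-retry loop might look like the
sketch below.  Everything here is hypothetical: struct sketch_lov,
sketch_wait_for_ost() and the wait callback are stand-ins, only the
index check is modelled (not the "!ltd" case), and the real code
waits on time rather than on a callback.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the client's view of the OST table. */
struct sketch_lov {
	unsigned int tgt_count; /* number of targets currently known */
};

typedef void (*sketch_wait_fn)(struct sketch_lov *lov);

/* Stand-in wait step: pretend one config update lands per wait. */
static inline void sketch_wait_one_update(struct sketch_lov *lov)
{
	lov->tgt_count++;
}

/*
 * Wait up to max_waits times for ost_idx to become known.
 * Returns 0 once the index is valid, -ENOENT if it never is.
 */
static inline int sketch_wait_for_ost(struct sketch_lov *lov,
				      unsigned int ost_idx,
				      unsigned int max_waits,
				      sketch_wait_fn wait)
{
	unsigned int i;

	assert(lov != NULL && wait != NULL);

	for (i = 0; ost_idx >= lov->tgt_count && i < max_waits; i++)
		wait(lov);

	return ost_idx < lov->tgt_count ? 0 : -ENOENT;
}
```

Bounding the number of waits keeps a genuinely invalid index from
blocking the caller forever.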

Lustre-change: https://review.whamcloud.com/53335
Lustre-commit: TBD (from e1de624373ce6082253ddbdd987d36eb56ca6490)

Change-Id: Idc29b8c66079afaea25428577daf51370fa2b084
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53353
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months agoLU-17337 osd: ask for more revoke credits
Alex Zhuravlev [Tue, 5 Dec 2023 05:20:58 +0000 (08:20 +0300)]
LU-17337 osd: ask for more revoke credits

Starting from 4.* kernels, JBD2 tracks the number of potential
revoked blocks separately from regular journal blocks and
checks that a transaction doesn't exceed the declared number.
Before the extent merging patch, a regular block allocation could
free only a very limited number of blocks. Now, with extent
merging, when an extent tree is really big and a few extents
are inserted in a single transaction, such an allocation
can exceed the default revoke credits (8).
The patch uses the number of extents in the transaction to calculate
the potential number of revoke records (max tree depth * default).

Fixes: 0f7e6c02a9 ("LU-16843 ldiskfs: merge extent blocks")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I4967deb56e5aba82b68ffdc91de589fffae6a64a
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53325
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoRM-620 build: New tag 2.14.0-ddn119
Andreas Dilger [Thu, 30 Nov 2023 17:19:08 +0000 (10:19 -0700)]
RM-620 build: New tag 2.14.0-ddn119

New tag 2.14.0-ddn119

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I16137e4ed48ff6a28d9a33b9206ad6c5acab3c34

18 months agoEX-7601 ofd: add remote_pages
Patrick Farrell [Sat, 28 Oct 2023 20:34:25 +0000 (16:34 -0400)]
EX-7601 ofd: add remote_pages

When we round a read to get all of the compressed chunks,
the number of local and the number of remote pages will
differ.  We need to make sure we do the checksum and data
transfer using the number of remote pages, not the number of
local pages.

This patch calculates the number of remote pages and uses it
accordingly.  This doesn't do anything yet, but when we
round the local read to include the whole compressed chunk,
this will be needed.
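
To illustrate why the two counts differ, the sketch below compares
the pages spanned by a client request with the pages spanned once the
range is rounded to chunks.  The helper names and the 4 KiB page size
are assumptions, and chunk_size is assumed to be a multiple of the
page size.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL /* assumed page size */

/* Number of pages touched by the byte range [offset, offset + len). */
static inline uint64_t pages_spanned(uint64_t offset, uint64_t len)
{
	assert(len > 0);
	return (offset + len - 1) / SKETCH_PAGE_SIZE -
	       offset / SKETCH_PAGE_SIZE + 1;
}

/* Pages touched locally after rounding the range out to whole chunks. */
static inline uint64_t local_pages_rounded(uint64_t offset, uint64_t len,
					   uint64_t chunk_size)
{
	uint64_t start, end;

	assert(chunk_size > 0);

	start = offset - offset % chunk_size;
	end = (offset + len + chunk_size - 1) / chunk_size * chunk_size;

	return pages_spanned(start, end - start);
}
```

A 2 KiB read at offset 63 KiB touches 2 remote pages, but with 64 KiB
chunks the local read covers 0-128 KiB, which is 32 pages.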

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I4875b02016570d227b3b926efd117f0a7cda41b4
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52878
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoEX-7601 ofd: add chunk_size to preprw_read
Patrick Farrell [Fri, 27 Oct 2023 19:29:24 +0000 (15:29 -0400)]
EX-7601 ofd: add chunk_size to preprw_read

preprw_read needs chunk size for rounding.  Add this in a
separate patch to keep things trivial; it will be used in
a subsequent patch.

Also use this to add a check in DOM to ensure it doesn't
attempt to do compression.  This should already be
prevented by setstripe, so this is just an extra safety
check.

Test-Parameters: trivial
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I9dc4d1559e5c8be315268a593466571b54c90a96
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52866
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoEX-7601 ofd: convert dt_bufs_get to offset, len
Patrick Farrell [Fri, 27 Oct 2023 19:22:12 +0000 (15:22 -0400)]
EX-7601 ofd: convert dt_bufs_get to offset, len

dt_bufs_get takes a remote niobuf, but just uses the
offset and length for getting pages.

Compression requires rounding the local IO to include the
full compression chunk, which means the local IO does not
match the remote niobuf any more.

So we modify dt_bufs_get to take an offset and length
rather than a remote niobuf, so we can ask for the pages we
need.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I4beaf8207fa00d802c0a339df3de2a3c71154fc7
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52865
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoEX-7601 ofd: round read lock to chunk
Patrick Farrell [Fri, 27 Oct 2023 18:42:41 +0000 (14:42 -0400)]
EX-7601 ofd: round read lock to chunk

For unaligned reads, we need to round the read locking to
cover any leading or trailing chunks.  We do this by
creating a local 'remote niobuf' to describe the rounded
range and doing the locking against that niobuf.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I8818522c188aca3c5c5eb564da2a8ba8aef18a4b
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52864
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoEX-7601 ofd: identify reads to round
Patrick Farrell [Fri, 27 Oct 2023 18:37:11 +0000 (14:37 -0400)]
EX-7601 ofd: identify reads to round

If the beginning or end of a client read is unaligned, we
must round the locking.  This patch identifies reads where
this is required; the next patch will do the locking.

Print a debug message when such an IO is found, but don't
do anything different - yet.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ibdab35b733225b4b1349ef457f66ca37dcb2d9bf
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52863
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
18 months agoEX-7601 osc: handle partial chunks in decompress_request
Patrick Farrell [Mon, 27 Nov 2023 21:07:49 +0000 (16:07 -0500)]
EX-7601 osc: handle partial chunks in decompress_request

Now that we have compression for incomplete chunks at the
end of files, decompress_request needs to handle these
chunks.  This patch modifies it to understand compressed
chunks which are less than chunk_size pages.
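
As an illustration of a chunk shorter than chunk_size, the sketch
below computes how many pages a chunk holds when the file ends inside
it.  pages_in_chunk() is a hypothetical helper, file_size stands in
for whatever size bound the client actually uses, and a 4 KiB page
size is assumed.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL /* assumed page size */

/*
 * Pages of data in the chunk starting at chunk_start, given the file
 * size.  A chunk entirely before EOF is full; the final chunk may be
 * shorter; a chunk at or past EOF holds nothing.
 */
static inline uint64_t pages_in_chunk(uint64_t chunk_start,
				      uint64_t chunk_size,
				      uint64_t file_size)
{
	uint64_t bytes;

	assert(chunk_size > 0 && chunk_start % chunk_size == 0);

	if (file_size <= chunk_start)
		return 0;

	bytes = file_size - chunk_start;
	if (bytes > chunk_size)
		bytes = chunk_size;

	return (bytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}
```

With 64 KiB chunks and a 70 KiB file, the first chunk has 16 pages
and the final chunk has 2.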

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I877550fa0d418def406e0308392a5336ec9f3ab6
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53160
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
18 months agoEX-7601 osc: rewrite compress_request
Patrick Farrell [Tue, 28 Nov 2023 03:35:49 +0000 (22:35 -0500)]
EX-7601 osc: rewrite compress_request

The existing version of compress_request can't handle
discontiguous RPCs.  Rewrite the logic to handle this
case properly.

This also implements kms handling.

If a write chunk ends at the known minimum size, we know
this write is after all other data in the file and so
there is no compressed data under it.  This means we can
compress this chunk.
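
The decision can be sketched as a predicate over the part of a write
that falls in one chunk.  chunk_can_compress() is a hypothetical
name, and treating "ends at the known minimum size" as end >= kms is
an assumption of this sketch rather than the exact check in
compress_request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * [start, end) is the written range within a single chunk, in bytes.
 * kms is the client's known minimum size of the file.
 */
static inline bool chunk_can_compress(uint64_t start, uint64_t end,
				      uint64_t chunk_size, uint64_t kms)
{
	assert(chunk_size > 0 && start < end);
	assert(end - start <= chunk_size);

	/* An unaligned start may leave older data ahead of the write. */
	if (start % chunk_size != 0)
		return false;

	/* A complete chunk fully replaces whatever was there. */
	if (end - start == chunk_size)
		return true;

	/* Partial chunk: safe only if no data lies beyond the write. */
	return end >= kms;
}
```

A chunk-aligned write ending at kms can be compressed; the same write
with kms further into the chunk cannot, since existing data follows.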

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I8a912d9e279d04c8ff07de39e63a1ec9b490d921
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53111
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
18 months agoLU-16804 tests: load CONFIG at beginning of init_test_env
Sebastien Buisson [Wed, 10 May 2023 12:13:54 +0000 (14:13 +0200)]
LU-16804 tests: load CONFIG at beginning of init_test_env

In order to have all environment variables properly loaded, make
CONFIG loaded at the beginning of init_test_env().

Lustre-change: https://review.whamcloud.com/50914
Lustre-commit: fdbb2bc8495064e1d9e61f02bcfd13b1e6aec8da

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I1c3caa3d582c4b317ff3d0d10fc0103e046ddf17
Reviewed-by: Sarah Liu <sarah@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53250
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months agoLU-16784 tests: fix path to lgss_sk
Sebastien Buisson [Mon, 1 May 2023 23:44:18 +0000 (16:44 -0700)]
LU-16784 tests: fix path to lgss_sk

Find correct path to lgss_sk utility, by looking inside Lustre build
tree if command is not installed on the local node.

Lustre-change: https://review.whamcloud.com/50825
Lustre-commit: 1ba12d98d5b068083fbb855b287d0b6da0ada80d

Test-Parameters: trivial
Test-Parameters: mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2 testlist=sanity-sec env=SHARED_KEY=true
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I23920bb2a44d2ec7e9662e75c23bd5302d8dfee2
Reviewed-by: Sarah Liu <sarah@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53251
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>