Whamcloud - gitweb
fs/lustre-release.git
18 months ago LU-17265 tests: allow margin for sanity/39r
Arshad Hussain [Wed, 8 Nov 2023 06:38:07 +0000 (12:08 +0530)]
LU-17265 tests: allow margin for sanity/39r

The timestamp may be slightly outdated due to a gap between
writing a file and checking the timestamp, so take that into
consideration and allow a 2-second margin when comparing
timestamps.

The on-disk inode may also not be flushed from the journal
immediately, so allow some time for it to be updated.

This patch also converts the hex value read via debugfs
to decimal.
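
As a minimal sketch of the comparison described above (this is not the
sanity/39r shell code; the function names and sample values are
hypothetical), the hex-to-decimal conversion and the 2-second margin
could be modelled in Python as:

```python
def parse_debugfs_hex(field: str) -> int:
    """Convert a hex timestamp string (e.g. '0x654b2c0f') to a decimal int."""
    return int(field, 16)


def timestamps_match(expected: int, on_disk: int, margin: int = 2) -> bool:
    """Return True if the two timestamps differ by at most `margin` seconds."""
    return abs(expected - on_disk) <= margin


# Hypothetical values: the on-disk value is read as hex, the other as decimal.
on_disk = parse_debugfs_hex("0x654b2c0f")
print(timestamps_match(on_disk - 1, on_disk))  # within the margin
print(timestamps_match(on_disk - 5, on_disk))  # outside the margin
```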

Lustre-change: https://review.whamcloud.com/53035
Lustre-commit: c5aa16db172afc9cbf0d4fd2c85261fef1a40d7b

Test-Parameters: trivial testlist=sanity env=ONLY=39r,ONLY_REPEAT=100
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I9e765f9cd572fb25821f9a0401c34209b7c3f574
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53453
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
18 months ago LU-17360 kernel: update RHEL 9.3 [5.14.0-362.13.1.el9_3]
Jian Yu [Wed, 13 Dec 2023 08:32:22 +0000 (00:32 -0800)]
LU-17360 kernel: update RHEL 9.3 [5.14.0-362.13.1.el9_3]

Update RHEL 9.3 kernel to 5.14.0-362.13.1.el9_3 for Lustre client.

Lustre-change: https://review.whamcloud.com/53433
Lustre-commit: TBD (from 3662949bcd342a96f8dddcb6663872e870f9871b)

Test-Parameters: trivial env=SANITY_EXCEPT="906" \
  mdtcount=4 mdscount=2 clientdistro=el9.3 testlist=sanity
Test-Parameters: optional clientdistro=el9.3 testgroup=full-part-1
Test-Parameters: optional clientdistro=el9.3 testgroup=full-part-2
Test-Parameters: optional clientdistro=el9.3 testgroup=full-part-3

Change-Id: I35863d298a612d7913d39f9031e792808f204ad4
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53435
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months ago LU-16397 test: check quota setting on QSD
Hongchao Zhang [Tue, 12 Dec 2023 10:37:22 +0000 (18:37 +0800)]
LU-16397 test: check quota setting on QSD

In some cases, the quota setting on the QMT is not transferred to
the QSD in time, which can cause the test to fail.
This patch adds a check on the QSD after setting the quota limit with lfs.

Lustre-change: https://review.whamcloud.com/49533/
Lustre-commit: TBD (from 76a7ad75740639b9255c51277ff65ce261379af6)

Test-Parameters: trivial testlist=sanity-quota
Change-Id: Ia999317a36a0f97c1f66726cdc10e9edac3d8a53
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53402
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months ago LU-17057 tests: Fix sanity-sec/0
Arshad Hussain [Tue, 21 Nov 2023 15:10:51 +0000 (20:40 +0530)]
LU-17057 tests: Fix sanity-sec/0

A command executed through 'runas' breaks out of the running
test script when it fails, even though the failure is
expected, because 'set -e' forces the script to exit
immediately. This patch fixes this by checking the return
value and then taking the appropriate action.

This patch also fixes the 'touch' command on file f4 by
calling it with the uid and gid that were set a few lines
above.

Lustre-change: https://review.whamcloud.com/53194
Lustre-commit: 0b5e252d973e00200660a81f1cdb440f8f4f1886

Test-Parameters: trivial testlist=sanity-sec env=ONLY=0,ONLY_REPEAT=100
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I06e6d22840e31add8c24cf90c31b98464d580ae7
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53439
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months ago LU-17203 libcfs: ignore remaining items
Alex Zhuravlev [Tue, 17 Oct 2023 11:48:58 +0000 (14:48 +0300)]
LU-17203 libcfs: ignore remaining items

Remove the assertion that checks the libcfs hashtable for
emptiness in cfs_hash_for_each_empty(). The only user of this
hashtable is the per-export ldlm lock set. In this case it is
legal that some locks cannot be removed from the hashtable
while they are being enqueued. The hashtable is destroyed from
the export destroy function, which in turn is called only when
all RPCs on this export are done (exp_rpc_count==0).

Lustre-change: https://review.whamcloud.com/52726
Lustre-commit: f2f8b6deaf54f1a264b31b44f6cf875fa1629ab2

Fixes: 306a9b666e ("LU-16272 libcfs: cfs_hash_for_each_empty optimization")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I2b853b017bb7247a0c60cc8f464c2e08d649f0eb
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53404
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months ago LU-16272 libcfs: cfs_hash_for_each_empty optimization
Alexander Zarochentsev [Thu, 20 Oct 2022 19:23:39 +0000 (22:23 +0300)]
LU-16272 libcfs: cfs_hash_for_each_empty optimization

Restarting from bucket 0 in cfs_hash_for_each_empty()
causes excessive CPU consumption while rechecking the
leading empty buckets.

Lustre-change: https://review.whamcloud.com/48972
Lustre-commit: 306a9b666e5ea2882f704d93483355e7e147544f

HPE-bug-id: LUS-11311
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: Ic03875ea25101052468213043128912ac46daf32
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53379
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months ago LU-17015 sec: fix PTLRPC_CTX_STATUS_MASK
Sebastien Buisson [Wed, 11 Oct 2023 13:29:46 +0000 (15:29 +0200)]
LU-17015 sec: fix PTLRPC_CTX_STATUS_MASK

PTLRPC_CTX_STATUS_MASK should not include PTLRPC_CTX_NEW_BIT, which is
a bit index and not a value. Also, according to code in
sptlrpc_req_refresh_ctx():
if (unlikely(test_bit(PTLRPC_CTX_NEW_BIT, &ctx->cc_flags))) {
   if (ctx->cc_ops->refresh)
      ctx->cc_ops->refresh(ctx);
}
a context needs to be refreshed if it has the PTLRPC_CTX_NEW_BIT bit.
So the function that checks whether a context is refreshed,
cli_ctx_is_refreshed(), should not return true if the
PTLRPC_CTX_NEW_BIT bit is set.

In the end, do not replace PTLRPC_CTX_NEW_BIT with anything else in
PTLRPC_CTX_STATUS_MASK. Having PTLRPC_CTX_NEW_BIT was a no-op (bitwise
OR with 0), so the code was already working as expected. Just clean up
the code to avoid headaches.
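
The difference between a bit index and a bit value can be illustrated
with a small Python model. This is only a sketch: PTLRPC_CTX_NEW_BIT
being 0 follows from the "bitwise OR with 0" remark above, while the
other bit indices and the body of the refresh check are assumptions
made for illustration.

```python
# Bit indices. NEW_BIT = 0 follows from the commit message;
# the remaining indices are assumed for this sketch.
PTLRPC_CTX_NEW_BIT = 0
PTLRPC_CTX_UPTODATE_BIT = 1
PTLRPC_CTX_DEAD_BIT = 2
PTLRPC_CTX_ERROR_BIT = 3

# Bit values derived from the indices.
PTLRPC_CTX_NEW = 1 << PTLRPC_CTX_NEW_BIT
PTLRPC_CTX_UPTODATE = 1 << PTLRPC_CTX_UPTODATE_BIT
PTLRPC_CTX_DEAD = 1 << PTLRPC_CTX_DEAD_BIT
PTLRPC_CTX_ERROR = 1 << PTLRPC_CTX_ERROR_BIT

# Old mask: OR-ing in the *index* (0) changes nothing.
OLD_MASK = (PTLRPC_CTX_NEW_BIT | PTLRPC_CTX_UPTODATE |
            PTLRPC_CTX_DEAD | PTLRPC_CTX_ERROR)
# Cleaned-up mask: the index is simply dropped.
NEW_MASK = PTLRPC_CTX_UPTODATE | PTLRPC_CTX_DEAD | PTLRPC_CTX_ERROR


def ctx_is_refreshed(flags: int, mask: int = NEW_MASK) -> bool:
    """Model of a refresh check: any status bit in the mask is set."""
    return (flags & mask) != 0


print(OLD_MASK == NEW_MASK)              # the cleanup does not change the mask
print(ctx_is_refreshed(PTLRPC_CTX_NEW))  # a context that is only NEW
```

Had the mask included the value PTLRPC_CTX_NEW instead, a context that
is only NEW would be reported as refreshed, which is the behaviour the
commit message argues against.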

Lustre-change: https://review.whamcloud.com/52629
Lustre-commit: c744221a1fd55df33ca2b0e3e1b1ffd7ef3a986d

Test-Parameters: trivial
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ibc2ca9dfaa176b098080f7f2867338b62953b50e
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53441
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months ago LU-14104 tests: sanity/123* shouldn't fail performance checks
Alex Zhuravlev [Mon, 2 Nov 2020 07:13:44 +0000 (10:13 +0300)]
LU-14104 tests: sanity/123* shouldn't fail performance checks

When running in VMs, CPU resources usually aren't strictly guaranteed,
so the performance checks should not cause failures.

Lustre-change: https://review.whamcloud.com/40512
Lustre-commit: b1915f13e3b69c72e3e4c1f2a32d022b6a20d347

Test-Parameters: trivial
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ieec4a89b921f7ccc198eb10513d4980ad3a20b51
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53456
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months ago LU-8151 obd: Show correct shadow mountpoints for server
Arshad Hussain [Thu, 14 Dec 2023 08:04:25 +0000 (00:04 -0800)]
LU-8151 obd: Show correct shadow mountpoints for server

server_fill_super_common() preps the server for mounting
and forces the "read only" (SB_RDONLY) flag to restrict IO on
the server. As a result, the mount command always shows
server filesystems as "ro" although they are "rw".

This patch double-checks the obd statfs (FS) state for the
"read only" flag (OS_STATFS_READONLY) and, if the filesystem
is not really read-only, clears the SB_RDONLY flag.
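
The flag handling can be sketched in Python as follows. This is an
illustrative model only: the numeric flag values and the function name
are assumptions, and the real change operates on the superblock in
kernel code.

```python
SB_RDONLY = 0x1           # assumed value of the superblock read-only flag
OS_STATFS_READONLY = 0x2  # assumed value of the statfs read-only state bit


def effective_sb_flags(sb_flags: int, statfs_state: int) -> int:
    """Clear SB_RDONLY unless statfs reports the target as read-only."""
    if statfs_state & OS_STATFS_READONLY:
        return sb_flags               # really read-only: keep the flag
    return sb_flags & ~SB_RDONLY      # writable target: show it as "rw"


def mount_option(sb_flags: int) -> str:
    """Render the flag the way the mount output shows it."""
    return "ro" if sb_flags & SB_RDONLY else "rw"


print(mount_option(effective_sb_flags(SB_RDONLY, 0)))
print(mount_option(effective_sb_flags(SB_RDONLY, OS_STATFS_READONLY)))
```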

The client output remains unchanged.

Output before patch:
/dev/.../mds1_flakey on /mnt/lustre-mds1 type lustre (ro,svname=...)
/dev/.../ost1_flakey on /mnt/lustre-ost1 type lustre (ro,svname=...)

Output after patch:
/dev/.../mds1_flakey on /mnt/lustre-mds1 type lustre (rw,svname=...)
/dev/.../ost1_flakey on /mnt/lustre-ost1 type lustre (rw,svname=...)

Test case conf-sanity/113 added.

Lustre-change: https://review.whamcloud.com/47131
Lustre-commit: 0171801df517988b0eb1023378c2c8c07a0a36f1

Test-Parameters: trivial testlist=conf-sanity
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ie92a686ae97dd62885f415b453bad6bdc0ed3d28
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53445
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
18 months ago LU-17347 debs: also move .ddeb files into debs/
Aurelien Degremont [Fri, 8 Dec 2023 12:34:09 +0000 (13:34 +0100)]
LU-17347 debs: also move .ddeb files into debs/

When building debian packages, the resulting packages are
moved into a 'debs/' subdir.

Don't miss the debug symbol packages 'dbgsym', which are
suffixed .ddeb.

Also add .buildinfo file.

Lustre-change: https://review.whamcloud.com/53378/
Lustre-commit: TBD

Test-Parameters: trivial
Change-Id: I52d0bddfaafc67c4a2a2dbc786d7f320c0b979f8
Signed-off-by: Aurelien Degremont <adegremont@nvidia.com>
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53421
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months ago LU-17336 gss: fix __user pointer in rsi_upcall_seq_write
Sebastien Buisson [Wed, 6 Dec 2023 08:15:18 +0000 (09:15 +0100)]
LU-17336 gss: fix __user pointer in rsi_upcall_seq_write

rsi_upcall_seq_write() uses sscanf to get the string passed from
userspace, but this needs to be copied to a kernel buffer first.

Lustre-change: https://review.whamcloud.com/53342
Lustre-commit: TBD (from 523ffed1cb43eec5fac38c144967026308da9cad)

Test-Parameters: trivial
Test-Parameters: kerberos=true testlist=sanity-krb5
Test-Parameters: testgroup=review-dne-selinux-ssk-part-2
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I2ec875b7c6c158695857fe912ec1dd9f41ddc25d
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53434
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months ago RM-620 build: New tag 2.14.0-ddn121
Andreas Dilger [Tue, 12 Dec 2023 05:53:22 +0000 (22:53 -0700)]
RM-620 build: New tag 2.14.0-ddn121

New tag 2.14.0-ddn121

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I9925c0d0e75c01a5e0a97bf34f8386efa2da8fbe

18 months ago RM-620 build: New tag lipe-2.38
Andreas Dilger [Tue, 12 Dec 2023 05:53:04 +0000 (22:53 -0700)]
RM-620 build: New tag lipe-2.38

New tag lipe-2.38

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I62e6034c466f566015d82da9b9a3a4e4c50cc4fb

18 months ago EX-8590 lipe: Use SSH poll to read stdout/err unblocking
Alexandre Ioffe [Sat, 25 Nov 2023 08:36:42 +0000 (00:36 -0800)]
EX-8590 lipe: Use SSH poll to read stdout/err unblocking

Limit hot-pools tests 75* to use only one client machine.
Fix the skip condition for tests 75a,b,c when bandwidth limit
options are not available.
Use ssh poll and non-blocking reads to read stdout/err in a loop
to prevent losing the output when it is not ready.
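
The poll-and-read loop can be illustrated generically in Python. This
is not the lipe code and does not use SSH: it is a sketch of the same
pattern on local pipe file descriptors, and the drain() helper is a
hypothetical name.

```python
import os
import selectors


def drain(fds, poll_interval=0.5):
    """Collect all output from the given fds using poll + non-blocking reads."""
    sel = selectors.DefaultSelector()
    output = {fd: b"" for fd in fds}
    pending = set(fds)
    for fd in fds:
        os.set_blocking(fd, False)
        sel.register(fd, selectors.EVENT_READ)
    while pending:
        # Wait until at least one stream is ready instead of reading blindly.
        for key, _ in sel.select(poll_interval):
            try:
                data = os.read(key.fd, 4096)
            except BlockingIOError:
                continue              # nothing ready after all, poll again
            if data:
                output[key.fd] += data
            else:                     # empty read means end of stream
                sel.unregister(key.fd)
                pending.discard(key.fd)
    sel.close()
    return output


# Demo with a local pipe standing in for a remote stdout stream.
r, w = os.pipe()
os.write(w, b"remote output\n")
os.close(w)
print(drain([r])[r])
os.close(r)
```

Looping until end of stream, rather than reading once, is what keeps
output that arrives late from being dropped.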

Test-Parameters: trivial testlist=hot-pools
Test-Parameters: testlist=hot-pools env=ONLY=75a,ONLY_REPEAT=82
Test-Parameters: testlist=hot-pools env=ONLY=75b,ONLY_REPEAT=82
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Change-Id: Ibe07cdd51197c1f3c048b7fcdab6caff850067e7
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53288
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months ago LU-17338 kernel: update RHEL 8.9 [4.18.0-513.9.1.el8_9]
Jian Yu [Thu, 7 Dec 2023 07:45:47 +0000 (23:45 -0800)]
LU-17338 kernel: update RHEL 8.9 [4.18.0-513.9.1.el8_9]

Update RHEL 8.9 kernel to 4.18.0-513.9.1.el8_9 for Lustre client.

Lustre-change: https://review.whamcloud.com/53357
Lustre-commit: TBD (from 5574088906d813c8a17237edc85e55c5d54f10f5)

Test-Parameters: trivial mdtcount=4 mdscount=2 \
clientdistro=el8.9 serverdistro=el8.8 testlist=sanity

Test-Parameters: optional clientdistro=el8.9 serverdistro=el8.8 \
testgroup=full-part-1

Test-Parameters: optional clientdistro=el8.9 serverdistro=el8.8 \
testgroup=full-part-2

Test-Parameters: optional clientdistro=el8.9 serverdistro=el8.8 \
testgroup=full-part-3

Change-Id: Ied0d2873974a3c8cc6e346373457c8ebc09740d6
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53360
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months ago LU-17307 mdt: get dirent count by request
Lai Siyao [Sat, 4 Nov 2023 13:32:59 +0000 (09:32 -0400)]
LU-17307 mdt: get dirent count by request

Add MA_DIRENT_CNT/LA_DIRENT_CNT to notify osd to get the dirent count.
Set it in mdt_getattr_name_lock(), and only when auto-split is enabled,
so it won't cause overhead when auto-split is disabled. Also change the
oo_dirent_count type to atomic_t so the result does not become
inaccurate over time from repeated addition/removal (the count may
be used in the future to know whether a directory is empty or to
compare directories).

In osd_dirent_count() set oo_dirent_count to 0 before iteration to
keep multiple threads from iterating at the same time. The result
may be inaccurate in this case, but it will eventually become accurate.

Lustre-change: https://review.whamcloud.com/53229
Lustre-commit: TBD (from 50080036674faecfe8a94ebcbb0bdbdbeddac53d)

Fixes: 03a4431dac ("LU-11025 osd: osd_attr_get() returns dirent count")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I2be6c0dcfda1c98995a269585c5d8d781a8a3b42
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53275
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months ago EX-8236 pcc: abort data copy when clear PCC backend
Qian Yingjin [Fri, 8 Dec 2023 09:15:17 +0000 (04:15 -0500)]
EX-8236 pcc: abort data copy when clear PCC backend

This patch adds an option "--abort" to the "lctl pcc del|clear"
commands.
With this option, removing a PCC backend from a client first sets
the ATTACH_ABORTING flag on all files whose attach is in progress,
and then waits for those attaches to abort.

Add sanity-pcc/test_108 to verify it.

Change-Id: I4e2f3ec8866e9af45f4524a9f45ee418ef4cb5be
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53373
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months ago EX-8778 tests: clear "trace" in quota_fini
Sergey Cheremencev [Sun, 3 Dec 2023 04:11:29 +0000 (07:11 +0300)]
EX-8778 tests: clear "trace" in quota_fini

Clear trace debug level in quota_fini.

Fixes: ba4d37b9fc ("LU-13055 libcfs: allow comma-separated masks")
Test-Parameters: trivial testlist=sanity-quota
Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Change-Id: I480b9975bbf99403cedbfd18154f365ebf181c09
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53385
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months ago LU-17212 gss: survive improper obd or imp at ctx init
Sebastien Buisson [Thu, 19 Oct 2023 09:11:48 +0000 (11:11 +0200)]
LU-17212 gss: survive improper obd or imp at ctx init

GSS context init requests can happen even after a client has been
unmounted, because they come from userspace (request-key,
lgss_keyring).
In this case they must be ignored, and the code must be robust enough
to survive an improper (already or partially shut down) obd device or
import.

Lustre-change: https://review.whamcloud.com/52755
Lustre-commit: 3fcddf6dcdd92df6557c59913a61944f21d58615

Test-Parameters: kerberos=true testlist=sanity-krb5
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I541727165eadf1fcb7715e416da85d100976cf2f
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53291
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months ago LU-17306 ofd: return error for reconnection
Alexander Boyko [Thu, 16 Nov 2023 22:57:24 +0000 (17:57 -0500)]
LU-17306 ofd: return error for reconnection

During the cleanup orphan phase, reconnection leads to unsynchronized
last id between MDT and OST. This means that MDT could assign non
existing objects to a client for a file create operation.

ofd_create_hdl()) capstor-OST0087: dropping old orphan cleanup request
MDS LAST_ID [0x2540000400:0xb6941:0x0] (747841) is 352 behind OST
    LAST_ID [0x2540000400:0xb6aa1:0x0] (748193), trust the OST

recovery-small 144c reproduces the bug where the MDT lost
synchronization with the OST.

Lustre-change: https://review.whamcloud.com/53195
Lustre-commit: TBD (from 1f0deff150a3087a974adbac687a5019f6c0e39d)

Fixes: 63e17799a3 ("LU-8367 osp: enable replay for precreation request")
HPE-bug-id: LUS-11969
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: I22c3d3b3db2acc9ad8f1b978b234afe7d3eef51d
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53341
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months ago EX-7601 tgt: reorder tgt_brw_write decls
Patrick Farrell [Thu, 2 Nov 2023 21:23:01 +0000 (17:23 -0400)]
EX-7601 tgt: reorder tgt_brw_write decls

Reorder the declarations in tgt_brw_write.

This patch also serves as the series head for implementing
read-modify-write support for compressed chunks.

The process for read-modify-write is similar to that used
for unaligned reads.

At a high level, read-modify-write means we must read up,
decompress, then recompress and write back the data.  This
only applies when we're actually doing read-modify-write.

To know when to do this, we rely partly on the client.  If
the client is able to compress a chunk, either because it is
a complete chunk, or because the start is chunk aligned and
the write is past EOF, we know there is no read-modify-write
required.  Either there is no existing data (write past EOF)
or the data will be fully replaced.

So, when we see a write which is not fully chunk aligned and
not already compressed, we will do a read-modify-write.

For this, we round the IO lnbs and associated locking to
cover complete chunks, then we do a read of the unaligned
chunks.

For example, if we have a write which goes from 63 KiB to
257 KiB with a chunk size of 64 KiB, we will read 0-64 KiB and
256-320 KiB, and decompress those chunks into the buffer.
64 KiB to 256 KiB is *NOT* read, because those are complete
chunks.
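
The arithmetic in this example can be checked with a short Python
sketch. The helper name is hypothetical and the model is simplified:
it only considers chunk alignment and ignores EOF and whether the
client already compressed the data.

```python
KIB = 1024


def rmw_read_ranges(start, end, chunk):
    """Return the [lo, hi) chunk ranges to read before writing [start, end).

    Only chunks that the write covers partially need to be read;
    chunks that are fully overwritten are skipped.
    """
    ranges = []
    first_lo = start // chunk * chunk        # round start down to a chunk
    last_hi = -(-end // chunk) * chunk       # round end up to a chunk
    if start % chunk:                        # unaligned head
        ranges.append((first_lo, first_lo + chunk))
    if end % chunk:                          # unaligned tail
        tail = (last_hi - chunk, last_hi)
        if tail not in ranges:               # head and tail may be one chunk
            ranges.append(tail)
    return ranges


ranges = rmw_read_ranges(63 * KIB, 257 * KIB, 64 * KIB)
print([(lo // KIB, hi // KIB) for lo, hi in ranges])  # [(0, 64), (256, 320)]
```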

We then set up a transfer mapping - identical to the process
for unaligned reads - so the client data is written into
the correct lnbs.

Now we have a set of chunk aligned lnbs which contain data
updated with the client write.  In the initial version, we
write these to disk uncompressed.  This is sufficient for
correct operation, but it does mean read-modify-write will
decompress those chunks.

There is code for recompression, but it is not working 100%
yet, and there are some complexities around managing holes
and EOF which still need to be resolved.

TBD if this will make our initial release - I am hopeful but
not sure yet.

Test-Parameters: trivial
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ia24583d4221f498928e99afa8c289b70e4d25f5b
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52959
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
18 months ago EX-7601 ofd: improve decompress_rnb debug
Patrick Farrell [Mon, 11 Dec 2023 15:49:19 +0000 (10:49 -0500)]
EX-7601 ofd: improve decompress_rnb debug

Since we're very close on landing the unaligned read
patches, this minor debug improvement is being placed later
in the series.

Test-Parameters: trivial
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I36ad243bd1f7025e358f9593f1008f0b851cc1bb
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53411
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months ago EX-7601 ofd: add decompress_rnb implementation
Patrick Farrell [Tue, 31 Oct 2023 20:11:41 +0000 (16:11 -0400)]
EX-7601 ofd: add decompress_rnb implementation

This implements decompress_rnb, which is the core code for
handling unaligned reads from the client.

Decompress rnb takes an unaligned remote niobuf and
identifies the unaligned portion(s) of the IO, then finds
the corresponding local niobufs (pages read from disk),
and passes them on for decompression in place.

decompress_chunk_in_lnb decompresses the data in a set of
lnbs and copies it back to the same location, replacing the
raw data from disk with decompressed data.  (If the chunk
was not compressed, it does nothing.)

With this patch, the implementation of unaligned reads is
complete and we can add the compression sanity tests back
safely.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ifd1d9b03d5d004bec3f5e456da359b8d10e005f9
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52916
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months ago EX-7601 ofd: lock pages from read to decompression
Patrick Farrell [Tue, 7 Nov 2023 21:35:51 +0000 (16:35 -0500)]
EX-7601 ofd: lock pages from read to decompression

When using the page cache on the server, for pages which
will be decompressed, we can't unlock them until they've
been decompressed.

Rather than only waiting to unlock the pages which will be
decompressed, we keep all of the read pages locked.  This
simplifies the code, at the cost of delaying other reads to
the aligned portion of an unaligned read.  This shouldn't be
important in practice.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ie98920327979a5c9600e8c9e8627816461ea1a34
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53026
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months ago EX-7601 ofd: add ofd_decompress_read implementation
Patrick Farrell [Fri, 27 Oct 2023 20:50:30 +0000 (16:50 -0400)]
EX-7601 ofd: add ofd_decompress_read implementation

ofd_decompress_read is responsible for walking the
remote niobufs (rnbs) in the RPC and identifying if they
are chunk unaligned.  It then passes them on to the rnb
decompression code (not implemented yet, see next patch).

It also allocates the bounce buffers for decompression so
they can be reused for each remote niobuf.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I1f2f86ce3fc036ac5d79b060a5e44f6564e123aa
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52868
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months ago EX-7601 ofd: do not use page cache for compressed files
Patrick Farrell [Wed, 6 Dec 2023 15:55:24 +0000 (10:55 -0500)]
EX-7601 ofd: do not use page cache for compressed files

It is challenging for the server to safely use the page
cache with compressed files, because if data is
decompressed into the page cache, the data in cache now
differs from the data on disk.

This is a problem if *part* of the page cache is ever
evicted, because we can end up with a situation where a read
will be partially satisfied from cache and partially from
disk, but the data on disk is compressed and the data in
cache is not.

It is possible to deal with this by carefully ensuring the
page cache is not used just for decompressed data, but this
makes getting the buffers/lnbs for compressed files fairly
complicated.  Instead, we can just entirely block using the
server page cache for compressed files.

This must be done for both read and write, and only works
for ldiskfs - ZFS cannot easily be forced to not use its
page cache.  But that's OK because we do not support CSDC
with ZFS.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Iee73abb29ad5631bb2203c2133756d7ebf5b686d
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53348
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months ago EX-7601 ofd: add chunk_size to preprw_write
Patrick Farrell [Thu, 2 Nov 2023 21:37:43 +0000 (17:37 -0400)]
EX-7601 ofd: add chunk_size to preprw_write

preprw_write needs the chunk size for rounding.  Add this in a
separate patch to keep things trivial; it will be used in
a subsequent patch.

This patch is really trivial on the write side, since the
read side already did most of this.  But it's being kept
separate for symmetry.

Test-Parameters: trivial
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Id25957dbc185b6e61b7f208cee8cf5f897f03944
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52962
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months ago EX-7601 ofd: add chunk rounding to read
Patrick Farrell [Fri, 27 Oct 2023 20:24:33 +0000 (16:24 -0400)]
EX-7601 ofd: add chunk rounding to read

We need to round all niobufs to chunk size in the read
process, so we read in the full chunk.

dt_bufs_get sets up the local niobuf for the read, so we
round before calling it.
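
A minimal sketch of the rounding step, in Python for illustration
only (the function name is hypothetical and this is not the ofd code):

```python
def round_niobuf_to_chunk(offset, length, chunk):
    """Expand (offset, length) so it starts and ends on chunk boundaries."""
    start = offset - offset % chunk   # round the start down
    end = offset + length
    end += -end % chunk               # round the end up
    return start, end - start


KIB = 1024
# A 10 KiB read at offset 70 KiB with 64 KiB chunks becomes one full chunk.
print(round_niobuf_to_chunk(70 * KIB, 10 * KIB, 64 * KIB))
```

An already aligned buffer is returned unchanged, so the rounding only
costs extra IO for unaligned reads.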

This patch is a partial implementation of unaligned read
support, and breaks compression testing until the next few
patches are landed.  So this patch temporarily adds the
compression tests to ALWAYS_EXCEPT.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I587a519db4dae983db5db1d690e63e15bc010b7e
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52867
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months ago EX-7601 tgt: add io_lnb_to_tx_lnb
Patrick Farrell [Mon, 30 Oct 2023 03:39:53 +0000 (23:39 -0400)]
EX-7601 tgt: add io_lnb_to_tx_lnb

With compression, the lnbs used for the disk IO on the
server can contain more data than the client requested,
due to reading up whole chunks for decompression.

This means we need to transfer only a subset of the lnbs.
We do this by creating a second set of lnbs, and pointing
them at the pages in the local io lnb which need to be
transferred to the client.
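
The idea of a second set of lnbs that points into the first can be
modelled in Python as below. This is a simplified illustration, not
the tgt code: the names, the dict layout, and the 4 KiB page size are
assumptions, and partial-page offsets are ignored.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


def make_io_lnbs(start, end):
    """One entry per page in the chunk-rounded IO range [start, end)."""
    return [{"offset": off, "page": bytearray(PAGE_SIZE)}
            for off in range(start, end, PAGE_SIZE)]


def select_tx_lnbs(io_lnbs, req_offset, req_len):
    """Return the IO entries overlapping the range the client asked for.

    The returned list references the same page objects, so nothing is
    copied; only the set of pages to transfer is narrowed.
    """
    req_end = req_offset + req_len
    return [lnb for lnb in io_lnbs
            if lnb["offset"] < req_end
            and lnb["offset"] + PAGE_SIZE > req_offset]


# The server reads a whole 64 KiB chunk; the client asked for 8 KiB at 8 KiB.
io_lnbs = make_io_lnbs(0, 64 * 1024)
tx_lnbs = select_tx_lnbs(io_lnbs, 8192, 8192)
print(len(io_lnbs), len(tx_lnbs))
```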

This code doesn't do anything for now, but it will kick in
with the next patch when we start rounding chunks for read.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I0fe690718a3484578b139eaaec52c0c3b265da6a
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52884
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months ago EX-7601 tgt: add second tbc lnb for RDMA
Patrick Farrell [Sun, 29 Oct 2023 02:16:01 +0000 (22:16 -0400)]
EX-7601 tgt: add second tbc lnb for RDMA

Compression requires the server to do local IO which differs
from the IO requested by the client.  This means we cannot
directly use the IO niobufs for doing the transfer to the
client.

So we add a second set of lnb pointers, which are used to
point at a specific subset of the pages in the main
per-thread cache.  This subset will be used for doing the
transfer to the client.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I53aa46045aaf335da20a311900ac0bf425823b22
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52881
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months ago RM-620 build: New tag 2.14.0-ddn120
Andreas Dilger [Thu, 7 Dec 2023 11:13:42 +0000 (04:13 -0700)]
RM-620 build: New tag 2.14.0-ddn120

New tag 2.14.0-ddn120

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I59aeafd089ff479f3ff735a04a805ec99ecadfdb

18 months ago LU-16392 utils: use --list-commands for bash completion
Thomas Bertschinger [Wed, 21 Dec 2022 16:52:50 +0000 (11:52 -0500)]
LU-16392 utils: use --list-commands for bash completion

The CLI utils lctl and lfs currently use a pseudo option
--non-existent-option to generate a list of completions. However, this
was broken when the help output for an invalid command was changed.
Using --list-commands instead means that the format of the help output
can be kept succinct.

However, currently there are 2 issues that make --list-commands
unsuitable.

First, --list-commands truncates long commands. This commit resolves
this by not truncating long commands, and removing the fixed-length
char buffer and writing directly to stdout so that the line length
can overflow slightly if needed.

Second, --list-commands recursively displays sub-commands. For
example, for `lctl`, it will display `pcc add`, `pcc del`, etc. in
addition to just `pcc`. The bash completion tools would view these
as separate tokens and thus would inappropriately suggest `add`,
`del`, etc. as completions for `lctl`. This commit removes the
recursive behavior.
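
Why the recursive listing confuses completion can be shown with a small
Python model. The command table below is a hypothetical excerpt built
from the commands named in this message; it is not the real lctl
command table or parser.

```python
# Hypothetical excerpt of a command table: name -> list of sub-commands.
COMMANDS = {
    "pcc": ["add", "del"],
    "attach": [],
    "detach": [],
}


def list_commands(table, recursive=False):
    """Return command names, optionally followed by 'cmd sub' entries."""
    names = []
    for name, subs in table.items():
        names.append(name)
        if recursive:
            names.extend(f"{name} {sub}" for sub in subs)
    return names


def completion_tokens(listing):
    """Split a listing on whitespace, as a completion script would."""
    return " ".join(listing).split()


print(completion_tokens(list_commands(COMMANDS, recursive=True)))
print(completion_tokens(list_commands(COMMANDS)))
```

With the recursive listing, `add` and `del` show up as tokens of their
own; with the flat listing only top-level commands are offered.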

Removing the recursive behavior resolves an unrelated bug with the
recursion that can be observed for `lctl`, where a number of
top-level commands are skipped following recursion into a previous
sub-command, equal to the number of subcommands processed in the
recursive call. Specifically, the commands in the section "device
setup", e.g. `attach`, `detach`, were not displayed following the
recursive call into `pcc`.

Finally, this commit changes the command parser to recognize --help
and print the list of commands when this argument is seen.

Lustre-change: https://review.whamcloud.com/49484
Lustre-commit: b4cc570ad11c1c07a6e1d825787ccc62c1245ca1

Fixes: bc69a8d058 ("LU-8621 utils: cmd help to stdout or short cmd error")
Signed-off-by: Thomas Bertschinger <bertschinger@lanl.gov>
Change-Id: Ib6e139402b9cd18e5a54b8fd3d6a2652d301e736
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53337
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
18 months agoEX-7784 tests: reenable arm testing
Patrick Farrell [Wed, 22 Nov 2023 20:57:52 +0000 (15:57 -0500)]
EX-7784 tests: reenable arm testing

Previously, test 460a failed every time on ARM systems with
an issue with lnet/lnb transfers.

After a significant rework of the client compression code
for EX-7601, this no longer happens.

Test-Parameters: trivial testlist=sanity clientdistro=el8.8 clientarch=aarch64
Test-Parameters: trivial testlist=sanity clientdistro=el8.8 clientarch=aarch64
Test-Parameters: trivial testlist=sanity clientdistro=el8.8 clientarch=aarch64
Test-Parameters: trivial testlist=sanity clientdistro=el8.8 clientarch=aarch64
Test-Parameters: trivial testlist=sanity clientdistro=el8.8 clientarch=aarch64
Test-Parameters: trivial testlist=sanity clientdistro=el8.8 clientarch=aarch64
Test-Parameters: trivial testlist=sanity clientdistro=el8.8 clientarch=aarch64
Test-Parameters: trivial testlist=sanity clientdistro=el8.8 clientarch=aarch64
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I0490a2e7cbadb1492b58eb27c6bf8001b0704b5b
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53201
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
18 months agoLU-17278 ldlm: don't grant failed lock
Alex Zhuravlev [Thu, 9 Nov 2023 13:29:03 +0000 (16:29 +0300)]
LU-17278 ldlm: don't grant failed lock

Lock convert can re-grant a lock if it loses some bits. This
procedure can race with the import's invalidation, so the
lock can become invalid (l_granted_mode=LCK_MINMODE):
LustreError: 8637:0:(ldlm_lock.c:1095:ldlm_grant_lock_with_skiplist())
ASSERTION( ldlm_is_granted(lock) )

Lustre-change: https://review.whamcloud.com/53051
Lustre-commit: f3b45a05475d8c65f06c81f41176b5a7f7d1acaa

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I7bb20d62948224647d7632f2822fba44d39a7713
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53286
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months agoLU-17325 o2iblnd: CM_EVENT_UNREACHABLE on established conn
Serguei Smirnov [Thu, 30 Nov 2023 18:55:11 +0000 (10:55 -0800)]
LU-17325 o2iblnd: CM_EVENT_UNREACHABLE on established conn

There were examples in the field with RoCE setups which demonstrate
that CM_EVENT_UNREACHABLE may be received when connection is already
in ESTABLISHED state. This causes an assert in kiblnd_cm_callback to
fail.

Handle this in a more graceful manner: report the event as unexpected
and allow the flow to continue. If there are indeed issues on
the connection, it is expected to report transaction errors later
and get cleaned up without crashing the whole system.

Lustre-change: https://review.whamcloud.com/53298
Lustre-commit: TBD (from cbde71bf893dba0de752a190c3b16d653ef75085)

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: If32166fe9fc59e025609c2035cb1c03d3bed22f2
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53301
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoLU-14928 mgs: allow md target re-register
Alexander Zarochentsev [Sun, 30 May 2021 13:43:05 +0000 (16:43 +0300)]
LU-14928 mgs: allow md target re-register

In a DNE system, it is not safe to do writeconf of
a MD target and attempt to mount (and re-register) it again,
as it creates weird MDT-MDT osp devices like
"fsname-MDT0001-osp-MDT0001" and makes the system non-functional.
The fix disallows the creation of such illegal devices.

Lustre-change: https://review.whamcloud.com/44594
Lustre-commit: e4f3f47f04c762770bc36c1e3fa7e92e94a36704

HPE-bug-id: LUS-10098
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: I698ee6d70ac96f54eaec57b5c5fe553d130ba011
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Artem Blagodarenko <artem.blagodarenko@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53328
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoLU-15112 mgc: do not ignore target registration failure
Alexander Zarochentsev [Wed, 15 Dec 2021 10:26:02 +0000 (13:26 +0300)]
LU-15112 mgc: do not ignore target registration failure

A serious target registration failure with the LDD_F_ERROR
flag set is ignored by the target, which makes it possible to
register a new target with an already used index.
The writeconf flag should be encoded in the fs label regardless of
the "first_time" flag, otherwise the target cannot be registered
after an initial registration failure.

Lustre-change: https://review.whamcloud.com/45259
Lustre-commit: cefabee52586f443bfd5163f6ac0b5e1b56a9db7

HPE-bug-id: LUS-8752
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: If051199d3dbafc8f8102f3daf086de01bc5c5f98
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53340
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoLU-15112 ptlrpc: make rq_replied flag always correct
Alexander Zarochentsev [Wed, 15 Dec 2021 12:31:47 +0000 (15:31 +0300)]
LU-15112 ptlrpc: make rq_replied flag always correct

The rq_replied flag is cleared at ptl_rpc_send() only,
so the state of the flag may be incorrect for RPCs which
timed out but were never sent.

Lustre-change: https://review.whamcloud.com/45871
Lustre-commit: 94f3f1b511609fa190cee64c7e8244f21ef70792

HPE-bug-id: LUS-8752
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: I0de996a4d775b8f1a1a6b27ff38d21645694f868
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53329
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoEX-8498 osd: const create in osd_ldiskfs_map_inode_pages()
Alex Zhuravlev [Mon, 30 Oct 2023 08:08:57 +0000 (11:08 +0300)]
EX-8498 osd: const create in osd_ldiskfs_map_inode_pages()

create flag is used to skip reads of unwritten blocks so don't
use/modify it to enable dense writes.

Fixes: f36eda6a1e ("LU-10026 osd-ldiskfs: use preallocation for dense writes")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I63a08ae2b8ed30d8a8ef4c5570f05d300a2b430b
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52887
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoLU-17334 lov: handle object created on newly added OST
Andreas Dilger [Wed, 6 Dec 2023 18:32:57 +0000 (10:32 -0800)]
LU-17334 lov: handle object created on newly added OST

When a new OST is added to a filesystem without no_create,
a new object may be created on the OST relatively quickly
after it is added to the filesystem, in particular because
the new OST would be preferred by QOS space balancing
due to lots of free space. However, it might take a few
seconds for the addition of the new OST to be propagated
across all of the clients, so there is a risk that the MDS
creates file objects on OSTs that a client is not yet aware of,
which returns an error to the application immediately.

This patch fixes the issue by adding a loop in lsme_unpack()
that waits and retries for some number of seconds for
the filesystem layout to be updated if either the
"loi->loi_ost_idx >= lov->desc.ld_tgt_count" or "!ltd"
condition is hit.
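
The wait-and-retry idea can be sketched in Python as follows. This is
only an illustration under stated assumptions: the function name, the
timeout and interval values, and the `is_known` callback are
hypothetical and are not taken from lsme_unpack().

```python
import time


def wait_for_target(is_known, ost_idx, timeout_s=5.0, interval_s=0.1,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll until the OST index is known locally or the timeout expires.

    is_known is a hypothetical callback standing in for the two checks
    named in the commit message (index within the target count, and a
    non-NULL target descriptor).
    """
    deadline = clock() + timeout_s
    while True:
        if is_known(ost_idx):
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# A target that is already known is accepted without waiting.
print(wait_for_target(lambda idx: idx < 4, 2))
```

Only when the timeout expires without the target appearing would the
caller fall back to returning an error to the application.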

Lustre-change: https://review.whamcloud.com/53335
Lustre-commit: TBD (from e1de624373ce6082253ddbdd987d36eb56ca6490)

Change-Id: Idc29b8c66079afaea25428577daf51370fa2b084
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53353
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months agoLU-17337 osd: ask for more revoke credits
Alex Zhuravlev [Tue, 5 Dec 2023 05:20:58 +0000 (08:20 +0300)]
LU-17337 osd: ask for more revoke credits

Starting from 4.* kernels, JBD2 tracks the number of potential
revoked blocks separately from regular journal blocks and
checks that a transaction doesn't exceed the declared number.
Before the extent merging patch, a regular block allocation could
free only a very limited number of blocks. Now, with extent
merging, when an extent tree is really big and a few extents
are inserted in a single transaction, such an allocation
can exceed the default revoke credits (8).
The patch uses the number of extents in the transaction to calculate
the potential number of revoke records (max tree depth * default).

Fixes: 0f7e6c02a9 ("LU-16843 ldiskfs: merge extent blocks")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I4967deb56e5aba82b68ffdc91de589fffae6a64a
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53325
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoRM-620 build: New tag 2.14.0-ddn119
Andreas Dilger [Thu, 30 Nov 2023 17:19:08 +0000 (10:19 -0700)]
RM-620 build: New tag 2.14.0-ddn119

New tag 2.14.0-ddn119

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I16137e4ed48ff6a28d9a33b9206ad6c5acab3c34

18 months agoEX-7601 ofd: add remote_pages
Patrick Farrell [Sat, 28 Oct 2023 20:34:25 +0000 (16:34 -0400)]
EX-7601 ofd: add remote_pages

When we round a read to get all of the compressed chunks,
the number of local and the number of remote pages will
differ.  We need to make sure we do the checksum and data
transfer using the number of remote pages, not the number of
local pages.

This patch calculates the number of remote pages and uses it
accordingly.  This doesn't do anything yet, but when we
round the local read to include the whole compressed chunk,
this will be needed.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I4875b02016570d227b3b926efd117f0a7cda41b4
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52878
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoEX-7601 ofd: add chunk_size to preprw_read
Patrick Farrell [Fri, 27 Oct 2023 19:29:24 +0000 (15:29 -0400)]
EX-7601 ofd: add chunk_size to preprw_read

preprw_read needs chunk size for rounding.  Add this in a
separate patch to keep things trivial, it will be used in
a subsequent patch.

Also use this to add a check in DOM to ensure it doesn't
attempt to do compression.  This should already be
prevented by setstripe, so this is just an extra safety
check.

Test-Parameters: trivial
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I9dc4d1559e5c8be315268a593466571b54c90a96
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52866
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoEX-7601 ofd: convert dt_bufs_get to offset, len
Patrick Farrell [Fri, 27 Oct 2023 19:22:12 +0000 (15:22 -0400)]
EX-7601 ofd: convert dt_bufs_get to offset, len

dt_bufs_get takes a remote niobuf, but just uses the
offset and length for getting pages.

Compression requires rounding the local IO to include the
full compression chunk, which means the local IO does not
match the remote niobuf any more.

So we modify dt_bufs_get to take an offset and length
rather than a remote niobuf, so we can ask for the pages we
need.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I4beaf8207fa00d802c0a339df3de2a3c71154fc7
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52865
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoEX-7601 ofd: round read lock to chunk
Patrick Farrell [Fri, 27 Oct 2023 18:42:41 +0000 (14:42 -0400)]
EX-7601 ofd: round read lock to chunk

For unaligned reads, we need to round the read locking to
cover any leading or trailing chunks.  We do this by
creating a local 'remote niobuf' to describe the rounded
range and doing the locking against that niobuf.
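
The rounding itself is simple arithmetic. A minimal Python sketch,
assuming byte offsets and a fixed chunk size (the function names are
hypothetical and do not correspond to functions in the ofd code):

```python
def is_unaligned(offset, length, chunk_size):
    """True if either end of [offset, offset + length) is not on a chunk boundary."""
    return offset % chunk_size != 0 or (offset + length) % chunk_size != 0


def round_to_chunk(offset, length, chunk_size):
    """Expand [offset, offset + length) outward to chunk boundaries.

    Returns the rounded (offset, length) pair.
    """
    if length <= 0 or chunk_size <= 0:
        raise ValueError("length and chunk_size must be positive")
    start = (offset // chunk_size) * chunk_size
    # Ceiling division without floating point.
    end = -(-(offset + length) // chunk_size) * chunk_size
    return start, end - start


CHUNK = 64 * 1024  # assumed chunk size for the example
print(is_unaligned(4096, 8192, CHUNK))    # True
print(round_to_chunk(4096, 8192, CHUNK))  # (0, 65536)
```

An aligned request is returned unchanged, so the same helper can be
applied to every read without a separate fast path.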

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I8818522c188aca3c5c5eb564da2a8ba8aef18a4b
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52864
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoEX-7601 ofd: identify reads to round
Patrick Farrell [Fri, 27 Oct 2023 18:37:11 +0000 (14:37 -0400)]
EX-7601 ofd: identify reads to round

If the beginning or end of a client read is unaligned, we
must round the locking.  This patch identifies reads where
this is required; the next patch will do the locking.

Print a debug message when such an IO is found, but don't
do anything different - yet.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ibdab35b733225b4b1349ef457f66ca37dcb2d9bf
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52863
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
18 months agoEX-7601 osc: handle partial chunks in decompress_request
Patrick Farrell [Mon, 27 Nov 2023 21:07:49 +0000 (16:07 -0500)]
EX-7601 osc: handle partial chunks in decompress_request

Now that we have compression for incomplete chunks at the
end of files, decompress_request needs to handle these
chunks.  This patch modifies it to understand compressed
chunks which are less than chunk_size pages.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I877550fa0d418def406e0308392a5336ec9f3ab6
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53160
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
18 months agoEX-7601 osc: rewrite compress_request
Patrick Farrell [Tue, 28 Nov 2023 03:35:49 +0000 (22:35 -0500)]
EX-7601 osc: rewrite compress_request

The existing version of compress_request can't handle
discontiguous RPCs.  Rewrite the logic to handle this
case properly.

This also implements kms handling.

If a write chunk ends at the known minimum size, we know
this write is after all other data in the file and so
there is no compressed data under it.  This means we can
compress this chunk.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I8a912d9e279d04c8ff07de39e63a1ec9b490d921
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53111
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
18 months agoLU-16804 tests: load CONFIG at beginning of init_test_env
Sebastien Buisson [Wed, 10 May 2023 12:13:54 +0000 (14:13 +0200)]
LU-16804 tests: load CONFIG at beginning of init_test_env

In order to have all environment variables properly loaded, make
CONFIG loaded at the beginning of init_test_env().

Lustre-change: https://review.whamcloud.com/50914
Lustre-commit: fdbb2bc8495064e1d9e61f02bcfd13b1e6aec8da

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I1c3caa3d582c4b317ff3d0d10fc0103e046ddf17
Reviewed-by: Sarah Liu <sarah@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53250
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months agoLU-16784 tests: fix path to lgss_sk
Sebastien Buisson [Mon, 1 May 2023 23:44:18 +0000 (16:44 -0700)]
LU-16784 tests: fix path to lgss_sk

Find correct path to lgss_sk utility, by looking inside Lustre build
tree if command is not installed on the local node.

Lustre-change: https://review.whamcloud.com/50825
Lustre-commit: 1ba12d98d5b068083fbb855b287d0b6da0ada80d

Test-Parameters: trivial
Test-Parameters: mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2 testlist=sanity-sec env=SHARED_KEY=true
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I23920bb2a44d2ec7e9662e75c23bd5302d8dfee2
Reviewed-by: Sarah Liu <sarah@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53251
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months agoLU-17230 socklnd: treat UNKNOWN netif operstate as UP
Serguei Smirnov [Thu, 26 Oct 2023 18:15:28 +0000 (11:15 -0700)]
LU-17230 socklnd: treat UNKNOWN netif operstate as UP

"UNKNOWN" (IF_OPER_UNKNOWN) operational state doesn't necessarily
mean that the interface can't be used and may be the result of
particular network driver not providing UP/DOWN states,
so it may be incorrect for socklnd to initiate
setting of a "fatal error" flag on a NI using an interface
in "UNKNOWN" operstate.
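
The policy change can be summarized in a short Python sketch. The
state strings are the lowercase values Linux reports in
/sys/class/net/<ifname>/operstate; the function name and the set are
hypothetical and are not how socklnd implements the check.

```python
# Operational states treated as usable. "unknown" is included because some
# network drivers never report "up" or "down".
USABLE_OPERSTATES = {"up", "unknown"}


def link_usable(operstate):
    """Return True if an interface in this operstate should not be flagged as failed."""
    return operstate.strip().lower() in USABLE_OPERSTATES


print(link_usable("unknown"))  # True
print(link_usable("down"))     # False
```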

Lustre-change: https://review.whamcloud.com/52842
Lustre-commit: 6897dbe67c0d7d7554926128a17c65afa1ec0001

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I39dfa01f3758809440d50cf8b6b11555889ef366
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53285
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
18 months agoRM-620 build: New tag 2.14.0-ddn118
Andreas Dilger [Mon, 27 Nov 2023 18:49:46 +0000 (11:49 -0700)]
RM-620 build: New tag 2.14.0-ddn118

New tag 2.14.0-ddn118

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I40b73ac045de1c00d39691913f81a9e4dccdb72b

18 months agoRM-620 build: New tag lipe-2.37
Andreas Dilger [Mon, 27 Nov 2023 18:47:50 +0000 (11:47 -0700)]
RM-620 build: New tag lipe-2.37

New tag lipe-2.37

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I6a55ba0178cc2e2f2bd7566ce3de5f7b231692c5

18 months agoEX-7601 osc: calculate compressed size reduction accurately
Patrick Farrell [Mon, 20 Nov 2023 02:50:21 +0000 (21:50 -0500)]
EX-7601 osc: calculate compressed size reduction accurately

Compression reduces space used if it results in allocating
at least one fewer block on disk.  Modify the checks in
compress_chunk to reflect this, rather than using the
simpler "reduce size by at least 4K" calculation.

Also do not attempt to compress chunks if they are less
than 4K in size, since they can't possibly get a space
benefit.
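
The block-based check can be sketched in Python as below. This is an
illustration, not the code in compress_chunk: the function names are
hypothetical and a 4096-byte allocation block is assumed.

```python
def blocks_needed(nbytes, block_size=4096):
    """Number of allocation blocks needed to store nbytes (ceiling division)."""
    return -(-nbytes // block_size)


def compression_saves_space(raw_size, compressed_size, block_size=4096):
    """True if storing the compressed data allocates at least one fewer block."""
    if raw_size < block_size:
        # Data smaller than one block cannot use fewer than one block.
        return False
    return (blocks_needed(compressed_size, block_size)
            < blocks_needed(raw_size, block_size))


# 5000 bytes need two blocks and 4000 bytes need one, so compression helps
# even though the size is reduced by only 1000 bytes.
print(compression_saves_space(5000, 4000))  # True
# 8192 bytes and 4100 bytes both need two blocks, so nothing is saved.
print(compression_saves_space(8192, 4100))  # False
```

The first case is one that a "reduce size by at least 4K" rule would
reject, which is the kind of small-file gain described in this commit.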

This improved my measured ratio on a version of the Linux
kernel source data set from 1.24 to 1.56, so this is
significant for datasets with many small files.  (This
version of the source had large incompressible files
removed, to focus on smaller files.  The unmodified data set
would not improve as much.)

Note this is still short of our estimates, so either the
estimate or Lustre still needs adjustment.  TBD.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I815706914b88de4f532a674d773769aa3a64d218
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53181
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoEX-7601 ofd: series description and reorder declarations
Patrick Farrell [Tue, 31 Oct 2023 15:11:01 +0000 (11:11 -0400)]
EX-7601 ofd: series description and reorder declarations

Reorder declarations in tgt_brw_read prior to adding things.

This trivial patch is a good place to put the description
of this series, which handles unaligned reads to compressed
files.

-----------------------
These patches handle compression chunk unaligned reads on
the server.

When using compression, the client attempts to send chunk
aligned reads, but sometimes it can't, and the client will
send a read to the server which is not chunk aligned.

In this case, the server must read the full chunk,
decompress it, and provide the requested data to the client.

Here's how we do this.

The server receives a set of remote niobufs describing IO
from the client.  Each remote niobuf (rnb) describes a range
of data the client wants to do IO to.

These are translated to a set of local niobufs on the
server, which we then use to do the read.  For compression,
the server has to read complete chunks on unaligned reads.

So we walk these remote niobufs and identify unaligned read
requests (in ofd_preprw_read), then round them to chunk
size. The server then reads the chunk rounded read request
from storage.

The local niobufs now contain a set of complete compressed
chunks, ie, the raw data from disk.  We need to decompress
the chunks where the client is doing an unaligned read, but
leave the other chunks compressed (because the client will
uncompress them).

So, in obd_decompress_read, we use the remote niobuf to
identify unaligned reads from the client.  We then walk the
local niobufs, identify the chunks which match the unaligned
reads from the client, and decompress them 'in place'.
The decompression uses temporary buffers, but the
decompressed data is placed back in the local niobuf.
(If the data is uncompressed on disk, we of course do not
decompress it.  This happens for incompressible data.)

Now the local niobuf contains some raw chunks and some
chunks which have been decompressed.  This is *more* data
than the client asked for.  Normally, the server local
niobuf contains exactly what the client asked for, so the
server checksums and sends the entire local niobuf.  But
because we read complete chunks, the local niobuf contains
more data than the client requested.

This means we need to identify the subset of the local
niobuf which the client actually wants to read and present
that to the client.

In order to do that, we walk the local niobuf and use the
remote niobufs (the description of the pages the client
needs) and create a special tx niobuf which points to only
the pages the client wants (io_lnb_to_tx_lnb).  Then we use
this tx niobuf for checksum and transfer to the client.
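
The page selection step can be sketched in Python. This is a
simplified model under stated assumptions: pages are represented by
their byte offsets, remote niobufs by (offset, length) pairs, and
select_tx_pages is a hypothetical name, not the io_lnb_to_tx_lnb
implementation.

```python
PAGE_SIZE = 4096  # assumed page size for the example


def select_tx_pages(local_page_offsets, remote_ranges, page_size=PAGE_SIZE):
    """Return the local pages that overlap any client-requested byte range."""
    tx_pages = []
    for page_off in local_page_offsets:
        page_end = page_off + page_size
        for r_off, r_len in remote_ranges:
            # Two half-open ranges overlap if each starts before the other ends.
            if page_off < r_off + r_len and r_off < page_end:
                tx_pages.append(page_off)
                break
    return tx_pages


# The server read one full 64KiB chunk, but the client asked for 8KiB at 4KiB.
local_pages = list(range(0, 64 * 1024, PAGE_SIZE))
print(select_tx_pages(local_pages, [(4096, 8192)]))  # [4096, 8192]
```

Only the two selected pages would be checksummed and transferred; the
remaining pages of the chunk stay local to the server.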

Test-Parameters: trivial
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ic89dcef7e169879725caa6cdef4619b9a76b2b37
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52915
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
18 months agoEX-7601 ofd: rename map_remote_to_local
Patrick Farrell [Fri, 27 Oct 2023 19:01:06 +0000 (15:01 -0400)]
EX-7601 ofd: rename map_remote_to_local

osd_map_remote_to_local implies some complex role, but in
fact what this does is initialize the fields of the
local niobuf structs to represent the requested range.

This *may* be the same as a remote niobuf, but it also
isn't in some cases.  Name it osd_init_lnbs instead.

Test-Parameters: trivial
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I0d3b5f24e42ee8dc962437daea7cf9347ccb9059
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52861
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
18 months agoEX-7601 ofd: rename 'local' in thread_big_cache
Patrick Farrell [Sat, 28 Oct 2023 17:48:04 +0000 (13:48 -0400)]
EX-7601 ofd: rename 'local' in thread_big_cache

It's not a big deal since it's only used a few times, but
let's give this variable a descriptive name.

Test-Parameters: trivial
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ide136cd42e885d59f1a2e4ce22a2e7449faca3f9
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52874
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
18 months agoEX-7601 ofd: do not overwrite rc in unmerge_chunk
Patrick Farrell [Sat, 28 Oct 2023 20:30:56 +0000 (16:30 -0400)]
EX-7601 ofd: do not overwrite rc in unmerge_chunk

unmerge_chunk should not be responsible for setting the
lnb rc, because this overwrites the result of any previous
activity on the lnb.  Plus, unmerge_chunk can't fail.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Id1ce590c7f1da3ab7faddbd685d264a33c08d639
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52876
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
18 months agoEX-7601 osc: allow multiple chunks in read
Patrick Farrell [Thu, 16 Nov 2023 23:26:26 +0000 (18:26 -0500)]
EX-7601 osc: allow multiple chunks in read

It's rare, but reads can sometimes have multiple
discontiguous chunks.  Update decompress_request to
handle this case.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I880af95db285dce76db3610e8140a0f54baa401b
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53159
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoEX-7601 osc: rename pages_in_chunk
Patrick Farrell [Thu, 16 Nov 2023 20:37:57 +0000 (15:37 -0500)]
EX-7601 osc: rename pages_in_chunk

Chunks can have variable numbers of pages in them.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: If199d777367569e62c21305f6e4b9f3e4cce6d06
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53158
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoEX-7601 obd: move type switching to alloc_compr callers
Patrick Farrell [Sat, 11 Nov 2023 20:39:22 +0000 (15:39 -0500)]
EX-7601 obd: move type switching to alloc_compr callers

The code is much cleaner if we can eliminate applied type
and handle that issue once per compression or decompression
rather than for every chunk.  This requires moving the type
switching inside alloc_compr.  (Also improve some error
messages - alloc_compr can fail with ENOMEM as well.)

The compression code currently allocates a transform for
every chunk on the client.  This is relatively cheap, but
it also complicates the code by repeatedly checking if a
particular compression type is supported (this is the
"applied type" code).

Moving alloc_compr to compress/decompress request makes the
code much simpler.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I162e81577db721a9715d57b3f262fcabbcbf308a
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53103
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoEX-8590 lipe: Use only one client for the test
Alexandre Ioffe [Sun, 26 Nov 2023 19:42:14 +0000 (11:42 -0800)]
EX-8590 lipe: Use only one client for the test

Use only one client machine for hot-pools tests 75a, b, c.

Test-Parameters: trivial testlist=hot-pools
Test-Parameters: trivial testlist=hot-pools env=ONLY=75a
Test-Parameters: trivial testlist=hot-pools env=ONLY=75b
Test-Parameters: trivial testlist=hot-pools env=ONLY=75c
Test-Parameters: trivial testlist=hot-pools env=ONLY=75a
Test-Parameters: trivial testlist=hot-pools env=ONLY=75b
Test-Parameters: trivial testlist=hot-pools env=ONLY=75c
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Change-Id: Icfa958474ec928faeec63029a2d5983cea650bb7
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53240
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoEX-8466 tests: limit 'cmp' output in sanity-pcc.sh
Andreas Dilger [Sat, 25 Nov 2023 05:27:36 +0000 (22:27 -0700)]
EX-8466 tests: limit 'cmp' output in sanity-pcc.sh

Limit the number of lines printed by 'cmp' when there is an error
comparing two files.  Often the files are multiple MB in size, and
printing 1-32M lines of output when the test fails is not useful.

Instead, print the first 66000 lines of output by default, which is
enough to see a full 64KiB plus some lines to see if more than 64KiB
of data is incorrect.  This is controlled by the CMP_LINES variable.

Test-Parameters: trivial testlist=sanity-pcc
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I80f4d5d3460d531ab63788185a2c88e79415a801
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53239
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
18 months agoLU-17312 tests: skip conf-sanity test_53 in interop
Andreas Dilger [Fri, 24 Nov 2023 07:48:44 +0000 (00:48 -0700)]
LU-17312 tests: skip conf-sanity test_53 in interop

Skip conf-sanity test_53 in interop because older servers cannot
stop any running service threads above threads_max.

Remove old test interop for servers < 2.3.

Lustre-change: https://review.whamcloud.com/53226
Lustre-commit: TBD (from d029a1cb45ac440e580c177866f0e9766444d8f1)

Test-Parameters: trivial testlist=conf-sanity
Test-Parameters: testlist=conf-sanity env=ONLY=53 serverversion=EXA5
Fixes: 183cb1e3cd ("LU-947 ptlrpc: allow stopping threads above threads_max")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ia95405060c607c7a070720ed32a7a43b1c3ebbe5
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53227
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
18 months agoEX-7600 osd: save compressed object size on zfs
Artem Blagodarenko [Thu, 2 Nov 2023 22:20:52 +0000 (22:20 +0000)]
EX-7600 osd: save compressed object size on zfs

"osc: save compressed object size" added means to transfer
object size to the osd and added ldiskfs support.

This patch adds saving the object size to the ZFS backend.
Currently this fix is submitted as a separate patch for
testing purposes, but it can be merged into the main patch later.

Signed-off-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Test-Parameters: trivial testlist=sanity fstype=zfs
Change-Id: I99e29e3f756a070b5f3cece12c4ca58f668a2ecf
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52958
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoEX-8671 tests: use smaller files in sanity-pcc/103+104
Andreas Dilger [Thu, 23 Nov 2023 01:29:08 +0000 (18:29 -0700)]
EX-8671 tests: use smaller files in sanity-pcc/103+104

Running fallocate is fast, but the actual PCC data copy may be slow.
Use smaller test files for sanity-pcc test_103 and test_104 to speed
up testing, and also wait longer in case the copy is slow.

Add some extra debugging on failure so we can see the file attach
state on failure, in case there is something wrong with the parsing.

Test-Parameters: trivial testlist=sanity-pcc
Test-Parameters: testlist=sanity-pcc env=ONLY=103,ONLY_REPEAT=100
Test-Parameters: testlist=sanity-pcc env=ONLY=104,ONLY_REPEAT=100
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I09f159810a778b8ef2bab93d0e2869237a3ebbe5
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53212
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Feng Lei <flei@whamcloud.com>
18 months agoEX-7601 osd: osd_bufs_put does not always handle all pages
Patrick Farrell [Wed, 18 Oct 2023 17:40:51 +0000 (13:40 -0400)]
EX-7601 osd: osd_bufs_put does not always handle all pages

osd_bufs_put asserts that the dio pages used after are
always zero, but there's no reason for this to be true and
compression specifically violates this by using 1 page at
a time.

Without this patch, we hit this assert and crash when
nonrotational = 1.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: If6bdb11f254c260e2da4cabe11a82693a468e6fb
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52750
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
18 months agoLU-17251 osp: start OST object precreate earlier
Andreas Dilger [Sun, 26 Nov 2023 05:58:09 +0000 (22:58 -0700)]
LU-17251 osp: start OST object precreate earlier

If the OST object precreate count gets large (usually due to high
MDT file create workload, but sometimes also forced during testing)
then send an OST_CREATE RPC sooner when the number of precreated
objects gets low.

Currently the MDS will wait until 1/2 of the precreated OST objects
are consumed, but if create_count = 10000, then this can put bursty
create workloads on the OST.  Instead, send an OST_CREATE RPC when
the precreate pool is at most 1024 objects below target, so that the
MDS keeps its precreated pool more full and the OST doesn't have to
create so many objects at once (which also locks object directories
for a longer time).

Don't set opd_force_creation=true when osp.*.create_count is set
larger, and instead rely on the improved precreate check to force
OST object creation to start sooner, as opd_force_creation=true
can cause the OSP precreation to stop completely in some cases.
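
The threshold change described above can be sketched as follows. This
is only an illustration based on the description in this message, not
the code from the patch: the function names and the PRECREATE_MAX_DEFICIT
macro are hypothetical, and 'available' stands for the number of
precreated objects not yet consumed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cap on how far the pool may drop below its target. */
#define PRECREATE_MAX_DEFICIT 1024U

/* Old rule: refill only once half of the precreated objects are consumed. */
static bool precreate_needed_old(unsigned int available,
				 unsigned int create_count)
{
	return available < create_count / 2;
}

/* New rule: refill once the pool is min(create_count / 2, 1024) short. */
static bool precreate_needed_new(unsigned int available,
				 unsigned int create_count)
{
	unsigned int deficit_limit = create_count / 2;

	if (deficit_limit > PRECREATE_MAX_DEFICIT)
		deficit_limit = PRECREATE_MAX_DEFICIT;

	/* deficit_limit never exceeds create_count, so no unsigned underflow */
	assert(deficit_limit <= create_count);
	return available < create_count - deficit_limit;
}
```

With create_count = 10000 the old rule waits until fewer than 5000
objects remain, while the sketched new rule asks for more objects as
soon as fewer than 8976 remain.  For small create_count values both
rules give the same answer.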

Lustre-change: https://review.whamcloud.com/53245
Lustre-commit: TBD (from 6ffb849d7086a2b2ae48f274d4f5b1b8fbf83fe2)

Test-Parameters: testlist=sanity env=ONLY=1-130,HONOR_EXCEPT=y
Test-Parameters: testlist=sanity env=ONLY=1-130,HONOR_EXCEPT=y
Test-Parameters: testlist=sanity env=ONLY=1-130,HONOR_EXCEPT=y
Test-Parameters: testlist=sanity env=ONLY=1-130,HONOR_EXCEPT=y
Test-Parameters: testlist=parallel-scale env=ONLY=rr_alloc,ONLY_REPEAT=10
Test-Parameters: testlist=parallel-scale env=ONLY=rr_alloc,ONLY_REPEAT=10
Test-Parameters: testlist=parallel-scale env=ONLY=rr_alloc,ONLY_REPEAT=10
Test-Parameters: testlist=parallel-scale env=ONLY=rr_alloc,ONLY_REPEAT=10
Fixes: df5b4c0a8b ("LU-17251 osp: force precreate if create_count grows")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Id2d12636d535485919ca5eec3adb18b1e6ce7057
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53244
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months agoRM-620 build: New tag 2.14.0-ddn117
Andreas Dilger [Fri, 24 Nov 2023 09:35:29 +0000 (02:35 -0700)]
RM-620 build: New tag 2.14.0-ddn117

New tag 2.14.0-ddn117

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ida37fd66ddfd7331efbc3a2276ddaf0f574f5de5

18 months agoLU-16468 llite: protect layout before read IO going
Bobi Jam [Fri, 13 Jan 2023 04:36:01 +0000 (12:36 +0800)]
LU-16468 llite: protect layout before read IO going

Before the read IO starts, file_read_confine_iter() calls
lov_attr_get() to get the proper kms (known minimum size of the file).
lov_attr_get() presumes that it is called under ongoing IO, which
protects the layout from changing, but that is not the case here.

Lustre-change: https://review.whamcloud.com/49622
Lustre-commit: from e050b91c6c471d3576eba3bbf4f3c31aef644e3f

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I1b36ec6e158331e63e8026ee2b986d5a7e3cb6dc
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/49623
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoEX-8386 lipe: Remove cruft from systemd services
Nathaniel Clark [Wed, 15 Nov 2023 21:09:10 +0000 (16:09 -0500)]
EX-8386 lipe: Remove cruft from systemd services

Remove After=rust-iml-agent.service

rust-iml-agent is deprecated and no longer needed.

Change-Id: Icd0e79dbd417e98beb07f8546487d20fa5f6bb62
Signed-off-by: Nathaniel Clark <nclark@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53152
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Feng Lei <flei@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoEX-8282 lfs: migrate compressed file without stripe info
Bobi Jam [Wed, 8 Nov 2023 10:41:21 +0000 (18:41 +0800)]
EX-8282 lfs: migrate compressed file without stripe info

When 'lfs migrate' is run on a file without specifying stripe info,
it uses the layout of the file itself as the target layout template.
llapi_layout_get_by_xattr() tries to convert LOV_PATTERN_* values
to user scope LLAPI_LAYOUT_* values, but LOV_PATTERN_COMPRESS
is missed in this conversion.

This patch adds a function llapi_pattern_from_lov() to handle this
conversion specifically.

This patch also adds more error messages for llapi_layout_file_open().

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I49a43cc7761cd2baed7a5da7d4e7cff2152ff9bb
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53039
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoEX-7601 osc: remove &pga usage in compress_request
Patrick Farrell [Tue, 21 Nov 2023 17:34:31 +0000 (12:34 -0500)]
EX-7601 osc: remove &pga usage in compress_request

The usage of 'pga' and '&pga' in compress_request is
confusing, but also, compress_request modifies &pga by
allocating a new compressed page array.  Except if we fail
in compress_request, we free that new page array.

This means failing in compress_request replaces 'pga' with
a pointer to freed memory.  Instead, create an explicit
cpga pointer in the caller and use that.  This allows
compress_request to fail safely.
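
The safe-failure pattern described above can be sketched roughly as
below.  This is not the osc code: 'struct page_desc', compress_pages()
and the 'simulate_failure' argument are hypothetical stand-ins used only
to show that the output pointer is written on success alone.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a page descriptor. */
struct page_desc {
	int id;
};

/*
 * Hypothetical stand-in for compress_request(): build a new page array
 * from 'pga' and publish it through 'cpga_out' only on success.
 */
static int compress_pages(struct page_desc **pga, size_t count,
			  struct page_desc ***cpga_out, int simulate_failure)
{
	struct page_desc **cpga;
	size_t i;

	assert(pga != NULL && cpga_out != NULL);

	cpga = calloc(count, sizeof(*cpga));
	if (cpga == NULL)
		return -ENOMEM;

	for (i = 0; i < count; i++)
		cpga[i] = pga[i];	/* real code would compress here */

	if (simulate_failure) {
		free(cpga);		/* the caller's pointers stay untouched */
		return -EIO;
	}

	*cpga_out = cpga;
	return 0;
}
```

Because the caller passes a separate cpga pointer, a failure leaves both
'pga' and the caller's cpga exactly as they were, so nothing ends up
pointing at freed memory.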

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Idaf592103c57b0e9ce76ab520a69b819d4f37be9
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53120
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoEX-7601 osc: give compress_request explicit success
Patrick Farrell [Tue, 21 Nov 2023 17:33:06 +0000 (12:33 -0500)]
EX-7601 osc: give compress_request explicit success

Compress_request has explicit failure handling, but the
success handling just follows the failure handling.  This is
confusing - on failure, we do:
page_count = *pcount
then immediately do:
*pcount = page_count

It also sets *orig_pga = pga on success OR failure, which
is wrong because compress_request may have modified pga and
then failed.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I121ec71cfe35babc4a572951e93f7581887ade80
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53119
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoEX-7601 osc: rearrange compress_request
Patrick Farrell [Tue, 21 Nov 2023 17:32:20 +0000 (12:32 -0500)]
EX-7601 osc: rearrange compress_request

A trivial rearrangement of compress_request to make it
more readable before redoing the core logic.

Test-Parameters: trivial
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I1d34cd2a2a6d84bc30cc7dae8eb07586c4837f7d
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53110
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoEX-7601 osc: replace assert with error
Patrick Farrell [Mon, 13 Nov 2023 04:20:01 +0000 (23:20 -0500)]
EX-7601 osc: replace assert with error

We shouldn't assert on values read from storage, instead if
they are incorrect, we should give EIO.

Test-Parameters: trivial
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Icda213e3c5a90a848c9b008788e92ee49e2efcb1
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53108
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoEX-7601 osc: variable cleanup in decompress_req
Patrick Farrell [Mon, 13 Nov 2023 04:13:10 +0000 (23:13 -0500)]
EX-7601 osc: variable cleanup in decompress_req

Use type and lvl variables in decompress_request.

Remove an unused variable and an assert which can never
fire.

Test-Parameters: trivial
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ieff57411a2a41215fd368d731614801bd0f43e38
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53107
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoEX-7601 obd: move module load to function
Patrick Farrell [Sat, 11 Nov 2023 20:21:01 +0000 (15:21 -0500)]
EX-7601 obd: move module load to function

This is a trivial code change to make alloc_compr a bit
shorter.

Test-Parameters: trivial
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I0a790afe7afebde1d223420d9a578529da6ff7e5
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53102
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoEX-7601 ofd: make compress_chunk take chunk_bits
Patrick Farrell [Fri, 10 Nov 2023 22:21:52 +0000 (17:21 -0500)]
EX-7601 ofd: make compress_chunk take chunk_bits

Chunk bits is used everywhere, so have compress_chunk convert
to log bits rather than having the callers do it.

Test-Parameters: trivial
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ic01bb749425cb95d9c5717965d692a18138ceeb7
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53100
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoEX-7601 osc: cleanup compression variables
Patrick Farrell [Fri, 10 Nov 2023 22:19:00 +0000 (17:19 -0500)]
EX-7601 osc: cleanup compression variables

Make usage of the compression variables more readable.

Test-Parameters: trivial
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I6daff56b56877c8f36e02303cc0579ba7faa731b
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53099
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoEX-7601 osc: rename 'done'
Patrick Farrell [Fri, 10 Nov 2023 22:10:34 +0000 (17:10 -0500)]
EX-7601 osc: rename 'done'

Rename the ambiguous 'done' and remove it where not used.

Test-Parameters: trivial
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I8fb88b7a91fcc7dbd5ce2d29a61c18330fc0cda3
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53098
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoEX-7601 osc: rename pages_in_chunk
Patrick Farrell [Fri, 10 Nov 2023 22:07:39 +0000 (17:07 -0500)]
EX-7601 osc: rename pages_in_chunk

Use the more standard pages_per_chunk.

Test-Parameters: trivial
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I47e0995fe8aa8d1a9a610669d6cd4c39559b6fa4
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53097
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoEX-7600 osc: use pages_left in unmerge_chunk
Patrick Farrell [Sun, 12 Nov 2023 19:52:28 +0000 (14:52 -0500)]
EX-7600 osc: use pages_left in unmerge_chunk

Since we have compressed chunks < chunk_size (if they're
after EOF), we must use pages_left in unmerge_chunk or it
will go off the end of the page array.

This also lets us remove the workaround where unmerge_chunk
would skip pages that were not present.  unmerge_chunk
always works with a known and complete set of pages, so this
check is unneeded.

We should also check that our count of bytes is correct
when we finish.
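
A rough sketch of the bounded copy and the final byte-count check is
shown below.  It is not the real unmerge_chunk(): the function name,
its signature and SKETCH_PAGE_SIZE are assumptions for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical page size for this sketch. */
#define SKETCH_PAGE_SIZE 4096U

/*
 * Copy 'bytes' of decompressed data from 'src' into 'pages', touching at
 * most min(pages_per_chunk, pages_left) entries of the page array.
 */
static int unmerge_sketch(const char *src, size_t bytes, char **pages,
			  size_t pages_per_chunk, size_t pages_left)
{
	size_t npages = pages_per_chunk < pages_left ?
			pages_per_chunk : pages_left;
	size_t copied = 0;
	size_t i;

	assert(src != NULL && pages != NULL);

	for (i = 0; i < npages && copied < bytes; i++) {
		size_t len = bytes - copied;

		if (len > SKETCH_PAGE_SIZE)
			len = SKETCH_PAGE_SIZE;
		memcpy(pages[i], src + copied, len);
		copied += len;
	}

	/* Check that the count of bytes is correct when we finish. */
	return copied == bytes ? 0 : -EINVAL;
}
```

Bounding the loop by pages_left keeps a short final chunk from walking
off the end of the page array, and the byte-count check turns a
mismatch into an error instead of silent truncation.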

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I88896307990ff839514e54e9a7e18390a457e5d8
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53095
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoEX-7601 osc: only set compressed flag on compressed pages
Patrick Farrell [Mon, 13 Nov 2023 16:18:49 +0000 (11:18 -0500)]
EX-7601 osc: only set compressed flag on compressed pages

The code accidentally sets the compressed flag on all
pages processed through fill_cpga, even if they're not
compressed.  Oops.

Also stop setting pg->index on the pages in the compressed
pga, this is only used by encryption and that's no longer
supported with compression.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I313fd943a18b71cd52493852a6884f30d187e52f
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53118
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
18 months agoEX-7601 osc: remove cpga fill bits
Patrick Farrell [Fri, 10 Nov 2023 19:07:07 +0000 (14:07 -0500)]
EX-7601 osc: remove cpga fill bits

cpga fill bits are not needed now that we don't support
compression and encryption together.

Test-Parameters: trivial
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I13c2278e085e9b288bd896585947e28e2ea505ca
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53082
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoEX-7601 ofd: add obd level compression lib
Patrick Farrell [Wed, 1 Nov 2023 21:07:23 +0000 (17:07 -0400)]
EX-7601 ofd: add obd level compression lib

Some compression functions will be used by several areas of
Lustre, so they need to be in obdclass.

This moves merge_chunk and unmerge_chunk there and adds the
ability for them to merge lnbs.  This is used in a future
patch.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: If4a318119bb7685e41adb9f3b31a66074031e6ac
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52938
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoEX-7601 llite: restrict readahead to eof
Patrick Farrell [Tue, 14 Nov 2023 22:58:16 +0000 (17:58 -0500)]
EX-7601 llite: restrict readahead to eof

Compressed file readahead rounding needs to come before
readahead is limited to EOF.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I4e9e7fe63301c08efcb05f170726735593a9431d
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53137
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months agoLU-16032 tests: restore delay_unlink_mb in sanity/360
Andreas Dilger [Thu, 23 Nov 2023 22:56:00 +0000 (15:56 -0700)]
LU-16032 tests: restore delay_unlink_mb in sanity/360

Restore the original value of osd-ldiskfs.*.delay_unlink_mb after
sanity test_360 is finished, so that it doesn't have an impact on
later tests running, in particular sanity-quota.sh was seeing some
delay in freeing quota for files that were just deleted.

Lustre-change: https://review.whamcloud.com/53218
Lustre-commit: TBD (from 8fa0580fd64fe7cbe969817ece87a161c517c4c3)

Test-Parameters: trivial testlist=sanity-quota
Fixes: a772e90243 ("LU-16032 osd: move unlink of large objects to separate thread")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I7c1ab02262afdef2fc51f9fbc3932d954a4f8304
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53219
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months agoLU-15777 hsm: set changelog error for restore layout swap failure
Nikitas Angelinas [Wed, 11 May 2022 22:54:08 +0000 (15:54 -0700)]
LU-15777 hsm: set changelog error for restore layout swap failure

Set the error code in the changelog record generated, if the layout swap
fails at the end of an HSM restore operation. Also, handle error code
overflow inside hsm_set_cl_error(), so that callers don't need to do
this themselves.

Lustre-change: https://review.whamcloud.com/47121
Lustre-commit: 09fe64719b888cd212b6cffe923545b7650f230f

Suggested-by: Olaf Weber <olaf.weber@hpe.com>
Suggested-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Signed-off-by: Nikitas Angelinas <nikitas.angelinas@hpe.com>
Change-Id: I4ed2ebffa3bc1c6a0f87ea9f13734e344f77006f
HPE-bug-id: LUS-10863
Test-Parameters: testlist=sanity-hsm,sanity-pcc
Reviewed-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-by: Etienne AUJAMES <eaujames@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53213
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoLU-17115 quota: fix race of acquiring locks in qmt
Hongchao Zhang [Thu, 26 Oct 2023 12:46:44 +0000 (20:46 +0800)]
LU-17115 quota: fix race of acquiring locks in qmt

In qmt_delete_qid and qmt_reset_qid, the order of acquiring
the lquota_entry lock and the journal is different from that
in qmt_dqacq0, which could cause a deadlock in some cases.

Lustre-change: https://review.whamcloud.com/52371
Lustre-commit: ee0e9447e7022e2caa8b161657d505e17ccdc4a1

Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: Ic439f2c5d6ca22429422b87f0dde65e0d2e6113d
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53047
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months agoLU-16097 quota: release preacquired quota when over limits
Hongchao Zhang [Thu, 19 Oct 2023 06:33:47 +0000 (14:33 +0800)]
LU-16097 quota: release preacquired quota when over limits

The pre-acquired quota on each MDT or OST should be released when
the whole quota is over limits, for instance, after the quota limits
had been decreased for some quota ID by Administrator.

Lustre-change: https://review.whamcloud.com/48576
Lustre-commit: 57ac32a22372065b789ca491a568f075e755d339

Test-Parameters: testlist=sanity-quota
Test-Parameters: testlist=sanity-quota
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: I6263b835d4ae6a3fd03f9a2bc4f463949cbc74d4
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53070
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoLU-17142 mgc: reconnection without pinger
Alexander Boyko [Tue, 22 Aug 2023 09:53:14 +0000 (05:53 -0400)]
LU-17142 mgc: reconnection without pinger

When MGS was offline for some time, AT is increased and
connection request deadline is high. Reconnect with a pinger
waits a request deadline for a next attempt. A situation is
worse with a failover partner, when different connections are used.
Reconnection could fail with local MGS too.

Here is the error when MGC could not connect to a local MGS, MDT
combined with MGS.

    LustreError: 15c-8: MGC90@kfi:
    Confguration from log kjlmo12-MDT0000 failed from MGS -5.

The patch forces reconnection with import invalidate and aborts
inflight requests.

ptlrpc_recover_import() aborts waiting for disconnect import state.
But disconnect happens between connection attempt and it is valid.
This is fixed.

Reset Adaptive Timeout when local MGS starts. It allows MGC to
reconnect efficiently.

mgs_barrier_gl_interpret_reply() should handle -EINVAL from a client,
which means the client doesn't have a lock.

Lustre-change: https://review.whamcloud.com/52498
Lustre-commit: 867ba433e3a0fce4a1b2f8d37a91d550ada41a26

HPE-bug-id: LUS-11633
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: Ie631e04fb3e72900af076cf7f268f20f7b285445
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53116
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
18 months agoRM-620 build: New tag 2.14.0-ddn116
Andreas Dilger [Wed, 22 Nov 2023 21:11:28 +0000 (14:11 -0700)]
RM-620 build: New tag 2.14.0-ddn116

New tag 2.14.0-ddn116

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Iaf3d0d8a468b44c0bd179bc729fc66483cb45581

18 months agoRM-620 build: New tag 2.14.0-2.14.0-ddn116
Andreas Dilger [Wed, 22 Nov 2023 21:10:48 +0000 (14:10 -0700)]
RM-620 build: New tag 2.14.0-2.14.0-ddn116

New tag 2.14.0-2.14.0-ddn116

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I752cf0dfd78de778fe34787b2e026fec0277f610

18 months agoEX-8236 pcc: abort data copy via ll_fid_path_copy
Qian Yingjin [Fri, 10 Nov 2023 09:23:46 +0000 (04:23 -0500)]
EX-8236 pcc: abort data copy via ll_fid_path_copy

For data copying via ll_fid_path_copy in direct I/O mode in user
space, the client calls llapi_pcc_state_fd() to obtain the file
PCC state. If it is marked with PCC_STATE_FL_ATTACH_ABORTING, the
data copy process ll_fid_path_copy exits immediately.
To reduce the overhead of these checks, we do not check on every
data copy iteration; instead, we check once every fixed number of
I/Os (32 by default). With an I/O size of 32MiB, this means one
check per second at 1GiB/s. There may be some time lag before the
copy tool finally quits.
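
The periodic check can be pictured with the sketch below.  It does not
call llapi_pcc_state_fd(); instead a hypothetical callback stands in
for the state query, and copy_loop(), 'struct abort_probe' and
ABORT_CHECK_INTERVAL are names invented for this illustration.  With
32 I/Os of 32MiB between checks, 1GiB is copied per check, which is
where the figure of one check per second at 1GiB/s comes from.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* I/Os between state checks (32 by default, per the description). */
#define ABORT_CHECK_INTERVAL 32

/* Hypothetical callback standing in for the PCC state query. */
typedef bool (*abort_check_fn)(void *arg);

/* Test probe: reports "aborting" on its abort_on_check-th call. */
struct abort_probe {
	size_t checks;
	size_t abort_on_check;
};

static bool probe_aborting(void *arg)
{
	struct abort_probe *probe = arg;

	probe->checks++;
	return probe->checks >= probe->abort_on_check;
}

/* Returns the number of I/Os completed before finishing or aborting. */
static size_t copy_loop(size_t total_ios, abort_check_fn aborting, void *arg)
{
	size_t done = 0;

	assert(aborting != NULL);

	while (done < total_ios) {
		/* Query the state only once per ABORT_CHECK_INTERVAL I/Os. */
		if (done % ABORT_CHECK_INTERVAL == 0 && aborting(arg))
			break;
		/* one read/write of the data copy would happen here */
		done++;
	}

	return done;
}
```

An abort flagged between two checks is only noticed at the next
multiple of the interval, which is the time lag mentioned above.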

Change-Id: I20631e5481a7e97d7a1ed0729bcd269ef6248a2c
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53073
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoEX-7331 csdc: prohibit set compression upon encrypted file
Bobi Jam [Fri, 10 Nov 2023 09:17:50 +0000 (17:17 +0800)]
EX-7331 csdc: prohibit set compression upon encrypted file

Setting a compression layout component on an encrypted file is not
allowed for now.

This patch adds this check on the MDS when creating a file with a
layout, or when adding/merging a new mirror to an existing file.

Test-Parameters: testlist=sanity-sec env=ONLY=67,PTLDEBUG=-1
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I60d9f4bfce3a498f1eb3994c6276afb9d89c99a7
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53075
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoEX-8584 tests: check and wait lpcc_purge scanning ends
Lei Feng [Fri, 17 Nov 2023 07:53:21 +0000 (15:53 +0800)]
EX-8584 tests: check and wait lpcc_purge scanning ends

Check lpcc_purge status to make sure it finishes at least
one round of scanning.

Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial testlist=sanity-pcc env=ONLY="200 201 202",ONLY_REPEAT=50
Change-Id: I8e6f50393d1a3cbb7a1bc976942631db6ecceb67
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53167
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoLU-16284 utils: lfs getstripe follows symlink
Lei Feng [Tue, 1 Nov 2022 02:57:39 +0000 (10:57 +0800)]
LU-16284 utils: lfs getstripe follows symlink

'lfs getstripe' prints the information of symlink target by default.
With '--no-follow' option it prints the information of symlink itself.

Lustre-change: https://review.whamcloud.com/49003
Lustre-commit: af32b516593dbf2a8e7a85d885c33fd017926ada

Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial
Change-Id: I6cef01af5bb2235bdcbf0b5c99af4b9ed5869515
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53139
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
18 months agoLU-17275 kernel: RHEL 8.9 client support
Jian Yu [Mon, 20 Nov 2023 22:32:40 +0000 (14:32 -0800)]
LU-17275 kernel: RHEL 8.9 client support

This patch makes changes to support RHEL 8.9 release
with kernel 4.18.0-513.5.1.el8_9 for Lustre client.

Lustre-change: https://review.whamcloud.com/53071
Lustre-commit: TBD (from 0da16c715a06b6426a6b99c111147fc875784e85)

Test-Parameters: trivial mdtcount=4 mdscount=2 \
clientdistro=el8.9 serverdistro=el8.8 testlist=sanity

Test-Parameters: optional clientdistro=el8.9 serverdistro=el8.8 \
testgroup=full-part-1

Test-Parameters: optional clientdistro=el8.9 serverdistro=el8.8 \
testgroup=full-part-2

Test-Parameters: optional clientdistro=el8.9 serverdistro=el8.8 \
testgroup=full-part-3

Change-Id: Ia3672d134534b877bb6aaffb4cea0339bc55974f
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53089
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>