fs/lustre-release.git
15 months ago LU-17034 quota: lqeg_arr memmory corruption
Sergey Cheremencev [Fri, 25 Aug 2023 06:22:26 +0000 (10:22 +0400)]
LU-17034 quota: lqeg_arr memmory corruption

Fix memory corruption caused by accessing memory beyond the end
of the lqeg_arr array. It could happen when at least one OST has
an index larger than the total number of OSTs, for example on a
system with 4 OSTs with indexes 0001, 0002, 00c9, 00ca. This issue
most often corrupted bucket_table in obd_uuid_hash or obd_nid_hash,
crashing the rhashtable code. However, it could also cause other
panics, depending on what the corrupted neighbouring memory region
was used for.

This patch adds an lge_idx field to each lqe global entry
to store the index of the OST. It is needed to map an OST index
to an array index and so avoid out-of-bounds array access.
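
The mapping described above can be sketched as follows. This is a
minimal illustration, not the patch itself: struct glbl_entry,
glbl_entry_find() and glbl_entry_set_revoke() are hypothetical names,
and only the lge_idx field name comes from the commit message. The
point is that the OST index is searched for, never used as a subscript.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one lqe global entry. */
struct glbl_entry {
    int lge_idx;      /* OST index this slot belongs to */
    int lge_revoke;   /* illustrative per-target state */
};

/*
 * Return the array slot that holds OST index ost_idx, or -1 if the
 * target is not in the array.  Sparse indexes such as 0x00c9 therefore
 * cannot run past the end of an array sized by the number of targets.
 */
int glbl_entry_find(const struct glbl_entry *arr, size_t nr, int ost_idx)
{
    for (size_t i = 0; i < nr; i++) {
        if (arr[i].lge_idx == ost_idx)
            return (int)i;
    }
    return -1;
}

/* Mark a target by OST index; returns 0 on success, -1 if unknown. */
int glbl_entry_set_revoke(struct glbl_entry *arr, size_t nr, int ost_idx)
{
    int slot = glbl_entry_find(arr, nr, ost_idx);

    if (slot < 0)
        return -1;
    assert((size_t)slot < nr);
    arr[slot].lge_revoke = 1;
    return 0;
}
```

With the four OSTs from the example (0001, 0002, 00c9, 00ca), index
00c9 resolves to slot 2 of a 4-element array instead of element 201.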

This patch also adds locking to protect lqe_glbl_data in
qmt_set_revoke and qmt_clear_lgeg_arr_nu. This was
forgotten in 50ff4d1da6.

This patch also starts storing all connected MDTs in the quota
global pool. From this patch on, MDTs are handled the same way
as OSTs stored in the global pool. It is the first step toward
introducing MDT pools.

Add conf-sanity_33c, which reproduces the memory corruption
described above when run without the fix.

Lustre-change: https://review.whamcloud.com/52094
Lustre-commit: 67f90e42889ff22d574e82cc647f6076e48c65a5

Fixes: 50ff4d1da6 ("LU-16772 quota: protect lqe_glbl_data in qmt_site_recalc_cb")
Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Change-Id: Id6e4bcde09d9f32726d69f711eedb82729a2266e
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53810
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
15 months ago LU-17034 revert: "quota: tmp fix against memory corruption"
Sergey Cheremencev [Thu, 18 Jan 2024 19:03:50 +0000 (22:03 +0300)]
LU-17034 revert: "quota: tmp fix against memory corruption"

This reverts commit fdcb1144c95908bbbd0216ec931ac5f222f484a7
as it was a temporary solution. It will be replaced by
"LU-17034 quota: lqeg_arr memmory corruption".

Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Change-Id: I6c057ff7e0f9c8789190c51c14fc370afe0c703c
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53809
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
15 months ago LU-17334 lmv: handle object created on newly added MDT
Lai Siyao [Thu, 7 Dec 2023 12:39:09 +0000 (07:39 -0500)]
LU-17334 lmv: handle object created on newly added MDT

When a new MDT is added to a filesystem without no_create, a new
object may be created on the MDT relatively quickly after it is added,
in particular because the new MDT would be preferred by QOS space
balancing due to its large amount of free space. However, it might
take a few seconds for the addition of the new MDT to be propagated
to all of the clients, so there is a risk that one client creates
a directory on an MDT that another client is not yet aware of, in
which case an error is returned to the application immediately.

This patch fixes the issue by adding lmv_tgt_retry(), which retries
the MDT lookup and waits for some number of seconds for the filesystem
layout to be updated if the MDT index of an existing file/directory is
not found.
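
The retry-and-wait idea can be sketched generically as below. This is
an illustration under assumptions, not the body of lmv_tgt_retry():
tgt_lookup_retry(), the callback types and the fake_* helpers are all
hypothetical, and the wait step is abstracted into a callback so that
the example does not have to sleep.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lookup: returns a target handle, or NULL if this client
 * does not (yet) know the MDT index. */
typedef void *(*tgt_lookup_fn)(void *ctx, unsigned int mdt_idx);
/* Hypothetical wait: blocks until the configuration may have changed. */
typedef void (*tgt_wait_fn)(void *ctx);

/* Look up mdt_idx, waiting and retrying up to max_retries times. */
void *tgt_lookup_retry(tgt_lookup_fn lookup, tgt_wait_fn wait, void *ctx,
                       unsigned int mdt_idx, unsigned int max_retries)
{
    void *tgt;

    assert(lookup != NULL && wait != NULL);
    tgt = lookup(ctx, mdt_idx);
    while (tgt == NULL && max_retries-- > 0) {
        wait(ctx);
        tgt = lookup(ctx, mdt_idx);
    }
    return tgt;
}

/* Fake client used only to exercise the sketch. */
struct fake_client {
    unsigned int known_mdts;   /* MDT indexes below this are known */
    unsigned int waits;        /* number of times we had to wait */
};

void *fake_lookup(void *ctx, unsigned int mdt_idx)
{
    struct fake_client *c = ctx;

    return mdt_idx < c->known_mdts ? c : NULL;
}

/* Pretend one more MDT becomes visible after each wait. */
void fake_wait(void *ctx)
{
    struct fake_client *c = ctx;

    c->waits++;
    c->known_mdts++;
}
```

The retry count is bounded so that a genuinely wrong index still fails
eventually instead of waiting forever.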

Commands that depend on user input, like 'lfs mkdir -i' and 'lfs df',
and round-robin MDT allocation will continue to use lmv_tgt(), which
doesn't retry. Otherwise, a wrong MDT index specified by the user
could hang the command for an extended period of time.

Lustre-change: https://review.whamcloud.com/53363
Lustre-commit: 94a4663db95656ade6b6e695b849cd7763f0bd49

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Idb0cf65e95f665628d6799298732b7a06cde4a86
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54018
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
15 months ago LU-17469 llite: hold object reference in IO
Bobi Jam [Mon, 22 Jan 2024 12:14:56 +0000 (20:14 +0800)]
LU-17469 llite: hold object reference in IO

There could be a race between page write and inode free. Hold
a cl_object reference during the IO to avoid accessing a freed object.

Lustre-change: https://review.whamcloud.com/53819
Lustre-commit: TBD (from a84242bc202e402664a5f5d7461b66c770896851)

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Ic70cc27430e68265aba0662fc68e9bfe2f86cfe1
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53760
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <paf0187@gmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
15 months ago DDN-4630 sec: protect against concurrent mi_ginfo change
Sebastien Buisson [Thu, 22 Feb 2024 12:44:57 +0000 (13:44 +0100)]
DDN-4630 sec: protect against concurrent mi_ginfo change

With the INTERNAL upcall mechanism, we put in the upcall cache the
groups received from the client, by appending them to a list built
from previous requests.
An existing entry is never modified once it is marked as VALID; it is
replaced with a new one, with a larger groups list. However, the group
info associated with an entry can change when updated from NEW to
VALID. This means the number of groups can only grow from 0 (group
info not set) to the current number of collected groups.
In case of concurrent cache entry update, we need to check the group
info and start over adding the groups associated with the current
request.

Fixes: 4515e5365f ("LU-17015 build: rework upcall cache")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ie7088bdbfcae396602b59e2ab07fbfbbb14d96af
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54146
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
15 months ago LU-16297 ptlrpc: don't panic during reconnection
Alexander Boyko [Thu, 3 Nov 2022 11:23:20 +0000 (07:23 -0400)]
LU-16297 ptlrpc: don't panic during reconnection

ptlrpc_send_rpc() could race with ptlrpc_connect_import_locked()
in the middle of an assertion check, leading to a spurious panic.
The assertion first checks

(AT_OFF || imp->imp_state != LUSTRE_IMP_FULL ||

then a reconnect changes the import state and flags, and the
second part is evaluated:

(imp->imp_msghdr_flags & MSGHDR_AT_SUPPORT) ||
!(imp->imp_connect_data.ocd_connect_flags & OBD_CONNECT_AT)))

MSGHDR_AT_SUPPORT is disabled during client reconnection.
Taking a lock in this hot path is undesirable, so the fix changes
the assertion into a report.
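
The change from an assertion to a report can be illustrated with the
sketch below. It assumes a simplified struct fake_import with three
booleans standing in for the real state and flag fields; the function
name and message text are hypothetical. The condition mirrors the one
quoted above, but a failed check is logged instead of aborting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified import; not the real struct obd_import. */
struct fake_import {
    bool full;         /* stands in for imp_state == LUSTRE_IMP_FULL */
    bool msghdr_at;    /* stands in for MSGHDR_AT_SUPPORT being set */
    bool connect_at;   /* stands in for OBD_CONNECT_AT being set */
};

/*
 * Evaluate the adaptive-timeout consistency condition.  A racing
 * reconnect can make it transiently false, so report it and let the
 * caller continue rather than panicking.
 */
bool at_flags_consistent(const struct fake_import *imp, bool at_off)
{
    bool ok;

    assert(imp != NULL);
    ok = at_off || !imp->full || imp->msghdr_at || !imp->connect_at;
    if (!ok)
        fprintf(stderr, "AT flags inconsistent: full=%d msghdr_at=%d connect_at=%d\n",
                imp->full, imp->msghdr_at, imp->connect_at);
    return ok;
}
```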

Lustre-change: https://review.whamcloud.com/49029
Lustre-commit: df31c4c0b39b8845911344e6fadc008bcba40bb1

HPE-bug-id: LUS-10985
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: Ifc9e413c679c3e8a4c8f4f541251bebabae41c82
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54086
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
15 months ago LU-16281 clio: append to non-existent component
Vitaly Fertman [Tue, 5 Jul 2022 21:00:58 +0000 (00:00 +0300)]
LU-16281 clio: append to non-existent component

Appending to a non-existent component should return an error, but it
currently fails with the BUG below because the return value @rc of
lov_io_layout_at() is not checked for < 0:

    stripe_width()) ASSERTION( index < lsm->lsm_entry_count ) failed:
    BUG: unable to handle kernel paging request at ffff99d3c2f74030
    Call Trace:
      lov_stripe_number+0x19/0x40 [lov]
      lov_page_init_composite+0x103/0x5f0 [lov]
      ? kmem_cache_alloc+0x12e/0x270
      cl_page_alloc+0x19f/0x660 [obdclass]
      cl_page_find+0x1a0/0x250 [obdclass]
      ll_write_begin+0x1f7/0xfb0 [lustre]
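
The missing check can be illustrated with a small sketch. The names
struct fake_layout, layout_entry_at() and page_init() are hypothetical
stand-ins, and -ENOENT is chosen only for illustration; the real
lov_io_layout_at() and its error codes are not reproduced here. The
point is that a negative lookup result is returned as an error before
it can be used as an index.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical layout: component i covers offsets below ext_end[i]. */
struct fake_layout {
    int entry_count;
    long long ext_end[4];
};

/* Return the index of the component covering offset, or a negative errno. */
int layout_entry_at(const struct fake_layout *l, long long offset)
{
    for (int i = 0; i < l->entry_count; i++) {
        if (offset < l->ext_end[i])
            return i;
    }
    return -ENOENT;
}

/* Store the component index for offset, or fail cleanly if there is none. */
int page_init(const struct fake_layout *l, long long offset, int *entry_out)
{
    int rc = layout_entry_at(l, offset);

    if (rc < 0)
        return rc;   /* without this check rc would be used as an index */
    assert(rc < l->entry_count);
    *entry_out = rc;
    return 0;
}
```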

Lustre-change: https://review.whamcloud.com/48994
Lustre-commit: 8fdeca3b6faf22c72f6687aa23b86715d39ceeb1

HPE-bug-id: LUS-11075
Signed-off-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Change-Id: I4371f56cd9cdb3429d52a283831fb0a768e5c9c3
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54133
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
15 months ago LU-14441 mdc: check/grab import before access
Alex Zhuravlev [Mon, 13 Dec 2021 08:27:42 +0000 (11:27 +0300)]
LU-14441 mdc: check/grab import before access

Check and grab the import before access, to ensure the import
doesn't disappear while being accessed via procfs.

Lustre-change: https://review.whamcloud.com/41681
Lustre-commit: b8416320b381ae8a6fdd058b0a09ea42ce56d573

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I005c96b349e55646996fd0d265ab4dd1e2b9a1fa
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54126
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
15 months ago LU-17484 gss: reply error for SEC_CTX_INIT on wrong node
Sebastien Buisson [Thu, 8 Feb 2024 12:44:21 +0000 (13:44 +0100)]
LU-17484 gss: reply error for SEC_CTX_INIT on wrong node

When a server receives a SEC_CTX_INIT request for a target that is not
available (either stopping, or not set up yet, or moved to a failover
node), the request gets dropped. This makes the client-side RPC time
out, increasing the time it takes to establish a proper gss context
with the target, because it slows down the HA mechanism that tries
alternate failover NIDs.
Instead of dropping the SEC_CTX_INIT request, the server
needs to send back a proper error reply. The client will then be able
to immediately try alternate failover NIDs, speeding up the
mount/reconnect process and avoiding potential eviction.

Lustre-change: https://review.whamcloud.com/53970
Lustre-commit: 3d635dd3f24421c181aca5673cd81ed8f3e2c622

Test-Parameters: trivial
Test-Parameters: kerberos=true testlist=sanity-krb5
Test-Parameters: testgroup=review-dne-selinux-ssk-part-2
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Id2cefaa7d54729a63c7be13b65d7ace579bcaa78
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54157
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
15 months ago LU-17528 gss: cleanup gss api usage
Sebastien Buisson [Thu, 15 Feb 2024 08:58:16 +0000 (09:58 +0100)]
LU-17528 gss: cleanup gss api usage

Lucid context support has been available since at least
krb5 1.7, and even RHEL7 ships with a more recent version.
So drop support for the non-lucid API, and clean up GSS API usage.

Lustre-change: https://review.whamcloud.com/54063
Lustre-commit: 79a2d8645a28de77c7406ba56889d3a0749b851c

Test-Parameters: trivial
Test-Parameters: kerberos=true testlist=sanity-krb5
Test-Parameters: testgroup=review-dne-selinux-ssk-part-2
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I91fb706d2444c199156423b57a8c1ef24a0c3420
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54156
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
15 months ago LU-17535 gss: fix lsvcgssd crash in krb lib
Bruno Faccini [Tue, 13 Feb 2024 11:14:40 +0000 (12:14 +0100)]
LU-17535 gss: fix lsvcgssd crash in krb lib

This patch fixes the logic that decides whether
gss_delete_sec_context() needs to be called, which depends on the
Kerberos implementation.

The address of snd->ctx, instead of its value, should be passed to
serialize_context_for_kernel()/serialize_krb5_ctx(). This allows each
implementation to set it to GSS_C_NO_CONTEXT if the context has been
destroyed internally, and the cases where it has not can now be
handled in handle_krb().
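
The pass-by-address pattern can be sketched without the GSS library as
below. Everything here is hypothetical: ctx_handle, NO_CONTEXT (which
plays the role of GSS_C_NO_CONTEXT), the two serializers and
handle_request() only model the ownership rule. Because the callee
receives the address of the handle, it can clear it after destroying
the context, and the caller's cleanup then cannot free it twice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct fake_ctx { int dummy; } *ctx_handle;
#define NO_CONTEXT ((ctx_handle)NULL)

int ctx_free_count;   /* counts real frees, for demonstration only */

/* Free the context if it is still live, then clear the caller's handle. */
void ctx_release(ctx_handle *ctx)
{
    if (*ctx == NO_CONTEXT)
        return;
    free(*ctx);
    *ctx = NO_CONTEXT;
    ctx_free_count++;
}

/* An implementation that destroys the context internally. */
int serialize_consuming(ctx_handle *ctx)
{
    ctx_release(ctx);
    return 0;
}

/* An implementation that leaves the context for the caller to destroy. */
int serialize_keeping(ctx_handle *ctx)
{
    (void)ctx;
    return 0;
}

/* Caller: release only what the serializer did not already release. */
int handle_request(int (*serialize)(ctx_handle *))
{
    ctx_handle ctx = calloc(1, sizeof(*ctx));
    int rc;

    if (ctx == NO_CONTEXT)
        return -1;
    rc = serialize(&ctx);
    ctx_release(&ctx);   /* no-op if the serializer cleared the handle */
    assert(ctx == NO_CONTEXT);
    return rc;
}
```

Either serializer results in exactly one free per request.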

Lustre-change: https://review.whamcloud.com/54023
Lustre-commit: f2705c4ec5598ca244bbb08673a1cfefd7342812

Test-Parameters: trivial
Test-Parameters: kerberos=true testlist=sanity-krb5
Test-Parameters: testgroup=review-dne-selinux-ssk-part-2
Signed-off-by: Bruno Faccini <bfaccini@nvidia.com>
Change-Id: I752712168a2c0f0a5a7a496b851d4cddbb7e4236
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54155
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
15 months ago LU-17226 build: create config option for l_getsepol
Gian-Carlo DeFazio [Thu, 16 Nov 2023 23:05:45 +0000 (15:05 -0800)]
LU-17226 build: create config option for l_getsepol

Add a configuration option for l_getsepol.
l_getsepol is built by default unless the --disable-l_getsepol
option is given to configure.
lustre.spec.in builds l_getsepol by default and has its
dependencies as build requirements.

The implicit configuration check for the dependency
openssl-devel is removed and replaced by a BuildRequires.

Lustre-change: https://review.whamcloud.com/52849
Lustre-commit: 2777adcabd1032ddb886f913fa04d82a292ab379

Test-Parameters: trivial
Signed-off-by: Gian-Carlo DeFazio <defazio1@llnl.gov>
Change-Id: If71a2a4a524047edbd2b31e6fac7a42f36a030bf
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54162
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
15 months ago EX-9074 csdc: Provide finer grained enable_compression control
Artem Blagodarenko [Fri, 16 Feb 2024 16:50:08 +0000 (16:50 +0000)]
EX-9074 csdc: Provide finer grained enable_compression control

On all architectures other than aarch64 and ppc64le, enable_compression
is now enabled by default, and the lfs warning message is gone.

To use CSDC on aarch64/ppc64le (at your own risk),
llite.*.enable_compression=1 should be set. The lfs
setstripe command still prints a warning message in this case.

Signed-off-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Test-Parameters: trivial
Change-Id: Ic8edc5bbeb8f9a3cd34ad3fc4e8c78e59f4cc34f
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53894
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Colin Faber <cfaber@ddn.com>
Reviewed-by: Patrick Farrell <paf0187@gmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
15 months ago EX-8993 ofd: do not write 'hole' pages on compression
Patrick Farrell [Tue, 13 Feb 2024 16:45:17 +0000 (11:45 -0500)]
EX-8993 ofd: do not write 'hole' pages on compression

When doing unaligned read-modify-write to a compressed file,
we must round the IO lnb used for write in order to read up
the compressed data for modification.

In some cases, this creates a situation where there are
pages in the write lnb which have no data in them.  It is
important not to write out these pages, because if we do,
this wastes space and can cause incorrect file size.

In most cases, the file size is covered by the client
sending the file size, but if the client does not compress
a particular write, it does not send the size and the server
does not use it.  We could resolve this by having the client
always send size info and have the server always use it, but
it's better to make server writes 'hole' aware, since this
improves space usage.  (And this will be required for the
server to do recompression on read-modify-write, otherwise
no space is gained.)
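
The idea of a 'hole' aware write can be sketched as a filter over the
page array. This is only an illustration: struct fake_page and
pages_with_data() are hypothetical, and the real lnb handling in ofd
is not shown. Pages that carry no data are skipped so they are never
written out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page descriptor: len is the number of valid data bytes. */
struct fake_page {
    size_t len;   /* 0 means the page is a hole with nothing to write */
};

/*
 * Collect the indexes of pages that actually carry data.  Pages with
 * len == 0 are skipped.  Returns the number of indexes stored in out,
 * which must have room for nr entries.
 */
size_t pages_with_data(const struct fake_page *pages, size_t nr, size_t *out)
{
    size_t n = 0;

    for (size_t i = 0; i < nr; i++) {
        if (pages[i].len == 0)
            continue;
        out[n++] = i;
    }
    assert(n <= nr);
    return n;
}
```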

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I66169e205fe4691ed03b2c9b3005ffc4ecd3213d
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53595
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
15 months ago LU-17258 socklnd: stop connecting on too many retries
Serguei Smirnov [Wed, 7 Feb 2024 18:48:08 +0000 (10:48 -0800)]
LU-17258 socklnd: stop connecting on too many retries

If a peer repeatedly rejects connection requests with EALREADY,
assume that it doesn't support as many connections as we're trying
to create. Make sure to stop connecting to the peer altogether and
either continue with already created connections if there's at least
one of each type, or fail.

This helps avoid the assertion:

"ASSERTION( (wanted & ((((1UL))) << (3))) != 0 ) failed"

Lustre-change: https://review.whamcloud.com/53955
Lustre-commit: 02caf7170762d97dac4f367651addc7d90b6eb32

Test-Parameters: trivial testlist=sanity-lnet
Fixes: 5afe3b053 ("LU-17258 socklnd: ensure connection type established upon race")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I6072e91cc36544fc2f56c91cd78f6637cf82ecbc
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54014
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
15 months ago LU-17505 socklnd: return NETWORK_TIMEOUT to LNet on ETIMEOUT
Serguei Smirnov [Mon, 5 Feb 2024 23:27:15 +0000 (15:27 -0800)]
LU-17505 socklnd: return NETWORK_TIMEOUT to LNet on ETIMEOUT

Returning LNET_MSG_STATUS_LOCAL_TIMEOUT to LNet on ETIMEDOUT
causes LNet to only decrement the local NI health score,
while the issue may actually be with the remote NI.

Changing this to return LNET_MSG_STATUS_NETWORK_TIMEOUT
causes LNet to decrement both local NI and peer NI health.
If local NI is ok, it will recover its health score quickly,
but the affected peer NI health is lowered until peer NI is recovered.
This helps LNet select healthy NIs of the same peer in the meantime.
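
The difference between the two statuses can be modelled as below. This
is a toy model under assumptions: the enum values, struct fake_health
and the fixed penalty are hypothetical and do not reproduce LNet's
health accounting. It only shows that a local timeout lowers the local
NI score, while a network timeout lowers both local and peer NI scores.

```c
#include <assert.h>

/* Hypothetical stand-ins for the two LNet message statuses. */
enum fake_msg_status { FAKE_LOCAL_TIMEOUT, FAKE_NETWORK_TIMEOUT };

struct fake_health {
    int local_ni;   /* health score of the local interface */
    int peer_ni;    /* health score of the peer interface */
};

/* Decrement a score by penalty without going below zero. */
int health_dec(int value, int penalty)
{
    return value > penalty ? value - penalty : 0;
}

/* Apply the health penalty implied by a timeout status. */
void health_apply(struct fake_health *h, enum fake_msg_status st, int penalty)
{
    assert(penalty >= 0);
    h->local_ni = health_dec(h->local_ni, penalty);
    if (st == FAKE_NETWORK_TIMEOUT)
        h->peer_ni = health_dec(h->peer_ni, penalty);
}
```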

Lustre-change: https://review.whamcloud.com/53930
Lustre-commit: 099350d6e30218eb68d31cbfc7e9252a112e591f

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I916772477d1fd63571447262880a33830746f002
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53964
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
15 months ago LU-16752 test: improve sanity 413a/b reliability
Lai Siyao [Thu, 22 Feb 2024 18:46:12 +0000 (13:46 -0500)]
LU-16752 test: improve sanity 413a/b reliability

Set qos_maxage to 1 early in test_qos_mkdir() to ensure statfs data is
updated in the round-robin mkdir test, so that the subsequent QoS mkdir
behaves as expected.

Lustre-change: https://review.whamcloud.com/54168
Lustre-commit: TBD (from f22e115c6a468452d4beb40c6530f4cc0627022b)

Test-Parameters: trivial
Test-Parameters: mdscount=2 mdtcount=4 testlist=sanity
Test-Parameters: mdscount=2 mdtcount=4 testlist=sanity
Test-Parameters: mdscount=2 mdtcount=4 testlist=sanity
Test-Parameters: mdscount=2 mdtcount=4 testlist=sanity
Test-Parameters: mdscount=2 mdtcount=4 testlist=sanity
Test-Parameters: mdscount=2 mdtcount=4 testlist=sanity
Test-Parameters: mdscount=2 mdtcount=4 testlist=sanity
Test-Parameters: mdscount=2 mdtcount=4 testlist=sanity
Test-Parameters: mdscount=2 mdtcount=4 testlist=sanity
Test-Parameters: mdscount=2 mdtcount=4 testlist=sanity
Fixes: 233344d451 ("LU-13417 test: generate uneven MDTs early for sanity 413")
Fixes: c1d0a355a6 ("LU-12624 lod: alloc dir stripes by QoS")
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I08f94b5b4e355ffff0704bd0f661bb99a82a9234
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54164
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
16 months ago RM-620 build: New tag 2.14.0-ddn135
Andreas Dilger [Wed, 14 Feb 2024 19:22:38 +0000 (12:22 -0700)]
RM-620 build: New tag 2.14.0-ddn135

New tag 2.14.0-ddn135

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I61d16abd2cf5185d1e05b2e67fd5404adb9f23c9

16 months ago RM-620 build: New tag lipe-2.42
Andreas Dilger [Wed, 14 Feb 2024 19:22:15 +0000 (12:22 -0700)]
RM-620 build: New tag lipe-2.42

New tag lipe-2.42

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Iaff6874778b27e6fa4a7be1f85fa0d47636190d6

16 months ago EX-9156 lipe: Print host name in SSH errors
Alexandre Ioffe [Thu, 8 Feb 2024 04:05:07 +0000 (20:05 -0800)]
EX-9156 lipe: Print host name in SSH errors

Improvement: Add host name in error messages

Test-Parameters: trivial testlist=hot-pools
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Change-Id: I8f552d34d0445ab35d9b978b13b3989411f95cdb
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53966
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
16 months ago DDN-4630 sec: cleanup grouplist alloc for INTERNAL id upcall
Sebastien Buisson [Thu, 8 Feb 2024 17:17:52 +0000 (18:17 +0100)]
DDN-4630 sec: cleanup grouplist alloc for INTERNAL id upcall

With the INTERNAL identity upcall, we are using supplementary groups
provided by the client, by building an array of gid_t.
Clean up this group list allocation, and make sure the size returned
matches the actual size of the allocated array.

Fixes: 4515e5365f ("LU-17015 build: rework upcall cache")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I72cdfc6b76bfd9c2832a5d5e5f72c3aa45cf1efe
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53979
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
16 months ago EX-8927 csdc: check ocd flag for fiemap/lseek
Bobi Jam [Tue, 6 Feb 2024 09:13:27 +0000 (17:13 +0800)]
EX-8927 csdc: check ocd flag for fiemap/lseek

Currently the client does not send fiemap/lseek requests to the OST if
the file is compressed. This patch checks the
OBD_CONNECT2_COMPRESS flag and sends the request if the OST supports
compression, since the server then performs the fiemap/lseek check
itself.

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I478cbf161044165fa31d4caa2336e9949fc626fe
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53935
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
16 months ago EX-8927 csdc: server reject FIEMAP/SEEK_HOLE|DATA on compr obj
Bobi Jam [Fri, 2 Feb 2024 07:15:30 +0000 (15:15 +0800)]
EX-8927 csdc: server reject FIEMAP/SEEK_HOLE|DATA on compr obj

Servers return -EOPNOTSUPP when they receive FIEMAP or
SEEK_HOLE/SEEK_DATA requests for compressed file objects.

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I9f04fbb13a22cc83402d9989daab63a59367ff33
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53886
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
16 months ago LU-14111 obdclass: count eviction per obd_device
Aurelien Degremont [Tue, 13 Oct 2020 14:12:23 +0000 (14:12 +0000)]
LU-14111 obdclass: count eviction per obd_device

Add a new 'obd_eviction_count' counter to obd_device which
is increased every time a client is evicted, which means
every time we call `class_fail_export()`.

Expose this counter through `lctl get_param *.*.eviction_count`
for every target.

Only support recovery-small test 146 for 2.14.0.133+.

Lustre-change: https://review.whamcloud.com/40528
Lustre-commit: 3c69d46e1766480c0ffd1bef840b4e167b4cf88e

Lustre-change: https://review.whamcloud.com/52098
Lustre-commit: b034dd27dd39483e40f91ea82d3f5c62b514ec54

Signed-off-by: Aurelien Degremont <degremoa@amazon.com>
Change-Id: I83b691662285cf2cd937187bffa54de6bd1f694c
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53897
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
16 months ago LU-17173 gss: user keys go to user keyring
Sebastien Buisson [Fri, 20 Oct 2023 08:27:14 +0000 (10:27 +0200)]
LU-17173 gss: user keys go to user keyring

Keys for root, that are used for Lustre internal processing, are
stored in the session keyring. That way they can be found by all
Lustre processes in userspace and in the kernel.
For end user keys, it is better to store them in the user keyring.
This simplifies key management, makes them shared across all user
sessions, and avoids an unfortunate key leak if lfs flushctx is not
called at user logout.

Lustre-change: https://review.whamcloud.com/52771
Lustre-commit: 02b456e4a445b9503b044df30932cc0fb5021f49

Test-Parameters: kerberos=true testlist=sanity-krb5
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ibb3d326e89dcacc89e77eca76cdb773861d3a8a7
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53908
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
16 months ago LU-17173 utils: cleanup lfs flushctx
Sebastien Buisson [Mon, 13 Nov 2023 10:02:24 +0000 (11:02 +0100)]
LU-17173 utils: cleanup lfs flushctx

When lfs flushctx is called without mount points, build the list of
all mounts first, and then call the ioctl to flush associated
contexts. Otherwise fetching the mount points unfortunately refreshes
the contexts being flushed, because the mount points are being
accessed.

Lustre-change: https://review.whamcloud.com/52604
Lustre-commit: f0534544e3e3aef280ccc5f042e37d42d33b28d3

Test-Parameters: trivial
Test-Parameters: kerberos=true testlist=sanity-krb5
Test-Parameters: testgroup=review-dne-selinux-ssk-part-2
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I75b9efe4c65ce66f5f692f9e49a28fde705d0140
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53909
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
16 months ago LU-17173 tests: fix security related tests
Sebastien Buisson [Mon, 13 Nov 2023 10:03:38 +0000 (11:03 +0100)]
LU-17173 tests: fix security related tests

Several cleanups are required in security-related tests.

In sanity-krb5, in order to get proper access to keyrings, use su -
instead of runas to initialize the process more completely.
Also fix the use of 'lfs flushctx', as some tests do not call it properly.
And in test_8, avoid waiting arbitrarily and change fail_loc to just
sleep once.

In sanity-krb5 and sanity-sec, fix parameters passed to
start_gss_daemons().

Lustre-change: https://review.whamcloud.com/53012
Lustre-commit: 9fc12ca7f29bd70be19471c2b9143d50d2e24eda

Test-Parameters: trivial
Test-Parameters: kerberos=true testlist=sanity-krb5
Test-Parameters: testgroup=review-dne-selinux-ssk-part-2
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I4598ae5a7d28afbc39d7cc2d0afd1096d877d03b
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53910
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
16 months ago LU-17016 mdd: no EXDEV for parent dir projid mismatch
Andreas Dilger [Fri, 4 Aug 2023 05:01:42 +0000 (23:01 -0600)]
LU-17016 mdd: no EXDEV for parent dir projid mismatch

Don't return EXDEV if the parent directory projid of a renamed
directory does not match the projid of the target dir.  Only the
projid of the source directory itself and the target matter.

Rename variables in mdd_rename_sanity_check() and mdd_rename()
so the object and attribute variable names are consistent.

Improve console error messages to contain more useful information.
Replace spaces with tabs in affected functions.

Lustre-change: https://review.whamcloud.com/51868
Lustre-commit: 1c033467317394d18a7aa05f6e81734bcbbcac75

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I7aa53f6d168926719ad9fd5df3c760e6c73ebbe5
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53965
Tested-by: jenkins <devops@whamcloud.com>
16 months ago EX-9125 tests: ignore sanity-compr/1008 failures
Andreas Dilger [Tue, 13 Feb 2024 04:01:20 +0000 (21:01 -0700)]
EX-9125 tests: ignore sanity-compr/1008 failures

Temporarily ignore test failures for sanity-compr.sh test_1008 on
SLES15 and aarch64/ppc64le due to very high failure rates on those
systems.

Rename the *second* test_1008 to test_1080 so it can run, and allow
other new tests to start using disjoint numbers to avoid conflicts.

Test-Parameters: trivial testlist=sanity-compr.sh env=ONLY="1008 1080"
Fixes: bd462ce8e4 ("EX-7795 tests: add sanity-compr test for dir compression")
Fixes: 7546ae79e9 ("EX-8688 csdc: Add header checksum verification calculation")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I199613b1e5b1ee8ea7ca4287a6dfe090257ed72f
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54019
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
16 months ago RM-620 build: New tag 2.14.0-ddn134
Andreas Dilger [Thu, 8 Feb 2024 09:00:30 +0000 (02:00 -0700)]
RM-620 build: New tag 2.14.0-ddn134

New tag 2.14.0-ddn134

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I40aa3ac64522236007b800fda7edcd22255a7a4f

16 months ago RM-620 build: New tag lipe-2.41
Andreas Dilger [Thu, 8 Feb 2024 08:59:59 +0000 (01:59 -0700)]
RM-620 build: New tag lipe-2.41

New tag lipe-2.41

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I8bcb7322f9be8c0c4bd6dd7a6f47826044e27801

16 months ago LU-17422 osc: Clear PageChecked on bounce pages
Patrick Farrell [Tue, 30 Jan 2024 21:07:59 +0000 (16:07 -0500)]
LU-17422 osc: Clear PageChecked on bounce pages

When we're finalizing a bounce page, we must clear
PageChecked.  Otherwise, if it's a page pool page, it will
be reused without the full wipe the kernel gives it, and we
will see PageChecked on pages which are not actually from
encryption and will handle them incorrectly.

Lustre-Change: https://review.whamcloud.com/53865/
Lustre-Commit: TBD  (from 5582abc557a8d7188bbb6fb2bc38585338f660b4)

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I8b319e7ba55dd883d74db79a19bf93b6f125616a
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53866
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
16 months ago EX-8257 lipe: Hot pools: Add compression support
Alexandre Ioffe [Thu, 18 Jan 2024 23:36:22 +0000 (15:36 -0800)]
EX-8257 lipe: Hot pools: Add compression support

Add compression type and compression level
to the lamigo command line options for the slow pool.
Check for compression option availability in 'lfs mirror extend'
and use it when replicating a file to the slow pool.

Test-Parameters: trivial testlist=hot-pools
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Change-Id: I207a92079d98bfbffd3a08295527fbb7fca03045
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53753
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
16 months agoEX-8688 csdc: Add header checksum verification and calculation
Artem Blagodarenko [Wed, 20 Dec 2023 22:53:26 +0000 (22:53 +0000)]
EX-8688 csdc: Add header checksum verification and calculation

This commit adds functionality to verify and calculate the checksum
of the header in the `ll_compr_hdr` struct, along with one more
sanity check.
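
A minimal sketch of the verify/calculate idea, for illustration only.
The header layout (magic, type, level, checksum), the magic value and
the use of CRC32 are assumptions made for this example; the real
`ll_compr_hdr` fields and checksum algorithm are not described here.

```python
import struct
import zlib

# Hypothetical layout, NOT the real ll_compr_hdr:
# 32-bit magic, 16-bit type, 16-bit level, 32-bit checksum.
HDR_FMT = "<IHHI"
HDR_MAGIC = 0x12345678  # placeholder value for the example


def hdr_checksum(magic, ctype, level):
    """CRC32 over the header packed with the checksum field zeroed."""
    packed = struct.pack(HDR_FMT, magic, ctype, level, 0)
    return zlib.crc32(packed) & 0xFFFFFFFF


def pack_hdr(ctype, level):
    """Build a header and fill in its checksum."""
    csum = hdr_checksum(HDR_MAGIC, ctype, level)
    return struct.pack(HDR_FMT, HDR_MAGIC, ctype, level, csum)


def verify_hdr(raw):
    """Return True only if size, magic and checksum all check out."""
    if len(raw) != struct.calcsize(HDR_FMT):
        return False
    magic, ctype, level, csum = struct.unpack(HDR_FMT, raw)
    if magic != HDR_MAGIC:
        return False  # sanity check before trusting any other field
    return csum == hdr_checksum(magic, ctype, level)
```

Zeroing the checksum field before computing it lets the same routine
serve both the writer and the verifier.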

Signed-off-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Change-Id: I24a9ab9cb7bea1208ada23aa6550127fe6a55017
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53521
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
16 months agoEX-8814 csdc: Update async_args after resend
Artem Blagodarenko [Thu, 25 Jan 2024 23:01:05 +0000 (23:01 +0000)]
EX-8814 csdc: Update async_args after resend

It was decided to send an uncompressed request on redo.
osc_brw_prep_request() processes uncompressed data and prepares
a request, so some parts of the old request are outdated.

Let's update the old request with information from the new one.

Fixes: 8fb8d5b ("EX-8814 csdc: Revert "EX-8189 osc: do not compress resends"")
Signed-off-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Change-Id: Idb1c6ee9db64cb1f2ea1c1562b1c5aae443263e3
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53830
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
16 months agoLU-17000 utils: In mydaemon() check after calling open()
Arshad Hussain [Mon, 22 Jan 2024 10:33:02 +0000 (16:03 +0530)]
LU-17000 utils: In mydaemon() check after calling open()

This patch adds a check of the value returned by open() in
mydaemon() instead of using the value directly.

Lustre-change: https://review.whamcloud.com/53758
Lustre-commit: 0f67ab9b00c3949f257cd4e6081184858f245b4e

Test-Parameters: trivial kerberos=true testlist=sanity-krb5
CoverityID: 397666 ("Argument cannot be negative")
Fixes: d2d56f38da0 ("make HEAD from b_post_cmd3")
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ic59414977029221e8618c5bb3320e95d39d9cded
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53911
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
16 months agoDDN-4656 osd-ldiskfs: hide alloc time in brw_stats
Andreas Dilger [Sun, 4 Feb 2024 00:44:13 +0000 (17:44 -0700)]
DDN-4656 osd-ldiskfs: hide alloc time in brw_stats

For EXA6.0/6.1 do not show the "block maps msec" stats in brw_stats
by default as this breaks collectd and lustrefs_collector parsing.
Base this check on the Linux kernel version, since those releases
were based on RHEL7.9 on the server, while EXA6.2/6.3 use RHEL8.

Add an "enable_brw_block_maps" parameter that can be used to
disable the display of this statistic (it is always collected).

Enable the "enable_stats_header" parameter automatically in the
same way, as this was added for EXA6.2 but should now be supported.

Test-Parameters: trivial
Fixes: c1e43cf8e0 ("LU-15564 osd: add allocation time histogram")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ib5e33bd98085aaf4a5a5d39283d5d334b93ebbe5
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53903
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
16 months agoLU-17476 tests: wait for sanity/350 to clean up
Andreas Dilger [Mon, 5 Feb 2024 22:27:20 +0000 (15:27 -0700)]
LU-17476 tests: wait for sanity/350 to clean up

Wait until sanity test_350 has finished deleting its files before
moving on to the next subtests, otherwise the background cleanup
can cause later test failures (in particular test_413a).

Test-Parameters: trivial testlist=sanity
Test-Parameters: testlist=sanity
Test-Parameters: testlist=sanity
Test-Parameters: testlist=sanity
Test-Parameters: testlist=sanity
Fixes: d1509ff2ca ("LU-17476 lnet: prefer to use bits only to match ME")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I9ff61013764f4e47916999eefab893e069bb217a
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53928
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
16 months agoEX-7717 lipe: Add simple compression ratio statistics
Vitaliy Kuznetsov [Tue, 12 Dec 2023 14:49:27 +0000 (15:49 +0100)]
EX-7717 lipe: Add simple compression ratio statistics

This patch adds a new table to display data
compression ratio in overall statistics.

The new table to display compression ratio (for regular files)
will have the following column values:
0. Compression ratio range;
1. Count of files in range;
2. Number of files in range as a percent of total
   number of files;
3. Number of files in this range or smaller as
   a % of total # of files;
4. Total compression size of files in range;
5. Total compression size of files in range as a % of
   total compression size of files;
6. Total compression size of files in this range or
   smaller as a % of total compression size of files;
7. Minimum value in range (ratio);
8. Maximum value in range (ratio).

The columns in the table are numbered from 0 to 8 for a better
understanding of the table without the need to name the
columns with long text.
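
A rough sketch of how columns 0 to 8 could be derived, assuming each
file is reduced to a (ratio, size) pair and the range boundaries are
passed in by the caller. The function name, the input shape and the
boundaries are hypothetical and are not taken from the lipe sources.

```python
def pct(part, total):
    """Percentage that avoids dividing by zero."""
    return 100.0 * part / total if total else 0.0


def ratio_table(files, edges):
    """files: list of (ratio, size); edges: ascending upper bounds.

    Returns one tuple per non-empty range holding columns 0..8.
    """
    total_count = len(files)
    total_size = sum(size for _, size in files)
    rows = []
    cum_count = 0
    cum_size = 0
    lower = 0.0
    for upper in edges:
        in_range = [(r, s) for r, s in files if lower <= r < upper]
        label = f"{lower}-{upper}"
        lower = upper
        if not in_range:
            continue
        count = len(in_range)
        size = sum(s for _, s in in_range)
        cum_count += count
        cum_size += size
        rows.append((
            label,                          # 0: ratio range
            count,                          # 1: files in range
            pct(count, total_count),        # 2: % of all files
            pct(cum_count, total_count),    # 3: cumulative % of files
            size,                           # 4: total size in range
            pct(size, total_size),          # 5: % of total size
            pct(cum_size, total_size),      # 6: cumulative % of size
            min(r for r, _ in in_range),    # 7: minimum ratio in range
            max(r for r, _ in in_range),    # 8: maximum ratio in range
        ))
    return rows
```

Passing `float("inf")` as the last boundary keeps files with very high
ratios from falling outside the table.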

This PR also changes some variable types to the "double" type
for correct calculation of values and to avoid duplication of
variables with the same semantic value.

The output of information in reports with the .out
extension has also been improved.

Test-Parameters: trivial testlist=sanity-lipe-scan3
Signed-off-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
Change-Id: I242ddb9c4132a7fce81508dadacf8e2b01e3cead
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52372
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Colin Faber <cfaber@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
16 months agoEX-8038 csdc: store compression info in FID EA
Bobi Jam [Thu, 11 Jan 2024 09:44:18 +0000 (17:44 +0800)]
EX-8038 csdc: store compression info in FID EA

Store compression information in the OST-object's FID EA, so that
lfsck can use it to recover the MDT-object layout EA from orphan
OST-object(s).

Lustre 2.15 may embed PFID and layout stripe info in the LMA EA. This
patch clears them from the LMA EA and stores them, together with the
compression info, directly in the FID EA from then on.

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Iacac04601b73f85d9bc057b8dd34a5004248dac4
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53649
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
16 months agoEX-8038 csdc: expand filter_fid
Bobi Jam [Fri, 4 Aug 2023 07:02:41 +0000 (15:02 +0800)]
EX-8038 csdc: expand filter_fid

Expand filter_fid to include compression information.  For
compatibility reasons, if the file is uncompressed, still store
the old filter_fid with no compression info in the FID EA.

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I388500c03604749d05849aeed3c9141974540e4a
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53663
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
16 months agoRM-620 build: New tag 2.14.0-ddn133
Andreas Dilger [Fri, 2 Feb 2024 16:21:29 +0000 (09:21 -0700)]
RM-620 build: New tag 2.14.0-ddn133

New tag 2.14.0-ddn133

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I06785ba295c668f8aab7dcbf2504c64068592123

16 months agoRM-620 build: New tag lipe-2.40
Andreas Dilger [Fri, 2 Feb 2024 16:21:16 +0000 (09:21 -0700)]
RM-620 build: New tag lipe-2.40

New tag lipe-2.40

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: If7cd3a8cb392eee47bbc004ba881518f0e3fb991

16 months agoLU-17482 llite: short read could mess up next read offset
Bobi Jam [Fri, 26 Jan 2024 10:14:36 +0000 (18:14 +0800)]
LU-17482 llite: short read could mess up next read offset

When a read reaches EOF, it could read data from stale pagecache, but
we need to restore iocb->ki_pos so that the next read can continue
from the correct offset.

Lustre-change: https://review.whamcloud.com/53827
Lustre-commit: TBD (from 4bec3a277c83932cfb5ba26e31336e1f4666460a)

Fixes: 4468f6c9d9 ("LU-16025 llite: adjust read count as file got truncated")
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Ib8b62c41bf65f8efec82dda53fcfbdb68ad08b38
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53828
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
16 months agoLU-17476 lnet: prefer to use bits only to match ME
Serguei Smirnov [Sat, 27 Jan 2024 20:17:34 +0000 (12:17 -0800)]
LU-17476 lnet: prefer to use bits only to match ME

In some cases, it has been observed that a reply will arrive
at the portal with the correct match bits, but is dropped by
lnet_parse_put().  This appears to happen with LNet Multi-Rail
peers, each having two separate NIDs.

If a reply arrives with matchbits available and matching, but
the NIDs don't match, confirm the match if the NIDs are found
to belong to the same peer.  This will only happen in cases
where the reply would be dropped entirely, causing hundreds of
seconds of delay until the RPC is resent, so the extra overhead
of checking for a peer match before dropping the request is
only in the error path and minimal compared to the alternative.

Add CFS_FAIL_CHECK() for exercising the match NIDs code.

That is in a hot codepath, but CFS_FAIL_CHECK() is marked unlikely()
and this check is in the error case and _should_ only be hit when the
message would have been dropped anyway, so it seems unlikely to impact
performance in any meaningful way.
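
The matching rule described above can be sketched roughly as follows.
This is a simplified model, not the LNet code: `me_matches` and the
`peer_nids` mapping are hypothetical names, and real match entries
also carry ignore bits and wildcard NIDs that are left out here.

```python
def me_matches(me_bits, me_nid, msg_bits, msg_nid, peer_nids):
    """Decide whether an incoming message matches a match entry (ME).

    peer_nids maps a peer name to the set of NIDs that belong to it.
    """
    if me_bits != msg_bits:
        return False
    if me_nid == msg_nid:
        return True  # common case: bits and NID both match
    # Slow path, reached only when the message would otherwise be
    # dropped: accept if both NIDs belong to the same peer.
    return any(me_nid in nids and msg_nid in nids
               for nids in peer_nids.values())
```

Because the peer lookup runs only after the direct NID comparison has
failed, the common path pays nothing for it.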

Lustre-change: https://review.whamcloud.com/53843
Lustre-commit: TBD (from 3360e892750d1bf4f2b7ceab60d9a637b3e649ad)

Test-Parameters: testlist=sanity-lnet env=ONLY=350,ONLY_REPEAT=10
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I10e1a2142539ddf5dabc26ce962cec1f2cfcf3db
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53846
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
16 months agoLU-16873 osd: update OI_Scrub file with new magic
Alexander Zarochentsev [Sun, 28 May 2023 12:42:27 +0000 (08:42 -0400)]
LU-16873 osd: update OI_Scrub file with new magic

The fix for LUS-11542 detects the format change correctly
but does not write new oi scrub file magic, so new mount
triggers the "oi files counter reset" again and again.

Lustre-change: https://review.whamcloud.com/51226
Lustre-commit: 38b7c408212f60d684c9b114d90b4514e0044ffe

Fixes: 126275ba83 ("LU-16655 scrub: upgrade scrub_file from 2.12 format")
HPE-bug-id: LUS-11646
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: Ia13fcfaf0d8f2c4ee9331dd9fec0ff159d195186
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53854
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
16 months agoEX-8598 tests: use alternative data source for rewriting
Artem Blagodarenko [Sun, 28 Jan 2024 20:24:31 +0000 (20:24 +0000)]
EX-8598 tests: use alternative data source for rewriting

Using the same file as input has a disadvantage: it is not
possible to tell whether the data was rewritten at all.
An alternative data source should be used.

Let's shift the source file data and use it as the source.
To check the rewrite result, the same operation is performed
on a copy of the destination file stored outside the Lustre FS.

Signed-off-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Test-Parameters: trivial testlist=sanity-compr env=ONLY=1004
Change-Id: I6ef400520359bfe9156c3f47e757064863bdf4e0
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53088
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
16 months agoEX-8996 ofd: handle 'missing object' reads
Patrick Farrell [Wed, 24 Jan 2024 16:02:32 +0000 (11:02 -0500)]
EX-8996 ofd: handle 'missing object' reads

When the read code (eg, mdt_preprw_read) finds there is no
object, it will return a read with 0 pages, but not fail the
read.  The assert for local and remote pages needs to
recognize this case.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Idc6ff70f71abc100f750a63eca73a754a56f6435
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53807
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
16 months agoEX-8450 tests: skip sanity-lipe-find3/306 on el7.9
Andreas Dilger [Mon, 29 Jan 2024 23:20:02 +0000 (16:20 -0700)]
EX-8450 tests: skip sanity-lipe-find3/306 on el7.9

The sanity-lipe-scan3 test_309 is failing consistently with el7.9
*clients*.  Exclude it until fixed or we drop this client version.

Test-Parameters: trivial testlist=sanity-lipe-find3 clientdistro=el7.9 serverdistro=el7.9
Test-Parameters: trivial testlist=sanity-lipe-find3 clientdistro=el8.8 serverdistro=el7.9
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Iceca83a3b85df95fe45482076170d77a6abc0947
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53853
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
Reviewed-by: Colin Faber <cfaber@ddn.com>
16 months agoEX-7601 tests: skip sanity-compr tests in interop
Andreas Dilger [Mon, 29 Jan 2024 21:54:03 +0000 (14:54 -0700)]
EX-7601 tests: skip sanity-compr tests in interop

Skip a number of subtests in sanity-compr that depend on fixes
landed to the code that were not available in older versions.

Test-Parameters: trivial testlist=sanity-compr serverversion=EXA6.3.0
Fixes: 3e1dd9d6ae ("LU-17468 lod: component add missed pattern info")
Fixes: 7731c7fc74 ("EX-7601 tests: unaligned read tests")
Fixes: 033dd0ba2c ("EX-7644 mmap: add mmap support for compression")
Fixes: 46708e4636 ("EX-7601 tests: tests for read-modify-write")
Fixes: 6c4c4d7599 ("EX-7601 tests: add multi-mount compression test")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I26cae5cf01cc32c9f3e4386cf7151a66ac3678ea
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53852
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
16 months agoEX-7795 tests: add sanity-compr test for dir compression
Jian Yu [Tue, 30 Jan 2024 02:26:24 +0000 (18:26 -0800)]
EX-7795 tests: add sanity-compr test for dir compression

This patch adds a sanity-compr test to validate that
we get directory space usage reduction with compression.

Change-Id: I16f3a3f1e413e4884b3973829df36500667271ce
Test-Parameters: trivial testlist=sanity-compr env=ONLY="1007 1008",ONLY_REPEAT=3
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53855
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Colin Faber <cfaber@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
16 months agoLU-16367 utils: clean up ldiskfs feature handling
Andreas Dilger [Mon, 5 Dec 2022 18:59:02 +0000 (11:59 -0700)]
LU-16367 utils: clean up ldiskfs feature handling

Update the default ldiskfs features used by mkfs.lustre:
- enable large_dir on OSTs as well as MDTs
- remove obsolete handling of "ext3" filesystems
- clean up handling of other features that have become a bit messy

Lustre-change: https://review.whamcloud.com/49316
Lustre-commit: e6b6b7ee253cedd8aeb6bb48d6c54916368c4109

Test-Parameters: trivial testlist=conf-sanity
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Id717c3ba939ccf9b2de34e868d4415e88429ef39
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53875
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
16 months agoLU-16599 obdclass: job_stats can parse escaped jobid string
Lei Feng [Wed, 1 Mar 2023 00:16:03 +0000 (08:16 +0800)]
LU-16599 obdclass: job_stats can parse escaped jobid string

Writing a jobid to job_stats proc entry asks lustre to clear
the stats of the specific jobid. Since job_stats outputs
escaped jobid string in some cases, it should be able to parse
an escaped jobid string when the string is written to it.

Lustre-change: https://review.whamcloud.com/50160
Lustre-commit: 8f004bc53b1a488dad5a92a580f5f0c078e33654

Test-Parameters: trivial
Signed-off-by: Lei Feng <flei@whamcloud.com>
Change-Id: Idbc63dac6c3b35331317927107e634a3d638dd66
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53847
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
16 months agoLU-14810 lnet: Cancel discovery ping/push on shutdown
Chris Horn [Tue, 5 Dec 2023 09:56:57 +0000 (03:56 -0600)]
LU-14810 lnet: Cancel discovery ping/push on shutdown

Discovery shutdown can race with ping and push events. In some cases
this can result in failing to unlink ping/push MDs on shutdown.
Protect against this by checking for PING/PUSH_FAILED state on peers
on the request queue.

Lustre-change: https://review.whamcloud.com/53356
Lustre-commit: c3b9597742d5118a96f56129e7dd30d84468d2c8

Test-Parameters: trivial
Test-Parameters: testlist=sanity-lnet env=ONLY=500,ONLY_REPEAT=50
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I84a1f5beb6508651bc62e1dd93271f9e72f5081c
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53848
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
16 months agoLU-17471 osd: add symlink for brw_stats
Hongchao Zhang [Fri, 26 Jan 2024 13:43:36 +0000 (21:43 +0800)]
LU-17471 osd: add symlink for brw_stats

Add symlink at /proc/fs/lustre/osd-*/*/brw_stats to
/sys/kernel/debug/lustre/osd-*/*/brw_stats to fix
the compatibility issue with older utilities that are
still using the old proc entry.

Lustre-change: https://review.whamcloud.com/53829
Lustre-commit: TBD (from 5fad20603098c55c0080548a177023a36e640e84)

Fixes: 8a84c7f9c7 ("LU-14927 osd: share brw_stats code between OSD back ends")
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: Ie86b2b384e3b91f98ead00b6325ddeb020e47aa5
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53858
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
16 months agoRM-620 build: New tag 2.14.0-ddn132
Andreas Dilger [Mon, 29 Jan 2024 09:02:19 +0000 (02:02 -0700)]
RM-620 build: New tag 2.14.0-ddn132

New tag 2.14.0-ddn132

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I65b4833ced4c7c398110c49336138d5fb9947a31

16 months agoLU-17464 lod: set llc_ostlist to NULL after free
Bobi Jam [Wed, 24 Jan 2024 06:04:35 +0000 (14:04 +0800)]
LU-17464 lod: set llc_ostlist to NULL after free

Default LOV striping could free a component entry's llc_ostlist if
needed, e.g. when expanding component entries.  Without setting it to
NULL, it could be allocated or freed twice later.

Lustre-change: https://review.whamcloud.com/53797
Lustre-commit: TBD (from 5e7440b488050166af15e744dc74b9dc4f0d3b96)

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I25824cb61dd47ba284403039259593b88d25fa9d
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53798
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
16 months agoEX-9007 lipe: Fix getting client mount path
Vitaliy Kuznetsov [Tue, 23 Jan 2024 12:34:40 +0000 (13:34 +0100)]
EX-9007 lipe: Fix getting client mount path

This patch fixes an issue where size reports do not work
when the client is not mounted.

Test-Parameters: trivial testlist=sanity-lipe-scan3
Signed-off-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
Change-Id: I1e99fddf21960ecd14526c0d6baeb75c2a138dd8
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53763
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
16 months agoEX-9029 lfs: not iterate compr_type_table using ARRAY_SIZE
Bobi Jam [Wed, 24 Jan 2024 15:35:37 +0000 (23:35 +0800)]
EX-9029 lfs: not iterate compr_type_table using ARRAY_SIZE

The EX-8311 patch modifies compr_type_table to contain NULL fields in
the array, so iteration over the array should not use ARRAY_SIZE, but
should skip those elements with a NULL compression type name.
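
As an illustration of the skip-the-placeholders idea only, here is a
small sketch. The table contents and function names are hypothetical
and do not reproduce the real compr_type_table; `None` stands in for
a NULL compression type name.

```python
# Hypothetical table of (type id, name); None marks a slot with no name.
COMPR_TYPE_TABLE = [
    (0, "none"),
    (1, None),
    (2, "fast"),
    (3, None),
    (4, "best"),
]


def supported_names():
    """Join only the named entries; joining a None would raise TypeError."""
    return ", ".join(name for _, name in COMPR_TYPE_TABLE
                     if name is not None)


def compr_name_to_type(name):
    """Look up a type id by name, skipping unnamed placeholder slots."""
    for type_id, type_name in COMPR_TYPE_TABLE:
        if type_name is None:
            continue
        if type_name == name:
            return type_id
    return None
```

Both helpers walk every slot but touch only the entries that carry a
name.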

Fixes: ec5814c9a7 ("EX-8311 csdc: allow specify 'fast'/'best' compression type")
Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I8e4988fd3a63c1cb66f75510d190c2ebc4f8f9be
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53808
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
16 months agoLU-17468 lod: component add missed pattern info
Bobi Jam [Wed, 24 Jan 2024 17:08:33 +0000 (01:08 +0800)]
LU-17468 lod: component add missed pattern info

"lfs setstripe --component-add" missed setting the component pattern,
which causes some settings to be lost, such as overstriping and compression.

Lustre-change: https://review.whamcloud.com/53817
Lustre-commit: TBD (from 3849e3efdc58d535ee6858aafa22cfdc665ba2d7)

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I7ad746a550f1afea54a6f5b68823a79a85a44082
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53811
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
16 months agoLU-16307 tests: fix sanity-sec test_31
Sebastien Buisson [Tue, 23 Jan 2024 13:29:11 +0000 (14:29 +0100)]
LU-16307 tests: fix sanity-sec test_31

In order to improve sanity-sec test_31 resiliency, reorganize the way
the new LNet '999' is handled. And make sure everything is correctly
cleaned up after the test.

Lustre-change: https://review.whamcloud.com/53818
Lustre-commit: TBD (from f4a96799159fd662855542d471197ac4060d3295)

Test-Parameters: trivial testgroup=review-dne-part-2
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Idd657c7555e598d0ebc08387eac537b1c73e35bd
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53779
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
16 months agoLU-16339 quota: notify OSTs until lge_qunit_nu is set
Sergey Cheremencev [Wed, 9 Feb 2022 11:53:51 +0000 (14:53 +0300)]
LU-16339 quota: notify OSTs until lge_qunit_nu is set

There is a window when locks are not granted yet, but
lqe is set to qmt_reba_list to send updates to OSTs.

t1: lqe_init()->qmt_setup_lqe_gd->qmt_seed_glbe()
t1: lqe_init()->qmt_setup_lqe_gd->qmt_id_lock_notify()
t2: qmt_glimpse_lock() lustre-QMT0000: no granted locks to send glimpse
t1: ldlm_lock_enqueue()->ldlm_granted_list_add_lock() ...

If lge_qunit_nu was set to 1 in qmt_seed_glbe and appropriate qunit
is equal to the least_qunit, new qunit won't be sent to OSTs
and finally lqe_revoke will not be set causing endless -115 errors.
The fix calls qmt_id_lock_notify from qmt_dqacq0 for an lqe that has
lge_qunit_nu or lge_edquot_nu set.

Add test 85 into sanity-quota to check that a write
doesn't hang if the initial qunit value is equal to
the least_qunit due to a small block hard limit.

HPE-bug-id: LUS-10711
Change-Id: Icd1ac29beab87c0ebf00bcb20b25c33b771b74c1
Lustre-change: https://review.whamcloud.com/49228
Lustre-commit: 6c0b4329d046de283eeb254fca561be9386df68a
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53778
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
16 months agoLU-14008 o2iblnd: avoid memory copy for short msg
Alexey Lyashkov [Wed, 12 Aug 2020 14:59:50 +0000 (17:59 +0300)]
LU-14008 o2iblnd: avoid memory copy for short msg

Modern cards allow sending kernel memory data without mapping it
or copying it to the preallocated buffer.
This reduces lnet selftest CPU consumption by 3% for messages
smaller than 4k.

Lustre-change: https://review.whamcloud.com/40262
Lustre-commit: bebd87cc6c9acc577a2fdde56e856075094f1291

Test-Parameters: trivial
HPE-bug-id: LUS-1796
Change-Id: I96c31be680c8ea7ac289a755df7f1d4c1c7f9aef
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53767
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
16 months agoLU-14008 o2iblnd: avoid static allocation for msg tx
Alexey Lyashkov [Wed, 12 Aug 2020 13:20:00 +0000 (16:20 +0300)]
LU-14008 o2iblnd: avoid static allocation for msg tx

Simplify tx msg handling: just push
an lnet header message onto the same list as the others.

Lustre-change: https://review.whamcloud.com/40261
Lustre-commit: 7d12b98d3f8294ca0911ca730aacd07a0f822298

Test-Parameters: trivial
Cray-bug-id: LUS-1796
Change-Id: I8e5d9b8a4579ff630d4a4fbc57b06a73a662e68c
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53766
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
16 months agoLU-14008 o2iblnd: cleanup
Alexey Lyashkov [Fri, 7 Aug 2020 11:26:25 +0000 (14:26 +0300)]
LU-14008 o2iblnd: cleanup

Simplify kiblnd_send by avoiding code duplication.
Let's pick up an idle tx first.

Lustre-change: https://review.whamcloud.com/40260
Lustre-commit: 3916b9d7226ebb21cf413dd7685afa693e243513

Test-Parameters: trivial
HPE-bug-id: LUS-1796
Change-Id: Iaf71a9a3aeb3047a086d4cc0a3cf4f1dbe8944b4
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53765
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
16 months agoEX-8362 scripts: improve ll_compression_scan estimate
Andreas Dilger [Sat, 27 Jan 2024 20:08:33 +0000 (12:08 -0800)]
EX-8362 scripts: improve ll_compression_scan estimate

Improve ll_compression_scan script to give a better estimate of
actual compression ratios.
- add a '-d' debug option for verbose output during testing
- log and report incompressible small files < 4096
- log and report incompressible file count and size
- include small/incompressible/large files in compression estimate
- add a correction factor to calculations for safety margin

Change-Id: If561b0273e38e4821de228c81291859c7bb1a0d2
Test-Parameters: trivial testlist=sanity-compr env=ONLY=1007,ONLY_REPEAT=10
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53824
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
16 months agoEX-8362 scripts: improve ll_compression_scan functionality
Andreas Dilger [Tue, 23 Jan 2024 19:46:40 +0000 (11:46 -0800)]
EX-8362 scripts: improve ll_compression_scan functionality

Improve ll_compression_scan script functionality without
changing the compression estimates.
- add a version string to the output to allow tracking
- handle pathnames with spaces in them
- handle the lz4fast compression type
- allow running on MacOS for testing

Test-Parameters: trivial testlist=sanity-compr env=ONLY=1007
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I0b8442a2590fdb9c718b1404cba1d73c26cff03c
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53678
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
16 months agoLU-17446 ldlm: Do not wait for BL AST RPC completion on cancel
Oleg Drokin [Fri, 19 Jan 2024 05:24:43 +0000 (00:24 -0500)]
LU-17446 ldlm: Do not wait for BL AST RPC completion on cancel

If we have sent an AST RPC to the client and while it's in flight
the client sent in the cancel, sometimes (esp. if AST or reply
to it are lost) even though the lock is already cancelled, whoever
is waiting on it is still stuck while trying to resend ASTs.
And in the end the client is not even evicted because the lock cancel
did come and all is fine, but it can add over a hundred seconds
to lock granting process in some non-ideal circumstances.

For simplicity we only treat Blocking ASTs like this, since we
can only have a single one of this kind.

This adds an additional pointer to struct ldlm_lock, but that is
already 560 bytes so it does not really mean much.

Lustre-change: https://review.whamcloud.com/53739
Lustre-commit: TBD (from d4b782c249377276dc9f6ddbf0fab34956d57af6)

Signed-off-by: Oleg Drokin <green@whamcloud.com>
Change-Id: Id2231bc3bfc3e094faae2872fe09f3c330d441df
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53840
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
16 months agoLU-15913 tests: add rename stress test via racer
Andreas Dilger [Thu, 16 Jun 2022 05:03:45 +0000 (23:03 -0600)]
LU-15913 tests: add rename stress test via racer

Add a rename stress test using the racer framework.  Use
mrename if found, to avoid stat and allow directory rename.
Sometimes create and rename files to/from subdirectories.

Run e2fsck after every run to confirm filesystem structure.

Allow tunable parameters via environment variables so they
can be set via Test-Parameters.  Parameters can be set on
different nodes via variables CLIENT_LCTL_SETPARAM_PARAM,
MDS_LCTL_SETPARAM_PARAM, OSS_LCTL_SETPARAM_PARAM.

Lustre-change: https://review.whamcloud.com/47643
Lustre-commit: TBD (from 6c63c882741637a246012a81e41289fcf0e2dbbe)

Test-Parameters: trivial testlist=racer env=ONLY=2
Test-Parameters: testlist=racer env=ONLY=2 mdtcount=2
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I2ae034b864a5ccb8a59bf7028d22cd67c643f51f
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53751
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
16 months agoLU-17426 tests: add crossdir parallel rename test
Andreas Dilger [Fri, 19 Jan 2024 03:44:33 +0000 (20:44 -0700)]
LU-17426 tests: add crossdir parallel rename test

Add sanityn test_81d to test cross-dir (same-MDT) parallel rename
if the MDT supports this functionality.

Lustre-change: https://review.whamcloud.com/47643
Lustre-commit: TBD (from fdd1f9df934efa070cd4aca4cf3db686261ef868)

Test-Parameters: trivial testlist=sanityn
Test-Parameters: testlist=sanityn serverversion=2.15
Test-Parameters: testlist=sanityn env=ONLY=81d,ONLY_REPEAT=10
Test-Parameters: testlist=sanityn env=ONLY=81d,ONLY_REPEAT=10 mdtcount=2
Test-Parameters: testlist=sanityn env=ONLY=81d,ONLY_REPEAT=10 mdtcount=4
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ic8717e6865a9c6c9698186f4fdf34c1f4f74083f
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53748
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
16 months agoLU-17426 mdt: relax same MDT file rename lock
Lai Siyao [Fri, 19 Jan 2024 19:13:19 +0000 (12:13 -0700)]
LU-17426 mdt: relax same MDT file rename lock

Allow cross-directory rename of regular files (strictly, any
non-directory) on the same MDT without holding the BigFilesystemLock
(BFL), as file renames cannot change the directory hierarchy.

This should improve the performance for these rename operations, and
reduce contention between local MDT file renames in different parts of
the directory tree.

Add "mdt.*.enable_parallel_rename_crossdir" parameter to disable
cross-directory file renames if there is an issue with this change.

Lustre-change: https://review.whamcloud.com/53726
Lustre-commit: TBD (from d466465f9add30faec256ce7f725e0f36d4e8a66)

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I511b392e46c46140cac6aa3ede02bfe793729f7f
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53744
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
16 months agoLU-930 ptlrpc: clarify AT error message
Aurelien Degremont [Tue, 18 Jan 2022 13:55:01 +0000 (13:55 +0000)]
LU-930 ptlrpc: clarify AT error message

Clarify the error message related to a passed deadline
for AT early replies. It indicated that the system
was CPU bound, which is wrong most of the time, as the issue
is rather a communication failure delaying RPC traffic.
This could confuse people, who would look for
CPU resource consumption when the network traffic is
more likely the cause.

Also try to avoid cryptic keywords that make sense
only to the feature developer, and not to admins.

Lustre-change: https://review.whamcloud.com/49548
Lustre-commit: 9ce04000fba07706c73b8adb3605c959e5b62712

Test-Parameters: trivial
Signed-off-by: Aurelien Degremont <degremoa@amazon.com>
Change-Id: Icdff8f4c6fb9905233f6b8ed1b961b2fd1127667
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53772
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
16 months agoLU-16766 obdclass: trim kernel thread names in jobids
Thomas Bertschinger [Thu, 13 Jul 2023 22:32:52 +0000 (18:32 -0400)]
LU-16766 obdclass: trim kernel thread names in jobids

When collecting jobstats on operations coming from kernel threads, it
is more useful and reduces the noisiness of the data if the names of
kernel threads are trimmed so that all "kworker/CPU:ID" threads are
collected under "kworker", all "ll_sa_PID" threads under ll_sa, etc.
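
As an illustration only, the trimming described above could look like
the following sketch. The function name and the two rules (cut at the
first "/", strip a trailing "_<digits>") are assumptions made for this
example; the actual rules in the patch may differ.

```python
import re


def trim_kthread_name(comm: str) -> str:
    """Collapse per-CPU / per-PID kernel thread names to a common prefix.

    Hypothetical rules, chosen only to match the examples in the
    commit message.
    """
    # "kworker/3:1" -> "kworker": keep only the part before the first "/"
    if "/" in comm:
        return comm.split("/", 1)[0]
    # "ll_sa_12345" -> "ll_sa": strip a trailing "_<digits>" suffix
    return re.sub(r"_\d+$", "", comm)
```

With such rules, jobstats for every "kworker/CPU:ID" thread would be
accumulated under the single "kworker" jobid.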

Lustre-change: https://review.whamcloud.com/51919
Lustre-commit: 8a9c503c002d08d0587894a748761e30c1b9a445

Signed-off-by: Thomas Bertschinger <bertschinger@lanl.gov>
Change-Id: Icd82a99c1153de0277ea5ed3f4b1d92535809921
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53801
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
16 months agoLU-16655 scrub: upgrade scrub_file from 2.12 format
Alexander Zarochentsev [Tue, 28 Mar 2023 16:00:09 +0000 (19:00 +0300)]
LU-16655 scrub: upgrade scrub_file from 2.12 format

Scrub_file->sf_oi_count has different offsets in Lustre-2.10,
Lustre-2.12, and Lustre-2.15 due to unintended format changes.
Lustre-2.15 reads sf_oi_count from the offset of sf_success_count
and may initialize an incorrect number of OI files, and then not be
able to do FID lookups for existing filesystem objects.

Lustre-change: https://review.whamcloud.com/50455
Lustre-commit: 126275ba8339540e46f1c517decd3d69ad1cc42c

Fixes: a114f6b8c5 ("LU-13344 servers: change request timeouts to s32")
Fixes: 4c2f028a95 ("LU-9019 osd-ldiskfs: migrate to 64 bit time")
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: Id7c8bd555229405d604456c48447f01fd121aca9
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53839
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
16 months agoRM-620 build: New tag 2.14.0-ddn131
Andreas Dilger [Tue, 23 Jan 2024 02:09:27 +0000 (19:09 -0700)]
RM-620 build: New tag 2.14.0-ddn131

New tag 2.14.0-ddn131

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I39de60e1b13b95f532b36068933f8335a16d7b8f

16 months agoRM-620 build: New tag lipe-2.39
Andreas Dilger [Tue, 23 Jan 2024 02:09:05 +0000 (19:09 -0700)]
RM-620 build: New tag lipe-2.39

New tag lipe-2.39

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I91f46bb7da4d5fd6d06faac7c8c9975c69d7e8ce

16 months agoLU-17441 mdc: use MDS_IO_PORTAL for rename
Andreas Dilger [Thu, 18 Jan 2024 09:49:48 +0000 (02:49 -0700)]
LU-17441 mdc: use MDS_IO_PORTAL for rename

Some workloads like Apache Spark are very rename intensive. When
there are many concurrent renames that need the BFL lock (more than
the number of MDS_REQUEST_PORTAL service threads), they will block
these threads until each is able to get the rename lock, and prevent
other MDS_REINT RPCs from being processed.

Since the MDS_IO_PORTAL is often unused (only needed for DoM files),
and has existed since 2.11.0, it seems possible to move the rename
RPCs to be serviced by the MDS_IO_PORTAL threads to avoid contention
on the primary MDS service threads. Also, it will avoid blocking
normal file open, setattr, statfs, and other common operations if the
BFL lock is contended. Even with DoM files they may have read-on-open
handling and only DoM writes would be blocked by the uncommon rename.

Lustre-change: https://review.whamcloud.com/53725
Lustre-commit: TBD (from b31c07cf18882b150d3e49ceee85a187e7a9b159)

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I623a27de1482778f3c9fc6bb5bbcf917611dc75b
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53749
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
16 months agoEX-8311 csdc: allow specify 'fast'/'best' compression type
Bobi Jam [Tue, 24 Oct 2023 14:02:55 +0000 (22:02 +0800)]
EX-8311 csdc: allow specify 'fast'/'best' compression type

Use lctl set_param osc.*.compress_type_{fast|best}=<type>:<level>
to specify the compression_type:level for LL_COMPR_TYPE_FAST/
LL_COMPR_TYPE_BEST.

lctl get_param osc.*.compress_type_{fast|best} will list these
values.

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Ifeff7f25e30fc0884f0c770a3b6d0798937b3c35
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52814
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
16 months agoLU-15288 lnet: increase transaction timeout
Cyril Bordage [Tue, 7 Dec 2021 22:14:43 +0000 (23:14 +0100)]
LU-15288 lnet: increase transaction timeout

In LU-13145, it was decided to increase default transaction timeout
(LNET_TRANSACTION_TIMEOUT_DEFAULT) to 150s. But, in the associated
patch, it was set to 50s. This change also alters
lnd_timeout (from 16 to 49).
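
The 16 and 49 values are consistent with lnd_timeout being derived
from the transaction timeout and the retry count. The sketch below
assumes the formula (transaction_timeout - 1) / (retry_count + 1) with
integer division and a retry count of 2; neither is stated in this
commit message, so treat both as assumptions.

```python
def lnd_timeout(transaction_timeout: int, retry_count: int = 2) -> int:
    """Derive the per-attempt LND timeout in seconds.

    Assumed formula: one second is reserved, and the remainder is split
    evenly across the initial attempt plus retry_count retries.
    """
    return (transaction_timeout - 1) // (retry_count + 1)


old_value = lnd_timeout(50)    # transaction timeout before this patch
new_value = lnd_timeout(150)   # transaction timeout after this patch
```

Under these assumptions old_value is 16 and new_value is 49, matching
the numbers quoted above.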

Lustre-change: https://review.whamcloud.com/45780
Lustre-commit: 18b4e28f18d55291f8a14a3bd9ee84b1a686a93e

Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Cyril Bordage <cbordage@whamcloud.com>
Change-Id: I13a8b5d14230bb6e8936cb3e18540f19dbc62985
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53747
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
16 months agoLU-16913 revert "EX-7849 quota: extra debug messages"
Andreas Dilger [Fri, 19 Jan 2024 19:28:05 +0000 (19:28 +0000)]
LU-16913 revert "EX-7849 quota: extra debug messages"

This reverts commit 02242f6f1ba1867756ee5b91abd2207f646436cf.
Extra debugging is no longer needed.

Change-Id: I083b70a911ac85fb5a1054c8409146bb393e94b0
Test-Parameters: trivial testlist=sanity-quota
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53746
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
16 months agoLU-12031 tests: update interop version checks
Andreas Dilger [Mon, 22 Jan 2024 16:40:20 +0000 (16:40 +0000)]
LU-12031 tests: update interop version checks

Update the version check in sanity test_270j and
sanity-hsm test_1f to match actual landing version.

Change-Id: Ifd6d5dec50e3fcbb7ebe77ab41335a6c3994ef57
Test-Parameters: trivial
Fixes: 3bccd95ca2 ("LU-12031 mdt: explicit data version of DoM files")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53762
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
16 months agoEX-8042 lipe: Fix size calculation when using -blocks option
Vitaliy Kuznetsov [Thu, 18 Jan 2024 11:31:36 +0000 (12:31 +0100)]
EX-8042 lipe: Fix size calculation when using -blocks option

This patch fixes the size calculation in the "-blocks"
option when specifying the exact size value "n[bkMG]".

Test-Parameters: trivial testlist=sanity-lipe-find3
Signed-off-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
Change-Id: I5dd0ce69cef20ab9a9632798f350cf4c9f96cf25
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53723
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
16 months agoDDN-4623 obdclass: fix upcall_cache_get_entry
Sebastien Buisson [Fri, 19 Jan 2024 17:09:14 +0000 (18:09 +0100)]
DDN-4623 obdclass: fix upcall_cache_get_entry

When an entry is found while holding the read lock, we need to
convert to a write lock and find it again, to check that the entry
was not modified/freed in between.
In this case, the variable indicating an entry was found must be
reset, because we might not find any valid entry after all.
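
A minimal sketch of the pattern, not the actual kernel code: Python has
no read-write lock in its standard library, so a plain threading.Lock
stands in for both the read and the write phase. The class, the method
names and the between hook are hypothetical and exist only to show why
the "found" flag has to be reset.

```python
import threading


class UpcallCacheSketch:
    """Toy cache; one Lock stands in for the rwlock's two modes."""

    def __init__(self):
        self.lock = threading.Lock()
        self.entries = {}

    def get_entry(self, key, between=None):
        # Phase 1: lookup under the "read" lock.
        with self.lock:
            found = key in self.entries
        if not found:
            return None
        # The lock is dropped here; another thread may free the entry.
        # The optional hook simulates that window for demonstration.
        if between is not None:
            between()
        # Phase 2: take the "write" lock and find the entry again.
        with self.lock:
            found = False  # reset: the earlier result may be stale
            entry = self.entries.get(key)
            if entry is not None:
                found = True
        return entry if found else None
```

If the flag were not reset before the second lookup, a caller could
act on an entry that was freed while no lock was held.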

Fixes: 127128bed3 ("LU-16498 obdclass: change uc_lock to rwlock")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I111af4562ac78eeb22102a8a28943e46e30b4019
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53743
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
16 months agoRM-620 build: New tag 2.14.0-ddn130
Andreas Dilger [Thu, 18 Jan 2024 09:29:17 +0000 (02:29 -0700)]
RM-620 build: New tag 2.14.0-ddn130

New tag 2.14.0-ddn130

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Iff6822b7d54b0cf9e1946bccf20069fa2ec51e3e

16 months agoEX-1878 lipe: resync all stale files
Alexandre Ioffe [Fri, 8 Dec 2023 22:17:37 +0000 (14:17 -0800)]
EX-1878 lipe: resync all stale files

Add --resync-all-stales option (default is on).
By default, this option makes lamigo resync
a file if any of its components
is stale, regardless of pool or OST location.
If the option is off, lamigo works the old way.

Test-Parameters: trivial testlist=hot-pools
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Change-Id: Ibc26a21fa99f75de87a8e0328b183d96b7548c1f
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53391
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
16 months agoEX-8353 csdc: rename "cp_comp_*" to "cp_compr_*"
Jian Yu [Wed, 17 Jan 2024 01:10:58 +0000 (17:10 -0800)]
EX-8353 csdc: rename "cp_comp_*" to "cp_compr_*"

This patch renames "cp_comp_type", "cp_comp_level",
and "cp_chunk_log_bits" to use "compr" in the name
to be consistent with other variable names.

Test-Parameters: trivial

Change-Id: I428ff3a789b33da02832dee02f316b02d97137e2
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52761
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
16 months agoEX-8993 ofd: use niocount consistently
Patrick Farrell [Tue, 21 Nov 2023 20:25:34 +0000 (15:25 -0500)]
EX-8993 ofd: use niocount consistently

'niocount' refers to the number of remote niobufs, ie, the
number of separate IOs from the client.  Except for a few
places, where it's used to refer to the number of pages in
the entire RPC.  Eek.

Replace this usage with 'npages', making niocount usage
consistent.

Test-Parameters: trivial
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I266087ad8dccadb54c054b0a11fb03dc9868a725
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53206
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
16 months agoEX-8971 pcc: add lctl pcc abort command to abort attaches
Qian Yingjin [Fri, 12 Jan 2024 09:03:15 +0000 (04:03 -0500)]
EX-8971 pcc: add lctl pcc abort command to abort attaches

This patch adds a new PCC command "lctl pcc abort [--wait|-w]
[--detach|-d] $LUSTRE_MNTPT $PCCROOT".
--wait|-w: wait until all in-flight attaches are aborted.
--detach|-d: detach the PCC copies when scanning the PCC backend.

It can be used to abort in-progress attaches for a given PCC
backend. It does not remove the PCC backend from a client.

Add sanity-pcc/test_109 to verify it.

Change-Id: Ib7152f7418aa1beb840919e98bf8de53c99b5c54
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53656
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
16 months agoLU-17370 utils: simplify lfs-mirror-extend help text
Alexandre Ioffe [Fri, 5 Jan 2024 04:54:11 +0000 (20:54 -0800)]
LU-17370 utils: simplify lfs-mirror-extend help text

Add list of lfs setstripe command line options
to help text of lfs mirror extend.
Simplify syntax of lfs mirror extend help text.
Update corresponding lfs-mirror-extend man page.
In the man pages, use left-side adjustment and disable hyphenation
('.ad l', '.nh') to prevent hyphenation of keywords.

Lustre-change: https://review.whamcloud.com/53719
Lustre-commit: TBD (from 2a71d159d4ac98a3252f12796b8688bfa4d6df50)

Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Test-Parameters: trivial
Change-Id: I6cffcdb9651062e169f53868827646b876a82cb5
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53598
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
16 months agoEX-8851 lustre: add uncompressed size to compression header
Patrick Farrell [Wed, 20 Dec 2023 19:51:55 +0000 (14:51 -0500)]
EX-8851 lustre: add uncompressed size to compression header

It's useful to have the uncompressed size of the data in the
compression header.  Also, we have three checksum fields -
compressed, uncompressed, and header, but in practice,
checksumming the compressed data including the header is
enough to cover all of these.

This patch cleans up all of this at the same time.

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ie82e0dbe9c862ddc88999b109cea1f27577dbbff
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53520
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
16 months agoLU-17218 ofd: improve filter_fid upgrade compatibility
Bobi Jam [Mon, 23 Oct 2023 07:29:07 +0000 (15:29 +0800)]
LU-17218 ofd: improve filter_fid upgrade compatibility

filter_fid could be expanded in a later Lustre version. After an
upgrade followed by a downgrade, the filter_fid EA on disk
could have been expanded during the upgrade, and would not work
after the downgrade.

This patch improves this process by allocating a bigger buffer to
hold the expanded filter_fid EA and then trimming off the
unrecognized fields.

Lustre-change: https://review.whamcloud.com/52798
Lustre-commit: TBD (from cffd0a099c30794a63268da008958f722882119b)

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: I4c99f1d9f3962d46ebf9e9b799988ff3dba4f919
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53662
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
16 months agoLU-16637 llite: tolerate fresh page cache pages after truncate
Andrew Perepechko [Tue, 26 Dec 2023 17:02:12 +0000 (20:02 +0300)]
LU-16637 llite: tolerate fresh page cache pages after truncate

Truncate called by ll_layout_refresh() can race with a fast read
or tiny write, which can add an uninitialized non-uptodate page
into the page cache.

We want to avoid expensive locking for this rare case so if there
is any leftover in the cache after truncate, just check that
the pages are not uptodate, not dirty and do not have any
filesystem-specific information attached to them.
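
The check can be pictured with the following sketch. The Page class and
its three fields are hypothetical stand-ins for the kernel page state
(uptodate flag, dirty flag, attached private data); this is not the
llite code itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    """Hypothetical model of the page state that the check looks at."""
    uptodate: bool = False
    dirty: bool = False
    private: Optional[object] = None  # filesystem-specific data, if any


def leftover_page_is_harmless(page: Page) -> bool:
    # A page added by a racing fast read or tiny write is tolerated only
    # if it carries no valid data, no pending writes and no fs state.
    return not page.uptodate and not page.dirty and page.private is None
```

A freshly inserted, uninitialized page passes this test, while any page
holding data or filesystem state would still be treated as an error.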

Lustre-change: https://review.whamcloud.com/53554
Lustre-commit: TBD (from f4c8d44a7c2f0fbc2c74d1832ff63c5216c22c38)

Change-Id: I8cadc022a3d1822a585f32e1a765e59ad0ff434d
Signed-off-by: Andrew Perepechko <andrew.perepechko@hpe.com>
HPE-bug-id: LUS-11937
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53611
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
16 months agoLU-17364 llite: don't use stale page.
Alexey Lyashkov [Fri, 12 Jan 2024 18:55:55 +0000 (13:55 -0500)]
LU-17364 llite: don't use stale page.

Using a stale page for write might confuse the read path,
which expects any IO page to have the PG_uptodate flag set,
and it caused a panic when removing the page from IO.

Lustre-Change: https://review.whamcloud.com/53550
Lustre-Commit: TBD (from f7b42523e669d3653ca7c442fe82afde618bbdd5)

Test-Parameters: testlist=sanityn env=SLOW=yes,ONLY=16k,ONLY_REPEAT=10
Test-Parameters: testlist=sanityn env=SLOW=yes,ONLY=16k,ONLY_REPEAT=10
Test-Parameters: testlist=sanityn env=SLOW=yes,ONLY=16k,ONLY_REPEAT=10
Test-Parameters: testlist=sanityn env=SLOW=yes,ONLY=16k,ONLY_REPEAT=10
Test-Parameters: testlist=sanityn env=SLOW=yes,ONLY=16k,ONLY_REPEAT=10
Test-Parameters: testlist=sanityn env=SLOW=yes,ONLY=16k,ONLY_REPEAT=10
Signed-off-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Change-Id: Ia01129ceaecf53d8d9f301c26cd2d65122f6a267
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53666
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
16 months agoLU-16498 obdclass: fix write unlock for internal case
Sebastien Buisson [Mon, 15 Jan 2024 08:57:53 +0000 (09:57 +0100)]
LU-16498 obdclass: fix write unlock for internal case

Holding a (write) lock is mandatory for put_entry(), so fix that in
refresh_entry_internal().

Fixes: 127128bed3 ("LU-16498 obdclass: change uc_lock to rwlock")
Test-Parameters: kerberos=true testlist=sanity-krb5
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: If55182ca29f37f2a783fdb88ba46512944a61c47
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53674
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
17 months agoRM-620 build: New tag 2.14.0-ddn129
Andreas Dilger [Sat, 13 Jan 2024 02:51:06 +0000 (19:51 -0700)]
RM-620 build: New tag 2.14.0-ddn129

New tag 2.14.0-ddn129

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I9310e8bddd0fd14b5c8c1faa109bdab19454eca1

17 months agoLU-17354 osp: don't reset sequence client
Alex Zhuravlev [Tue, 12 Dec 2023 08:57:53 +0000 (11:57 +0300)]
LU-17354 osp: don't reset sequence client

Do not reset the sequence client if sequence allocation returned an
error; instead try to get the sequence later upon reconnection.

Lustre-change: https://review.whamcloud.com/53406
Lustre-commit: TBD (from 5cce95b35c652564b084f0721d4775d0fd522aa7)

Fixes: 6c4c51e307 ("LU-1445 osp: Use FID to track precreate cache.")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ie23b688e4f93651c4615d77a9686c44a150d3961
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53417
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
17 months agoLU-17365 lod: handle llog errors gracefuly
Mikhail Pershin [Wed, 13 Dec 2023 12:43:53 +0000 (15:43 +0300)]
LU-17365 lod: handle llog errors gracefuly

Distinguish remote llog errors by their source and type
in LOD lod_sub_prep_llog() and unify the errors reported
by llog_osd_read_header() and llog_init_handle().

- A partial llog header or 0-size llog is to be
  reinitialized, and a new header is created
- in llog_read_header(), errors from dt_attr_get() and
  read_header() are printed and returned as -EIO to the caller
- an llog with invalid llog header data is skipped and a new
  one is created to be used instead. To indicate this,
  llog_init_handle() returns the -EINVAL error code instead
  of -EIO. Therefore network errors are to be handled by
  lod_sub_recovery_thread() retry logic while bad llog
  content will lead to immediate llog re-creation.
- lod_sub_init_llogs() tries to init all targets even
  if some failed
- always recreate llogs after recovery abort no matter
  if ctxt->loc_handle exists or not
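
The split between retry and re-creation listed above can be summarized
by the sketch below. The function name and the returned strings are
hypothetical, and treating every error other than -EINVAL as retryable
is a simplification made for this example, not a statement about the
actual LOD code.

```python
import errno


def llog_error_action(rc: int) -> str:
    """Map an llog open/init return code to the action described above."""
    if rc == 0:
        return "use"        # llog is valid, use it as is
    if rc == -errno.EINVAL:
        return "recreate"   # bad llog content: create a new llog now
    # Anything else (e.g. -EIO or a network error) is assumed to be
    # transient and is left to the recovery thread's retry logic.
    return "retry"
```

Keeping -EINVAL distinct from -EIO is what lets the caller tell a
corrupted llog apart from a target that is temporarily unreachable.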

The patch tries to cover known issues and types of error during
update log recovery and also provides better debugging for
similar cases in the future.

Lustre-change: https://review.whamcloud.com/53510
Lustre-commit: e81805244476f1d3ffb5a2ecb0a85f54b936ce51

Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I2705e0dc245ed4365123ce47137193a9ed769673
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53630
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
17 months agoLU-10283 mdd: fix parent FID in changelog of striped directory
Dmitry Ivanov [Mon, 16 May 2022 18:15:19 +0000 (12:15 -0600)]
LU-10283 mdd: fix parent FID in changelog of striped directory

Changelog entry for the file operations such as create, rename,
link, unlink, mkdir referred to parent FID ("p=") as a shard's
FID in a striped directory. The same was true for the source's
parent FID ("sp="). This commit hides the Lustre internals from the
user, displaying the parent directory's FID instead, as expected.

An object might be in a remote MDT, in which case obtaining the parent
FID via the linkEA can be an expensive operation, so the parent FID is
cached in the mdd_object, so that the cost of the cross-MDT RPC is
amortized over the lifetime of the object.

Certain userspace tools might depend on the previous behavior of
displaying the shard's parent FID in the changelog records, so this
can be enabled by setting mdd.*.enable_shard_pfid=1, if this is
required for compatibility.

Lustre-change: https://review.whamcloud.com/51322
Lustre-commit: 3554923af9e3260235865d90949ecd2924bbbc0e

HPE-bug-id: LUS-10721
Signed-off-by: Dmitry Ivanov <dmitry.ivanov2@hpe.com>
Change-Id: Iae15b49f5852f36ba62ae1706d3a5f4ebf307bc4
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53475
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexandre Ioffe <aioffe@ddn.com>
Reviewed-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
17 months agoLU-16498 obdclass: change uc_lock to rwlock
Sebastien Buisson [Thu, 14 Sep 2023 16:00:04 +0000 (18:00 +0200)]
LU-16498 obdclass: change uc_lock to rwlock

Change the upcall cache uc_lock to a read-write lock so that threads
can get the read lock to do concurrent lookups in the upcall cache,
and only grab the write lock in the rare case when a new entry is
added or old entries are expired. That reduces serialization between
server threads during normal operation, and avoids all of the threads
spinning for some time if the requested key (UID or gss context) is
not in the cache at all, before they sleep.

Lustre-change: https://review.whamcloud.com/52395
Lustre-commit: TBD (from 003615a0a6711334d95c42f3c41852e1cbc8e77b)

Test-Parameters: kerberos=true testlist=sanity-krb5
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I812400104fd2115d19386fb4a03bb3ce99c49383
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53622
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
17 months agoLU-17374 gss: get rid of rsi cache entries after req handle
Sebastien Buisson [Mon, 18 Dec 2023 13:59:30 +0000 (14:59 +0100)]
LU-17374 gss: get rid of rsi cache entries after req handle

RPCSEC init requests are kept in the rsi cache. While this is useful
during request processing involving upcall/downcall with userspace,
rsi entries are never used again once RPCSEC init requests have been
handled completely.
And keeping entries in the rsi cache has some impact on authentication
speed. When a new RPCSEC init request is received, the first step is
to check if there is a valid matching entry in the cache. It is never
the case, except if an authentication request is replayed, but GSS
rejects that anyway.
So we spend time browsing a cache from which we expect no match. Even
if the upcall cache mechanism takes this lookup opportunity to remove
invalid or expired entries, it is even better to remove cache entries
as soon as we know they are done.

Lustre-change: https://review.whamcloud.com/53488
Lustre-commit: 7a56a689d4aa588bd003e35fdb93d87cf1e56d1d

Test-Parameters: kerberos=true testlist=sanity-krb5
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ia9946578c3d3149e6235d832df28214ae8984f1e
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53610
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>