Whamcloud - gitweb
fs/lustre-release.git
32 hours ago LU-16464 osp: fix off-by-one errors in oxe_can_hold() 17/49617/4 master
Nikitas Angelinas [Fri, 6 Jan 2023 19:01:52 +0000 (21:01 +0200)]
LU-16464 osp: fix off-by-one errors in oxe_can_hold()

There are a couple of off-by-one errors when calculating the required
buffer size in oxe_can_hold(), which can cause the xattr entry to be
reallocated unnecessarily.
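
As an illustration of this class of bug (not the actual osp code), a
minimal sketch of a "can hold" check follows. The struct, the function
names, and the assumed layout (NUL-terminated name followed by the
value in one buffer) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical entry: one buffer holds the name, a NUL, then the value. */
struct xattr_buf_sketch {
	size_t buf_len;		/* bytes allocated for name + NUL + value */
};

/* Bytes needed for a name of name_len characters plus its value. */
static size_t xattr_required(size_t name_len, size_t value_len)
{
	return name_len + 1 + value_len;	/* +1 for the terminating NUL */
}

static bool xattr_can_hold(const struct xattr_buf_sketch *b,
			   size_t name_len, size_t value_len)
{
	/*
	 * "<=" because an exact fit is still a fit. Writing "<" here, or
	 * counting the NUL twice, would reject a buffer that is large
	 * enough and force a needless reallocation.
	 */
	return xattr_required(name_len, value_len) <= b->buf_len;
}
```

With a 10-byte buffer, a 4-character name and a 5-byte value fit
exactly, while a 6-byte value does not.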

HPE-bug-id: LUS-11423
Fixes: a1c5adf7f466 ("LU-14607 osp: separate buffer for large XATTR")
Change-Id: I486963066d7f8783ad64f1ea110fb73db0a8274b
Signed-off-by: Nikitas Angelinas <nikitas.angelinas@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49617
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
32 hours ago LU-16456 tests: skip conf-sanity test_129/132 in interop 01/49601/2
Andreas Dilger [Wed, 11 Jan 2023 19:02:04 +0000 (12:02 -0700)]
LU-16456 tests: skip conf-sanity test_129/132 in interop

test_129 was added in commit v2_14_56-40-gcefabee52
test_132 was added in commit v2_14_56-96-ge26d7cc39
They should be skipped for older MDS versions.

Test-Parameters: trivial testlist=conf-sanity env=ONLY=122-133 serverversion=2.14.0
Fixes: cefabee52 ("LU-15112 mgc: do not ignore target registration failure")
Fixes: e26d7cc399 ("LU-14399 hsm: process hsm_actions in coordinator")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: If1e276c816ecf2f30dc970f9b5afe85d722540e5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49601
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Sarah Liu <sarah@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
32 hours ago LU-16461 kfilnd: Modify peer credits and RX buffers 92/49592/2
Chris Horn [Tue, 3 Jan 2023 17:03:38 +0000 (10:03 -0700)]
LU-16461 kfilnd: Modify peer credits and RX buffers

It's desirable to lower peer credits because smaller values allow us
to cancel outstanding traffic to down peers faster (because there is
less traffic in flight). Testing shows that peer_credits 16 does not
perform any worse than our current default. Let's make 16 the
new default.

In addition, testing shows a benefit for further increasing the
default number of immediate receive buffers. Increase this to 8.

HPE-bug-id: LUS-11421
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I877fe408b276071f33a99c8b3b50d13f597aaa29
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49592
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
32 hours ago LU-16451 kfilnd: Improve CQ error logging 89/49589/2
Chris Horn [Tue, 1 Nov 2022 19:39:39 +0000 (13:39 -0600)]
LU-16451 kfilnd: Improve CQ error logging

Improve CQ error logging for send events by printing the errno from
the CQ event as well as the provider error. This should allow us to
better root cause TN failures.

Also remove an extra newline character.

HPE-bug-id: LUS-11314
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I79bbe0312a9124dd34285d43b6e83f9d897923c1
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49589
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
32 hours ago LU-13530 build: Add kernel version to depmod 73/49573/4
Timothy Day [Fri, 6 Jan 2023 20:56:13 +0000 (20:56 +0000)]
LU-13530 build: Add kernel version to depmod

The depmod commands in the postrm and
postinst scripts should use the kernel
version the package is built against.
Otherwise, depmod will use the current
kernel version - which might be different.

This patch also adds a line indicating that
the file has been modified.

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I355420a85ea0ed301433816588758197795b5ede
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49573
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-by: Thomas Stibor <thomas@stibor.net>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
32 hours ago LU-16445 sec: make nodemap root squash independent of map_mode 61/49561/3
Sebastien Buisson [Thu, 5 Jan 2023 14:06:39 +0000 (15:06 +0100)]
LU-16445 sec: make nodemap root squash independent of map_mode

When the admin property is set to 0 on a nodemap, the root user must
be squashed, even if the map_mode property specifies to not map uids
or gids.
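
The intended ordering of the checks can be sketched as follows. This is
a simplified illustration, not the nodemap code: the struct, its
fields, and the function name are hypothetical, and the real idmap
lookup is replaced by a placeholder.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced view of a nodemap. */
struct nodemap_sketch {
	bool admin;		/* true: root is trusted; false: squash root */
	bool map_uid;		/* whether map_mode asks for uid mapping */
	uint32_t squash_uid;	/* id that squashed users are mapped to */
};

static uint32_t sketch_map_uid(const struct nodemap_sketch *nm, uint32_t uid)
{
	/* Root squash is checked first, independent of map_mode. */
	if (uid == 0 && !nm->admin)
		return nm->squash_uid;

	/* map_mode says uids are not mapped: pass through unchanged. */
	if (!nm->map_uid)
		return uid;

	/* Placeholder for the real idmap lookup, out of scope here. */
	return nm->squash_uid;
}
```

With admin set to 0 and uid mapping disabled, uid 0 comes back as the
squash id while any other uid passes through untouched.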

Enhance sanity-sec test_17 to exercise this use case.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I1b41caa1ccc6e544ce9fac45b47d0c4c129221f7
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49561
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Etienne AUJAMES <eaujames@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
32 hours ago LU-16285 ldlm: send the cancel RPC asap 27/49527/8
Yang Sheng [Sat, 14 Jan 2023 17:56:14 +0000 (01:56 +0800)]
LU-16285 ldlm: send the cancel RPC asap

This patch tries to send the cancel RPC as soon as possible
when a bl_ast is received from the server. One existing
problem is that a lock could have been added to the regular
queue for some other reason before the bl_ast arrived, which
prevents the lock from being canceled in a timely manner.
The other problem is that we collect many locks into one RPC
to save network traffic, but this can take a long time while
dirty pages are being flushed.

 - Process the lock cancel even if the lock had already been
   added to the bl queue when the bl_ast arrived, unless
   the cancel RPC has already been sent.
 - Send the cancel RPC immediately for a bl_ast lock. Don't
   try to add more locks in such a case.

Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: Ie5efff3f1ed4e46448371185a0c08968233e7644
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49527
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
32 hours ago LU-16415 quota: enforce project quota for root 60/49460/7
Sergey Cheremencev [Sat, 17 Dec 2022 21:42:10 +0000 (01:42 +0400)]
LU-16415 quota: enforce project quota for root

This patch adds an option to enforce project quotas for root.
It is disabled by default. To enable it, set
osd-ldiskfs.*.quota_slave.root_prj_enable to 1
on each target where you need this option.

The patch also adds sanity-quota_1j to test the new feature.

Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Change-Id: I978dc8442235149794f85110309f63bc560defdc
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49460
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Shuichi Ihara <sihara@ddn.com>
Tested-by: Maloo <maloo@whamcloud.com>
32 hours ago LU-16367 misc: remove deprecated code 38/49338/4
Andreas Dilger [Sat, 26 Nov 2022 06:46:32 +0000 (23:46 -0700)]
LU-16367 misc: remove deprecated code

Remove code that is or will become deprecated in this release based
on the LUSTRE_VERSION_CODE checks.

Fixes: 53fa817657 ("LU-12514 llite: move client mounting from obdclass to llite")
Fixes: 3919a282ca ("LU-15106 ofd: quiet deprecated param warning")
Fixes: 115bba9ffb ("LU-11110 ofd: remove obdfilter.*.* symlinks in few releases")
Fixes: 73f15ad0f1 ("LU-9378 utils: split getstripe and find from lfs.1")
Fixes: 88d8f0f86b ("LU-11891 utils: getstripe use --mdt-index consistently")
Fixes: 78be823f33 ("LU-15218 quota: delete unused quota ID")
Fixes: 0c5fbd80f1 ("LU-5969 lustreapi: replace llapi_get_version()")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Id59371084a102d6d8257c45b55d68077f2ce7057
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49338
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: jsimmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
32 hours ago LU-16345 ofd: ofd_commitrw_read() with non-existing object 55/49255/6
Alex Zhuravlev [Mon, 28 Nov 2022 09:17:25 +0000 (12:17 +0300)]
LU-16345 ofd: ofd_commitrw_read() with non-existing object

A client can get evicted during an OST_READ bulk transfer, so its
LDLM lock is cancelled and OST_DESTROY can remove the object.
ofd_commitrw_read() still needs to release the buffers and
ignore the fact that the object doesn't exist.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ibe9413de41c23b1b4f6d52e9b17a06590b3c0726
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49255
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Patrick Farrell <farr0186@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
32 hours ago LU-16338 readahead: add stats for read-ahead page count 24/49224/4
Qian Yingjin [Wed, 23 Nov 2022 09:42:53 +0000 (04:42 -0500)]
LU-16338 readahead: add stats for read-ahead page count

This patch adds the stats for read-ahead page count:

lctl get_param llite.*.read_ahead_stats
llite.lustre-ffff938b7849d000.read_ahead_stats=
snapshot_time             4011.320890492 secs.nsecs
start_time                0.000000000 secs.nsecs
elapsed_time              4011.320890492 secs.nsecs
hits                      4 samples [pages]
misses                    1 samples [pages]
zero_size_window          4 samples [pages]
failed_to_reach_end       1 samples [pages]
failed_to_fast_read       1 samples [pages]
readahead_pages           1 samples [pages] 255 255 255

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Iada06eb7d78ab28cfcc7167e49d25da252da4009
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49224
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Patrick Farrell <farr0186@gmail.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
32 hours ago LU-16333 osc: page fault in osc_release_bounce_pages() 10/49210/4
Andriy Skulysh [Fri, 3 Jun 2022 12:28:14 +0000 (15:28 +0300)]
LU-16333 osc: page fault in osc_release_bounce_pages()

pga[i] can be uninitialized. This happens after the following
code path in osc_build_rpc():

        OBD_SLAB_ALLOC_PTR_GFP(oa, osc_obdo_kmem, GFP_NOFS);
        if (oa == NULL)
                GOTO(out, rc = -ENOMEM);

Fixes: a9ed5b149646 ("LU-12275 sec: encryption for write path")
HPE-bug-id: LUS-10991
Signed-off-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Alexander Boyko <c17825@cray.com>
Change-Id: I7e21cb9ab69f0bce9c1bdb53a4b0ac7a673887cc
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49210
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
32 hours ago LU-15495 tests: fixed dbench test 88/49088/23
Alex Deiter [Wed, 9 Nov 2022 17:06:37 +0000 (21:06 +0400)]
LU-15495 tests: fixed dbench test

* Use awk to get the list of shared libraries
* Fix shellcheck warnings

Test-Parameters: trivial testlist=sanity \
clientdistro=el7.9 clientarch=x86_64 \
env=SLOW=yes,ONLY=71

Test-Parameters: trivial testlist=sanity \
clientdistro=el8.6 clientarch=x86_64 \
env=SLOW=yes,ONLY=71

Test-Parameters: trivial testlist=sanity \
clientdistro=el8.6 clientarch=aarch64 \
env=SLOW=yes,ONLY=71

Test-Parameters: trivial testlist=sanity \
clientdistro=el8.6 clientarch=ppc64le \
env=SLOW=yes,ONLY=71

Test-Parameters: trivial testlist=sanity \
clientdistro=el9.0 clientarch=x86_64 \
env=SLOW=yes,ONLY=71

Test-Parameters: trivial testlist=sanity \
clientdistro=sles12sp5 clientarch=x86_64 \
env=SLOW=yes,ONLY=71

Test-Parameters: trivial testlist=sanity \
clientdistro=sles15sp4 clientarch=x86_64 \
env=SLOW=yes,ONLY=71

Test-Parameters: trivial testlist=sanity \
clientdistro=ubuntu2004 clientarch=x86_64 \
env=SLOW=yes,ONLY=71

Test-Parameters: trivial testlist=sanity \
clientdistro=ubuntu2204 clientarch=x86_64 \
env=SLOW=yes,ONLY=71

Test-Parameters: trivial env=SLOW=yes,ONLY=26 testlist=replay-dual

Test-Parameters: trivial env=SLOW=yes,ONLY=70b testlist=replay-single

Test-Parameters: trivial env=SLOW=yes,ONLY=8 testlist=sanity-pfl

Test-Parameters: trivial env=SLOW=yes,ENABLE_QUOTA=yes,ONLY=8 \
testlist=sanity-quota

Signed-off-by: Alex Deiter <alex.deiter@gmail.com>
Change-Id: Ic28bd67dcfb5ff24e65e33278ac867409a2c1cc6
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49088
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
32 hours ago LU-16267 lnet: fix missing error check in LUTF 51/48951/2
Cyril Bordage [Tue, 25 Oct 2022 16:52:30 +0000 (18:52 +0200)]
LU-16267 lnet: fix missing error check in LUTF

In the find_replace_file function, the file is opened with the default
encoding option. If the file has a different encoding, the read fails.
The solution is to wrap it in a try/except for UnicodeDecodeError and
skip badly encoded files.

Test-Parameters: @lnet
Signed-off-by: Cyril Bordage <cbordage@whamcloud.com>
Change-Id: I9115d39414d31b628d550e8289b3193d13787288
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48951
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
32 hours ago LU-16228 utils: add lljobstat util 88/48888/28
Lei Feng [Mon, 17 Oct 2022 05:36:14 +0000 (13:36 +0800)]
LU-16228 utils: add lljobstat util

The lljobstat utility reads data from job_stats file(s),
parses and aggregates it, and lists the top jobs.

For example:
$ ./lljobstats -n 1 -c 3
---
timestamp: 1665984678
top_jobs:
- ll_sa_3508505.0: {ops: 64, ga: 64}
- touch.500:       {ops: 6, op: 1, cl: 1, mn: 1, ga: 1, sa: 2}
- bash.0:          {ops: 3, ga: 3}
...

Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial
Change-Id: I0c4ac619496c184a5aebbaf8674f5090ab722d72
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48888
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
32 hours ago LU-16115 build: Linux 5.17 external module support 60/48360/9
Shaun Tancheff [Mon, 3 Oct 2022 07:51:16 +0000 (14:51 +0700)]
LU-16115 build: Linux 5.17 external module support

Linux commit v5.16-rc3-26-g129ab0d2d9f3 added quotes around
"$(CONFIG_CC_VERSION_TEXT)". However, .config stores
CONFIG_CC_VERSION_TEXT with quotes, which breaks the GNU make
Makefile for external modules.

We need to supply a non-quoted value and override the
value in .config before it is used.

Test-Parameters: trivial
HPE-bug-id: LUS-11190
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I997b6987ef37a8c5b9d8f0984e81d9402a2ea705
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48360
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: jsimmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
32 hours ago LU-13485 ldiskfs: Parallel configure tests for ldiskfs 51/38351/30
Shaun Tancheff [Wed, 7 Dec 2022 02:42:33 +0000 (20:42 -0600)]
LU-13485 ldiskfs: Parallel configure tests for ldiskfs

Transform the compile tests in ldiskfs to run in parallel

Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I3a097ab5cd18b57e9311980d9aa708ed25f58464
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/38351
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
32 hours ago LU-13485 kernel: Parallel core configure tests 61/38361/37
Shaun Tancheff [Sun, 15 Jan 2023 02:42:31 +0000 (20:42 -0600)]
LU-13485 kernel: Parallel core configure tests

Transform the compile tests in lustre-core to run in parallel

Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I46fa852659eb4db86a12ec4ad3efddd0fdd3a655
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/38361
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
32 hours ago LU-13485 libcfs: Parallel configure tests for libcfs 49/38349/36
Shaun Tancheff [Sun, 15 Jan 2023 01:57:02 +0000 (19:57 -0600)]
LU-13485 libcfs: Parallel configure tests for libcfs

Transform the compile tests in libcfs to run in parallel

Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I76ab65558dd456dc08d6ef4a1985455ce1f17913
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/38349
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
32 hours ago LU-16160 llite: SIGBUS is possible on a race with page reclaim 47/49647/2
Andrew Perepechko [Sun, 15 Jan 2023 16:55:58 +0000 (11:55 -0500)]
LU-16160 llite: SIGBUS is possible on a race with page reclaim

We can restart fault handling if page truncation happens
in parallel with the fault handler.

Change-Id: I6e60783e3334f87e799dc8b0e6e63d0bb358a236
Signed-off-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Signed-off-by: Patrick Farrell <farr0186@gmail.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49647
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
32 hours ago LU-16480 lov: fiemap improperly handles fm_extent_count=0 45/49645/5
Andrew Perepechko [Mon, 16 Jan 2023 13:13:34 +0000 (08:13 -0500)]
LU-16480 lov: fiemap improperly handles fm_extent_count=0

FIEMAP calls with fm_extent_count=0 are supposed only to
return the number of extents.

lov_object_fiemap() attempts to initialize stripe_last
based on fiemap->fm_extents[0] which is not initialized
in userspace and not even allocated in kernelspace.

Eventually, the call exits with -EINVAL and "FIEMAP does
not init start entry" kernel log message.
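
The expected count-only behaviour can be sketched as below. This is a
user-space illustration under stated assumptions, not the
lov_object_fiemap() fix itself: the structs are reduced stand-ins
for the kernel's fiemap structures (a pointer replaces the flexible
array so the test can pass NULL), and fiemap_fill_sketch() is a
hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* Reduced stand-in for one extent record. */
struct extent_sketch {
	uint64_t logical;
	uint64_t length;
};

/* Reduced stand-in for the fiemap request/reply header. */
struct fiemap_sketch {
	uint32_t fm_mapped_extents;		/* out: extents reported */
	uint32_t fm_extent_count;		/* in: capacity of fm_extents */
	struct extent_sketch *fm_extents;	/* may be NULL if capacity is 0 */
};

static int fiemap_fill_sketch(struct fiemap_sketch *fm,
			      const struct extent_sketch *src,
			      uint32_t nr_src)
{
	uint32_t n;

	/* Count-only request: report the number, never touch fm_extents. */
	if (fm->fm_extent_count == 0) {
		fm->fm_mapped_extents = nr_src;
		return 0;
	}

	/* Otherwise copy as many extents as the caller has room for. */
	n = nr_src < fm->fm_extent_count ? nr_src : fm->fm_extent_count;
	memcpy(fm->fm_extents, src, n * sizeof(*src));
	fm->fm_mapped_extents = n;
	return 0;
}
```

The point of the early return is that fm_extents[0] is only read or
written when the caller actually supplied room for at least one extent.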

Fixes: 409719608c ("LU-11848 lov: FIEMAP support for PFL and FLR file")
Signed-off-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Change-Id: I65e706b5dd5c8a6db90a539c2602af839b4da823
HPE-bug-id: LUS-11443
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49645
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
32 hours ago LU-9680 utils: new llapi_param_display_value(). 91/49491/7
James Simmons [Thu, 12 Jan 2023 14:02:54 +0000 (09:02 -0500)]
LU-9680 utils: new llapi_param_display_value().

Currently the special YAML handling for parameters is done in
lustre_cfg.c. Other functionality internal to liblustreapi.so
will use this as well, so move the handling into liblustreapi.so.
For now the new llapi_param_display_value() function is visible
only to liblustreapi internal code. Later, when we support /sys
access, we can make it available for general use.

The "lctl get_param" and "lctl list_param" commands generally worked
for non-root users, but not for parameters under
/sys/kernel/debug, due to permission changes in the kernel, so
proper non-root access was still lacking. Implement full
"lctl get_param" functionality for non-root users and make
"lctl list_param" work for them as well. These changes also work
with any parameters implemented with Netlink.

Change-Id: Ifd9aad16decb0803a336314d4dea38664ff41aa4
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49491
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
32 hours ago LU-14393 recovery: reply reconstruction for batched RPCs 28/48228/13
Qian Yingjin [Tue, 16 Aug 2022 07:57:47 +0000 (03:57 -0400)]
LU-14393 recovery: reply reconstruction for batched RPCs

Batched RPCs can boost the metadata performance of Lustre
dramatically. However, they also increase the complexity of
recovery, such as how to reconstruct the reply when the RPC is
resent because the reply was lost.

This patch adds a new field @lrd_batch_idx to the data
structure @lsd_reply_data, which is used to store each slot of the
"reply_data" file:
struct lsd_reply_data {
        __u64 lrd_transno;      /* transaction number */
        __u64 lrd_xid;          /* transmission id */
        __u64 lrd_data;         /* per-operation data */
        __u32 lrd_result;       /* request result */
        __u32 lrd_client_gen;   /* client generation */
        __u32 lrd_batch_idx;    /* index in a batched RPC */
        __u32 lrd_padding[7];   /* unused fields */
};

When a batched RPC is found to be a resent request, and the index
of a sub request in the batched RPC is smaller than or equal to
@lrd_batch_idx in the reply data, the sub request has already been
executed, so the server reconstructs the reply for this sub
request. If the index is larger than @lrd_batch_idx, the server
re-executes the sub request in the batched RPC.
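
The per-sub-request decision on a resend can be sketched as below. The
enum and function names are hypothetical; only the comparison against
@lrd_batch_idx is taken from the description above.

```c
#include <stdint.h>

/* What the server does with one sub request of a resent batched RPC. */
enum sub_action_sketch {
	SUB_RECONSTRUCT,	/* already executed: rebuild the reply */
	SUB_REEXECUTE,		/* not executed yet: run it again */
};

static enum sub_action_sketch resend_sub_action(uint32_t sub_idx,
						uint32_t lrd_batch_idx)
{
	/* Indexes up to and including lrd_batch_idx were already executed. */
	return sub_idx <= lrd_batch_idx ? SUB_RECONSTRUCT : SUB_REEXECUTE;
}
```

For a stored index of 2, sub requests 0 through 2 get reconstructed
replies and sub request 3 onward is executed again.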

Disable conf-sanity/32{a,b,c,d,e,f,g}, 108{a,b} temporarily until
the compatibility issue during upgrade for new reply data format
is fixed.

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Id48ecc263002cb783f5032642d05e1f3f6673837
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48228
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
32 hours ago LU-14393 protocol: basic batching processing framework 78/41378/17
Qian Yingjin [Mon, 1 Feb 2021 03:51:08 +0000 (11:51 +0800)]
LU-14393 protocol: basic batching processing framework

Batch processing can boost performance. The larger the
batch size, the higher the latency for the entire batch. Although
the latency for the entire batch of operations is higher than the
latency of any single operation, the throughput of the batch of
operations is much higher.

This patch implements the basic batching processing framework for
Lustre. It could be used for the future batching statahead and
WBC.

A batched RPC does not require that the opcodes of the sub requests
in a batch are the same. Each sub request has its own opcode. This
allows batching not only read-only requests but also multiple
modification updates with different opcodes, and even a mixed
workload which contains both read-only requests and modification
updates.

For recovery, only the batched RPC contains a client XID;
there is no separate client XID for each sub request. Although the
server generates a transno for each update sub request, the
transno is only stored into the batched RPC (in @ptlrpc_body)
when the sub update request is finished. Thus the batched RPC only
stores the transno of the last sub update request. Only the
batched RPC contains the @ptlrpc_body message field. Each sub
request in a batched RPC does not contain a @ptlrpc_body field.

A new field named @lrd_batch_idx is added to the client reply data
@lsd_reply_data. It indicates the sub request index in a batched
RPC. When the server finishes a sub update request, it updates
@lrd_batch_idx accordingly.
When a batched RPC is found to be a resent RPC, and the index
of a sub request in the batched RPC is smaller than or equal to
@lrd_batch_idx in the reply data, the sub request has already been
executed and committed, so the server reconstructs the
reply for this sub request. If the index is larger than
@lrd_batch_idx, the server re-executes the sub request in the
batched RPC.

To simplify the reply/resend of the batched RPCs, the batch
processing stops at the first failure in the current design.
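
A minimal sketch of the stop-at-first-failure rule, combined with
keeping only the transno of the last finished update, is shown below.
All names are hypothetical, and each sub request's outcome is supplied
up front instead of being produced by a real handler.

```c
#include <stdint.h>

/* Precomputed outcome of one sub request, for illustration only. */
struct sub_req_sketch {
	int rc;			/* 0 on success, negative errno on failure */
	uint64_t transno;	/* non-zero only for update sub requests */
};

struct batch_result_sketch {
	int rc;			/* result of the batch as a whole */
	uint32_t nr_done;	/* sub requests that completed successfully */
	uint64_t last_transno;	/* transno of the last finished update */
};

static struct batch_result_sketch
run_batch_sketch(const struct sub_req_sketch *subs, uint32_t nr)
{
	struct batch_result_sketch res = { 0, 0, 0 };
	uint32_t i;

	for (i = 0; i < nr; i++) {
		if (subs[i].rc != 0) {
			/* Stop at the first failure; later subs never run. */
			res.rc = subs[i].rc;
			break;
		}
		/* Read-only subs carry no transno and leave it unchanged. */
		if (subs[i].transno != 0)
			res.last_transno = subs[i].transno;
		res.nr_done = i + 1;
	}
	return res;
}
```

Stopping early keeps the set of executed sub requests a simple prefix
of the batch, which is what makes a single index sufficient on resend.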

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Idaa814e82c968811bdda1c750b18c878b2c2ca67
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/41378
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 days ago LU-16452 kfilnd: Check replay deadline before send 93/49593/2
Chris Horn [Sat, 29 Oct 2022 22:30:17 +0000 (16:30 -0600)]
LU-16452 kfilnd: Check replay deadline before send

The LND timeout needs to account for the total time needed for bulk
operations to complete. On cassini, this can be ~120 seconds due to
the CXI retry-handler timeout on both the sender and target. i.e. LND
timeout is really the max round trip time, and (LND timeout)/2 is the
max one-way trip time.

When we replay a transaction we want to at least ensure we have enough
time to deliver the message to the receiver, as this gives us a
chance at still completing transactions. We should ensure that we
still have (LND timeout)/2 seconds remaining before posting a new
transaction.

Introduce kfilnd_transaction::tn_replay_deadline,
which is set to the transaction deadline minus (LND timeout)/2.

Check the replay deadline in kfilnd_tn_state_idle() before attempting
to post the transaction. If we've exceeded that deadline then fail
the transaction with -ETIMEDOUT and set a NETWORK_TIMEOUT health
status.

Modify the throttle check in kfilnd_tn_state_idle() to check
kfilnd_transaction::tn_replay_deadline instead of
kfilnd_transaction::deadline to determine when we should timeout
a transaction that is being throttled. Note, this check is switched
to using ktime_before() rather than ktime_after() since the case
is about checking whether we are currently before the deadline rather
than after it. The current code isn't wrong. It is just grammatically
awkward.
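
The deadline arithmetic can be sketched in user space as follows. Plain
integer seconds stand in for ktime_t, the struct and function names are
hypothetical, and computing the transaction deadline as "now plus LND
timeout" is an assumption made for the example.

```c
#include <errno.h>
#include <stdint.h>

/* Reduced stand-in for the two deadlines kept per transaction. */
struct tn_sketch {
	int64_t deadline;		/* when the transaction times out */
	int64_t replay_deadline;	/* latest time a replay may be posted */
};

static void tn_set_deadlines(struct tn_sketch *tn, int64_t now,
			     int64_t lnd_timeout)
{
	/* Assumption: the transaction deadline is now + LND timeout. */
	tn->deadline = now + lnd_timeout;
	/* Leave (LND timeout)/2 for the one-way trip to the receiver. */
	tn->replay_deadline = tn->deadline - lnd_timeout / 2;
}

/* Returns 0 if the transaction may still be posted, else -ETIMEDOUT. */
static int tn_check_replay(const struct tn_sketch *tn, int64_t now)
{
	/* Same sense as a ktime_before() test: are we before the deadline? */
	return now < tn->replay_deadline ? 0 : -ETIMEDOUT;
}
```

With a 120-second LND timeout starting at time 0, the replay deadline
lands at 60: a replay at 59 is allowed and one at 60 is refused.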

HPE-bug-id: LUS-11304
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I1911d51cee4acea20577e3fc45c99b8948b79523
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49593
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
8 days ago LU-16451 kfilnd: Throttle traffic to down peers 91/49591/2
Chris Horn [Fri, 28 Oct 2022 22:27:17 +0000 (16:27 -0600)]
LU-16451 kfilnd: Throttle traffic to down peers

If a transaction fails with -EHOSTUNREACH then this suggests the
target is actually down. We want to avoid consuming resources on the
local NIC by trying to send messages to down peers, so we will
require a hello handshake before injecting other new messages to a
peer we suspect is down.

Introduce a new kfilnd_peer state, KP_STATE_DOWN. Peers in either
KP_STATE_UPTODATE or KP_STATE_STALE can transition to KP_STATE_DOWN.
We'll transition a peer to KP_STATE_DOWN if we fail a transaction with
them and the errno we get is either -EHOSTUNREACH or -ENOTCONN.
kfilnd_peer_down() transitions a peer to KP_STATE_DOWN as appropriate.

Similar to stale peers, if we continue to fail transactions with peers
that are down then we want to eventually purge them from the peer
cache. This logic in kfilnd_peer_stale() is moved to
kfilnd_peer_purge_old_peer(), and this new function is called by both
kfilnd_peer_stale() and kfilnd_peer_down().

Introduce kfilnd_peer_needs_throttle() that determines whether we
should queue a message for future replay pending a successful
handshake. Integrate this into kfilnd_tn_state_idle() so that we queue
messages for peers in KP_STATE_DOWN in addition to peers in
KP_STATE_NEW. Modify debug statements in this area to remove redundant
info and reflect that we can hit these conditions for down peers, not
just new peers.

Introduce kfilnd_peer_tn_failed() to interpret the errno for a
transaction failure and call kfilnd_peer_down() or kfilnd_peer_stale()
as appropriate. This function replaces all existing calls to
kfilnd_peer_stale(). kfilnd_peer_stale() is now a static function.
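
The state handling described above can be sketched as below. The state
names come from this message, but the struct and the *_sketch functions
are hypothetical, purging of old peers is left out, and letting only
up-to-date peers become stale is an assumption.

```c
#include <errno.h>
#include <stdbool.h>

enum kp_state_sketch {
	KP_STATE_NEW,
	KP_STATE_UPTODATE,
	KP_STATE_STALE,
	KP_STATE_DOWN,
};

struct peer_sketch {
	enum kp_state_sketch state;
};

/* Up-to-date or stale peers may transition to down. */
static void peer_down_sketch(struct peer_sketch *kp)
{
	if (kp->state == KP_STATE_UPTODATE || kp->state == KP_STATE_STALE)
		kp->state = KP_STATE_DOWN;
}

/* Assumption: only an up-to-date peer becomes stale. */
static void peer_stale_sketch(struct peer_sketch *kp)
{
	if (kp->state == KP_STATE_UPTODATE)
		kp->state = KP_STATE_STALE;
}

/* Interpret the errno of a failed transaction. */
static void peer_tn_failed_sketch(struct peer_sketch *kp, int error)
{
	if (error == -EHOSTUNREACH || error == -ENOTCONN)
		peer_down_sketch(kp);
	else
		peer_stale_sketch(kp);
}

/* New and down peers need a successful handshake before new traffic. */
static bool peer_needs_throttle_sketch(const struct peer_sketch *kp)
{
	return kp->state == KP_STATE_NEW || kp->state == KP_STATE_DOWN;
}
```

Funnelling every failure through one function keeps the errno
interpretation in a single place instead of at each call site.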

HPE-bug-id: LUS-11314
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I206075c3a1b2836715dc79b49b098dab51c6bb94
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49591
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
8 days ago LU-16450 kfilnd: Cancel TNs if handshake fails 90/49590/2
Chris Horn [Tue, 25 Oct 2022 19:21:17 +0000 (13:21 -0600)]
LU-16450 kfilnd: Cancel TNs if handshake fails

When sending a message to a new peer a HELLO is sent first and the
original message waits for the handshake to complete. If the HELLO
fails to be sent then the original message will continue to wait for
the full LND timeout. When we retry the original message we should
check whether there is actually an outstanding HELLO. If not, then
this indicates the HELLO failed and we should cancel the TN.
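
The retry decision can be sketched as below. The struct, its two flags,
and the function name are hypothetical; they only model the three
cases described in this message.

```c
#include <stdbool.h>

enum retry_action_sketch {
	RETRY_SEND,	/* handshake finished: send the original message */
	RETRY_WAIT,	/* HELLO still in flight: keep waiting */
	RETRY_CANCEL,	/* no HELLO outstanding: it failed, cancel the TN */
};

/* Reduced view of the handshake state of a peer. */
struct hs_peer_sketch {
	bool handshake_done;	/* hello handshake has completed */
	bool hello_outstanding;	/* a HELLO is currently in flight */
};

static enum retry_action_sketch
tn_retry_action_sketch(const struct hs_peer_sketch *kp)
{
	if (kp->handshake_done)
		return RETRY_SEND;
	if (kp->hello_outstanding)
		return RETRY_WAIT;
	/* Neither done nor pending: waiting out the LND timeout is futile. */
	return RETRY_CANCEL;
}
```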

HPE-bug-id: LUS-11310
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I4ed07964d5af0bcc3bdca33c1ea46fd436af2e98
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49590
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
8 days ago LU-16455 tests: recovery-small test_139() fix 79/49579/5
Elena Gryaznova [Sun, 8 Jan 2023 18:05:19 +0000 (19:05 +0100)]
LU-16455 tests: recovery-small test_139() fix

The MDS device calculated before stop() cannot be used
after stop(), because the device-mapper device is removed and
the facet device is restored:
  stop () ->
    elif dm_flakey_supported $facet; then
      if [[ -n ${!failover_host} &&
               ${!failover_host} != ${!host} ]]
         dm_cleanup_dev $facet ->
               unexport_dm_dev $facet

Without this fix test_139 fails on failover setup:
  losetup: /dev/mapper/mds1_flakey: failed to set up loop device:
    No such file or directory

To reproduce the failure just run:
  sh llmountcleanup.sh
  ONLY=139 sh recovery-small.sh
on a failover setup where mds1_HOST != mds1failover_HOST

Fixes: 4597fa7d88 ("LU-13061 osp: check catlog FID after reading in")
Test-Parameters: trivial testlist="recovery-small" failover=true iscsi=1 \
  env=ONLY=139,SLOW=yes mdssizegb=10 clientcount=4 osscount=2 mdscount=2 \
  mdtcount=2 austeroptions=-R
Test-Parameters: trivial testlist="recovery-small" failover=true iscsi=1 \
  env=FAILURE_MODE=HARD,ONLY=139,SLOW=yes mdssizegb=10 clientcount=4 \
  osscount=2 mdscount=2 mdtcount=2 austeroptions=-R
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Reviewed-by: Vladimir Saveliev <vlaidimir.saveliev@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
HPE-bug-id: LUS-10912
Change-Id: I67d98f633de4023a4430b55c6b4d308c7f17d988
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49579
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
8 days agoLU-2771 ldlm: remove obsolete LDLM_FL_SERVER_LOCK 63/49563/2
Andreas Dilger [Thu, 5 Jan 2023 22:44:50 +0000 (15:44 -0700)]
LU-2771 ldlm: remove obsolete LDLM_FL_SERVER_LOCK

The LDLM_FL_SERVER_LOCK flag and accompanying accessor macros have
never been used since they were first introduced.  Remove them.
It looks like this may have been duplicated by LDLM_FL_NS_SRV.

Test-Parameters: trivial
Fixes: caa55aec4a ("LU-2771 dlmlock: compress out unused space")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Iffc9b126334a327a9054f9acae86f4a0d03ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49563
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
Reviewed-by: jsimmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 days agoLU-13642 lnet: modify lnet_inetdev to work with large NIDS 25/49525/5
James Simmons [Thu, 5 Jan 2023 17:59:24 +0000 (12:59 -0500)]
LU-13642 lnet: modify lnet_inetdev to work with large NIDS

Change the li_ipv6 field in struct lnet_inetdev to li_size, which
now represents the size of the NID address. This will work
with the GUID of InfiniBand as well. The second change is
to always store li_ipaddr in network byte order. This allows
direct comparison between li_ipaddr and the nid_addr of
struct lnet_nid. We will ensure AF_IB addresses are also in the
same format as what will be stored in struct lnet_nid.
Implement setup with a NID address for the ko2iblnd LND driver.

Test-Parameters: trivial testlist=sanity-lnet
Change-Id: I7c27edb67263dd5bda4728c536aee266d38a4592
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49525
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 days agoLU-16380 osd-ldiskfs: race in OI mapping 14/49514/2
Lai Siyao [Sat, 17 Dec 2022 13:06:16 +0000 (08:06 -0500)]
LU-16380 osd-ldiskfs: race in OI mapping

There is a race between the OI scrub thread and OI mapping entry
insertion, which may add an inconsistent OI mapping entry without
starting the OI scrub thread. This may lead to osd_fid_lookup() always
returning -EINPROGRESS.

To avoid this race, osd_fid_lookup() returns -EINPROGRESS only when
the OI mapping is inconsistent and the OI scrub thread is running.

Fixes: 558784caad ("LU-15643 osd-ldiskfs: don't trigger scrub on irreparable FIDs")
Test-Parameters: mdscount=2 mdtcount=4 testlist=conf-sanity env=ONLY=108b,ONLY_REPEAT=50
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I05114b6a33940c210e9952f6e24f6c36fd7f76a2
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49514
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 days agoLU-16335 mdt: skip target check for rm_entry 29/49329/6
Lai Siyao [Wed, 7 Dec 2022 02:53:25 +0000 (21:53 -0500)]
LU-16335 mdt: skip target check for rm_entry

For "lfs rm_entry", the target may not exist, so the sanity check on it
may fail and cause rm_entry to fail.

Add sanity 832.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I824c7581af05c7494cf03c0c9bc999ca1abfec01
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49329
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: jsimmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
8 days agoLU-16302 llite: Use alloc_inode_sb() to allocate inodes 70/49070/9
Shaun Tancheff [Fri, 2 Dec 2022 09:46:31 +0000 (03:46 -0600)]
LU-16302 llite: Use alloc_inode_sb() to allocate inodes

linux-commit: v5.17-49-g8b9f3ac5b01d
  fs: introduce alloc_inode_sb() to allocate filesystems specific
      inode

Filesystems are expected to use alloc_inode_sb to allocate inodes
for proper lru handling.

Test-Parameters: trivial
HPE-bug-id: LUS-11332
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ie6f091a01df33738ed2ef6f7fef9c1f9c1a51e03
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49070
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: jsimmons <jsimmons@infradead.org>
8 days agoLU-16459 tests: fix YAML verification function 84/49584/4
Lei Feng [Tue, 10 Jan 2023 08:51:27 +0000 (16:51 +0800)]
LU-16459 tests: fix YAML verification function

The YAML verification function in the tests is not correct.
Fix it and change the test case accordingly.

Fixes: bedb797c5d ("LU-16110 lprocfs: make job_stats and rename_stats valid YAML")
Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial
Change-Id: I109e2294aea3d1bffa08e6d2c39a5911fa8ef7df
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49584
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
8 days agoLU-16239 tests: do not cleanup clients dirs 70/48870/4
Elena Gryaznova [Fri, 14 Oct 2022 14:51:03 +0000 (17:51 +0300)]
LU-16239 tests: do not cleanup clients dirs

This patch adds the ability to keep the clients'
directories instead of removing them. If CLEANUP is set to
false, they are simply renamed.

Test-Parameters: trivial
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-11158
Reviewed-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: Ibc55d32ef4946a62b00dcbf745567c123650ced9
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48870
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 days agoLU-16214 kfilnd: Proactively handshake old peers 86/48786/4
Chris Horn [Mon, 22 Aug 2022 19:43:36 +0000 (13:43 -0600)]
LU-16214 kfilnd: Proactively handshake old peers

If asked to send a message to a peer that we haven't communicated with
for some time, then we run the risk of that peer having a stale
(or missing) peer entry for us. This can result in the target peer
silently dropping our message. To reduce the chance of this happening,
proactively handshake any peer we haven't talked to within the last two
LND timeout periods.

Note, kfilnd_peer_needs_hello() is called on both the send and receive
path. We only want to proactively handshake on the send path, so an
argument is added to this function so it can distinguish between the
two situations.
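
As an illustration only, the send-path part of this check might be modelled as follows in Python. The function name, its parameters and the use of plain seconds are assumptions made for this sketch; the real logic lives in kfilnd_peer_needs_hello() in C, also covers other cases, and may differ in detail.

```python
def needs_proactive_hello(last_contact, now, lnd_timeout, proactive_handshake):
    """Return True if a handshake should be started before sending.

    proactive_handshake is a hypothetical stand-in for the new argument:
    True on the send path, False on the receive path.
    """
    if not proactive_handshake:
        # Receive path: never handshake proactively.
        return False
    # Send path: handshake if we have not talked to the peer
    # within the last two LND timeout periods.
    return now - last_contact > 2 * lnd_timeout


print(needs_proactive_hello(0, 150, 60, True))
print(needs_proactive_hello(0, 150, 60, False))
```

Passing the path as an explicit argument keeps a single helper for both callers while letting only the sender trigger the extra handshake.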

HPE-bug-id: LUS-11125
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Iaacb48e5c45305869bd22335ce112b21cf67e848
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48786
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
8 days agoLU-16214 kfilnd: Keep stale peer entries 85/48785/4
Chris Horn [Fri, 19 Aug 2022 20:27:26 +0000 (14:27 -0600)]
LU-16214 kfilnd: Keep stale peer entries

A peer is currently removed from the cache whenever there is a network
failure associated with the peer. This leads to situations where
incoming messages from that peer will be dropped until a handshake
can be completed.

If we instead keep these stale peer entries then we at least have a
chance of completing future transactions with the peer.

To accomplish this, we introduce states to struct kfilnd_peer.

When a kfilnd_peer is newly allocated it is assigned a state of
KP_STATE_NEW. kfilnd_peer_is_new_peer() is modified to check for this
state rather than check if kp_version is set.

When a handshake is completed the peer is assigned a state of
KP_STATE_UPTODATE.

When a peer that is up-to-date experiences a failed network operation
then it is assigned a state of KP_STATE_STALE. kfilnd_peer_stale() is
introduced to set this state. Existing callers of kfilnd_peer_down()
are converted to call kfilnd_peer_stale(). kfilnd_peer_down() is
renamed to kfilnd_peer_del().

We will initiate a handshake to any peer that is in either
KP_STATE_NEW or KP_STATE_STALE. kfilnd_peer_needs_hello() is
modified accordingly.

struct kfilnd_peer::kp_last_alive is checked by kfilnd_peer_stale().
If we haven't heard from a stale peer within five LND timeout periods,
then that peer is deleted.

An additional kfilnd_peer_alive() call is added to
kfilnd_tn_state_idle() for the TN_EVENT_RX_HELLO case, so that
peer aliveness is updated when we receive a hello request or response.
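
The state handling described above could be sketched roughly as below. This is a simplified Python model, not the kfilnd C code: the class, method and attribute names are hypothetical, time is in plain seconds, and leaving a KP_STATE_NEW peer unchanged on failure is an assumption of the sketch.

```python
from enum import Enum


class PeerState(Enum):
    NEW = "KP_STATE_NEW"
    UPTODATE = "KP_STATE_UPTODATE"
    STALE = "KP_STATE_STALE"


STALE_LIMIT = 5  # LND timeout periods before a stale peer is deleted


class Peer:
    def __init__(self, now):
        self.state = PeerState.NEW  # newly allocated peer
        self.last_alive = now       # models kp_last_alive
        self.deleted = False

    def handshake_complete(self, now):
        self.state = PeerState.UPTODATE
        self.last_alive = now

    def mark_stale(self, now, lnd_timeout):
        """Models kfilnd_peer_stale() after a failed network operation."""
        if self.state is PeerState.UPTODATE:
            self.state = PeerState.STALE
        if (self.state is PeerState.STALE
                and now - self.last_alive > STALE_LIMIT * lnd_timeout):
            self.deleted = True     # not heard from for too long

    def needs_hello(self):
        return self.state in (PeerState.NEW, PeerState.STALE)


peer = Peer(now=0)
peer.handshake_complete(now=10)
peer.mark_stale(now=20, lnd_timeout=5)
print(peer.state.value, peer.needs_hello(), peer.deleted)
```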

HPE-bug-id: LUS-11125
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Icfb722e58fa334d983df02742dc456a55ac2abc3
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48785
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
8 days agoLU-16213 kfilnd: Finalize replay TNs with deleted peer 84/48784/4
Chris Horn [Mon, 15 Aug 2022 21:06:25 +0000 (15:06 -0600)]
LU-16213 kfilnd: Finalize replay TNs with deleted peer

If there are transactions on the replay queue awaiting a hello
response, and the peer is marked for removal (e.g. because the hello
TN failed) then let's finalize those TNs right away rather than wait
for them to hit the timeout.

HPE-bug-id: LUS-11128
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I6dc77cadaf850ab9ec37bf50241074bc3f5650b5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48784
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
8 days agoLU-16213 kfilnd: Allow one HELLO in-flight per peer 83/48783/4
Chris Horn [Fri, 19 Aug 2022 19:48:37 +0000 (13:48 -0600)]
LU-16213 kfilnd: Allow one HELLO in-flight per peer

Allow one HELLO message to be in-flight, per peer, at one time.
Accomplished by adding a flag to struct kfilnd_peer to indicate
whether a hello request has been sent to the peer. Cleared if the
send fails or when the hello response is received.

To detect the situation where a hello response is never received, we add
kp_hello_ts to struct kfilnd_peer to record the timestamp of when the
hello request was sent. If this is more than LND timeout seconds in
the past then we may send another hello.
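
A minimal sketch of this rule, in Python rather than the kernel's C: the function and parameter names are hypothetical, with hello_pending standing in for the new flag and hello_ts for kp_hello_ts.

```python
def may_send_hello(hello_pending, hello_ts, now, lnd_timeout):
    """Return True if a new hello request may be sent to the peer."""
    if not hello_pending:
        # No hello in flight: sending one is always allowed.
        return True
    # A hello is in flight; allow another only if the previous one
    # was sent more than lnd_timeout seconds ago.
    return now - hello_ts > lnd_timeout


print(may_send_hello(False, 0, 10, 60))
print(may_send_hello(True, 0, 10, 60))
print(may_send_hello(True, 0, 61, 60))
```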

Fix return value of kfilnd_send() when we're unable to allocate a
kfilnd_tn for the hello.

There's some code duplication with updating a peer based on hello
request and response. Consolidate processing of these hello messages
into a single function.

A race exists where a peer can be marked for removal in between a call
to kfilnd_peer_needs_hello() and the call to kfilnd_tn_alloc() inside
kfilnd_send_hello_request(). This would cause a hello request to be
sent to a new peer, created by kfilnd_peer_get() inside
kfilnd_tn_alloc(), without properly setting the kp_hello_pending flag
on that new peer. To avoid this situation, introduce
kfilnd_tn_alloc_for_peer() which takes a struct kfilnd_peer pointer
as an argument to assign to kfilnd_transaction::tn_kp. Use this to
allocate the kfilnd_transaction for the hello request inside
kfilnd_send_hello_request().

HPE-bug-id: LUS-11128
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I6bb0928a629cb398c270366fae6d1040ad67df3f
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48783
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
8 days agoLU-16213 kfilnd: Fail sends of particular message type 82/48782/4
Chris Horn [Fri, 19 Aug 2022 16:48:01 +0000 (10:48 -0600)]
LU-16213 kfilnd: Fail sends of particular message type

Add ability to use failure injection to specify a message type for
simulated failure.

For example, to simulate failure of all immediate messages:
 lctl set_param fail_loc=0xF114 fail_val=1

To simulate failure of a single hello request:
 lctl set_param fail_loc=0x8000F114 fail_val=4
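
The two fail_loc values above differ only in the top bit. The sketch below splits a fail_loc value into its parts, assuming that bit 31 is the libcfs "fail once" flag and that the low 16 bits select the failure location; the constant and function names are chosen for this illustration. The meaning of fail_val (message type) is taken only from the examples above.

```python
FAIL_ONCE = 0x80000000  # assumed: "fail only once" flag (bit 31)
LOC_MASK = 0x0000FFFF   # assumed: low 16 bits identify the fail location


def decode_fail_loc(fail_loc):
    """Split a fail_loc value into its location and 'once' flag."""
    return {
        "location": fail_loc & LOC_MASK,
        "once": bool(fail_loc & FAIL_ONCE),
    }


for value in (0xF114, 0x8000F114):
    print(hex(value), decode_fail_loc(value))
```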

HPE-bug-id: LUS-11128
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I4a20e92826df75812ef5b81979944526e4b94d83
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48782
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
8 days agoLU-16213 kfilnd: Add peer info to some debug statements 81/48781/4
Chris Horn [Thu, 18 Aug 2022 19:02:18 +0000 (13:02 -0600)]
LU-16213 kfilnd: Add peer info to some debug statements

Add kfilnd_peer pointer address to some debug statements.

Use 0x%llx format consistently when printing kfilnd_peer::kp_addr

Also add the message type to the TN debug macro.

HPE-bug-id: LUS-11128
Test-Parameters: trivial
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I4410ca9215f9d0a6eb65e6d4f953234fa7fba5ea
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48781
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
8 days agoLU-16213 kfilnd: Rename struct kfilnd_peer members 80/48780/4
Chris Horn [Thu, 11 Aug 2022 19:03:04 +0000 (13:03 -0600)]
LU-16213 kfilnd: Rename struct kfilnd_peer members

Prefix members of struct kfilnd_peer with kp_ to make these variable
names easier to find.

Also use 'kp' as a standard name for pointers to struct kfilnd_peer
instead of 'peer' (again to make these pointers easier to find). As
such, struct kfilnd_transaction::peer is also renamed to
struct kfilnd_transaction::tn_kp.

HPE-bug-id: LUS-11128
Test-Parameters: trivial
Change-Id: Id535c7af28a5335026037a55920c706a4e16f947
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48780
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Ron Gredvig <ron.gredvig@hpe.com>
Reviewed-by: Ian Ziemba <ian.ziemba@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 days agoLU-15163 osd: osd_obj_map_recover() to restart transaction 68/45368/8
Alex Zhuravlev [Tue, 26 Oct 2021 08:38:50 +0000 (11:38 +0300)]
LU-15163 osd: osd_obj_map_recover() to restart transaction

osd_obj_map_recover() stops the transaction when it needs to call
vfs_link(), so it has to start a new transaction to modify the
filesystem.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I6efe5444ddc959b19092bebc6e3c7dc25a29cea1
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/45368
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
8 days agoLU-14692 tests: allow FID_SEQ_NORMAL for MDT0000 93/46293/27
Li Dongyang [Tue, 25 Jan 2022 00:53:33 +0000 (11:53 +1100)]
LU-14692 tests: allow FID_SEQ_NORMAL for MDT0000

Fix the tests that assume objects created for MDT0000
always have a sequence number of 0, in preparation for
deprecating the IDIF sequence.

Fix sanity test_312 on ZFS to properly identify which
OST the object was created on, and re-enable it.

Test-Parameters: testlist=sanity env=ONLY="39r 312"
Test-Parameters: testlist=sanity-scrub env=ONLY=19
Test-Parameters: testlist=sanity-sec env=ONLY=37
Change-Id: I4bffabe25a6f84cdba760aabea1da3429715a283
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/46293
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 days agoLU-16460 lnet: validate data sent from user land properly 88/49588/6
James Simmons [Thu, 12 Jan 2023 14:34:10 +0000 (09:34 -0500)]
LU-16460 lnet: validate data sent from user land properly

Testing with improper settings from user land exposed some bugs in
the kernel's handling of these cases. For tunables sent from
user land we need to do proper range checking. An improper cast
in the new Netlink tunables code prevented setting the default
LND tunable settings. Also silently ignore attempts to set LND
tunables when they are not supported. We shouldn't stop NI setup in
this case. Lastly, set the NI tunables to -1 when user land
doesn't provide any input. This tells the LND driver to use its
default values for the tunables. Resolve a double free when
setting up an NI with a non-existent interface. Another fix is for
net locking in lnet_net_cmd().

For lnetctl, fix the YAML handling when only conns_per_peer is
requested. I only tested conns_per_peer and NI tunables changes
together before, which missed the mentioned case.

Fixes: 8f8f6e2f3 ("LU-10003 lnet: use Netlink to support old and new NI APIs.")
Test-Parameters: trivial testlist=sanity-lnet
Change-Id: I7c5e993de57e3d674ecb8e3cc1bd62506470d416
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49588
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 days agoLU-14598 tests: skip conf-sanity test_122b in interop 83/49583/3
Andreas Dilger [Mon, 9 Jan 2023 23:27:46 +0000 (16:27 -0700)]
LU-14598 tests: skip conf-sanity test_122b in interop

Code was fixed in 2.15.0.

Test-Parameters: trivial testlist=conf-sanity env=ONLY=122b serverversion=2.14.0
Fixes: 747fed818b ("LU-14598 ofd: fix for IDIF sequence at ofd_preprw_write")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I6d9480f4b43706b597df6bd74c65959776cf2b5b
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49583
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sarah Liu <sarah@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 days agoLU-16160 revert: "llite: clear stale page's uptodate bit" 41/49541/4
Bobi Jam [Tue, 3 Jan 2023 05:57:24 +0000 (13:57 +0800)]
LU-16160 revert: "llite: clear stale page's uptodate bit"

This reverts commit 5b911e03261c3de6b0c2934c86dd191f01af4f2f,
which caused a race between cl_page_own() and ll_releasepage()
and an assertion failure in cl_pagevec_put().

Signed-off-by: Bobi Jam <bobijam@whamcloud.com>
Change-Id: Icdb8c60f4d992c9976670e1b06c5bab5ef3a3954
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49541
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
8 days agoLU-6142 osc: tidy up osc_init() 58/49458/2
Mr. NeilBrown [Tue, 20 Dec 2022 17:03:32 +0000 (12:03 -0500)]
LU-6142 osc: tidy up osc_init()

A module_init() function that registers the services
of the module should do that last, after all other
initialization has succeeded.
This patch moves the class_register_type() call to the
end and ensures everything else that might have been
set up, is cleaned up on error.

Linux-commit: e67f133d02e ("staging: lustre: osc: tidy up osc_init()")

Change-Id: I2a5ffb116c6d7c33a4530bab6e89a5ffe6117cea
Signed-off-by: Mr. NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49458
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 days agoLU-10003 lnet: use Netlink to support LNet ping commands 60/49360/6
James Simmons [Wed, 4 Jan 2023 00:28:55 +0000 (19:28 -0500)]
LU-10003 lnet: use Netlink to support LNet ping commands

Completely replace the old pre-MR ping command ioctl with
Netlink, which also handles large NIDs. We do update
IOC_LIBCFS_PING_PEER, which supports only small NIDs,
so older tools will keep working.

Test-Parameters: trivial testlist=sanity-lnet
Test-Parameters: serverversion=2.12 serverdistro=el7.9 testlist=runtests
Test-Parameters: clientversion=2.12 testlist=runtests
Change-Id: Ic82a18dc38e4bd4e78bf61da766f7a847da509a8
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49360
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 days agoLU-6142 ldlm: use list_first_entry in ldlm_lock 59/49359/3
Mr. NeilBrown [Sat, 10 Dec 2022 14:27:08 +0000 (09:27 -0500)]
LU-6142 ldlm: use list_first_entry in ldlm_lock

This makes the code (slightly) more readable.

Linux-commit: ef7e70a ("staging: lustre: ldlm: use list_first_entry in ldlm_lock")

Change-Id: If9789fef1dec55d08dec25819aaf5152946819c5
Signed-off-by: Mr. NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49359
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: jsimmons <jsimmons@infradead.org>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 days agoLU-6142 ldlm: tidy list walking in ldlm_flock() 58/49358/2
Mr. NeilBrown [Sat, 10 Dec 2022 14:17:46 +0000 (09:17 -0500)]
LU-6142 ldlm: tidy list walking in ldlm_flock()

Use list_for_each_entry variants to
avoid the explicit list_entry() calls.
This allows us to use list_for_each_entry_safe_from()
instead of adding a local list-walking macro.

Also improve some comments so that it is more obvious
that the locks are sorted per-owner and that we need
to find the insertion point.

Linux-commit: 3ac5a67 ("staging: lustre: ldlm: tidy list walking in ldlm_flock()")

Change-Id: Ie9a756a898a9c58db1b4f446694603a4efa37352
Signed-off-by: Mr. NeilBrown <neilb@suse.de>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49358
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andriy Skulysh <andriy.skulysh@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 days agoLU-16369 ldiskfs: do not check enc context at lookup 24/49324/3
Sebastien Buisson [Tue, 6 Dec 2022 16:36:02 +0000 (17:36 +0100)]
LU-16369 ldiskfs: do not check enc context at lookup

On rhel8, ldiskfs should not check for encryption context of inodes
upon lookup. On these kernels, ext4 is not encryption aware, so just
assume context is fine when target is mounted as ldiskfs.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I9f9813d290ea24b34f710e2c8219e856ca8fbc58
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49324
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 days agoLU-8915 lnet: migrate LNet selftest group handling to Netlink 14/49314/5
James Simmons [Wed, 14 Dec 2022 19:31:26 +0000 (14:31 -0500)]
LU-8915 lnet: migrate LNet selftest group handling to Netlink

Replace the LSTIO_GROUP_LIST and LSTIO_GROUP_INFO ioctls with a Netlink
backend. Make this transition transparent to the user. Be aware that this
newer version of lnet_selftest.ko doesn't support older versions of the
lst tool. While the old interface allows setting up only one group at
a time, the Netlink interface can be used to set up many groups at one
time. Currently we don't change the interface to handle larger NIDs, but
this new interface will allow us to use the new NID format in a follow-on
patch.

Change-Id: I18f07b380d353425c6e127e4fbd0f30e41f66944
Test-Parameters: trivial testlist=lnet-selftest
Signed-off-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49314
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
8 days agoLU-16444 enc: null-enc names cannot be digested form 50/49550/4
Sebastien Buisson [Wed, 4 Jan 2023 15:10:02 +0000 (16:10 +0100)]
LU-16444 enc: null-enc names cannot be digested form

When encrypted files have their names encrypted, long names are in
digested form in case access is done without the encryption key. The
digest is base64-encoded, and prepended with '_'.
With null encryption for file names, names are always plain text. In
this case, a legitimate '_' at the start of a name must not be
interpreted as a digested form.
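
The rule can be illustrated with a small Python sketch. The helper and its parameter are hypothetical, and the sketch deliberately ignores the other conditions mentioned above (name length and whether the encryption key is available); it only shows why the leading '_' alone is not a sufficient test.

```python
def looks_digested(name, null_name_encryption):
    """Return True if a name should be treated as a digested form."""
    if null_name_encryption:
        # Names are always plain text, so a leading '_' is legitimate.
        return False
    # Digested names are base64-encoded and prepended with '_'.
    return name.startswith("_")


print(looks_digested("_backup", null_name_encryption=True))
print(looks_digested("_backup", null_name_encryption=False))
```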

sanity-sec test_54 is improved to test the case of a file whose name
starts with '_'.

Fixes: f18c87cb53 ("LU-13717 sec: handle null algo for filename encryption")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Idaad186afd06cfbabbe1d13e78f083d12876c8ff
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49550
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: jsimmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
10 days agoLU-16026 llite: always enable remote subdir mount 35/48535/7
Lai Siyao [Sun, 28 Aug 2022 19:33:29 +0000 (15:33 -0400)]
LU-16026 llite: always enable remote subdir mount

For historical reasons, ROOT is revalidated with IT_LOOKUP in
.permission to ensure its permissions are up to date, because ROOT is
never looked up. But the ROOT FID and layout are not changeable; it's the
PERM lock that should be revalidated, i.e., revalidate with
IT_GETATTR instead of IT_LOOKUP.

Since the PERM|UPDATE lock is on the MDT where the object is located, the
client can cache this lock, so a remote subdir mount doesn't need to
look up ROOT on each file access.

Deprecate mdt.*.enable_remote_subdir_mount.

Per http://review.whamcloud.com/19195, replace 'df' with 'lfs df' in
sanity 228b since the former doesn't support transparent recovery.

Add sanity 247h.

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I66f8ee347f6c01a8a154245b10a1d93539ea13b8
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48535
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
11 days agoLU-14824 Revert "test: sanity 413a/b unlink timeout" 46/49646/4
Andreas Dilger [Mon, 16 Jan 2023 17:58:24 +0000 (17:58 +0000)]
LU-14824 Revert "test: sanity 413a/b unlink timeout"

This reverts commit 5ff3e400f1a74ea49b7eb9cf19715f0fae08c3f5.
The test_413a is timing out regularly for ldiskfs MDTs.

Change-Id: Iafd28ec648f0b30b3c9e48e8f8479979a8cb0d60
Test-Parameters: trivial
Test-Parameters: mdscount=2 mdtcount=4 fstype=ldiskfs testlist=sanity env=ONLY="413a 413b"
Test-Parameters: mdscount=2 mdtcount=4 fstype=zfs testlist=sanity env=ONLY="413a 413b"
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49646
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 weeks agoLU-16286 ldiskfs: reimplement nodelalloc optimization 07/49007/7
Andrew Perepechko [Tue, 1 Nov 2022 16:26:54 +0000 (19:26 +0300)]
LU-16286 ldiskfs: reimplement nodelalloc optimization

fiemap calls perform a costly delayed extent search that affects
BRW performance. However, Lustre doesn't use delayed
allocation at all, so skip this search completely, as we did
in RHEL7.

Change-Id: I2c3562cf5cbdf3c5532e4b79b28a040a995322b7
Signed-off-by: Andrew Perepechko <andrew.perepechko@hpe.com>
HPE-bug-id: LUS-11161
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49007
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
2 weeks agoLU-16272 libcfs: cfs_hash_for_each_empty optimization 72/48972/4
Alexander Zarochentsev [Thu, 20 Oct 2022 19:23:39 +0000 (22:23 +0300)]
LU-16272 libcfs: cfs_hash_for_each_empty optimization

Restarting from bucket 0 in cfs_hash_for_each_empty()
causes excessive CPU consumption while checking the first,
already empty buckets.

HPE-bug-id: LUS-11311
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: Ic03875ea25101052468213043128912ac46daf32
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48972
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 weeks agoLU-16440 tests: recovery-double-scale typo fix 44/49544/3
Elena Gryaznova [Tue, 3 Jan 2023 18:37:32 +0000 (19:37 +0100)]
LU-16440 tests: recovery-double-scale typo fix

Fix the typo.

Fixes: f8e56a25cfc3 ("LU-15412 tests: Let init_clients_lists() export client vars")
Test-Parameters: trivial testlist=recovery-double-scale
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-11422
Change-Id: I91a0c545f1eb82e6b502d9b0dc434fdb174db295
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49544
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: xinliang <xinliang.liu@linaro.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 weeks ago LU-15626 tests: Fix shellcheck error for rpc 35/49535/3
Timothy Day [Fri, 30 Dec 2022 21:29:59 +0000 (21:29 +0000)]
LU-15626 tests: Fix shellcheck error for rpc

This patch addresses the errors and warnings
reported by shellcheck for rpc.sh. It also
breaks up the triple nested subshell for better
readability.

Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I0d4afa83a6b9d4f825f31896a52dd30319b4bf51
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49535
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
2 weeks ago LU-15828 o2iblnd: reset hiw proportionally 97/49497/4
Serguei Smirnov [Thu, 22 Dec 2022 22:42:48 +0000 (14:42 -0800)]
LU-15828 o2iblnd: reset hiw proportionally

As a result of connection negotiation, queue depth may end up
being lower than the "peer_tx_credits" tunable's value. Before this
patch, the high-water mark "lnd_peercredits_hiw" would be set at
min(current hiw, queue depth - 1).

For example, considering that hiw is allowed to only be as low as
half of peer_tx_credits, negotiating queue_depth/peer_credits down
from 32 to 8 would always result in hiw set at 7, i.e. credits would
be released as late as possible.

With this patch, if queue depth is reduced, hiw is set proportionally
relative to the level it was at before:
hiw = (queue_depth * lnd_peercredits_hiw) / peer_tx_credits

Using the above example with queue depth initially at 32, negotiating
down to 8 would result in hiw set to 4 if "lnd_peercredits_hiw" is
initially at 16, 17, 18, 19; hiw set to 5 if "lnd_peercredits_hiw" is
initially at 20, 21, 22, 23, and so on.
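
The arithmetic above can be checked with a short sketch. The function
names are hypothetical and the code is not taken from o2iblnd; it only
assumes that the division in the formula is integer division, as it
would be in C.

```python
def hiw_before_patch(current_hiw, queue_depth):
    """Old behaviour as described above: clamp hiw to queue depth - 1."""
    return min(current_hiw, queue_depth - 1)


def hiw_after_patch(current_hiw, queue_depth, peer_tx_credits):
    """New behaviour as described above: scale hiw with the queue depth.

    Integer (floor) division is an assumption matching C semantics for
    non-negative operands.
    """
    return (queue_depth * current_hiw) // peer_tx_credits


# Negotiating queue depth down from 32 to 8, for several initial hiw values.
scaled = {hiw: hiw_after_patch(hiw, 8, 32) for hiw in range(16, 24)}
clamped = {hiw: hiw_before_patch(hiw, 8) for hiw in range(16, 24)}
```

Here `scaled` maps 16..19 to 4 and 20..23 to 5, while `clamped` maps
every value to 7.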

Test-Parameters: trivial
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I633933d7448db1ca88d3c65de9c29e870ca2c9fb
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49497
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 weeks ago LU-16382 config: ensure lutf.sh is included in dist 82/49382/2
Mr NeilBrown [Tue, 13 Dec 2022 00:59:26 +0000 (11:59 +1100)]
LU-16382 config: ensure lutf.sh is included in dist

The official 2.15.1 source distribution does not contain
lutf.sh.  As lustre.spec lists it (when LUTF is enabled) this causes a
build error.

It is likely not included because "./configure --enable-dist" was run
in a context where swig was not installed.

So when determining whether to enable lutf, first check for
enable_dist, and in that case set enable_lutf="yes".

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: If5f856985a6d642822baba4b6ee301c04f851217
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49382
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 weeks ago LU-16385 obdlcass: stop MGC before MGS 78/49378/4
Alex Zhuravlev [Mon, 12 Dec 2022 13:35:41 +0000 (16:35 +0300)]
LU-16385 obdlcass: stop MGC before MGS

Drop the reference to the MGC when the MGS is being unmounted, so
that the MGC doesn't try to disconnect from a missing MGS, which
can take a long time and hurt HA.

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ib15f1ca56c47201bf6e29c12b3f81a11e55944ca
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49378
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
2 weeks ago LU-14555 lnet: asym route inconsistency warning 52/49352/3
Gian-Carlo DeFazio [Thu, 8 Dec 2022 23:17:26 +0000 (15:17 -0800)]
LU-14555 lnet: asym route inconsistency warning

Remove LNET_UNDEFINED_HOPS from lnet_check_route_inconsistency(),
where it is being treated as equivalent to 1 for the
value of lr_hops.

Due to the changes made in commit 3f2844dc9
"LU-14945 lnet: don't use hops to determine the route state",
LNET_UNDEFINED_HOPS is no longer considered equivalent to 1
for lr_hops in all cases, and it is valid to leave hops undefined
for multi-hop routes.

Therefore, having a multi-hop route with a hops value of
LNET_UNDEFINED_HOPS is no longer inconsistent.

Fixes: 6ab060e58e ("LU-14555 lnet: asym route inconsistency warning")
Test-Parameters: trivial
Signed-off-by: Gian-Carlo DeFazio <defazio1@llnl.gov>
Change-Id: Iab8597f59c5f8d27b16dbeda79b41e9ec4777f52
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49352
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 weeks ago LU-16284 utils: lfs getstripe follows symlink 03/49003/6
Lei Feng [Tue, 1 Nov 2022 02:57:39 +0000 (10:57 +0800)]
LU-16284 utils: lfs getstripe follows symlink

'lfs getstripe' prints the information of the symlink target by default.
With the '--no-follow' option it prints the information of the symlink itself.

Signed-off-by: Lei Feng <flei@whamcloud.com>
Test-Parameters: trivial
Change-Id: I6cef01af5bb2235bdcbf0b5c99af4b9ed5869515
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49003
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 weeks ago LU-16202 build: bio_alloc takes struct block_device 20/48820/9
Shaun Tancheff [Mon, 14 Nov 2022 09:30:23 +0000 (03:30 -0600)]
LU-16202 build: bio_alloc takes struct block_device

Linux commit v5.17-rc2-21-g07888c665b40
   block: pass a block_device and opf to bio_alloc

Create a compatible bio_alloc wrapper to handle the change
in arguments and behavior.

HPE-bug-id: LUS-11267
Test-Parameters: trivial
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Change-Id: I060229b25785f46a9749fcdb18727af292a940ac
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48820
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
2 weeks ago LU-16165 sec: retry mechanism for identity cache 79/48579/4
Sebastien Buisson [Fri, 16 Sep 2022 16:02:51 +0000 (18:02 +0200)]
LU-16165 sec: retry mechanism for identity cache

Implement a retry mechanism in the identity cache in case the
identity upcall times out.
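
A generic retry-on-timeout pattern, sketched under assumptions: the
function names, the retry count, and the use of TimeoutError are all
hypothetical and do not reflect the actual identity cache code or its
tunables.

```python
def identity_lookup_with_retry(upcall, uid, max_retries=2):
    """Call upcall(uid), retrying when it raises TimeoutError."""
    for attempt in range(max_retries + 1):
        try:
            return upcall(uid)
        except TimeoutError:
            if attempt == max_retries:
                raise  # retries exhausted: report the timeout


def make_flaky_upcall(timeouts):
    """Build a hypothetical upcall that times out `timeouts` times first."""
    state = {"calls": 0}

    def upcall(uid):
        state["calls"] += 1
        if state["calls"] <= timeouts:
            raise TimeoutError("identity upcall timed out")
        return {"uid": uid, "gids": [uid]}

    return upcall, state
```

With max_retries=2, an upcall that times out twice still succeeds on the
third call, while one that keeps timing out propagates the error to the
caller.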

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ib70d3b851a6da3cf66dfed49b03be51da7886d01
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48579
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 weeks ago LU-16121 llite: invalidate_folio and dirty_folio 66/48366/13
Shaun Tancheff [Tue, 8 Nov 2022 15:26:46 +0000 (09:26 -0600)]
LU-16121 llite: invalidate_folio and dirty_folio

linux commit v5.17-rc4-10-g128d1f8241d6
   fs: Add invalidate_folio() aops method

A struct folio is often analogous to a struct page; however,
a struct folio can represent (contain) multiple pages.

linux commit v5.17-rc4-38-g6f31a5a261db
   fs: Add aops->dirty_folio

__set_page_dirty_nobuffers() is replaced with filemap_dirty_folio()

Test-Parameters: trivial
HPE-bug-id: LUS-11197
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Iefe67615b333e066c49c4b884dad5bea3b3ae226
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48366
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
2 weeks ago LU-16342 mdt: not copy pool_name to quotactl in reply 42/49242/5
Sergey Cheremencev [Fri, 15 Jul 2022 10:06:43 +0000 (13:06 +0300)]
LU-16342 mdt: not copy pool_name to quotactl in reply

Do not copy pool_name in the mdt reply, to avoid an out-of-bounds access:
BUG: KASAN: slab-out-of-bounds in mdt_quotactl+0x13ff/0x1430 [mdt]

HPE-bug-id: LUS-10579
Change-Id: I34c4cd8aaccd938c95005dca06644e02132def34
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Reviewed-on: https://es-gerrit.dev.cray.com/160899
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Andriy Skulysh <c17819@cray.com>
Tested-by: Vitaly Fertman <c17818@cray.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49242
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
2 weeks ago LU-16091 enc: S_ENCRYPTED flag on OST objects for enc files 98/48198/12
Sebastien Buisson [Thu, 11 Aug 2022 15:08:11 +0000 (17:08 +0200)]
LU-16091 enc: S_ENCRYPTED flag on OST objects for enc files

Add a dumb encryption context on OST objects being created, when the
LUSTRE_ENCRYPT_FL flag gets set in the LMA, for ldiskfs backend
targets. This leads ldiskfs to internally set the LDISKFS_ENCRYPT_FL
flag on the on-disk inode. Also, it makes e2fsprogs happy to see an
enc ctx for an inode that has the LDISKFS_ENCRYPT_FL flag.

Add a dumb encryption context on OST objects being opened, if there is
not already one, for ldiskfs backend targets. This is done by adding
the LUSTRE_ENCRYPT_FL flag if necessary, at the same time as atime
gets updated. It is some sort of live self-check that fixes OST
objects created with an older Lustre version.

Enhance lfsck to detect and fix OST objects belonging to encrypted
files that are missing the encryption flag. This is implemented in the
MDT-OST consistency routine, as part of the layout checking.

Also add sanity-sec test_62 and sanity-lfsck test_42 to exercise this.

Note this patch does not add any dumb encryption context on OST
objects when the backend is ZFS.

Test-Parameters: testlist=sanity-sec mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2 fstype=zfs
Test-Parameters: testlist=sanity-sec mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2 fstype=zfs
Test-Parameters: testlist=sanity-sec mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2 fstype=zfs
Test-Parameters: testlist=sanity-sec mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2 fstype=zfs
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I6bee3c82ee4d1a52275facf9e2b0d60061e0beef
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48198
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 weeks ago LU-16008 tests: don't enforce umount in recovery-small/150 40/47940/5
Alex Zhuravlev [Wed, 13 Jul 2022 07:50:24 +0000 (10:50 +0300)]
LU-16008 tests: don't enforce umount in recovery-small/150

Such enforcement disconnects all MDS clients; another MDS trying
to talk to the original MDS then gets evicted, and an unlucky
RPC (e.g. rmdir in test cleanup) can fail with:
rm: cannot remove '...d110h.recovery-small/source_dir': Is a directory

Fixes: 57f3262baa7 ("LU-15788 lmv: try another MDT if statfs failed")

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I593e1425b44fc19cb7b2b7da33fa10590532f930
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/47940
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 weeks ago LU-930 misc: improve .mailmap coverage 94/47894/5
Andreas Dilger [Mon, 29 Jun 2020 07:23:04 +0000 (01:23 -0600)]
LU-930 misc: improve .mailmap coverage

Improve .mailmap coverage and correctness for "git shortlog"
and related commands.

Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I41a2474f2c69e1e49b5f8569ca6cc7bfcf3ebbe5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/47894
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Reviewed-by: Peter Jones <pjones@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 weeks ago LU-14824 test: sanity 413a/b unlink timeout 55/45955/16
Lai Siyao [Wed, 22 Dec 2021 12:26:49 +0000 (07:26 -0500)]
LU-14824 test: sanity 413a/b unlink timeout

Unlinking remote/striped directories is slow on ZFS systems, so limit
the total number of directories for the 1-stripe directory test in
413a/b on ZFS, and don't test striped directories, to avoid timeouts.

Also limit total stripe object count to avoid timeout.

Test-Parameters: trivial
Test-Parameters: mdscount=2 mdtcount=4 fstype=ldiskfs testlist=sanity env=ONLY="413a 413b",ONLY_REPEAT=50
Test-Parameters: mdscount=2 mdtcount=4 fstype=zfs testlist=sanity env=ONLY="413a 413b",ONLY_REPEAT=50
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ie116e6df5aee3877ed9f093f58e7bd71f6c6d9d5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/45955
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
2 weeks ago LU-14683 tests: get rid of no longer actual test 76/43676/6
Elena Gryaznova [Wed, 12 May 2021 18:59:23 +0000 (21:59 +0300)]
LU-14683 tests: get rid of no longer actual test

replay-single test_40() is no longer relevant for
modern Lustre with layout lock support.

Fixes: 945a97dbc2f0 ("LU-2628 tests: disable test_40 of replay-single")
Test-Parameters: trivial testlist=replay-single env=ONLY=40
Signed-off-by: Elenai Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-9970
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: I51c3a05ef40f389535e04bd50cdf9fe51bca8acd
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/43676
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 weeks ago LU-16210 llite: replace selinux_is_enabled() 75/48875/10
Etienne AUJAMES [Thu, 6 Oct 2022 13:30:54 +0000 (15:30 +0200)]
LU-16210 llite: replace selinux_is_enabled()

selinux_is_enabled() was removed in kernel 5.1.
Commit 39e5bfa added support for such kernels by assuming SELinux to
be enabled if the function selinux_is_enabled() does not exist.

This has a performance impact: on older kernels (e.g. CentOS 7),
getxattr RPCs were not sent for "security.selinux" if SELinux was
disabled. Utilities like "ls -l" always try to get "security.selinux".
See LU-549 for more information.

This patch uses security_inode_listsecurity() when mounting the
client to find out whether an LSM module (selinux) requires an xattr
to store file contexts. If an xattr is returned, we store it and use
it for the security context sent in requests.

For getxattr/setxattr we use the stored LSM xattr to filter security
context xattrs like security.selinux. If the xattr does not match the
stored xattr name, we return -EOPNOTSUPP to userspace.

It also adds the s_security check for security_inode_notifysecctx() to
avoid calling this function if selinux is disabled (as in
nfs_setsecurity()).

For the "Enforcing SELinux Policy Check" functionality, the selinux
check has been moved into l_getsepol: -ENODEV is returned if selinux is
disabled.

Add a regression test "sanity test_434" for this use case.

*Note:*
This patch detects that selinux is disabled without it being
explicitly disabled on the kernel cmdline. This is recommended for
RHEL >= 8.5.

*Performance:*
Tests with "strace -c ls -l" with 100000 files on root, in a multi-VM
env (on Rocky 9). The FS is remounted for each test (cache is cleaned)
and selinux is disabled.
 __________________ ___________ _________
| Total time %     | lgetxattr | statx   |
|__________________|___________|_________|
|Without the patch:|    29%    |   51%   |
|__________________|___________|_________|
|With the patch:   |    0%     |   87%   |
|__________________|___________|_________|
"ls -l" uses lgetxattr to get "security.selinux".

Linux-commit: 3d252529480c68bfd6a6774652df7c8968b28e41

Fixes: 39e5bfa ("LU-12355 llite: include file linux/selinux.h removed")
Fixes: 9bcac0b ("LU-549 llite: Improve statfs performance if selinux is disabled")
Test-Parameters: clientselinux=false clientdistro=el7.9 testlist=sanity env=ONLY=434,ONLY_REPEAT=20
Test-Parameters: clientselinux=false clientdistro=el8.5 testlist=sanity env=ONLY=434,ONLY_REPEAT=20
Test-Parameters: clientselinux=false clientdistro=el8.6 testlist=sanity env=ONLY=434,ONLY_REPEAT=20
Test-Parameters: clientselinux clientdistro=el8.6 testlist=sanity-selinux
Test-Parameters: clientselinux clientdistro=el8.6 testlist=sanity-selinux
Test-Parameters: clientselinux clientdistro=el7.9 testlist=sanity-selinux
Test-Parameters: clientselinux clientdistro=el7.9 testlist=sanity-selinux
Signed-off-by: Etienne AUJAMES <etienne.aujames@cea.fr>
Change-Id: I4dac87ac0341b45a1c2fef836cdce0361017b3f5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48875
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
2 weeks ago LU-16227 utils: Add warning for lfs setdirstripe -D -i x,y,z 20/49420/6
Timothy Day [Thu, 15 Dec 2022 19:47:57 +0000 (19:47 +0000)]
LU-16227 utils: Add warning for lfs setdirstripe -D -i x,y,z

Adjust setdirstripe to be more user friendly. The
use of "-D -i x,y,z" now returns a clear error
that this is creating a default striped directory
layout and that this is a bad idea, if it is not
accompanied by "-c N" that matches the number of
index values given.
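
The check described above could look roughly like the following sketch.
This is not the lfs source: the function name, its parameters, the
message text, and the choice to flag only lists with more than one index
are assumptions made for illustration.

```python
def check_default_dirstripe(is_default, mdt_indices, stripe_count):
    """Return a diagnostic for a questionable '-D -i x,y,z', else None.

    is_default   -- True when -D was given (hypothetical parameter)
    mdt_indices  -- list of MDT indices parsed from -i
    stripe_count -- value of -c N, or None when -c was not given
    """
    if (is_default and len(mdt_indices) > 1
            and stripe_count != len(mdt_indices)):
        return ("'-D -i %s' creates a default striped directory layout; "
                "add '-c %d' if that is really intended"
                % (",".join(str(i) for i in mdt_indices), len(mdt_indices)))
    return None  # nothing questionable about this combination
```

For example, check_default_dirstripe(True, [0, 1, 2], None) returns a
message, while the same call with stripe_count=3 returns None.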

Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Ic9f91853d4016bf0edfb3845ac9f1edafdf73d55
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49420
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
2 weeks ago LU-16082 ldiskfs: Large EA upgrade test 50/48350/11
Alexander Zarochentsev [Sat, 20 Aug 2022 11:34:25 +0000 (14:34 +0300)]
LU-16082 ldiskfs: Large EA upgrade test

Check whether old Lustre-only ea inodes
are accessible under new ext4 versions;
additional fixes for 32newtarball test
to work with dm-flakey devices;
32newtarball now creates ldiskfs fs with
ea_inode fs feature enabled;
disk2_12 ldiskfs image is replaced by
a new disk image with a large xattr test
file;
Fix FLR file creation in 32newtarball test.

Test-Parameters: env=ONLY=32 testlist=conf-sanity serverdistro=el7.9
Test-Parameters: env=ONLY=32 testlist=conf-sanity serverdistro=el8.5
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: Id8c33b91f7ca7d68a97384dce8922dd25e8ecd68
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48350
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Elena Gryaznova <elena.gryaznova@hpe.com>
Reviewed-by: Sarah Liu <sarah@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
3 weeks ago LU-16439 socklnd: clarify error message on timeout 40/49540/3
Aurelien Degremont [Mon, 2 Jan 2023 16:26:15 +0000 (16:26 +0000)]
LU-16439 socklnd: clarify error message on timeout

When the local peer times out while writing
to another peer, print an explicit error message
rather than a generic one. This makes it clearer
for admins and easier to debug.

Add the port to help determine whether it is always
the same one or not.

Test-Parameters: trivial
Change-Id: Iaefbc601963b50293743a22ff9329018e8a5fc4f
Signed-off-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49540
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 weeks ago LU-16438 llite: remove false outdated comment 39/49539/3
Aurelien Degremont [Mon, 2 Jan 2023 16:04:31 +0000 (16:04 +0000)]
LU-16438 llite: remove false outdated comment

Old commit 99727c7a1 from Lustre 2.6 changed
ll_i2gids() behavior without updating the function
documentation accordingly. Fix it as this is confusing.

Test-Parameters: trivial
Fixes: 99727c7 ("LU-4476 kernel: support process namespace containers")
Change-Id: Iccc50fe6ac9e02de9bae7fd8f91e3e73ff45e327
Signed-off-by: Aurelien Degremont <degremoa@amazon.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49539
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 weeks ago LU-16413 osd-ldiskfs: fix T10PI for CentOS 8.x 41/49441/5
Li Dongyang [Mon, 19 Dec 2022 10:03:47 +0000 (21:03 +1100)]
LU-16413 osd-ldiskfs: fix T10PI for CentOS 8.x

Recreate the currently broken Lustre kernel patches
to allow using custom integrity functions for bio.
Note we don't need to save the generate_fn anymore;
it will be used once we call bio_integrity_prep_fn().

Add upstream fix
b13e0c718568 ("block: bio-integrity: Advance seed correctly
for larger interval sizes") for CentOS 8.0 to 8.6.

Handle the kernel api changes for the T10PI generate and
verify functions introduced in CentOS 8.x kernel,
mostly because of switching to blk_integrity_iter.

Update the custom generate and verify functions, to sync
with upstream versions.
- Add T10-DIF-TYPE2, currently only a placeholder,
  not used in upstream either.
- Use __be16 instead of __u16 for guard tags.

Only reuse guard tags if the rpc checksum is the same
one supported on the target. We already have some protection
during checksum type negotiation, the server
will mark the target's T10PI type as the only
T10PI checksum type supported. But it's still good to
have the logic in place.

Do not call bio_integrity_prep() if the custom interface
bio_integrity_prep_fn() does not exist, submit_bio() will
do that for us.

On the servers, show the target's T10PI checksum as
the preferred checksum_type even if it's not the fastest.
Note this is only cosmetic and does not impact the checksum
type used, which is still done during negotiation.

Change-Id: I2d0ba0b80ba9cde2977da24db08095671aa5373c
Test-Parameters: trivial
Fixes: 293844d132 ("LU-16222 kernel: RHEL 8.7 client and server support")
Fixes: f176efd183 ("LU-12269 kernel: RHEL 8.0 server support")
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49441
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 weeks ago LU-14409 ldiskfs: remove stray tracing code 83/49383/3
Mr NeilBrown [Tue, 13 Dec 2022 03:35:30 +0000 (14:35 +1100)]
LU-14409 ldiskfs: remove stray tracing code

These lines should never have landed :-(

Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Fixes: 3a83078628a4 ("LU-14409 ldiskfs: Add support for SUSE 5.3.18-24.46.1")
Change-Id: I7720158605cce81721738a5f6640ccb4e0440b09
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49383
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
3 weeks ago LU-16387 lustre: switch OBD_ALLOC_LARGE to vmalloc faster 80/49380/3
Alexander Zarochentsev [Tue, 6 Dec 2022 17:10:41 +0000 (20:10 +0300)]
LU-16387 lustre: switch OBD_ALLOC_LARGE to vmalloc faster

No need to waste time trying hard to kmalloc a large memory
chunk in OBD_ALLOC_LARGE. Reduce memory allocation attempts
by specifying __GFP_NORETRY for all allocations > PAGE_SIZE
(as in kvmalloc in the linux-4.18 kernel),
so the kmalloc part fails easily.

HPE-bug-id: LUS-11409
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: I7ff8acfb6b467a4f5a7e61b2b8ec631bea89f8a5
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49380
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Nikitas Angelinas <nikitas.angelinas@hpe.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 weeks ago LU-15626 tests: Fix shellcheck warning for acceptance-small 50/49350/5
Timothy Day [Thu, 8 Dec 2022 19:09:38 +0000 (19:09 +0000)]
LU-15626 tests: Fix shellcheck warning for acceptance-small

This patch addresses the warning and style suggestions
reported by shellcheck. The patch also ensures that
all spaces have been converted to tabs, and the script now
logs which test suites are about to be run.

Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Ia88758d0bf89e7d0aa67dfae31d969c780507b88
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49350
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 weeks ago LU-16335 test: add fail_abort_cleanup() 35/49335/4
Lai Siyao [Wed, 7 Dec 2022 04:04:42 +0000 (23:04 -0500)]
LU-16335 test: add fail_abort_cleanup()

Add helper fail_abort_cleanup() to unlink test directories (call lfs
rm_entry if directory is broken) after fail_abort because after
LU-16159 update logs will be canceled upon recovery abort, which may
leave broken directories.

Update replay-single.sh in places where fail_abort is called and
directory may become broken.

Test-Parameters: trivial mdscount=2 mdtcount=4 testlist=replay-single,replay-single,replay-single,replay-single
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I260689b1a6fa5b0b4db5aab5095cb062ae57d612
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49335
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 weeks ago LU-16322: build: Add client build support for openEuler 87/49187/4
Xinliang Liu [Wed, 26 Oct 2022 08:58:14 +0000 (08:58 +0000)]
LU-16322: build: Add client build support for openEuler

The kernel of current openEuler LTS version 22.03 is based on Linux
5.10.0 which is already supported in Lustre master. Thus we only need
to add build support for openEuler client.

Although openEuler Linux is not compatible with RHEL, it uses the
same package manager (DNF) and tools as RHEL and follows the package
naming of RHEL. Thus we can reuse most of the RHEL build logic/scripts
for openEuler client building.

OpenEuler Linux is becoming the mainstream Linux distro in China. So
adding support for it makes sense for the users. For more details about
it see: https://www.openeuler.org/en/.

Test-Parameters: trivial
Change-Id: I8e8b59d36e566c6e49b12346c2fde985153f014d
Signed-off-by: Xinliang Liu <xinliang.liu@linaro.org>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49187
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 weeks ago LU-16279 lnet: improve error reporting in LUTF 87/48987/2
Cyril Bordage [Mon, 31 Oct 2022 11:08:44 +0000 (12:08 +0100)]
LU-16279 lnet: improve error reporting in LUTF

When an error occurs without using an RPC, the error report lacks
a traceback, listing only the exception itself. This patch adds the
traceback to the error string reported by R().
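
A minimal sketch of the idea, using only the standard traceback module.
The helper name describe_error() is hypothetical; how R() actually
builds its error string is not shown here.

```python
import traceback


def describe_error(exc):
    """Return the exception text followed by its formatted traceback."""
    tb_text = "".join(
        traceback.format_exception(type(exc), exc, exc.__traceback__))
    return "%r\n%s" % (exc, tb_text)


def failing_step():
    # Stand-in for a local (non-RPC) operation that raises.
    return int("not a number")


try:
    failing_step()
except ValueError as e:
    report = describe_error(e)
```

The resulting string names the exception and also shows the call chain
through failing_step(), which the bare exception text would not.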

Test-Parameters: @lnet
Signed-off-by: Cyril Bordage <cbordage@whamcloud.com>
Change-Id: I3fe5f7628a3f96aeb7941ec75db6b6b5e49e9d84
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48987
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 weeks ago LU-16268 mdd: set effective changelog mask correctly 61/48961/3
Mikhail Pershin [Tue, 25 Oct 2022 15:34:45 +0000 (18:34 +0300)]
LU-16268 mdd: set effective changelog mask correctly

When the changelog mask is changed from MINMASK to a particular
value, the recalculation is missed, so the effective mask could
stay unchanged, contrary to expectations.

The patch adds a check of whether the old mask is MINMASK
to decide if mask recalculation is needed.

Test 160o is extended for that issue.

Fixes: ffe259f81cda ("LU-13055 changelog: use default mask if server has no mask")
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ia3c93e19daeb71ff1042ebdb555e918faf89f844
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48961
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 weeks ago LU-16117 build: Avoid excessive modpost warnings 62/48362/9
Shaun Tancheff [Thu, 10 Nov 2022 06:53:47 +0000 (00:53 -0600)]
LU-16117 build: Avoid excessive modpost warnings

To avoid modpost warnings about duplicate symbols, do not add
the LINUX_OBJ kernel symbols to the KBUILD_EXTRA_SYMBOLS list.

Test-Parameters: trivial
HPE-bug-id: LUS-11192
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I85fc90661efcb66e4aa39c9bd3393dbe4f7ba5eb
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48362
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
3 weeks ago LU-16113 build: Fix configure tests for lock_page_memcg 44/49144/8
Shaun Tancheff [Tue, 15 Nov 2022 04:06:05 +0000 (22:06 -0600)]
LU-16113 build: Fix configure tests for lock_page_memcg

Linux commit v5.15-12273-gab2f9d2d3626
   mm: unexport {,un}lock_page_memcg

The configure test fails when lock_page_memcg exists but is not exported.

Adjust usage of [un]lock_page_memcg() to vvp_[un]lock_page_memcg() and
define the mapping accordingly to avoid the compile error.

Test-Parameters: trivial
HPE-bug-id: LUS-11189
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I18029d078a00a0b21a14721bcdf953939b4118a1
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49144
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
3 weeks ago LU-16116 build: Configure tests for rhltable, bitmap_alloc... 61/48361/9
Shaun Tancheff [Wed, 9 Nov 2022 13:33:40 +0000 (07:33 -0600)]
LU-16116 build: Configure tests for rhltable, bitmap_alloc...

rhel8.6 with kernel 5.18 breaks a couple of compile tests

struct rhltable test fails with:
... error: ‘hlt’ is used uninitialized in this function
    [-Werror=uninitialized]

rdma_wr() test fails with:
... error: assignment discards ‘const’ qualifier from pointer
    target type [-Werror=discarded-qualifiers]
   wr = rdma_wr(NULL);

nla_strdup() test fails due to unused variable 'tmp'

Test-Parameters: trivial
HPE-bug-id: LUS-11191
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ib2b1d223ac809cea157158fe35fd2535b04367df
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48361
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
3 weeks ago LU-16118 build: Use pde_data() when available 63/48363/9
Shaun Tancheff [Sat, 12 Nov 2022 09:29:42 +0000 (03:29 -0600)]
LU-16118 build: Use pde_data() when available

Linux commit v5.16-11573-g6dfbbae14a7b
   introduce pde_data() and
Linux commit v5.16-11574-g359745d78351
   remove PDE_DATA()

Use PDE_DATA() when pde_data is not available.
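
The fallback described above is commonly written as a small compat
macro. The following is a minimal userspace sketch, not the code from
this patch: struct inode and PDE_DATA() are stand-ins so that it
compiles outside the kernel, and HAVE_PDE_DATA_LOWERCASE and
proc_entry_private() are hypothetical names invented for illustration.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's struct inode (assumption). */
struct inode {
	void *private_data;
};

/* Pretend this is an older kernel that only provides PDE_DATA(). */
static inline void *PDE_DATA(const struct inode *inode)
{
	return inode->private_data;
}

/*
 * HAVE_PDE_DATA_LOWERCASE is a hypothetical configure result meaning
 * "the kernel already provides pde_data()".  When it is not defined,
 * map the new name onto the old helper.
 */
#ifndef HAVE_PDE_DATA_LOWERCASE
#define pde_data(inode) PDE_DATA(inode)
#endif

/* Callers use only the new name, whichever kernel they build against. */
static void *proc_entry_private(const struct inode *inode)
{
	return pde_data(inode);
}
```

With this arrangement the call sites need no #ifdef of their own; only
the compat header knows which helper the kernel provides.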

Test-Parameters: trivial
HPE-bug-id: LUS-11193
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ida570462acd466a251adc81a14bc1fbf35d96b00
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48363
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
3 weeks ago LU-13642 lnet: Allow IP specification 60/47660/24
Frank Sehr [Thu, 16 Jun 2022 18:40:06 +0000 (11:40 -0700)]
LU-13642 lnet: Allow IP specification

Allows selecting an interface by specifying an IP address in the NID.
All variations of interface and IP address are considered.

1. No interface and no IP address specified: select the first interface
2. Interface and no IP: select the main IP address
3. No interface and IP specified: select the first interface
        that has the IP address
4. Interface and IP specified: verify that interface and IP match
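
The four cases above can be sketched as one standalone C function.
This is only an illustration under stated assumptions, not the socklnd
implementation: struct iface, struct selection and select_iface() are
hypothetical, addresses are plain strings, interface names are assumed
to be unique, and addrs[0] is taken to be the "main" address.

```c
#include <stddef.h>
#include <string.h>

#define MAX_ADDRS 4

/* Hypothetical model of a network interface for this sketch. */
struct iface {
	const char *name;
	const char *addrs[MAX_ADDRS];	/* addrs[0] is the main address */
	int naddrs;
};

struct selection {
	const struct iface *iface;
	const char *addr;
};

/* Return the matching address of @ifc, or NULL if @ip is not on it. */
static const char *iface_find_addr(const struct iface *ifc, const char *ip)
{
	for (int i = 0; i < ifc->naddrs; i++)
		if (strcmp(ifc->addrs[i], ip) == 0)
			return ifc->addrs[i];
	return NULL;
}

/*
 * @name and @ip may each be NULL ("not specified").
 * Returns 0 and fills @sel on success, -1 if nothing matches.
 */
static int select_iface(const struct iface *ifs, int n,
			const char *name, const char *ip,
			struct selection *sel)
{
	for (int i = 0; i < n; i++) {
		const char *addr;

		/* Cases 2 and 4: skip interfaces with a different name. */
		if (name && strcmp(ifs[i].name, name) != 0)
			continue;
		if (ifs[i].naddrs == 0)
			continue;

		/* Cases 1 and 2 take the main address, 3 and 4 look up @ip. */
		addr = ip ? iface_find_addr(&ifs[i], ip) : ifs[i].addrs[0];
		if (!addr) {
			if (name)
				return -1;	/* case 4: mismatch */
			continue;		/* case 3: keep looking */
		}

		sel->iface = &ifs[i];
		sel->addr = addr;
		return 0;
	}
	return -1;
}
```

For example, with eth0 holding 10.0.0.1 and 10.0.0.2 and eth1 holding
192.168.1.5, passing only the IP 192.168.1.5 selects eth1 (case 3),
while passing eth0 together with 192.168.1.5 is rejected (case 4).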

The change has no effect on current configurations and
takes effect only when the corresponding changes in lnetctl, YAML or
module parameters are applied.
This patch affects only the socklnd component. A macro is defined in
lnet-types.h to check if an IP address is set (IPv4 or IPv6).
Further IPv6 changes are not integrated.

For further reference please read

IP specification in LNet
https://wiki.whamcloud.com/display/LNet/IP+specification+in+LNet

Test-Parameters: trivial
Signed-off-by: Frank Sehr <fsehr@whamcloud.com>
Change-Id: Ifdf8f884ce1ee1fb1b97ca3121aa83efb46f8ef0
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/47660
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 weeks ago LU-15288 lnet: increase transaction timeout 80/45780/3
Cyril Bordage [Tue, 7 Dec 2021 22:14:43 +0000 (23:14 +0100)]
LU-15288 lnet: increase transaction timeout

In LU-13145, it was decided to increase the default transaction timeout
(LNET_TRANSACTION_TIMEOUT_DEFAULT) to 150s. However, the associated
patch set it to 50s. This change also raises
lnd_timeout (from 16 to 49).
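
The lnd_timeout values quoted above can be reproduced with a short
calculation. The formula below and the retry count of 2 are
assumptions that this commit message does not state; they are used
here because they yield exactly 16 for a 50s transaction timeout and
49 for 150s. sketch_lnd_timeout() is a hypothetical name.

```c
/*
 * Assumed derivation (not stated in the commit message):
 *   lnd_timeout = (transaction_timeout - 1) / (retry_count + 1)
 * using integer division.
 */
static unsigned int sketch_lnd_timeout(unsigned int transaction_timeout,
				       unsigned int retry_count)
{
	return (transaction_timeout - 1) / (retry_count + 1);
}
```

Under these assumptions, (50 - 1) / 3 = 16 and (150 - 1) / 3 = 49,
which matches the change from 16 to 49.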

Signed-off-by: Cyril Bordage <cbordage@whamcloud.com>
Change-Id: I13a8b5d14230bb6e8936cb3e18540f19dbc62985
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/45780
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
3 weeks ago LU-16321 osd: Allow fiemap on kernel buffers 90/49190/7
Shaun Tancheff [Fri, 2 Dec 2022 10:19:59 +0000 (04:19 -0600)]
LU-16321 osd: Allow fiemap on kernel buffers

Linux commit v5.17-rc3-19-g967747bbc084
  uaccess: remove CONFIG_SET_FS

With KERNEL_DS gone, Lustre needs an alternative way for fiemap to
copy extents to kernel-space memory.

Direct in-kernel calls to inode->f_ops->fiemap() can utilize
an otherwise unused flag on fiemap_extent_info fi_flags
to indicate the fiemap extent buffer is allocated in kernel space.

Include ldiskfs patches for ldiskfs_fiemap() to
define EXT4_FIEMAP_FLAG_MEMCPY and utilize it.

HPE-bug-id: LUS-11337
Fixes: d0337cab8e ("LU-14195 osd: don't use set_fs() for ->fiemap() calls.")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I7a8edb481833fd1bdcf7b6cd6e08397c1754baee
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49190
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Neil Brown <neilb@suse.de>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Petros Koutoupis <petros.koutoupis@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 weeks ago LU-14645 tests: test lfs setdirstripe with '/$' 63/49463/2
Jian Yu [Tue, 20 Dec 2022 20:24:25 +0000 (12:24 -0800)]
LU-14645 tests: test lfs setdirstripe with '/$'

This patch improves one of the lfs setdirstripe tests to
verify that a directory name ending with '/' also works.

Test-Parameters: trivial mdscount=2 mdtcount=4 \
env=ONLY=24B testlist=sanity

Change-Id: I237d5a9ebad42cc0569aa1db487d0df147372316
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49463
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 weeks ago LU-16373 tests: failover mds1 back to the primary server 45/49345/2
Jian Yu [Thu, 8 Dec 2022 07:56:36 +0000 (23:56 -0800)]
LU-16373 tests: failover mds1 back to the primary server

This patch fixes recovery-small test 144a to fail over
mds1 back to the primary server so that stack_trap can
set the timeout parameter on the correct MDS node.

Test-Parameters: trivial \
env=SLOW=yes,FAILURE_MODE=HARD,ONLY=144a \
clientcount=4 mdtcount=1 mdscount=2 osscount=2 \
austeroptions=-R failover=true iscsi=1 \
testlist=recovery-small

Change-Id: Idbfdb7b084c7edac8784008e0455f76632aa685b
Test-Parameters: trivial testlist=recovery-small
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49345
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Sarah Liu <sarah@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 weeks ago LU-16433 llite: check vvp_account_page_dirtied 12/49512/6
Jian Yu [Thu, 29 Dec 2022 08:21:32 +0000 (00:21 -0800)]
LU-16433 llite: check vvp_account_page_dirtied

This patch removes duplicated code from vvp_set_pagevec_dirty()
and checks vvp_account_page_dirtied to determine whether to fall
back to calling __set_page_dirty_nobuffers().

HAVE_ACCOUNT_PAGE_DIRTIED_EXPORT also needs to be checked because
vvp_account_page_dirtied is not defined if account_page_dirtied
is exported.

Test-Parameters: trivial clientdistro=el8.6 testlist=sanity

Test-Parameters: trivial clientdistro=el8.7 testlist=sanity

Test-Parameters: trivial clientdistro=el9.0 \
env=SANITY_EXCEPT="130 244a" testlist=sanity

Test-Parameters: trivial clientdistro=sles15sp4 \
env=SANITY_EXCEPT="27J 101j 244a" testlist=sanity

Change-Id: I272033d7494a157145224b1b8ce999a80958aa6c
Fixes: 4bf090b811 ("LU-15959 kernel: new kernel [SLES15 SP4 5.14.21-150400.24.18.1]")
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49512
Tested-by: Maloo <maloo@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Shuichi Ihara <sihara@ddn.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
3 weeks ago LU-16120 build: Add support for kobj_type default_groups 65/48365/12
Shaun Tancheff [Fri, 16 Dec 2022 09:41:54 +0000 (03:41 -0600)]
LU-16120 build: Add support for kobj_type default_groups

Linux commit v5.1-rc3-29-gaa30f47cf666
  kobject: Add support for default attribute groups to kobj_type

Linux commit v5.18-rc1-2-gcdb4f26a63c3
  kobject: kobj_type: remove default_attrs

Switch to using kobj_type default_groups when it is available.
Provide support for default_attrs for older kernels.

HPE-bug-id: LUS-11196
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I43b03c67c22307293a2abc444aa1a73889ca09ee
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48365
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 weeks ago LU-16297 ptlrpc: don't panic during reconnection 29/49029/9
Alexander Boyko [Thu, 3 Nov 2022 11:23:20 +0000 (07:23 -0400)]
LU-16297 ptlrpc: don't panic during reconnection

ptlrpc_send_rpc() can race with ptlrpc_connect_import_locked()
in the middle of an assertion check, which leads to a spurious panic.
The assertion first checks

(AT_OFF || imp->imp_state != LUSTRE_IMP_FULL ||

then a reconnect changes the import state and flags,
and then the second part is checked

(imp->imp_msghdr_flags & MSGHDR_AT_SUPPORT) ||
!(imp->imp_connect_data.ocd_connect_flags & OBD_CONNECT_AT)))

MSGHDR_AT_SUPPORT is disabled during client reconnection.
Taking a lock in this hot path is undesirable, so the fix changes
the assertion to a report.
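
The idea of replacing the assertion with a report can be sketched in
userspace C. This is not the patch itself: struct sk_import, the SK_*
constants (placeholder values) and both functions are hypothetical
stand-ins, and the real code reports through Lustre's own logging
rather than fprintf().

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Placeholder values for this sketch only; the real definitions live
 * in the Lustre headers. */
#define SK_IMP_FULL		9
#define SK_MSGHDR_AT_SUPPORT	0x1u
#define SK_CONNECT_AT		0x1000000ULL

/* Hypothetical, reduced view of an import. */
struct sk_import {
	int		state;
	uint32_t	msghdr_flags;
	uint64_t	connect_flags;
};

/* Same shape as the condition quoted in the commit message. */
static bool sk_at_consistent(bool at_off, const struct sk_import *imp)
{
	return at_off || imp->state != SK_IMP_FULL ||
	       (imp->msghdr_flags & SK_MSGHDR_AT_SUPPORT) ||
	       !(imp->connect_flags & SK_CONNECT_AT);
}

/*
 * Report instead of asserting: a racing reconnect can make the
 * condition look false for a moment, so log it and carry on.
 * Returns true if a report was emitted.
 */
static bool sk_check_and_report(bool at_off, const struct sk_import *imp)
{
	if (sk_at_consistent(at_off, imp))
		return false;

	fprintf(stderr, "import state %d: AT flags look inconsistent\n",
		imp->state);
	return true;
}
```

The sender keeps running either way; the report only leaves a trace
for later diagnosis without taking a lock on the send path.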

HPE-bug-id: LUS-10985
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: Ifc9e413c679c3e8a4c8f4f541251bebabae41c82
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49029
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
3 weeks ago LU-15935 tests: add version check to replay-dual test_33 98/49398/3
Jian Yu [Wed, 14 Dec 2022 02:31:05 +0000 (18:31 -0800)]
LU-15935 tests: add version check to replay-dual test_33

This patch adds an MDS version check to replay-dual test_33
to avoid interop test failures.

Test-Parameters: trivial \
serverjob=lustre-b2_15 serverbuildno=28 \
env=ONLY=33 testlist=replay-dual

Test-Parameters: trivial env=ONLY=33 testlist=replay-dual

Change-Id: I3ec665302a431d3c0f07bc819a08237dbc5b4309
Fixes: 1a79d395dd ("LU-15935 target: keep track of multirpc slots in last_rcvd")
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/49398
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>