Whamcloud - gitweb
fs/lustre-release.git
13 months ago LU-9839 clio: lov active ios accounting fix
Alexander Zarochentsev [Tue, 21 Nov 2023 14:46:44 +0000 (09:46 -0500)]
LU-9839 clio: lov active ios accounting fix

ASSERT(atomic_read(&lov->lo_active_ios)==0) is triggered due to a
bug in active_ios accounting. For some cl_io_init(,CIT_MISC,,)
calls, the increment of the lov_active_ios counter is not protected
by the layout lock. So the checks for active_ios != 0 are racy and
do not prevent another thread from starting a new cl_io and
incrementing the active_ios counter after any check but before the
assertion.
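
The accounting rule can be sketched in user space as follows. This is
a minimal model, not the Lustre code: the demo_* names are
hypothetical, a pthread mutex stands in for the layout lock, and the
counter is only ever touched while that lock is held, so a check for
zero cannot race with a new I/O being counted.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the lov object; not the Lustre struct. */
struct demo_lov_object {
	pthread_mutex_t layout_lock;
	int             active_ios;	/* only modified under layout_lock */
};

struct demo_io {
	struct demo_lov_object *obj;
	bool                    counted;	/* did this I/O bump active_ios? */
};

/* Count the I/O under the same condition as taking the layout lock. */
static void demo_io_init(struct demo_io *io, struct demo_lov_object *obj,
			 bool ignore_layout)
{
	io->obj = obj;
	io->counted = !ignore_layout;
	if (io->counted) {
		pthread_mutex_lock(&obj->layout_lock);
		obj->active_ios++;
		pthread_mutex_unlock(&obj->layout_lock);
	}
}

static void demo_io_fini(struct demo_io *io)
{
	if (io->counted) {
		pthread_mutex_lock(&io->obj->layout_lock);
		assert(io->obj->active_ios > 0);
		io->obj->active_ios--;
		pthread_mutex_unlock(&io->obj->layout_lock);
	}
}

/* The check and the layout change happen under one lock hold, so no
 * counted I/O can start between them. Returns true if the layout
 * could be changed. */
static bool demo_layout_change(struct demo_lov_object *obj)
{
	bool changed = false;

	pthread_mutex_lock(&obj->layout_lock);
	if (obj->active_ios == 0) {
		/* the layout would be replaced here */
		changed = true;
	}
	pthread_mutex_unlock(&obj->layout_lock);
	return changed;
}
```

In this model an I/O started with ignore_layout set is never counted,
which is why such I/O must not be issued from a path that can race
with a layout change.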

The lov_active_ios counter increment should be done under the
same condition as taking the layout type lock.
The ci_type=CIT_MISC and ci_ignore_layout=1 should not be used
in ll_dom_finish_open() as the I/O doesn't come
"from the osc layer" and may race with a layout change.

Lustre-change: https://review.whamcloud.com/51638
Lustre-commit: 5bc1dd825b700677b002a43463a463c3ccb665ec

HPE-bug-id: LUS-11628
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: I35fda85b968b847a87e73dd36bbb1648c744d62c
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54863
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
13 months ago LU-17624 ssk: support FIPS mode on client
Sebastien Buisson [Wed, 6 Mar 2024 15:33:25 +0000 (15:33 +0000)]
LU-17624 ssk: support FIPS mode on client

In FIPS mode, only certain crypto methods are allowed. This has an
impact on the DHKE mechanism implemented for SSK, as this relies on
a prime number generated for the client key. More specifically, FIPS
mode imposes that only certain safe, well-known primes be used.

OpenSSL prior to v1.1 just imposes a requirement on the prime length.
OpenSSL v1.1 requires the use of a specific primitive when FIPS mode
is on, to fetch a well-known prime based on a prime NID.
OpenSSL v3 is capable of detecting that FIPS mode is enforced, and
picks up a well-known prime instead of generating one.

Because of this, primes used for the DHKE are identical on all clients
in FIPS mode. So urge admins to use a short expiration time on SSK
keys, one day instead of one week, so that security contexts are
re-negotiated more often.

The NIST recommended primes are listed in Table 26 in Appendix D of:
https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-56Ar3.pdf

Lustre-change: https://review.whamcloud.com/54314
Lustre-commit: 5dc91df283fb5a7030b384f224085d73268dcca5

Test-Parameters: trivial
Test-Parameters: testgroup=review-dne-selinux-ssk-part-1
Test-Parameters: testgroup=review-dne-selinux-ssk-part-2
Test-Parameters: testgroup=review-dne-selinux-ssk-part-1 clientdistro=el9.2
Test-Parameters: testgroup=review-dne-selinux-ssk-part-2 clientdistro=el9.2
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I52b1926393e51fba6a9e92a837f86a38516ef6ad
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54804
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
13 months ago LU-17643 gss: make a local copy of the sptlrpc llog
Sebastien Buisson [Thu, 14 Mar 2024 17:15:29 +0000 (18:15 +0100)]
LU-17643 gss: make a local copy of the sptlrpc llog

Make a local copy on server side of the sptlrpc llog, so that
the targets that do not manage to connect to the MGS know at least
which security flavor to accept from clients.
This needs to pass the super_block to config_log_find_or_add().

Add sanity-sec test_70 to check that sptlrpc llog on MDS and OSS side
is equivalent to the one from the MGS.

Lustre-change: https://review.whamcloud.com/54394
Lustre-commit: 5921cb2a5b8b7e1301b2c1502be6f8006ab4082a

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I81f0136746e2df7cca1b34c4a17e4b7135a43c29
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54803
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
13 months ago LU-5134 utils: Add parallel option to lctl set_param
Ryan Haasken [Tue, 3 May 2016 19:49:57 +0000 (15:49 -0400)]
LU-5134 utils: Add parallel option to lctl set_param

Add a "-t" option to lctl set_param to enable setting multiple matched
parameters in parallel. When called with "-t", lctl will set up a work
queue of matched file names and spawn a fixed number of threads per
CPU. Each thread will pull items off the work queue, write to the file
associated with each work item, and return when there are no more
items on the work queue.

A field called po_parallel_threads is added to struct param_opts to
indicate the number of threads set_param should run in parallel. If in
parallel, jt_lcfg_setparam initializes a work queue and passes it to
do_param_op, which adds each matched item to the work queue. Once
jt_lcfg_setparam has called do_param_op for each param-value pair, it
passes the work queue to sp_run_threads, which creates threads, each
of which calls write_param to set the parameter. If not in parallel,
jt_lcfg_setparam does not pass a work queue to do_param_op, and
do_param_op directly calls write_param on each matched param.
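
The work-queue pattern described above can be sketched as follows.
This is a simplified model under stated assumptions, not the lctl
implementation: the demo_* names are hypothetical, the queue is a
fixed array with an atomic cursor, and demo_write_stub stands in for
write_param. Each worker thread would run demo_worker() until the
queue is drained.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical work queue: matched paths plus a shared cursor. */
struct demo_work_queue {
	const char  **paths;
	size_t        count;
	atomic_size_t next;
};

/* Hand out the next unclaimed path, or NULL when the queue is empty.
 * atomic_fetch_add gives each index to exactly one caller, so several
 * threads can pull concurrently. */
static const char *demo_wq_pull(struct demo_work_queue *wq)
{
	size_t idx = atomic_fetch_add(&wq->next, 1);

	return idx < wq->count ? wq->paths[idx] : NULL;
}

/* Stand-in for write_param: succeeds for any non-NULL path. */
static int demo_write_stub(const char *path)
{
	return path != NULL ? 0 : -1;
}

/* Body of one worker: pull items until none are left. Returns the
 * number of items written successfully. */
static size_t demo_worker(struct demo_work_queue *wq,
			  int (*write_fn)(const char *path))
{
	size_t done = 0;
	const char *path;

	while ((path = demo_wq_pull(wq)) != NULL) {
		if (write_fn(path) == 0)
			done++;
	}
	return done;
}
```

Without thread support the same demo_worker() can simply be called
once from the main thread, which matches the serial fallback.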

param_display was renamed to do_param_op to more accurately reflect
what it does.

If lctl is compiled without pthread support, "lctl set_param" will
still accept the "-t" option, but it will print a warning message, and
it will set the parameters in series.

The new "-t" option to set_param was documented in the lctl usage and
in the man page.

Lustre-change: https://review.whamcloud.com/10555
Lustre-commit: 345a2497d08f6b9afd74ed0188a70489f7a43e5d

HPE-bug-id: LUS-2592
Signed-off-by: Ryan Haasken <haasken@cray.com>
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I3f96a6f06c50d4ba2ce97050c35f46b976dfc005
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54878
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
13 months ago LU-17713 mdd: validate the length of mdd append_pool name
Emoly Liu [Wed, 10 Apr 2024 09:18:03 +0000 (09:18 +0000)]
LU-17713 mdd: validate the length of mdd append_pool name

Validate the length of mdd append_pool name (<= LOV_MAXPOOLNAME)
before saving it in function append_pool_store().
Also, sanity.sh test_27M is improved a little to verify this fix.
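
A length check of this kind might look like the sketch below. It is
illustrative only: demo_append_pool_store is a hypothetical user-space
function, the value 15 for the limit is an assumption about
LOV_MAXPOOLNAME, and returning -ENAMETOOLONG is a choice made for the
example rather than a statement about the patch.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define DEMO_MAXPOOLNAME 15	/* assumed value of LOV_MAXPOOLNAME */

/* Validate the name length before copying it into dst. On failure
 * dst is left untouched. */
static int demo_append_pool_store(char *dst, size_t dst_size,
				  const char *name)
{
	size_t len = strlen(name);

	/* ignore one trailing newline, as written by "echo" */
	if (len > 0 && name[len - 1] == '\n')
		len--;
	if (len > DEMO_MAXPOOLNAME || len >= dst_size)
		return -ENAMETOOLONG;
	memcpy(dst, name, len);
	dst[len] = '\0';
	return 0;
}
```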

Lustre-change: https://review.whamcloud.com/54691
Lustre-commit: 509a7cf9778968f796794c3743e62bc6b2a71592

Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Change-Id: Id7083fab60e9a18af4d8eedfa3d55f37544ba15d
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54889
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
13 months ago LU-17692 flock: get extra reference for lockd
Yang Sheng [Thu, 28 Mar 2024 19:54:06 +0000 (03:54 +0800)]
LU-17692 flock: get extra reference for lockd

We should get local locking first for GETLK. Otherwise,
the lock_owner could be released while working with
lockd.

Lustre-change: https://review.whamcloud.com/54622
Lustre-commit: 7f8af8f37eadb0d332c94472ae9cb9556f4425d2

Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: I56e4204e315c2bdbc496b7961519ae45ab1820fe
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54886
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
13 months ago EX-9392 sec: add server_upcall rbac role
Sebastien Buisson [Tue, 12 Mar 2024 10:32:59 +0000 (11:32 +0100)]
EX-9392 sec: add server_upcall rbac role

The purpose of the new server_upcall rbac role is to control whether
clients use the server side defined identity upcall. When set, clients
do comply with the server side identity upcall. When not set, clients
are leveraging the special INTERNAL identity upcall, which means
servers trust supplementary groups as provided by the clients.

Test-Parameters: trivial
Test-Parameters: testgroup=review-dne-part-2
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I01dcedad5da0e175aa7b8d187f2affd34d933e39
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54360
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
13 months ago LU-17518 gss: do not trust supp groups from client with krb
Sebastien Buisson [Fri, 9 Feb 2024 15:42:40 +0000 (16:42 +0100)]
LU-17518 gss: do not trust supp groups from client with krb

Thanks to Kerberos, Lustre does not have to trust clients anymore,
but relies on keytabs and tickets, cryptographically validated, to
recognize clients and users.
RPC provided supplementary groups should not be trusted, but checked
thanks to identity upcall and the trusted UID from the ticket.

Add sanity-krb5 test_9 to exercise this.

Lustre-change: https://review.whamcloud.com/53987
Lustre-commit: b09f56c208c6c34375d098f66075688f329b7c76

Test-Parameters: kerberos=true testlist=sanity-krb5 serverdistro=el8.8
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I4113ef654492e76fcd377b2c0cc74e484b27850b
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54801
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
13 months ago EX-9530 tests: fix issues in backport of LU-13569
Serguei Smirnov [Fri, 26 Apr 2024 21:48:55 +0000 (14:48 -0700)]
EX-9530 tests: fix issues in backport of LU-13569

The backport of "LU-13569 tests: Check LNet Health recovery logic"
introduced redundant lnets and drop rules.
Clean this up.

Test-Parameters: trivial testlist=sanity-lnet
Test-Parameters: trivial testlist=sanity-lnet clientversion=EXA6 serverversion=2.15
Fixes: 2b6f7a39 ("LU-13569 tests: Check LNet Health recovery logic")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I1e2d5d31f77a29504182650be30f9db7087d82cc
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54939
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
13 months ago LU-17440 lnet: prevent errorneous decref for asym route
Gian-Carlo DeFazio [Thu, 29 Feb 2024 00:44:48 +0000 (16:44 -0800)]
LU-17440 lnet: prevent errorneous decref for asym route

The following stack trace was seen on a lustre server:
Call Trace TBD:
[<0>] libcfs_call_trace+0x6f/0xa0 [libcfs]
[<0>] lbug_with_loc+0x3f/0x70 [libcfs]
[<0>] lnet_destroy_peer_ni_locked+0x44d/0x4e0 [lnet]
[<0>] lnet_handle_find_routed_path+0x86c/0xee0 [lnet]
[<0>] lnet_select_pathway+0xb95/0x16c0 [lnet]
[<0>] lnet_send+0x6d/0x1e0 [lnet]
[<0>] lnet_parse_local+0x3ed/0xdd0 [lnet]
[<0>] lnet_parse+0xd7d/0x1490 [lnet]
[<0>] kiblnd_handle_rx+0x30e/0x900 [ko2iblnd]
[<0>] kiblnd_scheduler+0x104b/0x10d0 [ko2iblnd]
[<0>] kthread+0x14c/0x170
[<0>] ret_from_fork+0x1f/0x40

It was discovered that the lnet routes between the server
and a client cluster were misconfigured, so that the clients
had routes to the server through all 8 available routers,
but the server had routes to the clients through only 7 of
the routers.

The server was contacted by a client node through the
router with the missing route. It incremented the ref count
for the corresponding struct lnet_peer_ni for that router,
but then, because it had no route through that peer, changed
the value of the struct lnet_peer_ni to a peer with a route
back to the client. It then decremented the new
struct lnet_peer_ni which resulted in the ref count being
decremented to 0 which caused an LBUG.

Detect if the peer is a router to the appropriate net.
If so, decrement its ref count at the end of the function;
if not, decrement its ref count immediately.
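
The reference handling described above can be illustrated with a
small model. This is a sketch under stated assumptions, not the LNet
code: struct demo_peer_ni and the demo_* functions are hypothetical,
and the point is only that the reference taken on one peer is dropped
on that same peer, either immediately or at the end.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct lnet_peer_ni. */
struct demo_peer_ni {
	int  refcount;
	bool is_router_to_net;	/* has a route to the target net */
};

static void demo_addref(struct demo_peer_ni *p)
{
	p->refcount++;
}

static void demo_decref(struct demo_peer_ni *p)
{
	assert(p->refcount > 0);	/* guards against an underflow */
	p->refcount--;
}

/* Take a reference on gw, the peer the message arrived from. If gw
 * is not a router to the target net, release that reference right
 * away and continue with alt; otherwise keep it until the end.
 * Returns the peer that was selected. */
static struct demo_peer_ni *demo_select_path(struct demo_peer_ni *gw,
					     struct demo_peer_ni *alt)
{
	struct demo_peer_ni *held = NULL;
	struct demo_peer_ni *chosen;

	demo_addref(gw);
	if (gw->is_router_to_net) {
		chosen = gw;
		held = gw;		/* released at the end */
	} else {
		demo_decref(gw);	/* released before switching peers */
		chosen = alt;
	}

	/* ... path selection work using chosen ... */

	if (held != NULL)
		demo_decref(held);
	return chosen;
}
```

In both branches every peer ends with the reference count it started
with, which is the invariant the asymmetric route case violated.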

Lustre-change: https://review.whamcloud.com/53896
Lustre-commit: 2b210f39059be998b80b0acc13c12451960b63bb

Fixes: 60cfce ("LU-17062 lnet: Update lnet_peer_*_decref_locked usage")
Test-Parameters: testlist=sanity-lnet mdscount=1 osscount=2 clientcount=1
Signed-off-by: Gian-Carlo DeFazio <defazio1@llnl.gov>
Change-Id: I2d00faef60ae8768afa7afbb1b00a62ba90535bb
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54883
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
13 months ago LU-17062 lnet: Update lnet_peer_*_decref_locked usage
Shaun Tancheff [Sat, 16 Sep 2023 05:54:54 +0000 (00:54 -0500)]
LU-17062 lnet: Update lnet_peer_*_decref_locked usage

Move decref's to occur after last reference to prevent
use after free.

Lustre-change: https://review.whamcloud.com/52184
Lustre-commit: 60cfceb8c59364f786b31ac36c2c245b9a1e495a

HPE-bug-id: LUS-11799
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I2382ece560039383f644b6aee73a9481d6bb5673
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54897
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
13 months ago LU-17724 gss: fix bad use of user buffer in rsi upcall
Sebastien Buisson [Thu, 11 Apr 2024 06:58:19 +0000 (08:58 +0200)]
LU-17724 gss: fix bad use of user buffer in rsi upcall

Use the proper kernel buffer to print message out when
upcall_cache_set_upcall() returns an error.

Lustre-change: https://review.whamcloud.com/54730
Lustre-commit: fe8c195f7a5ef3e653b6eaff8863c4c94e97e28c

Fixes: a462a119ec ("LU-17497 obdclass: check upcall incorrect values")
Test-Parameters: trivial
Test-Parameters: testgroup=review-dne-selinux-ssk-part-2
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ice781b4506822f1fd4ce0a062ce742f51e366525
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54887
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
13 months ago LU-15851 lnet: Adjust niov checks for large MD
Chris Horn [Sat, 16 Apr 2022 16:01:57 +0000 (10:01 -0600)]
LU-15851 lnet: Adjust niov checks for large MD

An LNet user can allocate a large contiguous MD. That MD can have >
LNET_MAX_IOV pages which causes some LNDs to assert on either niov
argument passed to lnd_recv() or the value stored in
lnet_msg::msg_niov. This is true even in cases where the actual
transfer size is <= LNET_MTU and will not exceed limits in the LNDs.

Adjust ksocklnd_send()/ksocklnd_recv() to assert on the return value
of lnet_extract_kiov().

Remove the assert on msg_niov (payload_niov) from kiblnd_send().
kiblnd_setup_rd_kiov() will already fail if we exceed ko2iblnd's
available scatter gather entries.

Lustre-change: https://review.whamcloud.com/47319
Lustre-commit: 105193b4a147257a0f9332053a16eb676dc99623

HPE-bug-id: LUS-10878
Test-Parameters: trivial
Fixes: 857f11169f ("LU-13004 lnet: always put a page list into struct lnet_libmd")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: Iaa851d90f735d04e5167bb9c07235625759245b2
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Alexey Lyashkov <alexey.lyashkov@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54847
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
13 months ago LU-17630 osc: add cond_resched() to osc_lru_shrink()
Alex Zhuravlev [Tue, 9 Apr 2024 10:35:00 +0000 (13:35 +0300)]
LU-17630 osc: add cond_resched() to osc_lru_shrink()

osc_lru_shrink() may need to handle lots of pages and can thus
block scheduling for a long time. Add a couple of cond_resched() calls
to prevent kernel warnings and starvation of other threads.
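
The pattern of yielding periodically inside a long loop can be shown
with a user-space analogue. This is only a sketch: sched_yield() is
used as a stand-in for the kernel's cond_resched(), and the function
name and batch size are hypothetical.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>

#define DEMO_RESCHED_BATCH 64	/* hypothetical batch size */

/* Process npages items, giving up the CPU after every full batch.
 * Returns the number of yield points that were hit. */
static size_t demo_lru_shrink(size_t npages)
{
	size_t yields = 0;

	for (size_t i = 0; i < npages; i++) {
		/* ... per-page work would go here ... */
		if ((i + 1) % DEMO_RESCHED_BATCH == 0) {
			sched_yield();	/* stand-in for cond_resched() */
			yields++;
		}
	}
	return yields;
}
```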

Lustre-change: https://review.whamcloud.com/54346
Lustre-commit: 69eb7b89c7f36ec6a8970e87fc8859207f4b9c0c

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I862c568ac777c0b929a1ffb61e246b079aee6718
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54708
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
13 months ago EX-9192 ofd: take local chunk-aligned lock
Alex Zhuravlev [Wed, 24 Apr 2024 11:53:17 +0000 (14:53 +0300)]
EX-9192 ofd: take local chunk-aligned lock

Take a local chunk-aligned lock on the OST side to prevent racing
read-modify-write against the same compressed chunk from the same
client.
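
Expanding an I/O range to chunk boundaries can be sketched as below.
This is an illustration, not the ofd code: the demo_* names are
hypothetical, and the chunk size is assumed to be a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Byte range with an inclusive end, as lock extents are usually
 * expressed. */
struct demo_extent {
	uint64_t start;
	uint64_t end;
};

/* Round [offset, offset + len) out to whole chunks. chunk_size must
 * be a power of two and len must be non-zero. */
static struct demo_extent demo_chunk_align(uint64_t offset, uint64_t len,
					   uint64_t chunk_size)
{
	struct demo_extent ext;
	uint64_t mask = chunk_size - 1;

	assert(chunk_size != 0 && (chunk_size & mask) == 0);
	assert(len > 0);
	ext.start = offset & ~mask;		/* round down */
	ext.end = (offset + len - 1) | mask;	/* round up, inclusive */
	return ext;
}
```

Two writes that touch the same chunk then map to overlapping extents,
so a lock on the aligned extent serializes them.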

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Signed-off-by: Artem Blagodarenko <artem.blagodarenko@hpe.com>
Change-Id: Iffaf2d2856e276cb2f9becce2506154314217e3c
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54890
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
13 months ago LU-17767 build: struct lsmcontext has slot or id member
Sebastien Buisson [Tue, 23 Apr 2024 19:03:19 +0000 (12:03 -0700)]
LU-17767 build: struct lsmcontext has slot or id member

With Ubuntu 24.04 kernel 6.8.0-31-generic, the struct lsmcontext uses
a field named 'id' to identify the LSM module, instead of 'slot' in
previous kernel versions.

Lustre-change: https://review.whamcloud.com/54881
Lustre-commit: TBD (from 7764f73b8e25f8658867e7ab080fe5d8ec62230b)

Fixes: 0e66489401 ("LU-16619 build: Ubuntu jammy 5.19 client support")
Test-Parameters: trivial
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I5080e60614b42ed63103f93cae1f481851742d0b
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54882
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
13 months ago LU-17774 build: pass systemdsystemunitdir to "make debs"
Jian Yu [Fri, 26 Apr 2024 17:32:46 +0000 (10:32 -0700)]
LU-17774 build: pass systemdsystemunitdir to "make debs"

This patch passes "--with-systemdsystemunitdir" configure
option to the configure command performed in "make debs".
It also updates debian/lustre-{client,server}-utils.install
with the detected/specified directory for systemd service files.

Lustre-change: https://review.whamcloud.com/54902
Lustre-commit: TBD (from f2621099bbbc032a053800940cb62d03dfbd7120)

Test-Parameters: trivial clientdistro=ubuntu2204

Signed-off-by: Jian Yu <yujian@whamcloud.com>
Change-Id: I7c36904ea0ed0f393a76b0fb0ad444b330dfa78c
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54921
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
13 months ago EX-9192 csdc: Fix the upper mergeable chunk pointer
Artem Blagodarenko [Tue, 16 Apr 2024 18:02:32 +0000 (19:02 +0100)]
EX-9192 csdc: Fix the upper mergeable chunk pointer

If a full chunk is followed by an unmergeable page, the upper
mergeable chunk pointer is occasionally set to this unmergeable page.
The chunk size is then calculated incorrectly, and the next condition
suggests not compressing this chunk because its size is not equal to
the expected size.

The pointer should be moved to the first instruction after the
can_merge_pages().

Signed-off-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Change-Id: I09fedc770c8bbcac4864b32372a941da5e0c7ac3
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54814
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
13 months ago LU-17504 build: fix lock_handle array-index-out-of-bounds
Andreas Dilger [Sat, 27 Apr 2024 01:48:49 +0000 (18:48 -0700)]
LU-17504 build: fix lock_handle array-index-out-of-bounds

After Linux kernel patch "ubsan: Tighten UBSAN_BOUNDS on GCC"
(commit v6.4-rc2-1-g2d47c6956ab3), flexible trailing arrays
declared like 'lock_handle[2]' will generate warnings when
CONFIG_UBSAN & co. is enabled:

    UBSAN: array-index-out-of-bounds in ldlm_request.c:1282:18
    index 2 is out of range for type 'lustre_handle [2]'

The declaration lock_handle[LDLM_LOCKREQ_HANDLES] confuses the
compiler into thinking there are only two fields in lock_handle,
but the caller often allocates extra fields beyond this for more
locks to be cancelled due to Early Lock Cancellation or from LRU.

Rather than have a second flexible array after lustre_handle[2],
declare the whole array as flexible, and fix up the few sites
that are allocating this array to ensure LDLM_LOCKREQ_HANDLES
fields are allocated at a minimum.

This subtly changes the checks in wiretest.c due to the removal
of the 2 "base" handles in ldlm_request, but I believe this is not
changing the wire protocol because it still allocates those handles
directly, and I have verified interoperability with a 2.14.0 server.
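
The approach of declaring the whole array as flexible while still
allocating a minimum number of entries can be sketched as follows.
This is an illustrative model, not the ldlm_request definition: the
demo_* types and functions are hypothetical, and only the minimum of
two handles is taken from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_LOCKREQ_HANDLES 2	/* minimum number of handles */

struct demo_handle {
	uint64_t cookie;
};

struct demo_request {
	uint32_t           flags;
	uint32_t           count;
	struct demo_handle handles[];	/* flexible: no fixed bound */
};

/* Size of a request with room for nhandles entries, never fewer
 * than DEMO_LOCKREQ_HANDLES. */
static size_t demo_request_size(uint32_t nhandles)
{
	if (nhandles < DEMO_LOCKREQ_HANDLES)
		nhandles = DEMO_LOCKREQ_HANDLES;
	return offsetof(struct demo_request, handles) +
	       (size_t)nhandles * sizeof(struct demo_handle);
}

/* Allocate a zeroed request; the caller frees it with free(). */
static struct demo_request *demo_request_alloc(uint32_t nhandles)
{
	struct demo_request *req = calloc(1, demo_request_size(nhandles));

	if (req != NULL)
		req->count = nhandles;
	return req;
}
```

Because the minimum is enforced at allocation time, code that always
touches the first two handles stays valid even when a caller asks for
fewer.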

Lustre-change: https://review.whamcloud.com/54926
Lustre-commit: TBD (from 765bf07e894178a6a6f1477559a793af3a52412e)

Test-Parameters: testlist=runtests clientversion=2.14
Test-Parameters: testlist=runtests serverversion=2.14
Test-Parameters: testlist=runtests clientversion=2.15
Test-Parameters: testlist=runtests serverversion=2.15
Test-Parameters: testlist=runtests clientversion=EXA5
Test-Parameters: testlist=runtests serverversion=EXA5
Test-Parameters: testlist=runtests clientversion=EXA6
Test-Parameters: testlist=runtests serverversion=EXA6
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I9695fb44f1b5c84bb750d2983cdd8b939e3ebbe5
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54941
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
13 months ago LU-17784 build: improve wiretest for flexible arrays
Shaun Tancheff [Fri, 26 Apr 2024 18:32:09 +0000 (11:32 -0700)]
LU-17784 build: improve wiretest for flexible arrays

Flexible array checking can additionally probe that the size
of the array element is correct.

Lustre-change: https://review.whamcloud.com/54929
Lustre-commit: TBD (from a5cbf26e7985dfe60471d060439eb7cd90a17fc2)

Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ib7de3d156a2e77dfaf2e9ab1df8fab524c073610
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54938
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
13 months ago LU-17504 build: fix gcc-13 [-Werror=stringop-overread] error
Shaun Tancheff [Thu, 25 Apr 2024 22:36:44 +0000 (15:36 -0700)]
LU-17504 build: fix gcc-13 [-Werror=stringop-overread] error

This patch fixes the following [-Werror=stringop-overread] and
[-Werror=attribute-warning] errors detected by gcc 13:

lustre/mgc/mgc_request.c:190:21: error: 'strcmp' reading 1 or
more bytes from a region of size 0 [-Werror=stringop-overread]
  190 | if (strcmp(logname, cld->cld_logname) == 0) {
      |     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In function 'fortify_memcpy_chk',
    inlined from 'class_handle_ioctl' at
/root/lustre-release/lustre/obdclass/class_obd.c:381:3:
include/linux/fortify-string.h:528:25: error:
call to '__write_overflow_field' declared with attribute warning:
detected write beyond size of field (1st parameter);
maybe use struct_group()? [-Werror=attribute-warning]
  528 |  __write_overflow_field(p_size_field, size);
      |  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Lustre-change: https://review.whamcloud.com/54834
Lustre-commit: TBD (from 787b45323742a00e262334ba6dfa8c7aff80bdac)

Signed-off-by: Jian Yu <yujian@whamcloud.com>
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I59f5a88b4cd64c9f4e67e568546baada371543b1
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54874
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
13 months ago LU-17657 build: gcc 13 stricter enum checking
Shaun Tancheff [Fri, 26 Apr 2024 17:53:36 +0000 (10:53 -0700)]
LU-17657 build: gcc 13 stricter enum checking

gcc 13 does not allow mixing of enum and integer
types between function declaration and implementation.

Clean up a couple of instances where an enum is treated
as a uint32_t / __u32 and treat it as an enum type.
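
A minimal illustration of keeping the declaration and the definition
on the same enum type is given below. The names are hypothetical and
unrelated to the Lustre sources; the comment about the gcc 13
diagnostic reflects the -Wenum-int-mismatch warning as I understand
it, which becomes an error under -Werror.

```c
#include <assert.h>
#include <string.h>

enum demo_lock_mode {
	DEMO_MODE_READ  = 1,
	DEMO_MODE_WRITE = 2,
};

/* The prototype and the definition both take the enum type. Writing
 * the prototype with an integer type such as unsigned int while the
 * definition takes the enum is the kind of mismatch gcc 13 flags. */
const char *demo_mode_name(enum demo_lock_mode mode);

const char *demo_mode_name(enum demo_lock_mode mode)
{
	switch (mode) {
	case DEMO_MODE_READ:
		return "read";
	case DEMO_MODE_WRITE:
		return "write";
	}
	return "unknown";
}
```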

lustre/lov/lov_ea.c: In function 'lsme_unpack_comp':
lustre/lov/lov_ea.c:531:21: error: array subscript
   'struct lov_stripe_md_entry[0]' is partly outside array bounds
    of 'struct lov_stripe_md_entry[0]' [-Werror=array-bounds=]
  531 |                 lsme->lsme_magic = magic;

Lustre-change: https://review.whamcloud.com/54468
Lustre-commit: TBD (from 617e7a25b12e0cdb865188414b6d1206eedec69a)

Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I8e2ef989ecbdebe5e13bcea0fbb210c4a14eb45e
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54873
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months ago RM-620 build: New tag 2.14.0-ddn144
Andreas Dilger [Mon, 15 Apr 2024 09:59:08 +0000 (03:59 -0600)]
RM-620 build: New tag 2.14.0-ddn144

New tag 2.14.0-ddn144

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I2f3f6483d625cc777bcdd310e3acdde0530b3fb8

14 months ago RM-620 build: New tag lipe-2.48
Andreas Dilger [Mon, 15 Apr 2024 09:58:40 +0000 (03:58 -0600)]
RM-620 build: New tag lipe-2.48

New tag lipe-2.48

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I89c19d04e633a386d12553fda6cecca2b5b38322

14 months ago EX-9585 lipe: add lipe_find3 projid option
Andreas Dilger [Sat, 13 Apr 2024 05:38:00 +0000 (23:38 -0600)]
EX-9585 lipe: add lipe_find3 projid option

Add an option to print the project ID for a file with the
"-printf" argument, both as the long option "%{projid}" and as
the short option "%LP" that is compatible with "lfs find".

Sort all of the existing and new options alphabetically so that
it is easier to see which ones are implemented in the future.

Update the lipe-find3.1 man page and add a test case.

Test-Parameters: trivial testlist=sanity-lipe-find3.sh
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I18d2d3cc161c8aa92eb27c33b06214b6f5ce7057
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54784
Reviewed-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
14 months ago LU-17685 utils: Allow nocompr flag in lfs mirror extend
Alexandre Ioffe [Thu, 28 Mar 2024 02:34:58 +0000 (19:34 -0700)]
LU-17685 utils: Allow nocompr flag in lfs mirror extend

Extend the set of allowed optional flags in
'lfs mirror extend' command by LCME_FL_NOCOMPR. Allowed syntax:
--flags=prefer
--flags=nocompr
--flags=prefer,nocompr
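
Parsing such a comma-separated flag list into a bitmask could look
like the sketch below. It is not the lfs parser: demo_parse_flags and
the DEMO_FL_* bit values are hypothetical and do not correspond to
the real LCME_FL_* constants.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_FL_PREFER  0x1u	/* hypothetical bit values */
#define DEMO_FL_NOCOMPR 0x2u

/* Parse "prefer", "nocompr" or "prefer,nocompr" into a bitmask.
 * Returns 0 and stores the mask in *out, or -1 if the list is empty
 * or contains an unknown or empty name. */
static int demo_parse_flags(const char *arg, uint32_t *out)
{
	uint32_t flags = 0;

	while (*arg != '\0') {
		size_t n = strcspn(arg, ",");	/* length of this name */

		if (n == 6 && strncmp(arg, "prefer", n) == 0)
			flags |= DEMO_FL_PREFER;
		else if (n == 7 && strncmp(arg, "nocompr", n) == 0)
			flags |= DEMO_FL_NOCOMPR;
		else
			return -1;
		arg += n;
		if (*arg == ',')
			arg++;
	}
	if (flags == 0)
		return -1;
	*out = flags;
	return 0;
}
```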

Lustre-change: https://review.whamcloud.com/54640
Lustre-commit: 37e1316050c93e5233f77ebcd399a8272b989605

Test-Parameters: trivial testlist=sanity-flr
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Change-Id: Id1538182eca0142464c19c0c4b1406592e615be1
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54593
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
14 months ago EX-9524 mdt: enable parallel_rename_crossdir
Li Xi [Fri, 5 Apr 2024 13:21:45 +0000 (21:21 +0800)]
EX-9524 mdt: enable parallel_rename_crossdir

parallel_rename_crossdir was not enabled due to a problem when
porting the following patch.

Fixes: ce01016a4a ("LU-17426 mdt: relax same MDT file rename lock")

The test case that exercises the feature was not run due to
the version check problem when porting the following patch.

Fixes: bc59df8232 ("LU-17426 tests: add crossdir parallel rename test")

Change-Id: I9316c599c6bd24891fbab3484935147d812b6f1c
Signed-off-by: Li Xi <lixi@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54682
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months ago LU-17078 ldlm: do not spin up thread for local cancels
Patrick Farrell [Thu, 31 Aug 2023 00:10:30 +0000 (20:10 -0400)]
LU-17078 ldlm: do not spin up thread for local cancels

When doing lockless IO on the client, the server is
responsible for taking LDLM locks for each IO.

Currently, the server sends these locks to a separate
thread for cancellation.  This behavior is necessary on the
client where a lock may protect a large number of cached
pages, so cancelling it in a user thread may introduce
unacceptable delays.  But the server doesn't have cached
pages, so it makes more sense for the server to do the
cancellation in the same thread.

We do this by not spinning up an ldlm_bl thread for
cancellations of local (server side only) locks.

This improves 4K DIO random read performance by about 9%.

Without patch, maximum server IOPs on 4K reads:
2864k IOPS

With patch:
3118k IOPS

This is the maximum performance achieved with many clients
and client threads doing 4K random AIO reads from different
files.

Lustre-change: https://review.whamcloud.com/52192
Lustre-commit: 291ac6e6925e3bdf31f527de2bedf5f19706b230

Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: Ia996732780d278c5d0bc290c5484e3bc325a347a
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/52193
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months ago LU-17297 grant: move tgt_grant_sanity_check() calls
Vladimir Saveliev [Fri, 17 Nov 2023 15:30:06 +0000 (18:30 +0300)]
LU-17297 grant: move tgt_grant_sanity_check() calls

Call tgt_grant_sanity_check() in ofd_obd_disconnect() and in
mdt_obd_disconnect() after call to tgt_grant_discard().

Otherwise, the sum of grants does not match the total grant counter,
which is reported as a LustreError:
    ofd_obd_disconnect: tot_granted 0 != fo_tot_granted 8388608

This is because, on stale export eviction,
class_disconnect_stale_exports() moves stale exports to a separate
list but does not update the obd's grant counters.
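
The ordering issue can be modelled in a few lines. This is a sketch
with hypothetical demo_* names, not the target grant code: exports
marked stale are skipped by the check, mimicking their move to a
separate list, while the device total still includes their grants
until the discard step runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct demo_export {
	uint64_t granted;
	bool     stale;		/* moved off the main export list */
};

struct demo_obd {
	uint64_t            tot_granted;
	struct demo_export *exports;
	size_t              nexports;
};

/* Sum grants over exports still on the main list and compare with
 * the device-wide counter. */
static bool demo_grant_sanity_check(const struct demo_obd *obd)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < obd->nexports; i++) {
		if (!obd->exports[i].stale)
			sum += obd->exports[i].granted;
	}
	return sum == obd->tot_granted;
}

/* Return an export's grant to the device-wide counter. */
static void demo_grant_discard(struct demo_obd *obd,
			       struct demo_export *exp)
{
	obd->tot_granted -= exp->granted;
	exp->granted = 0;
}

/* Disconnect: discard first, then run the sanity check. */
static bool demo_disconnect(struct demo_obd *obd, size_t idx)
{
	demo_grant_discard(obd, &obd->exports[idx]);
	return demo_grant_sanity_check(obd);
}
```

Running the check before the discard on a stale export reports a
mismatch in this model; running it afterwards does not.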

Test to illustrate the issue is included.

Lustre-change: https://review.whamcloud.com/53171
Lustre-commit: 9df01eee755bbac5bed560f365fab85c1b1164ae

Test-Parameters: trivial testlist=recovery-small env=ONLY=156 serverversion=EXA5
Test-Parameters: trivial testlist=recovery-small env=ONLY=156 serverversion=2.15.4
HPE-bug-id: LUS-11469
Signed-off-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Change-Id: I0b4568b88a2fe7b50f4eac50b4b064d7afbc7a75
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54672
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months ago LU-17510 obdclass: fix wake up when queuing close request.
Mr NeilBrown [Mon, 4 Mar 2024 02:15:17 +0000 (13:15 +1100)]
LU-17510 obdclass: fix wake up when queuing close request.

The waitqueue for requests that need to be sent but that haven't been
allocated a slot is kept ordered by request arrival for fairness.  So
new requests are added to the end.

For requests other than 'close' there is a limit to the number of
active requests (slots) and requests are assigned to slot on a
first-come-first-served basis, so they are simply removed from the
head of the list.

For 'close' requests it is important that these not block indefinitely
behind other requests so there is one slot that can only be used
by a close request - and only if no other slots are used by a close
request.  These requests do not follow a strict FIFO order.

When a non-"close" request completes we wake the first request on the
list.  There is no point searching all the way down the list for a
close request that could also be woken.  We only do that when a
"close" request completes.  This optimises the common case.

However: when a request is first queued we add it to the end of the
queue and then wake up the first deserving request if there is one.
When there are free slots, this is expected to wake the request just
queued.  When there are no free slots, nothing is woken.

When a "close" request is queued and added to the end of the queue
after other non-close requests, we need to potentially search to the
end of the queue for a close request to wake, just as we do when a
close request completes.  Unfortunately we don't.  This can result in
a close request blocking indefinitely.

So: change the wakeup in obd_get_mod_rpc_slot() to match the wakeup in
obd_put_mod_rpc_slot().  This ensures consistent handling and in
particular will handle a close request immediately if there are no
other close requests in flight.

Clarify the comment in claim_mod_rpc_function() and perform minor
code cleanup there.
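
The wake-up rule this commit describes can be sketched in user-space C.
This is only an illustration under stated assumptions: struct slot_state,
struct waiter, can_claim() and pick_waiter_to_wake() are hypothetical
names, the queue is a plain array rather than a kernel wait queue, and
the slot accounting is a simplified model, not the Lustre code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified slot accounting (not the Lustre structures). */
struct slot_state {
	int max_slots;        /* limit for ordinary requests */
	int in_flight;        /* slots currently in use, close requests included */
	int close_in_flight;  /* how many of those are close requests */
};

struct waiter {
	bool is_close;
};

/* A request may claim a slot if one is free, or if it is a close request
 * and no close request is currently in flight (the reserved extra slot). */
bool can_claim(const struct slot_state *s, bool is_close)
{
	if (s->in_flight < s->max_slots)
		return true;
	return is_close && s->close_in_flight == 0;
}

/* Return the index of the waiter to wake, or -1 if nobody can run.
 * q[0] is the oldest waiter. With scan_for_close false only the head is
 * considered; with it true the rest of the queue is searched for a close
 * request that may use the reserved slot. */
int pick_waiter_to_wake(const struct slot_state *s, const struct waiter *q,
			size_t n, bool scan_for_close)
{
	if (n == 0)
		return -1;
	if (can_claim(s, q[0].is_close))
		return 0;
	if (!scan_for_close)
		return -1;
	for (size_t i = 1; i < n; i++)
		if (q[i].is_close && can_claim(s, true))
			return (int)i;
	return -1;
}
```

With all ordinary slots busy and a queue of two ordinary requests
followed by a close request, the head-only variant wakes nobody, while
the scanning variant picks the close request at index 2. That is the
difference between the two wake-up paths the message describes.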

Lustre-change: https://review.whamcloud.com/54259
Lustre-commit: 7a2296a397381a5f6f9473b297f0062e8ff15948

Fixes: b5fde4d6c023 ("LU-17197 obdclass: preserve fairness when waiting for rpc slot")
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I7b658efc0298a091166f0f18ce460fc3148047eb
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54688
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-17261 lov: unlink can handle bogus striping
Alex Zhuravlev [Sat, 23 Mar 2024 17:13:32 +0000 (20:13 +0300)]
LU-17261 lov: unlink can handle bogus striping

Allow removing a file which has uninitialized OST objects in the
layout, possibly because LFSCK reconnected an orphan object back
into a mirrored file after the mirror had been deleted.

Don't wait and retry to access the bogus OST or MDT index in this
case, because the target will never appear, so waiting is futile.

Lustre-change: https://review.whamcloud.com/54544
Lustre-commit: 4ae823762db40d790ddd00c29e969b5c8e376430

Lustre-change: https://review.whamcloud.com/54719
Lustre-commit: 47573f85e60ac91f69c09b9edfbffc3f74fef298

Test-Parameters: testlist=sanity-flr,sanity-flr,sanity-flr,sanity-flr
Fixes: 94a4663db9 ("LU-17334 lmv: handle object created on newly added MDT")
Fixes: f35f897ec8 ("LU-17334 lov: handle object created on newly added OST")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I90b97c0e2d560d71b2a4c32a47fcfd7ae4e5535d
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54752
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
14 months agoLU-16350 osd-ldiskfs: no_llseek removed, dquot_transfer
Shaun Tancheff [Thu, 11 Apr 2024 18:12:15 +0000 (11:12 -0700)]
LU-16350 osd-ldiskfs: no_llseek removed, dquot_transfer

Linux commit v5.19-rc2-6-g868941b14441
  fs: remove no_llseek

With the removal of no_llseek, leaving .llseek set to NULL
is functionally equivalent. Only provide no_llseek if it exists.

Linux commit v5.19-rc3-6-g71e7b535b890
 quota: port quota helpers mount ids

dquot_transfer adds a user namespace argument. Provide an
osd_dquot_transfer() wrapper to discard the additional
argument for older kernels.
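
A compatibility wrapper of this kind can be sketched as follows. This is
a user-space illustration only: the struct definitions and both
dquot_transfer() bodies are stand-ins written for the sketch, and the
configure macro HAVE_DQUOT_TRANSFER_WITH_USER_NS is an assumed name, not
taken from the patch.

```c
#include <stddef.h>

/* Stand-in types so the sketch compiles outside the kernel. */
struct user_namespace { int id; };
struct inode { int ino; };
struct iattr { int uid; };

#ifdef HAVE_DQUOT_TRANSFER_WITH_USER_NS
/* Stand-in for the newer signature, which takes a user namespace. */
int dquot_transfer(struct user_namespace *ns, struct inode *inode,
		   struct iattr *attr)
{
	(void)ns;
	return (inode && attr) ? 0 : -1;
}
#define osd_dquot_transfer(ns, inode, attr) \
	dquot_transfer((ns), (inode), (attr))
#else
/* Stand-in for the older two-argument signature. */
int dquot_transfer(struct inode *inode, struct iattr *attr)
{
	return (inode && attr) ? 0 : -1;
}
/* Older kernels: the wrapper silently drops the namespace argument. */
#define osd_dquot_transfer(ns, inode, attr) \
	dquot_transfer((inode), (attr))
#endif
```

Callers always pass three arguments to osd_dquot_transfer(), so only the
wrapper definition depends on the kernel version.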

Lustre-change: https://review.whamcloud.com/49266
Lustre-commit: 2de1dbd440e2b26ea1bdf663b92a3e8c62a95ee7

Test-Parameters: trivial
HPE-bug-id: LUS-11376
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: If3165aed0d7b827b90e26d9f0174137d087ce57a
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54745
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-16692 tests: remove force_new_seq from some test suites
Li Dongyang [Fri, 15 Mar 2024 11:39:30 +0000 (22:39 +1100)]
LU-16692 tests: remove force_new_seq from some test suites

force_new_seq was used in some tests to avoid the
situation where the sequence from a replay request
could differ from the one osp is at, because the
previous sequence width had been used up.

Now this can be handled, so remove the force_new_seq
to speed up test runs.
Some force_new_seq are still required to make sure
there are enough objects in the current precreate pool
for the overstriping test cases.

Lustre-change: https://review.whamcloud.com/54433
Lustre-commit: 9ef186b71b350127e7cfb67be5729f9e0bd39c79

Change-Id: Id1bc6760e721db61c11b1c3d6b2fa82965459728
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54705
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
14 months agoLU-16692 osp: do not assert on seq got over network
Li Dongyang [Tue, 13 Feb 2024 04:10:53 +0000 (15:10 +1100)]
LU-16692 osp: do not assert on seq got over network

Replay requests have FIDs already assigned and the
sequence could be different to the osp:
seq rollover happened after the original request,
then something triggers replay, or osp lost the
seq rollover record on storage.

Detect this and avoid the assert in osp_fid_diff().
We don't update the last id on osp in this case,
otherwise orphan cleanup could clean up the objects
in the current osp's sequence.
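
The idea of detecting a sequence mismatch instead of asserting can be
sketched as below. struct sketch_fid, sketch_fid_diff() and
sketch_update_last_fid() are hypothetical names for this illustration;
they are not the osp functions, and the two-field FID is a simplification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified FID: a sequence number and an object id within it. */
struct sketch_fid {
	uint64_t seq;
	uint32_t oid;
};

/* Compute a - b in object ids. The difference is only meaningful inside
 * one sequence, so report failure instead of asserting when they differ. */
bool sketch_fid_diff(const struct sketch_fid *a, const struct sketch_fid *b,
		     int64_t *diff)
{
	if (a->seq != b->seq)
		return false;
	*diff = (int64_t)a->oid - (int64_t)b->oid;
	return true;
}

/* Advance the last-used FID only for a FID from the same sequence.
 * A FID from another sequence (for example from a replayed request)
 * leaves it untouched and returns false. */
bool sketch_update_last_fid(struct sketch_fid *last,
			    const struct sketch_fid *fid)
{
	int64_t diff;

	if (!sketch_fid_diff(fid, last, &diff))
		return false;
	if (diff > 0)
		*last = *fid;
	return true;
}
```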

Also when rollover seq happens in osp, do not
LASSERT() if we didn't get a new seq, most likely
on ofd/ost the previous seq update was lost on storage.
We could return the error code and let precreate
thread try again.

Cleanup lu_fid_diff() which is not used.
In osp_create(), do not call osp_update_last_fid()
again for the regular non-replay case, it's already
done via osp_object_assign_fid()->osp_precreate_get_fid().

Lustre-change: https://review.whamcloud.com/54020
Lustre-commit: f00d2467fc7c5ebd8a313683e039bf945a4b7094

Change-Id: I509c00b998933d45865c9540e12a2db7d1b2b8ed
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54704
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
14 months agoLU-16692 osp: osp_fid_diff vs rollover_new_seq race
Li Dongyang [Mon, 19 Feb 2024 02:27:22 +0000 (13:27 +1100)]
LU-16692 osp: osp_fid_diff vs rollover_new_seq race

osp_fid_diff/osp_objs_precreated is accessing the
last_created_fid and pre_used_fid without opd_pre_lock,
and this could race with osp_precreate_rollover_new_seq()
when updating them to new fids.

Lustre-change: https://review.whamcloud.com/54087
Lustre-commit: bc256c25631960e1386f3359bb6c85cfe6481fb7

Change-Id: I3a61c99570b5532776ddc43247c1513b8c89fb32
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54703
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
14 months agoLU-9806 obdclass: wait for all exports to go
Alex Zhuravlev [Fri, 12 Apr 2024 05:28:28 +0000 (08:28 +0300)]
LU-9806 obdclass: wait for all exports to go

obd_zombie_export_add() removes an export from the stale list
and then schedules a job to destroy that export. in this short
window ofd_fini()/mdt_fini() can find obd_linked_exports list
empty and no work in zombie work queue. then the obd is being
removed and concurrent export destroy may find the obd in an
unexpected state:
LustreError: 11166:0:(tgt_lastrcvd.c:469:tgt_client_free())
ASSERTION( lut && lut->lut_client_bitmap ) failed

use obd_stale_export_num counter to block in obd_zombie_barrier.

move atomic_inc() from class_unlink_export to obd_export_zombie_add()
as self-exports are not added to the stale list.

Lustre-change: https://review.whamcloud.com/50147
Lustre-commit: 08f9ebe93b300c39d2af1fb8e82a22e9c84f401b

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I62ed019f86becd3c66f5fcdf991f13cd47466e5e
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54753
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-17557 osd: only accounting inodes are special
Alex Zhuravlev [Mon, 19 Feb 2024 08:18:45 +0000 (11:18 +0300)]
LU-17557 osd: only accounting inodes are special

don't treat all inodes special (system) because 5.14 turns filesystem
read-only when we try to access a non-existent inode with
LDISKFS_IGET_SPECIAL flag.

Lustre-change: https://review.whamcloud.com/54091
Lustre-commit: 333c7518f18fad80fe504766ae9645f2ede0108c

Fixes: 2c0b2b7540 ("LU-13166 osd-ldiskfs: fix to allow to get system inode")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I0c05adaf7b94e04c094cb069e8271bf478010b8c
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54716
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
14 months agoLU-16887 scrub: delete OI when inode missing
Alexander Boyko [Thu, 11 Apr 2024 11:22:17 +0000 (19:22 +0800)]
LU-16887 scrub: delete OI when inode missing

The osd_iget_check() function has no ability to check
OI when osd_iget() returns an error, because the inode is
lost during the error. Let's restore the old logic.

Scrub doesn't check consistency between OI and inode
for items from the inconsistent list. When OI points to
the wrong inode, the OI record should be deleted.
(This part of 51263 had been merged into b_es6_0 along with
https://review.whamcloud.com/52037)

Lustre-change: https://review.whamcloud.com/51263
Lustre-commit: c24a090ec389ae9ca2bedb4c7e3ee777deb63c7f

Fixes: 716de353b ("LU-15542 osd-ldiskfs: exclude EA inode from processing")
HPE-bug-id: LUS-11540, LUS-11585
Signed-off-by: Alexander Boyko <alexander.boyko@hpe.com>
Change-Id: Ic1618db1c8ee24bb307a9cf3f5ca98441a739b7f
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54709
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-16623 tests: ignore sanity-pfl stripe-count off-by-1
Andreas Dilger [Sun, 14 Apr 2024 06:03:24 +0000 (00:03 -0600)]
LU-16623 tests: ignore sanity-pfl stripe-count off-by-1

In some cases the MDS may not create all stripes on a file, if the
MDT-OST connection does not have precreated objects.  This is OK,
so the tests should not fail the stripe-count check if trying to
create a fully-striped file and one of the stripes is missing.

Lustre-change: https://review.whamcloud.com/54778
Lustre-commit: TBD (from 6380f3f13f7ffe854365bf55410bb34db801529a)

Test-Parameters: trivial testlist=sanity-flr
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ie482fdf86f82e7a2292c021761885249a6c551f1
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54779
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
14 months agoLU-17497 tests: skip sanity-sec/69 for old MDS
Andreas Dilger [Sun, 14 Apr 2024 07:49:46 +0000 (01:49 -0600)]
LU-17497 tests: skip sanity-sec/69 for old MDS

Older MDS versions do not have strict checking for identity_upcall
or rsi_upcall, don't run the test with those servers.

Lustre-change: https://review.whamcloud.com/54782
Lustre-commit: TBD (from 57b39fb5fecc895dc220835789d6011479bdd4db)

Test-Parameters: trivial testlist=sanity-sec env=ONLY=69 serverversion=2.15
Fixes: a462a119ec ("LU-17497 obdclass: check upcall incorrect values")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Icdfda82eca32c2de7e88991ead0d9723023ebbe5
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54783
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
14 months agoEX-8981 tests: fix compression_enabled check
Andreas Dilger [Sun, 14 Apr 2024 19:18:55 +0000 (13:18 -0600)]
EX-8981 tests: fix compression_enabled check

The compression_enabled() check was returning
"true" and "false" but these are invalid return values
for bash, and need to be numeric values.  As such,
the function was essentially always returning "false"
and causing every subtest using this function to be
skipped during testing since it was introduced.

Change it to return a numeric value as it should.
Run testing for affected tests on both x86_64 and aarch64
to test that it is working both ways.

Test-Parameters: testlist=sanity-compr env=SANITY_ONLY="460",ONLY="0 1 1000-1080",HONOR_EXCEPT=y
Test-Parameters: testlist=hot-pools env=ONLY=80,HONOR_EXCEPT=y
Test-Parameters: testlist=sanity-flr env=ONLY=43,HONOR_EXCEPT=y
Test-Parameters: testlist=sanity-sec env=ONLY="66 67",HONOR_EXCEPT=y
Test-Parameters: testlist=sanity-pfl env=ONLY=100,HONOR_EXCEPT=y
Test-Parameters: testlist=sanity-pfl env=ONLY=100,HONOR_EXCEPT=y clientdistro=el8.8 clientarch=aarch64
Fixes: 8465bfa296 ("EX-8981 csdc: execute tests if compression is enabled")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ice87f55617038b5c34da0bc1f76c3998d3ec639f
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54786
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
14 months agoEX-9482 tests: skip sanity-pfl tests if no compression
Andreas Dilger [Sun, 14 Apr 2024 07:07:21 +0000 (01:07 -0600)]
EX-9482 tests: skip sanity-pfl tests if no compression

Skip sanity-pfl test_100* for servers that do not understand CSDC.

Test-Parameters: trivial
Test-Parameters: testlist=sanity-pfl
Test-Parameters: testlist=sanity-pfl serverjob=lustre-master serverbuildno=4521 serverdistro=el8.9
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ib3fcfd77e9e7ffb122ed6ade9015b02d42ea8319
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54781
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
14 months agoEX-8981 tests: skip sanity-lfsck/18i if no CSDC
Andreas Dilger [Sun, 14 Apr 2024 06:29:13 +0000 (00:29 -0600)]
EX-8981 tests: skip sanity-lfsck/18i if no CSDC

Skip sanity-lfsck test_18i if compression is not enabled/available.

Test-Parameters: trivial
Test-Parameters: testlist=sanity-lfsck env=ONLY=18,HONOR_EXCEPT=y clientdistro=el8.8 clientarch=aarch64
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I32b701ac91f072137f9f61d2cca39482f40b5ce5
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54780
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
14 months agoLU-15378 tests: skip sanity test_64h for old servers
Andreas Dilger [Mon, 8 Apr 2024 19:14:48 +0000 (13:14 -0600)]
LU-15378 tests: skip sanity test_64h for old servers

Running sanity test_64h fails intermittently with EXA5.2 servers,
skip it during interop since there are a number of fixes in this
area and EXA5 grant interop isn't super critical.

Test-Parameters: trivial testlist=sanity env=ONLY=64 serverversion=EXA5
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I65d9e247aa62c02345c3cd0f9575e3e0ba1ff2ce
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54699
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
14 months agoRM-620 build: New tag 2.14.0-ddn143
Andreas Dilger [Tue, 9 Apr 2024 21:49:45 +0000 (15:49 -0600)]
RM-620 build: New tag 2.14.0-ddn143

New tag 2.14.0-ddn143

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I4472a2c301d3d730108fcceed03bce1933d0c4cd

14 months agoLU-17034 quota: tmp fix against memory corruption
Sergey Cheremencev [Mon, 8 Apr 2024 11:43:53 +0000 (14:43 +0300)]
LU-17034 quota: tmp fix against memory corruption

Change QMT_INIT_SLV_CNT from 64 to 2000 to avoid accessing
memory outside the lqeg_arr array. This could happen when at least
one of the OSTs has an index larger than the total number of OSTs.
This is a temporary solution, and the maximum supported OST index
is 0x7d0. Later it will be replaced with a long-term
solution.
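
The failure mode is an array indexed by OST index but sized independently
of the largest index. The sketch below shows a generic bounds-checked
lookup for that situation. It is not what this patch does (the patch
enlarges the initial array), and sketch_slot_for_ost() is a hypothetical
helper, not part of lquota.

```c
#include <stddef.h>

/* Return the slot for an OST index, or NULL when the index does not fit
 * into an array of the given capacity. */
int *sketch_slot_for_ost(int *slots, size_t capacity, size_t ost_idx)
{
	return ost_idx < capacity ? &slots[ost_idx] : NULL;
}
```

With a capacity of 64, a sparse index such as 40 is still in range even
if only three OSTs exist, but any index of 64 or above is rejected rather
than written past the end of the array.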

Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Change-Id: I8d9444017fa9847142f3df77c63368282ff134c4
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54696
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoDDN-4905 revert: "quota: lqeg_arr memmory corruption"
Sergey Cheremencev [Mon, 8 Apr 2024 11:25:41 +0000 (14:25 +0300)]
DDN-4905 revert: "quota: lqeg_arr memmory corruption"

This reverts commit 7c6d08994b23cc3ef112e3626f9402dbccf0bc2c
("LU-17034 quota: lqeg_arr memmory corruption")
as it causes the following panic:

 qmt_map_lge_idx()) ASSERTION( k < lgd->lqeg_num_used ) failed: Cannot map ostidx 32 for 0000000072ee3f23
 qmt_map_lge_idx()) LBUG
 Call Trace TBD:
 libcfs_call_trace+0x6f/0xa0 [libcfs]
 lbug_with_loc+0x3f/0x70 [libcfs]
 qmt_map_lge_idx+0x7f/0x90 [lquota]
 qmt_seed_glbe_all+0x17f/0x770 [lquota]
 qmt_revalidate_lqes+0x213/0x360 [lquota]
 qmt_dqacq0+0x7d5/0x2320 [lquota]
 qmt_intent_policy+0x8d2/0xf10 [lquota]
 mdt_intent_opc+0x9a9/0xa80 [mdt]
 mdt_intent_policy+0x1fd/0x390 [mdt]
 ldlm_lock_enqueue+0x469/0xa90 [ptlrpc]
 ldlm_handle_enqueue0+0x61a/0x16c0 [ptlrpc]
 tgt_enqueue+0xa4/0x200 [ptlrpc]
 tgt_request_handle+0xc9c/0x1950 [ptlrpc]
 ptlrpc_server_handle_request+0x323/0xbd0 [ptlrpc]
 ptlrpc_main+0xbf1/0x1510 [ptlrpc]
 kthread+0x134/0x150
 ret_from_fork+0x1f/0x40
 Kernel panic - not syncing: LBUG

Fixes: 7c6d08994b ("LU-17034 quota: lqeg_arr memmory corruption")
Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Change-Id: Iff377529d2862c869b751b4c942b476262951570
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54695
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoRM-620 build: New tag 2.14.0-ddn142
Andreas Dilger [Sun, 7 Apr 2024 19:18:52 +0000 (13:18 -0600)]
RM-620 build: New tag 2.14.0-ddn142

New tag 2.14.0-ddn142

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I6f94ce2c50e7be19c386d214dde67f1455aebeb7

14 months agoRM-620 build: New tag lipe-2.47
Andreas Dilger [Sun, 7 Apr 2024 19:18:32 +0000 (13:18 -0600)]
RM-620 build: New tag lipe-2.47

New tag lipe-2.47

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I1d96d0f7049eeafd353e60d6a79ec19cf6554c0d

14 months agoEX-9523 csdc: fix ofd_preprw_write() for sanity 819b test
Artem Blagodarenko [Fri, 5 Apr 2024 13:03:56 +0000 (14:03 +0100)]
EX-9523 csdc: fix ofd_preprw_write() for sanity 819b test

Sanity 819b asserts:
tgt_brw_write()) ASSERTION( npages_local == npages_remote )

The test triggers fault injection in ofd_preprw_write():
if (OBD_FAIL_CHECK(OBD_FAIL_OST_2BIG_NIOBUF))
         rnb[i].rnb_len += PAGE_SIZE;

ofd_preprw_write() calculates npages_local taking into account the
additional length from the fault injection, BUT npages_remote is calculated
BEFORE the fault injection. So npages_remote was not adjusted.

To solve the problem it is enough to move the range_to_page_count() call.
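
Why the order matters can be seen with a small page-count helper. This is
a sketch under assumptions: sketch_page_count() and SKETCH_PAGE_SIZE are
hypothetical and only mimic the idea of counting the pages covered by a
byte range; they are not the range_to_page_count() implementation.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u

/* Number of pages touched by the byte range [offset, offset + len). */
unsigned int sketch_page_count(uint64_t offset, uint32_t len)
{
	uint64_t first, last;

	if (len == 0)
		return 0;
	first = offset / SKETCH_PAGE_SIZE;
	last = (offset + len - 1) / SKETCH_PAGE_SIZE;
	return (unsigned int)(last - first + 1);
}
```

A one-page buffer counted before the injected "len += PAGE_SIZE" gives
one page, while the same buffer counted afterwards gives two. Both
counters therefore have to be computed on the same side of the fault
injection point.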

Signed-off-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Fixes: 217341228f ("EX-7601 tgt: add remote_pages for writes")
Change-Id: Ifd659985a78c7630049a17622aff2eb7f4525fb1
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54681
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-17566 mdt: move squash code in new/old_init_ucred
Aurelien Degremont [Tue, 27 Feb 2024 12:20:33 +0000 (13:20 +0100)]
LU-17566 mdt: move squash code in new/old_init_ucred

Move the uid/gid squashing code to the same place,
at the bottom of the function, to make later code refactoring
simpler.

The squashing code mostly clears suppgids from ucred,
and no code used them between the old and new positions in
the function, so the move should be safe.

Handle suppgids clearing the same way for both functions
and for both UID and GID squashing.

Lustre-change: https://review.whamcloud.com/54194
Lustre-commit: 1730d8093fb36e7957414d314755ae5208da1011

Signed-off-by: Aurelien Degremont <adegremont@nvidia.com>
Change-Id: I29669af26cf68491bf1b6020548116acf318c0c7
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54558
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
14 months agoLU-17056 tests: force osc import reconnect in sanity-sec 30b
Sebastien Buisson [Mon, 28 Aug 2023 08:09:53 +0000 (10:09 +0200)]
LU-17056 tests: force osc import reconnect in sanity-sec 30b

In sanity-sec test_30b, force reconnect of idle osc imports
so that security flavor is correctly updated.
In case of failure, dump more information about state of the imports
and the srpc connections.

Lustre-change: https://review.whamcloud.com/54349
Lustre-commit: fa2cfb49decf3d897f63023c998a23fd98c5c3ea

Test-Parameters: trivial
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Iaecc7321b12e61a266e97d3640a3288f0e7ec9dd
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54657
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
14 months agoLU-17666: configure lnet before add net in sanity-sec:31
Li Xi [Fri, 22 Mar 2024 12:30:57 +0000 (20:30 +0800)]
LU-17666: configure lnet before add net in sanity-sec:31

If "options lnet config_on_load=1" is not configured in
modprobe.d, LNet will not be configured when trying to
add a network, and the command fails:

/usr/sbin/lnetctl net add --if eth1 --net tcp999
add:
    - net:
          errno: -22
          descr: "cannot add network: Invalid argument"

Test-Parameters: trivial testlist=sanity-sec env=ONLY=31

Lustre-change: https://review.whamcloud.com/54543
Lustre-commit: e163883f76dac45a516b7d89671513d31063b7d6

Change-Id: If65b7cb372d4f04a10ea066d62f3ae43029fcf65
Signed-off-by: Li Xi <lixi@ddn.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54654
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
14 months agoLU-14911 osp: release thandle if it was created
Alex Zhuravlev [Thu, 4 Apr 2024 16:49:49 +0000 (19:49 +0300)]
LU-14911 osp: release thandle if it was created

osp_statfs_update() could leak the thandle if the transaction couldn't
start for some reason.

Lustre-change:  https://review.whamcloud.com/44504
Lustre-commit: c807e3f33b39409a061fa997cac57ac394c503ba

Change-Id: I541a5e4a7860008eb179d905ac57997b737f178c
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54670
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-17634 hsm: serialize HSM restore for a file on a client
Qian Yingjin [Wed, 13 Mar 2024 01:33:19 +0000 (21:33 -0400)]
LU-17634 hsm: serialize HSM restore for a file on a client

For a file in the HSM released, exists, archived state, starting tens of
processes to read it in parallel on a client may cause one read process
to report a "No data available" error.

After analyzing the error, we found the following bug in the HSM code:
Reading a released file with a LAYOUT lock already granted on a client:
P1:
->vvp_io_init()
->lov_io_init_released(): io->ci_restore_needed = 1;
->vvp_io_fini()
  ->ll_layout_restore()
    ->mdc_ioc_hsm_request()
      ->mdc_hsm_request_lock_to_cancel()
        ->ldlm_cancel_resource_local()
          remove LAYOUT lock from resource into cancel list
          NOT yet cancel the LAYOUT lock on the client via ELC...

P2:
->vvp_io_init()
->lov_io_init_released(): io->ci_restore_needed = 1;
->vvp_io_fini()
  ->ll_layout_restore()
    ->mdc_ioc_hsm_request()
      ->mdc_hsm_request_lock_to_cancel()
      SKIP: no conflicting LAYOUT lock on the resource lock list, as P1
      has already moved it (if any) into its cancel list
    ->mdt_hsm_request()
      ->cdt_restore_handle_add()
        ->cdt_restore_handle_find()
        ->list_add_tail(): add @crh to restore handle list
        NOT yet obtain EX LAYOUT lock to cancel cached LAYOUT
        locks on client side...

P3:
->ll_file_read_iter()
->ll_do_fast_read(): => return -ENODATA;
->vvp_io_init()
->lov_io_init_released(): io->ci_restore_needed = 1;
->vvp_io_fini()
  ->ll_layout_restore()
    ->mdc_ioc_hsm_request()
      ->mdc_hsm_request_lock_to_cancel()
      SKIP as P1 has already moved the conflicting LAYOUT lock
      (if any) into its cancel list
    ->mdt_hsm_request()
      ->cdt_restore_handle_add()
        ->cdt_restore_handle_find()
        SKIP as a restore handle with the same FID was found in
        the restore handle list added by P2.
  ->ll_layout_refresh()
  ->io->ci_need_restart = vio->vui_layout_gen != gen;
  ->LAYOUT gen does not change as the LAYOUT lock on
    the client is not revoked yet, so I/O will not restart...
->return -ENODATA; =>from fast read

We can fix this bug by serializing the HSM restore operation on a
client, simply by using @lli->lli_layout_mutex.

Add sanity-hsm/test_12{t, u} to verify it.

Lustre-change: https://review.whamcloud.com/54366
Lustre-commit: a6b3faffeaea7abbef389ad5296880a522a13460

Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Idc2a8c1818386c64798d7e28500c20c80ff369f1
Reviewed-by: Etienne AUJAMES <eaujames@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Xi <lixi@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54653
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
14 months agoLU-17499 llite: inode lock in ll_migrate()
Alex Zhuravlev [Thu, 15 Feb 2024 16:24:12 +0000 (19:24 +0300)]
LU-17499 llite: inode lock in ll_migrate()

The inode lock should be taken after the data version check, as this is the
correct locking order used in other paths like lseek.

Lustre-change: https://review.whamcloud.com/54041
Lustre-commit: 133fd8b4b11a0228f71d60e3d145d93be16014c9

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I0bafb8db215a2ea004928ff36049d8f053507c6f
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54651
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
14 months agoLU-17448 lod: don't skip uninited components
Alex Zhuravlev [Wed, 6 Mar 2024 18:00:51 +0000 (21:00 +0300)]
LU-17448 lod: don't skip uninited components

don't skip uninitialized component during declaration as we need
to declare potential records to llogs if the component is created
in this transaction later.

Lustre-change: https://review.whamcloud.com/54302
Lustre-commit: 35b1076aef8fbb2840de2b831765a20ec937d034

Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ia1cbfaae9b28e40fd68fa125d748ec0b5319f512
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54655
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
14 months agoLU-17546 osd: use __vfs_removexattr
Alex Zhuravlev [Fri, 16 Feb 2024 05:31:41 +0000 (08:31 +0300)]
LU-17546 osd: use __vfs_removexattr

Otherwise vfs_removexattr(), which takes the inode's lock, conflicts with
osd_execute_truncate(), while we don't really need the inode's lock
because another per-object lock has already been taken.

Lustre-change: https://review.whamcloud.com/54072
Lustre-commit: b9ef5d1e7f7dd1055a6ea6d3dc9f176fa910a372

Fixes: dcd5607ce0 ("LU-13430 vfs: add ll_vfs_getxattr/ll_vfs_setxattr compat macro")
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I43c1c60d2a9f911b6395e1b7546507074a90b1cf
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54656
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
14 months agoLU-17497 obdclass: check upcall incorrect values
Sebastien Buisson [Thu, 1 Feb 2024 15:52:22 +0000 (16:52 +0100)]
LU-17497 obdclass: check upcall incorrect values

Identity upcall is set via lctl set_param mdt.*.identity_upcall=xxx,
and rsi upcall is set via lctl set_param sptlrpc.gss.rsi_upcall=xxx.
Possible values are a valid path to an executable, and also INTERNAL
to enable support of supplementary groups from client, or NONE to
disable identity upcall.
Add an upcall cache function that checks the user provided string, to
make sure we do not store an invalid value. And print a message to
stdout to explain the accepted values.
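
A check along these lines can be sketched as follows. The function name
sketch_upcall_value_ok() is hypothetical, and treating "a valid path" as
"a string starting with '/'" is an assumption of this sketch. It does not
verify that the file exists or is executable, and it is not the function
added by the patch.

```c
#include <stdbool.h>
#include <string.h>

/* Accept the two keywords named in the commit message, or an absolute
 * path. Everything else, including an empty string, is rejected. */
bool sketch_upcall_value_ok(const char *val)
{
	if (val == NULL || val[0] == '\0')
		return false;
	if (strcmp(val, "NONE") == 0 || strcmp(val, "INTERNAL") == 0)
		return true;
	return val[0] == '/';
}
```

Rejecting the value before it is stored means a later upcall never
attempts to execute a malformed string.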

Lustre-change: https://review.whamcloud.com/53878
Lustre-commit: 2153e86541884ef7a5c1697a5d00daf6fa6461a4

Add sanity-sec test_69 to exercise this.

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Iaf59e72aa1612f5579db175d8999dcf0053308ed
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53879
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
14 months agoLU-17446 revert: "ldlm: Do not wait for BL AST RPC completion on cancel"
Andreas Dilger [Mon, 1 Apr 2024 17:12:43 +0000 (17:12 +0000)]
LU-17446 revert: "ldlm: Do not wait for BL AST RPC completion on cancel"

This reverts commit cfd5411db998c2b0427e310a19b8741b1ec3644e.
There can be LASSERT triggered due to blocking callbacks on one lock.
This will be fixed to handle this in a more generic manner.

Change-Id: I5bdd59e3668de0f1db02f3654c73531712a77c72
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54643
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
14 months agoEX-9136 lipe: Update the display format for fstats
Vitaliy Kuznetsov [Sat, 6 Apr 2024 09:08:34 +0000 (11:08 +0200)]
EX-9136 lipe: Update the display format for fstats

This patch affects only .out format report types.
The data output format has been updated according to
the request in the ticket.

Also:
1. Fixed incorrect information display in some tables;
2. Expanded additional information for each table;
3. The size in tables is now displayed in various formats,
   not just in KB;
4. Except for the “File Size” table, the size obtained from
   block_size is now used everywhere;
5. Fixed an issue with displaying the allocated size
   for directories;
6. Fixed the total size calculation for all directories;
7. Removed tables that are not yet available;
8. Added additional information about the number of missed
   files for each table;
9. All text information for working with reports in the .out
   format is combined into a single array to
   simplify code maintenance.

Test-Parameters: trivial testlist=sanity-lipe-scan3,sanity-lipe-find3
Signed-off-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
Change-Id: Id75b3af12ea00761850a9009848621539c016446
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54658
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-17696 llite: remove LASSERT from ll_ddelete()
Jian Yu [Fri, 5 Apr 2024 08:37:03 +0000 (01:37 -0700)]
LU-17696 llite: remove LASSERT from ll_ddelete()

On Linux kernel 6.8, the changes in commit 2f42f1eb9093
("Call retain_dentry() with refcount 0") caused d_delete()
instances to be called for dentries with ->d_lock held and
refcount equal to 0, which caused the following assertion
failure on Lustre client:

(dcache.c:136:ll_ddelete()) ASSERTION( d_count(de) == 1 ) failed

The value of d_count(de) became 0 instead of 1. Since
retain_dentry() was called either with refcount 0 or 1,
we can simply remove the LASSERT(ll_d_count(de) == 1)
from ll_ddelete() to avoid the above failure.

Lustre-change: https://review.whamcloud.com/54676
Lustre-commit: TBD (from 50bd3822d2977cd45e56521d137aec2ce5829529)

Change-Id: Ic4a39d9328326634190cd0719b4c0637e1bf315c
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54679
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-17504 build: fix array-index-out-of-bounds warning
Jian Yu [Wed, 3 Apr 2024 18:53:03 +0000 (11:53 -0700)]
LU-17504 build: fix array-index-out-of-bounds warning

On Linux kernel 6.5, due to commit 2d47c6956ab3
("ubsan: Tighten UBSAN_BOUNDS on GCC"), flexible
trailing arrays declared like 'lc_array_sum[1];'
will generate warnings when CONFIG_UBSAN & co. is
enabled:

  UBSAN: array-index-out-of-bounds in lprocfs_status.c:1609:17
  index 1 is out of range for type '__s64 [1]'

Since LPROCFS_STATS_FLAG_IRQ_SAFE flag is only used
in one place - obd_memory() counter, we can just
remove it and change obd_memory over to a regular
percpu_counter. This simplifies the
lprocfs_counter() code, uses more kernel
functionality instead of libcfs, and slightly
reduces overhead for the memory accounting code.
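
For context on the warning itself, the sketch below shows the general
trailing-array pattern and the C99 flexible array member that avoids the
bounds complaint. This is not what the patch does (the patch removes the
flag and switches obd_memory to a percpu_counter); the struct and function
names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical counter block; only the shape of the trailing array matters. */
struct demo_counter_block {
	unsigned int	cb_num;
	int64_t		cb_sum[];	/* flexible array member, not cb_sum[1] */
};

/* Allocate the header plus room for num trailing elements, zero-filled. */
static struct demo_counter_block *demo_counter_block_alloc(unsigned int num)
{
	struct demo_counter_block *cb;

	cb = calloc(1, sizeof(*cb) + (size_t)num * sizeof(cb->cb_sum[0]));
	if (cb != NULL)
		cb->cb_num = num;
	return cb;
}

/* Sum every element; indexes past 0 are legal with a flexible array. */
static int64_t demo_counter_block_total(const struct demo_counter_block *cb)
{
	int64_t total = 0;
	unsigned int i;

	assert(cb != NULL);
	for (i = 0; i < cb->cb_num; i++)
		total += cb->cb_sum[i];
	return total;
}
```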

Lustre-change: https://review.whamcloud.com/54365
Lustre-commit: TBD (from 21505a19d671868171de2ad0f94120b1ca779695)

Change-Id: Ic461c4b30317bfd2b1e9f5b6be84c4a7fb4e3eb9
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54660
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-17592 build: compatibility updates for kernel 6.8
Shaun Tancheff [Tue, 2 Apr 2024 22:52:02 +0000 (15:52 -0700)]
LU-17592 build: compatibility updates for kernel 6.8

Linux commit v4.9-12227-g7b737965b331 introduced
  staging/lustre/libcfs: Convert to hotplug state machine
Linux commit v4.10-rc1-5-g4205e4786d0b
  cpu/hotplug: Provide dynamic range for prepare stage
Linux commit v6.7-rc2-1-g15bece7bec0d
  cpu/hotplug: Remove unused CPU hotplug states

CPUHP_LUSTRE_CFS_DEAD was introduced in 4.9 and removed in 6.8
CPUHP_BP_PREPARE_DYN was introduced in 4.10

With no distro kernels between 4.10 and 4.11, switch to
CPUHP_BP_PREPARE_DYN.

Linux commit v6.7-rc1-3-gda549bdd15c2
  dentry: switch the lists of children to hlist
Provide trivial wrappers to abstract the changed members

Linux commit v6.7-rc4-79-gaf7628d6ec19
  fs: convert error_remove_page to error_remove_folio
Provide a generic_error_remove_folio() for older kernels.

Lustre-change: https://review.whamcloud.com/54229
Lustre-commit: TBD (from 2036974a891ffac3ecffc7b2a21ca50bc6c94f78)

HPE-bug-id: LUS-12181
Fixes: ce98bfe5f72 ("LU-10499 pcc: add readonly mode for PCC")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ib2e85c2acd3d0934e1c4712dad53b80f0ddb1b08
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54586
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-17592 build: kernel 6.8 -Werror=missing-prototypes
Shaun Tancheff [Tue, 2 Apr 2024 22:49:17 +0000 (15:49 -0700)]
LU-17592 build: kernel 6.8 -Werror=missing-prototypes

Linux commit v6.7-rc4-156-g0fcb70851fbf
  Makefile.extrawarn: turn on missing-prototypes globally

With -Wmissing-prototypes and -Werror, clean up some additional
functions that are implicitly static and provide declarations
for those that are exported.
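
The two kinds of cleanup can be sketched as follows. The function names are
hypothetical and the prototype is shown inline only to keep the example
self-contained; in a real tree it would live in a shared header.

```c
#include <assert.h>

/* Declaration that would normally come from a header. With it in scope,
 * -Wmissing-prototypes accepts the non-static definition further down. */
int demo_exported_sum(int a, int b);

/* Used only in this file, so it is marked static and needs no prototype. */
static int demo_clamp_nonneg(int v)
{
	return v < 0 ? 0 : v;
}

/* Externally visible function: matches the prototype above. */
int demo_exported_sum(int a, int b)
{
	return demo_clamp_nonneg(a) + demo_clamp_nonneg(b);
}
```

Dropping either the static keyword or the prototype would be expected to
bring the warning back when building with -Wmissing-prototypes.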

Add SERVER_ONLY and SERVER_ONLY_EXPORT_SYMBOL to wrap functions
that are only exported for and used by server components.

Lustre-change: https://review.whamcloud.com/54228
Lustre-commit: TBD (from 1da648a24984c94cccdf6686ab9c3aed28d32a47)

HPE-bug-id: LUS-12181
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ice5219df5463effe964d2cd2114f003d185337da
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54584
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-17592 build: kernel 6.8 removed strlcpy()
Shaun Tancheff [Tue, 2 Apr 2024 22:45:49 +0000 (15:45 -0700)]
LU-17592 build: kernel 6.8 removed strlcpy()

Linux commit v6.7-11707-gd26270061ae6
  string: Remove strlcpy()

strlcpy() is removed, use strscpy() and provide a strscpy()
for kernels that do not have one.
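
The semantics being relied on can be sketched in userspace C as follows.
demo_strscpy() is a hypothetical stand-in, not the kernel implementation or
the compat wrapper from the patch. It assumes the usual strscpy() contract:
always NUL-terminate, return the number of characters copied, or -E2BIG
when the source does not fit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Userspace sketch of strscpy()-like behaviour (hypothetical name). */
static ssize_t demo_strscpy(char *dst, const char *src, size_t size)
{
	size_t i;

	assert(dst != NULL && src != NULL);
	if (size == 0)
		return -E2BIG;

	/* Copy at most size - 1 characters, leaving room for the NUL. */
	for (i = 0; i < size - 1 && src[i] != '\0'; i++)
		dst[i] = src[i];
	dst[i] = '\0';

	/* Unlike strlcpy(), truncation is reported as an error code. */
	return src[i] == '\0' ? (ssize_t)i : -E2BIG;
}
```

The practical difference for callers is the return value: strlcpy() returns
the length of the source string, so truncation checks written for it have
to be adjusted when converting to strscpy().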

Lustre-change: https://review.whamcloud.com/54227
Lustre-commit: TBD (from 1861e4ce8ec6d66d17ed73042f39bacb6496685c)

HPE-bug-id: LUS-12181
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Ieab872f20e08d17a4842bc944fa38f9867de81f9
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54576
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoRM-620 build: New tag 2.14.0-ddn141
Andreas Dilger [Sun, 31 Mar 2024 15:38:40 +0000 (09:38 -0600)]
RM-620 build: New tag 2.14.0-ddn141

New tag 2.14.0-ddn141

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I5951e02d762098a10026e744a3894d2dd77b2c0a

14 months agoRM-620 build: New tag lipe-2.46
Andreas Dilger [Sun, 31 Mar 2024 15:38:16 +0000 (09:38 -0600)]
RM-620 build: New tag lipe-2.46

New tag lipe-2.46

Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I484423fbdfcd8ef6c5162e79fcb040a4039e20e0

14 months agoLU-8191 lnet: remove unused, fix non-static functions
Timothy Day [Thu, 28 Mar 2024 08:11:45 +0000 (01:11 -0700)]
LU-8191 lnet: remove unused, fix non-static functions

Static analysis shows that a number of functions
could be made static. This patch also declares
several functions in lnet static.

Lustre-change: https://review.whamcloud.com/51436
Lustre-commit: 43cbc93f1edc493e47fe5c4059bf0bae6a20c207

It is wrong to remove lnet_selftest_structure_assertion()
since it contained BUILD_BUGs used to ensure different LNet
Selftest versions can interoperate.

Add a dummy user for lnet_selftest_structure_assertion() in
LNet Selftest init. This should prevent analyzers from picking
this up as an unused function.
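
A minimal sketch of the idea, using C11 _Static_assert in place of the
kernel BUILD_BUG_ON macros. The struct, its layout, and the function names
are hypothetical; only the pattern (a function holding compile-time checks,
plus one caller so it is not reported as unused) reflects the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire structure whose layout must not change. */
struct demo_wire_msg {
	uint32_t	wm_type;
	uint32_t	wm_len;
	uint64_t	wm_cookie;
};

/* Holds compile-time layout checks only; generates no runtime work. */
static void demo_structure_assertion(void)
{
	_Static_assert(sizeof(struct demo_wire_msg) == 16,
		       "wire message size changed");
	_Static_assert(offsetof(struct demo_wire_msg, wm_cookie) == 8,
		       "wm_cookie offset changed");
}

/* Init path: the call is the "dummy user" that keeps analyzers quiet. */
static int demo_init(void)
{
	demo_structure_assertion();
	return 0;
}
```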

Lustre-change: https://review.whamcloud.com/54635
Lustre-commit: TBD (from ed2a2286d17a7d23b86a87094d1eb2abac8ea015)

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Ie1b49c5652553715cd9f96b56090d33a95e3b438
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54604
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-17334 lmv: exclude newly added MDT in mkdir
Lai Siyao [Thu, 18 Jan 2024 15:59:25 +0000 (10:59 -0500)]
LU-17334 lmv: exclude newly added MDT in mkdir

Exclude newly added MDT in QoS mkdir for 30 seconds in case
connections between MDTs are not ready, which may cause lookups to fail.
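
The exclusion window can be pictured with a small sketch. The constant and
function names are hypothetical, timestamps are plain seconds, and treating
exactly 30 seconds as eligible is an assumption rather than something the
commit message states.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed grace period, taken from the "30 seconds" in the description. */
#define DEMO_NEW_MDT_GRACE_SEC 30

/* Return true once the target has been known for the whole grace period.
 * A "now" earlier than "added_at" is treated as still inside the window. */
static bool demo_mdt_eligible(long long added_at, long long now)
{
	return now - added_at >= DEMO_NEW_MDT_GRACE_SEC;
}
```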

Lustre-change: https://review.whamcloud.com/53860
Lustre-commit: a2b08583a1dc8ab18c4ea4a4b900870761a5c252

Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ibb5e6eda29ddfff8f66708d72e33453a96f5e7ef
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Hongchao Zhang <hongchao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54608
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
14 months agoEX-9212 lipe: Fix for 'stripe_count = -1' in stats reports
Vitaliy Kuznetsov [Thu, 28 Mar 2024 18:11:47 +0000 (19:11 +0100)]
EX-9212 lipe: Fix for 'stripe_count = -1' in stats reports

This patch corrects the display of the range for the table by the
number of stripes. Now the range will only contain one position
and will support a value of -1.

Also corrects the display for a range in other tables by
removing the fractional part.

For example:
[        -1 ] ...
[         1 ] ...
[         2 ] ...
[        10 ] ...

Test-Parameters: trivial testlist=sanity-lipe-scan3,sanity-lipe-find3
Signed-off-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
Change-Id: Iff1933f69b713c7e5dff9145c5516fa050294d2e
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54611
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoEX-9230 lipe: Add device name in file report name
Vitaliy Kuznetsov [Tue, 26 Mar 2024 13:55:30 +0000 (14:55 +0100)]
EX-9230 lipe: Add device name in file report name

This patch adds the device name (e.g. MDT-xxxx) to the
report name when automatically generating the name.
It also corrects the end time in the file name
(when scanning is completed) to the initial time
(when scanning began). Only for lipe_scan3.

Example of a new file name for a report:
files_sizes_report_lustre-MDT0000.2024-03-26-09:54:25.out

Test-Parameters: trivial testlist=sanity-lipe-scan3,sanity-lipe-find3
Signed-off-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
Change-Id: I2e79404e459b5717858b92a0783fe3f1bad552ab
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54574
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexandre Ioffe <aioffe@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoEX-9459 lipe: Fix behavior when getting attributes
Vitaliy Kuznetsov [Fri, 29 Mar 2024 14:29:49 +0000 (15:29 +0100)]
EX-9459 lipe: Fix behavior when getting attributes

This improvement is important and is intended for the
--collect-fsize-stats output policy in lipe_scan3.

This patch prevents the scanning process from stopping and
completing if any LOV attribute is not received correctly.
Instead of halting the scan, the patch adds additional error
counters, and all types of reports will now include new
error statistics.

Also add counters for objects that have no size/allocated size.

An example of the new error block in a report with the .out
extension, which contains the following fields:

Error counters:
Allocated blocks is empty: 11101
Size is empty: 0
Without size (all size value empty): 59
Failed to get LOV attr: 0
Failed to get mirror count: 0
Failed to get stripe count: 0
Failed to get stripe size: 0

Test-Parameters: trivial testlist=sanity-lipe-scan3,sanity-lipe-find3
Signed-off-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
Change-Id: I1817ea189f3d554894822ad8d12a8514546b13b0
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54583
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexandre Ioffe <aioffe@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-17611 utils: fix wrong static declarations
Mikhail Pershin [Fri, 29 Mar 2024 08:12:21 +0000 (01:12 -0700)]
LU-17611 utils: fix wrong static declarations

Revert wrong changes made to zfs mount utils

Lustre-change: https://review.whamcloud.com/54293
Lustre-commit: f45a0288b00597bc797963f7aa01cae5167b024e

Test-Parameters: trivial
Fixes: c7e9bdf8d4 ("LU-8191 utils: remove unused, fix non-static functions")
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: I162d349ebadbf93a89abf49bd41465979d561423
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54630
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-8191 utils: remove unused, fix non-static functions
Timothy Day [Fri, 29 Mar 2024 08:06:32 +0000 (01:06 -0700)]
LU-8191 utils: remove unused, fix non-static functions

Remove several functions which are never called.

Static analysis shows that a number of functions
could be made static. This patch declares several
functions in various Lustre utils static.

Some missing headers caused some functions to be
incorrectly marked as possible candidates for
being made static. These missing headers have
been added.

Lustre-change: https://review.whamcloud.com/51439
Lustre-commit: c7e9bdf8d4bb5e1127eb87472fbf0414823d5461

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Id51f922be57c33c011ee2f9e509ca164cc480edf
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54629
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-8191 llverfs: fix non-static functions
Timothy Day [Fri, 29 Mar 2024 07:51:09 +0000 (00:51 -0700)]
LU-8191 llverfs: fix non-static functions

Static analysis shows that a number of functions
could be made static. This patch declares several
functions in llverfs.c static.

Making functions new_file() and new_dir() static
causes new format truncation errors. Check the
return of snprintf() to silence these.
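
A minimal sketch of such a check is shown below. demo_build_path() is a
hypothetical helper and not the llverfs code; it only illustrates the
standard way to detect truncation from the snprintf() return value.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: build "<dir>/<name>" and fail if it does not fit. */
static int demo_build_path(char *buf, size_t size,
			   const char *dir, const char *name)
{
	int rc;

	assert(buf != NULL && size > 0);

	rc = snprintf(buf, size, "%s/%s", dir, name);
	/* rc < 0: output error; rc >= size: the result was truncated. */
	if (rc < 0 || (size_t)rc >= size)
		return -1;
	return 0;
}
```

Checking the return value both handles the error at runtime and gives the
compiler evidence that truncation is accounted for.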

Lustre-change: https://review.whamcloud.com/53754
Lustre-commit: 986219131215e44a98703a6fb29d941b5f181aa3

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Ieccf1e40c1da627571a7a95adbb85599185f1342
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54628
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-8191 lustre: convert lmv,lod,lov functions to static
Timothy Day [Fri, 29 Mar 2024 07:48:07 +0000 (00:48 -0700)]
LU-8191 lustre: convert lmv,lod,lov functions to static

Static analysis shows that a number of functions
could be made static. This patch declares several
functions in lmv, lod, and lov static.

Also, remove one unused function: lov_lsm_entry()

Another function is intentionally unused for
debugging purposes. It was detected by static
analysis, but it has been left untouched.

Lustre-change: https://review.whamcloud.com/51479
Lustre-commit: 78605b74a56a5338aa93bbf394fe35d0f7c17c5b

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: If409226ea201587c7f95d4a65ffaef72671b5ac2
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54627
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-8191 liblustre: add missing functions to header
Timothy Day [Fri, 29 Mar 2024 07:43:05 +0000 (00:43 -0700)]
LU-8191 liblustre: add missing functions to header

A number of functions were missing from lustreapi.h,
causing them to be marked incorrectly as functions that
could be made static. They have been added to the
header so applications can use them.

Static analysis shows that a number of functions
could be made static. This patch also declares
several functions in liblustre static.

liblustreapi_nodemap.c and liblustreapi_ioctl.c were
missing an internal header, causing some functions
to be incorrectly flagged. This patch also adds that
header.

Initialize a previously uninitialized variable in
llapi_fid_to_handle().

Lustre-change: https://review.whamcloud.com/51434
Lustre-commit: 25523e5a35138a0534b01ff561169e501cc30787

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I67b9c59418b62602ffe36eb4284eb1e8d4a3b19b
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54626
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-8191 libcfs: convert functions to static, removed function
Timothy Day [Thu, 28 Mar 2024 23:47:01 +0000 (16:47 -0700)]
LU-8191 libcfs: convert functions to static, removed function

Static analysis shows that a number of functions
could be made static. This patch declares several
functions in libcfs static.

Remove cfs_expr_list_values_free in string.c, since
it is not used.

A header was missing in param.c, causing a number of
functions to be missing declarations.

Lustre-change: https://review.whamcloud.com/51428
Lustre-commit: 067dfd8d2701572ebe7246726c14a6e72a78cb33

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Ia580881efa806bde49d532e5b2d8f5097f2294e0
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54620
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-8191 lnet: convert functions in utils to static
Timothy Day [Thu, 28 Mar 2024 23:39:27 +0000 (16:39 -0700)]
LU-8191 lnet: convert functions in utils to static

Static analysis shows that a number of functions
could be made static. This patch declares several
functions in various LNet utils and lnetconfig to
static.

In LNet selftest (lst), one unused function was
removed entirely. Some declarations were moved so that
functions could be made static.

Lustre-change: https://review.whamcloud.com/51427
Lustre-commit: d9cd9992b9e04bfad1ebd755f78d3e96850eaa32

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Ia4528281b3c87d77e46abb95f47ab0bdc72168c0
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54619
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-8191 ptlrpc: add missing headers
Timothy Day [Thu, 28 Mar 2024 23:31:20 +0000 (16:31 -0700)]
LU-8191 ptlrpc: add missing headers

Missing headers for several c-files in ptlrpc
cause functions to be incorrectly marked as only
being used within their respective c-files. This
patch adds those missing headers. It also addresses
a couple minor style issues.

Lustre-change: https://review.whamcloud.com/51480
Lustre-commit: fba78a416621343aadeeb4d28df86e422b653bfe

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Idd8fa747a671079aba2b691ef23cc7564e5e2430
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54618
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-8191 lustre: convert mdc,mdd,mdt,mgc functions to static
Timothy Day [Thu, 28 Mar 2024 23:12:34 +0000 (16:12 -0700)]
LU-8191 lustre: convert mdc,mdd,mdt,mgc functions to static

Static analysis shows that a number of functions
could be made static. This patch declares several
functions in mdc, mdd, mdt, and mgc static.

Also, remove mgs_client_add() since it was unused, and
move a declaration from a c-file to the proper header file.

Lustre-change: https://review.whamcloud.com/51478
Lustre-commit: 38d151f2f65d76fc392d356519227a648e114b5f

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Ia23f62465c27c83a9a0260bb45e8c8b710491558
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54617
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-8191 lustre: convert osp,osd,osc,ofd functions to static
Timothy Day [Thu, 28 Mar 2024 23:02:17 +0000 (16:02 -0700)]
LU-8191 lustre: convert osp,osd,osc,ofd functions to static

Static analysis shows that a number of functions
could be made static. This patch declares several
functions in osp, osd, osc, and ofd static.

Also, fix a few minor style issues.

Lustre-change: https://review.whamcloud.com/51477
Lustre-commit: 2fa075d10f72f1400e4da9bdcaec905858b0a264

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I3d7af7ec0fa2978bfdd0cb490f18f485a78f81f6
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54616
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-8191 lustre: convert ec,fid,ldlm,quota functions to static
Timothy Day [Thu, 28 Mar 2024 22:57:12 +0000 (15:57 -0700)]
LU-8191 lustre: convert ec,fid,ldlm,quota functions to static

Static analysis shows that a number of functions
could be made static. This patch declares several
functions in ec, fid, ldlm, and quota static.

Lustre-change: https://review.whamcloud.com/51476
Lustre-commit: bcc828bdf88f8a19e56535ff3840e06065b606b2

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Ic64bdf0d802fd4c963b7b7d3a654575ebde5c07d
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54615
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-8191 target: convert functions to static
Timothy Day [Thu, 28 Mar 2024 23:54:13 +0000 (16:54 -0700)]
LU-8191 target: convert functions to static

Static analysis shows that a number of functions
could be made static. This patch declares several
functions in target static.

Also, remove an unused function tgt_obd_log_cancel(),
and add some headers where they were missing.

Lustre-change: https://review.whamcloud.com/51475
Lustre-commit: bcbf31a9e1bd6399053a5a757b406152f0d65a42

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I1823df3562cb181b275788560166c92b63483637
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54614
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-8191 llite: convert functions to static
Timothy Day [Thu, 28 Mar 2024 22:35:16 +0000 (15:35 -0700)]
LU-8191 llite: convert functions to static

Static analysis shows that a number of functions
could be made static. This patch declares several
functions in llite static.

Also, conserve more * in comments.

Lustre-change: https://review.whamcloud.com/51441
Lustre-commit: a586a2bbc37879dc22382793d1704e7708b80887

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: Iafa3bb84de158e31b27b7784243bc15e78187f10
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54613
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-8191 obdclass: add static and remove functions
Timothy Day [Thu, 28 Mar 2024 08:22:46 +0000 (01:22 -0700)]
LU-8191 obdclass: add static and remove functions

Static analysis shows that a number of functions
could be made static. This patch declares several
functions in obdclass static.

There are a few functions which are never called
anywhere. These are removed. Additionally, there
is some debugging code (added 15 years ago) that
has also been removed.

Lustre-change: https://review.whamcloud.com/51440
Lustre-commit: b3f0995b772877f0bea5d3d790db441244c3c821

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I5f1d438c9663e62789d26093ec9bdd5d76a3060a
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54605
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-8191 tests: convert functions to static
Timothy Day [Thu, 28 Mar 2024 07:53:54 +0000 (00:53 -0700)]
LU-8191 tests: convert functions to static

Static analysis shows that a number of functions
could be made static. This patch declares several
functions in various test helpers static.

Lustre-change: https://review.whamcloud.com/51433
Lustre-commit: 629d6bca95f96bd307d6a9da9b04d73d3fe7c68f

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I065fb4398ed1670ce6ad58cf946054f6bd1ec282
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54603
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-8191 mdt: convert functions to static
Timothy Day [Thu, 28 Mar 2024 07:46:50 +0000 (00:46 -0700)]
LU-8191 mdt: convert functions to static

Static analysis shows that a number of functions
could be made static. This patch declares several
functions in mdt_coordinator.c static.

 mdt_coordinator.c:2145:9: warning: Should this function be static?
 ssize_t loop_period_show(struct kobject *kobj,
         ^

Further patches will follow to clean up the
remaining non-static functions in other subsystems.

Lustre-change: https://review.whamcloud.com/51345
Lustre-commit: a1d332f613ac41f4a565c3aeca6633fbd618467e

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I0350b0d5c88c0a8d1f1748d1d429cdf90afb96b7
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54602
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-8191 ptlrpc: convert functions to static
Timothy Day [Thu, 28 Mar 2024 07:39:48 +0000 (00:39 -0700)]
LU-8191 ptlrpc: convert functions to static

Static analysis shows that a number of functions
could be made static. This patch declares several
functions in ptlrpc static.

Lustre-change: https://review.whamcloud.com/51353
Lustre-commit: 41d94aa9c9c01fa88ae152625d5ffd97256bab73

Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: If0d92f7f4e625c146833f360806ae80b8914cc20
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54601
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-16520 build: Move strscpy to libcfs common header
Shaun Tancheff [Tue, 26 Mar 2024 07:19:09 +0000 (00:19 -0700)]
LU-16520 build: Move strscpy to libcfs common header

Ensure strscpy is available to lustre

Lustre-change: https://review.whamcloud.com/49863
Lustre-commit: 7fe7f4ca06b9c8d128f7ba36988e36f8141ed53d

Test-Parameters: trivial
Fixes: 0b406c91d17 ("LU-13642 lnet: modify lnet_inetdev to work with large NIDS")
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I0c3673c2aa7e6b61671521a8cabde8a364f7f6f8
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54570
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-15074 build: Use strlcpy if strscpy is not available
Shaun Tancheff [Tue, 26 Mar 2024 07:11:51 +0000 (00:11 -0700)]
LU-15074 build: Use strlcpy if strscpy is not available

Linux commit v4.2-rc1-2-g30035e45753b
    string: provide strscpy()

    The strscpy() API is intended to be used instead of strlcpy(),
    and instead of most uses of strncpy().

Unfortunately strscpy is not always available.

Test for strscpy and fall back to strlcpy when strscpy is
not available.

Lustre-change: https://review.whamcloud.com/45175
Lustre-commit: 7861d0754d42ed2a02a330eb730cb43f21dd30f2

Test-Parameters: trivial
Fixes: b77a6d86936 ("LU-14665 lnet: simplify lnet_ni_add_interface")
HPE-bug-id: LUS-9546
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: I43038e4a6260dafb57195ec3417ce009f5a3fad4
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54569
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-14665 lnet: simplify lnet_ni_add_interface
Olaf Faaland [Tue, 26 Mar 2024 07:05:56 +0000 (00:05 -0700)]
LU-14665 lnet: simplify lnet_ni_add_interface

Remove an unnecessary counter and move the comment before
the relevant code.  Improve error messages.

Lustre-change: https://review.whamcloud.com/43525
Lustre-commit: b77a6d86936c32bb5f36e9806323ba00a18b0f4b

Test-parameters: trivial

Signed-off-by: Olaf Faaland <faaland1@llnl.gov>
Change-Id: Iffc7a128b16bc1b2be7a44413a5972c97b12a5fa
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54568
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoLU-17609 sec: nodemap readonly_mount for remount
Sebastien Buisson [Tue, 5 Mar 2024 13:43:02 +0000 (14:43 +0100)]
LU-17609 sec: nodemap readonly_mount for remount

The readonly_mount property on nodemaps forces read-only mount from
clients. Clients trying rw remount (via mount -o remount,rw) should
also be forced to read-only.

Also improve sanity-sec test_61 to exercise client remount.

Lustre-change: https://review.whamcloud.com/54282
Lustre-commit: 27cf3e0ac8576841106b3fcbd58fd5d7d419197d

Fixes: e7ce67de92 ("LU-15451 sec: read-only nodemap flag")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I61f8141001d2ff9e832e5c93d8f5997479af98a6
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54561
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
14 months agoLU-17605 obdclass: do not wait forever acquiring entry
Sebastien Buisson [Mon, 4 Mar 2024 16:26:30 +0000 (17:26 +0100)]
LU-17605 obdclass: do not wait forever acquiring entry

The process of refreshing an entry via refresh_entry() goes through
an upcall/downcall. If the upcall succeeds, we enter a wait queue.
If after that the downcall is never called, we hit the expiry timeout,
and we get removed from the wait queue.
But if the entry is not new, the expiry time will be
MAX_SCHEDULE_TIMEOUT == LONG_MAX, which means an infinite wait.
So avoid waiting forever if an entry could not be refreshed, and call
wake_up_all() if the wait for the ACQUIRING state failed.

Lustre-change: https://review.whamcloud.com/54269
Lustre-commit: d3157bc43390b56e2de2c7251802a67ccebe4952

Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I50ee59654adc221027c79cb68fa182b9abed50fa
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54551
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
14 months agoLU-17612 gss: always try to unlink key in error
Sebastien Buisson [Thu, 7 Mar 2024 15:30:59 +0000 (16:30 +0100)]
LU-17612 gss: always try to unlink key in error

In case of error in context negotiation carried out in userspace,
always try to unlink key to avoid leaking it.

Lustre-change: https://review.whamcloud.com/54316
Lustre-commit: 21aa8404a42b79f7e0434cfe75411f85d7ee063a

Test-Parameters: trivial
Test-Parameters: kerberos=true testlist=sanity-krb5
Test-Parameters: testgroup=review-dne-selinux-ssk-part-2
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ic771f1e4f1b6474caaa89f63c3b02678e163d3d3
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54557
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
14 months agoLU-17612 sec: return keyring errors to userspace
Aurelien Degremont [Tue, 5 Mar 2024 08:29:23 +0000 (09:29 +0100)]
LU-17612 sec: return keyring errors to userspace

In the current code, Linux keyring errors that occur when using
GSS Kerberos are all masked under a generic ECONNREFUSED error.
That makes it hard for the I/O caller to understand the root
cause of the problem.

Update the code to propagate errors from request_key() up to
the application.

struct ptlrpc_cli_ctx *gss_sec_lookup_ctx_kr(...) is modified
to return either a NULL pointer or -errval on failure. Callers test
for this and propagate the error. NULL values are still converted to
ECONNREFUSED.
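
A minimal userspace sketch of the return convention described above,
not the Lustre code itself. It assumes the -errval is carried in the
pointer in the style of the kernel's ERR_PTR()/IS_ERR() helpers; the
names err_ptr(), ptr_err(), is_err(), lookup_ctx(), refresh_ctx() and
struct ctx are hypothetical stand-ins, and EACCES is just an example
error code.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace equivalents of the kernel ERR_PTR()/PTR_ERR()/IS_ERR(). */
static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static inline long ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical context type standing in for struct ptlrpc_cli_ctx. */
struct ctx {
	int id;
};

/*
 * Hypothetical lookup: key_rc < 0 models a keyring error, 0 models
 * "no context found", and > 0 models success.
 */
static struct ctx *lookup_ctx(int key_rc)
{
	static struct ctx found = { .id = 1 };

	if (key_rc < 0)
		return err_ptr(key_rc);
	if (key_rc == 0)
		return NULL;
	return &found;
}

/* Caller side: propagate real errors, map NULL to ECONNREFUSED. */
static int refresh_ctx(int key_rc)
{
	struct ctx *c = lookup_ctx(key_rc);

	if (is_err(c))
		return (int)ptr_err(c);
	if (c == NULL)
		return -ECONNREFUSED;
	return 0;
}
```

With this convention the caller can tell a specific keyring failure
apart from the generic "no context" case, which keeps the old
ECONNREFUSED result only where nothing more precise is known.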

Lustre-change: https://review.whamcloud.com/54296
Lustre-commit: cd8625792f10d51fceca4717544ff8016609c3be

Test-Parameters: trivial
Test-Parameters: kerberos=true testlist=sanity-krb5
Test-Parameters: testgroup=review-dne-selinux-ssk-part-2
Change-Id: I13792f141a961036bc9f7629a4a2db692e245c41
Signed-off-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54556
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
14 months agoLU-17057 tests: Fix "...endpoint shutdown" under sanity-sec
Sebastien Buisson [Thu, 7 Mar 2024 08:28:53 +0000 (13:58 +0530)]
LU-17057 tests: Fix "...endpoint shutdown" under sanity-sec

This patch fixes test_0 failing with "Cannot send after
transport endpoint shutdown" by introducing wait_ssk()
in sec_setup() so that the SSK flavor is applied deterministically.

Lustre-change: https://review.whamcloud.com/54311
Lustre-commit: 24ae2519acd15691d9e319ffc8675cee60529b95

Test-Parameters: trivial mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2 testlist=sanity-sec env=SHARED_KEY=true,ONLY=0
Test-Parameters: trivial mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2 testlist=sanity-sec env=SHARED_KEY=true,ONLY=0
Test-Parameters: trivial mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2 testlist=sanity-sec env=SHARED_KEY=true,ONLY=0
Test-Parameters: trivial mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2 testlist=sanity-sec env=SHARED_KEY=true,ONLY=0
Test-Parameters: trivial mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2 testlist=sanity-sec env=SHARED_KEY=true,ONLY=0
Test-Parameters: trivial mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2 testlist=sanity-sec env=SHARED_KEY=true,ONLY=0
Test-Parameters: trivial mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2 testlist=sanity-sec env=SHARED_KEY=true,ONLY=0
Test-Parameters: trivial mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2 testlist=sanity-sec env=SHARED_KEY=true,ONLY=0
Test-Parameters: trivial mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2 testlist=sanity-sec env=SHARED_KEY=true,ONLY=0
Test-Parameters: trivial mdscount=2 mdtcount=4 osscount=1 ostcount=8 clientcount=2 testlist=sanity-sec env=SHARED_KEY=true,ONLY=0
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: Ia14021ab82913507df02dbb5a12c8596663f15d9
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54555
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
14 months agoEX-8981 csdc: execute tests if compression is enabled
Artem Blagodarenko [Tue, 20 Feb 2024 00:13:28 +0000 (00:13 +0000)]
EX-8981 csdc: execute tests if compression is enabled

The "csdc: Provide finer grained enable_compression control" patch
enables CSDC by default on all archs except aarch64 and ppc64le.
There is no need to execute enable_compression/disable_compression
for each test, so this patch replaces them with a compression_enabled
check that skips the test if compression is disabled for the arch.

This patch forces compression tests to be skipped on ppc64le and
aarch64 so they stop adding noise to the test results.

Test-Parameters: testlist=sanity-compr env=SANITY_ONLY="460",ONLY="0 1001-1080" serverversion=2.15
Test-Parameters: testlist=hot-pools env=ONLY=80 serverversion=2.15
Test-Parameters: testlist=sanity-flr env=ONLY=43 serverversion=2.15
Test-Parameters: testlist=sanity-sec env=ONLY="66 67" serverversion=2.15
Signed-off-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Change-Id: I2b5deba0672f3bad028f383cf1205e66c2b05529
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54105
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
14 months agoEX-9108 obdclass: disable T10PI guard from disk when decompressing
Li Dongyang [Tue, 26 Mar 2024 05:09:21 +0000 (16:09 +1100)]
EX-9108 obdclass: disable T10PI guard from disk when decompressing

When the server handles a chunk-unaligned read, we need to read
the chunk from disk and decompress the raw data.
This means we should not use the guard tags from the storage,
as they only match the raw data.
Disable the guard from disk; the T10PI checksum will be
recalculated later using the decompressed data.

Change-Id: I49964e1412e9d02769797c64aab290b17851e26f
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54567
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>