Whamcloud - gitweb
Andreas Dilger [Thu, 30 May 2024 17:04:27 +0000 (11:04 -0600)]
EX-9708 utils: lfs setstripe adds -E with -Z
When a layout is specified with "lfs setstripe -Z", the option is
ignored if no PFL component is specified with "-E".
Instead, "lfs setstripe -Z" should automatically upgrade the file
layout to a PFL layout so the compression parameters are saved.
Test-Parameters: trivial testlist=sanity-compr
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I29cc373fabd352d6f8b6781c238806b75cce7057
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55264
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Timothy Day [Tue, 9 Jan 2024 17:17:10 +0000 (17:17 +0000)]
LU-17242 debug: use dump_stack() where possible
In some cases, libcfs_debug_dumpstack() can fail to output a
stack trace - either because the needed symbols are not exported
or those symbols can't be resolved at runtime. This seems to
occur more often with newer kernels. The message appears only
as:
Lustre: ldlm_cb01_002: service thread pid 57876 was inactive for
40.494 seconds. The thread might be hung, or it might only be
slow and will resume later. Dumping the stack trace for
debugging purposes:
Pid: 57876, comm: ldlm_cb01_002 6.1.70 #1 SMP PREEMPT_DYNAMIC
Thu Jan 4 18:52:41 UTC 2024
Call Trace TBD:
with no stack trace (seen on CentOS 8.5 with ml 6.1.70).
For reference, the runtime symbol lookup was added and updated in:
b49ce7a ("LU-12400 libcfs: save_stack_trace_tsk if ARCH_STACKWALK")
58ac9d3 ("LU-14099 build: Fix for unconfigured arch_stackwalk")
First, add a message when the symbol can't be resolved correctly.
This makes it much easier to understand why the stack trace is
missing.
Second, replace libcfs_debug_dumpstack(NULL) with dump_stack().
When the task_struct is NULL, libcfs uses the current
task_struct. This replicates the functionality of dump_stack().
Using dump_stack() is more reliable, more in line with kernel
style, and not likely to be un-exported in the future.
Finally, in lustre/osc/osc_object.c the stack isn't dumped since
there is already an LBUG().
There only remains one user of libcfs_debug_dumpstack() which
uses a task_struct other than current. This can be cleaned up
in a future patch.
Lustre-change: https://review.whamcloud.com/53625
Lustre-commit:
ecac0c175d934fd5624c9ad8db8f45dbc33fb56c
Test-Parameters: trivial
Signed-off-by: Timothy Day <timday@amazon.com>
Change-Id: I196c1da7e39b1a694c0cb67ecfaab58ab3e4662c
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55239
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Alexander Zarochentsev [Mon, 29 Apr 2024 17:37:34 +0000 (17:37 +0000)]
LU-17851 ldiskfs: restart long fallocate tx
__ext4_journal_ensure_credits() may allow a long fs operation
like fallocate to run for too long if the initial credit
estimate is high enough.
The fix is to force tx restart if tx state is not T_RUNNING.
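The restart decision can be modelled in a few lines of C. This is only a sketch under stated assumptions: tx_state_model, tx_must_restart() and the credit parameters are hypothetical names chosen for illustration, not the ldiskfs or jbd2 interfaces, and the real change lives in kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified transaction states; only "running" matters here. */
enum tx_state_model {
	TX_MODEL_RUNNING,
	TX_MODEL_LOCKED,
	TX_MODEL_COMMITTING,
};

/*
 * Decide whether a long operation should restart its handle before
 * doing the next chunk of work.  Having enough credits is not
 * sufficient on its own: once the transaction has left the running
 * state, the handle is restarted so the old transaction can commit.
 */
static bool tx_must_restart(enum tx_state_model state,
			    int credits_left, int credits_needed)
{
	assert(credits_left >= 0 && credits_needed >= 0);

	if (state != TX_MODEL_RUNNING)
		return true;
	return credits_left < credits_needed;
}
```

With a large initial estimate the credit check alone would almost never trigger, which is why the state check comes first in this model.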
Lustre-change: https://review.whamcloud.com/55111
Lustre-commit:
f317b5c30e478fdecceea4bd07c85ff305e9d81d
HPE-bug-id: LUS-12311
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: Ib03d78739997caa6d13690b41ef7d01609a3623b
Reviewed-by: Andrew Perepechko <andrew.perepechko@hpe.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55247
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Vitaly Fertman [Tue, 13 Jul 2021 16:07:14 +0000 (19:07 +0300)]
LU-14847 ptlrpc: two replay lock threads
conflict with each other, which leads to:
ASSERTION( atomic_read(&imp->imp_replay_inflight) == 1 )
replay_lock_interpret() calls ptlrpc_connect_import() on error, so one
thread starts from the connect reply interpreter.
replay_lock_interpret() also wakes up ldlm_lock_replay_thread() which
does ptlrpc_import_recovery_state_machine().
It may happen that both threads will get to ldlm_replay_locks() on the
next round at the same time, both increment imp_replay_inflight and
the second one will assert.
The problem appeared in LU-13600 which added ldlm_lock_replay_thread()
with the ptlrpc_import_recovery_state_machine() call.
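The race on the in-flight counter can be illustrated with a small user-space model. This is a sketch of one common way to let only a single caller start a replay round; it is not necessarily how the patch resolves the problem. The names import_model, replay_try_begin() and replay_end() are hypothetical, and C11 atomics stand in for the kernel atomic_t helpers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the import; only the counter is modelled. */
struct import_model {
	atomic_int replay_inflight;
};

/* Returns true for exactly one caller until replay_end() is called. */
static bool replay_try_begin(struct import_model *imp)
{
	int expected = 0;

	/* Atomically move 0 -> 1; a second caller sees 1 and backs off. */
	return atomic_compare_exchange_strong(&imp->replay_inflight,
					      &expected, 1);
}

/* Called by the owner of the replay round when it is finished. */
static void replay_end(struct import_model *imp)
{
	int prev = atomic_exchange(&imp->replay_inflight, 0);

	assert(prev == 1);	/* only the owner may end the round */
	(void)prev;		/* avoid an unused warning under NDEBUG */
}
```

In this model a second thread that reaches the replay path simply gets false back instead of incrementing the counter a second time and tripping an assertion.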
Lustre-change: https://review.whamcloud.com/44294
Lustre-commit:
d7d7eb50c8f5fd3fc5a7808fb112d233bdef34d7
HPE-bug-id: LUS-10147
Fixes:
3b613a442b ("LU-13600 ptlrpc: limit rate of lock replays")
Signed-off-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Signed-off-by: Xing Huang <hxing@ddn.com>
Change-Id: Ia9aafb631e3ba5f850504cc58b4826acec2813bd
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mike Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55249
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Lai Siyao [Tue, 19 Dec 2023 08:24:07 +0000 (03:24 -0500)]
LU-9457 test: improve sanity 253
Improve sanity test_253: set high watermark to 50M, and fill OST with
fallocate.
Lustre-change: https://review.whamcloud.com/53548
Lustre-commit:
e934646f5ea87cd8a432db0e672c6ea48867ea47
Test-Parameters: trivial
Test-Parameters: testlist=sanity env=EXCEPT=77c
Test-Parameters: testlist=sanity env=EXCEPT=77c
Test-Parameters: testlist=sanity env=EXCEPT=77c
Test-Parameters: testlist=sanity env=EXCEPT=77c
Test-Parameters: testlist=sanity env=EXCEPT=77c
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I85139d7fc0697d08c21bdb19432b40c8dab82ee9
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55276
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Lai Siyao [Fri, 3 May 2024 00:27:04 +0000 (20:27 -0400)]
LU-15988 osp: don't print nid on -ESTALE
osp_send_update_req() should not access the import upon -ESTALE, because
this MDT may be unmounting.
Lustre-change: https://review.whamcloud.com/55049
Lustre-commit:
ae26dbc3387a17b763cbc901fa256d894a1f88fb
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ibd869e4e8da4f90ffd608a36d866264d5d552d0e
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55288
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Andreas Dilger [Thu, 16 May 2024 19:57:42 +0000 (21:57 +0200)]
LU-15496 tests: fix sanity/398c to use proper OSC name
For ppc64le and aarch64 clients, the OSC import instance name does
not have "ffff" at the start, so use the proper device name for this
subtest.
Clean up the rest of test_398c to meet modern test code style.
Also add debugging to sanity/398c from #53462.
Lustre-change: https://review.whamcloud.com/55132
Lustre-commit:
b1b57bcadeeb5a87ac75387c4aa4ae084e1a27e0
Lustre-change: https://review.whamcloud.com/53462
Lustre-commit:
304ca31e2aa15c576e468a86e45d8817c8eca391
Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: If8c72fa9b13eace009f39daf82454221eba6761b
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: Alex Deiter
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55313
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Mikhail Pershin [Sat, 18 May 2024 19:43:05 +0000 (22:43 +0300)]
LU-15644 llog: don't replace llog error with -ENOTDIR
dt_try_as_dir() contains a check for object existence, and a
missing object ends up reported as -ENOTDIR. For an llog, that
error propagates to the upper level and is reported on the
console, which is misleading both in its error code and in its
debug level.
This patch skips the object existence check for llogs, since
it is excessive there anyway.
The debug level is also reduced to avoid console messages
for -ENOENT, -ESTALE or -EIO errors.
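The debug-level choice can be pictured as a small classification helper. This is a sketch only: llog_error_is_console_worthy() is a hypothetical name, and the assumption that every other error still goes to the console is mine, not something the commit message states.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* rc follows the kernel convention: 0 on success, negative errno on failure. */
static bool llog_error_is_console_worthy(int rc)
{
	assert(rc <= 0);

	switch (rc) {
	case 0:
	case -ENOENT:
	case -ESTALE:
	case -EIO:
		/* expected during normal operation: keep off the console */
		return false;
	default:
		/* assumption: anything else is still worth reporting */
		return true;
	}
}
```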
Lustre-change: https://review.whamcloud.com/55151
Lustre-commit:
bd9839f7dbdf59751e7cdc234602eb338c518104
Fixes:
1ebc9ed460 ("LU-15902 obdclass: dt_try_as_dir() check dir exists")
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Id404204566898a6ac2e258b7824491effc5fc92e
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55152
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Jian Yu [Thu, 30 May 2024 01:17:55 +0000 (18:17 -0700)]
LU-17883 kernel: update SLES15 SP5 [5.14.21-150500.55.65.1]
Update SLES15 SP5 kernel to 5.14.21-150500.55.65.1 for Lustre client.
Lustre-change: https://review.whamcloud.com/55227
Lustre-commit: TBD (from
1372c20c7d85c4d5c216c566647a883af1c5f16a)
Test-Parameters: trivial mdtcount=4 mdscount=2 \
clientdistro=sles15sp5 testlist=sanity
Test-Parameters: optional clientdistro=sles15sp5 testgroup=full-part-1
Test-Parameters: optional clientdistro=sles15sp5 testgroup=full-part-2
Test-Parameters: optional clientdistro=sles15sp5 testgroup=full-part-3
Change-Id: Ie0601c190e52d6192bf389338be51c77db03a9c2
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55229
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Jian Yu [Wed, 29 May 2024 00:40:19 +0000 (17:40 -0700)]
LU-17402 kernel: RHEL 8.10 client support
This patch makes changes to support RHEL 8.10 release
with kernel 4.18.0-553.el8_10 for Lustre client.
Lustre-change: https://review.whamcloud.com/54800
Lustre-commit: TBD (from
6748f47fac79e557ae21eb790b597be6449c926a)
Test-Parameters: trivial fstype=ldiskfs mdtcount=4 mdscount=2 \
clientdistro=el8.10 serverdistro=el8.8 testlist=sanity
Test-Parameters: trivial fstype=zfs mdtcount=4 mdscount=2 \
clientdistro=el8.10 serverdistro=el8.8 testlist=sanity
Test-Parameters: optional clientdistro=el8.10 testgroup=full-part-1
Test-Parameters: optional clientdistro=el8.10 testgroup=full-part-2
Test-Parameters: optional clientdistro=el8.10 testgroup=full-part-3
Change-Id: I0a9a262d13e0b0de3607da0982468fd8b5f6a7aa
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55207
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Jian Yu [Thu, 30 May 2024 23:00:40 +0000 (16:00 -0700)]
LU-17404 kernel: update RHEL 9.4 [5.14.0-427.18.1.el9_4]
Update RHEL 9.4 kernel to 5.14.0-427.18.1.el9_4 for Lustre client.
Lustre-change: https://review.whamcloud.com/55203
Lustre-commit: TBD (from
07a23833999207c336532bcf75aa9d5a954f1b07)
Test-Parameters: trivial \
mdtcount=4 mdscount=2 clientdistro=el9.4 testlist=sanity
Test-Parameters: optional clientdistro=el9.4 testgroup=full-part-1
Test-Parameters: optional clientdistro=el9.4 testgroup=full-part-2
Test-Parameters: optional clientdistro=el9.4 testgroup=full-part-3
Change-Id: If18027650ff953733f2e57727b71d2daa61d249c
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55208
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Elena Gryaznova [Tue, 26 Apr 2022 13:37:27 +0000 (16:37 +0300)]
LU-15785 tests: do not detect versions for RPC_MODE mode
lustre_version_code() is called each time when do_rpc_nodes()
is called. It is not needed to detect versions for RPC_MODE mode.
Lustre-change: https://review.whamcloud.com/47144
Lustre-commit:
e3fcd81ae5f378ac62754a659c7adf0e0b656cf3
Fixes:
8fa23490bb ("LU-1538 tests: standardize test script init - sanity")
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-10914
Change-Id: Ia7645de0a4eedfddf859c80e661ebcb2e45de140
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Vladimir Saveliev <vladimir.saveliev@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55272
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Andreas Dilger <adilger@whamcloud.com>
Andreas Dilger [Thu, 30 May 2024 00:45:45 +0000 (18:45 -0600)]
RM-620 build: New tag 2.14.0-ddn150
New tag 2.14.0-ddn150
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I7cac3d582c510f1e19316b97ccfe26dd239dce31
Andreas Dilger [Thu, 30 May 2024 00:45:22 +0000 (18:45 -0600)]
RM-620 build: New tag lipe-2.51
New tag lipe-2.51
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I814564f4535217c614ecc8bbda0ed842661ebf08
Etienne AUJAMES [Mon, 8 Jan 2024 15:06:08 +0000 (16:06 +0100)]
LU-17250 mgs: generate a new MDT configuration by copy
The configuration for a new MDT is generated by reading the client
configuration. The MGS filters the existing mdc/osc records, interprets
them, and then creates the corresponding osp/osc device for the MDT.
The main idea of this patch is to first convert and copy the records
from the client configuration to create the new MDT, and then copy
the remaining record sections from an existing MDT, so that the new
MDT can inherit OST pools and parameters from the existing one.
This avoids complex compatibility checks for IPv4/v6 NIDs because
add_uuid records are copied without needing to parse NIDs.
This also allows copying the "add failnid" section from the client.
This patch extends usage of the "add failnid" section to MDT
configurations.
Here are the steps to copy an existing MDT configuration:
1/ read the client configuration and generate osp MDT/OST records for
the new MDT
2/ find an existing MDT configuration
3/ copy and convert the remaining configuration records from the
existing MDT configuration (parameters and OST pools)
Add the regression test conf-sanity 137.
Lustre-change: https://review.whamcloud.com/53614
Lustre-commit:
d4682ff4cc44413810a68e572cf7f05d5b188bb4
Test-Parameters: mdtcount=4 fstype=zfs testlist=conf-sanity
Test-Parameters: mdtcount=4 fstype=ldiskfs testlist=conf-sanity
Test-Parameters: mdtcount=4 fstype=zfs testlist=conf-sanity env=ONLY=137,ONLY_REPEAT=10
Test-Parameters: mdtcount=4 fstype=ldiskfs testlist=conf-sanity env=ONLY=137,ONLY_REPEAT=10
Signed-off-by: Etienne AUJAMES <eaujames@ddn.com>
Change-Id: I4a99085b8930a0dd8002bde87d4e8c575aaccba0
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55101
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Patrick Farrell [Fri, 15 Dec 2023 20:48:53 +0000 (15:48 -0500)]
LU-13805 llite: Fix return for non-queued aio
If an AIO fails or is completed synchronously (even
partially), the VFS will handle calling the completion
callback to finish the AIO, and so Lustre needs to return
the number of bytes successfully completed to the VFS.
This fixes a bug where if an AIO was racing with buffered
I/O, the AIO would fall back to buffered I/O, causing it to
complete before returning to the VFS rather than being
queued. In this case, Lustre would return 0 to the VFS, and
the VFS would complete the AIO and report 0 bytes moved.
This fixes the logic for this.
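The return-value rule can be written down as a tiny model. This is a sketch under assumptions, not the llite code: aio_submit_result() and AIO_MODEL_QUEUED are hypothetical names, and the numeric value of the "queued" code is used purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the kernel-internal "request was queued" return code;
 * the name and the value are for illustration only. */
#define AIO_MODEL_QUEUED	(-529L)

/*
 * Value handed back to the VFS after submitting an AIO:
 *  - queued:            the completion callback reports the result later
 *  - completed in line: the VFS finishes the AIO itself, so it must be
 *                       told how many bytes were actually moved
 */
static long aio_submit_result(bool queued, long bytes_done, long err)
{
	assert(bytes_done >= 0 && err <= 0);

	if (queued)
		return AIO_MODEL_QUEUED;
	if (bytes_done > 0)
		return bytes_done;	/* full or partial synchronous completion */
	return err;			/* nothing moved: 0 or a negative errno */
}
```

The bug described above corresponds to returning 0 in the second case even though bytes_done was positive.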
Lustre-Commit:
8a5bb81f774b9d41f1009b07010372fa9cd03a62
Lustre-Change: https://review.whamcloud.com/49915
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Change-Id: I9306402201e2962bbff04a4264c37bd0f1eca7b7
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53696
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Qian Yingjin <qian@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Andreas Dilger [Sat, 27 Apr 2024 02:48:15 +0000 (20:48 -0600)]
LU-17788 ptlrpc: restore watchdog revival message
Restore the "Service thread pid NNN completed after SSS.mmm
seconds. This likely indicates the system was overloaded"
message that was lost during ptlrpc watchdog restructuring.
Do not rate limit this message, so that it is possible to see
when all threads are restored, even if their corresponding
"Service thread pid NNN was inactive" message was throttled.
Update recovery-small test_10a to check for these messages,
so that they are not removed again in the future.
Lustre-change: https://review.whamcloud.com/54942
Lustre-commit:
20c09eff4d397e7158aa4408e0cb50b102cc61c0
Test-Parameters: testlist=recovery-small env=ONLY=10a
Fixes:
fc9de679a4 ("LU-9859 libcfs: add watchdog for ptlrpc service threads.")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I0c7e96fb7f73ca5562a6f5ad780a79ffc83ebbe5
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55095
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Vitaliy Kuznetsov [Tue, 21 May 2024 19:05:16 +0000 (21:05 +0200)]
EX-9585 lipe: add lipe_find3 pool option
Add an option to print the OST pool for a file with the
"-printf" argument, both as the long option "%{pool}" and as the
short option "%Lp", which is compatible with "lfs find".
The long %{pools} option prints *all* pools in the layout.
Update the lipe-find3.1 man page and add test cases for both.
Test-Parameters: trivial testlist=sanity-lipe-find3,sanity-lipe-scan3
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
Change-Id: I18d2d3cc161c8aa92eb27c33b06214b6f53ebbe5
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54785
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alexandre Ioffe <aioffe@ddn.com>
Vitaliy Kuznetsov [Wed, 29 May 2024 15:00:30 +0000 (17:00 +0200)]
EX-9121 lipe: Trivial improvements for report merging
Small changes that do not affect functionality but allow some
functions to be reused in other parts of lipe3, for example in the
utility for merging different directory stats reports.
Test-Parameters: trivial testlist=sanity-lipe-scan3,sanity-lipe-find3
Signed-off-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
Change-Id: Ib7eeeccb651e7bcff4ddfc78c66a35793df7bd1d
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55232
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Etienne AUJAMES [Thu, 26 Oct 2023 19:28:55 +0000 (21:28 +0200)]
LU-16566 sptlrpc: remove rq_sepol from ptlrpc_request
This patch removes rq_sepol from ptlrpc_request to reduce memory
consumption on the servers.
The rq_sepol field is 327 bytes long and is allocated for each request,
but it is rarely used (it requires SELinux to be enabled with the
send_sepol feature).
The patch stores the SELinux policy status string in a separate object.
The pointer is stored in ptlrpc_sec->ps_sepol and protected by RCU
(mostly read-only, since the SELinux policy should rarely change).
When the policy status needs to be packed in a request, we take a
reference to the current ps_sepol object and release it after the
packing. If the policy has changed in the meantime, the object used
will be freed afterward.
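The lifetime rule for the shared policy object can be shown with a single-threaded user-space model. This is a sketch only: RCU and atomic reference counting are deliberately left out, and sepol_model, sec_model and the helper names are hypothetical, not the sptlrpc structures. The status strings used with it are placeholders.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical shared policy-status object with a plain reference count. */
struct sepol_model {
	int  refs;
	char status[64];
};

/* Hypothetical owner holding the pointer to the current object. */
struct sec_model {
	struct sepol_model *cur;
};

static int sepol_model_frees;	/* counts frees, for demonstration only */

static struct sepol_model *sepol_model_new(const char *status)
{
	struct sepol_model *obj = calloc(1, sizeof(*obj));

	if (obj == NULL)
		return NULL;
	obj->refs = 1;		/* reference held by sec_model.cur */
	snprintf(obj->status, sizeof(obj->status), "%s", status);
	return obj;
}

/* Take a reference on the current object before packing a request. */
static struct sepol_model *sepol_model_get(struct sec_model *sec)
{
	assert(sec->cur != NULL);
	sec->cur->refs++;
	return sec->cur;
}

/* Drop a reference; the last one frees the object. */
static void sepol_model_put(struct sepol_model *obj)
{
	assert(obj->refs > 0);
	if (--obj->refs == 0) {
		free(obj);
		sepol_model_frees++;
	}
}

/* Install a new policy object and drop the owner's reference on the old one. */
static void sepol_model_replace(struct sec_model *sec,
				struct sepol_model *newobj)
{
	struct sepol_model *old = sec->cur;

	sec->cur = newobj;
	if (old != NULL)
		sepol_model_put(old);
}
```

A request that took its reference before a policy change keeps reading the old string, and the old object goes away only when that request drops its reference.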
A read operation is added to srpc_sepol parameter to return the
SELinux policy string cached in Lustre.
Lustre-change: https://review.whamcloud.com/52845
Lustre-commit:
3f70481c93dcabbb30267608a0054f4d7092e0db
Test-Parameters: testlist=sanity-selinux env=ONLY=21,ONLY_REPEAT=50
Test-Parameters: testlist=sanity-selinux env=ONLY=21,ONLY_REPEAT=50
Signed-off-by: Etienne AUJAMES <etienne.aujames@cea.fr>
Change-Id: I80fb76c97885c4b2987eb7f91a9bfe6e0e6e6c70
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55211
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Andreas Dilger [Thu, 31 Aug 2023 20:50:56 +0000 (14:50 -0600)]
LU-17000 ptlrpc: fix string overflow warnings
Fix potential string overflow warnings in sptlrpc_flavor2name()
calling strncat() with the full size of the target buffer
instead of the *remaining* space in the target buffer.
Fix potential string overflow warning in sepol_seq_write_old()
and sepol_seq_write() potentially copying an unterminated string
from userspace via strncpy() and not terminating it afterward.
Since the maximum incoming parameter size is known in advance,
is reasonably small (~342 bytes), and is only used temporarily,
reorganize the code to avoid two buffer allocations and copies.
Use memcpy() to copy the string since its length is known, and
always add a NUL terminator to the string afterward.
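Both patterns can be shown in plain user-space C. This is a minimal sketch with hypothetical helper names (append_bounded(), copy_known_length()); it illustrates the bounds arithmetic and is not the code from sptlrpc_flavor2name() or the sepol write handlers.

```c
#include <assert.h>
#include <string.h>

/*
 * Append suffix to the NUL-terminated string in buf without overflowing.
 * The strncat() bound is the space that REMAINS in buf, minus one byte
 * for the terminator, not the total size of buf.
 */
static void append_bounded(char *buf, size_t bufsize, const char *suffix)
{
	size_t used = strlen(buf);

	assert(used < bufsize);
	strncat(buf, suffix, bufsize - used - 1);
}

/*
 * Copy len bytes that may not be NUL-terminated (for example, data that
 * arrived from another address space) and always terminate the result.
 * Returns 0 on success, -1 if the data does not fit.
 */
static int copy_known_length(char *dst, size_t dstsize,
			     const char *src, size_t len)
{
	if (len >= dstsize)
		return -1;
	memcpy(dst, src, len);
	dst[len] = '\0';
	return 0;
}
```

Passing the full buffer size to strncat() would let it write up to that many bytes past the existing contents, which is the overflow the warnings point at.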
Improve error messages and code style in these functions.
Addresses-Coverity: 199034 ("Out-of-bounds access")
Addresses-Coverity: 199063 ("Out-of-bounds access")
Addresses-Coverity: 199108 ("Out-of-bounds access")
Addresses-Coverity: 397374 ("String not null terminated")
Addresses-Coverity: 397394 ("String not null terminated")
Lustre-change: https://review.whamcloud.com/52210
Lustre-commit:
ff62700fa8ee717a71de13baec25f0d69640ae7c
Test-Parameters: trivial testlist=sanity-sec,sanity-selinux
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ia810ce9f07b663a90049bb78af21c06f0e3ebbe5
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55210
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Hongchao Zhang [Sat, 20 Apr 2024 06:31:51 +0000 (14:31 +0800)]
LU-17873 test: ignore WIFSIGNALED if rc is 0
Ignore the result of the WIFSIGNALED check if the return status
of the "lctl test_create" thread is zero.
Lustre-change: https://review.whamcloud.com/55194
Lustre-commit: TBD (from
d1000ae89065a6868d0dbbd5c752ff06299d36c4)
Test-Parameters: trivial envdefinitions=SLOW=yes,DEBUG_SIZE=64 mdtcount=1 \
testlist=mds-survey,mds-survey,mds-survey,mds-survey,mds-survey,mds-survey
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: Ifc3727d48010c9f00f38baff9ff91b5cc3afce5c
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55185
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Andreas Dilger [Fri, 12 Apr 2024 01:18:28 +0000 (19:18 -0600)]
LU-16915 tests: improve distro type checking
Improve lustre_os_release() infrastructure to reduce redundant
code and make it easier to use.
Lustre-change: https://review.whamcloud.com/54790
Lustre-commit:
1ffbec13c0f745d0b9c6b91959b1afa52f99d63b
Test-Parameters: trivial
Fixes:
339b5e918f ("LU-16915 tests: except sanity-sec test_51")
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Id02223752df4eb3fd3b62b339e8c417eb33ebbe5
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55213
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Andreas Dilger [Fri, 12 Apr 2024 01:18:28 +0000 (19:18 -0600)]
LU-16915 tests: except sanity-sec test_51
Skip sanity-sec test_51 since it has started failing recently with
the move to el9.3 servers.
Add common lustre_os_release infrastructure to make such checking
easier in the future.
Lustre-change: https://review.whamcloud.com/54751
Lustre-commit:
b881bd1051451ed18610e0cc3c3cd56c8803cbc9
Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Id02223752df4eb3fd3b62b339e8c417eb3e86a12
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55212
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Rebanta Mitra [Tue, 28 May 2024 00:17:43 +0000 (17:17 -0700)]
LU-17877 lnet: export REGISTER_FUNC with EXPORT_SYMBOL_GPL
This patch exports REGISTER_FUNC and UNREGISTER_FUNC
with EXPORT_SYMBOL_GPL to load GPL-licensed modules.
Lustre-change: https://review.whamcloud.com/55217
Lustre-commit: TBD (from
b3bdf8ba7fb316905b76decb35bab8dc1947ed91)
Test-Parameters: trivial
Signed-off-by: Rebanta Mitra <rmitra@nvidia.com>
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Change-Id: I3a0d4e2b27911af36e210692d28892590eb0371c
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55218
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Shaun Tancheff [Wed, 15 May 2024 06:30:39 +0000 (23:30 -0700)]
LU-17816 llapi: ensure pool name is nul terminated
strncpy() usage is inconsistent about the size of the pool name
and sometimes forgets to ensure a nul byte is placed at the
end of the copy.
CoverityID: 397181 ("Buffer not null terminated (BUFFER_SIZE)")
Also clean up a case of checking that an unsigned value is >= 0
CoverityID: 397820 ("Unsigned compared against 0 (NO_EFFECT)")
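The termination problem is easy to reproduce outside Lustre. The sketch below assumes a 15-character name limit and uses the hypothetical names POOL_NAME_MAX_MODEL and pool_name_copy(); it shows the pattern only and is not the llapi code.

```c
#include <assert.h>
#include <string.h>

/* Assumed maximum pool name length, excluding the terminating NUL. */
#define POOL_NAME_MAX_MODEL 15

/*
 * strncpy() does not write a terminator when src has at least
 * POOL_NAME_MAX_MODEL characters, so the last byte of the buffer is
 * set explicitly after the copy.
 */
static void pool_name_copy(char dst[POOL_NAME_MAX_MODEL + 1], const char *src)
{
	assert(dst != NULL && src != NULL);

	strncpy(dst, src, POOL_NAME_MAX_MODEL);
	dst[POOL_NAME_MAX_MODEL] = '\0';
}
```

Sizing the destination as the limit plus one and copying at most the limit keeps the two sizes consistent at every call site.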
Lustre-change: https://review.whamcloud.com/55018
Lustre-commit:
64469274a4f3e202c76cf9a2757b8f36e8d0ee08
Signed-off-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Change-Id: Idec7adaf89c9dabc0275687c4a069fc8fa63e7a7
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55119
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Ake Sandgren [Wed, 15 May 2024 05:14:36 +0000 (22:14 -0700)]
LU-16819 build: use mofed path based on target kernel
Instead of using "uname -r", which limits builds to the currently
running kernel, use the target kernel which is available in
LINUXRELEASE, if the directory is available.
Building for a specific kernel is common practice when using DKMS.
Lustre-change: https://review.whamcloud.com/50937
Lustre-commit:
0e9708016b9948676484d290326c1fe8a269eb80
Test-Parameters: trivial
Signed-off-by: Ake Sandgren <ake.sandgren@hpc2n.umu.se>
Change-Id: Ifce912061a74fc5b7435cd940105190f0c3cd544
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55118
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Jian Yu [Mon, 20 May 2024 19:59:50 +0000 (12:59 -0700)]
LU-17749 kernel: update RHEL 8.9 [4.18.0-513.24.1.el8_9]
Update RHEL 8.9 kernel to 4.18.0-513.24.1.el8_9 for Lustre client.
Lustre-change: https://review.whamcloud.com/54821
Lustre-commit: TBD (from
23a99efd9104b328ce1edb5fc9094bce2c06e9b9)
Test-Parameters: trivial fstype=ldiskfs mdtcount=4 mdscount=2 \
clientdistro=el8.9 serverdistro=el8.8 testlist=sanity
Test-Parameters: trivial fstype=zfs mdtcount=4 mdscount=2 \
clientdistro=el8.9 serverdistro=el8.8 testlist=sanity
Test-Parameters: optional clientdistro=el8.9 serverdistro=el8.8 \
testgroup=full-part-1
Test-Parameters: optional clientdistro=el8.9 serverdistro=el8.8 \
testgroup=full-part-2
Test-Parameters: optional clientdistro=el8.9 serverdistro=el8.8 \
testgroup=full-part-3
Change-Id: I94b5a95e9e85f2f5e0cddb1dbb519ef92520ad0b
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55158
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Jian Yu [Sat, 11 May 2024 01:16:43 +0000 (18:16 -0700)]
LU-17404 kernel: new kernel [RHEL 9.4 5.14.0-427.16.1.el9_4]
This patch makes changes to support new RHEL 9.4 release
for Lustre client.
Lustre-change: https://review.whamcloud.com/54712
Lustre-commit: TBD (from
177846a0aa58b35d43696b3c3c5d71df0109ab14)
Test-Parameters: trivial \
mdtcount=4 mdscount=2 clientdistro=el9.4 testlist=sanity
Test-Parameters: optional clientdistro=el9.4 testgroup=full-part-1
Test-Parameters: optional clientdistro=el9.4 testgroup=full-part-2
Test-Parameters: optional clientdistro=el9.4 testgroup=full-part-3
Change-Id: Ic292c01ad16dc06e8dee966c4a211896fea284c0
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54746
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Minh Diep <mdiep@whamcloud.com>
Cyril Bordage [Wed, 24 Apr 2024 02:21:53 +0000 (04:21 +0200)]
LU-14810 lnet: ongoing push when discovery is stopped
If a push is not completed when discovery thread is stopped, then we
still have ln_dc_handler used as md handler (from
lnet_peer_send_push). That leads to assert failure from
lnet_assert_handler_unused.
To fix that, we call lnet_assert_handler_unused only after the monitor
thread has been stopped. Thus, the patch for LU-17496 is not needed
anymore.
Lustre-change: https://review.whamcloud.com/54884
Lustre-commit:
3ba393a5cb21ff0f8bd8a09c341ee01e936321c7
Fixes:
36b14a23a6 ("LU-17207 lnet: race b/w monitor thr stop and discovery push")
Test-Parameters: testlist=sanity-lnet env=ONLY="212 220",ONLY_REPEAT=100
Signed-off-by: Cyril Bordage <cbordage@whamcloud.com>
Change-Id: I426c37b12a3d29327a7295f528a5b875a9ac88a0
Reviewed-by: Shaun Tancheff <shaun.tancheff@hpe.com>
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55167
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Qian Yingjin [Fri, 19 Apr 2024 02:53:10 +0000 (22:53 -0400)]
LU-17745 llite: fix the umount panic due to BDI unregister
There is a regression in the patch for LU-16954 on the old RHEL
kernel (RHEL 8.2). When Lustre is unmounted, the client
crashes.
In LU-16954, to avoid the remount failure, we explicitly
unregister the sysfs for the @bdi on new kernels such as the Ubuntu
22.04 v5.15 kernel.
However, this is not needed for old kernels such as RHEL 8.2.
In this patch, we remove the explicit unregister for old kernels
to avoid the client crash during unmount.
Lustre-change: https://review.whamcloud.com/54850
Lustre-commit:
facff17860ff9a577bad0bf8fb932e869475e011
Fixes:
dcc1dd39a6 ("LU-16954 llite: add SB_I_CGROUPWB on super block for cgroup")
Test-Parameters: clientdistro=ubuntu2204 testlist=sanity-sec
Test-Parameters: clientdistro=el8.9 testlist=sanity-sec
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Ic6df572744bed8994c08fb1369cc9beccbe2d87a
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55166
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Jian Yu [Wed, 15 May 2024 04:56:31 +0000 (21:56 -0700)]
LU-17850 build: prefer LINUXRELEASE over uname -r
In a container or chroot environment, "uname -r" reports
the host instead of the target kernel version. We should
use the LINUXRELEASE variable which is configured in
config/lustre-build-linux.m4 with the value from UTS_RELEASE.
Lustre-change: https://review.whamcloud.com/55108
Lustre-commit: TBD (from
c587c5bdf1c10e4b96e88bb3a0f1972a75dbe9cb)
Test-Parameters: trivial
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Change-Id: Iaa48027f5ae873e1298695a264db1c351d9eac5c
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55116
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Yang Sheng <ys@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Mikhail Pershin [Mon, 18 Mar 2024 15:37:02 +0000 (18:37 +0300)]
LU-17649 ptlrpc: fix -EACCES connection error handling
Connection errors -EACCES and -EROFS leave the import in an
intermediate state. It is still active, as is the pinger
over it, but it has obd_no_recov set. That allows the import to
recover eventually if server security is updated. But even
in the FULL state any RPC over the import gets -ESHUTDOWN because
obd_no_recov is set.
However, obd_no_recov is not supposed to be used in that
way: it reflects a particular mount option, under which the
import should never be recovered. So this patch sets the import
to the deactivated state instead, which also makes the import
non-operational but leaves the option to activate it manually
or to remount.
Server connections like LWP, MDT-OST and MDT-MDT are
excluded and are never deactivated. Such errors are
considered temporary until the remote target updates its own
security as required or an administrator restarts the
target as needed.
In both cases a console message is issued.
Lustre-change: https://review.whamcloud.com/54448
Lustre-commit:
3f13f89e2f19b46a8f27ad007c10251147984875
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ib83e1b0ac541823ec236591f08145340d6f6bf04
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55224
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Yang Sheng [Mon, 13 May 2024 14:44:16 +0000 (22:44 +0800)]
LU-17847 sec: wake up for rsc entry
We should wake up the waiter after rsc do_upcall.
Otherwise it may be stuck for a long time.
Lustre-change: https://review.whamcloud.com/55094
Lustre-commit:
99b1a2b5df9cffeae68ec88dfe784881109386d8
Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: I87d1e5a9687056c8ee2428aad45dafda16247de2
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55222
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Sebastien Buisson [Mon, 8 Apr 2024 09:06:50 +0000 (11:06 +0200)]
LU-17714 gss: cleanup user keyring usage
User keys are linked to the user keyring. But we should not keep an
extra reference on the user keyring for every user key being created.
This leads to too many references on this keyring, and prevents proper
destroy in case the system wants to clean it up (because the user
logged off for instance).
And when unlinking a user key, we need to take care of the user
namespace, in order to fetch the real user keyring, and not the one
associated with the mapped uid in the user namespace.
Finally we must handle the case where the user key is explicitly
revoked via 'keyctl revoke' on the command line, by carrying out the
same cleanup as when 'lfs flushctx' is called. This properly drops
references on the key, and frees the security context associated with
the key.
Lustre-change: https://review.whamcloud.com/54692
Lustre-commit:
afe0e091d1b82391a929df74717b9665a6f0ab75
Test-Parameters: kerberos=true testlist=sanity-krb5
Fixes:
eef24d8a97 ("LU-17173 gss: user keys go to user keyring")
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Ic168b68f8652689aa4402eaa4fcdbd852743d320
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55170
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Sebastien Buisson [Mon, 8 Apr 2024 15:52:50 +0000 (17:52 +0200)]
LU-17714 gss: protect against revoked session keyring
In case the session keyring is revoked, request_key() still tries to
search it. Sadly this keyring is searched before the user keyring, so
it will return -EKEYREVOKED, and the user keyring, that does contain
the Lustre key, will not even be searched.
To work around this issue in the kernel implementation of request_key,
override the current process's credentials with no session keyring,
if we detect it has been revoked.
Lustre-change: https://review.whamcloud.com/54706
Lustre-commit:
045ab5c0273a843493ed2d6d3486b41efe36b834
Test-Parameters: kerberos=true testlist=sanity-krb5
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I64b6ac4693a47cf43d6fa1bf4e17bfb4907670fa
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55171
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Sebastien Buisson [Tue, 30 Jan 2024 12:13:52 +0000 (13:13 +0100)]
LU-17483 gss: refresh req context with already existing one
When we are processing a request with a root GSS context that
has the PTLRPC_CTX_ERROR_BIT bit set, try to replace it with an
already existing context. Such a context can already be up-to-date
thanks to other authentication requests sent to failover NIDs while
the current request was in the delay list. This valid context can be
fetched from the struct ptlrpc_sec.
Lustre-change: https://review.whamcloud.com/53859
Lustre-commit:
c76f7288fa772b48cf81050663e2124b25ab3994
Test-Parameters: kerberos=true testlist=sanity-krb5
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: Iff1cf727c4579cba6456e010aac6537cf888b0ae
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55169
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Andreas Dilger [Fri, 17 May 2024 00:01:49 +0000 (18:01 -0600)]
RM-620 build: New tag 2.14.0-ddn149
New tag 2.14.0-ddn149
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I355196fa930dd63c414bb50c99359b6c2b1ebb32
Stephane Thiell [Thu, 11 Feb 2021 00:15:02 +0000 (16:15 -0800)]
LU-13609 mgs: fix config_log buffer handling
Fix buffer handling in mgs_list_logs() to list all MGS config_logs
using multiple ioctl calls when we have a large number of targets.
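For illustration only, a userspace sketch of listing a long set of names through a fixed-size buffer over several calls. It models the idea, not the MGS ioctl interface; list_logs_chunk() and its parameters are hypothetical.

```c
#include <assert.h>
#include <string.h>

/*
 * Pack names[start..count-1] into buf as consecutive NUL-terminated
 * strings, stopping before the first name that does not fit.
 * Stores the number of bytes used in *used and returns the index of
 * the first name NOT copied (== count once everything was listed).
 */
static int list_logs_chunk(const char *const *names, int count, int start,
			   char *buf, size_t len, size_t *used)
{
	size_t off = 0;
	int i;

	assert(names != NULL && buf != NULL && used != NULL);
	for (i = start; i < count; i++) {
		size_t need = strlen(names[i]) + 1;

		if (need > len - off)
			break;	/* buffer full, caller calls again */
		memcpy(buf + off, names[i], need);
		off += need;
	}
	*used = off;
	return i;
}
```

The caller repeats the call with start set to the previous return value until it equals count. A return value equal to start means a single name is larger than the whole buffer and has to be treated as an error.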
Lustre-change: https://review.whamcloud.com/41478
Lustre-commit:
e3f17defc141d8847562b610931255d37ed4dd3c
Fixes:
1d97a8b4cd3d ("LU-13609 llog: list all the log files correctly on MGS/MDT")
Signed-off-by: Stephane Thiell <sthiell@stanford.edu>
Change-Id: I1bf32e918e242f4da83c3d1624b7285a18a88d01
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55102
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Mr NeilBrown [Tue, 21 Mar 2023 23:08:29 +0000 (19:08 -0400)]
LU-10391 mgs: fix lots of white-space irregularities
In preparation for changing the code, fix lots of white-space issues
in mgs_llog.c
Lustre-change: https://review.whamcloud.com/50091
Lustre-commit:
60e6e35f4cad3f79b2e96ddf41a8d8a02d6047ac
Test-Parameters: trivial
Signed-off-by: Mr NeilBrown <neilb@suse.de>
Change-Id: I7fb40a473e3e4709778339b773988ec7079d20d8
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55100
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Arshad Hussain [Tue, 9 Jan 2024 06:12:57 +0000 (11:42 +0530)]
LU-16861 obdfilter: Exclude quotes when getting NIDs
In get_targets(), when getting NIDs the quotes were also included.
Exclude quotes when generating NIDs as they are not required.
Use $LCTL instead of $lctl, and make it also work in Janitor testing.
Lustre-change: https://review.whamcloud.com/53620
Lustre-commit:
c265e1c7b045bf1f9e5b2919c282b63086929ab6
Test-Parameters: trivial testlist=obdfilter-survey
Fixes:
9ef9906d7 ("LU-6863 tests: change obdfilter-survey.sh for CLIENTONLY mode")
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Change-Id: I8642539fc6b396f1339e20e4fef8bc78cda2d969
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55090
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Artem Blagodarenko [Thu, 16 May 2024 13:18:34 +0000 (09:18 -0400)]
EX-9784 csdc: Do not print error if a chunk is not compressed
is_chunk_start() can decide that a chunk cannot be decompressed in
two cases: 1) the chunk has not been compressed; 2) the chunk is corrupted.
The error message should be printed only in case 2).
Signed-off-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Change-Id: I85f4850f989ba0fc8f00653f8f6b0f1b4837d625
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55128
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Alex Zhuravlev [Tue, 26 Oct 2021 08:38:50 +0000 (11:38 +0300)]
LU-15163 osd: osd_obj_map_recover() to restart transaction
osd_obj_map_recover() stops the transaction when it needs to call
vfs_link(), and it has to start a new transaction afterwards to
modify the filesystem.
Lustre-commit:
7bf0e557a2b3a463e4d78e81b6ab93987d3dc8af
Lustre-change: https://review.whamcloud.com/45368
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I6efe5444ddc959b19092bebc6e3c7dc25a29cea1
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55124
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Andreas Dilger [Tue, 14 May 2024 05:30:37 +0000 (07:30 +0200)]
RM-620 build: New tag 2.14.0-ddn148
New tag 2.14.0-ddn148
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I33b2d18f60a7ddbaeb20ac219fd361b13fc12de4
Alexandre Ioffe [Tue, 14 May 2024 00:25:19 +0000 (17:25 -0700)]
EX-9054 lipe: fix incorrect pool pointer usage
Use pointer to pool struct instead of pool name.
Fixes:
504b0b0b61 ("EX-9054 lipe: Add SSH stats per agent")
Test-Parameters: trivial testlist=hot-pools
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Change-Id: I436aaa2ea9eb0059c5cee00882fe4332c6e22fe5
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55096
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Andreas Dilger [Mon, 13 May 2024 22:41:43 +0000 (00:41 +0200)]
RM-620 build: New tag 2.14.0-ddn147
New tag 2.14.0-ddn147
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I9f900fdfa9d48220e954ca6053e3d19cb60de9c5
Andreas Dilger [Mon, 13 May 2024 22:40:35 +0000 (00:40 +0200)]
RM-620 build: New tag lipe-2.50
New tag lipe-2.50
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I30e3b022099dac627c25482f4f883450544fceae
Elena Gryaznova [Tue, 11 Jan 2022 17:23:30 +0000 (20:23 +0300)]
LU-15429 tests: mount_mds_client() fix
mount/umount client is to be executed on active facet/host,
not on mds1_HOST. Without this fix test_140a() fails on
failover setup:
CMD: lm0101 umount /mnt/lustre2 2>&1
CMD: lm0102 rmdir /mnt/lustre2
lm0102: rmdir: failed to remove '/mnt/lustre2':
No such file or directory
test_140a: FAIL: no clients with recovery disabled
To reproduce the failure just run:
ONLY="107 140a" sh recovery-small.sh
on failover setup where mds1_HOST != mds1failover_HOST.
Lustre-commit:
1d2e2195873e82a603531e34f3f7d4c634490209
Lustre-change: https://review.whamcloud.com/46043
Fixes:
8bd04b4e57 ("LU-12722 target: disable recovery for local clients")
Test-Parameters: trivial env=ONLY="140a 140b" testlist=recovery-small
Signed-off-by: Elena Gryaznova <elena.gryaznova@hpe.com>
HPE-bug-id: LUS-10669
Change-Id: Ifbdedfda840e8421fa8a969f73131ca23982a28b
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55041
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Alexandre Ioffe [Wed, 7 Feb 2024 22:12:26 +0000 (14:12 -0800)]
EX-9054 lipe: Add SSH stats per agent
- Add stats counters on SSH per agent: error and
disconnection counters
- Add summary counter on total SSH inactivity
- Disconnect agent when lfs command option request fails
- Report log on INFO level when agent becomes active again
- Fixed minor bugs:
o memory leak when system error happens in
pthread_tryjoin_np()
o Missed stats on job retries
Test-Parameters: trivial testlist=hot-pools
Signed-off-by: Alexandre Ioffe <aioffe@ddn.com>
Change-Id: I35eebf61d35eb913a167ebd795779188a6217dac
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53957
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Lai Siyao [Thu, 25 Apr 2024 08:15:49 +0000 (04:15 -0400)]
LU-17756 lod: add tunable lod.*.max_stripes_per_mdt
Add a tunable lod.*.max_stripes_per_mdt for directory overstriping.
The default value is 1 for interoperation.
Add sanity 300uh 300ui.
Lustre-change: https://review.whamcloud.com/54945
Lustre-commit: TBD (from
90d013f8897df887e0eed90593f24751fca97f65)
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Id8199f01f5e2d62ead6bf43d239eee8ec1e4cbb5
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54947
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Patrick Farrell [Thu, 19 Jan 2023 20:05:38 +0000 (15:05 -0500)]
LU-12273 lod: metadata overstriping
This adds overstriping for MDTs, similar to overstriping
for OSTs (added in LU-9846). This adds a new option to
setdirstripe, -C, allowing creation of more than one stripe
per MDT. It is also possible to place multiple stripes on
the same MDT using specific striping with -m.
This allows a single directory to more fully use the full
capability of each MDT in the file system.
Two limitations of note:
1. This requires > 1 MDT, otherwise the DNE subsystem is
not initialized.
2. Due to recovery limitations, we allow a max of only 5
stripes per MDT.
MDT overstriping increases mdtest-hard-write performance by
up to 13%, mdtest-hard-stat by 93%, at the cost of a slight
drop in mdtest-hard-read (7%), with no change in delete.
4 MDTs, 1 stripe/MDT:
mdtest-hard-write 117.399467 kIOPS : time 339.496 seconds
mdtest-hard-stat 727.020749 kIOPS : time 55.666 seconds
mdtest-hard-read 245.556392 kIOPS : time 162.897 seconds
mdtest-hard-delete 104.379111 kIOPS : time 382.710 seconds
4 MDTs, 4 stripes/MDT:
mdtest-hard-write 132.963290 kIOPS : time 309.093 seconds
mdtest-hard-stat 1408.161148 kIOPS : time 30.107 seconds
mdtest-hard-read 229.383910 kIOPS : time 179.576 seconds
mdtest-hard-delete 103.284369 kIOPS : time 398.442 seconds
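The percentages quoted above follow from the kIOPS figures in the two tables. A minimal sketch of the arithmetic; pct_change() is a hypothetical helper and not part of the patch.

```c
#include <assert.h>

/*
 * Relative change from "before" to "after", in percent.
 * Hypothetical helper, used only to recheck the quoted numbers.
 */
static double pct_change(double before, double after)
{
	assert(before > 0.0);
	return (after - before) / before * 100.0;
}
```

For example, pct_change(117.399467, 132.963290) is about 13.3 for mdtest-hard-write, pct_change(727.020749, 1408.161148) is about 93.7 for mdtest-hard-stat, and pct_change(245.556392, 229.383910) is about -6.6 for mdtest-hard-read. The delete figures differ by about 1%.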
Lustre-change: https://review.whamcloud.com/35034
Lustre-commit:
81ac7c0c989dd862e2215a4635c77e5123289658
Signed-off-by: Patrick Farrell <pfarrell@whamcloud.com>
Signed-off-by: Qian Yingjin <qian@ddn.com>
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: I11556b223029820bd335e87c7bf073970e03468d
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/53570
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Alex Zhuravlev [Thu, 9 May 2024 17:34:29 +0000 (20:34 +0300)]
LU-17204 lod: don't panic on short LOVEA
When we request LOVEA and find the existing buffer is not large enough,
we ask for LOVEA's size and reallocate the buffer. But LOVEA can
shrink in parallel (e.g. new default striping), so our expectation
that the size must be greater than the size of the existing buffer is
not correct. Replace the corresponding assertion with a simple
repeat + extra check for a livelock.
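A userspace sketch of the "repeat + livelock check" pattern, for illustration only. struct ea_obj, ea_get(), ea_size(), ea_read() and the retry bound are all hypothetical stand-ins; the real code works on LOD objects and Lustre buffers.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

#define EA_MAX_RETRIES 8	/* assumed bound, guards against a livelock */

/* Toy object whose EA can change size between two calls. */
struct ea_obj {
	char	eo_data[64];
	size_t	eo_size;	/* current EA size */
	size_t	eo_shrink_to;	/* if non-zero, EA shrinks after a failed read */
};

static ssize_t ea_size(struct ea_obj *obj)
{
	return obj->eo_size;
}

/* Copy the EA into buf; -ERANGE if the buffer is too small. */
static ssize_t ea_get(struct ea_obj *obj, void *buf, size_t len)
{
	if (len < obj->eo_size) {
		if (obj->eo_shrink_to != 0)	/* simulate a parallel shrink */
			obj->eo_size = obj->eo_shrink_to;
		return -ERANGE;
	}
	memcpy(buf, obj->eo_data, obj->eo_size);
	return obj->eo_size;
}

/* Read the EA, growing *bufp as needed. *bufp must be a malloc'ed buffer. */
static ssize_t ea_read(struct ea_obj *obj, void **bufp, size_t *lenp)
{
	int tries;

	assert(obj != NULL && bufp != NULL && *bufp != NULL && lenp != NULL);
	for (tries = 0; tries < EA_MAX_RETRIES; tries++) {
		ssize_t rc = ea_get(obj, *bufp, *lenp);

		if (rc != -ERANGE)
			return rc;	/* success or a real error */

		rc = ea_size(obj);	/* ask for the current size */
		if (rc < 0)
			return rc;
		/*
		 * rc <= *lenp is legal here: the EA may have shrunk since
		 * the failed read. Do not assert on it, just repeat.
		 */
		if ((size_t)rc > *lenp) {
			void *tmp = realloc(*bufp, rc);

			if (tmp == NULL)
				return -ENOMEM;
			*bufp = tmp;
			*lenp = rc;
		}
	}
	return -EAGAIN;	/* size keeps changing, give up */
}
```

With a 16-byte EA that shrinks to 4 bytes after the first failed read into an 8-byte buffer, the size query returns 4, which is smaller than the buffer. An assertion on "new size > buffer size" would fire here, while the loop simply repeats the read and returns 4.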
Lustre-commit:
8fa3532b1ee887be378adbf9432707b2d8a2d814
Lustre-change: https://review.whamcloud.com/52727
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: I26ad5091228bf78858f8538478dbcbdb235cddf4
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55065
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Vitaliy Kuznetsov [Wed, 8 May 2024 11:54:03 +0000 (13:54 +0200)]
EX-9121 lipe: Add functionality for parsing ranges from JSON
This patch is the third in a series of patches that implement
functionality for combining size statistics reports and includes
functionality for reading and saving the resulting ranges in tables
obtained from different reports in JSON format.
Only affects file size statistics and this patch is the final one
for reports on file size statistics.
Test-Parameters: trivial testlist=sanity-lipe-scan3,sanity-lipe-find3
Signed-off-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
Change-Id: I9714f5ea970b103652f7714c93d76be2549ad3b8
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55048
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Vitaliy Kuznetsov [Wed, 8 May 2024 15:00:41 +0000 (17:00 +0200)]
EX-9121 lipe: Add functionality for parsing tables from JSON
This patch is the second in a series of patches that implement
functionality for combining reports on size statistics and
includes functionality for reading and recording tables obtained
from different reports in JSON format.
Only affects file size statistics.
Test-Parameters: trivial testlist=sanity-lipe-scan3,sanity-lipe-find3
Signed-off-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
Change-Id: I742879e61049e00b98d5f9defd7dabaea85fba0a
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55047
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Vitaliy Kuznetsov [Wed, 8 May 2024 13:04:40 +0000 (15:04 +0200)]
EX-9121 lipe: Add entry point for report merging option
This patch adds a new option for lipe_scan3 to merge
statistics reports.
This option will work like this:
lipe_scan3 --merge-reports=/dir_with_reports
File with the results:
Path to out: merged_report.out
Path to yaml: merged_report.yaml
Path to json: merged_report.json
Path to csv: merged_report.csv
This patch is the first in a series of patches to implement the
functionality for merging reports on size statistics and includes
functionality for initialization and the first entry point.
Test-Parameters: trivial testlist=sanity-lipe-scan3,sanity-lipe-find3
Signed-off-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
Change-Id: Ia02c28811922e0abba52a9c2d6408da8df9ae4c2
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55046
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Vitaliy Kuznetsov [Wed, 8 May 2024 13:02:57 +0000 (15:02 +0200)]
EX-9121 lipe: Trivial improvements for report merging
Small changes that do not affect functionality, but allow some
functions to be reused in other parts of lipe3, for example in the
utility for merging different stats reports.
Test-Parameters: trivial testlist=sanity-lipe-scan3,sanity-lipe-find3
Signed-off-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
Change-Id: I1bc2d4b22e57a369acea86bf60d8f460c5b3b093
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55045
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Lai Siyao [Sat, 8 Jul 2023 20:35:43 +0000 (16:35 -0400)]
LU-15553 test: mkdir_on_mdt0 in recovery-small.sh
Many subtests in recovery-small.sh require the test dir to be created
on MDT0; replace mkdir with mkdir_on_mdt0.
Fixes:
b9c4dc3c33 ("LU-14792 llite: enable filesystem-wide default LMV")
Lustre-change: https://review.whamcloud.com/51669
Lustre-commit:
3b0d2821845cf87ae7f03bf41ceae00237d94121
Test-Parameters: trivial
Test-Parameters: testlist=recovery-small,recovery-small,recovery-small
Test-Parameters: mdscount=2 mdtcount=4 testlist=recovery-small,recovery-small,recovery-small
Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com>
Change-Id: Ibc37b2dd25bcd94794392f5ff8a79df2e7932dcc
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Alex Deiter <alex.deiter@gmail.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55059
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Chris Horn [Wed, 23 Nov 2022 17:28:45 +0000 (10:28 -0700)]
LU-16643 lnet: Health logging improvements
LNet health activity can generate noise in console logs. The NI/Peer
NI recovery pings could be expected to fail and the related messages
from lnet_handle_recovery_reply() are generally redundant.
Improve this logging by having the lnet_monitor_thread() provide a
summary of NIs in recovery.
Another useful metric in spotting network trouble is if we have
messages exceeding their deadline. We do not currently log this
information. Keep a count of messages that have exceeded their
deadline and track the total excess time. The lnet_monitor_thread()
will then provide a summary of the number of messages and their
average excess time at a regular interval. These stats are then
reset when the monitor thread prints this information to the console.
Because NIs can be in recovery for extended periods of time, the
interval of console updates will increase from 1 to 5 minutes.
The interval is reset when it is detected that there are no longer any
NIs in recovery and there haven't been any messages past their
deadline since the last console update.
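The deadline accounting described above can be sketched as follows. This is a userspace model under assumed names (struct deadline_stats and both functions are hypothetical); the console interval handling is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct deadline_stats {
	uint64_t ds_count;	/* messages that exceeded their deadline */
	uint64_t ds_excess_ms;	/* total time past deadline, in ms */
};

/* Account one finished message; only late messages are counted. */
static void deadline_stats_record(struct deadline_stats *ds,
				  uint64_t now_ms, uint64_t deadline_ms)
{
	assert(ds != NULL);
	if (now_ms > deadline_ms) {
		ds->ds_count++;
		ds->ds_excess_ms += now_ms - deadline_ms;
	}
}

/*
 * Return the average excess time and store the message count in
 * *count, then reset both counters, as done when the summary is
 * printed to the console.
 */
static uint64_t deadline_stats_report(struct deadline_stats *ds,
				      uint64_t *count)
{
	uint64_t avg;

	assert(ds != NULL && count != NULL);
	avg = ds->ds_count ? ds->ds_excess_ms / ds->ds_count : 0;
	*count = ds->ds_count;
	ds->ds_count = 0;
	ds->ds_excess_ms = 0;
	return avg;
}
```

Keeping only a count and a running total is enough to print an average, and resetting both at report time makes each console line cover exactly one interval.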
Lustre-change: https://review.whamcloud.com/50305
Lustre-commit:
0cb3d86c4004d75810c54bb897ad7fbb6d5ec05f
Test-Parameters: trivial
HPE-bug-id: LUS-11500
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I4ffffd0412806184282178ce0aca3073dd30d7e0
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55073
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Sebastien Buisson [Thu, 25 Apr 2024 16:42:44 +0000 (18:42 +0200)]
LU-17741 gss: fix lsvcgss service for systemd
Add a systemd unit file for lsvcgss service, so that the lsvcgssd
daemon can be handled correctly via systemctl.
Lustre-change: https://review.whamcloud.com/54915
Lustre-commit: TBD (from
ab83ed4cd83370f412e2e151e482bdb3cfef16dd)
Test-Parameters: trivial
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I7581996e1e28567415da0827681841ac228ad6c5
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55087
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Sebastien Buisson [Mon, 13 May 2024 10:03:16 +0000 (12:03 +0200)]
EX-9721 tests: fix sanity-sec test_64x for interop
'server_upcall' rbac value is not known by older servers.
Fixes:
b952bcb620 ("EX-9392 sec: add server_upcall rbac role")
Fixes:
b5e421625b ("EX-9392 sec: use dedicated INTERNAL upcall cache")
Test-Parameters: trivial
Test-Parameters: testgroup=review-dne-part-2
Test-Parameters: testgroup=review-dne-part-2 serverversion=EXA6.3.0
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I39a69904ce4709eacf6f08173d3cfe42e247b5bd
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55088
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Alex Zhuravlev [Mon, 13 May 2024 07:48:56 +0000 (10:48 +0300)]
LU-16430 ptlrpc: racy rq_obsolete bit modification
Racy bit modification causes assertion failure in
ptlrpc_at_remove_timed():
ASSERTION( !list_empty(&req->rq_srv.sr_timed_list) )
rq_obsolete is a bit field, so its modification
isn't atomic and must be done under rq_lock.
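A userspace sketch of why a bit field needs the lock, for illustration only. struct toy_req is a hypothetical stand-in, rq_other is an invented neighbouring flag, and a pthread mutex replaces the kernel spinlock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/*
 * The two flags share one storage unit, so a plain
 * "req->rq_obsolete = 1" is a read-modify-write of the whole unit
 * and can overwrite a concurrent update of the neighbouring flag.
 */
struct toy_req {
	pthread_mutex_t	rq_lock;	/* stand-in for the spinlock */
	unsigned int	rq_obsolete:1;
	unsigned int	rq_other:1;	/* hypothetical neighbouring flag */
};

static void toy_req_init(struct toy_req *req)
{
	assert(req != NULL);
	pthread_mutex_init(&req->rq_lock, NULL);
	req->rq_obsolete = 0;
	req->rq_other = 0;
}

/* Every writer of any flag in the unit takes the same lock. */
static void toy_req_set_obsolete(struct toy_req *req)
{
	pthread_mutex_lock(&req->rq_lock);
	req->rq_obsolete = 1;
	pthread_mutex_unlock(&req->rq_lock);
}

static void toy_req_set_other(struct toy_req *req)
{
	pthread_mutex_lock(&req->rq_lock);
	req->rq_other = 1;
	pthread_mutex_unlock(&req->rq_lock);
}
```

The lock only helps if all writers of flags in the same storage unit use it; one unlocked writer is enough to lose an update.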
Lustre-Commit:
14ac768fd9633c5cf4474555170e5042c71a135b
Lustre-Change: https://review.whamcloud.com/49505
Change-Id: Ib1d3ad189a78b71ecf5b01585478922e984c9568
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55086
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Andreas Dilger [Wed, 8 May 2024 06:05:56 +0000 (01:05 -0500)]
RM-620 build: New tag 2.14.0-ddn146
New tag 2.14.0-ddn146
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I1b2d4c0e3121f31b82407beb974a73498edc5862
Qian Yingjin [Mon, 29 Apr 2024 02:49:57 +0000 (22:49 -0400)]
LU-17789 pcc: dont auto PCCRO attach for write/setattr
It is meaningless for a client to do auto PCCRO attach for write
and setattr operations.
Moreover, it may result in sanity-pcc/test_21d failure as
follows:
"FAIL: expected /mnt/lustre/f21d.sanity-pcc: write_mod_data,
got: write_mod_dataa"
This patch fixes it by disabling PCCRO auto attach for write and
setattr operations.
Change-Id: I894db1953a119d12e9337251c069c594fb40482a
Test-Parameters: testlist=sanity-pcc env=ONLY=21d,ONLY_REPEAT=10
Signed-off-by: Qian Yingjin <qian@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54946
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Qian Yingjin [Fri, 22 Dec 2023 09:16:07 +0000 (04:16 -0500)]
LU-17383 statahead: quit statahead with a long time wait
If the thread is not doing stat for more than a time threshold
(@sbi->ll_sa_timeout, 30 seconds by default) then it probably does
not care too much about performance, or is no longer using this
directory.
In this case, quit the statahead thread after the long wait.
This patch also fixes defects reported by Coverity Scan for
Lustre.
Also add the lines about ll_sa_timeout in
https://review.whamcloud.com/41308
Lustre-change: https://review.whamcloud.com/53535
Lustre-commit:
cfcba1ede861faec33d797e876a0fb11eab4332a
Fixes:
e10bf68d7c3 ("LU-14361 statahead: regularized fname statahead pattern")
Test-Parameters: testlist=parallel-scale-nfsv4
Test-Parameters: testlist=parallel-scale-nfsv4
Test-Parameters: testlist=parallel-scale-nfsv4
Test-Parameters: testlist=parallel-scale-nfsv4
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: Ia7c478268fe12eeefa6dfae1b3c94451f010d1d5
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55014
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Sebastien Buisson [Tue, 12 Mar 2024 14:12:38 +0000 (15:12 +0100)]
EX-9392 sec: use dedicated INTERNAL upcall cache
Implement the INTERNAL upcall cache as a dedicated, separate cache.
This makes it distinct from the regular identity upcall cache that can
be defined to use any upcall including NONE, per an MDT side tuning.
The INTERNAL upcall cache becomes accessible only to clients that
belong to a nodemap for which the 'server_upcall' rbac role is not
enabled.
Dedicated mdt-side tunables are created to configure the entry expiry
time and the acquire expire time for INTERNAL, as well as a tunable to
flush the INTERNAL upcall cache.
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I0267182fbfa646de40ac62f832e89fbfd8477822
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54361
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Sergey Cheremencev [Sat, 20 Jan 2024 06:38:38 +0000 (14:38 +0800)]
LU-14535 quota: free lvbo in a wq
Mutex lqe_glbl_data_lock held in
qmt_lvbo_free might be the reason for
sleeping while atomic if
cfs_hash_for_each_relax is taking a
spinlock on an upper layer:
BUG: sleeping function called from invalid
context at kernel/mutex.c:104
...
Call Trace:
dump_stack+0x19/0x1b
__might_sleep+0xd9/0x100
mutex_lock+0x20/0x40
qmt_lvbo_free+0xc7/0x380 [lquota]
mdt_lvbo_free+0x12d/0x140 [mdt]
ldlm_resource_putref+0x189/0x250 [ptlrpc]
ldlm_lock_put+0x1c8/0x760 [ptlrpc]
ldlm_export_lock_put+0x12/0x20 [ptlrpc]
cfs_hash_for_each_relax+0x3ff/0x450 [libcfs]
cfs_hash_for_each_empty+0x9a/0x210 [libcfs]
ldlm_export_cancel_locks+0xc2/0x1a0 [ptlrpc]
ldlm_bl_thread_main+0x7c8/0xb00 [ptlrpc]
kthread+0xe4/0xf0
ret_from_fork_nospec_begin+0x7/0x21
Move freeing of lvbo to a workqueue. This
patch can probably be reverted as soon
as https://review.whamcloud.com/45882
lands.
Lustre-change: https://review.whamcloud.com/54107
Lustre-commit:
2cc18ece1e50c760786a13a9dcb5857d7768cb0f
Fixes:
1dbcbd70f8 ("LU-15021 quota: protect lqe_glbl_data in lqe")
Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Change-Id: I56aee72a7adbc6514b40689bae30669e607b5ecd
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54807
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Mikhail Pershin [Mon, 22 Jan 2024 12:58:23 +0000 (15:58 +0300)]
LU-17379 mgc: try MGS nodes faster
Re-organize import_select_connection to try all NIDs
faster, at least in the first round.
- check NID LNET discovery status and skip those not
discovered yet in the first round; in the next round just
select the least recently used one
- reset AT timeout to minimal values in the first round
- track per-connection total attempts to connect,
how many were replied to, and discovery status, and output
this in import stats
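The selection rule in the first item can be modelled roughly as below. This is a sketch with hypothetical names (struct toy_conn, select_conn()); picking the least recently used connection among the discovered ones in the first round is an assumption, since the message only states that undiscovered NIDs are skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_conn {
	int		discovered;	/* LNet discovery finished for this NID */
	uint64_t	last_attempt;	/* time of the last connect attempt */
};

/*
 * Return the index of the connection to try next, or -1 if none
 * qualifies. In the first round undiscovered NIDs are skipped; in
 * later rounds the least recently used connection wins.
 */
static int select_conn(const struct toy_conn *conns, int count,
		       int first_round)
{
	int best = -1;
	int i;

	assert(conns != NULL);
	for (i = 0; i < count; i++) {
		if (first_round && !conns[i].discovered)
			continue;
		if (best < 0 ||
		    conns[i].last_attempt < conns[best].last_attempt)
			best = i;
	}
	return best;
}
```

A return value of -1 in the first round means no NID has been discovered yet, so the caller falls through to the later-round rule instead of waiting on an unreachable node.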
Lustre-change: https://review.whamcloud.com/54022
Lustre-commit:
94d05d0737db256a64626bfe6fa9801819230d8a
Signed-off-by: Mikhail Pershin <mpershin@whamcloud.com>
Change-Id: Ib4d043e82bf156cc3e7c9ddeff0055790edcc9ee
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54949
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Serguei Smirnov [Mon, 5 Feb 2024 20:14:30 +0000 (12:14 -0800)]
LU-17379 lnet: add LNetPeerDiscovered to LNet API
LNetPeerDiscovered is added to allow Lustre to check
whether the peer has been successfully discovered by LNet
before attempting to open a connection to it.
For example, given a mount command with a list of NIDs,
Lustre can use LNetAddPeer API to initiate discovery on
every candidate first, and later use LNetPeerDiscovered
to select a reachable peer to connect to.
Lustre-change: https://review.whamcloud.com/53926
Lustre-commit:
dba41355565397228f587f13a901b5d762521ed0
Test-Parameters: trivial testlist=sanity-lnet
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I7c9964148a5a2a24d7889b8b4c2e488a433ca258
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54950
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Hongchao Zhang [Wed, 20 Mar 2024 14:14:27 +0000 (22:14 +0800)]
LU-15277 quota: don't print extra default quota info
When getting quota info with "lfs quota", it's better to include
the default quota in the quota output of the specific quota ID.
Lustre-change: https://review.whamcloud.com/45725
Lustre-commit:
0a97a8a41796caa52ef27b0cc00b11ee5889c1fe
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: I6726888b8857f9a45a96c83db0a546b29507cf8a
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55029
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Emoly Liu [Mon, 6 May 2024 04:24:00 +0000 (12:24 +0800)]
LU-17815 tests: skip conf-sanity.sh test_5h
Skip conf-sanity.sh test_5h because it always caused test_102 and
test_108 failures in recent interop testing.
Lustre-change: https://review.whamcloud.com/55012
Lustre-commit: TBD (from
ab300315dbbd745aec91482fc59ea15e6909fe15)
Test-Parameters: trivial serverbuildno=606 serverjob=lustre-b_es5_2 serverdistro=el7.9 testlist=conf-sanity env=ONLY="5h 102 108",HONOR_EXCEPT=y
Test-Parameters: trivial testlist=conf-sanity
Fixes: d1b5146eda ("LU-12206 mdt: mdt_init0 failure handling")
Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Change-Id: Id6ffe8b5d88e1d79883cbf2d84d73796945fc734
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/55013
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Sergey Cheremencev [Sat, 27 Apr 2024 10:22:15 +0000 (13:22 +0300)]
EX-8423 csdc: disable gzip
Gzip compression periodically causes the
following assertion on clients:
decompress_request()) ASSERTION( dst_size <= chunk_size )
Until this is fixed:
1. forbid setting gzip layout
2. remove gzip from sanity-compr.sh, sanity.sh, sanity-flr.sh,
sanity-lfsck.sh, sanity-pfl.sh
3. remove gzip from ll_compression_scan
There is still a backdoor to set gzip for test purposes,
if LFS_SETSTRIPE_COMPR_OK is set. When set, gzip will be applied
in sanity-flr(43c, 43d) and sanity-compr(1a).
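The gating described above can be sketched roughly as follows. This is an illustration only, not the lfs implementation: FORBIDDEN_BY_DEFAULT and compression_allowed() are hypothetical names, and treating any non-empty value of LFS_SETSTRIPE_COMPR_OK as "set" is an assumption.

```python
import os

# Hypothetical names for illustration; these are not identifiers from lfs.
FORBIDDEN_BY_DEFAULT = {"gzip"}


def compression_allowed(algorithm, environ=None):
    """Return True if the compression type may be set on a layout."""
    env = os.environ if environ is None else environ
    if algorithm not in FORBIDDEN_BY_DEFAULT:
        return True
    # Assumption: any non-empty value enables the test-only override.
    return bool(env.get("LFS_SETSTRIPE_COMPR_OK"))
```

Passing the environment as a parameter keeps the check easy to exercise from a test without touching the real process environment.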
Signed-off-by: Sergey Cheremencev <scherementsev@ddn.com>
Signed-off-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Change-Id: I5461ba756dcd15e0d705f3a3c51a125a59ec19a5
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54943
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Artem Blagodarenko [Tue, 16 Apr 2024 20:40:41 +0000 (21:40 +0100)]
EX-9449 csdc: replace assert with error message
Remove the assert, which is based on data from the wire, and
replace it with an error message, which will be useful in case
this error happens.
The -EAGAIN error is reasonable in this case.
Signed-off-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Change-Id: I2f37d0204123af1c23352b967dad1de5e7860b64
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54817
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Sergey Cheremencev [Wed, 31 Mar 2021 12:13:53 +0000 (15:13 +0300)]
LU-15057 utils: pool quota man
Add pool quota man page text for the setquota and
quota commands.
Remove [-o <obd_uuid>|-i <mdt_idx>|-I <ost_idx>]
from the "lfs quota -t" case. The grace period
is stored only on the quota master. Furthermore,
the command "lfs quota -t -I 0 /mnt/testfs" fails
with EOPNOTSUPP.
Test-Parameters: trivial
Lustre-change: https://review.whamcloud.com/45121/
Lustre-commit: I368e22b782bd3626f64907059ea329e94986535b
Signed-off-by: Sergey Cheremencev <sergey.cheremencev@hpe.com>
Change-Id: I0e2d2c3df05c0053a1306dec9aa7353ce80162df
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54893
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Andreas Dilger [Mon, 25 Sep 2023 17:53:18 +0000 (11:53 -0600)]
LU-17498 tests: show NIDs in node summary page
Instead of only showing the network type for each node,
show the full NID in the YAML file to help with debugging and
identifying nodes in the logs.
Lustre-change: https://review.whamcloud.com/52500
Lustre-commit: 8e1f0cc90785463fb9ea847a8d1362941e82bcae
Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I7ee39b08c5cae5a3f9ee4ea4dbee001a6d889fbb
Reviewed-by: Lee Ochoa <lochoa@whamcloud.com>
Reviewed-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54958
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Jian Yu [Mon, 29 Apr 2024 17:26:50 +0000 (10:26 -0700)]
LU-17761 tests: make sanity-compr sanity/sanityn return 0
While running sanity-compr sanity/sanityn, if there was a
sub-subtest failure, the sanity/sanityn test_cleanup would
be incorrectly marked as FAIL.
We should leave it to the individual sanity/sanityn subtests
to mark their failures; test_sanity() and test_sanityn()
should not also return an error.
Lustre-change: https://review.whamcloud.com/54855
Lustre-commit: TBD (from 96767ff8af44b5dac0677db759634515de1d1802)
Test-Parameters: trivial testlist=sanity-compr
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Change-Id: I1fd645b80b92e583f1a564f85e6d2d6d871b8fa8
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54856
Reviewed-by: Artem Blagodarenko <ablagodarenko@ddn.com>
Reviewed-by: Sarah Liu <sarah@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Alex Zhuravlev [Tue, 9 Apr 2024 10:14:01 +0000 (13:14 +0300)]
LU-17717 tests: skip sanity-lnet/252 for interop
The subtest fails in interop testing because it finds a memory
leak that has only recently been fixed.
Lustre-change: https://review.whamcloud.com/54707
Lustre-commit: d1c08e04cd331cdcc90a38cc6b1adc73b7da9c93
Test-Parameters: trivial
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Change-Id: Ide80e0b39a053a2774804b025306ebdb1fc964a8
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54956
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Andreas Dilger [Tue, 16 Apr 2024 21:34:14 +0000 (15:34 -0600)]
EX-9611 lipe: ignore pylint warnings
The Python-based lipe code is deprecated, but fails during
building because of newer pylint warnings. Ignore errors
from pylint during building until someone fixes them.
Make the installation of pylint optional to simplify builds.
Test-Parameters: trivial
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ibd94f6ef5ef69b1fd597f40bbecca6e3c3fb8f02
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54861
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Feng Lei <flei@whamcloud.com>
Jian Yu [Mon, 29 Apr 2024 08:24:41 +0000 (01:24 -0700)]
EX-9681 build: disable objtool and UBSAN warnings
While building and running Lustre client with
kernel 6.8.0-31-generic, there are lots of
objtool compile-time warnings as follows:
warning: objtool: __cfs_fail_check_set()
falls through to next function __cfs_fail_timeout_set()
and also UBSAN runtime warnings as follows:
UBSAN: array-index-out-of-bounds in libcfs_mem.c:97:3
index 0 is out of range for type 'void *[*]'
Before all of the warnings are actually fixed,
we temporarily disable them to quiet the warnings
in build and system logs.
Change-Id: I18630f9a8aa6fd7c2b33b4eb8103fd7e2f6e19de
Test-Parameters: trivial
Signed-off-by: Jian Yu <yujian@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54948
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Minh Diep [Mon, 29 Apr 2024 23:52:39 +0000 (16:52 -0700)]
EX-9688 build: Update lipe distro support
In ARM rocky9.3, Rocky is now RockyLinux
Test-Parameters: trivial
Change-Id: I232a1066e3cda8e4cb1be04133432075a20402fe
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54957
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Andreas Dilger [Sat, 27 Apr 2024 22:34:52 +0000 (16:34 -0600)]
RM-620 build: New tag 2.14.0-ddn145
New tag 2.14.0-ddn145
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I787fe5c5f8e68bc1a2b6b6c095eca3cbdf68c86c
Andreas Dilger [Sat, 27 Apr 2024 22:34:25 +0000 (16:34 -0600)]
RM-620 build: New tag lipe-2.49
New tag lipe-2.49
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I4d2205573cc92586895c54bf513ca78d501982fe
Qian Yingjin [Wed, 24 Jan 2024 02:43:38 +0000 (21:43 -0500)]
LU-17463 osc: add option to disable page cache shrinker
Pages mapped into VM_LOCKED [mlock()ed] VMAs are unevictable
pages. Those pages are marked with PG_mlocked.
However, the page cache shrinker in Lustre treats all cached pages
equally even though some of them are unevictable. It may wrongly
evict pages locked by mlock() or mlockall() calls.
This patch adds a tunable option to enable or disable the page cache
shrinker:
- osc.*.enable_page_cache_shrink
It is enabled by default.
Lustre-Change: https://review.whamcloud.com/53795
Lustre-Commit: d90ce0aab10ee8856140720cd71935da6877a5ab
Signed-off-by: Qian Yingjin <qian@ddn.com>
Change-Id: I23ebf6d438a71c7917b0cb3375407a64587e15db
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54754
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Hongchao Zhang [Fri, 26 Jan 2024 10:57:34 +0000 (18:57 +0800)]
LU-16736 quota: set revoke time to avoid endless wait
The revoke time of the lquota entry should be set when its qunit
reaches the least qunit, but in some rare cases it is not set,
which could be related to a broken quota LDLM lock. Set it in
"qmt_acquire" to avoid an endless wait in QSD.
Lustre-change: https://review.whamcloud.com/50626
Lustre-commit: 49730821c4e5116f188c931830ce23b2da2d8a41
Signed-off-by: Hongchao Zhang <hongchao@whamcloud.com>
Change-Id: Ib68c5dc881346e0e619d43553ee490847ae5e225
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54907
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Sergey Cheremencev <scherementsev@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Andreas Dilger [Fri, 26 Apr 2024 04:02:14 +0000 (22:02 -0600)]
EX-9588 lipe: lipe_scan3 compatibility for EXA5.2.8
Add a compatibility implementation of llapi_layout_compress_get()
so lipe_scan3 can run on EXA5.2.8 that has an old liblustreapi.so
without this function.
Test-Parameters: trivial testlist=sanity-lipe-scan3,sanity-lipe-find3
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: I7de9164467ea889ee2d47c7fbb18bfd7acce7057
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54924
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
Minh Diep [Tue, 9 Apr 2024 15:49:23 +0000 (08:49 -0700)]
EX-8779 build: enable build kernel_abi_stablelists
To build kernel_abi_stablelists, we need to rebuild with noarch
Test-Parameters: trivial
Change-Id: I0f8abfa9a4a20539ffd0faa9ad70037fd4ef1685
Signed-off-by: Minh Diep <mdiep@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54711
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Reviewed-by: Alex Deiter
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Wei Liu [Mon, 15 Apr 2024 23:26:02 +0000 (16:26 -0700)]
EX-7680 tests: Skip sanity 430c for CSDC layout
Skip sanity test_430c until SEEK_HOLE is implemented for CSDC
Test-Parameters: trivial testlist=sanity-compr env=ONLY="sanity",COMPR_EXTRA_LAYOUT="-E 1M -c 1 -E eof -Z lz4:3"
Signed-off-by: Wei Liu <sarah@whamcloud.com>
Change-Id: I7359262de9e3d1644d2a45b5336328bd8253f91b
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54798
Reviewed-by: Colin Faber <cfaber@ddn.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Vitaliy Kuznetsov [Thu, 18 Apr 2024 16:47:57 +0000 (18:47 +0200)]
EX-8130 lipe: Add JSON report type for dirs stats
This patch adds functions for displaying size statistics
for directories in the general report. This is necessary
to merge reports in the future.
This patch adds support for *.json format only.
Example structure:
"DirectoriesStats":{
  "SourceDirectory":"",
  "MaxDepth":4,
  "TotalSizeBytes":104861696,
  "TotalAllocatedSizeBytes":104861696,
  "RatingMinSizeBytes":0,
  "RatingMaxSizeBytes":104861696,
  "Rating":[
    {
      "RatingPosition":0,
      "SizeBytes":104861696,
      "AllocatedSizeBytes":104861696,
      "Depth":0,
      "FilesCount":1,
      "DirsCount":1,
      "UserID":0,
      "FID":"0x200000401:0x2:0x0",
      "DirectoryName":"d308.sanity-lipe-scan3",
      "Path":"d308.sanity-lipe-scan3"
    }
  ],
  "MainTree":{
    "ChildDirectories":[
      {
        "SizeBytes":104861696,
        "AllocatedSizeBytes":104861696,
        "Depth":0,
        "FilesCount":1,
        "DirsCount":1,
        "UserID":0,
        "GroupID":0,
        "ProjID":0,
        "Atime":1713188451,
        "Mtime":1713188451,
        "Ctime":1713188451,
        "Crtime":1713188451,
        "FID":"0x200000401:0x2:0x0",
        "DirectoryName":"d308.sanity-lipe-scan3",
        "Path":"d308.sanity-lipe-scan3",
        "ChildDirectories":[
        ]
      }
    ]
  }
}
Additional fields will also be added, and some
checks to display file statistics in JSON format.
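As a rough illustration of how a consumer might read this report, the sketch below walks the nested "ChildDirectories" lists. It uses only keys that appear in the example structure; the enclosing top-level braces and the walk() helper are assumptions made for the example, not part of lipe_scan3.

```python
import json

# Trimmed sample using only keys shown in the example structure;
# the enclosing braces are an assumption about the full report.
SAMPLE = """
{"DirectoriesStats": {
  "MaxDepth": 4,
  "TotalSizeBytes": 104861696,
  "MainTree": {"ChildDirectories": [
    {"SizeBytes": 104861696, "Depth": 0, "FilesCount": 1,
     "Path": "d308.sanity-lipe-scan3", "ChildDirectories": []}
  ]}
}}
"""


def walk(node):
    """Yield (path, size) for every directory below a tree node."""
    for child in node.get("ChildDirectories", []):
        yield child["Path"], child["SizeBytes"]
        yield from walk(child)


stats = json.loads(SAMPLE)["DirectoriesStats"]
dirs = list(walk(stats["MainTree"]))
```

Because each directory entry carries its own "ChildDirectories" list, a recursive walk like this would also be a natural starting point for merging several reports.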
Test-Parameters: trivial testlist=sanity-lipe-scan3
Signed-off-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
Change-Id: Ib250dc684cdd16e21187a710b855f4fffcf0eed1
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54283
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Andreas Dilger [Tue, 16 Apr 2024 07:48:55 +0000 (01:48 -0600)]
EX-9136 lipe: improve attributes wording in report
Print "/" for root directory instead of "".
Improve wording of attributes descriptions in the report.
Change LS3_STATS_TYPE_EQUAL_OVERHEAD to have a 1-block margin
for the "equal" size. Remove "overhead" from field description.
Remove extra spaces before tabs throughout file.
Test-Parameters: trivial testlist=sanity-lipe-find3,sanity-lipe-scan3
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Change-Id: Ieabde4da50e1c24887789196c6c9a14a57fc9d4f
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54805
Tested-by: jenkins <devops@whamcloud.com>
Reviewed-by: Vitaliy Kuznetsov <vkuznetsov@ddn.com>
James Simmons [Wed, 14 Feb 2024 12:38:25 +0000 (07:38 -0500)]
LU-14291 tests: make module loading of ost optional
Future Lustre versions will no longer have an ost kernel module.
load_module in the test framework will fail, so capture the
failure and ignore it. We will need this for interop testing.
Lustre-change: https://review.whamcloud.com/54040
Lustre-commit: ef7deb7b076e554279f88f6d57afa17884027f9a
Test-Parameters: trivial
Signed-off-by: James Simmons <jsimmons@infradead.org>
Change-Id: Iedff4f6a36ceffa9428e3f891db78b7538217085
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Timothy Day <timday@amazon.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54799
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Jian Yu <yujian@whamcloud.com>
Li Dongyang [Thu, 18 Apr 2024 11:10:39 +0000 (21:10 +1000)]
LU-16692 tests: force_new_seq_all interop version checking
force_new_seq_all is still needed in those test suites
when testing against servers that don't have v2_15_61-226-gf00d2467fc
Lustre-change: https://review.whamcloud.com/54840
Lustre-commit: TBD (from 944c6d7017c08cc81d72b43cc4fc73a820111dd1)
Test-Parameters:trivial serverversion=EXA6.3.0 testlist=replay-single,replay-ost-single,replay-dual,recovery-small,replay-vbr,sanity-pfl
Change-Id: Iab963ac10308b56a60508774c1a63bcdfffdba85
Fixes: c0c664cac1 ("LU-16692 tests: remove force_new_seq from some test suites")
Fixes: 55a9dfb82d ("LU-16692 osp: do not assert on seq got over network")
Signed-off-by: Li Dongyang <dongyangli@ddn.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54841
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Oleg Drokin [Tue, 19 Mar 2024 03:10:13 +0000 (23:10 -0400)]
LU-17650 gss: fix use out of bounds in ptlrpc_gss
KASAN highlighted that the sockaddr_un struct is not enough
for the kernel primitives we use, so we have to use the
bigger sockaddr_storage for allocation, alas the field
names inside are different so we have to jump through some
hoops to make it actually work.
Also, for a 128-byte allocation an on-stack variable is fine and
cannot fail, so convert to that.
Lustre-change: https://review.whamcloud.com/54452
Lustre-commit: 9519751c59f3a31b1c1fc2f7771699000aca09a2
Change-Id: I2292900b54756bf39530c96f7c5c228835562bef
Signed-off-by: Oleg Drokin <green@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54892
Reviewed-by: Sebastien Buisson <sbuisson@ddn.com>
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Alexander Zarochentsev [Tue, 21 Nov 2023 14:46:44 +0000 (09:46 -0500)]
LU-9839 clio: lov active ios accounting fix
ASSERT(atomic_read(&lov->lo_active_ios)==0) is triggered due to a
bug in active_ios accounting. For some cl_io_init(,CIT_MISC,,)
calls, the increment of the lov_active_ios counter is not protected
by the layout lock. So the checks for active_ios != 0 are racy and do
not prevent another thread from starting a new cl_io and incrementing
the active_ios counter after any check but before the assertion.
The lov_active_ios counter increment should be done under the
same condition as taking the layout type lock.
The ci_type=CIT_MISC and ci_ignore_layout=1 should not be used
in ll_dom_finish_open() as the I/O doesn't come
"from the osc layer" and may race with a layout change.
Lustre-change: https://review.whamcloud.com/51638
Lustre-commit: 5bc1dd825b700677b002a43463a463c3ccb665ec
HPE-bug-id: LUS-11628
Signed-off-by: Alexander Zarochentsev <alexander.zarochentsev@hpe.com>
Change-Id: I35fda85b968b847a87e73dd36bbb1648c744d62c
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Patrick Farrell <pfarrell@whamcloud.com>
Reviewed-by: Vitaly Fertman <vitaly.fertman@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54863
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Sebastien Buisson [Wed, 6 Mar 2024 15:33:25 +0000 (15:33 +0000)]
LU-17624 ssk: support FIPS mode on client
In FIPS mode, only certain crypto methods are allowed. This has an
impact on the DHKE mechanism implemented for SSK, as this relies on
a prime number generated for the client key. More specifically, FIPS
mode imposes that only certain safe, well-known primes be used.
OpenSSL prior to v1.1 just imposes a requirement on the prime length.
OpenSSL v1.1 requires the use of a specific primitive when FIPS mode
is on, to fetch a well-known prime based on a prime NID.
OpenSSL v3 is capable of detecting that FIPS mode is enforced, and
picks up a well-known prime instead of generating one.
Because of this, primes used for the DHKE are identical on all clients
in FIPS mode. So urge admins to use a short expiration time on SSK
keys, one day instead of one week, so that security contexts are
re-negotiated more often.
The NIST recommended primes are listed in Table 26 in Appendix D of:
https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-56Ar3.pdf
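To make the role of the shared prime concrete, here is a toy Diffie-Hellman exchange. It is only a sketch of the general technique: P and G are tiny illustrative values rather than NIST primes, and keypair() and shared_secret() are hypothetical helpers unrelated to the SSK code.

```python
import secrets

# Toy parameters for illustration only; real deployments use large
# well-known safe primes, never values this small.
P = 23  # shared prime, identical for every client in this sketch
G = 5   # generator


def keypair():
    """Return (private, public) for one side of the exchange."""
    private = secrets.randbelow(P - 3) + 2  # uniformly in [2, P - 2]
    return private, pow(G, private, P)


def shared_secret(own_private, peer_public):
    """Derive the shared secret from our private and the peer's public value."""
    return pow(peer_public, own_private, P)


client_priv, client_pub = keypair()
server_priv, server_pub = keypair()
```

Both sides arrive at the same secret although only public values cross the wire. With the prime fixed for all clients, the per-session private values are the only varying input, which is consistent with the advice to renegotiate contexts more often.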
Lustre-change: https://review.whamcloud.com/54314
Lustre-commit: 5dc91df283fb5a7030b384f224085d73268dcca5
Test-Parameters: trivial
Test-Parameters: testgroup=review-dne-selinux-ssk-part-1
Test-Parameters: testgroup=review-dne-selinux-ssk-part-2
Test-Parameters: testgroup=review-dne-selinux-ssk-part-1 clientdistro=el9.2
Test-Parameters: testgroup=review-dne-selinux-ssk-part-2 clientdistro=el9.2
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I52b1926393e51fba6a9e92a837f86a38516ef6ad
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54804
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Sebastien Buisson [Thu, 14 Mar 2024 17:15:29 +0000 (18:15 +0100)]
LU-17643 gss: make a local copy of the sptlrpc llog
Make a local copy on server side of the sptlrpc llog, so that
the targets that do not manage to connect to the MGS know at least
which security flavor to accept from clients.
This needs to pass the super_block to config_log_find_or_add().
Add sanity-sec test_70 to check that sptlrpc llog on MDS and OSS side
is equivalent to the one from the MGS.
Lustre-change: https://review.whamcloud.com/54394
Lustre-commit: 5921cb2a5b8b7e1301b2c1502be6f8006ab4082a
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I81f0136746e2df7cca1b34c4a17e4b7135a43c29
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-by: Mikhail Pershin <mpershin@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54803
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Ryan Haasken [Tue, 3 May 2016 19:49:57 +0000 (15:49 -0400)]
LU-5134 utils: Add parallel option to lctl set_param
Add a "-t" option to lctl set_param to enable setting multiple matched
parameters in parallel. When called with "-t", lctl will set up a work
queue of matched file names and spawn a fixed number of threads per
CPU. Each thread will pull items off the work queue, write to the file
associated with each work item, and return when there are no more
items on the work queue.
A field called po_parallel_threads is added to struct param_opts to
indicate the number of threads set_param should run in parallel. If in
parallel, jt_lcfg_setparam initializes a work queue and passes it to
do_param_op, which adds each matched item to the work queue. Once
jt_lcfg_setparam has called do_param_op for each param-value pair, it
passes the work queue to sp_run_threads, which creates threads, each
of which calls write_param to set the parameter. If not in parallel,
jt_lcfg_setparam does not pass a work queue to do_param_op, and
do_param_op directly calls write_param on each matched param.
param_display was renamed to do_param_op to more accurately reflect
what it does.
If lctl is compiled without pthread support, "lctl set_param" will
still accept the "-t" option, but it will print a warning message, and
it will set the parameters in series.
The new "-t" option to set_param was documented in the lctl usage and
in the man page.
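The work-queue pattern described above can be sketched as follows. This is a Python analogue for illustration, not the lctl code: write_param() here just records the value in a dict, and run_threads() and worker() are hypothetical stand-ins for the C functions named in the message.

```python
import queue
import threading


def write_param(path, value, store):
    # Stand-in for writing `value` to the parameter file at `path`.
    store[path] = value


def worker(work, store):
    """Pull items off the queue until it is empty, then return."""
    while True:
        try:
            path, value = work.get_nowait()
        except queue.Empty:
            return  # no more items on the work queue
        write_param(path, value, store)


def run_threads(items, num_threads):
    """Queue every matched (path, value) pair, then drain it in parallel."""
    work = queue.Queue()
    for item in items:
        work.put(item)
    store = {}
    threads = [threading.Thread(target=worker, args=(work, store))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return store
```

All items are queued before any thread starts, so an empty queue reliably means the work is done and no separate shutdown signal is needed.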
Lustre-change: https://review.whamcloud.com/10555
Lustre-commit: 345a2497d08f6b9afd74ed0188a70489f7a43e5d
HPE-bug-id: LUS-2592
Signed-off-by: Ryan Haasken <haasken@cray.com>
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Change-Id: I3f96a6f06c50d4ba2ce97050c35f46b976dfc005
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Emoly Liu <emoly@whamcloud.com>
Reviewed-by: James Simmons <jsimmons@infradead.org>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54878
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Emoly Liu [Wed, 10 Apr 2024 09:18:03 +0000 (09:18 +0000)]
LU-17713 mdd: validate the length of mdd append_pool name
Validate the length of mdd append_pool name (<= LOV_MAXPOOLNAME)
before saving it in function append_pool_store().
Also, sanity.sh test_27M is improved a little to verify this fix.
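A minimal sketch of the length check, written in Python for illustration rather than as the kernel code. The value 15 for LOV_MAXPOOLNAME, the stripping of whitespace, and the choice of -EINVAL as the error are all assumptions here; the function only borrows the name of append_pool_store() from the message.

```python
import errno

LOV_MAXPOOLNAME = 15  # assumed value; check lustre_user.h


def append_pool_store(state, name):
    """Hypothetical analogue: reject over-long names before saving."""
    name = name.strip()
    if len(name) > LOV_MAXPOOLNAME:
        return -errno.EINVAL  # assumption: the real errno may differ
    state["append_pool"] = name
    return len(name)
```

Checking the length before the assignment means a rejected name leaves the previously stored value untouched.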
Lustre-change: https://review.whamcloud.com/54691
Lustre-commit: 509a7cf9778968f796794c3743e62bc6b2a71592
Signed-off-by: Emoly Liu <emoly@whamcloud.com>
Change-Id: Id7083fab60e9a18af4d8eedfa3d55f37544ba15d
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Zhenyu Xu <bobijam@hotmail.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54889
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Yang Sheng [Thu, 28 Mar 2024 19:54:06 +0000 (03:54 +0800)]
LU-17692 flock: get extra reference for lockd
We should take the local lock first for GETLK. Otherwise,
the lock_owner could be released while working with
lockd.
Lustre-change: https://review.whamcloud.com/54622
Lustre-commit: 7f8af8f37eadb0d332c94472ae9cb9556f4425d2
Signed-off-by: Yang Sheng <ys@whamcloud.com>
Change-Id: I56e4204e315c2bdbc496b7961519ae45ab1820fe
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54886
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Sebastien Buisson [Tue, 12 Mar 2024 10:32:59 +0000 (11:32 +0100)]
EX-9392 sec: add server_upcall rbac role
The purpose of the new server_upcall rbac role is to control whether
clients use the server side defined identity upcall. When set, clients
do comply with the server side identity upcall. When not set, clients
are leveraging the special INTERNAL identity upcall, which means
servers trust supplementary groups as provided by the clients.
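The decision described above amounts to a simple selection, sketched below. The function effective_upcall() and its arguments are hypothetical and only model the behaviour stated in this message; they are not taken from the Lustre sources.

```python
def effective_upcall(rbac_roles, server_upcall_path):
    """Pick the identity upcall that applies to a client's requests."""
    if "server_upcall" in rbac_roles:
        # Role set: comply with the upcall defined on the server side.
        return server_upcall_path
    # Role not set: use the special INTERNAL upcall, which trusts the
    # supplementary groups provided by the client.
    return "INTERNAL"
```

For example, a client whose role set lacks server_upcall would get "INTERNAL" regardless of which upcall path the server has configured.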
Test-Parameters: trivial
Test-Parameters: testgroup=review-dne-part-2
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I01dcedad5da0e175aa7b8d187f2affd34d933e39
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54360
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Lai Siyao <lai.siyao@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Sebastien Buisson [Fri, 9 Feb 2024 15:42:40 +0000 (16:42 +0100)]
LU-17518 gss: do not trust supp groups from client with krb
Thanks to Kerberos, Lustre does not have to trust clients anymore,
but relies on keytabs and tickets, cryptographically validated, to
recognize clients and users.
RPC provided supplementary groups should not be trusted, but checked
thanks to identity upcall and the trusted UID from the ticket.
Add sanity-krb5 test_9 to exercise this.
Lustre-change: https://review.whamcloud.com/53987
Lustre-commit: b09f56c208c6c34375d098f66075688f329b7c76
Test-Parameters: kerberos=true testlist=sanity-krb5 serverdistro=el8.8
Signed-off-by: Sebastien Buisson <sbuisson@ddn.com>
Change-Id: I4113ef654492e76fcd377b2c0cc74e484b27850b
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Aurelien Degremont <adegremont@nvidia.com>
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54801
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Serguei Smirnov [Fri, 26 Apr 2024 21:48:55 +0000 (14:48 -0700)]
EX-9530 tests: fix issues in backport of LU-13569
The backport of "LU-13569 tests: Check LNet Health recovery logic"
added redundant lnets and drop rules.
Clean this up.
Test-Parameters: trivial testlist=sanity-lnet
Test-Parameters: trivial testlist=sanity-lnet clientversion=EXA6 serverversion=2.15
Fixes: 2b6f7a39 ("LU-13569 tests: Check LNet Health recovery logic")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Change-Id: I1e2d5d31f77a29504182650be30f9db7087d82cc
Reviewed-on: https://review.whamcloud.com/c/ex/lustre-release/+/54939
Tested-by: jenkins <devops@whamcloud.com>
Tested-by: Maloo <maloo@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>